How to free a type-erased field?

Hello.

I have a big non-generic struct that has one data field that is ?*anyopaque. The field itself is used by casting the pointer to a known type then using it.

However, when deinitializing the big struct I do not know what the type is.

My best current idea is to add another field next to data called free_data that is a function pointer, and I would assign this function from the same routine that creates the data itself. But I am hoping for something nicer.

If the caller of deinit does not know the type, then either a function pointer or an enum telling you the type is the only way, right?

I am really loath to add another field tho ..

The enum idea is interesting. Is there a way to compare the type the anyopaque points to against a list of known types somehow? All of them have the deinit function.

If you’re asking whether you can figure out the type from the value pointer, no, the type only exists at compile time. You have to add your own type info (like an enum) and map that to the type yourself.

You could use pointer tagging to place the enum within unused bits of the anyopaque pointer, but that means that you need to mask away those pointer tagging bits before you can convert the anyopaque pointer back to a pointer to the concrete type wherever that happens.

Depending on alignment and architecture there are different amounts of bits that could be used for pointer tagging, it is a fairly common technique in interpreters.

The code that is allocating that field has to know the type and size to do so, so it should be providing the means to deallocate it.

It is a bit hackish and not for every scenario, but you can prefix the memory with some metadata to indicate its size and/or type. I once did such a thing to support a custom allocator used by a C library, which simply expected the basic malloc, realloc, and free signatures. Those do not play nice with Zig’s model of allocators, which expects the size passed when freeing to match the size that was allocated.

I simply allocated an additional few bytes when it called malloc, wrote the size, and then returned the offset pointer after the size. When free or realloc was called, I merely decremented the pointer by the same few bytes, and read the size. This same concept could be extended to store whatever you need, such as an enum value indicating the type, etc.
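For anyone who wants to see the shape of it, here is a minimal sketch of the header trick in Zig. The names `sizedAlloc`, `sizedFree`, and `header_words` are made up for illustration, and the sketch assumes that `usize` alignment is enough for the payload (a real malloc replacement usually has to guarantee more).

```zig
const std = @import("std");

// Hypothetical names throughout. One usize word in front of the payload
// stores the total block length. Assumption: usize alignment is enough.
const header_words: usize = 1;

/// malloc-style: the caller passes only a byte count and gets a bare pointer.
fn sizedAlloc(gpa: std.mem.Allocator, size: usize) ?[*]u8 {
    const payload_words = (size + @sizeOf(usize) - 1) / @sizeOf(usize);
    const block = gpa.alloc(usize, header_words + payload_words) catch return null;
    block[0] = block.len; // remember the full length for sizedFree
    return @ptrCast(block.ptr + header_words);
}

/// free-style: the caller passes only the pointer; the length comes from the header.
fn sizedFree(gpa: std.mem.Allocator, ptr: ?[*]u8) void {
    const user: [*]usize = @ptrCast(@alignCast(ptr orelse return));
    const base = user - header_words; // step back to the header word
    gpa.free(base[0..base[0]]);
}
```

The same header word could just as well hold an enum value naming the type instead of (or next to) the length.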

It’s also pretty easy to make this pointer tagging approach type safe in zig.

If you use the lower bits, the number you can use generally just depends on the alignment of the data you point to and is quite portable. So up to 3 bits for 8 byte aligned pointers.

If you use the upper bits you usually have more space, but how much is architecture (and even CPU) dependent, and sometimes it might not be possible at all because other metadata might be stored there (like PAC and MTE on ARM). Sometimes you also have to mask your data off yourself, while more recent CPUs can do that for you.

On x86_64 you have either 64-48=16 bits or 64-57=7 bits for 4-level and 5-level paging respectively. On ARM this depends on the page granularity and also on the version because at some point they also added another level.
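To make the low-bits variant concrete, here is a rough sketch of what a type-safe wrapper could look like. Everything in it (`A`, `B`, `Kind`, `kindOf`, `Tagged`) is a hypothetical stand-in, and it assumes the payload types are at least 8-byte aligned, which the `align(8)` fields and the comptime assert enforce.

```zig
const std = @import("std");

// Hypothetical payload types; align(8) guarantees three free low bits.
const A = struct { x: u64 align(8) };
const B = struct { y: u64 align(8) };

const Kind = enum(u3) { a, b };

/// Maps a payload type to its tag at compile time.
fn kindOf(comptime T: type) Kind {
    if (T == A) return .a;
    if (T == B) return .b;
    @compileError("type is not registered in Kind");
}

const Tagged = struct {
    bits: usize,

    const tag_mask: usize = 0b111;

    fn init(comptime T: type, pointer: *T) Tagged {
        comptime std.debug.assert(@alignOf(T) >= 8);
        return .{ .bits = @intFromPtr(pointer) | @intFromEnum(kindOf(T)) };
    }

    fn kind(self: Tagged) Kind {
        return @enumFromInt(@as(u3, @truncate(self.bits)));
    }

    /// Masks the tag away and asserts that the requested type matches it.
    fn get(self: Tagged, comptime T: type) *T {
        std.debug.assert(self.kind() == kindOf(T));
        return @ptrFromInt(self.bits & ~tag_mask);
    }

    /// Assumes the payload was created with gpa.create.
    fn destroy(self: Tagged, gpa: std.mem.Allocator) void {
        switch (self.kind()) {
            .a => gpa.destroy(self.get(A)),
            .b => gpa.destroy(self.get(B)),
        }
    }
};
```

Since only the alignment-guaranteed low bits are touched, nothing here depends on the paging mode or on what the CPU does with the upper bits.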

Given that you know the type at some points, ideally you could extend that knowledge to deinit.
If that is not possible then the only way is to include it in the runtime data.

There are 2 ways to do that without an extra field (that I can think of)

  1. pointer tagging, that @Sze mentioned
  2. header on the allocation. I suspect you are interacting with a C library; a header is how libc can free allocations despite the language having no way to know the type

Here’s a video going over this trick: https://www.youtube.com/watch?v=gtk3RZHwJUA

After I made the post yesterday I went and added the “additional field” to see how it looks.

So in my struct I added this field:

data: ?*anyopaque = null,
/// to be called only during deinit()
free_data: ?*const fn (*const Self) void = null,

called in deinit() like so (there is an allocator field also, btw)

pub fn deinit(
	self: *const Self,
) void {
	if (self.free_data) |func| func(self);
	// other things
}

and it is assigned to during construction from the same vtable that creates the data

// different `self` here.
if (self.vtable.create_data) |func| {
	p.data = try func(&p);
	p.free_data = self.vtable.free_data;
}

and then creating the vtable functions accordingly. All in all it was

10 files changed, 88 insertions(+), 5 deletions(-)

It works fine, I guess. I am just not entirely happy about the idea of an additional field that amounts to a private method.

I am not comfortable enough with pointer tagging and I don’t really want to do architecture-specific stuff anyway, as this is a library.

The other idea is as yall indicated just change the type of data to something like

struct {
	kind: enum {
		// insert types here
	},
	data: *anyopaque,
}

or now that I am typing it, probably better as a union(enum) .. more type safe .. which is also fine. The only slight problem with that is that the definition and usage of data isn’t confined to one file per type any more. Maybe I will try that next and see how it looks.

Thanks for the ideas everyone.

This solution reminds me of how I do ‘deferred type-erased’ destruction for Vulkan objects in my sokol-gfx Vulkan backend (this is in C though).

I have a ‘delete queue’ where the queue items are pairs of a type-erased Vulkan object handle, and a matching destroy function which knows how to destroy this object, e.g.:

typedef void (*_sg_vk_delete_queue_destructor_t)(void* obj);

typedef struct {
    _sg_vk_delete_queue_destructor_t destructor;
    void* obj;
} _sg_vk_delete_queue_item_t;

typedef struct {
    uint32_t index;
    uint32_t num;
    _sg_vk_delete_queue_item_t* items;
} _sg_vk_delete_queue_t;

Adding an item to the delete queue takes a ‘destructor function’ and a void pointer:

void _sg_vk_delete_queue_add(_sg_vk_delete_queue_destructor_t destructor, void* obj)

At the place where this function is called I know the type of the added item, and can pass the correct destructor function, e.g. for a Vulkan buffer object:

_sg_vk_delete_queue_add(_sg_vk_buffer_destructor, (void*)buf->vk.buf);

…and the _sg_vk_buffer_destructor function calls the correct Vulkan function to destroy a buffer:

_SOKOL_PRIVATE void _sg_vk_buffer_destructor(void* obj) {
    SOKOL_ASSERT(_sg.vk.dev && obj);
    vkDestroyBuffer(_sg.vk.dev, (VkBuffer)obj, 0);
}

TL;DR: you could wrap the type-erased anyopaque pointer and matching destroy function in a fat-pointer-like struct which offers a destroy() function.
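In Zig terms, such a fat pointer might look roughly like the sketch below. `Erased`, `destroyFn`, and `destroyImpl` are invented names, and the sketch assumes the payload is always created through `Erased.init` with the same allocator that is later passed to `destroy`.

```zig
const std = @import("std");

/// Hypothetical fat pointer: the erased payload plus the function that frees it.
const Erased = struct {
    ptr: *anyopaque,
    destroyFn: *const fn (*anyopaque, std.mem.Allocator) void,

    /// The concrete type is known here, so this is where the matching
    /// destroy function gets generated and stored.
    fn init(comptime T: type, gpa: std.mem.Allocator, value: T) !Erased {
        const typed = try gpa.create(T);
        typed.* = value;
        const gen = struct {
            fn destroyImpl(p: *anyopaque, a: std.mem.Allocator) void {
                const t: *T = @ptrCast(@alignCast(p));
                a.destroy(t);
            }
        };
        return .{ .ptr = typed, .destroyFn = gen.destroyImpl };
    }

    /// Callers never need to know the concrete type.
    fn destroy(self: Erased, gpa: std.mem.Allocator) void {
        self.destroyFn(self.ptr, gpa);
    }
};
```

The cost is one extra pointer per erased value; the benefit is that adding a new payload type touches no central switch.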

That’s more or less what he was putting forward in his original post, though - accepting a function pointer for freeing the type-erased pointer.

Now that I read it again, indeed. IMHO that is the ‘nicest’ solution; the alternative of a tagged union (or another form of ‘type-tagging’ which calls a set of hardwired destructor functions through a switch) seems a bit too rigid to me tbh (might be ok though if there’s only a handful of different types).

There are only five types. I think the tagged union ends up being nicer (and more readable) than *anyopaque being thrown around, too.

pub const Data = union(enum) {
	none,
	a: *const a.AP,
	h: *const h.HP,
	i: *const i.IP,
	k: *const k.KP,
	u: *const u.UP,

	pub fn get(self: Data, T: type) *const T {
		if (T == a.AP) return self.a;
		if (T == h.HP) return self.h;
		if (T == i.IP) return self.i;
		if (T == k.KP) return self.k;
		if (T == u.UP) return self.u;
		comptime unreachable;
	}
};

fairly trivial to destroy as well with inline else
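A sketch of what that could look like, cut down to two variants. `AP` and `HP` here are placeholder structs rather than the real ones, and the sketch assumes every payload was allocated with `gpa.create` and needs no cleanup beyond freeing.

```zig
const std = @import("std");

// Placeholder payload types standing in for the real ones.
const AP = struct { n: u32 };
const HP = struct { name: [4]u8 };

const Data = union(enum) {
    none,
    a: *const AP,
    h: *const HP,

    /// `inline else` expands to one prong per remaining variant, so `ptr`
    /// has its concrete pointer type in each generated prong.
    fn deinit(self: Data, gpa: std.mem.Allocator) void {
        switch (self) {
            .none => {},
            inline else => |ptr| gpa.destroy(ptr),
        }
    }
};
```

If a payload type ever needs more than a plain free, the prong body could call a `deinit` method on `ptr` instead, as long as every variant provides one.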

Why not? You must have some way of knowing how to cast the pointer and what operations can be performed on it. Deinitialization is just another operation.

Also, if deinitialization always comes down to freeing memory, you should consider an arena allocator and not doing individual deinits at all. If it’s not just freeing memory but also releasing other sorts of resources, you might want to maintain a stack of release operations that need to be performed (the bottom of the stack would likely be freeing the arena).
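As a rough illustration of the arena idea, assuming the erased data really is nothing but memory: the hypothetical `Owner` below stands in for the big struct, and `setData` is an invented helper.

```zig
const std = @import("std");

/// Hypothetical stand-in for the big struct: everything it owns lives in one arena.
const Owner = struct {
    arena: std.heap.ArenaAllocator,
    data: ?*anyopaque = null,

    fn init(backing: std.mem.Allocator) Owner {
        return .{ .arena = std.heap.ArenaAllocator.init(backing) };
    }

    /// Allocates the payload from the arena and stores it type-erased.
    fn setData(self: *Owner, comptime T: type, value: T) !*T {
        const p = try self.arena.allocator().create(T);
        p.* = value;
        self.data = p;
        return p;
    }

    /// No type information needed: the arena frees everything at once.
    fn deinit(self: *Owner) void {
        self.arena.deinit();
    }
};
```

This only covers the pure-memory case; payloads holding other resources would still need some form of release callback.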

If such approaches don’t work then you should provide a lot more detail about what you’re doing–as it is, it’s about as opaque as that pointer. :smiley:

Here is the function definition if you’re interested: https://git.sr.ht/~asibahi/juzzmuzz/tree/main/item/src/ot/Shaper.zig#L37

The function pointer used to return a *anyopaque when I started this topic before I changed it to its current form.

Here is an example of where it is used: https://git.sr.ht/~asibahi/juzzmuzz/tree/main/item/src/ot/shaper/arabic.zig#L119

I should update the doc comments but meh

I’m really not. I simply offered some help based on logic, but I’m not about to crawl through your code. You can address my specific query, or not.

Your destructor could do a bunch of if statements, comparing the data blob’s function pointer with the set of possible functions, to figure out what to cast the *anyopaque back to.

I’m not saying that’s a good idea. It’s probably a bad idea, there’s not a lot of reason for this code to be type-erased so getting rid of that was probably the right call.

I’m not even completely sure that function pointers are guaranteed to be comparable by address the way they would need to be. In C, yeah, in Zig? Unclear.

But it would probably work :thinking:

What’s so baffling about this problem is: If you’re using an *anyopaque to do anything (not just pass the pointer around) then you must have a table of functions, or a comptime generic set of types with a known set of functions, that are used to do things with the *anyopaque. If so, why can’t you add a deinit to this set of functions, and make the set of functions available to the owner of the *anyopaque? If you need to refactor to do that, then refactoring is probably the right solution.