Better understanding of Zig idioms

Hello Zig fellows:

So I’m working on small module that I’m hoping can largely be a well-documented and a first class experience for Zig end users but I have some questions on some Zig related concepts:

  1. Managed vs Unmanaged - Zig is manual memory management based, I feel like these words are a bit of a misnomer. But it’s my understanding that the only difference is whether or not the data structure holds a reference to a std.mem.Allocator internally vs having a memory related methods take an allocator? Is this a correct understanding or is there more to this?

  2. For a first class data structure, would Zig end users usually expect a Managed and Unmanaged variant? What do people typically reach for in everyday coding?

  3. What about “intrusive” types? I see Zig community members also periodically using those types. I’m wondering if I should consider building an intrusive version of my data structure.

My goal is to model my Set collection as if it were part of the standard lib while still largely being a general purpose solution.

Hey @deckarep, first - thanks for your work on creating a general set object. I think it’s a great example of a community contribution to the ecosystem.

Let’s go through your points one at a time.


  • Managed vs Unmanaged

Yes you are correct that there is a misnomer here, but that’s not due to the objects you’re referring to. Fundamentally, it’s because it’s not about objects so I disagree with the framing of “do I have an allocator reference or not”.

Manual memory management means that you think about memory differently. Let’s think about a stack allocator as an example.

As far as the stack is concerned, it doesn’t have a list of foo type objects. It has a block of bytes and those bytes are occupied or not. Now, I as the programmer can imagine that as a group of foo types, but I can also choose to “deallocate” them by just resetting the index of the stack to zero. That’s really what manual memory management is about - I choose to see bytes the way I want to see them.

So how does this play into your original question: what’s the deal with managed vs unmanaged data structures? Let’s take another example of ye ol’ ArenaAllocator to answer this question.

I may have one arena allocator that manages the memory over some number of arbitrary data structures (let’s call it N for short). In this case, I do not want each individual object acting like it’s an island of its own concerns - it’s not. The arena determines the lifetime of the group of N objects. Unmanaged data structures make this very apparent. In that case, why do I want N references to an allocator lying around when they are already handled centrally by one allocator that inherently sees them as a group?

  • First class data structures

Both, honestly. Not sure what else to say here but you can see that pattern played out in the standard library.

  • Intrusive types

I don’t think there is anything particularly special about intrusion in regards to Zig. If you look at the ArenaAllocator again, you’ll see intrusion used there. This makes perfect sense because we don’t want to have a slice of bytes stored with each node because we’d have to make a separate jump to get the actual bytes after getting the node. That said, the size of that allocation is ambiguous so we just store bytes along with an indicator of how many bytes follow the intrusive node.

Ultimately, I’m saying the following - imagine a linked list with a data member of usize. What’s the value of making that “intrusive”? Not much, really.

4 Likes

@AndrewCodeDev - awesome answer, and this is exactly the kind of context I’m hoping to gain a deeper understanding of. I may have some follow-up questions but let me noodle on this.

Super helpful and thanks for the kind words. The Zig community has been very awesome and welcoming.

2 Likes

Even afte @AndrewCodeDev 's answer, I still don’t understand what intrusive types are. Could you guys explain this to us non-type-system-wizard types? (see what I did there? :joy_cat: )

4 Likes

I’ve barely come to learn about them. Honestly before this week I had never heard of them.

My understanding is that for some datastructures instead of creating some outer separate collection type, ie the container type…instead each element has a reference to the container type built right into it. Hence the “intrusive” name.

So think about it like this:

Either you have: a container object that contains each element.

or

You have: Each element has a pointer to the “container”…where now you are able to just lookup where your next item is.

Somewhere I also read: They can be great for cache locality, all the elements can be placed right next to each other in memory. It’s not the most intuitive concept to grasp IMO.

Here’s an example from MitchellH: libxev/src/queue.zig at main · mitchellh/libxev · GitHub

2 Likes

Sure - you embed the “node” into the data structure you want to link-up and you usually cast back up to the thing you’re linking to. A good rough sketch is @fieldParentPtr - you get the “parent” back from the “child” type.

So in the “external linked list” (the typical one we think about) you get something like this:

+----NODE-----+             +----NODE-----+ 
|    data     | ----------> |    data     |---------> ...
+-------------+             +-------------+

But with an intrusive list, it’s the other way around. You embed the “links” inside of the data object and you point to another of the same kind of node in different object.

This is powerful because you can add an arbitrary number of links in these things. They aren’t just “contained” by one list - you can basically make a web if you wanted to.

Good article: In-Depth: Intrusive Lists

6 Likes

Very interesting. Sounds like the things game developers might use for efficiency. Thanks @AndrewCodeDev and @deckarep !

1 Like

From my cursory searching around apparently this pattern is used in the Linux kernel as well.

Yes, you’re welcome @dude_the_builder and also thanks @AndrewCodeDev for the additional context.

1 Like

My understanding is, this pattern is used in case memory is constrained, your data structures are already contiguous and all you need it a way to “arrange” them (element might be removed, the order matters and might change, etc). In that case, no need to allocate specific structures to keep that order, you might as well embed that in the original data struct. But you still want to keep some separation of concern so the specialized code that manage the list is still in a separate module.

1 Like