HAMT and StencilArrays

A while back I was working on an implementation of HAMTs that uses StencilVectors. Eventually I wanted to take a break from it and only continue working on it once I am actually using the data structure in other code and discover issues with it.

However, I kept reading things in the forum that made me think that parts of this might be useful to people, so I decided to put the code into a package so that people can have a look at it, use it, or point out ways to improve it.

HAMTs are bitwise tries with a compact memory layout. They are useful data structures that can serve as a key-value mapping, and they can also be used as a persistent data structure, which means that old versions keep working and different versions can share nodes. This is useful for implementing things like undo, or wherever it helps to keep old versions of a value without having to create full copies manually.

This implementation contains dynamic variants for runtime usage, with both pointer and index support and either imperative or persistent (old versions stay valid) behavior, depending on user configuration.

It also has simplified static implementations that can be modified at comptime without needing an allocator and then frozen and queried at runtime.

If you have ideas for improvements, feel free to open an issue.

The docs are here: hamt

I created this topic as a brainstorming topic because I am not completely satisfied with the library design. I think it contains some interesting ideas, and I want to use this topic to discuss them. Maybe we can identify parts that are good and other parts that could be re-designed differently.


Pointers or Indices or both?

The HAMTUnmanaged and StencilArray are written in such a way that they can be used with either the normal std.mem.Allocator interface or a custom allocator.

I like the idea of having data structures that support using either pointers or indices, but the way I am currently doing it may look more complex than if the data structure only supported one or the other. So one option might be to instead always have a separate Pointer implementation and an Index implementation.

The nice thing from the user's side is that they can use an API that is agnostic about whether the data structure uses pointers or indices. In some cases it might be nice to be able to switch between the two without having to change the code that uses the data structure (only the config that creates the type).

Custom Allocator (Index/Pool Allocator)

I also think the custom allocator itself is interesting. It acts sort of like a memory pool: it returns an index from its alloc function instead of a slice, and it has an additional lookup function that converts the index to a slice.

I came up with this way of organizing it because I found it annoying to deal with pointer addresses and calculate indices from them.

(I thought that for allocators that can return indices, the index is the "ground truth" that identifies what was allocated, and having to treat pointers as the more primary thing is annoying in that use case.)

When using indices, pointers are only needed when you actually access the memory; often you just need the index to store it somewhere else so it can be used later. Additionally, going from index to pointer is easier / more directly supported by slicing syntax than going from some pointer to calculating an index for it based on some base address.
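
To illustrate the idea, a minimal sketch of such an interface could look roughly like this (the names and signatures here are made up for illustration; they are not the package's actual API):

const IndexAllocator = struct {
    ctx: *anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        // alloc hands out an index/handle instead of a slice
        alloc: *const fn (ctx: *anyopaque, len: usize) ?u32,
        // lookup converts a previously returned handle into the actual memory
        lookup: *const fn (ctx: *anyopaque, index: u32) []u8,
        // free takes the handle, not a pointer
        free: *const fn (ctx: *anyopaque, index: u32) void,
    };
};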

Variable-sized slices of specific sizes

say that 3 times fast :upside_down_face:

The pool allocator actually uses the index as a handle like this:

const SlabIndex = packed struct {
    slab: u16, // which slab (chunk of memory) the allocation lives in
    choice: u6, // which of the 64 supported sizes this allocation has
    index: u10, // which of the 1024 slots within the slab
};

This allows it to support 2^6 = 64 specific sizes. It then creates a slab (chunk) of memory holding 1024 slices of a specific size and addresses them with the index field.
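
To make the addressing concrete, a lookup could then resolve such a handle back into a slice roughly like this (a simplified sketch; Pool, sizes and slab_bases are illustrative names, not the package's actual code):

const Pool = struct {
    sizes: [64]usize, // slice size for each of the 64 choices
    slab_bases: [][*]u8, // base pointer of each allocated slab

    fn lookup(pool: *const Pool, handle: SlabIndex) []u8 {
        const size = pool.sizes[handle.choice]; // size of slices in this choice
        const base = pool.slab_bases[handle.slab]; // start of the slab
        const offset = @as(usize, handle.index) * size; // slot within the slab
        return base[offset .. offset + size];
    }
};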

I actually noticed just now that the code could be slightly restructured to make the slab field specific to the choice, giving each choice its own u16 range of slabs, which means each size could have a maximum of 2^26 = 67,108,864 instances.

This whole idea of having multiple sizes packed into one handle seems useful for data structures like the HAMT that have nodes of different sizes. SlabIndex is just what I ended up using; there may be a lot of different ways to use the bits to differentiate between different sizes and chunks of allocation.

It is also possible that you could use different index sizes, but u32 seemed most practical for now.

StencilArray

Last but certainly not least: StencilArrays (originally named StencilVectors, see the paper Runtime and Compiler Support for HAMTs) are in principle the simple idea of combining a bitmask with an array whose size corresponds to the number of bits that are set to 1 in the mask.

This idea appears in multiple places, but I think it is good to have a name for it. I like StencilArray more than StencilVector, but I am not quite sure what the best name is.

This, for example, allows you to create tree or trie nodes that have many branches and then only use a single 0 bit for branches that are null. This is used in the HAMT to create a trie that can store sparse data without wasting a lot of memory on null pointers.
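
As a minimal sketch of the core trick (illustrative, not the package's actual code), a logical branch position is mapped to a compact slot by counting the 1 bits below it, so absent branches only cost their 0 bit in the mask:

fn slotIndex(mask: u32, branch: u5) ?u32 {
    const bit = @as(u32, 1) << branch;
    if (mask & bit == 0) return null; // branch is absent, no slot is stored
    // number of 1 bits below `branch` = position in the compact array
    return @popCount(mask & (bit - 1));
}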

The HAMT is also based on the paper Ideal Hash Trees (Phil Bagwell, 2001), but only implements parts of it.

StencilVectors are how Racket implements its functional / immutable / persistent hashed key-value mappings; that is where I learned about them and started to like using persistent data structures for some programming problems.

My StencilArray implementation is used like a handle (a struct that only contains a pointer/index). This handle then has methods to dereference it to get to the actual value and read / manipulate it, or to create a functional update of it (an altered copy, leaving the original unchanged).


If specific discussion topics are bigger than others, we can always branch off and create dedicated topics.

I would especially be interested to hear whether anybody has an interesting use case for these data structures. I think it might be possible to use the HAMT like a database index, with multi-version concurrency for readers and a single writer that essentially only updates the index of the root node.


Very cool. Now all we need is a Clojure interpreter to go with it. :slight_smile:

This is meant as a prompt: do you see pointers and indices as each having unique strengths here? If so, what are they? Do the respective advantages justify the cost of two implementations?

Not yet, but there's something later on my roadmap that this might be just right for. HAMTs are a great data structure, and persistent maps which perform comparably to mutable HashMaps are underrated.

I haven’t really taken a look at the code yet, but I think this is a really good data structure to have available and I’m glad you’re building it.


I think there are 3 advantages to using indices:

  • fewer wasted bits that aren't actually used / relevant for addressing the specific piece of memory (for example, the alignment is often bigger than 1, which means some of the least significant bits are always zero and thus aren't strictly needed to denote the memory slot; additionally, the most significant bits often aren't actually usable as addresses, so they don't help the data structure grow to a bigger size)
  • tighter memory layouts, leading to better cache utilization
  • position-independent/relocatable data structures (when the data structure uses indices internally, it can be moved to any place in memory without having to walk the data structure and manually fix pointers; there are data structures that use relative pointers, but those are kind of an interesting trick applied to pointers and require special code handling)

I think position independence could be useful for using these data structures together with memory mapping, without requiring the user to request a specific address for the mapping (which supposedly only works if you are lucky and the OS hasn't mapped anything else there already, but I haven't actually tried).

I currently don't know; I think potential users and I need to collect some practical usage experience to decide whether it makes sense to split it.
Currently I can't see a clear dividing line between the features that would justify creating the separation, except maybe making the code a bit easier to understand and the pointer usage more direct. The differences between pointers and indices aren't that big.

Just want to add that they can also be used in an imperative manner; that is currently the default behavior, because Zig is more imperative. This just means that old nodes get deallocated.

If you want persistent behavior you need to opt in to that, and then either use a pool allocator to clean up the old nodes, or walk the data structure and remove nodes you no longer need, basically becoming/writing the garbage collector.

Persistent usage could definitely get some support code for garbage-collection-like behavior that removes nodes that are no longer reachable from a set of roots.

But for imperative use you can basically use the data structure like a mutable hash map that is just structured differently internally.

I am curious whether there may be corner cases where using a HAMT performs better than using some HashMap that needs to do resizing or scanning/probing, but I currently don't want to invest the time to create benchmarks. Maybe later, when I am using the data structure in some specific situation.


Ok… that is very interesting, but I do not know if I have the brain capacity to understand it all (beginner in Zig and not familiar with the structures (yet)). I will have a look here later on!


Sounds like you favor indices then. Probably makes sense to try and get by without a pointer variant unless or until the need arises.

This is one of the first things I wanted to do when I started using Zig, actually: write a RootedAllocator which exposes an Allocator interface like arenas do, where free is a no-op. You would use the base struct to call a function allocRoot, and that allocation would go on a root list; you then call sweep on it whenever you want to, and any allocations which aren't reachable from the root struct(s) get freed.

It might make sense to split allocation from registration, so there would just be a register call, allowing the code which chooses roots to be independent of the part which allocates memory.
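
Roughly the shape I have in mind (just a sketch of the API surface under these assumptions, not a working collector; the tracing part is the hard bit and is left out):

const std = @import("std");

const RootedAllocator = struct {
    // the full version would also expose a std.mem.Allocator (like an arena)
    // whose free is a no-op
    child: std.mem.Allocator,
    roots: std.ArrayListUnmanaged(*anyopaque) = .{},

    // allocate and register the result as a root in one step
    pub fn allocRoot(self: *RootedAllocator, comptime T: type) !*T {
        const ptr = try self.child.create(T);
        try self.roots.append(self.child, ptr);
        return ptr;
    }

    // register an already existing allocation as a root,
    // keeping registration separate from allocation
    pub fn register(self: *RootedAllocator, root: *anyopaque) !void {
        try self.roots.append(self.child, root);
    }

    // sweep would free every allocation not reachable from the roots;
    // tracing reachability is the hard part and is omitted in this sketch
    pub fn sweep(self: *RootedAllocator) void {
        _ = self;
    }
};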

I got lost in the weeds pretty fast; the task is not as easy as a one-paragraph description makes it sound. But writing something like this which specializes on the HAMT data structure is a more manageable task, and it would expose more of what makes them a good data structure to work with, namely the ability to checkpoint states of the map and return to them when needed.

Doing it with a single data structure makes it practical to preserve the most important invariant: that no pointers inside the rooted allocations lead outside of its control.


Yes, I think having different data structures with "mini garbage collectors" that are specialized to the data structure, and then maybe some grouping mechanism that allows calling the gc for all the specialized mini-gcs, could be quite interesting.

Another good thing about such usage-specific gcs is that you can just define that cyclic data is not allowed, and thus you don't have to deal with cycles. But writing a general gc that can't deal with cycles seems silly.

So I like the idea quite a bit, because it seems you would get a lightweight version of garbage collection, might be able to get away with lower complexity, and would only use it where it really has a good use case. I might explore it in the context of scripting languages eventually.


Cycles pose more of a problem for reference counting than for more classic GC. A rooted mark-and-sweep collector will collect cycles which aren't reachable from the root, and it uses an intrusive mark, so that when traversing the reachable collection, cycling back onto data breaks the traversal, because the second time around the datum is already marked. Basically, it knows the pages it allocated, it knows how to find reachable data, and everything else gets put on the free list. Ignoring some details there.

A mark bit is itself somewhat of a challenge for an all-purpose Zig allocator meant to work this way. The easy way to do it puts the bit somewhere in each data structure, and that would call for adaptation; it wouldn't really be a drop-in Allocator at that point. There are other approaches, like allocating additional metadata to store the mark before or after the data itself. It's not an impossible problem to solve, but there are a lot of details to get right.

For nodes on an HAMT, finding somewhere to put a white/black bit would be fairly straightforward. So perhaps this is a promising approach.


Yes, the current HAMT implementation automatically creates a MetaInt type from the bits that aren't used by the mask (when there are bits that can't be efficiently used to increase the branching factor of the trie), and those can be used by the user in any way they want.

The code could always leave a bit of space to add some metadata. There is also support for adding extra data to nodes in the StencilArray; I use that to store the count for the whole trie (for inner nodes) at the end of the node. But in general there are a lot of different ways you can shuffle bits around and organize things differently depending on specific requirements.
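
As a concrete illustration of the spare-bits idea (a simplified sketch, not the actual generated layout): with a branching factor of 32 and a 64-bit node word, the upper half is unused by the mask and can carry user metadata such as a mark bit.

const NodeWord = packed struct(u64) {
    mask: u32, // one bit per possible branch; 1 = branch present
    meta: u32, // unused by the trie itself; free for user metadata (e.g. a mark bit)
};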
