Fast memory and GeneralPurposeAllocator

Sze · November 16, 2024, 9:17pm

I find it confusing that your field is called children but you are saying it is a Directed Acyclic Graph and then you call it a word-tree, I would expect children for a tree and something like next or out or edges or something that actually describes what kind of directed edge is formed to that other node.

So is it a tree, a trie or a DAG?
I think the most specific one that applies matters, my suspicion is that it is a Trie - Wikipedia containing all the words loaded from the file.

I mostly point that out because having an accurate mental model of the needed data-structure makes it much easier to suggest compact memory layouts.

Acylic Directed Graphs are more general than tries and thus are also more difficult to handle because they have more flexibility, for example they could have multiple start nodes (or internal nodes with multiple incoming edges), where trees and tries have a root node and are more constrained.

Naively implemented Tries are nice from an educational perspective, but have very poor memory utilization.

But creating a less naive implementation requires knowledge about the operations that are required to be efficient for where it is used later.

For example it is a big difference whether you just want to know whether a word is present in the data structure or whether you also want to be able to discover similar words that share a common prefix.

So what are the actual operations you need to support?

If you just wanted presence you could use a hash table or a HAMT.
With a hash table you only have reduced potential for key compression (reusing prefix data).
(I am working (on and off) on a HAMT implementation, but there are still a bunch of things I want to clean up and figure out)

A HAMT has the benefit that it still has the trie characteristic of deeper nodes reusing data from nodes that are closer to the root (if there is a lot of similar prefix data that can save memory).

The naive trie implementations save 1 character in a node.
The HAMT instead stores n branches (where n is a power of 2 like usually 8, 16, 32, 64) in a node together with a bitmap that is used to only store the slots which aren’t empty, this avoids saving a lot of bytes for null pointers.

So far I haven’t needed a search trie, but if I did I would probably go with something like this: Bitwise trie with bitmap - Wikipedia and the related paper Fast And Space Efficient Trie Searches - Phil Bagwell
(I just found that paper through the article, I eventually need to read it in detail)

However the AMT and the HAMT are quite related to another, where the latter builds on the former, thus it seems reasonable for a library that supports HAMTs to also support AMTs (so I am considering to support AMTs in my WIP library).

details of my Implementation

My current implementation already supports disabling hashing, in that case it just consumes bits from the key directly, but it doesn’t have functions to iterate over ranges from a start to an end key yet. (It should be possible to build these functions they just don’t make sense for a hashed trie, so I hadn’t thought of that use case yet)

It also would be possible to avoid storing the parts of the key which are encoded in the path and only store the tail, but so far I haven’t bothered implementing that because I didn’t have a good use case for it (and it adds implementation complexity), but in search tries it might be relevant.

My implementation has support for pointer based nodes and index based nodes and a custom allocator for variable sized slices that have a predictable range of sizes (essentially an allocator optimized for allocating hamt-nodes).

The advantage of index based nodes is that you save memory by essentially cutting the u64 pointers in half and storing u32 indices instead.
(I actually use something more complex than an index to support indexing into a bunch of different blocks of memory that contain slices of one particular size (to support incremental allocation and efficient reuse, without wasting space for redundant slice meta data), so what I use is more like a handle but it fits in a u32)

Its support for both index and pointer use is interesting, but also kind of ugly and I am split on whether I want to separate it into two implementations or not.

I wanted to figure out the things about its implementation that I am not satisfied with before publishing it, but I am beginning to think it may make more sense to publish it while I am still not satisfied with it, so that the community can help me find solutions to some of the ugly things about it.

Depending on what you actually want to do with your datastructure, you might also be interested in this post and the topic in general: