TreeList: a POC database of trees, programmed without pointers

Dear Ziggers,

I’ve been hard at work this month, wrapping my head around Andrew Kelley’s programming without pointers, and around @mnemnion’s Zelda Linked Lists.

This got me wondering: if I can have heterogeneous nodes in a linked list, can’t I just shove them all in a list of arrays - one array per backing data type?
Ideally, I’d want the list to be generated at comptime:

const node_types = struct{
  type_1 = type,
  type_2 = type,
  // etc.
};
const Storage = ArrayOfArrays(node_types);
// under the hood:
const Storage = [_]Arrays{
  array_type_1 = [_]type_1,
  arary_type_2 = [_]type_2,
}

Well, there’s the multi array list and its tagged-unions for that.

Yes and no - in some cases, the memory occupation of a tagged-union doesn’t justify the use of a multi-array list. This particularity has badly plagued some of my projects.

So I set out to building this exploratory library: TreeList.

I wrote some blog posts to detail the adventure:
part1: the basic design
part2: how the array pool woks
part3: traversing trees in zig
part4: performance shoot-out with the multi array list

A little example usage from the tests:

// Create TreeList instance
const Tree = TreeList(NodeTypes);
var tree: Tree = .empty;
tree.init();
defer tree.deinit(std.testing.allocator);

// Create and add a root node
const int_loc = try tree.append(Tree.Reg.idFromType(IntNode), .{ .value = 42 }, std.testing.allocator);
try tree.addRoot("root", int_loc, std.testing.allocator);

// Create and add a child node
const float_loc = try tree.append(Tree.Reg.idFromType(FloatNode), .{ .value = 3.14 }, std.testing.allocator);
tree.addChild(int_loc, float_loc);

// Retrieve the root node
const root_loc = tree.getRoot("root") orelse return std.testing.expect(false);
const root_node = tree.getNodeAs(IntNode, root_loc).?;
try std.testing.expectEqual(@as(i32, 42), root_node.value);

// Get the child node by traversing from root
const child_u64 = root_node.child orelse return std.testing.expect(false);
const child_loc = Tree.Loc.fromU64(child_u64);
try std.testing.expectEqual(child_u64, float_loc.toU64());

Hope you like it!

6 Likes

Very cool! Quick question: it looks like idFromType will be called frequently with this API, and it seems to expand to comparing to every type (so you compare to half of them on average). Could mlugg’s typeId idea work here, as it’s really fast? https://github.com/ziglang/zig/issues/19858#issuecomment-2369861301

Hi ! That comptime trick looks pretty impressive, though I’m not sure how it could be integrated.

In the current setup, I have:

  • a list of types in which I need to index
  • an enum/ID to index into the types
  • a tagged union of pointers that depends on the enum/ID

If I use mlugg’s typeIDs to replace the enum/ID values, I lose the constant time mapping of typeFromId, because the values don’t represent indexes into the Types list.

If there’s a solution I’m not seeing, I’d be a glad taker - that inline loop has been nagging me too.

Meanwhile, there’s a benefit to the current solution: insertions are slow because of the inline loop, but reads are faster. That’s not a bad tradeoff for my use case, since I plan on doing a lot more reads than appends into the table.

This looks similar to how the zig compiler stores its AST, ZIR and AIR.

If you haven’t already, check out Andrew Kelley Practical Data Oriented Design (DoD)
and its sequel Data-Oriented Design Revisited: Type Safety in the Zig Compiler - Matthew Lugg

1 Like

I brought the two-way part of this question up on the Zig Discord and got a reply from mlugg. Here’s the code and comments he shared: https://zigbin.io/6c9f41

Note that mlugg has not looked at your code at all, so it may or may not be suitable. I assume you still need to map to array indices (maybe a minimal perfect hash?) - I just thought the general two-way lookup idea was interesting enough to share.

Thanks for looking into it, that’s a great code snippet!

I’m trying a first pass at implementing it, but I’m getting locked into using comptime for the look-ups.

About the hashing: isn’t this what the std.ArrayHashMapUnmanaged(void, void) trick in the compiler’s intern pool is for? I just rewatched Matthew Lugg’s presentation - he mentions it in passing, though I dunno whether anyone has written in detail about it.

There’s this proposal that explains the guts of it: https://github.com/ziglang/zig/issues/23872

Btw, which Zig version does your repo target? I wasn’t able to build with current master, nor v14.x or v13

One gremlin I noticed is if (self.data != undefined ... which is undefined behavior (and 14.x catches this particular one at compile time)

Hey, we were just talking about this!

1 Like

Yeah, been following that one. Very interesting thread.

1 Like

I pushed a fix, thanks for catching that. I built with 0.14.

1 Like

I’m blocked: I’m trying to use the AutoArrayHashMap(void, void) approach to map types to indexes (commit here).

I pulled a fair amount of new tricks out of mlugg’s bag - including his simple comptime allocator.

const Types = {…} // [_]type
const TypeId = {…} // enum that maps indexes to types

const TypeContext = struct {
    TypesList: [Types.len]type,

    pub fn hash(self: @This(), key: type) u32 {
        _ = self;
        return @truncate(std.hash_map.hashString(@typeName(key)));
    }

    pub fn eql(ctx: @This(), key: type, b_void: void, b_map_index: usize) bool {
        _ = b_void;
        return key == ctx.TypesList[b_map_index];
    }
};

const typeMap = blk: {
    var map: std.AutoArrayHashMapUnmanaged(void, void) = .{};

    map.ensureTotalCapacity(comptime_allocator, Types.len) catch unreachable;

    for (Types) |T| {
        const ctx = TypeContext{ .TypesList = Types };
        _ = map.getOrPutAssumeCapacityAdapted(T, ctx);
    }

    break :blk map;
};

And I’m getting royally rejected by the compiler:

/opt/homebrew/Cellar/zig/0.14.0_2/lib/zig/std/array_hash_map.zig:852:47: error: use of undefined value here causes undefined behavior
                    .key_ptr = &keys_array.ptr[index],
                                ~~~~~~~~~~~~~~^~~~~~~
src/TypeRegistry.zig:61:54: note: called from here
                _ = map.getOrPutAssumeCapacityAdapted(T, ctx);
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~

How am I supposed to do this?

side note: @cryptocode , you initially recommended using a minimal perfect hash instead of the void hashmap trick. I’m not familiar with perfect hashes, so I’m delaying resorting to them until I’m hopelessly pinned.

Incidentally, someone shared this relevant post from Andrew Kelley in another Ziggit thread: https://zig.news/andrewrk/how-to-use-hash-map-contexts-to-save-memory-when-doing-a-string-table-3l33

The reason I mentioned perfect minimal hashes is that they can be made to map keys to consecutive integers in the range 0…n-1, which seems useful in your case as the hashes can then be used directly as array indices. But they’re a bit annoying to deal with and may very well not be worth the complexity. Just a thought if nothing else works out.