Fast "set" implementation for pointers


I have a graph traversal where I’m walking nodes in an expression tree. I implementeda “nodes visited” set using std.AutoHashMap(*const Value, bool):

The hashmap is used like below:

const res = self.visited.getOrPut(root) catch unreachable;
if (!res.found_existing) {
     // visit node

I avoid allocations using clearRetainingCapacity() across multiple calls to the traversal routine.

This traversal with the hashmap is dominating the cost of my graph algorithm, and I suspect it’s probably due to using the default hash function. I wonder if anyone has any experience with faster ways of setting up “pointers hashing” than the defaults?

I know that using something like bitset for the "visited’ set would be faster, but I currently don’t have integer IDs for the nodes, only pointers.

@IntFromPtr then you have in

That autohash code looks sus. How many nodes are you talking about?

The fastest, easiest least useful way is to add a seen flag on each node and color that when doing the traversal then reset at the end. Obviously you can only do one traversal at a time.

That implies an unbounded size for the bitset.

Probably unrelated, but the way to create sets in Zig is to use void for the value type.

std.AutoHashMap(*const Value, void);

This makes the value use up 0 bits.

Would be good to know if it’s the growIfNeeded part of getOrPut or the getOrPutAssumeCapacityAdapted part that’s taking up the most runtime. If it’s the growIfNeeded part, then you might want to use a faster allocator (like c_allocator)

Wyhash looks sus to you?

Thanks for the replies @mcadamy and @squeek502!

Coloring: not a bad idea, actually. That’s definitely a more direct way to keep track of visited nodes.

Currently I have some 100k nodes in my graph and traversal was taking roughly 3 ms. I only need to traverse to get a topological sort of the nodes and I noticed that this was costing more than I expected.

Re using void. Oh, I actually did notice this in a blog post after I had posted this. I tried to read through the HashMap code to see what difference it makes. Is it just storage, or could this have a perf advantage also?

BTW I would not expect allocator speed to matter here as I am using clearRetainingCapacity across multiple invocations and I measured performance after the initial allocs had been done. I have noticed though that basically any calls to the GeneralPurposeAllocator kill perf totally, so I usually avoid it. :slight_smile:

I ended up implementing my own minimal linear probing hash for keeping track of the pointer set. I think that’s actually a pretty good fit for this use case, since it can be tuned for exactly this pattern: insert into set, no deletions, no updates of existing values. It’s roughly 2-3x faster now. I’ll switch back to HashMap though if I find an easy way of making that faster.

1 Like

I’d be interested in looking into this a bit more if you’d like to provide some details about how to reproduce your results.

Would it just be something like:

  • Create 100k pointers
  • Create std.AutoHashMap(*const Value, void)
  • hash_map.ensureTotalCapacity(100k)
  • Call hash_map.getOrPut(ptr) for all 100k pointers in a random order


There may be some cache effects too when it’s walking the graph and using the HashMap at the same time. I think I’ll be able to put together a test case for this.

FWIW, 3ms for 100k is not really that slow either, that’s like some 30 ns per operation. :slight_smile:

That’d be great. I’m mostly just curious and want to compare it to AutoArrayHashMap and existing non-Zig hash maps if I can to get a sense of if there’s an issue with AutoHashMap.

Are all your nodes stored or able to be stored in a single buffer? That way, instead of pointers you can use u32’s and you can allocate a bitstring with a number of bits equal to the number of items in the buffer. Then you can get or set a single bit from the bitstring to tell whether a node has been visited, where the position in the bitstring matches the index in the buffer.


That’s an alternative that I also had considered earlier. I didn’t want to go for that as I originally didn’t want to couple the node structure with how they’re allocated (e.g., I favored storing pointers instead of indices), but I think this buffer-based allocation strategy has many benefits.

I put the code up on GitHub: GitHub - nurpax/zigrograd: Micrograd in Zig

The auto hashmap vs my own ptrset can be chosen by flipping the useStdHashMap boolean argument on line:

In addition, you can uncomment benchmarking code below to show nanosecond timing for the part that’s using the hashmap:

The timer code at least on Windows gives high precision as I picked up sokol_time from the Sokol library.

I might announce it as a Showcase as I think the code might be of interest to some. It’s a small reimplementation of Andrej Karpathy’s micrograd. It was a fun exercise to implement a little training loop in pure Zig. A lot of the NN concepts are very concrete when written at this level of abstraction.


This should be unnecessary btw, std.time.Timer uses the same APIs as far as I can tell so it should be just as good for this use case.

1 Like

Thanks, I had missed that from reading std.time API ref! I changed it to std.time.Timer and that indeed works just as well.