Fast "set" implementation for pointers

nurpax · October 31, 2023, 6:10pm

Hi!

I have a graph traversal where I’m walking nodes in an expression tree. I implemented a “nodes visited” set using std.AutoHashMap(*const Value, bool).

The hashmap is used like below:

const res = self.visited.getOrPut(root) catch unreachable;
if (!res.found_existing) {
     // visit node
}

I avoid allocations using clearRetainingCapacity() across multiple calls to the traversal routine.
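Here is a minimal sketch of the traversal described above. It assumes a hypothetical Value type that only holds child pointers, and the helpers buildTopo and visit are made-up names. The sketch does a post-order DFS into a caller-provided slice and reuses the visited set between calls with clearRetainingCapacity. It uses void instead of bool as the value type, because only membership matters.

```zig
const std = @import("std");

/// Hypothetical expression-graph node; only the child pointers matter here.
const Value = struct {
    children: []const *const Value = &.{},
};

const VisitedSet = std.AutoHashMap(*const Value, void);

/// Writes the nodes reachable from `root` into `out` in post-order (children
/// before parents) and returns how many were written. `out` must be big enough.
fn buildTopo(visited: *VisitedSet, root: *const Value, out: []*const Value) !usize {
    // Forget the previous call's entries but keep the allocated buckets.
    visited.clearRetainingCapacity();
    var len: usize = 0;
    try visit(visited, root, out, &len);
    return len;
}

// Explicit error set, since inferred error sets do not work for recursive functions.
fn visit(visited: *VisitedSet, node: *const Value, out: []*const Value, len: *usize) std.mem.Allocator.Error!void {
    const res = try visited.getOrPut(node);
    if (res.found_existing) return; // already visited
    for (node.children) |child| try visit(visited, child, out, len);
    out[len.*] = node;
    len.* += 1;
}
```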

This traversal with the hashmap dominates the cost of my graph algorithm, and I suspect that’s due to the default hash function. Does anyone have experience with faster ways of hashing pointers than the defaults?
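One way to replace the default hash is to give std.HashMap a custom context. The sketch below is an assumption, not something measured in this thread. PtrHashContext and PtrHashSet are made-up names. The hash multiplies the address by a 64-bit odd constant (Fibonacci hashing) and then folds the high half into the low half. The fold matters because, as far as I can tell, std.HashMap picks the bucket from the low bits of the hash, and aligned pointers have zeroed low bits.

```zig
const std = @import("std");

/// Hypothetical hash context for pointer keys (name made up for this sketch).
fn PtrHashContext(comptime T: type) type {
    return struct {
        pub fn hash(_: @This(), key: *const T) u64 {
            const addr: u64 = @intFromPtr(key);
            // Multiply by 2^64 / golden ratio, then fold the well-mixed high half
            // into the low half, since aligned pointers have all-zero low bits.
            const h = addr *% 0x9E3779B97F4A7C15;
            return h ^ (h >> 32);
        }
        pub fn eql(_: @This(), a: *const T, b: *const T) bool {
            return a == b;
        }
    };
}

/// A pointer set using the context above and the standard max load factor.
fn PtrHashSet(comptime T: type) type {
    return std.HashMap(*const T, void, PtrHashContext(T), std.hash_map.default_max_load_percentage);
}
```

Whether this beats the default Wyhash-based hash depends on the workload, so it would need to be measured.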

I know that using something like a bitset for the “visited” set would be faster, but I currently don’t have integer IDs for the nodes, only pointers.

mcadamy · October 31, 2023, 7:28pm

@intFromPtr, then you have an int.

That autohash code looks sus. How many nodes are you talking about?

The fastest, easiest, least useful way is to add a seen flag on each node, color (mark) it during the traversal, and reset it at the end. Obviously you can only do one traversal at a time.
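Here is a sketch of the seen-flag idea, assuming a hypothetical Node type with a visited field. The names topoMark, visit and clearMarks are made up. The topological order produced by the traversal contains exactly the nodes that were marked, so the same list can drive the reset. As noted above, only one traversal can be in flight at a time.

```zig
const std = @import("std");

/// Hypothetical node: children plus a `visited` mark used during one traversal.
const Node = struct {
    children: []const *Node = &.{},
    visited: bool = false,
};

/// Post-order DFS that writes a topological order into `out` (children before
/// parents) and returns how many nodes were written. No hash set involved.
fn topoMark(root: *Node, out: []*Node) usize {
    var len: usize = 0;
    visit(root, out, &len);
    return len;
}

fn visit(node: *Node, out: []*Node, len: *usize) void {
    if (node.visited) return;
    node.visited = true;
    for (node.children) |child| visit(child, out, len);
    out[len.*] = node;
    len.* += 1;
}

/// Reset only the nodes that were marked; the topo list is exactly that set.
fn clearMarks(marked: []const *Node) void {
    for (marked) |n| n.visited = false;
}
```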

nurpax · October 31, 2023, 7:30pm

Using the address as an index implies an unbounded size for the bitset.

squeek502 · October 31, 2023, 9:41pm

Probably unrelated, but the way to create sets in Zig is to use void for the value type.

std.AutoHashMap(*const Value, void);

This makes the value use up 0 bits.
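The sketch below illustrates the void-valued set. It assumes a stand-in Value type, and the helper markVisited is a made-up name. The comptime checks compare the map’s key/value pair type for void and bool values. `{}` is the only value of type void, so inserting uses put(v, {}).

```zig
const std = @import("std");

const Value = struct { data: f32 }; // hypothetical stand-in for the node type

/// Set of node pointers: a hash map whose value type is `void`.
const VisitedSet = std.AutoHashMap(*const Value, void);
/// The bool-valued variant from the original post, for comparison.
const VisitedMap = std.AutoHashMap(*const Value, bool);

comptime {
    // With a void value the key/value pair is just the key.
    std.debug.assert(@sizeOf(VisitedSet.KV) == @sizeOf(*const Value));
    std.debug.assert(@sizeOf(VisitedMap.KV) > @sizeOf(VisitedSet.KV));
}

/// Insert `v` into the set; `{}` is the only value of type void.
fn markVisited(set: *VisitedSet, v: *const Value) !void {
    try set.put(v, {});
}
```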

It would be good to know whether it’s the growIfNeeded part of getOrPut or the getOrPutAssumeCapacityAdapted part that’s taking up most of the runtime. If it’s the growIfNeeded part, then you might want to use a faster allocator (like std.heap.c_allocator).
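If the growth path turns out to matter, one option is to reserve space once and use the AssumeCapacity variant in the hot loop, so getOrPut’s growIfNeeded call is never made inside it. Here is a sketch with a made-up helper, insertAll:

```zig
const std = @import("std");

/// Inserts all `ptrs` into `set`, reserving space up front so the loop never
/// takes the grow path. Returns how many pointers were new.
fn insertAll(set: *std.AutoHashMap(*const u32, void), ptrs: []const *const u32) !usize {
    try set.ensureUnusedCapacity(@intCast(ptrs.len));
    var inserted: usize = 0;
    for (ptrs) |p| {
        // No allocation can happen here: capacity was reserved above.
        const res = set.getOrPutAssumeCapacity(p);
        if (!res.found_existing) inserted += 1;
    }
    return inserted;
}
```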

Wyhash looks sus to you?

nurpax · October 31, 2023, 10:05pm

Thanks for the replies @mcadamy and @squeek502!

Coloring: not a bad idea, actually. That’s definitely a more direct way to keep track of visited nodes.

Currently I have some 100k nodes in my graph and traversal was taking roughly 3 ms. I only need to traverse to get a topological sort of the nodes and I noticed that this was costing more than I expected.

Re using void: I actually noticed this in a blog post after I had posted this. I tried to read through the HashMap code to see what difference it makes. Is it just storage, or could it also have a performance advantage?

BTW I would not expect allocator speed to matter here as I am using clearRetainingCapacity across multiple invocations and I measured performance after the initial allocs had been done. I have noticed though that basically any calls to the GeneralPurposeAllocator kill perf totally, so I usually avoid it.

I ended up implementing my own minimal linear probing hash for keeping track of the pointer set. I think that’s actually a pretty good fit for this use case, since it can be tuned for exactly this pattern: insert into set, no deletions, no updates of existing values. It’s roughly 2-3x faster now. I’ll switch back to HashMap though if I find an easy way of making that faster.
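For illustration, a minimal insert-only pointer set with linear probing might look like the sketch below. It is not the implementation mentioned above. LinearPtrSet is a made-up name, and this set has a fixed capacity chosen at init with no resizing. The caller has to keep the element count below capacity, ideally well below it so probe sequences stay short.

```zig
const std = @import("std");

/// Minimal insert-only open-addressing pointer set (linear probing).
/// Capacity is a power of two; `null` marks an empty slot.
fn LinearPtrSet(comptime T: type) type {
    return struct {
        const Self = @This();
        slots: []?*const T,
        len: usize = 0,

        pub fn init(allocator: std.mem.Allocator, capacity_pow2: usize) !Self {
            std.debug.assert(std.math.isPowerOfTwo(capacity_pow2));
            const slots = try allocator.alloc(?*const T, capacity_pow2);
            @memset(slots, null);
            return .{ .slots = slots };
        }

        pub fn deinit(self: *Self, allocator: std.mem.Allocator) void {
            allocator.free(self.slots);
        }

        /// Forget all entries but keep the slot array.
        pub fn clear(self: *Self) void {
            @memset(self.slots, null);
            self.len = 0;
        }

        /// Returns true if `p` was newly inserted, false if it was already present.
        pub fn insert(self: *Self, p: *const T) bool {
            std.debug.assert(self.len < self.slots.len); // guarantees an empty slot
            const mask = self.slots.len - 1;
            const h: u64 = @as(u64, @intFromPtr(p)) *% 0x9E3779B97F4A7C15;
            var i: usize = @intCast((h ^ (h >> 32)) & @as(u64, mask));
            // Probe forward until we find `p` or an empty slot.
            while (self.slots[i]) |existing| : (i = (i + 1) & mask) {
                if (existing == p) return false;
            }
            self.slots[i] = p;
            self.len += 1;
            return true;
        }
    };
}
```

Because there are no deletions, the set needs no tombstones. An empty slot always ends a probe sequence.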

squeek502 · October 31, 2023, 10:13pm

I’d be interested in looking into this a bit more if you’d like to provide some details about how to reproduce your results.

Would it just be something like:

1. Create 100k pointers
2. Create std.AutoHashMap(*const Value, void)
3. hash_map.ensureTotalCapacity(100k)
4. Call hash_map.getOrPut(ptr) for all 100k pointers in a random order

?
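The steps above can be sketched roughly as follows. benchGetOrPut is a made-up name. Instead of a real shuffle, the sketch visits indices as (i * stride) % n. That sequence is a permutation whenever stride and n are coprime, which a prime stride guarantees unless it divides n. Timings will vary by machine and build mode.

```zig
const std = @import("std");

/// Rough benchmark following the steps above: returns nanoseconds spent on
/// n getOrPut calls in a scrambled order. Requires n > 0.
fn benchGetOrPut(allocator: std.mem.Allocator, n: usize) !u64 {
    // Step 1: n distinct pointers = addresses of the elements of one buffer.
    const values = try allocator.alloc(u64, n);
    defer allocator.free(values);

    // Steps 2 and 3: a void-valued set with room for all n keys.
    var set = std.AutoHashMap(*const u64, void).init(allocator);
    defer set.deinit();
    try set.ensureTotalCapacity(@intCast(n));

    // Step 4: (i * stride) % n visits every index exactly once when stride and
    // n are coprime; a cheap stand-in for a random shuffle.
    const stride: usize = 7919; // prime, so coprime with n unless it divides n
    std.debug.assert(n > 0 and n % stride != 0);

    var timer = try std.time.Timer.start();
    for (0..n) |i| {
        _ = try set.getOrPut(&values[(i * stride) % n]);
    }
    const elapsed = timer.read();

    std.debug.assert(set.count() == n); // every pointer inserted exactly once
    return elapsed;
}
```

For meaningful numbers you would run this in an optimized build (e.g. -O ReleaseFast) rather than a Debug test build.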

nurpax · October 31, 2023, 10:26pm

There may be some cache effects too when it’s walking the graph and using the HashMap at the same time. I think I’ll be able to put together a test case for this.

FWIW, 3 ms for 100k nodes is not really that slow either: that’s about 30 ns per operation.
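The per-operation figure is the total time divided by the node count:

```latex
\frac{3\ \text{ms}}{100\,000} = \frac{3 \times 10^{-3}\ \text{s}}{10^{5}} = 3 \times 10^{-8}\ \text{s} = 30\ \text{ns}
```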

squeek502 · October 31, 2023, 10:28pm

That’d be great. I’m mostly just curious, and I want to compare it to AutoArrayHashMap and, if I can, to existing non-Zig hash maps to get a sense of whether there’s an issue with AutoHashMap.
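To compare the two std maps on the same code path, one option is to choose the backend with a comptime bool. std.AutoHashMap and std.AutoArrayHashMap both offer init, deinit, getOrPut and clearRetainingCapacity. SetBackend and countDistinct are made-up names for this sketch.

```zig
const std = @import("std");

/// Picks the set backend at comptime; both share the API used below.
fn SetBackend(comptime K: type, comptime use_array_map: bool) type {
    return if (use_array_map)
        std.AutoArrayHashMap(K, void)
    else
        std.AutoHashMap(K, void);
}

/// Counts distinct keys, reusing `set`'s storage across calls.
/// `set` is a pointer to either backend; `keys` is any iterable of keys.
fn countDistinct(set: anytype, keys: anytype) !usize {
    set.clearRetainingCapacity();
    var distinct: usize = 0;
    for (keys) |k| {
        const res = try set.getOrPut(k);
        if (!res.found_existing) distinct += 1;
    }
    return distinct;
}
```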

Validark · October 31, 2023, 11:27pm

Are all your nodes stored, or could they be stored, in a single buffer? Then, instead of pointers, you can use u32 indices and allocate a bitstring with one bit per item in the buffer. To tell whether a node has been visited, you get or set the bit whose position in the bitstring matches the node’s index in the buffer.
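Here is a sketch of the index-based variant. It assumes a hypothetical Node type whose children are u32 indices into one node buffer, and it uses std.DynamicBitSet with one bit per node as the visited set. topoOrder and visit are made-up names.

```zig
const std = @import("std");

/// Hypothetical index-based node: children are indices into the node buffer.
const Node = struct {
    children: []const u32 = &.{},
};

/// Post-order DFS from `root`; bit i of `visited` means node i was seen.
/// Writes indices to `out` (children first) and returns how many were written.
fn topoOrder(nodes: []const Node, root: u32, visited: *std.DynamicBitSet, out: []u32) usize {
    var len: usize = 0;
    visit(nodes, root, visited, out, &len);
    return len;
}

fn visit(nodes: []const Node, i: u32, visited: *std.DynamicBitSet, out: []u32, len: *usize) void {
    if (visited.isSet(i)) return;
    visited.set(i);
    for (nodes[i].children) |c| visit(nodes, c, visited, out, len);
    out[len.*] = i;
    len.* += 1;
}
```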

nurpax · November 1, 2023, 8:59am

That’s an alternative that I also had considered earlier. I didn’t want to go for that as I originally didn’t want to couple the node structure with how they’re allocated (e.g., I favored storing pointers instead of indices), but I think this buffer-based allocation strategy has many benefits.

nurpax · November 2, 2023, 5:27pm

I put the code up on GitHub: nurpax/zigrograd (Micrograd in Zig)

You can choose between the std AutoHashMap and my own ptrset by flipping the useStdHashMap boolean argument on this line:

nurpax/zigrograd/blob/03966adddfb04027fd8689ae31f07b69271f0dfa/src/zigrograd.zig#L113

// ... (earlier lines of src/zigrograd.zig not shown)
            }
            return self.hash.insert(v);
        }
        pub fn clearRetainingCapacity(self: *@This()) void {
            self.hash.clearRetainingCapacity();
        }
    };
}

pub const Backward = struct {
    const Set = PointerSet(*const Value, false);
    //const Set = PointerSet(*const Value, true);

    topo: std.ArrayList(*const Value),
    visited: Set,

    pub fn init(alloc: std.mem.Allocator) Backward {
        return Backward{
            .topo = std.ArrayList(*const Value).init(alloc),
            .visited = Set.init(alloc),
        };
    // ... (rest not shown)

In addition, you can uncomment the benchmarking code further down in the same file (zigrograd/src/zigrograd.zig at commit 03966adddfb04027fd8689ae31f07b69271f0dfa) to show nanosecond timing for the part that uses the hashmap.

The timer code gives high precision, at least on Windows, because I picked up sokol_time from the Sokol library.

I might announce it as a Showcase as I think the code might be of interest to some. It’s a small reimplementation of Andrej Karpathy’s micrograd. It was a fun exercise to implement a little training loop in pure Zig. A lot of the NN concepts are very concrete when written at this level of abstraction.

squeek502 · November 2, 2023, 10:20pm

Pulling in sokol_time should be unnecessary btw: std.time.Timer uses the same APIs as far as I can tell, so it should be just as good for this use case.
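For repeated measurements with std.time.Timer, one simple pattern is to reset the timer before each run and keep the fastest result. bestOfNs is a made-up helper. Taking the minimum is only one way to reduce noise; a mean or median may suit other questions better.

```zig
const std = @import("std");

/// Runs `func` `runs` times and returns the fastest run in nanoseconds,
/// timed with std.time.Timer. `runs` should be at least 1.
fn bestOfNs(runs: usize, comptime func: fn () void) !u64 {
    var timer = try std.time.Timer.start();
    var best: u64 = std.math.maxInt(u64);
    for (0..runs) |_| {
        timer.reset();
        func();
        best = @min(best, timer.read());
    }
    return best;
}
```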

nurpax · November 3, 2023, 4:06pm

Thanks, I had missed that when reading the std.time API reference! I switched to std.time.Timer and it indeed works just as well.