What's the point in `HashMap.getOrPut`?

scubbo · December 28, 2024, 3:05am

Building off this question (and AoC day 8) - what’s the point of a HashMap’s getOrPut? Since it doesn’t accept a value, it seems misleadingly named to me - you still have to call map.put to put a value.

Contrast the two methods used here:

const expect = @import("std").testing.expect;

test "HashMap of Lists" {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var map = std.AutoHashMap(u8, std.ArrayList(u32)).init(allocator);
    var list_for_a = std.ArrayList(u32).init(allocator);
    try list_for_a.append(1);
    try map.put('a', list_for_a);

    if (!map.contains('b')) {
        try map.put('b', std.ArrayList(u32).init(allocator));
    }
    try map.getPtr('b').?.append(2);

    const response = try map.getOrPut('c');
    // Seems that `getOrPut` _doesn't_ put anything - as uncommenting the following line causes a segmentation fault
    // print("DEBUG - content at c is {}\n", .{response.value_ptr});
    if (!response.found_existing) {
        try map.put('c', std.ArrayList(u32).init(allocator));
    }
    // Now that we have manually _put_ something into the map, the previously-breaking line is legal:
    print("DEBUG - content at c is {}\n", .{response.value_ptr});
    try response.value_ptr.append(3);

    try expect(map.get('a').?.items[0] == 1);
    try expect(map.get('b').?.items[0] == 2);
    try expect(map.get('c').?.items[0] == 3);
}

When would one prefer to use the getOrPut approach? Or, is there a better third approach that I’ve missed?

ianprime0509 · December 28, 2024, 3:18am

Welcome to the forum! The advantage of getOrPut is that it’s more efficient than doing contains followed by put, as it only needs to find the (existing or new) position of the key in the map once. The intended way to use getOrPut is not to call put in the case where an existing entry isn’t found, but to initialize the new entry’s value using the returned value_ptr:

if (!response.found_existing) {
    response.value_ptr.* = std.ArrayList(u32).init(allocator);
}

getOrPut does put something: namely, the new entry in the map, but it doesn’t initialize the value for you. That’s where the returned entry comes in, by giving you a pointer to the new entry’s value for you to initialize it.

scubbo · December 28, 2024, 3:58am

Thanks!

Hmm…I still have some thinking to do to get my head around Zig’s low-level operation, then (the majority of my experience is with Java, Python, and TypeScript), since to me “putting an entry with an uninitialized value into the map” is essentially equivalent to “not putting anything into the map”. Would the difference be that map.contains(key) would return true (but map.get(key) would still return null)? Or should I be thinking in terms of allocated/reserved memory (if so - how can getOrPut reserve memory for an ArrayList when that is - if I’ve understood it right - a growable data structure?)?

it only needs to find the (existing or new) position of the key in the map once.

Fascinating. So it sounds like you’re saying that response.value_ptr.* = ... would be faster to execute than map.put(key, ...), because the latter would have to (maybe I’m using the wrong terms here, but hopefully the intention is understandable) traverse the map again to find “where to put” the value corresponding with key? Is that because of the cost of hashing key (I would usually consider that to be negligible; but I guess at the levels we’re dealing with here, it’s relevant?), or because of some other factor?

Thanks for the helpful insight! This is really rewiring my brain to care about a lot of stuff that’s been safely-ignorable before

LucasSantos91 · December 28, 2024, 4:36am

Yes and yes. Everytime you need to search for an element, you need to do the whole shebang: hash it, chase pointers and do comparisons with elements that hash collide.

If they return true or the value that you are looking for, then it’s fine. But returning false or null is not enough, as you’d have to redo all the work to add the value to the map.

The same way that you store any growable data structure anywhere. You store a pointer, and put the data somewhere else.

scubbo · December 28, 2024, 4:48am

Wow - Zig is really chasing efficiency! Makes sense, I guess. Thanks for explaining!

gonzo · December 28, 2024, 11:48am

One other thing: the cost of finding the right spot to put that new element might be sort-of O(1) for a hash table-based map (but what about collisions?) – other map implementations might need more work. So the point of getOrPut() is allowing you to pay that cost only once.

I wish other languages exposed this level of flexibility too! Zig

scubbo · December 28, 2024, 11:42pm

Hmmm - I did some experiments to try to understand this better, and got some very confusing results. Comparisons of the pointer-based and non-pointer-based approaches are inconsistent - neither is consistently faster, though the non-pointer-based approach tends to win more often than not. Can anyone see what mistake I might have made in setting up the experiment?

IntegratedQuantum · December 29, 2024, 12:18am

I cannot see the mistake that caused the segmentation fault. I’d need to see the full stack trace for that. However the fact that it works at numbers less than 10 might suggest that it has something to do with pointer invalidation after container resizing. Not sure about the HashMap, but an ArrayList for example resizes for the first time after 8 elements.

But I do see a few problems with your benchmarking methodology:

You do not use the same seed for both attempts, so they end up doing different operations. Generally we want to avoid additional variance introduce by doing different operations. If you average over many runs this might not be a big deal though.
Are you even measuring the right thing? You are generating two random numbers per loop. Even worse, since it’s unlikely that you get the same key twice you end up allocating once every iteration. And the GPA is not known for being fast.
The order of execution matters. If you do the non-pointer based test first, you might get totally different results. The GPA for example will be a lot slower for the first allocations, since it always requires a syscall, whereas subsequent allocation in the second execution will be a lot faster.
The numbers seem suspiciously high. You are inserting 25 elements, even with the allocation, that should not take 200 µs. Did you maybe forget to compile in release mode?

ianprime0509 · December 29, 2024, 1:00am

Adding on to the good points @IntegratedQuantum mentioned, just to elaborate on this hint:

I got the following segfault on my machine when running your benchmark with higher test run numbers:

⬢ [ian@toolbox tmp]$ zig run bench.zig 
Segmentation fault at address 0x7f7be200f130
/var/home/ian/src/zig/lib/std/array_list.zig:500:32: 0x1043cfc in addOne (bench)
            const newlen = self.items.len + 1;
                               ^
/var/home/ian/src/zig/lib/std/array_list.zig:262:49: 0x103cb8c in append (bench)
            const new_item_ptr = try self.addOne();
                                                ^
/var/home/ian/tmp/bench.zig:69:38: 0x103d60b in timeNonPointerBasedMethod (bench)
        try response.value_ptr.append(value);
                                     ^
/var/home/ian/tmp/bench.zig:30:67: 0x103ddfe in main (bench)
        non_pointer_based_times[i] = try timeNonPointerBasedMethod(rand, allocator);
                                                                  ^
/var/home/ian/src/zig/lib/std/start.zig:656:37: 0x103c0d2 in posixCallMainAndExit (bench)
            const result = root.main() catch |err| {
                                    ^
/var/home/ian/src/zig/lib/std/start.zig:271:5: 0x103bcbd in _start (bench)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)
Aborted (core dumped)

Line 69 pointed to here refers to this part of the non-pointer benchmark code:

const response = try map.getOrPut(key);
if (!response.found_existing) {
    try map.put(key, std.ArrayList(u32).init(allocator));
}
try response.value_ptr.append(value);

This is not correct: there is no guarantee that the response pointers still refer to valid memory after you perform additional mutations (such as map.put) on the map. In this case, calling map.put tries to ensure space for the new entry in the map (before it checks to see if the key is already present), which will resize the map’s internal memory if not enough space is already allocated, invalidating any pointers into the old memory.

scubbo · December 31, 2024, 1:37am

Thank you both for the tips! I’ll do my best to adapt the benchmarking based on this advice in the New Year.

In particular:

Did you maybe forget to compile in release mode?

I was running this with zig run <filename>, as I’ve been doing for all my AoC challenges so far - I’ll try compiling in my next iteration!

ianprime0509 · December 31, 2024, 1:54am

Just to clarify, you can also use zig run -OReleaseFast <filename> to compile and run your program in release mode. The default optimization mode (for both zig run and zig build-exe) is Debug, which is best for development (safety checks enabled, debug info available, few optimizations applied so it’s easier to step through the code in a debugger), but not a good choice for benchmarking.

Of course, if you want to do benchmarking at an executable level, it’s still best to precompile an executable using zig build-exe and use something like poop to get nice timing, memory usage, and other stats using two different commands (e.g. add a command-line option in your benchmarking program to select which approach should be executed).

andrewrk · December 31, 2024, 1:54am

Another tip is to use the pointer stability lock matching the lifetime of the get or put result:

const response = try map.getOrPut(key);
map.lockPointers();
defer map.unlockPointers();
if (!response.found_existing) {
    try map.put(key, std.ArrayList(u32).init(allocator));
}
try response.value_ptr.append(value);

I believe this will catch your bug.

Indeed it does:

thread 939658 panic: reached unreachable code
/home/andy/src/zig/lib/std/debug.zig:405:14: 0x104360d in assert (test)
    if (!ok) unreachable; // assertion failure
             ^
/home/andy/src/zig/lib/std/debug.zig:1540:15: 0x1063a4b in lock (test)
        assert(l.state == .unlocked);
              ^
/home/andy/src/zig/lib/std/hash_map.zig:1351:44: 0x10637ed in getOrPutContextAdapted__anon_5456 (test)
                self.pointer_stability.lock();
                                           ^
/home/andy/src/zig/lib/std/hash_map.zig:1338:56: 0x1046f5a in getOrPutContext (test)
            const gop = try self.getOrPutContextAdapted(allocator, key, ctx, ctx);
                                                       ^
/home/andy/src/zig/lib/std/hash_map.zig:1264:52: 0x1046086 in putContext (test)
            const result = try self.getOrPutContext(allocator, key, ctx);
                                                   ^
/home/andy/src/zig/lib/std/hash_map.zig:552:45: 0x1042afe in put (test)
            return self.unmanaged.putContext(self.allocator, key, value, self.ctx);
                                            ^
/home/andy/tmp/test.zig:25:20: 0x1042592 in test.HashMap of Lists (test)
        try map.put('c', std.ArrayList(u32).init(allocator));
                   ^