Problem with HashMaps

SpinTensor · December 8, 2024, 7:19pm

Hi,

I never used a HashMap before, because my background is C and Fortran. So if I misunderstood anything here, please explain it to me. I want to create a HashMap where I have a set of keys (lets say they are of u8 type), and I want to store several values per key. If I’m not mistaken, this is not how hashmaps usually work. Usually one value per key, correct? Therefore I thought I would simply use an ArrayList as the value. they are resizable, thus I can append to them. Here is a test code I wrote:

const std = @import("std");

const KeyType = u8;
const Point = struct { x: usize, y: usize };
const ValueType = *std.ArrayList(Point);
const MapType = std.AutoHashMap(KeyType, ValueType);

fn addValueToKey(map: *MapType, key: KeyType, point: Point, allocator: std.mem.Allocator) !void {
    const maybelist = map.get(key);
    if (maybelist) |list| {
        try list.append(point);
        std.debug.print("oldlist {?}\n", .{list});
        try map.put(key, list);
    } else {
        var newList = std.ArrayList(Point).init(allocator);
        try newList.append(point);
        std.debug.print("newlist {?}\n", .{newList});
        try map.put(key, &newList);
    }
}

fn getValuesForKey(map: *MapType, key: KeyType) ?ValueType {
    return map.get(key) orelse null;
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var MyMap = MapType.init(allocator);
    defer _ = MyMap.deinit();

    try addValueToKey(&MyMap, 'a', Point{ .x = 1, .y = 2 }, allocator);
    try addValueToKey(&MyMap, 'a', Point{ .x = 3, .y = 4 }, allocator);
    try addValueToKey(&MyMap, 'b', Point{ .x = 5, .y = 6 }, allocator);
    try addValueToKey(&MyMap, 'c', Point{ .x = 7, .y = 8 }, allocator);

    std.debug.print("\n", .{});
    const testkey: u8 = 'a';
    if (MyMap.get(testkey)) |list| {
        for (list.items, 0..) |item, idx| {
            std.debug.print("{c} -> {d}: ({d},{d})\n", .{ testkey, idx, item.x, item.y });
        }
    } else {
        std.debug.print("No values for key \"{c}\"\n", .{testkey});
    }
}

In the function addValueToKey I check if the key is already in the map. If yes, then get the value (a pointer to the array list) and append my new point. If not, just make a new array list and append the point and add the list to the map.

I know that I need to clean up the allocated memory at the end of the program. currently I am cleaning only map. However, long before that the GPA tells me that memory leaks, and sometimes I get a segfault. can someone tell me why? And how to fix it? Also I would appreciate if someone could tell me an easier way to store multiple values per key.
Thank you

IntegratedQuantum · December 8, 2024, 7:28pm

var newList = std.ArrayList(Point).init(allocator);
newList is a variable defined on the stack. It will be invalidated at the end of the scope.
try map.put(key, &newList);
Here you are taking a pointer to it and store it in the map.
One line later the scope ends and the pointer points to invalid memory.

The solution would be to just don’t store a pointer in the map.
Instead you can use map.getPtr to get a pointer to the list stored in the HashMap’s internal memory (you just need to be careful, because such a reference will become invalid after changing the hashmap).

SpinTensor · December 8, 2024, 7:50pm

Thank you, that clears up a lot. One more question do I have. Is it possible to get an iterator over all keys of a map, or do I have to manage that on the side? I would need that to clear the memory properly.

IntegratedQuantum · December 8, 2024, 7:52pm

map.keyIterator()
There is also map.iterator() if you want key-value pairs.

SpinTensor · December 8, 2024, 7:54pm

Thank you again. That helps a lot.

gonzo · December 8, 2024, 8:44pm

There is also map.valueIterator(), in case that comes in useful.

Also, I think you should look into map.getOrPut() – it is perfect for getting the (already existing) value for a key, or if the key is not already in the map, adding a new entry to the map for that key. In your case, you could then do entry.value_ptr.* = std.ArrayList(Foo).init(allocator).

mnemnion · December 10, 2024, 3:52am

Another solution to consider would be using allocator.create to allocate the ArrayLists on the heap, and make *ArrayList the value type. The advantage there is that fetching that pointer with get can’t be later invalidated, because when the hashmap resizes it moves around pointers, not data, so any retrieved pointer will always continue to point at the ArrayList.

It’s an extra level of indirection, and you’d have to allocator.destroy the array list itself after de-initializing it with deinit, to free the backing slice. So you should favor doing it the way @IntegratedQuantum suggested unless you’ll have a situation where using the ArrayList will alternate with using the HashMap. I mention it because that pattern is actually fairly common, and debugging memory corruption issues of the “why is my ArrayList suddendly invalid” sort can be really annoying, since the invalidation and discovering it can happen far away from each other, and you might only see the bug under specific conditions, because putting values into the HashMap only resizes it (invalidating your getPtr pointers) at certain sizes.

So the question to ask is: are you going to be having pointers out in the wild when more data is added to the HashMap? Or is the use pattern “get an ArrayList, do something with it, be done with using it for now, then do HashMap stuff”? That will determine whether a *ArrayList value or an ArrayList value is a better choice for you. There’s no need for the extra indirection and bookkeeping if it doesn’t solve problems for you.

SpinTensor · December 10, 2024, 7:32am

Thank you for this solution.
I needed to construct the Hash-Map and then use it. So there was no way of invalidating while using it, but I can see that such situations might occur.

If situations like this (storing multiple values to a key) appear quite frequently, maybe it might be worthwhile to put something like a std.AutoMultiHashMap(KeyType, ValueType) into the std-lib that does exactly that.