StringHashMap

Still some confusion here with my first real hashmap adventures
(yes i had a lot of memory corruptions but solved them and learned something).

Do I need to free the keys of the hashmap?

var map: StringHashMap(u8) = .empty;

fn add(name: []const u8, value: u8) {
    map.put(name, value);
}

add("hello", 42);
1 Like

Yes.

Maybe. Definitely the hash map will not do it for you.

One reason being that the strings could be in static memory, and that would segfault.

So I would need to make a new copy of the parameter, put that copy into the hashmap and free the key when dropping the hashmap. To be ultra sure…?

You need to decide what owns the slices.

If it’s the StringHashMap, so be it: iterate the keys() and free them, then deinit the HashMap.

If it’s something else, don’t do that. Have the something else do it instead.

Example: diffz has a line-diff mode, and how it works is fairly clever: it hashes every line in both files, and converts them to a list of unique characters (UTF-8 encoded). It then run the Myer’s diff on just the characters, and iterates the character diffs, retrieving the lines from the HashMap. Result: line-based diffs.

That hash map doesn’t own the slices, so it doesn’t free them. They’re just views into the source buffers. The final diffs have copies, although that isn’t strictly necessary, but it means you can free the file buffers once the diffing is done and nothing bad will happen.

Very tricky
You just put in some “hello” parameter and before you know …armageddon

FWIW, for my Advent of Code solutions in zig, I use an interned string table. This is basically:

  • An ArrayList that holds all strings (and they are dynamically allocated copies of the slices passed in); the position in this table becomes the string id.
  • A StringHashMap which maps from each string to its position in the array (its id).

With this in place, I can easily compare two strings by comparing their positions in the ArrayList. When the string table is being deinited, I release the memory used by each string.

This works really well for my use cases. Cheers.

4 Likes

Thx. I was also thinking about that. Currently it is working without mistakes. Better structures for future.

These are great, confirmed. I’m translating a program which uses them, and it’s a lifesaver.

You have independently derived the StringArrayHashMap, more or less. This is my whole interning implementation:

const StrSafe = struct {
    safe: StringArrayHashMap(void),
    allocator: Allocator,

    pub fn init(allocator: Allocator) StrSafe {
        return .{ .allocator = allocator, .safe = .empty };
    }

    pub fn find(strsafe: *StrSafe, key: []const u8) ?[]const u8 {
        return strsafe.safe.getKey(key);
    }

    pub fn intern(strsafe: *StrSafe, k: []const u8) ![]const u8 {
        if (strsafe.safe.getKey(k)) |key| {
            return key;
        }
        const dupe = try strsafe.allocator.dupe(u8, k);
        try strsafe.safe.put(strsafe.allocator, dupe, {});
        return dupe;
    }
};

(The safe field actually uses the Unmanaged variant, I’m phasing in the new default to minimize code changes later)

2 Likes

Not for the first time… It is not a surprise that the solution space for these things quickly converges to one or two good ideas. Can’t claim any flashes of genius, but it is good to get confirmation that others have walked the same path. Cheers!

There also was this old (clever) implementation that was used within the WASM namespace for a while (that still could be used for other things):

The benefit of that variant is that it doesn’t need to use/store any slices internally, it just stores the start position of specific null terminated keys.

Here is a related zig news post

Ahh, now it is still used here:

For short strings (and small maps) we can use at boundedarray as well though take care that they hash correctly.

What @gonzo described is not a StringArrayHashmap, for the purposes of string interning, it is no different from using a StringHashMap, the only difference is how efficient it is to iterate through the stored keys/values due to how it stores that data.

@mnemnion is using the first pointer given to the hash map as the ID, this can be done with a StringHashMap as well.
@gonzo is using an index into an ArrayList of copies of the keys as the ID. This offers the ability to use a smaller type for the ID.
They don’t specify if they are using an ArrayList(u8) or ArrayList([]const u8)

Both are interning strings, but they are different.

I am using an std.ArrayList([]const u8). Here is my code, in case anyone is curious.

1 Like
O M G

all these years, all these puzzels…

2 Likes