Add std.BoundedArray(u5, 32) to hashmap

I construct a lot of “strings” with a std.BoundedArray(u5, 32)
and want to store the unique ones in a hashmap.
AutoHashMap does not work property in this case.
Anyone knows how to do this?

You need to use std.HashMap, and explicitly provide hashing and equality functions through the Context type parameter. The docs for HashMap explains this in more detail.

How you choose to hash your strings depends on your use case, but there are many existing methods in std.hash.

Ok. I get an enormous list of compiler errors when using HashMap.
I’ll have to think of something easier first…

I think I misunderstood your question. When you said “AutoHashMap” does not work properly in this case" I thought you meant that it couldn’t generate an autoHash function for your BoundedArray type, but it can, so that’s not the issue.

Would you mind explaining why std.AutoHashMap(std.BoundedArray(u5, 32), void) doesn’t work for you?

Adding 16 items of which 8 are unique gives 13 hash keys.
I will have to make a separate program to see what is going on :frowning:

define unique, do you mean half of them are duplicates?
if so getting 13 hashes is concerning

True :slight_smile:
I will get back here with proof or maybe proof that I am an idiot.

AutoContext will give you hash and eql functions that act on all elements of the array, regardless of the current len. Instead, you want hash and eql functions that work on the return of slice()/constSlice().

Here’s a test showing the problem (the test will fail):

test "bounded array hash map" {
    var map: std.AutoHashMap(std.BoundedArray(u5, 32), void) = .init(std.testing.allocator);
    defer map.deinit();

    var arr = try std.BoundedArray(u5, 32).fromSlice(&[_]u5{ 1, 2, 3, 4 });

    // Intentionally set some element outside of the populated slice
    arr.buffer[arr.buffer.len - 1] = 1;
    try map.putNoClobber(arr, {});

    // Now change the element that should be irrelevant for hash/eql
    arr.buffer[arr.buffer.len - 1] = 2;
    const gop_result = try map.getOrPut(arr);

    // Should still find the previously added BoundedArray key
    try std.testing.expect(gop_result.found_existing);
}

Something like this would work:

fn BoundedArrayContext(comptime T: type, comptime buffer_capacity: usize) type {
    return struct {
        pub fn hash(self: @This(), key: std.BoundedArray(T, buffer_capacity)) u64 {
            _ = self;
            const slice = key.constSlice();
            if (std.meta.hasUniqueRepresentation(T)) {
                return std.hash.Wyhash.hash(0, std.mem.asBytes(&key));
            } else {
                var hasher = std.hash.Wyhash.init(0);
                for (slice) |v| {
                    std.hash.autoHash(&hasher, v);
                }
                return hasher.final();
            }
        }

        pub fn eql(self: @This(), a: std.BoundedArray(T, buffer_capacity), b: std.BoundedArray(T, buffer_capacity)) bool {
            _ = self;
            return std.mem.eql(T, a.constSlice(), b.constSlice());
        }
    };
}

Changing the map initialization in the test code above to this would get it to pass:

    var map: std.HashMap(
        std.BoundedArray(u5, 32),
        void,
        BoundedArrayContext(u5, 32),
        std.hash_map.default_max_load_percentage,
    ) = .init(std.testing.allocator);
3 Likes

BoundedArrays are just a buffer with a comptime known capacity plus a runtime known length. The buffer is initialized to undefined:

Let’s peek at the AutoContext function (which powers the AutoArrayHashMap function).

You probably aren’t getting the first branch of getAutoHashFn, since u5 will be stored in a u8 by default:

So we must now look at autoHash.

std.hash.autoHash will start wit the std.BoundedArray struct and hash the buffer field, and then the len field:

The buffer field will be hashed by the hashArray function:

And this is where the disconnect happens. std.hash.autoHash is hashing the entirety of the buffer field, where you are only expecting it to hash the portion of the buffer that is specified by the len field.

Theoretically, std could support this use case by having some kind of hash function, similar to the current format function. However it is probably better to just implement your own hashmap context that will make sure to only use the values specified by len.

2 Likes

AutoContext should be changed to use hash and eql methods that have the correct signature on the given type.
And types like BoundedArray should use that, It’s reasonable to expect types in std to behave with each other.

My main hesitation would be the amount of duck typing this would introduce. What is the correct signature for hash? What is the correct signature for eql? Depending on how the std library handles it, this would also result in the names hash and eql being off limits unless they exactly match what std expects. All of these concerns can be answered, but they would have to be answered.

Other than that, I agree that this would be nice to have in std.

In any case current code will have to work around the current limitations.

On second thought a HashMapContext fn would be better

Thank you both!
I was suspecting the autohash system would include the unused data beyond “len” in the hash.
BTW: I learned another thing from your code. We can use .init()

1 Like

Interesting…
Indeed I notitced when initializing to:
var suffix : std.BoundedArray(u5, 32) = .{};
(it is a local var anyway)
the autohash seems to work correctly.

Not sure I quite understand what you mean, but note that autoHash seeming to ‘work’ could be a coincidence due to undefined being set to 0xaa in debug mode (or some unknown value in other modes), so the unused bytes of buffer may just so happen to compare equal.

The test case I posted above is a more reliable indicator of whether or not a hash/eql actually works correctly.

1 Like

True, I was wondering if
std.BoundedArray(u5, 32) = .{};
initializes to 32 zero’s… (with len = 0)
if not how?

buffer has a default value of undefined:

See the undefined docs:

https://ziglang.org/documentation/master/#undefined

Yes I saw the undefined in the code…

So it initializes the bytes of buffer to whatever memory was there before, i.e. it doesn’t initialize it to any particular value at all. In the future, usage of autoHash with BoundedArray may trigger a panic since it could (incorrectly) be using the parts of buffer that are still undefined: