How to idiomatically move values from one "Set" to another, and how should I think about `keyIterator` values under-the-hood?

scubbo · January 20, 2025, 2:34am

I’m abusing the keys of an AutoHashMap to act as a Set (I know that this exists, but I’m trying to solve AoC after-the-fact without any dependencies in an attempt to learn the language).

I first tried this:

const std = @import("std");
const print = std.debug.print;

const expect = @import("std").testing.expect;

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var first_set = std.AutoHashMap(u32, bool).init(allocator);
    var second_set = std.AutoHashMap(u32, bool).init(allocator);
    defer first_set.deinit();
    defer second_set.deinit();

    first_set.put(42, true) catch unreachable;
    first_set.put(75, true) catch unreachable;
    first_set.put(180, true) catch unreachable;

    var first_set_key_iter = first_set.keyIterator();
    while (first_set_key_iter.next()) |k| {
        _ = first_set.remove(k.*); // `k.*` necessary because `k` is a pointer, but `.remove()` takes a value
        second_set.put(k.*, true) catch unreachable; // Ditto above
    }

    try expect(second_set.contains(42));
    try expect(second_set.contains(75));
    try expect(second_set.contains(180));
    try expect(first_set.count() == 0);
}

But the .contains tests failed. By adding some debug logging statements, I found that the values being added to second_set were completely different than the values being removes from first_set, even though they were both k.*.

I assume that this is something to do with the fact that k was derived by iterating over first_set, and that first_set.remove changes the set, but I don’t see how that would have any effect - k is a *u32 (i.e. a pointer to a u32 value), so once that value has been retrieved (by going “via” first_set) I don’t see why changes to first_set would affect its value.

That is, the way I (wrongly) imagine while (first_set_key_iter.next()) |k| { working is:

Find the next key of first_set
Identify what u32 it is
Create a pointer to that u32 value
Set k equal to that pointer

Under this (again, wrong) paradigm, modifications to first_set wouldn’t affect the value of k.* - after step 2, the original source of k is irrelevant. Even though (say) 42 was originally the answer to “what is the first key of first_set?”, no modification to first_set will change the fact that k is “a pointer to 42”.

So, my first question: What’s an alternative paradigm that more-closely matches the actual operation? Is it closer to “k is a pointer to the-u32-which-is-the-first-key-of-first_set”? What are some use-cases that that makes easier, smoother, or faster?

And my second question is - what’s a more idiomatic way to achieve what I’ve done above? The code below works, but I suspect there’s a better way:

...
    var first_set_key_iter = first_set.keyIterator();
    while (first_set_key_iter.next()) |k| {
        const persisted_value = k.*; // _this_ will keep its value, even if `first_set` changes
        _ = first_set.remove(persisted_value);
        second_set.put(persisted_value, true) catch unreachable;
    }
...

AndrewKraevskii · January 20, 2025, 4:07am

You can use void as type of value for a HashMap and this is the way to get Set in zig. Code would look like this than:

const std = @import("std");
const print = std.debug.print;

const expect = @import("std").testing.expect;

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var first_set = std.AutoHashMap(u32, void).init(allocator);
    var second_set = std.AutoHashMap(u32, void).init(allocator);
    defer first_set.deinit();
    defer second_set.deinit();

    first_set.put(42, {}) catch unreachable;
    first_set.put(75, {}) catch unreachable;
    first_set.put(180, {}) catch unreachable;

    var first_set_key_iter = first_set.keyIterator();
    while (first_set_key_iter.next()) |k| {
        _ = first_set.remove(k.*); // `k.*` necessary because `k` is a pointer, but `.remove()` takes a value
        second_set.put(k.*, {}) catch unreachable; // Ditto above
    }

    try expect(second_set.contains(42));
    try expect(second_set.contains(75));
    try expect(second_set.contains(180));
    try expect(first_set.count() == 0);
}

But this wont work. We can see why by adding some logging:

var first_set_key_iter = first_set.keyIterator();
while (first_set_key_iter.next()) |k| {
    std.debug.print("before remove: {d}\n", .{k.*});
    _ = first_set.remove(k.*); // `k.*` necessary because `k` is a pointer, but `.remove()` takes a value
    std.debug.print("after  remove: {d}\n", .{k.*});
    second_set.put(k.*, {}) catch unreachable; // Ditto above
}

before remove: 75
after  remove: 2863311530
before remove: 42
after  remove: 2863311530
before remove: 180
after  remove: 2863311530
error: TestUnexpectedResult

As you can see after removing value from map key is replaced by another value. So adding it to second set wont produce desired result. We can fix it by storing key temporary in local variable. This code passes tests.

var first_set_key_iter = first_set.keyIterator();
while (first_set_key_iter.next()) |k| {
    const key_copy = k.*;
    _ = first_set.remove(k.*);
    second_set.put(key_copy, {}) catch unreachable;

    // Or we can just swap operations places.
    // second_set.put(k.*, {}) catch unreachable;
    // _ = first_set.remove(k.*);
}

Even tho it passes test this is also incorrect aproach. From docs of keyIterator:

/// Create an iterator over the keys in the map.
/// The iterator is --> invalidated <-- if the map is modified.
pub fn keyIterator(self: Self) KeyIterator {
    return self.unmanaged.keyIterator();
}

It means we cant iterate over HashMap and remove values from it at the same time (well we can and did it in code before but it sometimes wont work). What we can do instead is just remove all keys after loop is done.

var first_set_key_iter = first_set.keyIterator();
while (first_set_key_iter.next()) |k| {
    second_set.put(k.*, {}) catch unreachable;
}
first_set.clearRetainingCapacity();

AndrewKraevskii · January 20, 2025, 4:31am

Okey I missed question first time)

So to answer first question you can imagine HashMap having big buffer where it stores keys and values but we can ignore values since they are void.

Iterators goes over buffer and returns you pointer to cell where key exists. So K will be a pointer to 0,1,3 cells when iterating.

But what happens if you remove key while still holding pointer to cell? You will get pointer to same place but value it points to is overwriten.

If think I know what makes you confused. In something like python it would be Set of pointers to values. So removing pointer to value won’t affect value itself.

But here Set itself owns value so returned pointer is also pointing inside of a Set

LucasSantos91 · January 20, 2025, 8:44am

remove marks the value as undefined.

fn removeByIndex(self: *Self, idx: usize) void {
            self.metadata.?[idx].remove();
            self.keys()[idx] = undefined;
            self.values()[idx] = undefined;
            self.size -= 1;
            self.available += 1;
        }

The better way to achieve what you want is through fetchRemove.

/// If there is an `Entry` with a matching key, it is deleted from
/// the hash map, and then returned from this function.
pub fn fetchRemove(self: *Self, key: K) ?KV

It’s also more efficient as it avoids rehashing the key.

scubbo · January 20, 2025, 4:54pm

Got it, thanks folks, that’s super-helpful - especially those diagrams, thank you @AndrewKraevskii. So it seems like I really should be thinking of those pointers not as pointing at “a u32 value”, but rather at “whatever value is in the key-cell of the HashMap”. That’s definitely different from my usual mental-model, but I can see how it’s consistent - thank you! And thanks @LucasSantos91 for the suggestion of the built-in that does what I want.