How can this possibly fail?

The code is:

const std = @import("std");

const Trie = struct {
    /// if this node is the root, this stays undefined
    char: u21 = undefined,
    /// the number of subtrees + 1
    occurances: usize = 0,
    end: bool = false,

    parent: ?*Trie = null,
    children: std.AutoArrayHashMapUnmanaged(u21, Trie) = .{},


    pub fn addString(self: *Trie, allocator: std.mem.Allocator, str: []const u21) !void {
        if (self.hasString(str)) // this does not modify anything
            return error.WordAlreadyExists;

        var current = self;
        for (str) |char| {
            if (!current.children.contains(char)) try current.children
                .putNoClobber(allocator, char, .{ .char = char, .parent = current });
            const child = current.children.getPtr(char).?;
            std.debug.assert(child.parent == current); // this fails
            current = child;
            current.occurances += 1;
        }

        current.end = true;
    }
};

How can this possibly be the case? I have been searching for the bug for about 4 hours in total now and can't find it. There is no other function modifying anything about the trie anywhere else.

Your code doesn't handle the case where the children hashmap grows beyond its current capacity. When it grows, all children that are already inserted get moved into the resized internal storage, which changes their addresses.
This means, for example, that nodes inserted earlier can end up with a parent pointer that points to the old, now-invalid memory.

Basically your next line:

const child = current.children.getPtr(char).?;

assumes that the data structure gives you pointer stability, but the data structure used here only gives you that until it resizes, and it resizes whenever its current capacity is too small to hold the new element.

For a quick check you could set the initial capacity to something ridiculously high that is never exceeded; that should make the errors disappear, but then you are wasting memory.
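For example (the 4096 here is just an arbitrary, generously large guess, node is a placeholder for any trie node, and you would have to do this for every node's children map, which makes the memory waste even worse):

// reserve capacity up front so the map never grows (and never moves entries) while inserting
try node.children.ensureTotalCapacity(allocator, 4096);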

The easiest way to fix this is probably to change the hashmap to store pointers instead: children: std.AutoArrayHashMapUnmanaged(u21, *Trie) = .{}. That means you would need to allocate and initialize each Trie node before adding it to the map (and destroy the child nodes when you remove them / deinit the Trie), as sketched below.
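Roughly like this (just a sketch, assuming the rest of your Trie stays the same; it leaves out the hasString check and the deinit/cleanup):

pub fn addString(self: *Trie, allocator: std.mem.Allocator, str: []const u21) !void {
    var current = self;
    for (str) |char| {
        const child = current.children.get(char) orelse blk: {
            // each node lives in its own heap allocation, so its address
            // does not change when the children map resizes
            const node = try allocator.create(Trie);
            node.* = .{ .char = char, .parent = current };
            try current.children.putNoClobber(allocator, char, node);
            break :blk node;
        };
        std.debug.assert(child.parent == current); // stable now
        current = child;
        current.occurances += 1;
    }
    current.end = true;
}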

Alternatively you could try to write your code in a careful way, so that you know exactly when and where the children hashmap grows, and then "fix up" all pointers that get broken by that growing operation. I don't really recommend that approach, though, because it seems error-prone and brittle.

Personally, my preferred approach in these situations is to create a pool of Trie nodes and then store indices to those nodes in children instead of pointers; that way you avoid the whole issue of pointer stability. Instead you have explicit indexing into the pool somewhere in the code.
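A rough sketch of that idea (the pool type and its names are made up for illustration; children would then become something like std.AutoArrayHashMapUnmanaged(u21, u32) and parent a ?u32):

const NodePool = struct {
    nodes: std.ArrayListUnmanaged(Trie) = .{},

    /// appends a node and returns its index
    fn add(pool: *NodePool, allocator: std.mem.Allocator, node: Trie) !u32 {
        const index: u32 = @intCast(pool.nodes.items.len);
        try pool.nodes.append(allocator, node);
        return index;
    }

    /// the returned pointer is only valid until the next add;
    /// the index itself stays valid even when the list reallocates
    fn get(pool: *NodePool, index: u32) *Trie {
        return &pool.nodes.items[index];
    }
};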


Here is a relatively minimal example that illustrates the problem you are experiencing:

const std = @import("std");

const Node = struct {
    parent: ?*Node,
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var root = Node{ .parent = null };
    var children: std.AutoArrayHashMapUnmanaged(u16, Node) = .{};
    defer children.deinit(allocator);

    std.debug.print("initial capacity: {d}\n", .{children.capacity()});

    var address_after_insert = std.ArrayList(*Node).init(allocator);
    defer address_after_insert.deinit();

    for (0..20) |i| {
        const key: u16 = @intCast(i);
        try children.putNoClobber(allocator, key, .{ .parent = &root });
        std.debug.print("capacity: {d}\n", .{children.capacity()});
        try address_after_insert.append(children.getPtr(key).?);
    }

    std.debug.print("capacity after inserts: {d}\n", .{children.capacity()});

    // now let's check the addresses
    std.debug.print("check addresses:\n", .{});

    for (address_after_insert.items, 0..20) |insert_address, i| {
        const key: u16 = @intCast(i);
        const current_address = children.getPtr(key).?;
        if (insert_address == current_address) {
            std.debug.print("{d}:    same\n", .{key});
        } else {
            std.debug.print("{d}:    inserted: {*} current: {*}\n", .{ key, insert_address, current_address });
        }
    }
}

The output is:

initial capacity: 0
capacity: 8
capacity: 8
capacity: 8
capacity: 8
capacity: 8
capacity: 8
capacity: 8
capacity: 8
capacity: 19
capacity: 19
capacity: 19
capacity: 19
capacity: 19
capacity: 19
capacity: 19
capacity: 19
capacity: 19
capacity: 19
capacity: 19
capacity: 20
capacity after inserts: 20
check addresses:
0:    inserted: hashmap.Node@7f5fd0f56000 current: hashmap.Node@7f5fd0f5c000
1:    inserted: hashmap.Node@7f5fd0f56008 current: hashmap.Node@7f5fd0f5c008
2:    inserted: hashmap.Node@7f5fd0f56010 current: hashmap.Node@7f5fd0f5c010
3:    inserted: hashmap.Node@7f5fd0f56018 current: hashmap.Node@7f5fd0f5c018
4:    inserted: hashmap.Node@7f5fd0f56020 current: hashmap.Node@7f5fd0f5c020
5:    inserted: hashmap.Node@7f5fd0f56028 current: hashmap.Node@7f5fd0f5c028
6:    inserted: hashmap.Node@7f5fd0f56030 current: hashmap.Node@7f5fd0f5c030
7:    inserted: hashmap.Node@7f5fd0f56038 current: hashmap.Node@7f5fd0f5c038
8:    same
9:    same
10:    same
11:    same
12:    same
13:    same
14:    same
15:    same
16:    same
17:    same
18:    same
19:    same

The second resize, from capacity 19 to 20, seems to be an in-place resize that doesn't require any reallocation; that is why the entries 8 to 18 keep the same addresses.


Thank you so much, now that I see it, it seems obvious lmao


I managed to fix it. Well, I did neither of the two options that you proposed, but instead scrapped the entire parent-pointer idea and just made the node iterator remember entire paths. I thought this approach would be harder, but it was actually quite pleasant, and once I simplified it, actually shorter.

If you're interested, this is the commit: Removed parent pointer from Trie, changed nodeiter.next logic · 7f9478f8cc - segfault-enjoyer/hangman-solver - Codeberg.org, and this is the iterator now: hangman-solver/src/main.zig at main - segfault-enjoyer/hangman-solver - Codeberg.org

I have now also replaced the hashmap with an arraylist to reduce memory usage, and it doesn't seem to be any slower, because it has a maximum size of about ~26 entries anyway.
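For reference, a minimal sketch of what array-list children with a linear scan could look like (just an illustration of the idea, not necessarily how the linked commit does it):

children: std.ArrayListUnmanaged(Trie) = .{},

/// linear scan over the children; fine for a small alphabet (~26 entries)
fn findChild(self: *Trie, char: u21) ?*Trie {
    for (self.children.items) |*child| {
        if (child.char == char) return child;
    }
    return null;
}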