How to make a trie in Zig

I am trying to make a Trie struct, but this question could apply to other situations. The inherent structure of a trie means that you need to allocate memory every time you try to insert a new word into the trie.

const TrieNode = struct {
    children: [26]?*TrieNode,
    is_word: bool = false,
};

const Trie = struct {
    const Self = @This();
    root: *TrieNode,

    pub fn charToIndex(char: u8) usize {
        return char - 'a';
    }

    pub fn insert(self: *Self, word: []const u8) void {
        var node = self.root;
        for (word) |char| {
            if (node.children[charToIndex(char)] == null) {
                var new_node = TrieNode{ .children = [_]?*TrieNode{null} ** 26 };
                node.children[charToIndex(char)] = &new_node;
            }
            node = node.children[charToIndex(char)];
        }
        node.is_word = true;
    }
};

So if you can spot the issue, it’s that the new_node will get popped off the stack after the function is run, meaning that if you try to search the Trie for any inserted word it would always return false. How would one go about implementing this struct that anytime it is called, it actually allocates memory if the word doesn’t exist? Is there a better way to do this? If you use the proposed method of having the function allocate memory every time it’s run, how do you deallocate it?

1 Like

There are two ways:

  1. allocator as function parameter
pub fn insert(self: *Self, allocator: std.mem.Allocator, word: []const u8) !void {
    ...
    var new_node = try allocator.create(TrieNode);
    ...
}
  1. allocator as member
const Trie = struct {
    allocator: std.mem.Allocator,
    ...
}

pub fn insert(self: *Self, word: []const u8) !void {
    ...
    var new_node = try self.allocator.create(TrieNode);
    ...
}

To get the allocator:

var gpa = std.heap.GeneralPurposeAllocator(.{}){};
const allocator = gpa.allocator();

For each allocator.create you need an allocator.destroy to free memory.

A great allocator tutorial is: Learning Zig - Heap Memory & Allocators
Also note: Allocation is not Initialization

2 Likes
    pub fn insert(self: *Self, word: []const u8) void {
        var node = self.root;
        for (word) |char| {
            if (node.children[charToIndex(char)] == null) {
                var new_node = TrieNode{ .children = [_]?*TrieNode{null} ** 26 };
                node.children[charToIndex(char)] = &new_node;
            }
            node = node.children[charToIndex(char)];
        }
        node.is_word = true;
    }

It’s also worth mentioning that var new_node is temporary memory. @DimDin’s solution will move it out of temporary memory.

2 Likes

Given the recursive nature of this data structure, freeing memory can be tricky. Aside from using an std.heap.ArenaAllocator to just free everything all at onece when you call it’s deinit method, you can also add a deinit method to the TrieNode struct:

const TrieNode = struct {
    children: [26]?*TrieNode,
    is_word: bool = false,

    fn deinit(self: *TrieNode, allocator: Allocator) void {
        for (&self.children) |maybe_node| {
            if (maybe_node) |node| node.deinit(allocator);
        }
        allocator.destroy(self);
    }
};

then add a deinit to Trie:

fn deinit(self: *Trie, allocator: Allocator) void {
    self.root.deinit(allocator);
}

And let recursion do its thing. :^)

2 Likes

I tried this but got a Segmentation fault.

const Trie = struct {
    const Self = @This();
    root: *TrieNode,

    pub fn charToIndex(char: u8) usize {
        return char - 'a';
    }

    pub fn insert(self: *Self, word: []const u8, allocator: std.mem.Allocator) !void {
        var node = self.root;
        for (word) |char| {
            if (node.children[charToIndex(char)] == null) {
                var new_node = try allocator.create(TrieNode);
                node.children[charToIndex(char)] = new_node;
            }
            node = node.children[charToIndex(char)].?;
        }
        node.is_word = true;
    }
};


When I test this in the main function I get the error:

Segmentation fault at address 0x0
/home/clinton/zig-linux-x86_64-0.12.0-dev.80+014d88ef6/lib/compiler_rt/memcpy.zig:19:21: 0x2a1620 in memcpy (compiler_rt)
            d[0] = s[0];
                    ^
/home/clinton/Developer/Zig_projects/wordle/src/trie.zig:22:21: 0x220c1d in insert (trie)
            if (node.children[charToIndex(char)] == null) {
                    ^
/home/clinton/Developer/Zig_projects/wordle/src/trie.zig:34:20: 0x2210b9 in main (trie)
    try trie.insert("hello", gpa_allocator);
                   ^
/home/clinton/zig-linux-x86_64-0.12.0-dev.80+014d88ef6/lib/std/start.zig:370:37: 0x220a8e in posixCallMainAndExit (trie)
            var i: usize = 0;
                                    ^
/home/clinton/zig-linux-x86_64-0.12.0-dev.80+014d88ef6/lib/std/start.zig:243:5: 0x220571 in _start (trie)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)
Aborted: oops, process 'zig' core dumped

[Process exited 0]

From that code it seems that you never allocate/initialize the root node.

1 Like

You were right, but for some reason I’m still getting the Seg fault error. I changed the code to:

const TrieNode = struct {
    children: [26]?*TrieNode,
    is_word: bool = false,
};

const Trie = struct {
    const Self = @This();
    root: ?*TrieNode = null,

    pub fn charToIndex(char: u8) usize {
        return char - 'a';
    }

    pub fn insert(self: *Self, word: []const u8, allocator: std.mem.Allocator) !void {
        if (self.root == null) {
            self.root = try allocator.create(TrieNode);
        }
        var node = self.root;
        for (word) |char| {
            if (node.?.children[charToIndex(char)] == null) {
                var new_node = try allocator.create(TrieNode);
                node.?.children[charToIndex(char)] = new_node;
            }
            node = node.?.children[charToIndex(char)].?;
        }
        node.?.is_word = true;
    }
};

so I changed the root type to be a nullable pointer and then set it to null on initialization. Then in the insert operation I check if the root is null, if it is then allocate a new TrieNode and set self.root to that value. However I still get the error:

Segmentation fault at address 0x0
/home/clinton/zig-linux-x86_64-0.12.0-dev.80+014d88ef6/lib/compiler_rt/memcpy.zig:19:21: 0x2a17b0 in memcpy (compiler_rt)
            d[0] = s[0];
                    ^
/home/clinton/Developer/Zig_projects/wordle/src/trie.zig:25:23: 0x220d11 in insert (trie)
            if (node.?.children[charToIndex(char)] == null) {
                      ^
/home/clinton/Developer/Zig_projects/wordle/src/trie.zig:37:20: 0x221249 in main (trie)
    try trie.insert("hello", gpa_allocator);
                   ^
/home/clinton/zig-linux-x86_64-0.12.0-dev.80+014d88ef6/lib/std/start.zig:370:37: 0x220a9e in posixCallMainAndExit (trie)
            var i: usize = 0;
                                    ^
/home/clinton/zig-linux-x86_64-0.12.0-dev.80+014d88ef6/lib/std/start.zig:243:5: 0x220581 in _start (trie)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)
Aborted: oops, process 'zig' core dumped

[Process exited 0]

Yeah, that is not a very enlightening error. It is probably the result of accessing undefined memory: when you call allocator.create it does not initialize the TrieNode struct. Even if it did the children member of the struct has no default value.

You need to do something like

const TrieNode = struct {
    children: [26]?*TrieNode = [1]?*TrieNode{null} ** 26,
    is_word: bool = false,
};
// ...
        if (self.root == null) {
            self.root = try allocator.create(TrieNode);
            self.root.?.* = .{};
        }
// ...
// ...
            if (node.?.children[charToIndex(char)] == null) {
                var new_node = try allocator.create(TrieNode);
                new_node.* = .{};
                node.?.children[charToIndex(char)] = new_node;
            }
 // ...

There is probably better coding style for this, but it seems to fix the segfault.

3 Likes

Thank you for the help and explanation.

This is exactly what’s explained in Allocation is not Initialization .

4 Likes

Sorry for replying to solved topic, I just want to share some my experience with tries.

  • although prefix trees are more complex data structures compared to hash tables, I prefer the former, just because there are no conflicts, each key value leads to unique place.

  • there is a (not so obvious) pitfall in using tries - when you add a key “abcdef” (with some data attached) (to initially empty trie, for example), you automatically add all the keys along the way, i.e. “a”, “ab”, “abc”, “abcd” and “abcde” (but only the final point has some data after adding). And then you naively search for, say, “abc” key, successful, but forgot to check that this node really has some data and ooops, your program crashes, to your surpise.

I plumped into this a couple of times, that was good lesson… so after you found a key, do not forget to check if the node is empty, here is an short excerpt from some of my programs (in C):

        dpa = pta->get(ptkey, xwi->recorders);
        if (NULL == dpa) {
                log_dbg_msg("%s() - node '%s' not found, will add\n", __func__, ptkey);
                dpa = pta->add(ptkey, xwi->recorders);
        }
        rec = *dpa;
        if (NULL == rec) { // <<<<<<<<<<<<<<<<<<<<<<<<
                log_dbg_msg("%s() - node '%s' is empty, will create recorder\n", __func__, ptkey);
1 Like

Thanks for sharing your experience. My use case here is much different though. The trie I’m building is used for a wordle game. So there is always going to be a maximum length for the string which is 5. I will not be looking for prefixes or substrings. I simply just load a wordle dictionary with thousands of 5 letter words into the trie and when the user is guess simply search the Trie for the word and if the word is not present then let the user know that the input is not a word. What I’m building is far too simple to run into the issue you brought up, but it’s still nice to know your experience for future reference.

Wouldn’t a hash map work just as well (if not better), then?

1 Like

It probably will. A trie was just the first thing I thought of, and I was also curious how that particular structure could be implemented in ZIg.

1 Like