Optimizing prefix search

Hello,

The problem

I’m working on providing a dynamic completion engine so that I can press tab to complete some command line utilities.

In short this engine listens to a bunch of processes and parses json and stores the parsed json keys in a trie

So my engine will see some json {"foo": { "bar": 1}, "baz":0}
And collect strings:

  • /foo/bar
  • /baz

I’m trying to do this at scale.

So I’ve been toying with different implementations to see what performs best.

I’d like to try out some profiling.

Profilers.

  1. coz looks cool, but I just spent like 2 hrs trying to get it to work with zig to no affect. Has anyone ever done that before.
  2. poop looks like a better top, but I couldn’t get it to run.
  3. Tracy seems like people use it, but I haven’t looked into it.
  4. Are you aware of any other, preferably graphical, profilers that are zig viable.

It’s fairly obvious to me that the bottleneck is parsing the json, but it would be cool to look at a profiler in general. My 2 wants would be,

  1. ease of use
  2. graphical

Thanks,

On which OS are you working?

For linux I find hotspot really easy to use, but it is based on perf which I think only runs on linux. I don’t know, maybe it could be used through WSL (Windows Subsystem for Linux), but I have no experience with that.

2 Likes

linux. Rocky9

I’ll check that out.

Valgrind with Callgrind + KCachegrind works, but if symbol names aren’t working, see Debug symbols not recognized by Valgrind if `.data` section is omitted · Issue #15254 · ziglang/zig · GitHub

I found the perf top utility to be close enough to what I need.

I’ve discovered I’m using a lot of cpu% making calls to hashmap functions. It’s making me wonder if a linked list/tree would perform better since there are so few entries.

I’ve seen people just use a straight up array of bools to indicate if there is a char at the equivalent of that index. Maybe that will be much faster?

I don’t see a hash map mentioned in the OP. Where is the hash map being used?

hashap is a basic linear probe and has poor hash functions for short keys. everybody i know that has had to use it in prod has had to rewrite it.

first try to just a short very inexpensive hash functions. the ones in std are waaaay too expensive and are basically for hashing things like web pages, not a 10 character word. There is one in my github that for short strings in zig-string if you need inspiration, but you can find equally simples ones online.

if you need more than that, i have a couple hashtables with better collision resolution (a bucketed cuckoo and a hopscotch) that you can see/use/copy if you need something better.

hashmap slowness is kind of known issue for those that have had to use it. I did the billion row challenge, and it was well over 90% of my time.

but if you are trying to do a prefix search you probably want to use something like a prefix tree (aka trie) or a vEB tree.

I have this trie where the Nodes look like

        pub const Node = struct {
            value: T, // will be a u8
            parent: ?*Node, // here but likely won't ever be used
            // investigate linked list
            // TODO: Apparently I'm spending a lot of cpu on these hashmaps
            // perhaps consider something else?
            siblings: std.AutoArrayHashMap(T, *Node), // hashmaps may be pointless waist of memory if hypothetical linked list is sufficently small.
            childrens: std.AutoArrayHashMap(T, *Node), // hashmaps may be pointless waist of memory.
            allocater: std.mem.Allocator, // make this an arena
            string_end: bool, // just because this is a string end doesn't mean it's finished
        }

Then when I’m moving threw the Trie I refer to these maps when needed.
tbf it does work. I’m just trying to squeeze performance for fun. My cpu idle time is still high even under unrealistic loads.

I built a Trie, I just use a hashmap inside the nodes.
I’ll investigate not doing that.

If you are using u8 as the character type, then I believe you could do something as simple as:

pub const Node = struct {
    /// One slot per possible byte value
    children: [256]?*Node = [_]?*Node{null} ** 256,
    is_terminal: bool = false,
};

then you just index into the array using the u8 value (node.children[byte]), and therefore you don’t even have to store the byte value itself anywhere.

I’m sure others will have better suggestions, though.

there’s also a type of tree (can’t remember the name - i want to say ternary, but i dont think that’s right) that is good for this:

each node has 3 children thake the letter you’re current looking at look at the current tree node: if it is less than go to left child, greater to to right child just like a binary search tree, until you find the node that has the letter from the string, then go the next letter in the string and down the middle child.

it basically mimics a forest of trees where the left and right children form a binary search tree on the current letter and the middle child is a BST for the next letter. You can get a close appoximation by taking a regular BST and making its data item be another tree, but there are optimization that cn be done to make a specialized one work better. Some of the trie optimization like path and level compression can work on it too and tend to be easier to implement.

1 Like

I think Ternary Search Trie is what you’re describing.

2 Likes

Yea I’ll do this.

My implementation went

linked list → replace linked list with hashmap → replace hashmap with an array.

fwiw this was out of the dome lmao. The array is probably as simple as it can get.

I still think the veb tree is the best, but a a 16 way trie might help with the excessively amount of nulls.

nothing says he needs to operate on a byte boundary. two 16 way levels instead.

The array method will waste a lot of memory though. You could instead store just the slots you need, and embed a bitstring with 256 bits, and mark which slots you are using in the bitstring in the sparse address space. You would use Phil Bagwell’s [Hash] Array-Mapped Trie trick to map from the bitstring to the compact storage: slots[bitstring & ((1 << index) -% 1)]

1 Like

If you’re interested, I created a data structure for scored prefix completion, paper available here:

Graphical explanation:

https://validark.github.io/DynSDT/demo/

Implementations in Zig and a few other languages:

Didn’t 100% polish the Zig version because I didn’t totally commit to what use case to optimize for and it was my first Zig project. I’ll probably revisit the project once we get this issue solved: introduce labeled continue syntax inside a switch expression · Issue #8220 · ziglang/zig · GitHub

7 Likes

Cool demo. I may look into the paper though this seems very complex for my use case.

Ternary Trie

I implemented the ternary trie today, which gave me a pretty large memory gain. I went from having my “hell” scenario reporting via top %mem 1.2 to 0.4.

Unfortunately, I think I care a bit more about CPU and I’m getting very similar performance.

In the hell scenario:

  • I’m spending about 5% cpu parsing the incoming json (std lib)
    • I’ve looked into simdjzon, but my vm doesn’t support avx. So I think I’m locked in to std lib for now.
  • and I’m spending 5-7% cpu maintaining my trie as all the data comes in.
    • I’ll provide my trie, but perhaps I’m just running into performance max without looking for less casual performance gains.
      • I may be looking into going multithreaded and rotating the work such that I’m Inserting the prior parsed message while parsing the current message.

I’m trying to think of other ideas, but I think that I’m stumped for now.

Here is my trie. Do me a favor and don’t point out spelling mistakes. Everything else is fair game.

const std = @import("std");
const types = @import("types.zig");

pub const RESULT = enum { FOUND, COULD_NOT_BE_FOUND, ADDED, FAILURE };

/// This is a trie data structure.
/// Essentially this is a tree of chars wherein each node slowly builds a string over time.
/// Its a "prefix tree"
pub fn Trie(comptime T: type) type {
    return struct {
        const Self = @This();
        root: *Node,
        // intended that you pass an arena allocater from a higher scope
        allocater: std.mem.Allocator,

        pub const Node = struct {
            value: T, // will be a u8
            left: ?*Node, // subtree where value is < this nodes value
            equal: ?*Node, // subtree where value == this nodes value
            right: ?*Node, // subtree where value is > this nodes value
            allocater: std.mem.Allocator, // make this an arena
            string_end: bool, // just because this is a string end doesn't mean it's finished

            // Note that there is no deinit. It is assumed that the user will
            // deinit by some mass deletion. Reminder that the intent here is for
            // the user to provide an arena as the allocator.
            fn init(self: *Node, value: T, allocater: std.mem.Allocator) void {
                self.value = value;
                self.left = null;
                self.equal = null;
                self.right = null;
                self.allocater = allocater;
                self.string_end = false;
            }

            // given a node search for a string
            fn search(self: *Node, uri: []const u8) RESULT {
                if (uri[0] < self.value) {
                    if (self.left) |left| {
                        return left.search(uri);
                    } else {
                        return RESULT.COULD_NOT_BE_FOUND;
                    }
                } else if (uri[0] > self.value) {
                    if (self.right) |right| {
                        return right.search(uri);
                    } else {
                        return RESULT.COULD_NOT_BE_FOUND;
                    }
                } else {
                    if (uri.len == 1) {
                        return RESULT.FOUND;
                    } else {
                        if (self.equal) |equal| {
                            return equal.search(uri[1..]);
                        } else {
                            return RESULT.COULD_NOT_BE_FOUND;
                        }
                    }
                }
            }

            // given a node search for a string and return the last node searched
            // if found
            fn search_for_node(self: *Node, uri: []const u8) ?*Node {
                if (uri[0] < self.value) {
                    if (self.left) |left| {
                        return left.search_for_node(uri);
                    } else {
                        return null;
                    }
                } else if (uri[0] > self.value) {
                    if (self.right) |right| {
                        return right.search_for_node(uri);
                    } else {
                        return null;
                    }
                } else {
                    if (uri.len == 1) {
                        return self;
                    } else {
                        if (self.equal) |equal| {
                            return equal.search_for_node(uri[1..]);
                        } else {
                            return null;
                        }
                    }
                }
            }

            // performs a search
            // HOWEVER, if string is not found will add it to the trie
            fn add(self: *Node, uri: []const u8, allocater: std.mem.Allocator) !RESULT {
                if (uri[0] < self.value) {
                    if (self.left) |left| {
                        return try left.add(uri, self.allocater);
                    } else {
                        var new_node = try allocater.create(Node);
                        new_node.init(uri[0], allocater);
                        self.left = new_node;
                        if (uri[1] == '/') {
                            new_node.string_end = true;
                        }
                        return new_node.add(uri, allocater);
                    }
                } else if (uri[0] > self.value) {
                    if (self.right) |right| {
                        return try right.add(uri, self.allocater);
                    } else {
                        var new_node = try allocater.create(Node);
                        new_node.init(uri[0], allocater);
                        self.right = new_node;
                        if (uri.len > 1) {
                            if (uri[1] == '/') {
                                new_node.string_end = true;
                            }
                        }
                        return new_node.add(uri, allocater);
                    }
                } else {
                    if (uri.len == 1) {
                        self.string_end = true;
                        return RESULT.ADDED;
                    } else {
                        if (self.equal) |equal| {
                            return equal.add(uri[1..], allocater);
                        } else {
                            var new_node = try allocater.create(Node);
                            new_node.init(uri[1], allocater);
                            self.equal = new_node;
                            if (uri[1] == '/') {
                                new_node.string_end = true;
                            }
                            return new_node.add(uri[1..], allocater);
                        }
                    }
                }
            }

            fn print_recursive(self: ?*Node, data: *std.ArrayList(u8)) !void {
                if (self) |ok| {
                    try print_recursive(ok.left, data);

                    try data.append(ok.value);
                    if (ok.string_end) {
                        std.debug.print("\t{s}\n", .{data.items});
                    }

                    try print_recursive(ok.equal, data);
                    _ = data.pop();
                    try print_recursive(ok.right, data);
                }
            }

            fn gather_recursive(self: ?*Node, data: *std.ArrayList(u8), matches: *std.BufSet, sort_method: types.SORT_METHOD) !void {
                if (self) |ok| {
                    try gather_recursive(ok.left, data, matches, sort_method);

                    try data.append(ok.value);
                    if (ok.string_end) {
                        try matches.insert(data.items);
                    }

                    if ((ok.string_end and sort_method == types.SORT_METHOD.Full) or !ok.string_end) {
                        try gather_recursive(ok.equal, data, matches, sort_method);
                    }
                    _ = data.pop();
                    try gather_recursive(ok.right, data, matches, sort_method);
                }
            }
        };

        pub fn init(value: T, allocater: std.mem.Allocator) !Self {
            var root_node = try allocater.create(Node);
            root_node.init(value, allocater);
            return Self{
                .allocater = allocater,
                .root = root_node,
            };
        }

        pub fn search(self: *Self, uri: []const u8, add_if_not_found: bool) !RESULT {
            var search_result: RESULT = undefined;
            // std.debug.print("\t{s}.\n", .{@tagName(search_result)});
            if (add_if_not_found) {
                // std.debug.print("attempting to add {s}\n", .{uri});
                search_result = try self.root.add(uri, self.allocater);
            } else {
                search_result = self.root.search(uri);
            }
            return search_result;
        }

        pub fn search_for_node(self: *Self, uri: []const u8) ?*Node {
            return self.root.search_for_node(uri);
        }

        pub fn print(self: Self) !void {
            var data = std.ArrayList(u8).init(self.allocater);
            std.debug.print("Trie:\n", .{});
            try self.root.print_recursive(&data);
        }

        pub fn gather_completions(self: Self, uri: []const u8, matches: *std.BufSet, sort_method: types.SORT_METHOD) !void {
            std.debug.print("attempting to gather {s}\n", .{uri});
            var data = std.ArrayList(u8).init(self.allocater);
            const last_searched_node = self.root.search_for_node(uri);
            if (last_searched_node) |node| {
                if (node.equal) |equal| {
                    try data.appendSlice(uri);
                    try equal.gather_recursive(&data, matches, sort_method);
                }
            } else {
                std.debug.print("search for node failed\n", .{});
            }
        }
    };
}

general remarks

I think using heavily nested if-else makes this code look more complicated than it is, by using early exit instead of if-else chains and using if-expressions we can get something more succinct that also looks less intimidating.

Another thing you need to apply is creating and using helper functions, to avoid repeating the same code, via multiple copy-paste. Not saying that everything deserves a helper function, but sometimes it is really useful.

I tried to create a very in-depth step-by-step explanation.

sidenote

I haven’t really run the code, so I could have made a small mistake somewhere, but I didn’t really want to spend the additional time to come up with my own example code and replace the external import of types.

I think it is better to either include these things in the future or stub them out in some way (just makes it more likely, that someone engages with it, when they don’t have to invest additional time to create a good example use of the data structure).

detailed explanation

From this:

fn search(self: *Node, uri: []const u8) RESULT {
    if (uri[0] < self.value) {
        if (self.left) |left| {
            return left.search(uri);
        } else {
            return RESULT.COULD_NOT_BE_FOUND;
        }
    } else if (uri[0] > self.value) {
        if (self.right) |right| {
            return right.search(uri);
        } else {
            return RESULT.COULD_NOT_BE_FOUND;
        }
    } else {
        if (uri.len == 1) {
            return RESULT.FOUND;
        } else {
            if (self.equal) |equal| {
                return equal.search(uri[1..]);
            } else {
                return RESULT.COULD_NOT_BE_FOUND;
            }
        }
    }
}

To this, by applying early-exit and if-expressions:

fn search(self: *Node, uri: []const u8) RESULT {
    const success = RESULT.FOUND;
    const fail = RESULT.COULD_NOT_BE_FOUND;
    if (uri[0] < self.value) return if (self.left) |left| left.search(uri) else fail;
    if (uri[0] > self.value) return if (self.right) |right| right.search(uri) else fail;
    if (uri.len == 1) return success;
    return if (self.equal) |equal| equal.search(uri[1..]) else fail;
}

I think this is much easier to read because with every line the function either exits and is done or it continues to the next line.

Additionally because I pulled out success and fail into named constants it is also easy to see that search_for_node is essentially the same function (minus small type differences).

fn search_for_node(self: *Node, uri: []const u8) ?*Node {
    const success = self;
    const fail = null;
    if (uri[0] < self.value) return if (self.left) |left| left.search_for_node(uri) else fail;
    if (uri[0] > self.value) return if (self.right) |right| right.search_for_node(uri) else fail;
    if (uri.len == 1) return success;
    return if (self.equal) |equal| equal.search_for_node(uri[1..]) else fail;
}

Now I was wondering what is the real value of having both search_for_node and search?
search_for_node already gives you found / not-found via its optional, so if we want to keep the api we also can do this:

fn search(self: *Node, uri: []const u8) RESULT {
    if (self.search_for_node(uri)) |_| RESULT.FOUND else RESULT.COULD_NOT_BE_FOUND;
}

fn search_for_node(self: *Node, uri: []const u8) ?*Node {
    const success = self;
    const fail = null;
    if (uri[0] < self.value) return if (self.left) |left| left.search_for_node(uri) else fail;
    if (uri[0] > self.value) return if (self.right) |right| right.search_for_node(uri) else fail;
    if (uri.len == 1) return success;
    return if (self.equal) |equal| equal.search_for_node(uri[1..]) else fail;
}

At this point I probably would remove the names again (which I only introduced to see whether the code is basically the same).

We can do similar things with the add function.
First lets pull out the repeating code of creating a new node:

fn createNode(uri: []const u8, allocater: std.mem.Allocator) *Node {
    var new_node = try allocater.create(Node);
    new_node.init(uri[0], allocater);
    if (uri.len > 1) {
        if (uri[1] == '/') {
            new_node.string_end = true;
        }
    }
    return new_node;
}

I don’t understand why only one of the branches contains the if (uri.len > 1) { condition, I think the reasoning behind this may be that in the other branches it always happens to be true, but I don’t think such a small condition is a good reason to duplicate the code in 3 branches. And I doubt it results in a big optimization.

Now lets remove the nested ifs (if’s that only set booleans can easily be removed):

fn createNode(uri: []const u8, allocater: std.mem.Allocator) !*Node {
    var new_node = try allocater.create(Node);
    new_node.init(uri[0], allocater);
    if (uri.len > 1 and uri[1] == '/') {
        new_node.string_end = true;
    }
    return new_node;
}
fn createNode(uri: []const u8, allocater: std.mem.Allocator) !*Node {
    var new_node = try allocater.create(Node);
    new_node.init(uri[0], allocater);
    new_node.string_end = uri.len > 1 and uri[1] == '/';
    return new_node;
}

Now we can rewrite the add function:

fn add(self: *Node, uri: []const u8, allocater: std.mem.Allocator) !RESULT {
    if (uri[0] < self.value) {
        if (self.left) |left| {
            return try left.add(uri, self.allocater);
        } else {
            self.left = try createNode(uri, allocater);
            return self.left.add(uri, allocater);
        }
    } else if (uri[0] > self.value) {
        if (self.right) |right| {
            return try right.add(uri, self.allocater);
        } else {
            self.right = try createNode(uri, allocater);
            return self.right.add(uri, allocater);
        }
    } else {
        if (uri.len == 1) {
            self.string_end = true;
            return RESULT.ADDED;
        } else {
            if (self.equal) |equal| {
                return equal.add(uri[1..], allocater);
            } else {
                var new_node = try allocater.create(Node);
                new_node.init(uri[1], allocater);
                if (uri[1] == '/') {
                    new_node.string_end = true;
                }
                self.equal = try createNode(); // NOTE this is where I noticed that our createNode function isn't quite right
                return new_node.add(uri[1..], allocater);
            }
        }
    }
}

The note shows the point where I noticed that init gets called with uri[1] instead of with uri[0] at this point, so we need to make a small change to the createNode function:

fn createNode(uri: []const u8, val:u8, allocater: std.mem.Allocator) !*Node {
    var new_node = try allocater.create(Node);
    new_node.init(val, allocater);
    new_node.string_end = uri.len > 1 and uri[1] == '/';
    return new_node;
}

Now back to add:

fn add(self: *Node, uri: []const u8, allocater: std.mem.Allocator) !RESULT {
    if (uri[0] < self.value) {
        if (self.left) |left| {
            return try left.add(uri, self.allocater);
        } else {
            self.left = try createNode(uri, uri[0], allocater);
            return self.left.add(uri, allocater);
        }
    } else if (uri[0] > self.value) {
        if (self.right) |right| {
            return try right.add(uri, self.allocater);
        } else {
            self.right = try createNode(uri, uri[0], allocater);
            return self.right.add(uri, allocater);
        }
    } else {
        if (uri.len == 1) {
            self.string_end = true;
            return RESULT.ADDED;
        } else {
            if (self.equal) |equal| {
                return equal.add(uri[1..], allocater);
            } else {
                self.equal = try createNode(uri, uri[1], allocater);
                return self.equal.add(uri[1..], allocater);
            }
        }
    }
}

If you stare at it a bit you realize that there is still a lot of repetition here.
Lets take this for example:

if (self.left) |left| {
    return try left.add(uri, self.allocater);
} else {
    self.left = try createNode(uri, uri[0], allocater);
    return self.left.add(uri, allocater);
}

The first try is redundant with the return:

if (self.left) |left| {
    return left.add(uri, self.allocater);
} else {
    self.left = try createNode(uri, uri[0], allocater);
    return self.left.add(uri, allocater);
}

Now if we always had self.left we could pull out the return self.left.add(uri, allocater); from the branches, so a different way to look at it is:

if (self.left) |_| {
} else {
    self.left = try createNode(uri, uri[0], allocater);
}
return self.left.add(uri, allocater);

Or:

if (self.left == null) self.left = try createNode(uri, uri[0], allocater);
return self.left.add(uri, allocater);

Again we can pull this out into a helper function:

fn ensureNode(node:*?*Node, uri:[]const u8, offset:u8, allocater:std.mem.Allocator) !RESULT {
    if (node.* == null) node.* = try createNode(uri, uri[offset], allocater);
    return node.*.?.add(uri[offset..], allocater);
}

And rewrite add:

fn add(self: *Node, uri: []const u8, allocater: std.mem.Allocator) !RESULT {
    if (uri[0] < self.value) {
        return ensureNode(&self.left, uri, 0, allocater);
    } else if (uri[0] > self.value) {
        return ensureNode(&self.right, uri, 0, allocater);
    } else {
        if (uri.len == 1) {
            self.string_end = true;
            return RESULT.ADDED;
        } else {
            return ensureNode(&self.equal, uri, 1, allocater);
        }
    }
}

Use early exit:

fn add(self: *Node, uri: []const u8, allocater: std.mem.Allocator) !RESULT {
    if (uri[0] < self.value) return ensureNode(&self.left, uri, 0, allocater);
    if (uri[0] > self.value) return ensureNode(&self.right, uri, 0, allocater);
    if (uri.len == 1) {
        self.string_end = true;
        return RESULT.ADDED;
    }
    return ensureNode(&self.equal, uri, 1, allocater);
}

Result:

fn search(self: *Node, uri: []const u8) RESULT {
    if (self.search_for_node(uri)) |_| RESULT.FOUND else RESULT.COULD_NOT_BE_FOUND;
}
fn search_for_node(self: *Node, uri: []const u8) ?*Node {
    if (uri[0] < self.value) return if (self.left) |left| left.search_for_node(uri) else null;
    if (uri[0] > self.value) return if (self.right) |right| right.search_for_node(uri) else null;
    if (uri.len == 1) return self;
    return if (self.equal) |equal| equal.search_for_node(uri[1..]) else null;
}

fn createNode(uri: []const u8, val: u8, allocater: std.mem.Allocator) !*Node {
    var new_node = try allocater.create(Node);
    new_node.init(val, allocater);
    new_node.string_end = uri.len > 1 and uri[1] == '/';
    return new_node;
}
fn ensureNode(node: *?*Node, uri: []const u8, offset: u8, allocater: std.mem.Allocator) !RESULT {
    if (node.* == null) node.* = try createNode(uri, uri[offset], allocater);
    return node.*.?.add(uri[offset..], allocater);
}

// performs a search
// HOWEVER, if string is not found will add it to the trie
fn add(self: *Node, uri: []const u8, allocater: std.mem.Allocator) !RESULT {
    if (uri[0] < self.value) return ensureNode(&self.left, uri, 0, allocater);
    if (uri[0] > self.value) return ensureNode(&self.right, uri, 0, allocater);
    if (uri.len == 1) {
        self.string_end = true;
        return RESULT.ADDED;
    }
    return ensureNode(&self.equal, uri, 1, allocater);
}

Unless I made a mistake somewhere, I haven’t changed anything about what the functions return.


One thing about the add function, it seems it can only either fail with a Zig error, or return RESULT.ADDED (even if the result already existed), thus the return value seems unneeded and I would probably change the function to just return !void.

Alternatively you could keep track of whether something actually was added and otherwise return RESULT.FOUND but that would be a bit more work.

I would probably do that by splitting the function into add and addHelper, add would declare a var added = false; and pass a reference &added as argument to addHelper which passes it to ensureNode to set it to true if it creates a new node, ensureNode would then call addHelper instead of add.

2 Likes

Thanks for the effort post. I didn’t anticipate that level of engagement appreciate it.

2 Likes