Fast memory and GeneralPurposeAllocator

I think with MultiArrayList you would get most of the benefit without manual allocation or the more complex things I am doing with StencilVectors, something like this:

const Data = struct {
    first_child: u32,
    data: u8,
    count: u8,
};

const Gaddag = struct {
    const NodeData = std.MultiArrayList(Data);
    const NodeIds = std.ArrayListUnmanaged(u32); // id -> index

    nodes: NodeData,
    ids: NodeIds,

    pub fn init() Gaddag { // or use a named empty value for usage with 0.14 decl literals
        return .{
            .nodes = .{},
            .ids = .{},
        };
    }
};

Then create some function to resize a node given its id to a new size.
The old node elements get added to a freelist, the new nodes are gotten from a free list or appended to the MultiArrayList, the id is updated to point to the new index.

The part that is better about StencilVectors is that they allow you to index into the StencilVector directly by using the bitpattern, so there is no need for scanning through the child element to find the correct child, so if you have a whole lot of lookups that might increase the performance quite a bit. (But I donā€™t really know why I would care for scrabble (are you writing a simulator again that runs millions of iterations?))

The idea of the stencilvectors I still have to grasp but the ā€˜freelistā€™ idea I get, I believe.
BTW: For the dutch language I have 353.688 words which leads to a total of 10.881.796 nodes in our gad-dag. For the english list I have nr of nodes is 6.592.251 for 267.751 words.

So first letā€™s check if I understand the freelist ideaā€¦
Nodes are stored simply in a list.
When the child nodes have to grow

  1. A ā€˜re-usable holeā€™ arises inside our buffer. We keep track of these re-usable holes in some dictionary or hash for later use (location + available length). Probably fill these holes with zeroes for clarity.
  2. If we find a re-usable hole of the correct length we copy the children there and otherwise copy the new-length-children to the end of the buffer.

I am not making some simulator now :slight_smile: I just want the algorithm to be as fast as possible for any scrabbleboard (15x15) for any rack of letters (7) for every language I support.

I donā€™t know what you are referring to, do you mean the children slice?
Yes that one only is 16 bytes for nothing, even if the array is of zero length.

Later on - but not now - I will enhance the whole thing for unicode, any set of characters, any size of board, any size of rack, up until reasonable limits.
This whole thing was born from my rant about ā€œthe unicode messā€. To practise unicode. Very soon I got into performance problems, but as said that is for later.

1 Like

Ok I think I figured it out. Insanely fast. I will update github and send link when ready.

1 Like

git updated. speed is quite ok. loading time is around 400 ms.
scrabble/src/gaddag.zig at main Ā· ericlangedijk/scrabble Ā· GitHub.

@Sve is this about the freelist strategy you described?

Yes the freelists are used like what I meant.

I think I would use self.freelists[index] directly most of the time or maybe sometimes const fl = &self.freelists[index], I donā€™t think the get_freelist function is really useful.

I think you overuse range for loops, I would change the loops that operate on freelists to iterate over the slices, so instead of:

for (0..32) |i|
{
    self.freelists[i].deinit();
}

this:

for (&self.freelists) |*f| f.deinit();

instead of:

for(1..32) |i|
{
    const freelist = self.get_freelist(@intCast(i));
    self.wasted += @truncate(freelist.items.len * i);
    for (freelist.items) |idx|
    {
        @memset(self.nodes.items[idx..idx + i], Node.EMPTYNODE);
    }
}

for (0..32) |i|
{
    var freelist = self.get_freelist(@intCast(i));
    freelist.clearAndFree();
}

I donā€™t think you need to special case the zero freelist, I think it should result in a no-op?:

for(&self.freelists, 0..) |*f, i| {
    self.wasted += @truncate(f.items.len * i);
    for (f.items) |idx| @memset(self.nodes.items[idx..][0..i], Node.EMPTYNODE);
    f.clearAndFree();
}

I also used re-slicing instead of the [idx..idx + i] arithmetic, I just like it better, I think it is mostly the same to the compiler.

You are a genius. And right: the code is much shorter.
Reslicing? Where? (ah you mean for(&self.freelists, 0..)
(btw I read your post with these iterators. interesting)
And: Thank you very much! You put me on the path I needed!!

With reslicing I mean using self.nodes.items[idx..][0..i] instead of self.nodes.items[idx..idx + i] , I like that with the former you donā€™t have to mention idx twice.

Thank you for the interesting conversations!

1 Like

And I reduced memory use from 800 MB downto 75.

2 Likes