As a sequel to this link which got a bit out of hand:
memory and trees
I created a quite well performing words graph, where each letter contains children which can follow that letter. So a letternode can have 32 max subchildren.
pub fn find_child_node(self: *Graph, node: Node, charcode: CharCode) ?Node
{
if (node.count == 0) return null;
const children = self.nodes.items[node.child_ptr..node.child_ptr + node.count];
for(children) |run| if (run.data.code == charcode) return run;
return null;
}
In the past and now again I came to three conclusions, which I think are still valid:
- Sorting the children does not speed up. A binary search is much slower for only 32 items.
- Even one little check inside the loop
if (run.data.code > charcode) return null
slows the search process down. - Adding a 32 bit bitmask for pre-checking an existing child does not speed up either!
Obviously (also in my old delphi code) a simple loop without branching is insanely fast.
BTW: I do not use Node pointers because the Nodes are very small.
One question I have: is the zig compiler able to SIMD these kind of loops?
If not would some crazy SIMD instruction speed things up or not?
Something like:
SIMD.gather_all_charcodes_from_the_addressed_slice(node.count);
and then:
SIMD.is_any_charcode_the_same_as_input_charcode()
and then:
return the found node (which is in the chunk of memory)
Currently building the scabble movegenerator, which is already quite promising.
Calculating 100.000 moves, including calculating the score, in 1/100 of a second.
Another question: is there any performance difference in these parameters *Graph
or *const Graph
?
As a bonus for clarity: this is my Node.
/// 6 byte node. or 10 bytes with mask
pub const Node = extern struct
{
pub const EMPTYNODE: Node = Node.init(0);
/// When cleaning up the graph, we fill the unused nodes (from the remaining freelists) with invalid nodes.
pub const INVALIDNODE: Node = Node.invalidnode();
pub const Data = packed struct
{
/// The 5 bits character code
code: CharCode = 0,
/// Is this node the beginning of a word?
is_bow: bool = false,
/// Is this node the end of a word?
is_eow: bool = false,
/// Is this node a whole word?
is_whole_word: bool = false,
};
/// Our 1 byte node data
data: Data align(1),
/// maximum count is 32
count: u8 align(1),
/// Index into Graph.nodes. Children are always sequential in memory.
child_ptr: u32 align(1),
// The charcode mask of children (not yet decided if we need that, probably not)
//mask: CharCodeMask align(1),
fn init(charcode: CharCode) Node
{
return Node {.data = Data {.code = charcode }, .count = 0, .child_ptr = 0 };
}
fn invalidnode() Node
{
return Node
{
.data = Data {},
.count = 0,
.child_ptr = 0xffffffff,
};
}
};