I have this recursive move generator function. The max depth is between 40 and 45.
```zig
fn gen(
    self: *MoveGenerator, // mutable self
    board: *const Board, // immutable board
    square: Square, // 1 byte -> new square will be passed on
    node: Node, // 10 bytes -> new node will be passed on
    rack: Rack, // 8 bytes -> new rack will be passed on
    move: Move, // 24 bytes -> new move will be passed on
    path: Path, // 32 bytes -> new path will be passed on
    comptime inputflags: GenFlags, // u2
    comptime ori: Orientation, // 2 distinct values
    comptime dir: Direction, // 2 distinct values
) void
```
and was wondering what would be the best strategy.
1. Keep as is.
2. Put the mutable vars in a struct.
3. Put the mutable vars in a struct and maintain my own list (max 42 deep) on the stack, passing a depth int plus a pointer.
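For concreteness, options 2 and 3 might be sketched like this (the placeholder types and names are hypothetical, only the sizes come from the signature above):

```zig
// Placeholder types standing in for the real ones (sizes from the post).
const Square = u8;
const Node = [10]u8;
const Rack = u64;
const Move = [24]u8;
const Path = [32]u8;

// The five "passed on" values bundled into one struct.
const GenState = struct {
    square: Square,
    node: Node,
    rack: Rack,
    move: Move,
    path: Path,
};

// Option 2: one struct parameter (by value or *const) instead of five.
fn gen2(state: *const GenState) void {
    _ = state;
}

// Option 3: an explicit stack (max 42 deep) plus a depth index.
fn gen3(stack: *[42]GenState, depth: u8) void {
    _ = stack;
    _ = depth;
}
```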
The thing is: the "mutables" to pass on will not always be changed; often they are just passed on unchanged.
I have the suspicion that the compiler will outsmart me and tend to choose option 1.
Solutions 1 and 2 should be basically the same; with 2 the data might be a bit smaller, because Zig is allowed to reorder struct members, though maybe it does the same thing for function arguments too?
Option 3 seems worse to me. It needs more memory on the stack, since you end up passing an additional 16 bytes to the function. And you are making it harder to read, both for yourself and for the compiler.
If possible, the best strategy for recursion is generally to not use recursion at all. Recursion has inherent overheads, since it needs to duplicate constant function arguments for every call.
But it is rather difficult to go from recursive to iterative functions, since you basically end up simulating your own stack and return addresses. Furthermore the gains get smaller the more expensive your function body is. So in practice it is often just not worth the effort. Judging by your function signature it likely falls into that category.
I think you should spend your energy instead on optimizing the function body itself, instead of focusing on the recursive function call here.
This seems overly proscriptive to me. It's true that you don't want to borrow a page from functional programming and write "iteration is recursion" code in Zig.
But compilers can be pretty smart about recursion: small constants (for some value of "small") can be kept in registers, and larger struct parameters can be passed by *const, which I see as best practice now given the uncertain status of Parameter Reference Optimization.
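A minimal sketch of the *const pattern (the struct here is invented for illustration):

```zig
// A struct well above pointer size, so by-value passing may copy it.
const Big = struct {
    data: [32]u8,
};

// By value: the ABI may copy all 32 bytes at every call site.
fn sumByValue(b: Big) u32 {
    var s: u32 = 0;
    for (b.data) |x| s += x;
    return s;
}

// By *const: only a pointer is passed; the callee promises not to mutate.
fn sumByConstPtr(b: *const Big) u32 {
    var s: u32 = 0;
    for (b.data) |x| s += x;
    return s;
}
```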
The additional headache of recursion is ensuring that recursive calls will be reasonably bounded. That's stricter than making sure that while loops have a termination condition in all cases, because one might have limited control over how deep in the stack a function call is triggered. There's a hard maximum of 8 MiB of headroom (or whatever the platform number is), and the actual amount available can be much smaller.
The closer to the tail of the function that the recursion happens, the more room the compiler has to minimize stack use, since anything which is never referenced after the recursive call can be recycled.
Zig also has tail call elimination, which can be forced with @call and .always_tail, so you can actually do the "iteration is recursion" pattern if it makes sense (and ensure that you actually get a tail call).
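For example (a toy function, not from the OP's code), a forced tail call looks like this:

```zig
fn countdown(n: u32, acc: u32) u32 {
    if (n == 0) return acc;
    // .always_tail asks the backend to lower this as a tail call;
    // compilation fails if it cannot.
    return @call(.always_tail, countdown, .{ n - 1, acc + n });
}
```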
I agree with all of your specific advice to the OP, actually; I'm more speaking to the help topic itself.
I noticed a bit of a speedup after making parameters *const for structs larger than 8 bytes.
I think the famous chess engine Stockfish minimizes recursion and keeps its own stack, with an abundant pointer festival.
It would be interesting to see what would happen if I rewrote the 3 recursive functions in here into one big monolithic non-recursive function, as an experiment.
It's not the first step you'd want to take, no. But it can make a major difference for certain kinds of code.
If you can afford to optimistically mark the opp as processed, then this call can be moved to the tail position, and you can at least try to force the issue with @call and .always_tail.
For LLVM / Zig, whether a call can be a tail call is more complex than it is in Lua or Scheme, so that may not actually work. Specifically, it can have problems with mutually recursive cycles. And fair warning: the error you'll get if .always_tail isn't possible is fairly cryptic; it comes straight from LLVM and has the term musttail in it.
But that's the kind of thing you can look for.
I can't claim to understand the code in detail, but my intuition is that a Scrabble board has a finite size, meaning that the recursion here will be bounded, with an upper limit that doesn't grow with the size of the data. So I wouldn't worry too much about using recursion. That said, recursion does prevent some optimizations, notably inlining, so it's worth looking for opportunities to reduce it, and possibly getting rid of it entirely.
You definitely want to pass structs by *const in a recursive web like this. If it were C, I would advise doing what you can to make the prefix of function calls identical, and that's probably a good idea here as well. The optimizer will still look for opportunities to reorganize things to minimize the data movement needed to set up the prelude to a function call, but this is a permutative operation, so it's very expensive and optimizers give up fairly fast on it. Since recursion prevents inlining, whatever calling convention the optimizer comes up with for a function has to work at every call site; it doesn't want to generate several versions of the function which differ only by the placement of data. So any help you can give it improves your odds.
The recursiveness is a blessing I must say. But I will certainly try it as an exercise. And limited to a depth of 30 on a 15x15 board.
I will experiment with the @call.
In theory the algo is very easy: follow the squares using either a board letter or a try letter, keeping track of the letter node in the "gaddag" character tree.
It still needs 200 lines of code in my case.
Which is 6 or 8 times less than my old Delphi code, by the way.
Currently it produces around 3,000,000 moves per second, which is quite good, I think. Not sure yet whether there are many speedups that can be achieved without deep-diving into SIMD or higher math.
I think this Scrabble algorithm can be seen as a special case of what is called WaveFunctionCollapse, which is basically a set of rules for how things can be connected (in the Scrabble case with complex constraints) followed by generating and placing new tiles. The special casing is likely easier to optimize, but the more general algorithm could give additional ideas for techniques worth exploring.
I just succeeded in fixing that and even made the algorithm simpler. As a bonus, I managed to successfully survive the Unicode hell, so my Slovenian works now.
It is very difficult to get good lists of words from the internet, BTW. A lot of them are garbage.