I’ve been working through Crafting Interpreters but in Zig. This is my work so far: https://github.com/Southporter/zlox
I only have the last 3 chapters left, and it has been a blast. I’ve had to make a few changes to match Zig idioms coming from C: explicit allocators, utilizing slices, etc.
I used the GC chapter as a chance to explore the Allocator interface. I set up my GC to return an allocator that the different objects use to track how much is allocated.
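For anyone curious what the byte tracking can look like, here is a minimal sketch. It is a simplification and not the code in the repo: instead of implementing the `std.mem.Allocator` vtable (whose function signatures differ between Zig releases), it wraps a backing allocator behind two helper methods. The names `Gc`, `backing`, `bytes_allocated`, `next_gc`, and `collect` are made up for illustration, and the collection step is only a placeholder.

```zig
const std = @import("std");

// Hypothetical GC wrapper: counts the bytes handed out through it.
const Gc = struct {
    backing: std.mem.Allocator,
    bytes_allocated: usize = 0,
    next_gc: usize = 1024 * 1024,

    // Allocate one T from the backing allocator and record its size.
    fn create(self: *Gc, comptime T: type) !*T {
        const ptr = try self.backing.create(T);
        self.bytes_allocated += @sizeOf(T);
        if (self.bytes_allocated > self.next_gc) self.collect();
        return ptr;
    }

    // Free one object and subtract its size from the running total.
    fn destroy(self: *Gc, ptr: anytype) void {
        self.bytes_allocated -= @sizeOf(@TypeOf(ptr.*));
        self.backing.destroy(ptr);
    }

    // Placeholder: a real collector would mark and sweep here.
    // This sketch only raises the threshold for the next collection.
    fn collect(self: *Gc) void {
        self.next_gc = self.bytes_allocated * 2;
    }
};
```

A real allocator implementation would do the same bookkeeping inside its alloc, resize, and free callbacks, so that every container using it gets counted too.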
Comptime instead of macros was a great change too. It was nice to just write normal Zig to do things like turn an Object into the appropriate struct.
Anyway, thought I would share what I’ve been working on.
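As a rough illustration of that kind of conversion (not the repo's actual code), the sketch below replaces the book's cast macros with one generic function. It assumes Zig 0.12 or newer, where `@fieldParentPtr` takes the field name and the field pointer. The types `Obj`, `ObjString`, `ObjFunction` and the helper `as` are hypothetical names.

```zig
const std = @import("std");

const ObjKind = enum { string, function };

// Common header embedded in every heap object.
const Obj = struct { kind: ObjKind };

const ObjString = struct {
    pub const kind = ObjKind.string;
    obj: Obj = .{ .kind = kind },
    chars: []const u8,
};

const ObjFunction = struct {
    pub const kind = ObjKind.function;
    obj: Obj = .{ .kind = kind },
    arity: u8,
};

// Downcast a header pointer to the concrete struct T.
// T is a comptime parameter, so the tag to compare against and the
// field offset are both resolved at compile time.
fn as(obj: *Obj, comptime T: type) ?*T {
    if (obj.kind != T.kind) return null;
    const parent: *T = @alignCast(@fieldParentPtr("obj", obj));
    return parent;
}
```

Usage would look like `if (as(obj, ObjString)) |s| { ... }`, which checks the tag once and then hands back a typed pointer.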
I got through the rest of the chapters, so this is now “done” in terms of following the book.
I was a little surprised by my results after doing the “Optimizations” chapter: it looks like NaN tagging doesn’t really add a speedup for Zig. It’s possible I’m doing it wrong. I’ll need to add instrumentation to get some profiling.
Here are some of my benchmarks:
zlox-original is -Doptimize=ReleaseFast based off master commit 957bfa61b4c4f76f9e57564e1105334bf480f527 (i.e. End of Chapter 29)
zig-out/bin/zlox is -Doptimize=ReleaseFast -DnanTagging=true based off the chapter-30 branch.
Benchmark 1: ./zlox-original benchmarks/zoo_sum.lox
Time (mean ± σ): 11.469 s ± 0.315 s [User: 11.383 s, System: 0.004 s]
Range (min … max): 11.102 s … 12.198 s 10 runs
Benchmark 2: zig-out/bin/zlox benchmarks/zoo_sum.lox
Time (mean ± σ): 11.226 s ± 0.274 s [User: 11.138 s, System: 0.003 s]
Range (min … max): 10.927 s … 11.796 s 10 runs
Summary
zig-out/bin/zlox benchmarks/zoo_sum.lox ran
1.02 ± 0.04 times faster than ./zlox-original benchmarks/zoo_sum.lox
Do your benchmarks actually contain setups that generate NaN values?
Otherwise I wouldn’t find it surprising if they aren’t affected by it.
I took a short look and noticed that your non-NaN code uses a switch statement, while your NaN code uses a series of predicates in if-else chains. The former seems like a more direct expression of what the possible branches are; with the latter, the compiler has to recover the information that all of those branches are mutually exclusive and could be turned into something switch-like.
But measuring and looking at the compiled output would probably be the way to go.
One guess from me would be that using a bit mask to select the tag and then switching on that might be better, but that is just something I would try.
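To make the mask-then-switch idea concrete, here is a rough sketch. It assumes the book's encoding (quiet-NaN mask `0x7ffc000000000000`, sign bit set for objects, low two bits 1, 2, 3 for nil, false, true) and Zig 0.11 or newer for the single-argument `@bitCast` and `@truncate`. `Kind`, `kindOf`, and `fromNumber` are made-up names, not taken from zlox, and I have not measured whether this is actually faster.

```zig
const std = @import("std");

const QNAN: u64 = 0x7ffc000000000000;
const SIGN_BIT: u64 = 0x8000000000000000;

const Kind = enum { number, nil, boolean, object };

// Reinterpret a double as its raw 64-bit pattern.
fn fromNumber(x: f64) u64 {
    return @bitCast(x);
}

fn kindOf(bits: u64) Kind {
    // Anything that does not have all quiet-NaN bits set is a plain double.
    if ((bits & QNAN) != QNAN) return .number;
    // Quiet NaN plus the sign bit marks an object pointer.
    if ((bits & SIGN_BIT) != 0) return .object;
    // Otherwise the low two bits are the singleton tag; switch on them.
    const tag: u2 = @truncate(bits);
    return switch (tag) {
        0 => unreachable, // never produced under the assumed encoding
        1 => .nil,
        2, 3 => .boolean,
    };
}
```

The switch is exhaustive over the two-bit tag, so the compiler sees directly that the cases are mutually exclusive.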
Using packed structs also might make sense, but I haven’t looked at the code in-depth enough to know whether that would fit.
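A sketch of what the packed struct variant might look like, again assuming the book's bit layout and a Zig version that supports `packed struct(u64)` and single-argument `@bitCast` (0.11 or newer). The field names and the helpers `fromBits`, `isNumber`, and `isObject` are invented for illustration, and whether this produces better machine code than manual masking is untested.

```zig
const std = @import("std");

// Fields are laid out starting from the least significant bit.
const Boxed = packed struct(u64) {
    payload: u50, // bits 0..49: singleton tag or pointer bits
    quiet_nan: u13, // bits 50..62: all ones for a boxed value
    sign: u1, // bit 63: set for object pointers
};

fn fromBits(bits: u64) Boxed {
    return @bitCast(bits);
}

fn isNumber(v: Boxed) bool {
    return v.quiet_nan != 0x1fff;
}

fn isObject(v: Boxed) bool {
    return v.quiet_nan == 0x1fff and v.sign == 1;
}
```

The field accesses compile down to shifts and masks, so this mostly buys readability: the layout is written down once in the type instead of being spread across mask constants.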