Zlox - Crafting Interpreters in Zig

I’ve been working through Crafting Interpreters but in Zig. This is my work so far: https://github.com/Southporter/zlox
I only have a the last 3 chapters left, and it has been a blast. I’ve had to make a few changes to match Zig idioms coming from the C: explicit allocators, utilizing slices, etc.

I used the GC chapter as a chance to explore the Allocator interface. I set up my GC to return an allocator that the different objects use to track how much is allocated.

Comptime instead of macros was a great change too. It was nice to just write normal zig to do things like turn an Object into the appropriate struct.
Anyway, thought I would share what i’ve been working on.

10 Likes

Loved the book, but the repo gives 404. I’d really love to see your work…

Oops, looks like I initially made the repo Private. I’ve fixed it, the link should work now.

1 Like

yeah, it works now

Cool! This book is really useful

I got through the rest of the chapters, so this is now “done” in terms of following the book.

I was a little surprised by my results after doing the “Optimizations” chapter: it looks like NaN tagging doesn’t really add a speed up for Zig. It’s possible I’m doing it wrong. I’ll need to add instrumentation to get some profiling.

Here are some of my benchmarks:
zlox-original is -Doptimize=ReleaseFast based off master commit 957bfa61b4c4f76f9e57564e1105334bf480f527 (i.e. End of Chapter 29)
zig-out/bin/zlox is -Doptimize=ReleaseFast -DnanTagging=true based off the chapter-30 branch.

Benchmark 1: ./zlox-original benchmarks/zoo_sum.lox
  Time (mean ± σ):     11.469 s ±  0.315 s    [User: 11.383 s, System: 0.004 s]
  Range (min … max):   11.102 s … 12.198 s    10 runs

Benchmark 2: zig-out/bin/zlox benchmarks/zoo_sum.lox
  Time (mean ± σ):     11.226 s ±  0.274 s    [User: 11.138 s, System: 0.003 s]
  Range (min … max):   10.927 s … 11.796 s    10 runs

Summary
  zig-out/bin/zlox benchmarks/zoo_sum.lox ran
    1.02 ± 0.04 times faster than ./zlox-original benchmarks/zoo_sum.lox

And here is the result from a 13th gen Intel i5

Benchmark 1 (3 runs): ./zlox-original benchmarks/zoo_sum.lox
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          14.0s  ±  159ms    13.8s  … 14.1s           0 ( 0%)        0%
  peak_rss            887KB ± 2.36KB     885KB …  889KB          0 ( 0%)        0%
  cpu_cycles         29.7G  ±  118M     29.6G  … 29.8G           0 ( 0%)        0%
  instructions       88.3G  ±  390M     87.9G  … 88.5G           0 ( 0%)        0%
  cache_references   65.8K  ± 22.3K     47.6K  … 90.7K           0 ( 0%)        0%
  cache_misses       40.1K  ± 15.3K     27.2K  … 57.1K           0 ( 0%)        0%
  branch_misses      23.2M  ±  546K     22.7M  … 23.8M           0 ( 0%)        0%
Benchmark 2 (3 runs): zig-out/bin/zlox benchmarks/zoo_sum.lox
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          13.6s  ±  182ms    13.4s  … 13.8s           0 ( 0%)          -  2.3% ±  2.8%
  peak_rss            918KB ±    0       918KB …  918KB          0 ( 0%)        💩+  3.4% ±  0.4%
  cpu_cycles         30.3G  ± 17.8M     30.3G  … 30.3G           0 ( 0%)        💩+  2.0% ±  0.6%
  instructions        122G  ± 24.0M      122G  …  122G           0 ( 0%)        💩+ 38.6% ±  0.7%
  cache_references   18.1K  ± 6.28K     10.9K  … 22.3K           0 ( 0%)        ⚡- 72.5% ± 56.6%
  cache_misses       14.5K  ± 5.05K     8.66K  … 17.6K           0 ( 0%)          - 63.9% ± 64.5%
  branch_misses      23.8M  ±  290K     23.6M  … 24.2M           0 ( 0%)          +  2.8% ±  4.3%

Do your benchmarks actually contain setups that generate NaN values?
Else I wouldn’t find it surprising if they aren’t affected by it.

I took a short look and noticed that your non-nan code uses a switch statement, while your nan code uses a series of predicates in if-else chains, the former seems like a more direct expression of what the possible branches are, with the latter the compiler has to recover the information that all of those branches are mutually exclusive and could be turned into something switch like.

But measuring and looking at the compiled output would probably the way to go.

One guess from me would be that using a bit mask to select the tag and then switching on that might be better, but that is just something I would try.

Using packed structs also might make sense, but I haven’t looked at the code in-depth enough to know whether that would fit.

Very nice. I just bought “ writing a c compiler” and im planning on doing that in zig as well.

1 Like