I am looking for guidance on how I can help get the preserve_none calling convention supported by Zig.
I am building an Ethereum Virtual Machine interpreter named Guillotine in Zig. It’s already the fastest EVM ever built, but there is a way to make it up to 30% faster.
Guillotine: evmts/guillotine on GitHub (2 link limit)
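My EVM uses tail-call recursion: every instruction handler tail-calls the handler for the next instruction.
As far as I understand it, the preserve_none calling convention treats no registers as callee-saved, so each handler can skip saving and restoring registers and more of the interpreter state can stay in registers across tail calls. That can help performance massively.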
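To make the pattern concrete, here is a minimal sketch in C rather than Zig, since Clang already exposes both pieces as attributes (`musttail`, and `preserve_none` in newer releases). The toy opcodes, the `vm_t` and `insn_t` types, and the `VM_USE_PRESERVE_NONE` opt-in macro are assumptions made up for this example, not Guillotine's actual design. In Zig the tail call itself would be written with `@call(.always_tail, ...)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* musttail and preserve_none are Clang extensions; other compilers get plain calls. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#if defined(__clang__) && __has_attribute(musttail)
#define MUSTTAIL __attribute__((musttail))
#else
#define MUSTTAIL
#endif
/* Opt in with -DVM_USE_PRESERVE_NONE (hypothetical macro name for this sketch). */
#if defined(VM_USE_PRESERVE_NONE) && defined(__clang__) && __has_attribute(preserve_none)
#define PRESERVE_NONE __attribute__((preserve_none))
#else
#define PRESERVE_NONE
#endif

enum { STACK_MAX = 16 };

typedef struct vm {
    uint64_t stack[STACK_MAX];
    size_t sp; /* number of live stack slots */
} vm_t;

typedef struct insn insn_t;
/* All handlers share one signature and calling convention, as musttail requires. */
typedef PRESERVE_NONE uint64_t handler_fn(vm_t *vm, const insn_t *ip);

struct insn {
    handler_fn *handler; /* pre-decoded handler for this instruction */
    uint64_t operand;    /* immediate value, used only by op_push */
};

/* Tail-call the handler of the next instruction in the stream. */
#define NEXT(vm, ip) MUSTTAIL return (ip)[1].handler((vm), (ip) + 1)

static PRESERVE_NONE uint64_t op_push(vm_t *vm, const insn_t *ip) {
    assert(vm->sp < STACK_MAX);
    vm->stack[vm->sp++] = ip->operand;
    NEXT(vm, ip);
}

static PRESERVE_NONE uint64_t op_add(vm_t *vm, const insn_t *ip) {
    assert(vm->sp >= 2);
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
    NEXT(vm, ip);
}

static PRESERVE_NONE uint64_t op_mul(vm_t *vm, const insn_t *ip) {
    assert(vm->sp >= 2);
    vm->sp--;
    vm->stack[vm->sp - 1] *= vm->stack[vm->sp];
    NEXT(vm, ip);
}

/* Terminates the chain of tail calls; the top of the stack is the result. */
static PRESERVE_NONE uint64_t op_stop(vm_t *vm, const insn_t *ip) {
    (void)ip;
    return vm->sp ? vm->stack[vm->sp - 1] : 0;
}

/* Entry point: an ordinary function that calls into the first handler. */
static uint64_t run(const insn_t *program) {
    vm_t vm = { .sp = 0 };
    return program[0].handler(&vm, program);
}

/* Sample program computing (2 + 3) * 4. */
static const insn_t sample_program[] = {
    { op_push, 2 }, { op_push, 3 }, { op_add, 0 },
    { op_push, 4 }, { op_mul, 0 },  { op_stop, 0 },
};
```

Calling `run(sample_program)` returns 20. With `musttail` the handlers reuse a single frame, so the stack does not grow with the program length; the convention attribute only changes which registers a handler is obliged to preserve.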
Zig does not support this calling convention.
I actually forked Zig and started trying to add this feature myself, but it’s still a WIP: evmts/zig repo on GitHub (2 link limit)
I’m curious whether there are any workarounds I can use in the meantime to get this calling convention, and whether there is anything I can do to advocate for and/or help get this calling convention supported.
Definitely! I was using a labeled switch in the iteration right before I refactored to tail-call recursion. The tail-call version had better branch prediction and cache efficiency, so it benchmarks faster, and it has the potential to get this nice perf boost with preserve_none.
Another nice-to-have benefit of the tail-call version is that if you look at the stack trace, it matches the opcodes being dispatched 1-to-1, which is quite nice to debug.
The labeled switch version, though, was super clean code. I really liked that version.
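For comparison, switch-based dispatch looks roughly like the sketch below. It is written in C as a loop around a `switch`; Zig's labeled switch expresses the same structure with `continue :label` in place of the enclosing loop. The opcodes are toy values invented for the example, not real EVM encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes for this sketch only. */
enum opcode { OP_PUSH1, OP_ADD, OP_MUL, OP_STOP };

/* Interpret `code` until OP_STOP and return the top of the stack. */
static uint64_t switch_run(const uint8_t *code) {
    uint64_t stack[16];
    size_t sp = 0; /* number of live stack slots */
    size_t pc = 0; /* index of the next byte to decode */
    for (;;) {
        switch ((enum opcode)code[pc++]) {
        case OP_PUSH1: /* a one-byte immediate follows the opcode */
            assert(sp < 16);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_STOP:
            return sp ? stack[sp - 1] : 0;
        default:
            return 0; /* unknown opcode: stop, a simplification for this sketch */
        }
    }
}

/* Sample bytecode computing (2 + 3) * 4. */
static const uint8_t switch_sample[] = {
    OP_PUSH1, 2, OP_PUSH1, 3, OP_ADD, OP_PUSH1, 4, OP_MUL, OP_STOP,
};
```

`switch_run(switch_sample)` returns 20. All interpreter state lives in locals of one function here, whereas the tail-call design passes that state through handler arguments.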
Interesting, is this a recent comparison? It would be good to know whether tail calls still win; maybe there are improvements to switching that could be made. Either way, preserve_none seems like a great addition.
I see no particular reason why we couldn’t add support for those preserve_* calling conventions that LLVM has.
If you aren’t comfortable bringing a patch to completion and PRing it, you can also just open an enhancement issue on ziglang/zig. I might have some spare time to look at it soon.
Yeah, I would need to investigate more to explain why, but my best guess is that there is something about iterating down an array of function pointers that the branch predictor is able to pick up on.
I believe tail calls won’t preserve the stack trace. That’s the purpose of tail calls: reuse the last call frame instead of pushing a new one onto the stack. I believe the only ways to get a list of executed opcodes while debugging are logging them or pushing them into an ArrayList.
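One way to keep such a record is to have each handler append its opcode to a buffer before tail-calling the next one. The sketch below does this in C, with a fixed-size array standing in for an ArrayList. Every name in it (`tvm_t`, `record`, the `T_*` opcodes) is hypothetical, and it assumes the bytecode contains only valid opcodes and ends with `T_HALT`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* musttail is a Clang extension; other compilers get plain calls. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#if defined(__clang__) && __has_attribute(musttail)
#define MUSTTAIL __attribute__((musttail))
#else
#define MUSTTAIL
#endif

enum { T_NOP, T_INC, T_HALT, T_COUNT }; /* toy opcodes */
enum { TRACE_MAX = 64 };

typedef struct tvm {
    uint64_t acc;             /* single accumulator register */
    uint8_t trace[TRACE_MAX]; /* opcodes in execution order */
    size_t trace_len;
} tvm_t;

typedef uint64_t trace_handler_fn(tvm_t *vm, const uint8_t *pc);
static trace_handler_fn t_nop, t_inc, t_halt;

/* Dispatch table indexed by opcode. */
static trace_handler_fn *const trace_table[T_COUNT] = { t_nop, t_inc, t_halt };

/* Append an opcode to the trace; entries are dropped once the buffer is full. */
static void record(tvm_t *vm, uint8_t op) {
    if (vm->trace_len < TRACE_MAX) {
        vm->trace[vm->trace_len++] = op;
    }
}

/* Tail-call the handler for the opcode at `next`. */
#define T_NEXT(vm, next) MUSTTAIL return trace_table[*(next)]((vm), (next))

static uint64_t t_nop(tvm_t *vm, const uint8_t *pc) {
    record(vm, *pc);
    T_NEXT(vm, pc + 1);
}

static uint64_t t_inc(tvm_t *vm, const uint8_t *pc) {
    record(vm, *pc);
    vm->acc++;
    T_NEXT(vm, pc + 1);
}

static uint64_t t_halt(tvm_t *vm, const uint8_t *pc) {
    record(vm, *pc);
    return vm->acc;
}

/* Reset the VM, run `code`, and return the accumulator. */
static uint64_t trace_run(tvm_t *vm, const uint8_t *code) {
    vm->acc = 0;
    vm->trace_len = 0;
    assert(code[0] < T_COUNT);
    return trace_table[code[0]](vm, code);
}

static const uint8_t trace_sample[] = { T_INC, T_NOP, T_INC, T_HALT };
```

After `trace_run(&vm, trace_sample)`, `vm.trace` holds the four executed opcodes in order, which can be printed or inspected in a debugger even though the tail calls left no per-opcode frames behind.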