Performance/Optimization tooling and resources

Zig is the first language I’ve used where performance optimization goes beyond running microbenchmarks or profiling an interpreted language. I’m getting familiar with the different aspects of optimization, such as cache coherency, branching, and SIMD.

However, I’m not familiar with any of the tools for actually measuring these things. How can I measure cache hits vs. cache misses, or how often I’m reading from L1, L2, L3, or main memory? I’m currently developing on an M1 Mac and I’m looking for good resources that are specific to benchmarking on this machine. Most of my searches point to tools like perf or *trace on Linux. I’m also not sure which tools are specific to C/C++ and which apply to any compiled machine code.

My project for testing this out is implementing the deflate compressor/decompressor in Zig, and there are quite a few hot loops that I’d like more insight into.


On macOS you can use DTrace or Xcode Instruments, which builds on top of DTrace.


Implementation-wise, all these tools rely on compiled code and (sometimes) on debug info, so, from their perspective, C++, Rust, and Zig code look more or less identical. So feel free to google “how to profile C++”; the tricks will probably be applicable to Zig (though some amount of manual hammering and duct-taping might be required).
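To make that a bit more concrete, here is a rough sketch of one such trick, written in C only because that is what most profiling guides use. It keeps a hot loop in its own non-inlined function so that a sampling profiler can attribute time to it by name. Adler-32 stands in for a deflate-adjacent hot loop; `run_hot_loop` and its parameters are made up for illustration, and `__attribute__((noinline))` is a GCC/Clang extension. Zig has a comparable `noinline` function modifier, but check the docs for your compiler version.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime below 2^16, as used by Adler-32 */

/* noinline keeps this loop as a separate symbol, so samples taken inside it
   show up under "adler32" instead of being folded into the caller. */
__attribute__((noinline))
uint32_t adler32(const uint8_t *data, size_t len) {
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Hypothetical driver: repeats the hot loop enough times for a sampling
   profiler to collect a useful number of samples. The first byte is changed
   every round so the compiler cannot compute the checksum once and reuse it. */
uint32_t run_hot_loop(uint8_t *data, size_t len, unsigned rounds) {
    uint32_t acc = 0;
    for (unsigned r = 0; r < rounds; r++) {
        if (len > 0) {
            data[0] = (uint8_t)r;
        }
        acc ^= adler32(data, len);
    }
    return acc;
}
```

Call `run_hot_loop` from a small `main` with a large buffer, build with optimizations and debug info (for example `cc -O2 -g`), and run the binary under the profiler. The debug info is what lets the tool map samples back to function names and source lines.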

I have also written a fairly general post about low-level optimizations here: https://matklad.github.io/2023/04/09/can-you-trust-a-compiler-to-optimize-your-code.html It is a bit Rust-specific, but might be helpful nonetheless (well, at least writing it helped me put my mental model of a compiler in order 🙂)
