Wrong compiler optimizations

After benchmarking my database, I focused on how the Zig compiler optimizes things. One persistent issue that hurts performance is poor function inlining decisions; by experimenting with inline and noinline, I went from 1200 ns to 375 ns on a hot path.

This is because overly aggressive inlining makes the stack frame balloon and introduces too many alloca instructions (in LLVM IR), increasing register pressure: “hot” data gets pushed out of registers and onto the stack (served from cache at best), and every reload adds cost.
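To make the mechanism concrete, here is a minimal sketch of the kind of change involved. It is not the actual database code: slowPath, lookup, and lookupNeverInline are hypothetical names, and the 4096-byte buffer is an arbitrary size chosen for illustration. It assumes a reasonably recent Zig where @call takes an enum-literal modifier.

```zig
const std = @import("std");

// Hypothetical cold path with a large local buffer. `noinline` asks the
// compiler not to merge this function (and its buffer) into the caller.
noinline fn slowPath(key: u64) u64 {
    var scratch: [4096]u8 = undefined;
    @memset(&scratch, @as(u8, @truncate(key)));
    var sum: u64 = 0;
    for (scratch) |byte| sum += byte;
    return sum;
}

// Hypothetical hot path: the common case only touches a few values.
fn lookup(key: u64) u64 {
    if (key % 2 == 0) return key *% 31;
    return slowPath(key);
}

// Call-site alternative: forbid inlining for one specific call only.
fn lookupNeverInline(key: u64) u64 {
    return @call(.never_inline, lookup, .{key});
}

test "hot and cold paths return the expected values" {
    try std.testing.expectEqual(@as(u64, 62), lookup(2));
    try std.testing.expectEqual(@as(u64, 4096 * 3), lookup(3));
    try std.testing.expectEqual(@as(u64, 62), lookupNeverInline(2));
}
```

Whether this helps in a given program has to be measured; the point is only that noinline on the function and @call(.never_inline, ...) at the call site are the two knobs available for keeping a cold path out of a hot function's frame.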

I also noticed that this isn’t just my problem: it was discussed by a TigerBeetle developer who tried to find these compiler flaws using a built-in tool called copyhound.zig. The tool not only estimates how large an optimized function becomes (to understand when to split it), but also identifies unnecessary copies (memcpy) introduced by the compiler.

That said, I find it questionable that the documentation states:

“It is generally better to let the compiler decide when to inline a function”

Because yes, the compiler decides, but not always well. It’s unclear how much this depends on LLVM heuristics.

Another example that calls the optimizations into question is fillUnbuffered in std.Io.Reader: a reader created with .fixed() never needs fillUnbuffered, yet if you analyze the binary, it still appears multiple times.

In addition, analyzing the LLVM IR of Zig code that uses std.Io shows that the vtable functions are still emitted even when unused, even with -O ReleaseFast or -O ReleaseSmall.

I know that gcc and clang, with decades of tuning, have optimized all of this. What do you think? Am I wrong, or is Zig still not mature enough in this field?

5 Likes

I’m pretty sure Zig’s performance is on par with clang. Long ago, when I was deciding whether to get into Zig, I looked at some benchmarks and assembly, and they were on par. But Zig has experimented with its LLVM IR since then, so I don’t know if that still holds.

Well, one possible interpretation for this quote is: most people won’t understand enough about register pressure to outdo the compiler.

Just to clarify, fillUnbuffered was not inlined in this case? Maybe this is one case where inlining it could remove a whole bunch of dead code.

#31421
Just going to leave this here, once again: my support for removing runtime interfaces and going back to comptime interfaces, like the excellent old Reader.
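For readers unfamiliar with the distinction, here is a rough sketch of the two styles. It does not use std.Io; Source, FixedSource, sumRuntime, and sumComptime are hypothetical names made up for this illustration. The runtime version calls through a function pointer in a vtable, while the comptime version is instantiated per concrete type, so the compiler sees the exact callee.

```zig
const std = @import("std");

// Hypothetical runtime interface: every call goes through a vtable pointer.
const Source = struct {
    ctx: *anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        next: *const fn (ctx: *anyopaque) ?u8,
    };

    fn next(self: Source) ?u8 {
        return self.vtable.next(self.ctx);
    }
};

// Hypothetical concrete implementation backed by a fixed byte slice.
const FixedSource = struct {
    bytes: []const u8,
    pos: usize = 0,

    fn next(self: *FixedSource) ?u8 {
        if (self.pos >= self.bytes.len) return null;
        defer self.pos += 1;
        return self.bytes[self.pos];
    }

    // Type-erased adapter stored in the vtable.
    fn erasedNext(ctx: *anyopaque) ?u8 {
        const self: *FixedSource = @ptrCast(@alignCast(ctx));
        return self.next();
    }

    const vtable: Source.VTable = .{ .next = erasedNext };

    fn source(self: *FixedSource) Source {
        return .{ .ctx = self, .vtable = &vtable };
    }
};

// Runtime dispatch: the callee is only known through the function pointer.
fn sumRuntime(src: Source) u64 {
    var total: u64 = 0;
    while (src.next()) |b| total += b;
    return total;
}

// Comptime dispatch: one instantiation per concrete type passed in.
fn sumComptime(src: anytype) u64 {
    var total: u64 = 0;
    while (src.next()) |b| total += b;
    return total;
}

test "both styles produce the same result" {
    var a = FixedSource{ .bytes = "abc" };
    var b = FixedSource{ .bytes = "abc" };
    try std.testing.expectEqual(@as(u64, 294), sumRuntime(a.source()));
    try std.testing.expectEqual(@as(u64, 294), sumComptime(&b));
}
```

The trade-off is the usual one: the comptime version gives the optimizer full visibility at the price of one copy of the function per type, while the runtime version has a single copy but relies on the optimizer to devirtualize the call.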

3 Likes

The lack of devirtualization of Io is a well-known issue that there are plans to address.

Zig 1.0 is a long way away, so it’s a mistake to draw conclusions from a single point in time. If there is an impactful missed optimization opportunity, then opening a new issue for each one, with a reduced test case, is appropriate.

2 Likes