I’d like to provide some performance data points in the release notes for the upcoming 0.11.x release. Does anyone have a medium-to-large sized project that has a branch that compiles with zig 0.10.x and a branch that compiles with zig 0.11.x so we can see the difference?
Note that simply checking out an old version of your project won’t make for a very interesting comparison, because all the modifications made to the project since then would make it unfair.
Do you have a preferred way that people time their compilations? Are you looking for statistics provided directly by the compiler or some sort of outside source? I think it may be helpful to give us a standard way of doing this so we can provide good data points.
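As a starting point for a shared method, here is a minimal sketch of a wall-clock timing harness. It is only an illustration and not something the compiler provides: the function names `time_command` and `summarize` are made up for this example, and the run count of 5 is an arbitrary assumption.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run `cmd` (a list of arguments) `runs` times.

    Returns the wall-clock duration of each run in seconds.
    Hypothetical helper for illustration only.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken build
        # is never silently counted as a fast one.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

It could be called as `summarize(time_command(["zig", "build-exe", "main.zig"]))`, where `main.zig` is a placeholder for your own entry point. Reporting the minimum and median rather than a single run reduces noise from other processes. If any caching is involved, make sure the cache state is the same for both compiler versions before comparing.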
One comparison that could be quite handy would be using callgrind. The basic usage is like this:
valgrind --tool=callgrind zig build-exe ...
This will dump some profiling data into the cwd, which can be analyzed in a few different ways. I personally enjoy using kcachegrind.
If you do multiple runs, kcachegrind will open them all up and show comparisons. Mainly, it will be interesting to see where most of the CPU instruction count is spent in each version, and how the two compare.
In order for this to work, you will need unstripped release builds of Zig. The binaries provided on the website are stripped. I’m happy to help with that if you need any assistance obtaining such binaries. It should be only a matter of passing -Dstrip=false to zig build.
So, yeah, it feels like something is quadratic and/or undercached in the compiler, but hard to say more without deeper knowledge of how this should work.
I got curious why perf and kcachegrind point to related, but different functions… I think kcachegrind is just confused, as it thinks that abiSizeAdvanced takes 334.94% of total execution time. Without cycle detection, abiSizeAdvanced looks like this in kcachegrind:
Hmm, I think I see the problem. Structs do actually cache their field offsets; however, at some point the LLVM backend stopped using that information and started doing its own calculations, which are repeated every time. So if the Zig source code initializes N fields, this results in O(N^2) calculations.
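To make the O(N^2) claim concrete, here is a toy model in Python. It is not the compiler's code: both function names are hypothetical, and it ignores alignment and padding entirely. It only counts how many additions are needed to obtain every field offset when each offset is recomputed from scratch, versus when a running offset is kept.

```python
def offsets_recomputed(field_sizes):
    """Recompute each field's offset from scratch (toy model, no alignment).

    Returns (offsets, steps), where steps counts the additions performed.
    """
    steps = 0
    offsets = []
    for i in range(len(field_sizes)):
        offset = 0
        # Walk every earlier field again for each field: O(N) per field.
        for size in field_sizes[:i]:
            offset += size
            steps += 1
        offsets.append(offset)
    return offsets, steps


def offsets_cached(field_sizes):
    """Compute all offsets in one pass, reusing the running offset."""
    steps = 0
    offsets = []
    offset = 0
    for size in field_sizes:
        offsets.append(offset)
        offset += size
        steps += 1
    return offsets, steps
```

Both functions return the same offsets, but for N fields the first performs N(N-1)/2 additions while the second performs N. With 1000 fields that is 499500 additions versus 1000.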
I’m working on migrating structs over to InternPool today, so it’s actually perfect timing for me to look into solving this perf regression as well.
If only Performance Tracking ⚡ Zig Programming Language was not bitrotted… I would love to have a tool like this available. Alas, it requires recurring operational maintenance, and I lack the time to keep it up and running.