@setFloatMode not taking effect when using zig build

I am not sure whether this is a bug, so I didn’t want to open an issue right away. Also, when I tried making a minimal example, it DID produce different binaries…

The setup is as follows:

I ported a C program that does a lot of matrix multiplications. When I enable the -ffast-math flag in gcc, the C version runs about 3x as fast. I tried the equivalent in Zig, with disappointing results:

  1. The binaries did not get any faster.
  2. It seemingly does not take effect when using zig build (I have not modified the build script).

Whether I have this at the top or sprinkled in every function, the resulting binaries are all identical.

comptime {

rm -rf zig-cache/ zig-out/
zig build -Doptimize=ReleaseFast
# cp it out
❯ xxhsum no-opt opt
51b77931143e6f82  no-opt
51b77931143e6f82  opt
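
For anyone following along, here is a minimal sketch of what putting @setFloatMode inside a function looks like. The language reference describes the builtin as changing the floating point rules for the current scope, so this sketch places it in the function that does the arithmetic; whether a top-level comptime block reaches other functions is something I would not assume. dotFast and fast_mode are hypothetical names, not code from the project. The enum tag appears to be spelled Optimized in some compiler versions and optimized in others, so the sketch picks whichever one exists.

```zig
const std = @import("std");

// Assumption: the tag is `optimized` on some compiler versions and
// `Optimized` on others; select whichever this compiler defines.
const fast_mode = @field(
    std.builtin.FloatMode,
    if (@hasField(std.builtin.FloatMode, "optimized")) "optimized" else "Optimized",
);

// Hypothetical helper: a dot product with relaxed float rules
// requested for this function's scope only.
fn dotFast(a: []const f32, b: []const f32) f32 {
    @setFloatMode(fast_mode);
    std.debug.assert(a.len == b.len);
    var sum: f32 = 0;
    var i: usize = 0;
    while (i < a.len) : (i += 1) sum += a[i] * b[i];
    return sum;
}

pub fn main() void {
    const a = [_]f32{ 1, 2, 3 };
    const b = [_]f32{ 4, 5, 6 };
    std.debug.print("dot = {d}\n", .{dotFast(&a, &b)});
}
```

Building this once with the @setFloatMode line and once without, then hashing both outputs as above, is one way to check whether the setting reaches the generated code.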

Oddly enough, when I use zig build-exe directly I do get different binaries, but still no speedup.

I tested with godbolt (Compiler Explorer) and can confirm that, at least there, it produces very different code.

Here is the link to the project if you want to try building it yourself: GitHub - cgbur/llama2.zig: Inference Llama 2 in one file of pure Zig

I think this change in code is unrelated to setting the float mode. There is something weird going on.
If I remove the extra lines in the second piece of code, suddenly they both output the same assembly.

Here, look at this godbolt. It’s the same code, except the second one has the extra newlines, and the output is completely different.

Another mystery! I am still trying to understand why it doesn’t affect the binaries in the actual project. The 3x speedup in the C version is no small margin.

Has anyone had success using @setFloatMode in any of their projects?

edit: I’ve overcome all of the performance deficit and more by using @Vector in the matmul. We are now cooking in terms of performance.

| Implementation                                   | Tokens/s |
|--------------------------------------------------|----------|
| llama2.c make run                                | 116      |
| llama2.c make runfast                            | 375      |
| llama2.zig zig build run -Doptimize=ReleaseFast  | 466      |
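
To make the @Vector change from the edit concrete, here is a rough sketch of a matrix-vector product that accumulates each row in SIMD lanes. This is not the code from llama2.zig. The function name, the row-major layout, and the lane count of 8 are all assumptions chosen for illustration.

```zig
const std = @import("std");

// Assumed SIMD width; the best value depends on the target CPU.
const lanes = 8;

// Hypothetical sketch: xout = W * x, with w stored row-major as
// xout.len rows of x.len columns.
fn matmul(xout: []f32, x: []const f32, w: []const f32) void {
    const n = x.len;
    std.debug.assert(w.len == xout.len * n);
    var row: usize = 0;
    while (row < xout.len) : (row += 1) {
        const w_row = w[row * n ..][0..n];
        // One partial sum per lane, starting at zero.
        var acc: @Vector(lanes, f32) = [_]f32{0} ** lanes;
        var i: usize = 0;
        while (i + lanes <= n) : (i += lanes) {
            const wv: @Vector(lanes, f32) = w_row[i..][0..lanes].*;
            const xv: @Vector(lanes, f32) = x[i..][0..lanes].*;
            acc += wv * xv;
        }
        // Fold the lanes together, then handle the leftover elements.
        var sum: f32 = @reduce(.Add, acc);
        while (i < n) : (i += 1) sum += w_row[i] * x[i];
        xout[row] = sum;
    }
}

pub fn main() void {
    const x = [_]f32{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    const w = ([_]f32{1} ** 10) ++ ([_]f32{2} ** 10);
    var out: [2]f32 = undefined;
    matmul(&out, &x, &w);
    std.debug.print("{d} {d}\n", .{ out[0], out[1] });
}
```

Summing per lane and reducing at the end changes the order of the floating point additions compared with a plain scalar loop, so results can differ in the last bits.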