Why is Zig much slower than other LLVM-based compilers? (includes benchmark)

The Zig compiler seems much slower than other LLVM-based compilers like rustc or clang.

The benchmark below compiles a minimal binary in debug mode.
For Zig, almost all of the time is spent in the LLVM Emit Object phase.
How does Zig use LLVM differently so that it’s 8x slower than Clang and 4.5x slower than rustc?

echo 'pub fn main() void {}' > main.zig
echo 'fn main() {}' > main.rs
echo 'int main() { return 0; }' > main.cc

hyperfine --shell=none --export-markdown out.md \
  'zig build-exe main.zig' \
  'rustc main.rs' \
  'clang++ main.cc -o main'

zig version # => 0.13.0
rustc --version # => rustc 1.78.0
clang++ --version # => clang version 17.0.6

Results

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `clang++ main.cc -o main` | 0.157 ± 0.008 | 0.142 | 0.171 | 1.00 |
| `rustc main.rs` | 0.282 ± 0.041 | 0.243 | 0.381 | 1.79 ± 0.28 |
| `zig build-exe main.zig` | 1.269 ± 0.080 | 1.127 | 1.360 | 8.06 ± 0.66 |

Are you sure what you’re building is comparable? You may be building glibc in the Zig build, whereas the other two builds only link against it.

Assuming you’re on Linux, if you look at the LLVM module that is being compiled it will be evident. Rust and Clang emit only the main function and then rely on a precompiled libc for everything, whereas Zig is producing a static executable that has compiled all the parts of the standard library that are depended on. In particular, debug builds almost always depend on the ability to print a stack trace, which ends up being a fair amount of code.
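
One way to look at the LLVM modules yourself is to have each compiler write out textual LLVM IR and compare the sizes. This is only a sketch: the output file names (zig.ll, rust.ll, clang.ll) are my own choice, it assumes the Zig build goes through the LLVM backend (the question's timings suggest it does on 0.13.0), and any compiler that is not installed is skipped.

```shell
# Recreate the three minimal sources from the question.
echo 'pub fn main() void {}' > main.zig
echo 'fn main() {}' > main.rs
echo 'int main() { return 0; }' > main.cc

# Ask each compiler for textual LLVM IR; skip the ones that are missing.
if command -v zig >/dev/null 2>&1; then
  zig build-exe main.zig -femit-llvm-ir=zig.ll
fi
if command -v rustc >/dev/null 2>&1; then
  rustc --emit=llvm-ir -o rust.ll main.rs
fi
if command -v clang++ >/dev/null 2>&1; then
  clang++ -S -emit-llvm main.cc -o clang.ll
fi

# Line counts give a rough idea of how much code each module contains.
for f in zig.ll rust.ll clang.ll; do
  if [ -f "$f" ]; then
    wc -l "$f"
  fi
done
```

If the explanation above holds, zig.ll should be far larger than the other two, because it also contains the standard library code the debug build depends on.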

You can get the three compilers to do a similar amount of work by having them emit object files instead:

Benchmark 1 (242 runs): clang -c main.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          20.6ms ± 1.51ms    18.8ms … 25.6ms          6 ( 2%)        0%
  peak_rss           96.6MB ±  193KB    95.7MB … 96.7MB          5 ( 2%)        0%
  cpu_cycles         50.4M  ± 1.92M     46.4M  … 56.3M           2 ( 1%)        0%
  instructions       76.2M  ± 32.5K     76.1M  … 76.3M           8 ( 3%)        0%
  cache_references   2.45M  ± 18.3K     2.42M  … 2.52M           9 ( 4%)        0%
  cache_misses        425K  ± 3.03K      408K  …  441K           9 ( 4%)        0%
  branch_misses       347K  ± 2.65K      342K  …  359K           3 ( 1%)        0%
Benchmark 2 (217 runs): rustc --emit=obj main.rs
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          23.0ms ± 1.09ms    21.1ms … 25.1ms          0 ( 0%)        💩+ 11.9% ±  1.2%
  peak_rss            120MB ±  298KB     119MB …  121MB          0 ( 0%)        💩+ 24.1% ±  0.0%
  cpu_cycles         52.3M  ± 3.26M     45.2M  … 58.9M           0 ( 0%)        💩+  3.8% ±  1.0%
  instructions       66.9M  ± 20.3K     66.9M  … 67.0M          10 ( 5%)        ⚡- 12.1% ±  0.0%
  cache_references   2.94M  ± 14.1K     2.91M  … 3.00M           4 ( 2%)        💩+ 20.1% ±  0.1%
  cache_misses        694K  ± 6.06K      681K  …  718K           4 ( 2%)        💩+ 63.6% ±  0.2%
  branch_misses       380K  ± 1.97K      374K  …  388K           6 ( 3%)        💩+  9.4% ±  0.1%
Benchmark 3 (135 runs): zig build-obj main.zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          37.2ms ± 4.58ms    27.3ms … 48.1ms          0 ( 0%)        💩+ 80.7% ±  3.1%
  peak_rss           92.5MB ±  839KB    90.2MB … 94.1MB          0 ( 0%)        ⚡-  4.2% ±  0.1%
  cpu_cycles         41.1M  ± 1.32M     38.0M  … 44.3M           0 ( 0%)        ⚡- 18.4% ±  0.7%
  instructions       50.9M  ± 7.44K     50.9M  … 50.9M           0 ( 0%)        ⚡- 33.1% ±  0.0%
  cache_references   2.62M  ± 28.7K     2.56M  … 2.72M           1 ( 1%)        💩+  7.1% ±  0.2%
  cache_misses        491K  ± 16.4K      457K  …  538K           0 ( 0%)        💩+ 15.7% ±  0.5%
  branch_misses       275K  ± 2.47K      270K  …  284K           2 ( 1%)        ⚡- 20.8% ±  0.2%

Percentage-wise it still does not look good, but at this point we’re looking at differences of tens of milliseconds, so it starts to get into constant-time overhead territory.
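
To put numbers on that, here is a small sketch that computes the absolute Zig-versus-Clang gap for both benchmarks. The means are copied from the two result listings in this thread; the variable names are just for illustration.

```shell
# Means copied from the two benchmarks in this thread.
exe_clang=0.157; exe_zig=1.269   # seconds, full executables
obj_clang=20.6;  obj_zig=37.2    # milliseconds, object files only

# Absolute gap between Zig and Clang, in milliseconds.
exe_gap_ms=$(awk -v a="$exe_zig" -v b="$exe_clang" 'BEGIN { printf "%.0f", (a - b) * 1000 }')
obj_gap_ms=$(awk -v a="$obj_zig" -v b="$obj_clang" 'BEGIN { printf "%.1f", a - b }')

echo "executable gap: ${exe_gap_ms} ms"
echo "object-file gap: ${obj_gap_ms} ms"
```

The gap shrinks from roughly 1.1 s for full executables to under 20 ms for object files, which is why the remaining percentage difference matters much less in practice.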

Anyway, the main focus of the compiler development team right now is addressing this by introducing incremental compilation, which means the compiler will no longer redo all the work of building the standard library on successive compilations. With this in place, it will become clear that the Zig compiler is indeed quite fast.


Great answer, thank you!