Why `-fno-emit-bin` is so much faster to get to the first compilation error?

We’ve recently added zig build check to TigerBeetle, which works similar to cargo check. That is, it’s an almost normal, except that -fno-emit-bin is passed to the compiler, so that no time is spend in LLVM. The intended use-case is to check for compilation errors while you are hacking away and not yet ready to actually run the code:

What surprises me a great deal is that check seems to be much faster, even if there are errors! Naively, I’d expect the time to first error to be more or less equivalent between the two, but it is wildly different:

  • zig build — 7.8s
  • zig build check — 1.1s

Both are cached, repeated builds of code with a simple sema error.

If we don’t even get to emitting a binary due to compilation errors, what makes the check build so much faster?

Also, PSA, you might want investigate adding zig build check to your workflow.

9 Likes
Mostly irrelevant info, see posts below

I don’t quite get the same dramatic results, although there is still a significant difference:

~/src/zig/tigerbeetle$ hyperfine -i "zig-release build" "zig-release build check" --warmup 1
Benchmark 1: zig-release build
  Time (mean ± σ):     454.7 ms ±   5.8 ms    [User: 378.1 ms, System: 118.4 ms]
  Range (min … max):   449.1 ms … 469.8 ms    10 runs
 
  Warning: Ignoring non-zero exit code.
 
Benchmark 2: zig-release build check
  Time (mean ± σ):     201.3 ms ±   3.9 ms    [User: 159.8 ms, System: 82.3 ms]
  Range (min … max):   196.2 ms … 211.5 ms    14 runs
 
  Warning: Ignoring non-zero exit code.
 
Summary
  'zig-release build check' ran
    2.26 ± 0.05 times faster than 'zig-release build'

the error I induced is:

src/tigerbeetle.zig:7:7: error: expected ',' after field
abcdef
      ^

Didn’t investigate why the difference exists beyond running strace:

$ strace -c -f zig-release build
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 46.68    0.435965         202      2150       531 futex
 34.22    0.319583       79895         4           wait4
  5.95    0.055581          88       629           readv
  3.34    0.031190          14      2203        42 openat
  2.80    0.026179          12      2176           close
  2.08    0.019381           3      4873           mmap
  1.43    0.013357          10      1320           fstat
  1.41    0.013187          20       639           flock
  1.40    0.013107          15       823           read
  0.15    0.001426           1      1008           msync
  0.15    0.001421           1      1001           munmap
  0.13    0.001181           2       415           pread64
  0.06    0.000516          34        15        10 execve
  0.04    0.000358           3        96           write
  0.03    0.000258           2       112           brk
  0.03    0.000255          63         4           fork
  0.02    0.000220          16        13           clone
  0.01    0.000136           3        42         9 readlink
  0.01    0.000129           5        25           mprotect
  0.01    0.000101           8        12           poll
  0.01    0.000084           3        24           gettid
  0.01    0.000064           2        29           uname
  0.01    0.000053           1        44           rt_sigprocmask
  0.00    0.000043           6         7           pipe2
  0.00    0.000035           8         4           eventfd2
  0.00    0.000033           8         4           writev
  0.00    0.000022          11         2           faccessat
  0.00    0.000019          19         1           ftruncate
  0.00    0.000019           2         7         2 arch_prctl
  0.00    0.000019           4         4           set_tid_address
  0.00    0.000015           7         2           open
  0.00    0.000015           0        25        23 mkdirat
  0.00    0.000010           2         4           ioctl
  0.00    0.000009           0       116        20 newfstatat
  0.00    0.000008           4         2         2 lseek
  0.00    0.000004           1         3           sched_getaffinity
  0.00    0.000000           0         9           rt_sigaction
  0.00    0.000000           0        26         9 access
  0.00    0.000000           0         4         2 madvise
  0.00    0.000000           0         9           dup2
  0.00    0.000000           0        20           getcwd
  0.00    0.000000           0         2           chdir
  0.00    0.000000           0         4           geteuid
  0.00    0.000000           0        14           getdents64
  0.00    0.000000           0         3         1 unlinkat
  0.00    0.000000           0         1         1 renameat
  0.00    0.000000           0         2           set_robust_list
  0.00    0.000000           0        10           prlimit64
  0.00    0.000000           0         4           getrandom
  0.00    0.000000           0         2           rseq
------ ----------- ----------- --------- --------- ------------------
100.00    0.933983          52     17948       652 total
$ strace -c -f zig-release build check
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 48.49    0.462997         228      2030       477 futex
 32.55    0.310785       77696         4           wait4
  5.75    0.054880          87       629           readv
  3.26    0.031091          15      1949        42 openat
  2.63    0.025154          13      1924           close
  2.45    0.023432          10      2153           mmap
  1.39    0.013268          16       806           read
  1.32    0.012646          11      1078           fstat
  1.18    0.011244          17       635           flock
  0.64    0.006148           4      1270           munmap
  0.07    0.000711           1       415           pread64
  0.07    0.000703           0      1008           msync
  0.02    0.000226           6        36           brk
  0.02    0.000182          45         4           fork
  0.02    0.000166          12        13           clone
  0.02    0.000164           4        37         4 readlink
  0.01    0.000139         139         1           ftruncate
  0.01    0.000112           1        66           write
  0.01    0.000104           4        24           gettid
  0.01    0.000104           6        17        15 mkdirat
  0.01    0.000070          23         3           writev
  0.01    0.000061           5        11           poll
  0.01    0.000059           8         7           pipe2
  0.00    0.000047          23         2           faccessat
  0.00    0.000046           1        29           uname
  0.00    0.000044          22         2         2 lseek
  0.00    0.000035           2        17           getcwd
  0.00    0.000027           1        25           mprotect
  0.00    0.000027           2        10           prlimit64
  0.00    0.000017           0       116        20 newfstatat
  0.00    0.000016           8         2           open
  0.00    0.000014           3         4           ioctl
  0.00    0.000014           3         4           eventfd2
  0.00    0.000006           0         9           rt_sigaction
  0.00    0.000006           1         4         2 madvise
  0.00    0.000005           0        44           rt_sigprocmask
  0.00    0.000005           1         4           getrandom
  0.00    0.000004           1         3           sched_getaffinity
  0.00    0.000000           0        26         9 access
  0.00    0.000000           0         9           dup2
  0.00    0.000000           0        15        10 execve
  0.00    0.000000           0         2           chdir
  0.00    0.000000           0         4           geteuid
  0.00    0.000000           0         7         2 arch_prctl
  0.00    0.000000           0        14           getdents64
  0.00    0.000000           0         4           set_tid_address
  0.00    0.000000           0         3         1 unlinkat
  0.00    0.000000           0         1         1 renameat
  0.00    0.000000           0         2           set_robust_list
  0.00    0.000000           0         2           rseq
------ ----------- ----------- --------- --------- ------------------
100.00    0.954759          65     14484       585 total

zig build check is definitely doing fewer syscalls, so something is up.

I couldn’t reproduce the difference with latest Zig master and my own project resinator, though, so it’s possible this might have been fixed since 0.11.0:

$ hyperfine -i "zig build" "zig build check" --warmup 1
Benchmark 1: zig build
  Time (mean ± σ):     189.6 ms ±   8.1 ms    [User: 359.6 ms, System: 126.7 ms]
  Range (min … max):   180.7 ms … 214.8 ms    16 runs
 
  Warning: Ignoring non-zero exit code.
 
Benchmark 2: zig build check
  Time (mean ± σ):     187.7 ms ±  11.6 ms    [User: 364.5 ms, System: 120.9 ms]
  Range (min … max):   178.6 ms … 219.9 ms    16 runs
 
  Warning: Ignoring non-zero exit code.
 
Summary
  'zig build check' ran
    1.01 ± 0.08 times faster than 'zig build'

That’s syntax error, so it should get detected in the embarrassingly parallel phase, before sema.

I got that drastic difference due to sema error (@src().fn_fame typo).

1 Like

Currently, the bottleneck of the compiler is Semantic Analysis, which is what produces those compile errors.

How it works today is ping ponging between Sema and Codegen. For example, when analyzing bar in this code:

fn foo() void {}
fn bar() void { foo(); }

Here is what will happen:

  1. Sema foo
  2. Codegen foo
  3. Sema bar
  4. Codegen bar
  5. LLVM Emit Object…

In this case, Codegen means using the LLVM IR builder API to construct an LLVM Module. The LLVM IR builder, written in C++, does slow C++ things. You know, calling nested inherited constructors, dutifully running unnecessary destructors, being blissfully ignorant about batch operations, celebrating move semantics while heap allocating out the wazoo. Classic stuff.

Anyway, with -fno-emit-bin, here is what happens instead:

  1. Sema foo
  2. Sema bar

So not only did LLVM Emit Object disappear but also the Codegen phases.

In other words, even constructing the LLVM IR Module is slow.

Related: directly output LLVM bitcode rather than using LLVM’s IRBuilder API


I meant to add: as you might have guessed, a future planned enhancement is moving Codegen to a different thread than Sema.

14 Likes

I want to add two more notes to this:

  1. Zig collects all errors before reporting them. So you’re waiting for a full compilation even when there are errors.
  2. When incremental compilation lands, it means that compilations that have compile errors still make progress. So you didn’t waste time waiting for the full compilation, you actually will get a faster compilation next time, even when compile errors are found.
5 Likes

Ah, makes sense. For future reference, here’s a way to reproduce this with Tigerbeetle @ commit c318c49ecef140f68a75e02fb177b645be4c22a3:

--- a/src/fifo.zig
+++ b/src/fifo.zig
@@ -17,6 +17,9 @@ pub fn FIFO(comptime T: type) type {
         name: ?[]const u8,
 
         pub fn push(self: *Self, elem: *T) void {
+            const frame = @src().fn_fame;
+            _ = frame;
+
             if (constants.verify) assert(!self.contains(elem));
 
             assert(elem.next == null);
Benchmark 1: zig-release build
  Time (mean ± σ):     18.527 s ±  0.052 s    [User: 17.980 s, System: 0.590 s]
  Range (min … max):   18.491 s … 18.564 s    2 runs
 
  Warning: Ignoring non-zero exit code.
 
Benchmark 2: zig-release build check
  Time (mean ± σ):      2.337 s ±  0.007 s    [User: 2.240 s, System: 0.143 s]
  Range (min … max):    2.332 s …  2.342 s    2 runs
 
  Warning: Ignoring non-zero exit code.
 
Summary
  'zig-release build check' ran
    7.93 ± 0.03 times faster than 'zig-release build'

strace shows 65018 syscalls for zig build vs 20629 syscalls for zig build check, and most of the difference is the number of mmap and munmap syscalls performed.

1 Like

Incredible timing: Emit LLVM bitcode without using LLVM by antlilja · Pull Request #19031 · ziglang/zig · GitHub

5 Likes

Just as a follow up PSA: you can also configure ZLS to run your check step on save. I’m test-driving it now and it seems promising.

2 Likes