Several disjoint thoughts here!
I was there when incremental compilation was retrofitted into rustc and into the Kotlin compiler, and in both cases some bad bugs shipped: incremental compilation was enabled in version X, a critical bug was detected in X + m, and incremental stayed disabled until X + m + n. That is to say, this is likely harder than it seems; you really need to invest in ways to prevent this kind of bug, and, even if you are sure you are done (version X), your users might discover something nasty later.
Curiously, I think we had a grand total of one incremental compilation bug in rust-analyzer. That is because it was incremental from day one, and it was built on an incremental framework which guaranteed correct results by construction. (Debugging the framework was easy, because you only needed to debug the incremental update algorithm; the actual computations were fully abstract.) The one bug we had was due to non-determinism in a user-defined proc macro, which broke a core assumption of the framework.
My understanding, though, is that Zig doesn’t go for a “framework” approach (which probably makes sense, as rust-analyzer is as anti-DOD as it gets), so this probably isn’t applicable.
A very useful implementation property would be path independence: regardless of the path you take to arrive at a particular source code, the result is byte-for-byte identical (as opposed to being merely equivalent). That indeed opens up a path for fuzzing.
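To make this concrete, here’s a minimal sketch of what a path-independence check could look like. It is written in Rust with a made-up `Compiler` type standing in for an incremental compiler session (not any real API): drive two sessions to the same source along different edit paths and require bit-identical artifacts.

```rust
// Hypothetical incremental compiler session; `output` stands in for the
// produced artifact (object file, binary, etc.).
struct Compiler {
    source: String,
}

impl Compiler {
    fn new(source: &str) -> Self {
        Compiler { source: source.to_string() }
    }

    // Stand-in for an incremental update; a real compiler would reuse state here.
    fn apply_edit(&mut self, new_source: &str) {
        self.source = new_source.to_string();
    }

    // Stand-in for the final artifact bytes.
    fn output(&self) -> Vec<u8> {
        self.source.bytes().rev().collect()
    }
}

fn main() {
    let a = "fn f() u32 { return 1; }";
    let b = "fn f() u32 { return 2; }";
    let target = "fn f() u32 { return 3; }";

    // Path 1: a -> b -> target.
    let mut c1 = Compiler::new(a);
    c1.apply_edit(b);
    c1.apply_edit(target);

    // Path 2: a -> target directly.
    let mut c2 = Compiler::new(a);
    c2.apply_edit(target);

    // Path independence: same final source => byte-for-byte identical output.
    assert_eq!(c1.output(), c2.output());

    // And both should match a from-scratch build of the same source.
    assert_eq!(c1.output(), Compiler::new(target).output());
}
```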
Yes, obviously this needs fuzzing / randomized generative testing. After working at TigerBeetle, I am convinced that, if you care about correctness, you should build the bulk of your test suite around an integrated randomized test. There are so many bugs which are next to impossible to find by just looking at the code or writing manual example-based tests.
For fuzzing compilers, the blueprint is wasm-smith:
That is a thing that can generate random, but syntactically and semantically valid, WASM code, which gives minimization for free, and also integrates for free with coverage-guided fuzzers like AFL.
The idea is very simple: write a function that takes an `rng` argument and produces a random valid code fragment. Which is easy: you basically pick a random expression kind, and then generate sub-expressions, pruning them by the known result type. Then, the key trick: make the `rng` finite, such that it can generate only up to `n` numbers. That is, an rng is just a pre-filled array of random bytes. And then patch the generating function to short-circuit if the `rng` runs out of entropy (e.g., just generate an `undefined` expression). With this setup, you can shorten the byte array to make the resulting code shorter. And you can use AFL as the source of those random bytes, to plug into the coverage-guided aspect.
If you can generate random code, it should be easy to generate random equivalent histories. For example, you can start with `A`, then apply some operations to go from `A` to a different `B`, then repeat the same with a different seed to go from `A` to `C`, and then replay the edits backwards to get from `B` and `C` to identical code.
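Here is a rough sketch of that property test. The `compile` and `random_edit` functions are stand-ins I made up for illustration; in reality each edit would be fed to an incremental compiler session, and the final artifacts of the two sessions would be compared byte for byte.

```rust
// Stand-in for an incremental compiler session driven by a history of edits.
// Here the "output" depends only on the final source, which is exactly the
// property we want to check against a real incremental compiler.
fn compile(history: &[String]) -> Vec<u8> {
    history.last().map(|s| s.as_bytes().to_vec()).unwrap_or_default()
}

// Stand-in for a generator-driven random edit.
fn random_edit(source: &str, seed: u8) -> String {
    format!("{source}\nconst X{seed} = {seed};")
}

fn main() {
    let a = "const A = 0;".to_string();

    // Path 1: A -> ... -> B, with one seed sequence.
    let mut path_b = vec![a.clone()];
    for seed in [1u8, 2, 3] {
        let next = random_edit(path_b.last().unwrap(), seed);
        path_b.push(next);
    }

    // Path 2: A -> ... -> C, with a different seed sequence.
    let mut path_c = vec![a.clone()];
    for seed in [7u8, 8, 9] {
        let next = random_edit(path_c.last().unwrap(), seed);
        path_c.push(next);
    }

    // Replay both histories backwards so both sessions end at the original A.
    let mut back_b = path_b.clone();
    back_b.extend(path_b.iter().rev().skip(1).cloned());
    let mut back_c = path_c.clone();
    back_c.extend(path_c.iter().rev().skip(1).cloned());

    // Different histories, same final code: outputs must be byte-for-byte equal.
    assert_eq!(compile(&back_b), compile(&back_c));
    println!("histories converge: ok");
}
```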
More directly, you can build `A` incrementally by fixing the sequence of random bytes used as a seed for the generator, and gradually increasing its length.
Alternatively, you can try applying local edits to the seed sequence, which may result in local edits to the generated code.
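A sketch of the prefix idea, assuming an entropy-bounded generator like the one above (the `generate` function here is a toy stand-in): every prefix of the seed is itself a valid seed, so the sequence of prefixes yields a plausible editing history for the incremental compiler, and local byte mutations of the seed give local edits in much the same way.

```rust
// Toy stand-in for an entropy-bounded generator: each seed byte becomes one
// declaration, so longer seeds produce longer programs.
fn generate(seed: &[u8]) -> String {
    seed.iter()
        .enumerate()
        .map(|(i, b)| format!("const X{i} = {b};\n"))
        .collect()
}

fn main() {
    let seed = [3u8, 14, 15, 92, 65];
    let mut previous = String::new();
    for len in 0..=seed.len() {
        let source = generate(&seed[..len]);
        // Feed `source` to the incremental compiler here; consecutive versions
        // differ by a small, local edit (one appended declaration).
        assert!(source.starts_with(&previous));
        previous = source;
    }
    println!("generated {} incremental steps", seed.len() + 1);
}
```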
Fuzzing is tricky to integrate with CI. While normal tests are edge-triggered (you run tests on commit, and they finish in finite time), fuzz tests are level-triggered (you run the fuzzer for however long you can, it can discover a failing seed at any point, and there’s a set of currently failing seeds). I think we arrived at a good pattern at TigerBeetle:
- we have a machine which, in a loop, fetches code from the main branch, runs the fuzzer, records failing seeds, and pushes (commit hash, seed) pairs to a JSON file in the repo (sketched after this list)
- there’s a piece of HTML/JS that displays the table of “hey, these seeds are currently failing on these commits”: TigerBeetle DevHub.
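For illustration, the loop on that machine could be as simple as the following sketch. `git_fetch_main` and `run_fuzzer_once` are hypothetical stand-ins, and the record shape is illustrative rather than TigerBeetle’s actual format.

```rust
use std::{fs::OpenOptions, io::Write, thread, time::Duration};

// One failing-fuzzer record: which commit it ran against, and the seed to replay.
struct Failure {
    commit: String,
    seed: u64,
}

// Stand-in: fetch the main branch and return its commit hash.
fn git_fetch_main() -> String {
    "0123abcd".to_string()
}

// Stand-in: run the fuzzer for a bounded time, returning a failure if one was found.
fn run_fuzzer_once(commit: &str) -> Option<Failure> {
    let _ = commit;
    None
}

fn main() {
    loop {
        let commit = git_fetch_main();
        if let Some(failure) = run_fuzzer_once(&commit) {
            // Append one JSON line per failure; the devhub page renders this
            // file as the "currently failing seeds" table.
            let mut file = OpenOptions::new()
                .create(true)
                .append(true)
                .open("failing_seeds.jsonl")
                .expect("open seed log");
            writeln!(
                file,
                "{{\"commit\":\"{}\",\"seed\":{}}}",
                failure.commit, failure.seed
            )
            .expect("record failing seed");
            // A real setup would commit and push this file back to the repo.
        }
        thread::sleep(Duration::from_secs(60));
    }
}
```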