How is Zig’s incremental compilation implemented internally?

Are there any in-depth blog posts or design documents about the intended eventual design of Zig’s incremental compilation? I have a fairly good understanding of salsa (the incremental engine behind rust-analyzer), and seeing various PRs by @mlugg fly by makes me very curious about the differences between the two.

Sadly, salsa isn’t a great example of how to document incremental systems, but here are some links for interested folks:

  • The Basic Algorithm section from the rustc guide explains the high-level idea decently
  • How Salsa Works links to an in-depth video about Salsa
  • Note that the above links to the state of the repository as of four years ago; salsa unfortunately got caught in the middle of a significant refactor since then, so the code on the main branch doesn’t necessarily make sense (I think? Maybe the refactor was completed since I last looked at salsa)
  • Durable Incrementality is a relatively recent post of mine that dives deeper into one particular non-trivial optimization in salsa. For cases where almost nothing has changed after an update, the work to figure out that everything is up to date might dominate the work to actually recompute the few changed parts. So you need to do some work to be able to do nothing really fast.
  • Finally, the Build Systems à la Carte paper, while not being a direct inspiration for salsa, is a great way to conceptualize the design space.

As a bonus, here’s a post about incremental compilation in Kotlin, which follows an interesting worse-is-better approach to the problem: https://blog.jetbrains.com/kotlin/2020/09/the-dark-secrets-of-fast-compilation-for-kotlin/

9 Likes

Nice question! I can give an overview here – I’m also intending to do a proper write-up of the system at some point into a docs file in the Zig repo, although that’ll probably assume a bit more technical background (since it’ll be targeted towards existing compiler devs).

First, we need to understand the compiler pipeline. After parsing (which is boring), there are three important stages:

  • AstGen
  • Sema
  • CodeGen

AstGen consumes a parsed AST and generates an instruction-based intermediate representation called ZIR (Zig Intermediate Representation). This implementation detail was actually recently promoted to the standard library, in std.zig.AstGen and std.zig.Zir. This pass operates on whole files, and has no context - it doesn’t deduce any type information, and it knows nothing of the contents of any other files in the project. It also doesn’t do anything compilation-specific, for instance it doesn’t change behavior based on your target. For that reason, the results of AstGen are cached globally. You might have noticed that when you update your local Zig compiler, when you first build something, you’ll see AST Lowering... in the progress message for a moment - that’s AstGen running over the whole standard library on the new compiler version (the caches aren’t compatible across versions), since we eagerly run it over every file which could possibly be referenced.
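
Since AstGen and ZIR were promoted to the standard library, you can run this stage yourself. Here’s a rough sketch; the exact std.zig signatures have moved around between Zig versions, so treat it as illustrative rather than authoritative:

```zig
const std = @import("std");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    // Any Zig source will do; AstGen needs no type or target information.
    const source: [:0]const u8 = "pub const x: u32 = 123;";

    // Parse: source bytes -> AST.
    var tree = try std.zig.Ast.parse(gpa, source, .zig);
    defer tree.deinit(gpa);

    // AstGen: AST -> ZIR, still purely per-file and context-free.
    var zir = try std.zig.AstGen.generate(gpa, tree);
    defer zir.deinit(gpa);

    std.debug.print("generated {d} ZIR instructions\n", .{zir.instructions.len});
}
```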

After we have generated all of our ZIR, we move on to Sema. This is the heart of the compiler – it’s the stage that performs semantic analysis, which includes type checking, comptime code execution, most error messages, etc. Sema interprets the ZIR which AstGen emitted and turns it into AIR (Analyzed Intermediate Representation), a much simpler and lower-level IR. CodeGen is actually interleaved with Sema: after a function is semantically analyzed, its AIR is immediately sent to the code generator.

Sema is where the magic happens, so we’re going to have to zoom in on some implementation details now. When writing Zig, you might think of semantic analysis as being something that happens to the whole program at once, but that’s not how the compiler sees it. Instead, semantic analysis occurs in discrete steps, analyzing one thing at a time. There are two kinds of thing we can analyze:

  • A runtime function body. When we know that a function may be referenced at runtime, and we know all of its comptime args, we will analyze the function body and send the results to CodeGen.
  • Anything else, at comptime. The thing we analyze here is currently called a Decl. Here, I have to explain a somewhat unfortunate set of terms we currently have - despite the name, a Decl does not necessarily correspond to a source-level declaration. Every source declaration – that’s a container-level const, var, fn, comptime, usingnamespace, or test – has a Decl, but not every Decl is a source declaration. I’m actively working on some internal reworks to make this less confusing, but for now, I’ll explain the system as it exists today.
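
As a concrete (made-up) example of how a file splits into these units of analysis:

```zig
const std = @import("std");

// Each of these container-level declarations gets its own Decl, analyzed
// at comptime when it is first referenced:
const limit: u32 = 100; // a `const` Decl
var counter: u32 = 0; // a `var` Decl

test "clamp" { // a `test` Decl
    try std.testing.expect(clamp(u32, 150) == 100);
}

// A runtime function body: once we know `clamp` may be referenced at runtime
// and know its comptime argument `T`, the body is analyzed and sent to
// CodeGen. Each distinct `T` yields a separately analyzed function body.
fn clamp(comptime T: type, x: T) T {
    return if (x > limit) limit else x;
}
```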

The idea behind incremental compilation is that for each of these things we can analyze – a runtime function or a Decl – we can register “dependencies” for everything the analysis used. The current (WIP) system has the following kinds of dependency:

  • src_hash – the value of the 128-bit hash of a range of source bytes. The compiler relies on the uniqueness of 128-bit hashes in quite a few places!
  • decl_val – the resolved value of a single declaration, e.g. a container-level const.
  • namespace – the full set of names in a namespace. This basically only exists because of @typeInfo.
  • namespace_name – the (non-)existence of a single name in a namespace.
  • func_ies – the resolved IES of a runtime function.

On an incremental update, we can invalidate changed dependencies as we analyze more things. We can invalidate src_hash, namespace, and namespace_name dependencies as soon as AstGen completes – the other two we learn about as we perform analysis. The idea is that in the main loop of analysis, we will find a Decl or runtime function which is known to be outdated and which (ideally) has all up-to-date dependencies, and re-analyze it. For a Decl, we can then mark decl_val dependencies as out-of-date or up-to-date depending on whether the value changed; similarly with func_ies dependencies for runtime functions.
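
As a rough source-level illustration of what registers which kind of dependency (the declarations are made up, and this is a sketch of the intent rather than the compiler’s exact bookkeeping; src_hash dependencies come from the source bytes of each declaration itself, and func_ies from calls to functions with inferred error sets):

```zig
const std = @import("std");

const Config = struct {
    pub const count: u32 = 3;
    pub const flag = true;
};

// Analyzing `x` registers a decl_val dependency on `Config.count`:
// `x` only becomes outdated if the *resolved value* of `count` changes.
pub const x: u32 = Config.count + 1;

// @hasDecl registers a namespace_name dependency on the single name "flag"
// in Config's namespace.
pub const has_flag = @hasDecl(Config, "flag");

// @typeInfo registers a namespace dependency: adding or removing *any*
// declaration in Config invalidates it.
pub const info = @typeInfo(Config);

test "values" {
    try std.testing.expect(x == 4 and has_flag);
}
```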

There’s a lot more that I’ve not covered here:

  • ZIR instruction tracking
  • Type recreation
  • Locating unreferenced Decls/functions
  • All of the codegen/linking stuff; that’s not really my domain

…but this is a basic overview of some of the system :slight_smile:

One day I’ll probably write a blog post with more detail. For now, let me know if you have any questions, I’ll still be happy to answer!

19 Likes

Hello, this is very, very interesting. Topics like this should be moved to the Docs section of the forum and perhaps have a sequel.

I have a bunch!

  1. What’s an “IES”?

  2. How do “values” work? My understanding is that, during compilation, Zig creates a lot of values (comptime ints, structs, maybe also types & functions?), which are stored and interned in the InternPool. I don’t think I have a more specific question here, it’s just a very general “what are the things that Sema operates with”.

  3. I guess one subquestion is how do we determine “sameness” of values, the question of equality and identity. For ints it is trivial: 92 == 92 and 92 != 91. But what about things like const S = struct { ... }? Would it be the same struct if we add whitespace before it? If we move it to the top of the file? If we move it to a different file?

  4. What happens in this scenario:

    • I run zig build test
    • I edit one source file
    • I run zig build test -Dtest-filter="specific-test"

    For the second test invocation, not only has the source code changed, we also need to compile less stuff. Moreover, the file I have touched might or might not affect the selected test. How does it work?

  5. What is the “granularity” of incrementality? E.g., if I have a comptime call like foo(1, 2, 3), and then, after a change, it is foo(1, 2, 4), would we recompute the entire call? Or would we look into the body of foo and only recompute the stuff depending on the last argument? Or would we recompute the entire call site of foo? I guess “an atom of incrementality” is that Decl/Csu thing?

5 Likes
  1. What’s an “IES”?

Oops, sorry! IES stands for “inferred error set”. When a function has an inferred error set (fn foo() !void { ... }), then code which e.g. performs a switch on that error set has to “resolve” it: figure out the concrete errors it contains. To do that, we have to analyze the function body if we haven’t already so that we encounter all ZIR that could return errors, which implicitly adds them to the IES.
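
For example (foo here is just a made-up illustration):

```zig
const std = @import("std");

// `!void` with no explicit error set means foo has an inferred error set.
fn foo(fail: bool) !void {
    if (fail) return error.Failed;
}

test "switching on the IES forces resolution" {
    // To check that this switch is exhaustive, the compiler must resolve
    // foo's IES, which in turn requires analyzing foo's body.
    foo(true) catch |err| switch (err) {
        error.Failed => {},
    };
}
```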

  2. How do “values” work? […]

Yep, your understanding is totally right, and types and functions are indeed also values which we do store in the InternPool. The basic way this works is that every value (including types) we create is assigned a 32-bit “index” into the pool; this index points into an items array, which for more complex values may also reference some data stored separately. When we want to add a value, there are some efficient datastructures centered around ArrayHashMap that allow us to quickly find the value in items and return that existing index if it’s already there; if it’s not there, we add the new item onto the end.
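
As a rough illustration of the interning idea, here’s a toy pool (nothing like the real InternPool’s memory layout, just the dedup-by-index behaviour, with u64 standing in for values):

```zig
const std = @import("std");

const ToyPool = struct {
    map: std.AutoArrayHashMap(u64, void),

    fn init(gpa: std.mem.Allocator) ToyPool {
        return .{ .map = std.AutoArrayHashMap(u64, void).init(gpa) };
    }

    fn deinit(self: *ToyPool) void {
        self.map.deinit();
    }

    /// Returns the stable index of `value`, adding it only if it isn't
    /// already interned.
    fn intern(self: *ToyPool, value: u64) !u32 {
        const gop = try self.map.getOrPut(value);
        return @intCast(gop.index);
    }
};

test "interning dedupes" {
    var pool = ToyPool.init(std.testing.allocator);
    defer pool.deinit();
    const a = try pool.intern(92);
    const b = try pool.intern(91);
    const c = try pool.intern(92);
    try std.testing.expect(a == c and a != b);
}
```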

  3. I guess one subquestion is how do we determine “sameness” of values, the question of equality and identity. […]

The notion of uniqueness for interned values loosely aligns with “equality” in the Zig sense. However, the example you’ve given of moving a struct declaration gets a bit deeper into the weeds.

Let’s say that on an incremental update, we move a const S = struct { ... }; declaration down one line(*). Then, the source code of that declaration itself is perfectly unchanged – we’re just considering the bytes from the const to the ;, and they haven’t changed. So, if there are no other changes, the dependencies of the Decl corresponding to S would not be invalidated. But instead, let’s suppose that we, say, added a newline between = and struct, so that the source bytes have actually changed, even if not meaningfully. Then, the src_hash dependency of the Decl corresponding to S would be invalidated, so we would re-analyze the ZIR instruction which declares a struct.

Now we have to introduce something new. As well as the Decl we already have corresponding to S, types have this thing called an “owner Decl”. It doesn’t correspond to a source declaration as such, but it’s the context in which we perform resolution of the type (to allow self-reference, type resolution happens lazily, in a few stages). That Decl contains all dependencies which arise from type resolution - for instance, if the field types of a struct make reference to a declaration, the struct’s owner Decl will have a decl_val dependency on whatever was referenced.

When we re-analyze a struct_decl ZIR instruction, we look up in the InternPool the type which exists at the same AST node (in reality we track it based on the ZIR instruction index) and with the same set of captured values. This is #18816 – structs are considered equivalent if they live at the same source location and reference all the same external values. However, if this lookup succeeds, we aren’t quite done: we need to check if the owner Decl of the type is marked as outdated.

If it’s not, everything is fine – we reuse the existing type. However, if it’s outdated, that means the structure of this type (something to do with its fields or memory layout) has changed. That means, in essence, that every use site that used this struct needs to be analyzed again – so we need a “new” type, with a fresh InternPool index to make sure it’s considered distinct. Therefore, we remove the old type from the pool (the item is replaced with a dummy removed value which lookups will never match) and create a new one (with a new owner Decl to match). Then, when analysis of S finishes, its resolved value will ultimately be this “new” type, so any dependencies on S are invalidated and re-analyzed, etc.
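
The “same source location, same captured values” rule is visible from user code, by the way; it lines up with how equality of generic instantiations behaves:

```zig
const std = @import("std");

fn Vec(comptime T: type) type {
    // This struct_decl lives at one source location; the captured value `T`
    // is part of the type's identity.
    return struct { x: T, y: T };
}

test "identity follows source location and captures" {
    // Same location, same captures: the same type.
    try std.testing.expect(Vec(f32) == Vec(f32));
    // Same location, different captures: distinct types.
    try std.testing.expect(Vec(f32) != Vec(i32));
}
```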

This system is a bit weird and subtle, and will get at least a little easier to understand after the proposed internal reworks I linked to above (which have since been greenlit by Andrew) go through, but I hope this made some sense!

*: moving it by a column is a bit weirder because of some details of how we track source locations internally for debug info, but I don’t feel like getting into that now :stuck_out_tongue:

  4. What happens in this scenario: […]

When we’re performing an incremental update, we don’t have a good way to immediately tell if things will become unreferenced. Of course, in this example it’s fairly obvious in theory, but you can imagine a case where it’s not; start with this:

export const a: u32 = b;
const b = 123;

…and change it to this:

export const a: u32 = 123;
const b = @compileError("bad");

Here, re-analyzing b means we’ll emit the compile error, even though b is no longer referenced and the error shouldn’t be reported! But it’s quite hard for us to know that ahead of time.

So, on an incremental update, we re-analyze anything that we’ve previously analyzed which could be referenced. How do we deal with things becoming unreferenced?

The answer is currently vaporware, but I’ve done some preemptive work on it. The idea is that after all analysis is complete, we will perform a graph traversal on a big table of references between Decls and runtime functions. These are kind of like dependencies, but we have different constraints on what accesses need to be fast on the datastructure and the entries are a little different, so they’ll be stored separately. Anything we reach in this traversal is “alive”: it needs to be in the binary, and any of those things that failed should emit their respective compile errors. However, anything that we don’t reach we know to be unreferenced, so we can just ignore those compile errors! We could also instruct codegen/linker to remove it from the binary if it’s there, but – at least for Debug builds, which are the primary focus of incremental compilation – that’s not strictly necessary.
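
To make the shape of that concrete, here’s a toy version of such a traversal, with integer IDs standing in for Decls/runtime functions and a flat edge list standing in for the reference table (purely illustrative, not the compiler’s code):

```zig
const std = @import("std");

/// `refs` is a list of (referencer, referenced) edges and `roots` are the
/// analysis roots (e.g. exported Decls). Everything reachable is "alive";
/// compile errors attached to anything else can be ignored.
fn findAlive(gpa: std.mem.Allocator, refs: []const [2]u32, roots: []const u32, n: usize) ![]bool {
    const alive = try gpa.alloc(bool, n);
    @memset(alive, false);
    for (roots) |r| alive[r] = true;
    // Propagate reachability to a fixed point (fine for a toy example; the
    // real thing would be a proper worklist traversal).
    var changed = true;
    while (changed) {
        changed = false;
        for (refs) |edge| {
            if (alive[edge[0]] and !alive[edge[1]]) {
                alive[edge[1]] = true;
                changed = true;
            }
        }
    }
    return alive;
}

test "unreferenced units do not report their errors" {
    const refs = [_][2]u32{.{ 0, 1 }}; // 0 references 1; unit 2 is unreferenced
    const alive = try findAlive(std.testing.allocator, &refs, &[_]u32{0}, 3);
    defer std.testing.allocator.free(alive);
    try std.testing.expect(alive[0] and alive[1] and !alive[2]);
}
```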

So, let’s consider your change. Here’s what happens on the rebuild:

  • Some src_hash dependencies are invalidated, which results in a bunch of re-analysis. If any test was modified, it is re-analyzed and emitted to the binary regardless. If re-analysis of the test now emits a compile error, we save it for later.
  • We begin traversing the graph of references between Decls and functions. When we encounter a namespace type (struct etc) which is referenced, we traverse its namespace to figure out which Decls are “implicitly” referenced. Notably, that includes any tests which pass the test filter. So here, we will exclude any tests not matching the filter from the set of things reachable in the traversal.
  • Some tests are now unreferenced. We don’t include them in the test functions passed to the test runner. If this update introduced compile errors in a test function which is now known to be unreferenced, we silently ignore that error – although we do keep it stored for future incremental updates, since if a future update makes the test referenced again, we can just use the error message that we’ve already found!

I hope that all makes sense!

  5. What is the “granularity” of incrementality? […]

You’ve probably figured out from some of the answers above that yes, as you suggested, the most “atomic” thing is the analysis of an entire Decl or runtime function body. In the example you gave, we would indeed recompute the entire call. This is arguably something of a weakness of the system; our saving grace is that even today, the compiler is pretty damn fast (ignoring LLVM). Assuming you’re not horribly abusing comptime, individual analysis of functions or Decls is generally gonna happen on the scale of milliseconds. The granularity we have is a nice middle ground between compiler complexity and useful incrementality.
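
For example (with sum standing in for your foo):

```zig
const std = @import("std");

fn sum(comptime a: u32, comptime b: u32, comptime c: u32) u32 {
    return a + b + c;
}

// The whole initializer of `total` is one unit of analysis: changing the call
// to sum(1, 2, 4) re-runs the entire comptime call, including the parts that
// only depend on the unchanged arguments.
const total = sum(1, 2, 3);

test "total" {
    try std.testing.expect(total == 6);
}
```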

12 Likes

It’s really very good, what you do. I have done reverse engineering; my product has been deployed on thousands of mainframes, and my introductory test at Alstom ran on 50 mainframes without an error.
I find your approach super cooooollll

Do we have early cutoff here? That is, do we try to optimize the case where the owner decl changed, but this didn’t affect the struct in question? I am thinking about this case:

const S = struct {
    field: type_fn(92),
};

Now let’s say we changed something about type_fn, but in a way such that the result of type_fn(92) remains the same. What happens here?

I think that:

  • the decl for S remains the same, as nothing about it has changed.
  • the owner decl for S is outdated, because the decl_val dependency on type_fn is invalidated

Do we unconditionally invalidate everything that depends on the type S now, or do we somehow figure out that no, the “type” returned by type_fn is the same?

Your analysis is pretty much correct. S preserves the same Decl, but the owner Decl of the type is outdated, and thus we create a new one. However, the reason it’s outdated isn’t quite what you think. When we change a function body, that function “value” is not actually considered to have changed. Instead, what happens is that by performing an inline call (comptime calls are basically a special kind of inline call) of type_fn, we have a decl_src dependency on its body (*). This dependency is invalidated by the change, which is what makes the struct’s owner Decl outdated. So the struct type is recreated, meaning the value of the Decl corresponding to S changes, so dependencies on it are invalidated so that all uses of S are re-analyzed.

The reason it has to work like this is lazy type resolution. The types of the fields of S aren’t resolved when the type is created: this happens in the “field type resolution” phase later. This is important to allow recursive definitions, e.g. if the struct had a parent: ?*S field.
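
For example, something along these lines only compiles because parent’s type is resolved lazily, after Node itself has been created (I’m using ?*const Node here just to keep the test simple):

```zig
const std = @import("std");

const Node = struct {
    parent: ?*const Node, // refers to the type currently being declared
    value: u32,
};

test "recursive struct" {
    const root: Node = .{ .parent = null, .value = 1 };
    const child: Node = .{ .parent = &root, .value = 2 };
    try std.testing.expect(child.parent.?.value == 1);
}
```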

In the future, we might be able to improve this, but before making incremental compilation any more granular, I personally would like to ship a version of it. That way, we can see what unnecessary analysis is performed in practice, and work on improving those cases. If cases like this turn out to be exceedingly rare, then it’s not necessarily worth the complexity in the compiler to save people a couple of seconds every so often!

Type resolution is a bit tricky in general, and there are a few pending problems regarding its interaction with incremental. So any of this might change over the next few months!


(*): This raises an interesting point which I haven’t mentioned so far: the current formulation of incremental compilation isn’t compatible with comptime call memoization, since we need the thing being analyzed to pick up all the dependencies which arise from the callee. For this reason, I’ll be disabling memoization when using incremental at first. Down the line we can restore it pretty easily, it’ll just require a bit more dependency tracking logic.

6 Likes