How does Zig compile generics so quickly?

Hi everyone,

I’m trying to create a new language and was hoping to understand some things about Zig’s compiler internals.

Specifically I’d like to know: How does Zig compile generics so quickly?

In most other modern languages (C++, Rust, Swift, etc.), generics are known to slow down compilation, yet in Zig they don’t seem to have as much of an adverse effect.

One of the major reasons could be that Zig uses its own custom x86 backend instead of LLVM, so it’s expected to be faster than other languages, which depend solely on LLVM.

However, that’s not the answer I’m looking for. The greatest cost of generics comes from monomorphisation, where new artifacts are created by instantiating the generics. That is inherently slow, because it causes a lot of code duplication and bloat.
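
To illustrate what I mean (this is observable from user code, runnable with zig test): every distinct set of type arguments yields a distinct artifact.

    const std = @import("std");

    test "monomorphisation produces distinct artifacts" {
        // ArrayList(u32) and ArrayList(f64) are two unrelated types, each
        // with its own generated copies of the methods.
        try std.testing.expect(std.ArrayList(u32) != std.ArrayList(f64));
    }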

Languages such as C++ and Rust, which monomorphise heavily, suffer from slow compilation. C++ is significantly slower because templates need to be instantiated in each compilation unit (source file). Rust is slightly better because it compiles at crate scope, so the Rust compiler can deduplicate and reduce the number of instantiations required.

I’d like to know which similar optimization techniques the Zig compiler uses to compile generics quickly, especially because Zig does far more complex things than other languages (such as comptime, where code can be run during compilation) and would therefore be expected to be slow.

However, it isn’t slow. I haven’t measured in detail, but by my anecdotal observation alone, Zig compiles fast.

The only optimizations I can think of that Zig might be doing are:

  • Reducing the number of instantiations by “caching” previous instantiations (a small sketch of what this looks like from the user’s side follows this list). But that is limited by RAM, and for extremely large projects (on the scale of the Chrome browser) it might be difficult for the compiler to search for and find every instantiation.
  • Reducing the number of searches for previous instantiations by “caching” the results of those searches themselves, and using them to quickly create new objects.
  • Using a deterministic name-mangling algorithm where every instantiation of any generic (say ArrayList(f64)) would get a deterministic mangled name like ArrayList_fhUehyG_blah_blah_f64, so the compiler could assume that such an instantiation was created previously, or will be created in the future, and just link to it directly instead of wasting time compiling it again.
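
Here is the sketch for the first point (the test name is mine, but the behaviour is real Zig semantics, runnable with zig test): calling a type-returning function twice with the same comptime arguments yields the very same type, which suggests instantiations are indeed cached.

    const std = @import("std");

    test "repeated instantiations appear to be cached" {
        const A = std.ArrayList(f64);
        const B = std.ArrayList(f64);
        // Types are first-class comptime values; this passes because the
        // second call returns the same (memoized) instantiation.
        try std.testing.expect(A == B);
    }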

Are there any older threads or resources where I can find a more definitive answer?

Thank you

1 Like

To understand differences in compilation speed for generics, you should probably compare Zig’s LLVM backend to the other (LLVM-backed) languages. I expect most of the generics work in Zig happens upstream of the backend, and comparing Zig’s debug native backend, which is designed for speed, to LLVM only obscures the generics question.

The following is sheer speculation on my part:

There’s an early video of Andrew talking about the beginnings of Zig, where he describes how choosing the comptime model, and the decision that types are comptime values, meant that “generics just fell out”. I suspect much of the performance comes in part from the sheer simplicity of the model: all the type checking ends up being normal type checking at a lower level, not something unique to generics/traits.
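
To make “generics just fell out” concrete, here’s a minimal sketch (names are mine): a generic is just an ordinary function that takes a type and returns a type, and the result is checked like any other struct.

    const std = @import("std");

    // A generic type is an ordinary function evaluated at compile time;
    // there is no separate template or trait machinery to check.
    fn Pair(comptime T: type) type {
        return struct {
            first: T,
            second: T,

            fn sum(self: @This()) T {
                return self.first + self.second;
            }
        };
    }

    test "Pair(u32) is checked like ordinary non-generic code" {
        const p = Pair(u32){ .first = 1, .second = 2 };
        try std.testing.expectEqual(@as(u32, 3), p.sum());
    }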

Other factors include lazy evaluation, the single-compilation-unit model, and so on. Overall, Zig’s performance is an accumulation of many decisions: Zig is designed end to end with compiler performance as a major consideration, while many other languages barely consider compilation performance in their design.

2 Likes

Yes, that’s what I’m questioning.

Since types are values, they need to be substituted in every monomorphisation (ArrayList(u32), ArrayList(f64), etc.).

Moreover, since Zig doesn’t have traits, the compiler has to substitute the types and build every generic function separately (similar to C++). If Zig had traits like Rust, the generic function would need to be compiled/type-checked only once, and each instantiation would just need to check that the provided type implements the required traits.
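
For concreteness, a small sketch (function and names are mine): the constraint that T supports + is implicit, so the body has to be re-checked for every distinct T, the way C++ re-checks each template instantiation.

    const std = @import("std");

    // No trait bound says T must support `+`; the constraint is implicit,
    // so each instantiation of addAll is type-checked on its own.
    fn addAll(comptime T: type, items: []const T) T {
        var total: T = 0;
        for (items) |x| total += x;
        return total;
    }

    test "each instantiation is checked separately" {
        const xs = [_]u32{ 1, 2, 3 };
        try std.testing.expectEqual(@as(u32, 6), addAll(u32, &xs));
    }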

Thus Zig’s generics are closer to C++’s, which should theoretically make compilation slower.

However it isn’t slow. That’s what perplexes me.

In Zig there are no constraints to evaluate, no traits to check.

Doing nothing is always faster than doing something.

Each call site of a generic type function that actually gets evaluated (remember, Zig’s lazy model drops many potential sites without ever evaluating them) is interpreted at comptime, generating an intermediate representation identical to that of non-generic code, which is then type-checked with Zig’s simple type system.
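
A small illustration of that laziness (the deliberately broken type below is mine): a generic that is never instantiated is never analyzed, so even a nonsense body costs nothing.

    const std = @import("std");

    // `Broken` would fail to compile if it were ever instantiated, but
    // no call site evaluates it, so the compiler never analyzes the body.
    fn Broken(comptime T: type) type {
        return struct { field: T.does_not_exist };
    }

    test "unreferenced generics are never instantiated" {
        try std.testing.expect(true);
    }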

Complexity is usually slower than simplicity.

4 Likes

You probably know this, but there is caching/memoization of the monomorphized decls, so that each combination of type parameters for a given decl only has to be compiled once. This caching has a cost, I’m sure, but without it the cost would be much, much higher.

EDIT: I don’t know the compiler internals, I just remember hearing this.
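
For intuition only (this is not the actual compiler’s data structure; the names and fields are invented), such a cache could be as simple as a map from “decl plus comptime arguments” to the already-compiled instantiation:

    const std = @import("std");

    // Hypothetical shape of an instantiation cache.
    const InstantiationKey = struct {
        decl_index: u32, // which generic function/type
        args_hash: u64, // hash of the comptime argument values
    };

    const InstantiationCache = std.AutoHashMap(InstantiationKey, u32);

    test "look up an instantiation before compiling it again" {
        var cache = InstantiationCache.init(std.testing.allocator);
        defer cache.deinit();

        const key = InstantiationKey{ .decl_index = 7, .args_hash = 0xdeadbeef };
        try cache.put(key, 42); // 42 = index of the compiled instantiation
        try std.testing.expectEqual(@as(?u32, 42), cache.get(key));
    }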

3 Likes

The simplest answer to your question: Zig memoizes calls in comptime, so each function call runs only once per unique input. Yes, this does mean the entire compilation unit has to fit in memory at once, which actually isn’t a big deal.

Except the above isn’t entirely true: memoization was turned off at some point, I think for incremental compilation? Instantiations do get deduplicated at some point before machine-code generation, but I’m not exactly sure of the process. Either way, memoization of generic function calls is not actually why Zig compiles so much faster than C++ templates.

Hashimoto’s Zig compiler series gives an overview of how the Zig compiler works. Most important for your question is the section on Sema: “generics” in Zig essentially fall out of how comptime works: Zig – Mitchell Hashimoto

The other important bit is a general combination of “why is Zig fast?” and “why is C++ slow?”.

The Zig compiler is fast because it’s actually designed to be fast, including having near-veto rights on the language’s design. If a language feature would be nice but would make compilation slow, it’s a no-go.

Andrew’s talk going over the design of the compiler for speed: https://www.youtube.com/watch?v=IroPQ150F6c

Matthew’s follow-up a couple of years later, covering the additional work they’ve done since: https://www.youtube.com/watch?v=KOZcJwGdQok

The biggest piece of this puzzle is actually that C++’s template design is almost optimally suited to making compile times take forever. When compiling C/C++, every .c/.cpp file is its own compilation unit: you run the compiler once for every source file, telling it how to look up header files, and only after all of the files have been compiled to machine code individually are they linked together.

What does this look like in practice? Say you have two functions, foo and bar, defined in different files, and foo calls bar. So you have foo.cpp and bar.cpp. You also need bar.h, which gives the signature of bar. Compilation goes like this:

  1. Compile foo.cpp, which reads bar.h for bar’s signature, and generates machine code for foo.
  2. Compile bar.cpp, which also reads bar.h, and generates machine code for bar.
  3. Link the object files together; the dangling reference from foo to bar is resolved, and the executable is output.

Ok, let’s say that bar is actually a template type Bar, and foo uses its specialization for int. How does the above need to change? There’s an immediate problem: the compilation of foo.cpp knows it needs machine code for Bar<int> to be generated, but bar.cpp doesn’t know that. So C++’s hack is the following: move the implementation code for Bar into Bar.h. Bar.cpp possibly disappears, or maybe holds some non-generic code. Now when foo.cpp is compiled, it also reads the entire implementation of Bar and generates machine code for each template instantiation it uses.

This part of C++’s design isn’t that slow, really. The slow part comes in when baz.cpp also uses Bar<int>. It doesn’t know that foo.cpp has already compiled that machine code; it’s an entirely separate process. Only when the linker runs can it go “hey, I have two of these, I can throw one away”. The result is that you pay the compilation cost again and again and again.

This design was chosen for two reasons: 1. early computers had memory limits measured in kilobytes; 2. templates weren’t part of the original design and were tacked on much, much later. No modern language should copy this ancient and outdated method of compilation.

tl;dr: Zig uses a single compilation unit. This avoids repeatedly loading template implementations, parsing them, and compiling them all the way down to machine code. Zig is also designed from the ground up to avoid such compilation-speed traps, and the compiler team treats speed as one of the highest priorities in the design and implementation of the compiler.
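
To make the contrast concrete, a minimal sketch (names are mine, with one file standing in for several): both call sites below share a single instantiation of Bar(i32), analyzed and codegen’d once, where the two C++ translation units above would each compile their own copy.

    const std = @import("std");

    fn Bar(comptime T: type) type {
        return struct {
            value: T,

            fn double(self: @This()) T {
                return self.value * 2;
            }
        };
    }

    // Two independent users of Bar(i32), standing in for foo.cpp and
    // baz.cpp. In Zig's single compilation unit, the memoized comptime
    // call means one analysis and one machine-code copy.
    fn foo() i32 {
        const b = Bar(i32){ .value = 10 };
        return b.double();
    }

    fn baz() i32 {
        const b = Bar(i32){ .value = 21 };
        return b.double();
    }

    test "both call sites share one instantiation" {
        try std.testing.expectEqual(@as(i32, 20), foo());
        try std.testing.expectEqual(@as(i32, 42), baz());
        try std.testing.expect(Bar(i32) == Bar(i32));
    }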

5 Likes

C++ is also much more likely to add new features and complexity than to remove old features and complexity, which is pretty much the opposite of Zig.

Zig would rather sacrifice backwards compatibility and stay a small, sharp, precise language (one of the benefits of being pre-1.0). C++, by contrast, expends a lot of effort on backwards compatibility, often adding new variations of already-existing features (allowing you to do something yet another way) while being slow to remove or transition away from old features. That leaves it dragging around a lot of complexity, which makes the language bigger and much more difficult to update, maintain, and improve.

While C++ is big and focused on features and stability, Zig is small, much more minimal, and focused on speed and improvement.

6 Likes