Mitchell’s posts are a great starting point, but bear in mind that the one on Sema contains some outdated information. In particular, its explanation of Type vs Value vs TypedValue is no longer accurate.
Here’s a summary. The Zig compiler pipeline looks vaguely like this:
Parse -> AstGen -> Sema -> CodeGen
Parse and AstGen are in the standard library as std.zig.{Parse,AstGen}. Together they produce a block of instructions for each file. These instructions are called ZIR (Zig Intermediate Representation). At this point the code is not yet type-checked: that happens in semantic analysis (Sema). Most error messages and comptime magic happen in Sema. The main notable things that AstGen handles while lowering the AST to ZIR are RLS (Result Location Semantics; see the langref if unfamiliar) and certain “global” error messages (those which do not require semantic analysis, e.g. “unused variable”; these are the errors that zig ast-check can pick up on).
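To make the "errors without semantic analysis" point concrete, here is a small sketch that runs only the Parse and AstGen stages from the standard library on a snippet with an unused local. The entry points used (std.zig.Ast.parse, std.zig.AstGen.generate, Zir.hasCompileErrors) are what I'd expect in recent std versions, but these APIs are unstable, so treat the exact names and signatures as assumptions and check them against your Zig version.

```zig
const std = @import("std");

test "unused local is reported by AstGen, before any type checking" {
    const gpa = std.testing.allocator;

    // Syntactically valid, but `x` is never used.
    const source: [:0]const u8 =
        \\pub fn main() void {
        \\    const x: u32 = 1;
        \\}
    ;

    // Parse: source text -> AST. No syntax errors are expected here.
    var tree = try std.zig.Ast.parse(gpa, source, .zig);
    defer tree.deinit(gpa);
    try std.testing.expect(tree.errors.len == 0);

    // AstGen: AST -> ZIR. Sema has not run, so nothing is type-checked yet.
    var zir = try std.zig.AstGen.generate(gpa, tree);
    defer zir.deinit(gpa);

    // The unused-variable error is already recorded in the ZIR.
    try std.testing.expect(zir.hasCompileErrors());
}
```

Run it with zig test on the file. A type error (say, assigning a string to a u32) would not show up at this stage, because catching it needs Sema.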
Sema’s job is to take those ZIR instructions and convert them to AIR (Analyzed Intermediate Representation) or, in the case of comptime execution, interpret them. I find that it helps to think of Sema as an interpreter which, in some cases, emits AIR instructions so that the operation is performed at runtime instead. Sema is definitely the most important part of the compiler, but it can also be quite hard to understand, largely due to its size.
Loosely, the idea is that a single “body” of ZIR instructions is interpreted by the main loop, Sema.analyzeBodyInner. This function is essentially a big ol’ switch inside of a loop over the instructions. The switch cases mostly dispatch to handler functions, e.g. zirCondbr to handle the condbr instruction.
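The "interpreter that sometimes emits runtime instructions" model can be sketched as a toy. Everything below (Inst, Value, AirInst, ToySema, zirAdd) is hypothetical and hugely simplified; it is not the real Sema, only the same loop-plus-switch shape: comptime-known operands are folded on the spot, and anything involving a runtime value emits an instruction instead.

```zig
const std = @import("std");

// Operands refer to earlier instructions (toy ZIR) or emitted ones (toy AIR).
const Bin = struct { lhs: usize, rhs: usize };

// Toy stand-in for ZIR: untyped instructions, one per index.
const Inst = union(enum) {
    int: i64, // integer literal, comptime-known
    param: u8, // function parameter, only known at runtime
    add: Bin,
};

// Result of analyzing one instruction.
const Value = union(enum) {
    comptime_int: i64, // value known at compile time
    runtime: usize, // index of the toy AIR instruction computing it
};

// Toy stand-in for AIR: what gets handed to a backend.
const AirInst = union(enum) {
    arg: u8,
    constant: i64,
    add: Bin,
};

const ToySema = struct {
    results: [16]Value = undefined,
    air: [16]AirInst = undefined,
    air_len: usize = 0,

    fn emit(self: *ToySema, inst: AirInst) usize {
        const idx = self.air_len;
        self.air[idx] = inst;
        self.air_len += 1;
        return idx;
    }

    // Turn a value into a runtime operand, emitting a constant if needed.
    fn materialize(self: *ToySema, v: Value) usize {
        return switch (v) {
            .comptime_int => |n| self.emit(AirInst{ .constant = n }),
            .runtime => |idx| idx,
        };
    }

    // Handler for `add`, in the spirit of the zirXxx handler functions.
    fn zirAdd(self: *ToySema, lhs: Value, rhs: Value) Value {
        // Both operands comptime-known: interpret now, emit nothing.
        if (lhs == .comptime_int and rhs == .comptime_int) {
            return Value{ .comptime_int = lhs.comptime_int + rhs.comptime_int };
        }
        const l = self.materialize(lhs);
        const r = self.materialize(rhs);
        return Value{ .runtime = self.emit(AirInst{ .add = .{ .lhs = l, .rhs = r } }) };
    }

    // The main loop: a switch inside a loop over the body.
    fn analyzeBody(self: *ToySema, body: []const Inst) void {
        for (body, 0..) |inst, i| {
            self.results[i] = switch (inst) {
                .int => |n| Value{ .comptime_int = n },
                .param => |p| Value{ .runtime = self.emit(AirInst{ .arg = p }) },
                .add => |bin| self.zirAdd(self.results[bin.lhs], self.results[bin.rhs]),
            };
        }
    }
};

test "comptime operands fold, runtime operands emit instructions" {
    // %0 = int(2); %1 = int(3); %2 = add(%0, %1); %3 = param(0); %4 = add(%2, %3)
    const body = [_]Inst{
        .{ .int = 2 },
        .{ .int = 3 },
        .{ .add = .{ .lhs = 0, .rhs = 1 } },
        .{ .param = 0 },
        .{ .add = .{ .lhs = 2, .rhs = 3 } },
    };
    var sema: ToySema = .{};
    sema.analyzeBody(&body);

    // 2 + 3 was computed during analysis; no instruction was emitted for it.
    try std.testing.expectEqual(@as(i64, 5), sema.results[2].comptime_int);
    // Emitted: arg, constant(5), add.
    try std.testing.expectEqual(@as(usize, 3), sema.air_len);
}
```

The real thing deals with types, control flow, safety checks and much more, but the split between "fold it now" and "emit an instruction" in zirAdd is the part worth keeping in mind when reading the actual handlers.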
When analyzing a runtime function, Sema emits AIR instructions which are sent to the code generation backend. The default is the LLVM backend – this lives in src/codegen/llvm.zig. We also have several WIP self-hosted backends, for instance in src/arch/x86_64/CodeGen.zig.
You note the InternPool as a fairly isolated part of the compiler. The primary role of InternPool is to store immutable comptime-known values (including types) efficiently, exposing a (relatively) type-safe API for accessing them. It’s a very important part of the compiler, but it can be a bit tricky to grasp because of some subtle memory optimization strategies (including Andrew’s favourite pet data structure, std.AutoArrayHashMapUnmanaged(void, void)). You don’t need a deep understanding of the InternPool implementation to get a basic understanding of Sema.
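If interning is unfamiliar, the core idea fits in a few lines. This is a toy with hypothetical names (ToyPool, Key, Index), not the real InternPool API: the real one uses the (void, void) map with a custom adapter and packs keys into dense arrays, whereas this sketch just stores the keys in the map directly. The property that matters is the same: equal values always get the same index, so comparing two types or values is an integer comparison.

```zig
const std = @import("std");

// Handle to an interned item; cheap to copy and compare.
const Index = enum(u32) { _ };

// What can be interned in this toy: two kinds of types.
const Key = union(enum) {
    int_type: u16, // unsigned integer type with this many bits
    ptr_type: Index, // pointer to an already-interned type
};

const ToyPool = struct {
    // An array hash map keeps insertion order, so an entry's position
    // in the map doubles as its Index.
    map: std.AutoArrayHashMapUnmanaged(Key, void) = .{},

    fn deinit(pool: *ToyPool, gpa: std.mem.Allocator) void {
        pool.map.deinit(gpa);
    }

    // Return the index for `key`, inserting it if it is not present yet.
    fn get(pool: *ToyPool, gpa: std.mem.Allocator, key: Key) !Index {
        const gop = try pool.map.getOrPut(gpa, key);
        return @enumFromInt(@as(u32, @intCast(gop.index)));
    }

    // Recover the full description from a handle.
    fn indexToKey(pool: *const ToyPool, index: Index) Key {
        return pool.map.keys()[@intFromEnum(index)];
    }
};

test "equal keys intern to the same index" {
    const gpa = std.testing.allocator;
    var pool: ToyPool = .{};
    defer pool.deinit(gpa);

    const u32_a = try pool.get(gpa, .{ .int_type = 32 });
    const u8_ty = try pool.get(gpa, .{ .int_type = 8 });
    const u32_b = try pool.get(gpa, .{ .int_type = 32 });
    const ptr = try pool.get(gpa, .{ .ptr_type = u32_a });

    try std.testing.expect(u32_a == u32_b);
    try std.testing.expect(u32_a != u8_ty);
    try std.testing.expect(pool.indexToKey(ptr).ptr_type == u32_a);
}
```

Note how ptr_type refers to its child type by Index rather than by pointer: items only ever reference other items through these small handles, which is a large part of why the data stays compact.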
If you have a debug build of the compiler, you can dump the ZIR for any file using zig ast-check -t foo.zig. You can dump all AIR emitted by Sema for a compilation by passing --verbose-air to the build-{exe,lib,obj} command (be warned: there’ll probably be a lot, so you’ll want to pipe it to a file). If you have any specific questions, let me know and I’ll answer them as best as I can. Happy hacking!