Emit "monomorphized" zig code

Hello, the title probably doesn’t make much sense, as I don’t think Zig calls it monomorphization. Lately I’ve been playing around with comptime (really enjoying this take on metaprogramming). I don’t understand the inner workings of the compiler yet, but I’m guessing there is a step for interpreting the compile-time parts of a Zig program and emitting the “monomorphized” version of it? If so, is it possible to look at the code during this compilation step, i.e. via a compile option for zig build-obj? Something like this could help a lot with learning, as you get to see what your comptime code gets turned into.

I did look through the options for the compiler, and you can obviously emit assembly and LLVM IR, but I don’t think I found anything for Zig-specific intermediate representations you’d get further up the pipeline (I may be blind though). Thanks in advance, and happy new year.

6 Likes

The pipeline of Zig code looks something like this:

  1. Source code
  2. AST (Abstract Syntax Tree)
  3. ZIR (Zig Intermediate Representation)
  4. AIR (Analyzed Intermediate Representation)
  5. Backend-specific logic (LLVM backend, x86-64 self-hosted backend, etc.)

Comptime evaluation happens in the transition from ZIR to AIR, using a process called Sema. So, it sounds like AIR is what you’re looking for.
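As a rough illustration of what "comptime evaluation during Sema" means in practice, here is a minimal sketch. The helper name sumTo is made up for this example. Because the call is marked comptime, it is evaluated during semantic analysis, so the runtime code for main only ever sees the resulting constant rather than the loop.

```zig
const std = @import("std");

// Hypothetical helper, used only for illustration.
fn sumTo(n: u32) u32 {
    var total: u32 = 0;
    var i: u32 = 1;
    while (i <= n) : (i += 1) total += i;
    return total;
}

pub fn main() void {
    // `comptime` forces this call to be evaluated at compile time;
    // what remains for runtime is just the constant 55.
    const total = comptime sumTo(10);
    std.debug.print("{d}\n", .{total});
}
```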

It is possible to look at a textual representation of ZIR and AIR using the following two commands, but these commands only work on a debug build of the compiler, so you will need to build one from source to run them:

  • ZIR: zig ast-check -t file.zig
  • AIR: zig build-obj --verbose-air file.zig

For comparison, here’s a simple main function:

const std = @import("std");

pub fn main() !void {
    std.debug.print("Hello, world!", .{});
}

And here are the ZIR and AIR for the function, as of Zig 0.12.0-dev.1836+dd189a354:

ZIR
  [60] pub main line(2) hash(b34a5dd1dd5f828d0d25f54718b526c5): %4 = block_inline({
    %23 = func_inferred(ret_ty=@void_type, inferror, body={
      %5 = block({
        %6 = dbg_block_begin()
        %7 = dbg_stmt(2, 5)
        %8 = decl_ref("std") token_offset:4:5 to :4:8
        %9 = dbg_stmt(2, 8)
        %10 = field_ptr(%8, "debug") node_offset:4:5 to :4:14
        %11 = dbg_stmt(2, 14)
        %12 = dbg_stmt(2, 20)
        %13 = field_call(nodiscard .auto, %10, "print", [
          {
            %14 = str("Hello, world!")
            %15 = break_inline(%13, %14)
          },
          {
            %16 = struct_init_empty_result(%13) node_offset:4:38 to :4:41
            %17 = break_inline(%13, %16)
          },
        ]) node_offset:4:5 to :4:42
        %18 = dbg_block_end()
        %19 = restore_err_ret_index(%5.none)
        %20 = break(%5, @void_value)
      }) node_offset:3:21 to :3:21
      %21 = restore_err_ret_index(.none.none)
      %22 = ret_implicit(@void_value) token_offset:5:1 to :5:1
    }) (lbrace=1:21,rbrace=3:1) node_offset:3:1 to :3:7
    %24 = break_inline(%4, %23)
  }) node_offset:3:1 to :3:20
AIR
  %0!= save_err_return_trace_index()
  %2!= dbg_block_begin()
  %3!= dbg_stmt(2:20)
  %4!= call(<fn (@TypeOf(.{})) void, (function 'print__anon_2997')>, [@Air.Inst.Ref.empty_struct])
  %5!= dbg_block_end()
  %7!= ret(<@typeInfo(@typeInfo(@TypeOf(test.main)).Fn.return_type.?).ErrorUnion.error_set!void, {}>)

In particular, in this example, you can see how the generic call to std.debug.print has been monomorphized down to a call to print__anon_2997.
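The same thing happens with your own generic functions. The sketch below uses a made-up function called larger; each distinct comptime argument (here u32 and f64) should get its own analyzed instance, so dumping AIR for a program like this would be expected to show two separate function bodies. The exact names the compiler gives those instances are not something I'd rely on.

```zig
const std = @import("std");

// Hypothetical generic function, for illustration only.
fn larger(comptime T: type, a: T, b: T) T {
    return if (a > b) a else b;
}

pub fn main() void {
    // Two distinct instantiations: one for u32, one for f64.
    const x = larger(u32, 3, 7);
    const y = larger(f64, 1.5, 0.5);
    std.debug.print("{d} {d}\n", .{ x, y });
}
```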

11 Likes

What a great write-up, thank you. I’m hoping to one day get to hack on the compiler, so getting an overview like this is very helpful. I’ll be honest, I was hoping comptime was processed in its own step into something that looked more like familiar Zig code, so that was a bit of a bummer.

Still, both of these IRs look interesting. AIR looks very similar to the format that compiler error messages come in, and seems like it could be useful for inspecting the (expanded) types that are passed around in your program (you can obviously already do that with all the built-in reflection Zig has, but maybe it would save you writing all those print statements). Both look very similar to LLVM IR.
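For reference, the reflection route mentioned here can be fairly short. This is only a sketch: Pair is a made-up generic type, and it uses @typeName together with std.meta.fields to print the expanded type and its fields.

```zig
const std = @import("std");

// Hypothetical generic type, for illustration only.
fn Pair(comptime T: type) type {
    return struct { first: T, second: T };
}

pub fn main() void {
    const P = Pair(u8);
    // Print the fully expanded type name, then each field with its type.
    std.debug.print("{s}\n", .{@typeName(P)});
    inline for (std.meta.fields(P)) |field| {
        std.debug.print("  {s}: {s}\n", .{ field.name, @typeName(field.type) });
    }
}
```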

The issue is getting a debug build of the compiler, though. If I understand correctly you need LLVM 17 to build Zig from source, and my package manager is still on LLVM 16. I’m sure you can get pre-compiled binaries for LLVM from somewhere, but I lost motivation and didn’t try too hard at this point :frowning:

Either way, thanks for the detailed reply

2 Likes

If I understand correctly you need LLVM 17 to build Zig from source

It’s actually possible to build Zig without linking to LLVM at all! Such a build uses Zig’s own code generation backends rather than LLVM to emit binaries, the most complete of which is the C backend (utilized in Zig’s own bootstrap chain). Regardless, you don’t care about codegen here: the point is just that you don’t need to link to LLVM to look at the previous bits of the pipeline.

Linking LLVM is actually an opt-in process. If you have an up-to-date system installation of Zig, you should be able to get a non-LLVM debug compiler just by running zig build in the compiler’s source tree. Then you can use zig ast-check -t foo.zig to dump the ZIR for foo.zig, and you can run a compilation with --verbose-air to dump all AIR (beware, however, that there’ll be a lot, particularly if you don’t override the panic handler).
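To cut down the AIR output, a minimal panic override might look like the sketch below. This assumes the root-level panic function signature from the Zig 0.11/0.12 era (the version discussed in this thread); newer releases have changed this interface, so treat it as an assumption to check against your compiler version.

```zig
const std = @import("std");

// Assumption: Zig 0.11/0.12-era signature for overriding the panic handler
// from the root source file. Later versions use a different mechanism.
pub fn panic(msg: []const u8, trace: ?*std.builtin.StackTrace, ret_addr: ?usize) noreturn {
    _ = msg;
    _ = trace;
    _ = ret_addr;
    // Spin forever instead of pulling in the default handler's
    // stack-trace printing code.
    while (true) {}
}

pub fn main() void {}
```

With the default handler out of the way, much less of the standard library should need to be analyzed, so the dump stays closer to the code you actually wrote.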

9 Likes