Const, Var, and Comptime Compiler Optimizations

Overview

The Zig compiler can use comptime-known information to produce optimized assembly code. These optimizations can involve (but are not limited to) removing load/store instructions and pre-calculating results.

Example 1: Const vs Var Integers

In this example, we’ll calculate the square of a value x and return it to the user using different qualifiers for x. This example does not have any compiler optimizations applied.

We begin with var:

var x: i32 = 24;

export fn foo() i32 {
    return x * x;
}
foo:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     eax, dword ptr [example.x]
        imul    eax, dword ptr [example.x]
        mov     dword ptr [rbp - 4], eax
        seto    al
        jo      .LBB0_1
        jmp     .LBB0_2

We can see that example.x is loaded and a multiplication operation occurs.

Now for const:

const x: i32 = 24;

export fn foo() i32 {
    return x * x;
}
foo:
        push    rbp
        mov     rbp, rsp
        mov     eax, 576
        pop     rbp
        ret

Here we see that the direct value 576 (which is the square of 24) has been pre-computed and any reference to example.x has been removed.

Example 2: Comptime Keyword

Comptime-known information can also be subject to optimizations similar to those in Example 1. In this case, we will write a function that takes a comptime parameter and observe similar results. To begin, no compiler optimizations are applied:

fn bar(comptime x: i32) i32 {
    return x * x;
}

export fn foo() i32 {
    return bar(11) * bar(12);
}

This generates 3 segments of interest:

example.bar__anon_861:
        push    rbp
        mov     rbp, rsp
        mov     eax, 121
        pop     rbp
        ret

example.bar__anon_862:
        push    rbp
        mov     rbp, rsp
        mov     eax, 144
        pop     rbp
        ret
foo:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        call    example.bar__anon_861
        mov     dword ptr [rbp - 8], eax
        call    example.bar__anon_862
        mov     ecx, eax
        mov     eax, dword ptr [rbp - 8]
        imul    eax, ecx
        mov     dword ptr [rbp - 4], eax
        seto    al
        jo      .LBB0_1
        jmp     .LBB0_2

Without additional optimization flags, we can see that both bar(11) and bar(12) behave similarly to our first example, while foo performs call and multiply operations.

With ReleaseFast we see this:

foo:
        mov     eax, 17424
        ret

All calls to bar were eliminated and a single pre-calculated value is returned. In fact, bar does not even appear in the assembly output. This may be due to factors such as inlining.
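One way to make inlining explicit, rather than leaving it to the optimizer's heuristics, is Zig's inline keyword. The following is a hypothetical variant of bar (not from the original example) where the language guarantees the call is inlined at the call site, and comptime-known arguments are propagated into the inlined body:

```zig
// Sketch: `inline` guarantees the call is semantically inlined at
// the call site, so no `call` instruction is emitted for `bar`.
// Comptime-known arguments (11 and 12 here) are propagated into
// the inlined body.
inline fn bar(x: i32) i32 {
    return x * x;
}

export fn foo() i32 {
    return bar(11) * bar(12); // 121 * 144 = 17424
}
```

Note that inline forces inlining even in Debug builds; whether the multiply is then fully constant-folded still depends on the backend.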

The comptime keyword can also be used in front of a function call to create an effect similar to using comptime parameters, without generating anonymous function instantiations (no compiler optimizations are applied):

fn bar(x: i32) i32 {
    return x * x;
}

export fn foo() i32 {
    return comptime bar(11) * bar(12);
}
foo:
        push    rbp
        mov     rbp, rsp
        mov     eax, 17424
        pop     rbp
        ret

Example 3: Struct of Integers

In this example, we’ll make a user-defined type with integers and see if it also picks up similar optimizations based on const vs var. We begin with var and no additional optimizations.

var data: struct { x: i32, y: i32 } = .{ .x = 41, .y = 42 };

export fn foo() i32 {
    return data.x * data.y;
}
foo:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     eax, dword ptr [example.data]
        imul    eax, dword ptr [example.data+4]
        mov     dword ptr [rbp - 4], eax
        seto    al
        jo      .LBB0_1
        jmp     .LBB0_2

Here, we see a similar pattern, with both integer members involved in a multiplication instruction.

With const:

const data: struct { x: i32, y: i32 } = .{ .x = 41, .y = 42 };

export fn foo() i32 {
    return data.x * data.y;
}
foo:
        push    rbp
        mov     rbp, rsp
        mov     eax, 1722
        pop     rbp
        ret

All references to our data struct have been removed and the result has been pre-calculated. This shows that similar optimizations are applied to user-defined types as well as fundamental types.


I’m planning on expanding this Doc with more types besides integers. I want to also include strings and user-generated structs. I figured we’ll start here and if anyone has additional ideas then we can add them.

At some point, we’re going to do SIMD documentation, so this Doc is really about the effect of const and var.

My understanding is that in Zig all function calls are considered runtime calls unless they are explicitly marked comptime at the call site, the function’s return type is a comptime-only type (e.g. type), or the function is inline.

Edit: The above is talking about my understanding of the stage of semantic analysis. Later stages can figure out that a function call doesn’t depend on runtime state and optimize it out.
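To illustrate the return-type case: a function whose return type is type (a comptime-only type) can only ever be evaluated during semantic analysis, so calling it is never a runtime call. A small sketch (the names here are made up for illustration):

```zig
// `Pair` returns a `type`, which exists only at compile time, so
// every call to `Pair` must be evaluated during semantic analysis;
// it can never be a runtime call.
fn Pair(comptime T: type) type {
    return struct { a: T, b: T };
}

export fn foo() i32 {
    // The struct type itself leaves no runtime trace; only the
    // field loads and the multiply can exist at runtime.
    const p: Pair(i32) = .{ .a = 41, .b = 42 };
    return p.a * p.b;
}
```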

The same foo assembly is generated in ReleaseFast regardless of whether bar has its parameter marked as comptime. My guess is that the optimization eliminating the calls to bar is done at the level of LLVM since I cannot reproduce the optimization when passing -fno-llvm unless I explicitly mark the computation comptime:

fn bar(x: i32) i32 {
    return x * x;
}

export fn foo() i32 {
    return comptime bar(11) * bar(12);
}

Interesting - I think the comptime keyword should be noted for its optimization behaviour in this case, too. We should probably add that example to the doc. Good insight 🙂

That said, that’s the assembly that godbolt handed back. It was able to see through the call with the comptime parameters, probably because it noticed that each call always returns the same value. It was totally eliminated when I tried it. You’re probably right though - that’s probably an LLVM thing.

My understanding was that the compiler is free to decide to inline a function if it thinks it is a good idea (probably based on some heuristics).
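Per-call-site control over this also exists: the @call builtin takes a call modifier, so a single call can be forced inline without marking the whole function inline. A sketch reusing the bar from earlier:

```zig
fn bar(x: i32) i32 {
    return x * x;
}

export fn foo() i32 {
    // `.always_inline` forces these particular calls to be inlined,
    // independent of the optimizer's heuristics; other calls to
    // `bar` elsewhere are unaffected.
    return @call(.always_inline, bar, .{11}) * @call(.always_inline, bar, .{12});
}
```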

I was under the impression that the non-llvm backends don’t have many optimizations yet, so maybe this is just a status quo thing and in the future it might optimize as well (at least when you aren’t building for unoptimized hot-code-reloading speed).


Yeah, maybe I didn’t phrase that right. On the path to code generation a lot of stuff can be inlined and optimized out. I was just talking about comptime code evaluation, not what the compiler decides to do after that. And I still might be incorrect, I’m just remembering the answers a Zig maintainer gave to some questions I had on discord about comptime calls.


Either way, your comment about comptime in front of the call is still valuable, so we should try to work that in.

True, inlining could be a factor here as well.
