Zig compiler optimisation in action

When I program in C++, I often like to break up long functions by introducing local functions in the form of lambdas, especially when code fragments need to be repeated. I prefer local helper functions because they don’t pollute the parent’s namespace and are directly proximate to their uses. I’ve always trusted optimising compilers to see through constructs that look like function calls at the language level but are really just a way to structure code that ends up inlined.

So I do miss direct language support for lambdas in Zig. But Zig does offer local static functions, so I put a simple one into Compiler Explorer to see if they are optimised similarly. And then I discovered something had changed for the better since 0.13.0, but I’m not sure what part of the compiler is responsible for the improvement.

export fn square(num: i32) i64 {
    const S = struct {
        pub fn op(a: i32, b: i32) i64 {
            return a * b;
        }
    };
    return S.op(num, num);
}

generates the following assembly:

square:
        mov     eax, edi
        imul    eax, eax
        ret

which is perfect. So I wanted to compare it to the non-lambda version:

export fn square2(num: i32) i64 { return num * num; }

but I was outsmarted by Zig trunk:

square2:
        jmp     square

In Zig 0.13.0, we get

square2:
        mov     eax, edi
        imul    eax, eax
        ret

I realise this is not earth-shattering, but I thought it was interesting that there’s some compiler optimisation introduced between 0.13.0 and trunk that realises square and square2 are effectively the same set of operations. Possibly this is an LLVM improvement?

Here’s the godbolt link: Compiler Explorer

Now I can be care-free creating heaps of local functions. I don’t like the extra level of indentation caused by the wrapping struct, but I do like the optimisations.


You will get the same optimization on ReleaseSmall with 0.13.0 or even 0.11.0. I think it’s just included in ReleaseFast now.


It’s LLVM that is doing it.
This is called function folding, and has existed for a while.
I think the reason LLVM didn’t fold it in 0.13 was that it decided the jmp was not worth it. It can’t elide the function entirely here, because you exported both of them, so even if it realizes that a function already exists, it only has two options: either inline it or redirect it. The thresholds for these kinds of decisions change all the time. The jmp adds extra time to the function, because it could have started the computation immediately. It also requires prediction and it may require loading an extra cache line.


There’s no need to carry C++ habits into Zig code. It isn’t idiomatic to put a function like that in another function, wrapped in a struct, except when something about the function needs to be comptime-configured using parameters of the parent function.
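
A minimal sketch of the case where the embedded struct is justified: the helper depends on a comptime parameter of the parent function. The names `scaleAll`, `S.scale` and `factor` are made up for illustration and are not from the original post.

```zig
const std = @import("std");

// Hypothetical example: `factor` is a comptime parameter of the parent,
// and the local helper refers to it, so it has to live inside the parent.
fn scaleAll(comptime factor: i32, values: []i32) void {
    const S = struct {
        fn scale(x: i32) i32 {
            // `factor` is comptime-known, so the nested function can use it.
            return x * factor;
        }
    };
    for (values) |*v| v.* = S.scale(v.*);
}

test "local helper configured by a comptime parameter" {
    var data = [_]i32{ 1, 2, 3 };
    scaleAll(3, &data);
    try std.testing.expectEqualSlices(i32, &[_]i32{ 3, 6, 9 }, &data);
}
```

Here a reader scanning for the reason behind the nested namespace finds it straight away: `scale` cannot be hoisted to file scope without also passing `factor` along.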

It’s easy to create as many struct types for namespace containers as you find useful to organize code, so private helper functions for some main function can all be organized in a container, rather than defined inside the function.

// Already inside a containing namespace:
pub usingnamespace struct {
    // This will be added to your parent namespace
    pub fn somePublicFunction(...) T {}

    // This remains private
    fn someHelperFunction(...) T {}
};

That is itself a bit unusual, but if you want, say, to have several helper functions with the same name applied to different public functions, this works to get that.
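
For comparison, here is a sketch of the plainer route, with one namespace container per public function and no `usingnamespace`. The names `area_impl`, `perimeter_impl`, `op`, `area` and `perimeter` are all hypothetical, chosen only to show two helpers sharing a name.

```zig
const std = @import("std");

// Hypothetical helper container for `area`. `op` is not `pub`,
// so it is only reachable from within this file.
const area_impl = struct {
    fn op(a: i32, b: i32) i32 {
        return a * b;
    }
};

// A second container can reuse the helper name `op` without clashing.
const perimeter_impl = struct {
    fn op(a: i32, b: i32) i32 {
        return 2 * (a + b);
    }
};

pub fn area(w: i32, h: i32) i32 {
    return area_impl.op(w, h);
}

pub fn perimeter(w: i32, h: i32) i32 {
    return perimeter_impl.op(w, h);
}

test "same helper name in two containers" {
    try std.testing.expectEqual(@as(i32, 12), area(3, 4));
    try std.testing.expectEqual(@as(i32, 14), perimeter(3, 4));
}
```

The helpers stay out of the public API because they are not marked `pub`, and each public function still has its own clearly named place for private code.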

Embedding them in an extraneous struct when you don’t need to will confuse other people, including you later, because of the convention of only embedding functions-within-functions if there’s a need for comptime stuff to happen. If I saw that I would scan for what comptime parameter justifies the embedded namespace, and then be confused if I didn’t find it.

It’s better to go with the grain when learning a new language. In C++ that’s lambdas; in Zig, it isn’t.
