Inlining functions

In the functions section of the language reference there’s an example of inlining a function using callconv(.Inline). I’ve also seen inline fn used in existing Zig code.

  • Is there a difference?
  • Is one preferred over the other?
  • Is it really necessary or is it best to let the compiler decide when to inline a function?
  • Is there a difference?

No, there seems to be no difference.

  • Is one preferred over the other?

I prefer using inline fn because I think it’s more readable.
It also seems to be the preferred version in the compiler source code and standard library.
There are over 400 uses of inline fn and only 4 uses of callconv(.Inline).
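
For reference, the two spellings look like this (a minimal sketch with a made-up function; both request that every call be inlined):

inline fn addOne(x: u32) u32 {
    return x + 1;
}

fn addOneAlt(x: u32) callconv(.Inline) u32 {
    return x + 1;
}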

  • Is it really necessary or is it best to let the compiler decide when to inline a function?

Generally the LLVM optimizer is pretty good at inlining functions.
However, manual function inlining already happens during semantic analysis and is independent of the build mode.
So manual inlining can improve the runtime of debug builds.
Additionally, it might improve compile time if the function is small enough or only used once.

However, I would recommend only inlining manually if you have a good reason. That said, I guess it generally wouldn’t hurt to mark tiny functions as inline.

By the way, there is also a third way to inline functions:

@call(.always_inline, function, parameterTuple);

This allows you to decide at the call site whether a function should be inlined.
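
A minimal usage sketch (the add function below is made up); only this particular call is forced inline, while other call sites of add are left to the optimizer:

const std = @import("std");

fn add(a: u64, b: u64) u64 {
    return a + b;
}

pub fn main() void {
    // force this one call to be inlined at the call site
    const sum = @call(.always_inline, add, .{ 1, 2 });
    std.debug.print("{d}\n", .{sum});
}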


It’s best to let the compiler decide when to inline a function, except for these scenarios:

  • You want to change how many stack frames are in the call stack, for debugging purposes
  • You want the comptime-ness of the arguments to propagate to the return value of the function
  • Performance measurements demand it. Don’t guess!

Otherwise you actually end up restricting what the compiler is allowed to do when you use inline, which can harm binary size, compilation speed, and even runtime performance.


I only inline functions if they consist mostly of inline assembly, or if they’re really small, things like a single return statement that computes a value. Otherwise I try to avoid inlining. I’d say that’s good advice in general.

Unfortunately, the documentation also currently does not elaborate on this restriction:

Note that inline actually restricts what the compiler is allowed to do. This can harm binary size, compilation speed, and even runtime performance.

Are there good examples of what the compiler is allowed to do with a normal fn that it is not allowed to do with an inline fn?

yeah, it’s allowed to not inline it.


Somewhat on a tangent, I think the compiler ought to issue a warning when the user-supplied main function gets optimized down to nothing. Take the following code for example:

const call_count = 100_000_000;

fn add(a: u64, b: u64, c: u64) u64 {
    return a + b + c;
}

var sum: u64 = undefined;

pub fn main() void {
    var i: u64 = 0;
    while (i < call_count) : (i += 1) {
        sum = @call(.never_inline, add, .{ i + 1, i + 2, i + 2 });
    }
}

If you compile it on godbolt with ReleaseSmall or ReleaseFast, you’ll notice the absence of main in the output. The function gets tossed out due to the lack of side effects. This could be quite misleading when someone’s trying to benchmark something.

That’s exactly the type of function the compiler will inline for you, the ones you don’t need to help it with. It’s like saying you only help when it already knows the answer.

Just yesterday I was godbolting two versions of a function (std.mem.eql has some weird code paths in it that look inefficient), and I had to use constant slices from main. I tagged both functions as noinline, so it didn’t inline the functions, but since the inputs were constants it still constant-folded them down to just a zeroing of eax and a ret. I could not get it to stop constant folding lol.

So even if the function isn’t tossed out entirely, there are still things like that which should be warned about. I have more problems with constant folding throwing away code than with the optimization passes getting rid of it all. std.mem.doNotOptimizeAway doesn’t prevent constant folding, which is a pain when trying to benchmark something.
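
For context, a rough sketch of how doNotOptimizeAway is typically used in a micro-benchmark loop; as noted, it keeps the result from being discarded as dead code but does not stop constant folding of comptime-known inputs:

const std = @import("std");

pub fn main() void {
    var sum: u64 = 0;
    var i: u64 = 0;
    while (i < 1_000_000) : (i += 1) {
        sum +%= i;
        // pretends to "use" sum so the loop body isn't eliminated as dead code
        std.mem.doNotOptimizeAway(sum);
    }
}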

std way over-inlines and over-unrolls, horribly at times. But people often get it backwards when they do it themselves: they inline the wrong function all the time and hurt performance. Take this example:

pub fn outer(x: u32) u32 {
    if (x == 0)
        return inner(x);
    return x * 2;
}

inline fn inner(x: u32) u32 {
    // something complex like allocation or a syscall
    return x; // placeholder for the complex slow path
}

Even though inner is only used in that one location, you do not want it inlined at that level.
And I’ve seen this a few times. If you force the inline here, the compiler probably can’t inline outer into its caller (nor would you want it to at that point).

You really want outer to inline, not inner. If inner happens rarely (think growing a data structure) you might even want to flag it as noinline, literally the exact opposite. If you don’t inline inner, the compiler is then free to inline outer, so you can avoid a function call and keep your hot path small (which will allow other inlines further out).

I use noinline (setting the path cold might do the same thing) on those inner complex functions when I have a performance-sensitive piece of code I want to keep small for the common case and don’t mind the function call for the complex case (recently on a shared data structure where I have to take a bunch of locks to grab more data, but that happens rarely, so I bury it behind an if and noinline it, which allows outer to be extremely fast and optimize very well).
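
A minimal sketch of that shape (hypothetical names; the real slow path would do the locking or allocation work):

pub fn outerFast(x: u32) u32 {
    if (x == 0) return coldPath(x); // rare, complex case forced out of line
    return x * 2; // the hot case stays tiny, so outerFast itself can be inlined
}

noinline fn coldPath(x: u32) u32 {
    // stand-in for the rare, complex work (locking, allocation, syscalls)
    var acc: u32 = x;
    var i: u32 = 0;
    while (i < 100) : (i += 1) acc +%= i;
    return acc;
}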

So don’t use inline. If you are thinking it’s a small function that should be inlined, the compiler already knows that. And for the complex functions, it has more optimization information than you do most of the time. The onus should be on the dev to show that inline is needed, not to default to inline and then be shown it isn’t needed. I would suggest documenting why you are force-inlining a function. Zig’s inlining is semantic and can change behavior and types (helping the type analysis see through a function is one of the few legitimate uses I can think of, but it should be documented, because how it affects types and behavior can get very confusing).

(I have another rant about over-unrolling and trashing the loop stream detector, but that’s for another thread)

  • You want the comptime-ness of the arguments to propagate to the return value of the function

For an example of this, see std.mem.readInt.
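
In miniature, the effect looks something like this (a hypothetical sketch): because the function is inline and its argument is comptime-known at the call site, the result is comptime-known too and can be used where comptime values are required, such as an array length:

inline fn double(x: usize) usize {
    return x * 2;
}

test "comptime-ness propagates through an inline call" {
    const n = double(4); // comptime-known argument, so n is comptime-known
    const buf: [n]u8 = undefined; // array lengths require a comptime-known value
    _ = buf;
}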

Okay, that was more obvious than I thought it would be :sweat_smile: