How do I know when return value optimization is possible?

Idiomatic init() functions return the new object by value. I assume the copy is elided in some cases, but apparently not always. For example, here…

const print = @import("std").debug.print;

pub const BigStruct = struct {
    data: [16]u64,
    pub fn init(v: u64) BigStruct {
        const f: BigStruct = .{ .data = .{v} ** 16 };
        print("{*}\n", .{&f});
        return f;
    }
};

pub fn main() void {
    const f: BigStruct = .init(42);
    print("{*}\n", .{&f});
}

…two different addresses are printed, which means the whole object was unnecessarily copied.

For cases like this, it’s probably better to have an in-place init function (fn init(self: *BigStruct, v: u64) void), but how can I tell when this is necessary, i.e. when copy elision is not possible?
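The in-place alternative described in the question could look roughly like the sketch below. The name initInPlace is hypothetical (the question calls it init). The caller owns the storage and passes a pointer, so both printed addresses are the same by construction, in every optimization mode. The cost is that the caller has to declare a var initialized to undefined, and the result can no longer be a const.

```zig
const std = @import("std");

pub const BigStruct = struct {
    data: [16]u64,

    /// Hypothetical in-place initializer: writes through a pointer to
    /// storage owned by the caller instead of returning a new value.
    pub fn initInPlace(self: *BigStruct, v: u64) void {
        self.* = .{ .data = .{v} ** 16 };
        // self points at the caller's variable, so this is the caller's address.
        std.debug.print("{*}\n", .{self});
    }
};

pub fn main() void {
    var f: BigStruct = undefined;
    f.initInPlace(42); // equivalent to BigStruct.initInPlace(&f, 42)
    std.debug.print("{*}\n", .{&f});
}
```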

Why are return-by-value init functions so common in Zig, and used everywhere in the standard library, when in C and C++ you would normally initialize objects in place?


I’m guessing you compiled without optimizations.
Such a function wouldn’t even need RVO; it would probably get inlined.

Even with optimizations enabled it does seem to make a copy, which is weird. Maybe it’s because of the array? I’ve double-checked on Godbolt, and it does look that way.

The call to print in init is what breaks the return value optimization.

The following works:

pub const BigStruct = struct {
    data: [16]u64,
    pub fn init(v: u64) BigStruct {
        const f: BigStruct = .{ .data = .{v} ** 16 };
        return f;
    }
};

pub fn main() void {
    const f: BigStruct = .init(42);
    _ = f;
}

The only way to verify this is to look at the generated assembly:

        lea     rdi, [rbp - 128]
        mov     edx, 42
        call    example.BigStruct.init
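The working version above still binds the literal to a named const before returning it. A minimal sketch of a form that leans on Zig’s result location semantics instead: return the struct literal directly, so the literal’s result location is the function’s return value and there is no separate local whose address could escape. The trade-off is that there is no named object inside init whose address you could print. Whether the return slot is the caller’s f is still a codegen detail, so the assembly remains the way to confirm it.

```zig
const std = @import("std");

pub const BigStruct = struct {
    data: [16]u64,

    pub fn init(v: u64) BigStruct {
        // No named local: the literal is built in the function's result location.
        return .{ .data = .{v} ** 16 };
    }
};

pub fn main() void {
    const f: BigStruct = .init(42);
    std.debug.print("{*}\n", .{&f});
}
```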

Your Godbolt link still does not set -OReleaseFast. If you set the flag, you can easily see the OP’s code being fully inlined.


Yes, but even though it is inlining the function with -OReleaseFast, the struct is still copied:

        vmovups ymm0, ymmword ptr [rbp - 144]
        vmovups ymm1, ymmword ptr [rbp - 112]
        vmovups ymm2, ymmword ptr [rbp - 80]
        vmovups ymm3, ymmword ptr [rbp - 48]
        vmovups ymmword ptr [rbp - 176], ymm3
        vmovups ymmword ptr [rbp - 208], ymm2
        vmovups ymmword ptr [rbp - 240], ymm1
        vmovups ymmword ptr [rbp - 272], ymm0

I think it is because of this issue: ziglang/zig #2765 on GitHub, “result location: ability to refer to the return result location before the `return` statement”.


I’m not sure that’s the issue. It looks like the external call to print is throwing the optimizer off. If we replace it with a function whose body the compiler can fully see, everything behaves as expected (godbolt):

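const print = @import("std").debug.print;

pub const BigStruct = struct {
    data: [16]u64,
    pub fn init(v: u64) BigStruct {
        const f: BigStruct = .{ .data = .{v} ** 16 };
        const g = f.useBigStruct();
        return g;
    }

    pub fn useBigStruct(b: @This()) @This() {
        var c = b;
        for (&c.data) |*p| {
            p.* += 1;
        }
        return c;
    }
};

// main unchanged from the question
pub fn main() void {
    const f: BigStruct = .init(42);
    print("{*}\n", .{&f});
}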

The optimized output just broadcasts the value and stores it straight into the 128-byte struct:

        vbroadcastsd    ymm0, qword ptr [rip + .LCPI3_0]
        vmovups ymmword ptr [rbp - 256], ymm0
        vmovups ymmword ptr [rbp - 224], ymm0
        vmovups ymmword ptr [rbp - 192], ymm0
        vmovups ymmword ptr [rbp - 160], ymm0

The same poorly optimized code is generated for equivalent C code compiled with GCC (godbolt).
It could be that the compiler assumes print might modify the BigStruct through the pointer it receives, even though the struct is declared const.
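One way to probe that hypothesis on Godbolt is a variant of the original init that still prints something but never lets the address of f escape: it passes a single field by value instead of &f. This is only an experiment sketch. It removes the pointer escape blamed above, but the language does not promise the copy then disappears, so the assembly is still what settles it.

```zig
const std = @import("std");

pub const BigStruct = struct {
    data: [16]u64,

    pub fn init(v: u64) BigStruct {
        const f: BigStruct = .{ .data = .{v} ** 16 };
        // Only a u64 is handed to print, so &f never reaches a non-inlined call.
        std.debug.print("{d}\n", .{f.data[0]});
        return f;
    }
};

pub fn main() void {
    const f: BigStruct = .init(42);
    std.debug.print("{*}\n", .{&f});
}
```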
