Pass by value semantics

I’m currently a bit confused about pass by value semantics in Zig 0.16.0.

From docs: Documentation - The Zig Programming Language

Primitive types such as Integers and Floats passed as parameters are copied, and then the copy is available in the function body. This is called “passing by value”. Copying a primitive type is essentially free and typically involves nothing more than setting a register.

Structs, unions, and arrays can sometimes be more efficiently passed as a reference, since a copy could be arbitrarily expensive depending on the size. When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster. This is made possible, in part, by the fact that parameters are immutable.

Immutable has a very strong meaning in the compiler world: in particular, it means that nothing can change the value. So calling an external function magic() can’t modify the argument. This is not true for an a: *const A argument, because magic() may hold a mutable version of the pointer and modify a. This is quite important for optimizations to happen properly in the face of extern calls.

You can see this in the following snippet: Compiler Explorer
Despite the extern call, the check is moved outside of the loop and the loop is unrolled twice.
This is great: optimization happens despite the extern call.

But as soon as you introduce the len parameter in the math function, the optimization is lost, and on top of that A is copied with a memcpy.

Given that pass-by-value arguments are immutable, why is Zig introducing a memcpy?

My understanding was that the memcpy in 0.15.2 was a stopgap for the PRO footgun, but this was apparently removed in 0.16, according to @mlugg in “eliminate hidden pass-by-reference footguns · Issue #5973 · ziglang/zig · GitHub”.

This was PRO, which was removed. The doc is outdated.

I don’t think the len parameter had anything to do with the change in optimizations.

Maybe it’s me, but it looks like A is copied in both versions?

        mov     edx, 8200
        call    memcpy@PLT

This snippet is present regardless of which of the two lines is commented out.

As for the docs, I think they are suboptimally worded. First, they confuse language semantics with implementation strategy. Second, they paint a misleading picture of the actual implementation strategy. Opening with a difference between primitives and structs is a sure way to confuse the user.

I’d phrase it like this:

Parameters are passed by value: a function gets an independent copy of a parameter that can’t be modified externally:

litmus test example with a global var.
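
One possible shape for such a litmus test, as a minimal sketch rather than the wording the docs would actually use. The names Big, global, mutateGlobal and observe are hypothetical, and the expected value of 1 assumes a compiler that implements the value semantics described here; a compiler that silently passed a reference could observe 99 instead.

```zig
const std = @import("std");

// Hypothetical type, large enough that a compiler might prefer a hidden reference.
const Big = struct { data: [64]u64 };

var global: Big = .{ .data = [_]u64{1} ** 64 };

// Stands in for any external code that can reach the global.
fn mutateGlobal() void {
    global.data[0] = 99;
}

// `param` is a by-value parameter, so the write above must not show through it.
noinline fn observe(param: Big) u64 {
    mutateGlobal();
    return param.data[0];
}

test "a parameter is an independent copy" {
    global.data[0] = 1;
    // Value semantics require the value that was passed in...
    try std.testing.expectEqual(@as(u64, 1), observe(global));
    // ...while the global itself did change.
    try std.testing.expectEqual(@as(u64, 99), global.data[0]);
}
```

Running this with zig test checks the semantics only; it says nothing about whether the copy was made with a memcpy or eliminated.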

Of course, the implementation is free to follow the “as if” rule and eliminate the copy when that doesn’t change the program’s semantics and leads to faster code. Notably, most function calls are inlined completely, and there’s no parameter passing to speak of.

This phrasing omits the “const parameters enable optimizations” part, but I think that logical inference is incorrect. You could imagine a Zig where the semantics is a mutable var, but the implementation is the same as today. Basically, you write fn f([var] x: u32) void, and the compiler internally thinks about it as

fn ([const] x_original: u32) void {
    var x: u32 = x_original;
}

and that extra copy inside the function can be eliminated by the same logic which today checks for “useless var”.

That is, on the implementation level, keeping the parameter ABI const is helpful when passing arguments through multiple levels of calls, and, arguably, this is the right ABI for extern functions. So it is a performance bug in the C ABI that parameters are passed by reference and the caller is allowed to modify their memory, such that modifications remain visible to the callee. But, again, this is an implementation strategy concern, not a language semantics concern, even if aligning the two leads to a simpler compiler.

I think it was there because PRO was a big selling point for Zig. Describing it like you did, with the “as-if” rule, would undersell it, because that’s what every other compiler does. And it mentions const because it was also trying to sell const parameters.

My bad, I posted the wrong snippet. The size of the data array seems to matter with respect to inlining: when using [32]u64, the memcpy is elided when len is not used. (I updated the original post with the link.)

OK, I’ve tracked it down. The change happened between 0.9 and 0.10. Before 0.10, big structs were passed by immutable reference, but since 0.10 they are passed by copy, in the hope that LLVM does its magic. That is very brittle, as my example shows: increasing the struct size or adding a parameter breaks the optimization. My mental model seems quite outdated.

This seems to actively discourage passing by value in favor of passing by pointer.
That makes me a bit sad. For me, this was the power of Zig: having the advantage of Rust’s immutable references without paying the cost.

Are there plans to bring this back now that it’s harder to accidentally alias arguments and return values?

It’s more like memcpy is inlined, rather than elided:

        movups  xmm0, xmmword ptr [rdi]
        movaps  xmmword ptr [rbp - 176], xmm0
        movups  xmm0, xmmword ptr [rdi + 16]
        movaps  xmmword ptr [rbp - 240], xmm0
        movups  xmm0, xmmword ptr [rdi + 32]
        movaps  xmmword ptr [rbp - 112], xmm0

This is doing memcpy, one SSE register at a time.

Thanks.

I’ve confirmed with another example:

pub const A = struct {
    data: [1024]u64,

    noinline fn wrap2(a: A) u64 {
        return a.wrap1();
    }

    noinline fn wrap1(a: A) u64 {
        return a.math();
    }
    
    noinline fn math(a: A) u64 { 
        var res: u64 = 0;
        for (a.data[0..]) |x| {
            res += x;
        }
        return res;
    }
};

pub export fn math(a: *const A) u64 {
    return a.wrap2();
}

Every non-inlined call results in a memcpy. I find this pretty bad, to be honest.
Are there plans to improve this in Zig? Or are we stuck in the C world?

Parameter Reference Optimization (PRO) and Result Location Semantics (RLS) were intended to address this, but they didn’t work out. It turned out to be hard to find a set of rules that:

  • enables the compiler to pick by-reference or by-value on a case-by-case basis
  • is easy for a human to get right
  • is easy for the compiler to safety-check when the human gets it wrong
  • doesn’t require a “borrow checker” (complex flow-sensitive local type inference + verbose non-local annotations)

Note that a “borrow checker” might be necessary, but it is not sufficient here. Rust doesn’t do the optimal thing, because the way you specify “don’t care” parameter passing is via &T, but that actually guarantees, at the level of language semantics, that the address of T is observable and meaningful. The compiler can’t just turn this into by-value, in general.

For me, this is also a case where I feel Zig is not quite perfect: I want every “memcpy to avoid aliasing” to be explicit in the code, for both aesthetic and practical reasons (there were numerous bugs in TigerBeetle where we accidentally memcpyed kilobytes of data via implicit copies added by the compiler).

Carbon and Hylo are two languages which I think try to make a smarter choice here, but I don’t know the current state. Would appreciate a “Passing Arguments” PLT/compiler design post with a deep dive about what we know and what we don’t know about the topic!

(LLM translation warning: My posts on the Ziggit forum are basically based on machine translation, but this time the machine translation really feels somewhat terrible.)

I’ve given this issue a lot of thought, and I believe PRO is theoretically feasible. The current hurdles seem to stem primarily from LLVM’s limitations and the lack of a true “immutable pointer” concept in Zig, similar to what D has.

Under different targets, the parameter passing ABI for large structs varies. Windows, for instance, passes all large structs by reference. This forces the caller to temporarily copy the value to the stack and pass a pointer to that location. Linux takes a different approach, pushing large structs directly onto the stack.

While the Linux convention feels more elegant since it avoids redundant indirection, Windows’ approach is actually more flexible for optimization. The caller has much more context about whether the arguments themselves are mutable or aliased than the callee does. Therefore, leaving the decision of whether an extra stack copy is needed up to the caller makes sense.

If we ignore the target’s default ABI and look purely at callconv(.auto) for internal functions, we could theoretically adopt a Windows-like pass-by-reference convention for large structs. The caller could naturally apply aggressive optimizations, and the ideal semantic of “const parameters don’t require separate copies” could be fully realized.

However, I quickly hit a roadblock. If our argument originates from a *const T, the compiler cannot safely optimize it. It has to defensively assume the underlying value might be mutated due to aliasing. In practice, we frequently pass around *const T to indirectly reference const parameters, which immediately defeats the PRO optimization.

This happens because semantic information is lost when taking a pointer. What starts as a strictly immutable value within a lifetime degrades into a read-only view that might change under the hood. I understand the value of *const T for expressing read-only views, but true immutable semantics are lost here. This led me to D’s concept of immutable pointers, and I think officially introducing a similar concept to Zig would be helpful.
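
The difference between “read-only view” and “immutable value” can be seen in a small sketch. The names Point, bump and readTwice are hypothetical and only serve this illustration; the code assumes nothing beyond plain pointers without noalias, for which aliasing like this is allowed.

```zig
const std = @import("std");

const Point = struct { x: u64 };

fn bump(p: *Point) void {
    p.x += 1;
}

// `view` only promises that this function will not write through it.
// It does not promise that the pointee stays the same between the two reads.
fn readTwice(view: *const Point, writer: *Point) [2]u64 {
    const before = view.x;
    bump(writer); // may alias `view`
    const after = view.x;
    return .{ before, after };
}

test "a *const view can change under the hood" {
    var p: Point = .{ .x = 1 };
    const seen = readTwice(&p, &p);
    try std.testing.expectEqual(@as(u64, 1), seen[0]);
    try std.testing.expectEqual(@as(u64, 2), seen[1]);
}
```

Because the second read may legitimately differ from the first, a compiler that receives an argument through *const T has to reload or copy it rather than treat it as a fixed value.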

The approach above was my initial thought. Optimizations in that direction are purely semantic-driven and don’t rely on LLVM’s backend machinery. The benefit of semantics-driven development is that developers can predict whether a certain optimization will definitely occur. E.g., if we pass a const position as a parameter, developers can predict that there will definitely be no extra expensive copies here.

However, when I considered how to actually implement this within the LLVM pipeline, I hit a wall again.

LLVM is perfectly equipped to decide exactly “which parameters are better passed by reference and which should be passed by value”. However, if we leave the final ABI decision entirely to the backend, the frontend loses the ability to know the actual function signature, making semantic-driven optimizations impossible.

This led me to a second approach for the callconv(.auto) convention. What if the frontend lowers all internal function parameters to pointers, decorates them with readonly noalias? The caller then decides, based on the parameter’s mutability, whether to pass a pointer to the original data or a pointer to a temporary stack copy. From there, we let LLVM’s ArgumentPromotion pass do the heavy lifting. Because the function is internal, LLVM can safely rewrite its signature. If the optimizer determines that a parameter is better passed by value, it will automatically promote the pointer to a value and update all local call sites.
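
Written out by hand, the caller-side decision could look like the sketch below. This is only an illustration of the idea, not what the compiler emits: sumLowered is a hypothetical stand-in for the lowered form of a by-value fn sum(a: A) u64, and a *const pointer with Zig’s noalias keyword is used to approximate the readonly noalias attributes, which have no exact source-level equivalent.

```zig
const std = @import("std");

const A = struct { data: [1024]u64 };

// Hypothetical lowered form of `fn sum(a: A) u64`: the parameter becomes a
// read-only pointer that the callee may assume nothing else writes through.
fn sumLowered(noalias a: *const A) u64 {
    var res: u64 = 0;
    for (a.data) |x| res += x;
    return res;
}

test "caller picks between the original and a temporary copy" {
    // Immutable source: the caller can pass its address directly, no copy.
    const fixed: A = .{ .data = [_]u64{1} ** 1024 };
    try std.testing.expectEqual(@as(u64, 1024), sumLowered(&fixed));

    // Mutable source: the caller snapshots it first, then passes the snapshot.
    var changing: A = .{ .data = [_]u64{2} ** 1024 };
    changing.data[0] = 3;
    const snapshot: A = changing;
    try std.testing.expectEqual(@as(u64, 2049), sumLowered(&snapshot));
}
```

The copy is made only in the second case, which is where the caller cannot prove that the source stays unchanged for the duration of the call.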

This is what I can think of at the moment. I’m still curious about whether this is an optimal path forward or whether there are hidden pitfalls.

Having used D in the past, I am pretty skeptical about the usefulness of true immutability: relatively few things are immutable (nobody can write) as opposed to read-only (I can’t write), and of those which are, most are comptime. It also feels like we need something else here? We don’t need immutability, just a guarantee that the pointed to data won’t change while we are looking. That’s noalias, I think.

I think our usage of data types can generally be categorized into stateful and stateless. For stateful data, it is usually sufficient to use *T, and *const T represents the read-only nature of stateful data in a specific context. If a large amount of stateful type data is used, I can understand there might be doubts about the use of immutable pointers.

Our expectations for PRO activation mostly pertain to stateless data types. I believe such data should be const as much as possible and avoid using var whenever possible.

One counterargument might be that when such stateless type data is used in loops, it is difficult to avoid declaring it as mutable. However, I now use a paradigm in which by rewriting the loop as a state machine for its loop variable, I ensure that such data types always exist in const form:

const loop_start: Stateless = foo();
sw: switch(loop_start) {
    else => |bar| {
        if (is_loop_end(bar)) break :sw;
        ...
        continue :sw baz(bar);
    },
}

Out of curiosity, what about without LLVM, with the native (x86) backend - is there any known characteristic difference?

I second @matklad’s invitation to a “‘Passing arguments’ PLT/compiler design … deep dive” post by somebody in the core team.

It actually reads very well.

I’m intrigued by this, and look forward to more about it. Agreed, caller clarity on whether a temporary stack copy is required is in the spotlight, but callee clarity, for the purpose of designing a function signature that doesn’t result in surprises (copies or aliasing) is also essential. Am I stating the overly-obvious?

AFAIK, based on the last time I looked at the output generated by the native backend, it has no (or only very few) optimizations implemented. So every argument was copied naively, and the prologue stored all callee-saved registers regardless of whether they were used. Of course, there also wasn’t any function inlining that could reduce the overhead introduced by that naive saving and copying.

Later when optimizations are implemented there likely won’t be a big difference for such things compared to llvm.

I find that very naive, because what happens when optimizations are not enabled, as in debug builds? This would introduce a lot of unnecessary overhead and, I think, lead to the exact state C++ and Rust are in: they are basically unusably slow without optimizations because they rely so much on pervasive inlining (especially because of the use of tiny getters/setters in things like vector).

This makes sense.

The reason I advocate for “optimization guaranteed by semantics” is precisely to ensure that even if optimizations are not enabled, certain large object copies are not expected to happen. However, always using noalias readonly pointers undermines the performance of non-optimized scenarios from another perspective.

To address this, a possible compromise could be for the compiler frontend to only pass by value for the most conservative cases, such as parameters smaller than usize, while for larger ones, use readonly noalias pointer passing and rely on further optimizations from the backend.

This also makes sense.

As far as I understand this was basically what Zig did in some earlier versions and it seemed to cause some problems, which is why it isn’t done anymore.

Other cases in which this might be problematic are return value optimization and copy-elision passes when something like this is in the code:

process(calculate_matrix())
or
process(a, bunch).do(of, random).something(parameters).else()

This just makes the optimizer’s job harder. In my mind, the programmer should be responsible for accurately annotating where something should be passed by value or by const pointer.

The core team tried for years to make it work before giving up. If you want to try tackling this, you should start by looking at what they’ve tried. Remember that Zig uses the same parameter semantics as C. If this optimization were simple, LLVM would have it already.
With any idea regarding this, the first step is always to put it up against the ArrayList test, which your idea would fail. The test is to append to the list an object which is already in the list:

pub fn append(list: *List, item: Item, allocator: Allocator) Allocator.Error!void {
  // Grow the backing slice by one element; realloc may move it to a new address.
  list.items = try allocator.realloc(list.items, list.items.len + 1);
  list.items[list.items.len - 1] = item;
}

// Somewhere else
try list.append(list.items[0], allocator);

Semantically, this is correct, since the parameter is a copy of the item. But the optimization breaks it. If the realloc changes the location of list.items, the pointer that you secretly passed to the function is now dangling. Note that copying the list to the stack would not fix it, because the copy would not be deep.

I didn’t fully understand. If item itself is a type of data that requires a deep copy, this should be a logical issue; semantically, it is an error, not a parameter-passing optimization problem.

In the semantic-driven optimization I envision, since list itself is mutable, list.items is also mutable. Therefore, it certainly won’t be optimized and will instead be copied to the stack during the frontend phase.

Based on the problem that occurred, I think Zig does not by default copy parameters to the stack before passing by reference, but passes by reference directly. And I thought only immutable data could be passed by reference directly; otherwise, it should be copied to the stack before passing by reference.

I think what they mean is something like this:

const std = @import("std");
const mem = std.mem;

const LargeItem = struct {
    data: [128]u8, // Large enough that a compiler might want to pass it by pointer
};

fn ownAppend(list: *std.ArrayList(LargeItem), gpa: mem.Allocator, item: LargeItem) !void {
    // This call might force the ArrayList to grow, freeing the old memory block
    try list.ensureUnusedCapacity(gpa, 1);
    // If the memory moved, 'item' is now a dangling pointer
    list.appendAssumeCapacity(item);
}

pub fn main(init: std.process.Init) !void {
    var list: std.ArrayList(LargeItem) = .empty;
    defer list.deinit(init.gpa);

    try list.append(init.gpa, LargeItem{ .data = [_]u8{42} ** 128 });
    // Value semantics guarantee this should be a safe snapshot copy.
    try ownAppend(&list, init.gpa, list.items[0]);
}

If you pass by pointer (reference), nothing will be copied onto the stack (unless you later dereference it into a stack variable).

Honestly I don’t understand what you mean by that.
