Pass by value semantics

I just don’t understand why you need something like this. What is the problem with typing *const T instead of T where needed?

For me I mostly just follow the convention of pass by value or pointer already established for the type except for specific cases where something different is needed. Of course this only works if such a convention is already established.

But for most types I know beforehand whether they are expected to get large, and can decide based on that. If a type is passed by value and then, against my expectations, gains a bunch of fields, I need to do one of two things: rethink the type, the API, and maybe even the subsystem around it, splitting it up and structuring it better; or change the functions to take a pointer instead when the type can’t reasonably be split.

That’s it. Changing every function to take a pointer is sometimes drudge work, but grep and quickfix make it okay, and it teaches me to think more clearly beforehand next time.

As soon as you need to implement generics, you cannot anticipate the size of the target type. And for the same data structure, whether passing by value or by reference is better also differs between architectures.

Of course, I know of a possible approach, which is to use noalias *const T everywhere and then hope for LLVM’s ArgPromotion. But as far as I know, extensive use of noalias *const T is not recommended in Zig. If extensive use of noalias *const T is considered good practice, I don’t mind not having PRO.

I tried to illustrate this in the opening post.
The problem is that any extern call becomes an optimization barrier in the presence of *const T.

Most of std also works by value, e.g. std.ArrayList or std.HashMap.
And I think it should because the right call for *const T vs T is also architecture specific.

I think we could improve the pass-by-value implementation by removing a few unneeded memcpy calls with a few simple heuristics.

Would implementing a Pro(comptime T: type) type that returns either *const T or T based on the size of T and the target be a good temporary solution for generic code? Together with its counterpart, inline fn pro(ptr: anytype) Pro(@TypeOf(ptr.*)), which either returns the pointer or dereferences it, hoping it is trivial enough for the compiler to optimize away.
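
A minimal sketch of that Pro/pro idea, in case it helps the discussion. The cutoff of two machine words, and the names Big, firstByte and isEven, are assumptions made purely for illustration; a real helper would want a measured, target-specific threshold.

```zig
const std = @import("std");

/// Hypothetical helper: `T` itself for small types, `*const T` for larger ones.
/// The cutoff of two machine words is an assumption made for illustration only.
fn Pro(comptime T: type) type {
    return if (@sizeOf(T) <= 2 * @sizeOf(usize)) T else *const T;
}

/// Takes a pointer and yields whatever `Pro` chose for the pointee type.
inline fn pro(ptr: anytype) Pro(@TypeOf(ptr.*)) {
    const T = @TypeOf(ptr.*);
    // The condition is comptime-known, so only the taken branch is analyzed.
    return if (Pro(T) == T) ptr.* else ptr;
}

// Hypothetical large type, used only to exercise the pointer case.
const Big = struct { data: [1024]u8 };

// Field access works the same whether `value` is a `Big` or a `*const Big`.
fn firstByte(value: Pro(Big)) u8 {
    return value.data[0];
}

// For a small type, `Pro(u32)` is just `u32`.
fn isEven(n: Pro(u32)) bool {
    return n % 2 == 0;
}
```

Callers would write firstByte(pro(&big)) or isEven(pro(&n)). The obvious downside is that every call site has to go through pro, so this is a workaround rather than a replacement for a compiler-level optimization.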

“where needed” is the problem here, due to two cases:

Case 1:

fn frobnicate(T: type, value: T) void { ... }

This is a generic function: the caller picks T, and there may be many different choices of T. E.g., something like array_list.push(value) wants T if value is an i32, but it wants *const T if the value is something much larger.

Case 2:

const Foo = struct {
    // ...
};

fn frobnicate(value: Foo) void { ... }

$ zig build-exe -target x86_64-linux frobnicate.zig
$ zig build-exe -target x86-linux    frobnicate.zig

Even for concrete code, the “cutoff point” where you should switch from T to *const T depends on the target CPU architecture. x86 has fewer architectural registers than x86_64, so there are some types that want to be *const T on x86 and T on x86_64.

Wouldn’t it then make more sense to just have another function like

fn frobnicateConst(T: type, value: *const T) void { ... }

This is some extra typing, but it communicates everything clearly and lets the caller decide instead of doing it behind my back.

That’s true, but at least for me it’s more about the size class. Copying 8 or 16 bytes by value is almost always fine. Doing it with 32 or more bytes is more questionable, so I would default to a pointer. Of course it depends on the target, but we likely all have some general assumptions about where the code will run and can adjust roughly to that. And if the last few percent of performance are needed, you really can’t do anything general anyway.


Maybe my context is too multithreaded or something, but I wouldn’t like it if things I’ve asked to be passed by value aren’t actually copied, and because of that I get issues in optimistic concurrency contexts. With PRO, I would then need an extra annotation to force the compiler to really pass by value and disallow the optimization. But this, at least to me, seems backwards.

Technically, language concepts/abstractions like variables do not equate to data specifically existing on the stack or in any particular location, right?

The compiler could load the value as an immediate into a register, and ultimately never need to copy it into the stack, or observe the copy is unnecessary and elide it.

PRO only applies to copy elision at function call boundaries. In any case, there will always be an explicit copy across thread boundaries, and this copy cannot be omitted under any circumstances. The copy at the parameter passing is generally redundant, meaning that without PRO, you would make a copy when crossing the thread boundary, and then make an additional redundant copy when passing parameters to a function. PRO only concerns itself with redundant copies during parameter passing; it does nothing about copies across threads.

I think we might be talking past each other here. I’m not talking about passing arguments to a thread during creation, which, as you noted, involves manually copying them to the new stack. I’m talking about the shared data structures that are accessed during execution.

Consider something like a Seqlock, where the entire premise relies on taking a local snapshot by value, verifying it, and then working on that snapshot.

const LargeState = struct { data: [1024]u8 };
var shared_state: LargeState = undefined;
var seqlock: atomic.Value(usize) = .init(0);

fn doWork() void {
    var local_snapshot: LargeState = undefined;
    while (true) {
        const seq1 = seqlock.load(.acquire);
        if (seq1 % 2 != 0) continue;
        local_snapshot = shared_state;
        const seq2 = seqlock.load(.acquire);
        if (seq1 == seq2) break;
    }
    process(local_snapshot);
}

fn process(state: LargeState) void { ... }

In this context, the copy isn’t redundant, as you said - it’s load bearing for thread safety. With PRO the optimizer might look at this and decide to make the local_snapshot a *const LargeState.

If PRO does this copy elision, process is now operating directly on the live shared_state behind the scenes. Another thread can easily modify that shared_state while process is running, leading to data races and other things even after the seqlock validation was already finished.

To prevent this under PRO, you would have to forcefully defeat the optimizer using compiler barriers just to guarantee that pass by value actually does pass by value.

Another example could be just a simple SPSC ring buffer where some thread writes to it and another reads from it.

var ring: [8]LargeMessage = undefined;

fn pop(self: *RingBuffer) LargeMessage {
    self.mutex.lock();
    defer self.mutex.unlock();

    const msg = self.ring[tail];
    self.tail = (tail + 1) % self.ring.len;
    return msg;
}

fn worker() void {
    while (true) {
        // We pop the message and pass it directly.
        // Semantically, `pop` returns an independent copy by value,
        // so we don't need to hold the lock during processing.
        process(shared_ring.pop());
    }
}

fn process(msg: LargeMessage) void { ... }

Again PRO would easily break this because the writing thread can just overwrite the value we’ve just “freed” within the pop.

And these are just simple cases in largely self-contained data structures. Extrapolating this to large and complex codebases explains, at least to me, why the Zig devs scrapped this.

This is impossible; you have a fundamental misunderstanding of PRO. PRO only acts on parameter passing, and local_snapshot = shared_state; will definitely perform a full copy. It cannot make the parameter passed into local_snapshot become a reference to shared_state.

The copy in const msg = self.ring[tail] cannot be elided by PRO either, because this is not parameter passing at all. If a problem occurred here, it would be a compiler bug (a miscompilation).

Of course, Zig’s PRO used to be quite bad and completely ignored the aliasing issue. Even so, it would have no impact here.

Is there a very kind someone who could help me figure out where the “copy function argument on the stack” step is implemented in Zig?

I was thinking it would be in AIR, but my current understanding is that it’s actually delegated to each backend, e.g. in the LLVM backend:

https://codeberg.org/ziglang/zig/src/commit/b3747dd707eeaf06e0bacbde259bf71ef525a961/src/codegen/llvm.zig#L1303-L1308

This makes it harder to implement a copy-elision pass at the AIR level.

This is a question better-suited for the Zulip, which is more likely to have people with a knowledge of AIR around to answer.

I think the key advantage is predictability. When the language semantics guarantee that immutable, stateless values can be passed without extra overhead, users don’t have to depend on whether a particular optimization pass is enabled. That becomes especially valuable in generic code, where performance characteristics can otherwise be harder to reason about.

I’ve read a bit more on the topic and now understand the confusion I had, though I still have doubts if PRO is really a good idea, both in general and in the context of Zig.

Yes I now understand, thank you.

However, I think the hazard still remains if we pass a shared or global variable directly to a function. If a developer writes a call like process(shared_state) expecting the value to be copied at the function boundary, PRO could silently pass a pointer instead of copying. And since the value a pointer points to can be overwritten at any time in Zig, this could become a problem. Yes, this could be a race condition, in which case anything goes, but it might also not be one, thanks to some more complex multithreading scheme.

I think that to write thread-safe code under a language with PRO, developers would have to defensively assign every shared or global variable to a local stack variable before passing it to any function by value, which would make that language for me basically unusable.


Another thing, which is a bit more meta, is that I don’t like the distinction between value and pointer[1] passing. I prefer the mental model that everything is passed by value, be that a “literal value” or an address. When a language attempts to blur this distinction by dynamically switching between copying a struct and passing its address under the hood, it breaks the direct mapping to the hardware. A systems language is much easier to reason about when pass-by-value guarantees a physical copy, and passing a pointer guarantees we are operating on an address.


  1. I dislike the term reference for this even more. At least for me that’s a higher abstraction level than a pointer, which is just an address (and provenance, …).

Even without PRO, this is very unsafe if there is no protection against mutation in one thread while accessing it in another thread. The copy could be made in one thread while being mutated in another thread, because copying is not atomic.

Shared mutable state must always be protected in some way (a mutex, etc). If the copy is done before the call using a mutex, for example, then there is no problem with or without PRO. The same is true if a mutex is held during the call.
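
To make that concrete, here is a rough sketch of the “copy under a mutex, then call” pattern. It assumes std.Thread.Mutex as found in Zig 0.15.x (newer toolchains may expose the mutex type elsewhere), and snapshot, process and doWork are hypothetical names chosen for illustration.

```zig
const std = @import("std");

const LargeState = struct { data: [1024]u8 };

var shared_state: LargeState = .{ .data = [_]u8{0} ** 1024 };
// Assumption: std.Thread.Mutex as available in Zig 0.15.x.
var shared_mutex: std.Thread.Mutex = .{};

/// Takes the snapshot while holding the lock, so the copy cannot be torn.
fn snapshot() LargeState {
    shared_mutex.lock();
    defer shared_mutex.unlock();
    return shared_state;
}

/// Works only on its argument and never touches `shared_state`.
fn process(state: LargeState) usize {
    var sum: usize = 0;
    for (state.data) |b| sum += b;
    return sum;
}

fn doWork() usize {
    // `local` is a private copy; whether the call below passes it by value
    // or by pointer makes no observable difference to other threads.
    const local = snapshot();
    return process(local);
}
```

The point is that the copy which matters for thread safety is the explicit one made under the lock, not the one at the call boundary.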

Any real examples? I suspect that the issue you’re discussing would exist at the function call boundary. Of course, even if such cases do exist, under the “PRO determined by parameter mutability” that I envision, this cannot be a problem, because any shared variables are mutable rather than immutable. For me, it is already sufficient that PRO applies to constants.

I think the core realization is that const in Zig is an access control and not a physical property of the underlying memory. A const pointer only tells the compiler that this specific path cannot mutate the data. And even that assumption is wrong in light of things like @constCast, inline assembly, or just a call to some opaque function. Of course, the underlying memory isn’t immutable even if we disallow those things. Concurrent threads or OS-level operations can still change it, for example another thread or process holding a writable mapping of our read-only pages, or us simply remapping the region as r|w.

This makes any pointer-based data sharing in concurrent environments highly susceptible to hidden compiler transformations. Take this example:

fn foo(state: LargeShared) void {
    if (!state.is_done) {
         std.debug.print("State isn't done\n", .{});
    }
    state.is_done = true;
    // A concurrent thread resets the global state back to false here.
    if (!state.is_done) {
         std.debug.print("State isn't done\n", .{});
    }
}

If the compiler guarantees physical pass-by-value, state is a private, isolated stack copy. The second check is guaranteed to see state.is_done == true, and the message can never print twice.

But with PRO and copy elision, the compiler might optimize state to be a direct alias of the shared global memory. If a concurrent thread resets the global state back to false mid-execution, the second check reads the live global memory. The function prints “State isn’t done” twice, violating basic single-threaded control flow expectations. A basic non-repeatable read[1].

Of course, in a typical program this wouldn’t be written as clearly as here, but would be nested in functions across the entire system. I think it’s easy to imagine that detecting such a thing at compile time can be very hard, and that the outcome can be very bad. So the compiler has to be pessimistic and will likely disallow most things. Then the question is: why even have such an “optimization” if it can’t be done reliably?


  1. For anyone who hasn’t done so: It can be really worthwhile looking into the database literature for ideas about concurrency control and problems with transactions.

At this point you have invented your own version of PRO and then complain about it.

The main thing I want optimized is a chain of functions that pass the same struct three or four levels down the call stack, without having to worry whether one of those functions fails to inline and triggers a copy.

Yes, the first call of the chain may need a copy, but not the ones after that. This is a common pattern in e.g. the standard library. Look at ArrayHashMap.get: it passes the key and the map by value down four functions. I don’t want this to result in four copies. This mostly doesn’t happen, but I have seen it fail because unnecessarily complicated code in the hash map led to failed inlining. (I have a PR open, but I feel there is a tension between idiomatic Zig and what the compiler can optimize correctly.)
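
For readers who have not looked at that code, the shape of the pattern is roughly the following. This is not the actual ArrayHashMap implementation; Key, get, lookup, indexOf and hashOf are hypothetical names, and the bucket count of 8 is arbitrary.

```zig
const std = @import("std");

// Hypothetical 64-byte key, large enough that copies are not free.
const Key = struct { bytes: [64]u8 };

// Innermost function: the only one that actually reads the key.
fn hashOf(key: Key) u8 {
    var h: u8 = 0;
    for (key.bytes) |b| h ^= b;
    return h;
}

// Each layer just forwards `key` by value to the next one.
fn indexOf(key: Key) usize {
    return hashOf(key) % 8;
}

fn lookup(key: Key) usize {
    return indexOf(key);
}

fn get(key: Key) usize {
    return lookup(key);
}
```

If every layer is inlined, the forwarding costs nothing. If one of them is not, each by-value hop may turn into a 64-byte copy even though none of the intermediate functions ever modifies key.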

Please note: Unlike in C, in Zig, function parameters themselves are const and cannot be modified, so this code cannot compile because it modifies the parameter itself. If you want to modify the parameter itself, in Zig, the only way is to create a copy of the parameter as a var inside the function and then modify that copy.
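
A small sketch of what that looks like in practice. LargeShared and its fields are assumed here, since the original snippet did not show the type.

```zig
const std = @import("std");

// Hypothetical definition; the original post did not show the struct.
const LargeShared = struct {
    is_done: bool,
    payload: [256]u8,
};

fn foo(state: LargeShared) bool {
    // state.is_done = true; // rejected by the compiler: parameters are constants
    var local = state; // explicit mutable copy, private to this call
    local.is_done = true;
    return local.is_done;
}
```

Because the mutation has to go through local, the function body can never observe a write to its own parameter, whichever way the argument was physically passed.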