I think we might be talking past each other here. I’m not talking about passing arguments to a thread during creation, which, as you noted, involves manually copying them to the new stack. I’m talking about the shared data structures that are accessed during execution.
Consider something like a Seqlock, where the entire premise relies on taking a local snapshot by value, verifying it, and then working on that snapshot.
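
    const std = @import("std");
    const atomic = std.atomic;

    const LargeState = struct { data: [1024]u8 };

    var shared_state: LargeState = undefined;
    var seqlock: atomic.Value(usize) = .init(0);

    fn doWork() void {
        var local_snapshot: LargeState = undefined;
        while (true) {
            const seq1 = seqlock.load(.acquire);
            // Odd sequence number: a writer is mid-update, so retry.
            if (seq1 % 2 != 0) continue;
            // Copy by value; this is the snapshot we validate below.
            local_snapshot = shared_state;
            const seq2 = seqlock.load(.acquire);
            // Unchanged sequence number: the snapshot is consistent.
            if (seq1 == seq2) break;
        }
        process(local_snapshot);
    }

    fn process(state: LargeState) void { ... }
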
In this context, the copy isn’t redundant, as you suggested; it’s load-bearing for thread safety. With PRO, the optimizer might look at this and decide to turn local_snapshot into a *const LargeState that aliases shared_state.
If PRO does this copy elision, process is now operating directly on the live shared_state behind the scenes. Another thread can easily modify shared_state while process is running, leading to data races and torn reads even though the seqlock validation has already passed.
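To make the race concrete, here is a rough sketch of what the writer side might look like. The publish function is my own hypothetical addition (it isn’t part of the snippet above), it assumes a single writer, and the .seq_cst orderings are just a conservative choice rather than a tuned implementation.

```zig
const std = @import("std");

const LargeState = struct { data: [1024]u8 };

var shared_state: LargeState = .{ .data = [_]u8{0} ** 1024 };
var seqlock: std.atomic.Value(usize) = .init(0);

/// Hypothetical single-writer update path (not from the original snippet).
fn publish(new_state: LargeState) void {
    // Make the sequence number odd: readers that see it will retry.
    _ = seqlock.fetchAdd(1, .seq_cst);
    // A concurrent reader can observe this write half-done.
    shared_state = new_state;
    // Make it even again: a reader whose two loads match saw a stable value.
    _ = seqlock.fetchAdd(1, .seq_cst);
}
```

Nothing stops publish from running right after the reader’s second load, so anything that still reads shared_state at that point, rather than its own copy, sees the update in progress.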
To prevent this under PRO, you would have to forcibly defeat the optimizer with compiler barriers just to guarantee that pass-by-value actually passes by value.
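As a rough idea of what such a workaround could look like, here is a sketch that uses std.mem.doNotOptimizeAway as the barrier. Whether this would actually be enough under PRO is an assumption on my part; the seqlock retry loop is left out to keep it short, and process returns a byte here only so the example has something observable.

```zig
const std = @import("std");

const LargeState = struct { data: [1024]u8 };

var shared_state: LargeState = .{ .data = [_]u8{0} ** 1024 };

fn doWork() u8 {
    // Take the snapshot by value (retry loop omitted in this sketch).
    var local_snapshot: LargeState = shared_state;
    // Hand the address to an opaque barrier so the compiler has to treat
    // local_snapshot as real memory that something else may inspect.
    // ASSUMPTION: this is what keeps the copy from being elided.
    std.mem.doNotOptimizeAway(&local_snapshot);
    return process(local_snapshot);
}

// Hypothetical stand-in for the real processing function.
fn process(state: LargeState) u8 {
    return state.data[0];
}
```

Needing something like this at every call site that relies on a copy is exactly the kind of burden I mean.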
Another example is a simple SPSC ring buffer, where one thread writes and another reads.

    const RingBuffer = struct {
        // Fields inferred from how pop uses them.
        mutex: std.Thread.Mutex = .{},
        ring: [8]LargeMessage = undefined,
        tail: usize = 0,

        fn pop(self: *RingBuffer) LargeMessage {
            self.mutex.lock();
            defer self.mutex.unlock();
            const msg = self.ring[self.tail];
            self.tail = (self.tail + 1) % self.ring.len;
            return msg;
        }
    };

    var shared_ring: RingBuffer = .{};

    fn worker() void {
        while (true) {
            // We pop the message and pass it directly.
            // Semantically, `pop` returns an independent copy by value,
            // so we don't need to hold the lock during processing.
            process(shared_ring.pop());
        }
    }

    fn process(msg: LargeMessage) void { ... }

Again, PRO would easily break this: if the returned message becomes a reference into the ring, the writing thread can overwrite the slot that pop has just “freed” while process is still reading it.
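Here is a small single-threaded sketch of the guarantee the worker relies on. The push function, the head and count fields, the two-slot capacity, and the LargeMessage layout are all my own assumptions for illustration; only pop mirrors the snippet above (with an empty check added).

```zig
const std = @import("std");

// Hypothetical payload; the snippet above never defines LargeMessage.
const LargeMessage = struct { data: [256]u8 };

const RingBuffer = struct {
    mutex: std.Thread.Mutex = .{},
    ring: [2]LargeMessage = undefined,
    head: usize = 0, // next slot to write (assumed field)
    tail: usize = 0, // next slot to read
    count: usize = 0, // occupied slots (assumed field)

    fn push(self: *RingBuffer, msg: LargeMessage) bool {
        self.mutex.lock();
        defer self.mutex.unlock();
        if (self.count == self.ring.len) return false; // full
        // This may reuse a slot that pop released a moment ago.
        self.ring[self.head] = msg;
        self.head = (self.head + 1) % self.ring.len;
        self.count += 1;
        return true;
    }

    fn pop(self: *RingBuffer) ?LargeMessage {
        self.mutex.lock();
        defer self.mutex.unlock();
        if (self.count == 0) return null; // empty
        // Copy out by value so the slot can be reused after unlock.
        const msg = self.ring[self.tail];
        self.tail = (self.tail + 1) % self.ring.len;
        self.count -= 1;
        return msg;
    }
};
```

With real copies, a message popped before the producer wraps around stays intact after its slot is overwritten. If the popped value were silently a pointer into ring, the consumer would see the producer’s newer message instead.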
And these are just simple cases in largely self-contained data structures. Extrapolating this to large and complex codebases explains, at least to me, why the Zig devs scrapped this.