Pass by value semantics

I just don’t understand why you need something like this. What is the problem with typing *const T instead of T where needed?

For me I mostly just follow the convention of pass by value or pointer already established for the type except for specific cases where something different is needed. Of course this only works if such a convention is already established.

But for most types I know beforehand whether they are expected to get large, and I can decide based on that. If a type is passed by value and then, against my expectations, grows a bunch of fields, I need to do one of two things: rethink the type, the API, and maybe even the subsystem around it, and split it up and structure it in a better way; or change the functions to take a pointer instead when the type can’t reasonably be split.

That’s it. Changing every function to take a pointer is sometimes grunt work, but grep and quickfix make it okay, and it teaches me to think more clearly beforehand next time.

As soon as you need to implement generics, you cannot anticipate the size of the target type. Whether passing by value or by reference is better for a given data structure also differs between architectures.

Of course, I know of a possible approach, which is to use noalias *const T everywhere and then hope for LLVM’s ArgPromotion. But as far as I know, extensive use of noalias *const T is not recommended in Zig. If extensive use of noalias *const T is considered good practice, I don’t mind not having PRO.

I tried to illustrate this in the opening post.
The problem is that any extern call becomes an optimization barrier in the presence of *const T.

Most of the std also works by value, eg std.ArrayList or std.HashMap.
And I think it should because the right call for *const T vs T is also architecture specific.

I think we could improve pass-by-value implementation by removing a few unneeded memcpy with a few simple heuristics.

Would implementing a Pro(comptime T: type) type function that returns either *const T or T, based on the size of T and the target, be a good temporary solution for generic code? Its counterpart would be inline fn pro(ptr: anytype) Pro(@TypeOf(ptr.*)), which either returns the pointer or dereferences it, in the hope that it is trivial enough for the compiler to optimize away.
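A minimal sketch of what such a pair could look like. The cutoff of two machine words and the names by_value_limit, Pro and pro are assumptions made for illustration only; nothing like this exists in std, and a real cutoff would have to be chosen per target.

```zig
const std = @import("std");

// Hypothetical cutoff: pass by value up to two machine words.
// This is an assumption for illustration, not a measured threshold.
const by_value_limit = 2 * @sizeOf(usize);

/// Hypothetical helper: T itself for small types, *const T otherwise.
pub fn Pro(comptime T: type) type {
    return if (@sizeOf(T) <= by_value_limit) T else *const T;
}

/// Hypothetical counterpart: takes a pointer and returns either a copy of
/// the pointee or the pointer itself, matching Pro of the pointee type.
pub inline fn pro(ptr: anytype) Pro(@TypeOf(ptr.*)) {
    const T = @TypeOf(ptr.*);
    // The condition is comptime-known, so only the taken branch is analyzed.
    return if (@sizeOf(T) <= by_value_limit) ptr.* else ptr;
}

const Big = struct { data: [1024]u8 };

// A generic function would spell its parameter as Pro(T) ...
fn firstByte(state: Pro(Big)) u8 {
    return state.data[0];
}

// ... and callers would always write pro(&value).
pub fn main() void {
    const big: Big = .{ .data = [_]u8{42} ** 1024 };
    std.debug.print("{d}\n", .{firstByte(pro(&big))});
}
```

Field access works the same through T and *const T, so the body of firstByte does not need to know which one it received. Note that this only moves the by-value or by-pointer choice into a library function; it does not change the aliasing questions discussed in this thread.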

“where needed” is the problem here, due to two cases:

Case 1:

fn frobnicate(T: type, value: T) void { ... }

This is a generic function, the caller picks T and there might be multiple of those. Eg, something like array_list.push(value) wants T if value is i32, but it wants *const T if the value is something much larger.

Case 2:

const Foo = struct {
    // ...
};

fn frobnicate(value: Foo) void { ... }

$ zig build-exe -target x86_64-linux frobnicate.zig
$ zig build-exe -target x86-linux    frobnicate.zig

Even for concrete code, the “cutoff point” where you should switch from T to *const T depends on the target CPU architecture. x86 has fewer architectural registers than x86_64, so there are some types that want to be *const T on x86 and T on x86_64.
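To make the target dependence concrete, here is a rough sketch of a per-architecture cutoff evaluated at compile time. The byte limits and the fields of Foo are placeholders invented for this example; they are not ABI facts or measurements.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Hypothetical cutoffs in bytes. The numbers are placeholders for
// illustration, not measurements.
const by_value_limit: usize = switch (builtin.cpu.arch) {
    .x86 => 8,
    .x86_64 => 16,
    else => 2 * @sizeOf(usize),
};

// Hypothetical fields: three u32 values, 12 bytes in total.
const Foo = struct { a: u32, b: u32, c: u32 };

// Comptime-known per target. With the placeholder limits above this is
// true when building for x86_64 and false when building for x86.
const foo_by_value = @sizeOf(Foo) <= by_value_limit;

pub fn main() void {
    std.debug.print("pass Foo by value: {}\n", .{foo_by_value});
}
```

The same source gives a different answer for each -target, which is exactly what a hand-written T or *const T in the signature cannot express.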

Wouldn’t it then make more sense to just have another function like

fn frobnicateConst(T: type, value: *const T) void { ... }

This is some extra typing, but it communicates everything clearly and lets the caller decide instead of doing it behind my back.

That’s true, but at least for me it’s more about the size class. Copying 8 or 16 bytes by value is almost always fine. Doing it with 32 or more bytes is more questionable, so I would default to a pointer. Of course it depends on the target, but we likely all have some general assumptions about where the code will run and can adjust roughly to that. And if the last few percent of performance are needed, you really can’t do anything general anyway.


Maybe my context is too multithreaded or something, but I wouldn’t like it if things I’ve said to be passed by value aren’t actually copied and because of that I get some issues in optimistic concurrency contexts. Having PRO would then need an extra annotation to force the compiler to really pass it by value and disallow it. But this, at least to me, seems backwards.

Technically, language concepts/abstractions like variables do not equate to data specifically existing on the stack or in any particular location, right?

The compiler could load the value as an immediate into a register, and ultimately never need to copy it into the stack, or observe the copy is unnecessary and elide it.

PRO only applies to copy elision at function call boundaries. In any case, there will always be an explicit copy across thread boundaries, and this copy cannot be omitted under any circumstances. The copy at the parameter passing is generally redundant, meaning that without PRO, you would make a copy when crossing the thread boundary, and then make an additional redundant copy when passing parameters to a function. PRO only concerns itself with redundant copies during parameter passing; it does nothing about copies across threads.

I think we might be talking past each other here. I’m not talking about passing arguments to a thread during creation, which, as you noted, involves manually copying them to the new stack. I’m talking about the shared data structures that are accessed during execution.

Consider something like a Seqlock, where the entire premise relies on taking a local snapshot by value, verifying it, and then working on that snapshot.

const std = @import("std");

const LargeState = struct { data: [1024]u8 };
var shared_state: LargeState = undefined;
var seqlock: std.atomic.Value(usize) = .init(0);

fn doWork() void {
    var local_snapshot: LargeState = undefined;
    while (true) {
        const seq1 = seqlock.load(.acquire);
        if (seq1 % 2 != 0) continue;
        local_snapshot = shared_state;
        const seq2 = seqlock.load(.acquire);
        if (seq1 == seq2) break;
    }
    process(local_snapshot);
}

fn process(state: LargeState) void { ... }

In this context, the copy isn’t redundant, as you said - it’s load bearing for thread safety. With PRO the optimizer might look at this and decide to make the local_snapshot a *const LargeState.

If PRO does this copy elision, process is now operating directly on the live shared_state behind the scenes. Another thread can easily modify that shared_state while process is running, leading to data races and other things even after the seqlock validation was already finished.

To prevent this under PRO, you would have to forcefully defeat the optimizer using compiler barriers just to guarantee that pass by value actually does pass by value.

Another example could be just a simple SPSC ring buffer where some thread writes to it and another reads from it.

var ring: [8]LargeMessage = undefined;

fn pop(self: *RingBuffer) LargeMessage {
    self.mutex.lock();
    defer self.mutex.unlock();

    const msg = self.ring[self.tail];
    self.tail = (self.tail + 1) % self.ring.len;
    return msg;
}

fn worker() void {
    while (true) {
        // We pop the message and pass it directly.
        // Semantically, `pop` returns an independent copy by value,
        // so we don't need to hold the lock during processing.
        process(shared_ring.pop());
    }
}

fn process(msg: LargeMessage) void { ... }

Again PRO would easily break this because the writing thread can just overwrite the value we’ve just “freed” within the pop.

And these are just simple cases in largely self-contained data structures. Extrapolating this to large and complex codebases explains, at least to me, why the Zig devs scrapped this.

This is impossible; you have a fundamental misunderstanding of PRO. PRO only acts on parameter passing, and local_snapshot = shared_state; will definitely perform a full copy. PRO cannot turn the argument passed to process into a reference to shared_state.
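A small single-threaded sketch of that distinction, with the seqlock left out. It only shows the language-level semantics of the assignment; it makes no claim about what any particular compiler version does at the call boundary. The helper firstByte is made up for the example.

```zig
const std = @import("std");

const LargeState = struct { data: [1024]u8 };
var shared_state: LargeState = .{ .data = [_]u8{0} ** 1024 };

// Hypothetical stand-in for process(): reads one byte of its by-value argument.
fn firstByte(state: LargeState) u8 {
    return state.data[0];
}

test "assignment snapshot is independent of later writes" {
    var local_snapshot: LargeState = undefined;
    local_snapshot = shared_state; // explicit copy by assignment, not a parameter pass
    shared_state.data[0] = 0xFF; // later write to the shared original
    // The call passes local_snapshot. Whether it is handed over by value or
    // by reference, it refers to the local copy, never to shared_state.
    try std.testing.expectEqual(@as(u8, 0), firstByte(local_snapshot));
}
```

This can be run with zig test. The only copy PRO could elide here is the one from local_snapshot into the parameter of firstByte.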

The copy into msg inside pop cannot be elided by PRO either, because this is not parameter passing at all. If a problem occurred here, it would be a miscompilation, that is, a compiler bug.

Of course, Zig’s PRO used to be quite bad and completely ignored aliasing. Even so, it would not have any impact here.