Should I pass constants by reference?

echoptic · March 8, 2024, 9:54am

In the standard library I always see structs passing self like this: self: Self, not like self: *const Self, so I’m wondering whether I should also do that when passing any struct into a function. For example, I have a function that takes an immutable struct: should I pass it by value, value(param: Struct), or by pointer, pointer(param: *const Struct)? Also, when returning from functions, should I return a value or a pointer to a value? I don’t care about lifetimes in this case, just curious in general. Is there any cost or benefit to doing it either way? It would seem logical to me to always pass using *const.

tensorush · March 8, 2024, 10:04am

Hi! This should clarify the function argument passing semantics:
Note: self is just a naming convention for the “receiver” parameter. It could be different, like this, but for convenience and conformity everyone uses self.

echoptic · March 8, 2024, 11:03am

I’m curious about which one I should default to for constant params: passing structs by value or by reference. From what I read, it seems like the compiler can decide to do what’s best, but is that going to change?

tensorush · March 8, 2024, 11:34am

Well, as you’ve noted from std lib, the commonly adopted default is to pass by value.

marler8997 · March 28, 2024, 5:32am

I found the C programmer!

Zig is a little unique here, when a function takes a parameter by value, it’s free to optimize that into a reference under the hood. My general advice is to do the simplest thing, just pass and return everything by value until you can’t.

One situation where you can’t is when you need to modify the contents of Self; then you’ll need to take *Self. It’s pretty rare that you’d need to take in *const Self, but there are some situations.

There has been one time where I had a very large type and Zig wasn’t optimizing passing it around as a reference in Debug mode. It was causing a huge slowdown, so I had to force it to be a reference. In general though it’s not something to worry about.
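
A minimal sketch of the three receiver styles mentioned above. The Counter type and its method names are hypothetical, made up for illustration; they do not come from the standard library or from this thread.

```zig
const std = @import("std");

// Hypothetical type, for illustration only.
const Counter = struct {
    count: u32 = 0,

    // Read-only access: take the receiver by value (the common default).
    pub fn get(self: Counter) u32 {
        return self.count;
    }

    // Mutation: a mutable pointer is required.
    pub fn increment(self: *Counter) void {
        self.count += 1;
    }

    // Explicit const pointer: the callee is handed the caller's address.
    pub fn address(self: *const Counter) usize {
        return @intFromPtr(self);
    }
};

pub fn main() void {
    var c = Counter{};
    c.increment();
    c.increment();
    std.debug.print("count = {d}\n", .{c.get()});
    std.debug.print("same address: {}\n", .{c.address() == @intFromPtr(&c)});
}
```

Method-call syntax takes the address automatically for the pointer receivers, so the call sites look the same in all three cases.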

anticrisis · August 13, 2024, 11:54pm

In C++ land the general advice is to pass structs by value if their size is less than “a couple of pointers.” Otherwise pass by const reference.

The advice is vague on purpose: it’s not possible in general to make a “correct” decision because the performance characteristics of your code will vary greatly from a compiler writer’s assumptions. As a result, I don’t think C++ compilers override the programmer’s selection of how to pass a const struct, even though they could without violating any semantic assumptions.

I don’t know if zig’s compiler will make different decisions, but I imagine that doing so would violate some of its principles about simple and easy to understand code.

squeek502 · August 14, 2024, 12:21am

If I understand your post correctly, Zig has made different decisions. From here:

Pass-by-value Parameters

[…]

Structs, unions, and arrays can sometimes be more efficiently passed as a reference, since a copy could be arbitrarily expensive depending on the size. When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster.

anticrisis · August 14, 2024, 2:33am

Thank you for the reference. The document says “may,” which is appropriate for the language design. Semantically both ways of passing arguments are identical.

For a low-level performance-oriented compiler, I suggest the programmer should be able to force pass-by-value even if the value is larger than the compiler’s view of what is optimal. It’s fine for the compiler to have some default behaviour based on its best guess, as long as it can be overridden by a programmer who has performance-tested their own code in their own operating conditions.

I haven’t looked at generated code in Debug and Release builds yet as others have done so I’m not informed enough to comment further. For now, I’d just say it’s nice to have the level of control C and C++ give us for parameter passing, even if it’s a bit of a learning curve for many people.

dimdin · August 14, 2024, 3:42am

I agree. Similar to forcing by reference with *const T there must be a way to force by value T.
This is also a solution to the aliasing problem.

squeek502 · August 14, 2024, 3:49am

Do you take this position for all compiler optimizations? I’m unsure why this one should be singled out.

With that said, there are currently known bugs with this optimization, so (as mentioned in previous comments in this thread) using explicit *const T is sometimes an improvement:

sometimes there is an unwanted memcpy when passing large structs by-value · Issue #17580 · ziglang/zig · GitHub
Pass by reference "optimization" copies the entire struct on the stack when taking its address. (this didn't happen in stage1) · Issue #16343 · ziglang/zig · GitHub

anticrisis · August 14, 2024, 4:06am

That’s a good question. I’m open to the possibility that Zig will make different choices about optimisation than those made by C and C++ committees and compiler makers, but removing any bit of control over how data gets moved around in memory versus the control available in those languages would need to be very carefully considered. Otherwise there will always be a case to be made that Zig is not suitable versus C or C++ in a particular use case for performance reasons, and that would be a shame.

(Edited to clarify that a goal for Zig ought to be that performance reasons should not be a reason to select C/C++ over Zig. There may be other valid reasons.)

mnemnion · August 14, 2024, 3:01pm

Semantically both ways are not identical. *const T guarantees that the parameter will have the identity of the passed value, T does not. This can make a real difference, although mostly, it won’t.

That last part being some of the problem, but not all of it. There was a conversation here awhile back about the difficulties of forcing Zig to make a copy of a struct, and why this can matter, it’s less to do with parameters and more to do with return values: but a parameter and a return value are sometimes one and the same.

Worth pointing out that for one central optimization, inlining, low-level languages, Zig included, do offer the ability to force the issue one way or the other. I do think that “same memory region” and “different memory region” represents another case where code should be able to insist on either outcome, and in status quo, it’s tricky to consistently get the latter.
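
As a rough sketch of what forcing inlining "one way or the other" looks like in Zig: the function names below are hypothetical, while `inline fn`, `noinline fn`, and the `@call` modifiers are the language features being illustrated.

```zig
const std = @import("std");

// Hypothetical helper; the compiler decides whether to inline it.
fn square(x: u32) u32 {
    return x * x;
}

// Declared `inline`: inlined at every call site.
inline fn squareInline(x: u32) u32 {
    return x * x;
}

// Declared `noinline`: never inlined.
noinline fn squareNoInline(x: u32) u32 {
    return x * x;
}

pub fn main() void {
    // The call site can also insist on either outcome.
    const a = @call(.always_inline, square, .{3});
    const b = @call(.never_inline, square, .{3});
    std.debug.print("{d} {d} {d} {d}\n", .{ a, b, squareInline(3), squareNoInline(3) });
}
```

There is no comparable switch for "copy" versus "same memory region" on by-value parameters, which is the gap being described here.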

I suggested a @copy builtin as a brute-force solution, but it would be nice to have a more elegant answer to this problem and the aliasing issues which prompt it. This one is a “trust the plan” sort of thing for me, Andrew has said a few times that he has some good ideas for how to solve the aliasing problem, and that’s enough for me for now.

chung-leong · August 14, 2024, 4:07pm

I think he meant that when the compiler substitutes pass-by-reference for pass-by-value, it does so without introducing semantic changes. The following code should always produce two distinct addresses:

const std = @import("std");

const Struct = struct {
    numbers: [4]usize = .{ 0, 0, 0, 0 },
};

fn hello(arg: Struct) void {
    std.debug.print("Address in function: {x}\n", .{@intFromPtr(&arg)});
}

pub fn main() void {
    const s: Struct = .{};
    std.debug.print("Address in main: {x}\n", .{@intFromPtr(&s)});
    hello(s);
}

Address in main: 101e300
Address in function: 7ffc2d69aeb8

We can verify that the struct is passed by ref in Godbolt:

mnemnion · August 14, 2024, 6:21pm

I don’t think that’s correct though. Clearly it did in that case, but the docs are pretty clear on this one:

Structs, unions, and arrays can sometimes be more efficiently passed as a reference, since a copy could be arbitrarily expensive depending on the size. When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster. This is made possible, in part, by the fact that parameters are immutable.

So a larger struct than that would report the same pointer, for some value of “larger”. The point is that you can’t rely on one or the other.

It only makes a practical difference in cases where reference semantics and result location semantics clash due to aliasing. That’s all well-trodden ground; the important point is that the semantics of passing by constant pointer are “this memory region will appear in the function” and the semantics of passing by reference are “the compiler chooses”, and it’s worthwhile not to conflate those things.

The missing point on the triangle is insisting on a copy, and maybe the solution to the Attack of the Killer Features will mean we don’t really need that one. My hunch is that as long as constant references to mutable state exist (and sometimes you need that), forcing a copy will be essential. Of course you can allocate heap for the copy and copy it, that will always work. With stack memory things are not so simple.
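
A minimal sketch of the heap route, which does force a copy with its own identity. The Big type and the heapCopy helper are hypothetical names for illustration, and the caller is assumed to own the returned pointer and free it.

```zig
const std = @import("std");

// Hypothetical type, for illustration only.
const Big = struct {
    numbers: [4]usize = .{ 0, 0, 0, 0 },
};

// Hypothetical helper: returns a heap-allocated copy of `value`.
// The caller owns the result and must destroy it with the same allocator.
fn heapCopy(allocator: std.mem.Allocator, value: *const Big) !*Big {
    const copy = try allocator.create(Big);
    copy.* = value.*;
    return copy;
}

test "heap copy has its own identity" {
    const allocator = std.testing.allocator;
    var original = Big{};
    const copy = try heapCopy(allocator, &original);
    defer allocator.destroy(copy);

    // Later writes to the original are not visible through the copy.
    original.numbers[0] = 42;
    try std.testing.expect(copy.numbers[0] == 0);
    try std.testing.expect(@intFromPtr(copy) != @intFromPtr(&original));
}
```

This costs an allocation, which is exactly why a way to insist on a stack copy would still be useful.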

IntegratedQuantum · August 14, 2024, 9:05pm

No, even with a struct that’s 8 MB, it still shows different pointers, but that’s likely a bug and not a deliberate feature: Pass by reference "optimization" copies the entire struct on the stack when taking its address. (this didn't happen in stage1) · Issue #16343 · ziglang/zig · GitHub

mnemnion · August 14, 2024, 11:04pm

That’s uhhh… I think it’s safe to say that my statement “you can’t rely on one or the other” will hold over a sufficient number of releases. This is one of the bugs in the optimization which @squeek502 was referring to. And linked to.

This conversation is getting a bit muddled, I think, between what the compiler is supposed to do, what it happens to do, and what aspects of program behavior should be considered semantics.

But it does illustrate why pass-by-const-pointer and pass-by-reference don’t have identical semantics, and aren’t intended to. If you pass by const pointer, the pointer will hold the address of the value from which you took it; when passing by reference there is no such guarantee. That’s a semantic difference:

const std = @import("std");

const Selfie = struct {
    me: *Selfie = undefined,

    // By value: `&selfie` may or may not be the caller's address.
    pub fn compareToSelf(selfie: Selfie) bool {
        return @intFromPtr(selfie.me) == @intFromPtr(&selfie);
    }

    // By const pointer: `selfie` is the caller's address.
    pub fn compareToSelfPtr(selfie: *const Selfie) bool {
        return @intFromPtr(selfie.me) == @intFromPtr(selfie);
    }
};

test "self reference" {
    var selfie = Selfie{};
    selfie.me = &selfie;
    // This must be true
    try std.testing.expect(selfie.compareToSelfPtr());
    // Anything can happen
    _ = selfie.compareToSelf();
}

That’s an example of a semantic difference. Contrived, yes, but things which lie closely adjacent to this are the essence of the aliasing issue.

It’s quite clear that pass by reference is not supposed to have the C semantics of passing a struct, which is by value. It’s weird to find out that it apparently always does that at the moment, but I can’t imagine that situation continuing forever.

To quote Andrew from the other issue @squeek502 linked to:

Temporary workaround is to pass by const pointer instead. However, we don’t want Zig users to get used to doing this, or they will never kick the habit even when this issue is fixed, which is why I am making this a high priority issue, and denying any more requests to do the workaround in the standard library.

So do what you need to for your programs to work and run fast, but let’s not allow a transient bug to solidify into doctrine.

chung-leong · August 14, 2024, 11:41pm

The behavior seems correct to me, since the variable in the callee is semantically supposed to be a copy of the variable in the caller. On the other hand, this could lead to a lot of unnecessary copying when you need to pass structs to C functions. Hmmm…

mnemnion · August 15, 2024, 12:32am

No, it clearly, and explicitly, is not supposed to be a copy. It’s supposed to be a reference:

Structs, unions, and arrays can sometimes be more efficiently passed as a reference, since a copy could be arbitrarily expensive depending on the size. When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster. This is made possible, in part, by the fact that parameters are immutable.

Even if it were semantically a copy, which it is not, it doesn’t make sense that repeatedly taking the address of it changes said address. You could make a narrow case that this is legal behavior, but I’m not buying it: once it’s passed, it’s a value, and pointing at the same value twice should point to the same region of memory.

The fact that it’s immutable makes this ~~bug~~ regression relatively unlikely to cause logic bugs in a program, since the only useful thing to do with that address is pass it down stack, where it doesn’t matter that taking N pointers makes N copies on the stack.

Zig isn’t a standardized language, so sure, it could be decided that this is just what happens when you repeatedly take addresses to the same reference, but that would be an extremely weird choice to make, rules like “If you take a pointer to a value, it will point to the address of that value, unless it’s a reference, in which case, the compiler will make a fresh copy of that value and the pointer will point to that” make a language hard to work with.

Sze · August 15, 2024, 9:37am

Was this supposed to use *Selfie?

mnemnion · August 15, 2024, 2:29pm

*const Selfie yes, and it makes no sense without that, good catch. I edited it accordingly.