Copy or reference?

Here is something I don’t understand:

const std = @import("std");
const print = std.debug.print;

const A = struct {
    x: u8 = 0,
};

var a = A{};

fn f(b: A) void {
    a.x = 2;
    print("a.x = {}, b.x = {}\n", .{a.x, b.x});
}

pub fn main() void {
    a.x = 1;
    {
        const b = a;
        f(b);
    }
    a.x = 1;
    {
        const b = a;
        f(b);
        _ = &b;
    }
}

The output is

a.x = 2, b.x = 2
a.x = 2, b.x = 1

The only difference between the two blocks in main is the _ = &b; statement. I would have expected the output b.x = 1 in both cases.
It seems to me that

  • b is like a reference to a in the first block
  • b is a copy of a in the second block

Am I doing something illegal? What am I missing?

3 Likes

That’s strange. I think it’s a bug that happens when the compiler optimizes the code. I might be wrong.

When a struct is passed to a function, it’s copied, unless a pointer to that struct is passed instead.

/// pass by value
fn f(b: A) void { ... }

/// pass by reference
fn f(b: *A) void { ... }

From Zig Reference:

Structs, unions, and arrays can sometimes be more efficiently passed as a reference, since a copy could be arbitrarily expensive depending on the size. When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster. This is made possible, in part, by the fact that parameters are immutable.

Zig currently assumes no aliasing; otherwise you get the interesting problems described in this amazing video: ATTACK of the KILLER FEATURES

3 Likes

Ooof, that’s a horrible miscompilation. I’ve tested it in godbolt and it started on version 0.10. Looking at the generated assembly, it’s clear the compiler decided to pass by reference (which makes no sense to me, as it’s a one-byte structure). When you take the variable’s address, it creates the variable on the stack and passes that address to the function, while in the other case it simply passes the address of the global variable.
The aliasing issue is a lot bigger than the core team gives it credit for.

The issue happens at all optimization levels here. I tried something else and was also able to work around the issue by declaring the first b as a var instead of const. Just throwing this out, but it seems like the compiler wants to essentially “remove” b and read directly from a.

pub fn main() void {
    a.x = 1;
    {
        var b = a;
        f(b);
        b.x += 1; // for "unmodified var" error
    }
    a.x = 1;
    {
        const b = a;
        f(b);
        _ = &b;
    }
}

So that said… I wonder at what level that decision is getting made? As soon as b is given some non-obvious path (it’s used in some way beyond the use of a), the problem seems to go away.

Edit: one more experiment…

This also removes the aliasing issue:

fn g(b: A) void {
    print("b.x = {}\n", .{b.x});
}

// later...

a.x = 1;
{
    const b = a;
    f(b);
    g(b);
}

This example removes the aliasing too:

fn h() A {
    return a;
}

// later...

a.x = 1;
{
    const b = h();
    f(b);
}

@LucasSantos91, I wonder if there’s some way to guarantee a non-aliasing copy? Maybe we should open a thread about that?

BTW, all of these examples I have compiled above are on ReleaseFast.

Here’s what I’m wondering… you can form an identity function like so:

pub fn identity(comptime T: type, x: T) T {
    return x;
}

Here we can see that T as our return type is not a pointer. As such, it makes me wonder if that influences the compiler to make a deep copy instead of a shallow one? That still works on release fast too with no aliasing:

const b = identity(A, a);

Edit - it also works with types that are larger than a usize.

I don’t think there is, because that is antithetical to what Zig is working towards. Consider parameter reference optimization (PRO), for instance. Zig is trying at all levels to take the decision of when to make a copy away from the programmer. This should lead to better ergonomics for the programmer, who doesn’t have to worry about this, and better performance, as the compiler can really micromanage every byte. Overall, this works great, but examples like this show that it can fail in very subtle ways. In a large code base, this bug would be nearly impossible to catch. Especially worrying is how fickle it is: a simple _ = &b, made some lines away, was all it took to make the bug disappear (temporarily, as any change to the code around this could make it reappear).
In the identity function that you showed, the compiler could inline it and perform all the same optimizations. Even if you forbid it from inlining, given that Zig is aggressively working towards eliminating extra copies, it could still deduce that the value returned will be the same as the argument, and make code transformations based on this. If you think about it, the line const b = a is equivalent to the identity function you showed, with a being the input and b the return value. Here b is a value that, from a logical standpoint, is a copy of a. The compiler could implement this as a reference, but it would have to ensure that the illusion of it being a copy was never broken. The fact that we can see what’s happening behind the curtains is a compiler bug. If the compiler broke the line const b = a, it will break anything that achieves the same result, even the identity function.

They are well aware of these issues. What we’re seeing here is just a variation of the aliasing issue. See this post from Andrew. In the post, and in various other media, Andrew has argued that this problem is not as big as it seems, which is why it hasn’t been properly dealt with, but I disagree.
Matklad has already commented that in TigerBeetle they stopped using value arguments altogether to avoid aliasing issues, passing pointers everywhere, which is the opposite of what PRO was trying to achieve. They are going out of their way to purposefully give up all of value semantics, therefore giving up all of the performance and ergonomic benefits that PRO was supposed to bring. They would get better performance and ergonomics by explicitly deciding when to make a copy, like in C.
Anyone programming in Zig is (or should be) always afraid that a bug like the one shown in the example will suddenly appear in their codebase. One small change in the compiler, or one simple line in your own code, could make the aliasing issue appear and create a massive Heisenbug. Nobody would be crazy enough to bet a really large project on a language where you can’t have the most basic faith that your arguments will be passed correctly, especially if the code may work today and break tomorrow based on the compiler’s mood that day. When you can’t have faith in such a basic aspect of the language, it shatters the faith in the language itself.
To me, the aliasing issue is the most important issue currently in Zig, to the point that it may break the entire language, especially given that we don’t know if it’s even solvable. If it can’t be solved, we’re gonna have to make some drastic changes, like imposing limits on certain things like global variables, or even going back to the C way of passing arguments. Whether this is solvable or not will completely shape the language, which is why it should be tackled as early as possible.

4 Likes

This is a solvable problem, without disrupting the basic language choice to allow the compiler to make most of those decisions.

For example, there could be a built-in @copy, which guarantees that var a = @copy(b) is always a (shallow) copy. This would solve some aliasing issues by ensuring that aliasing isn’t happening, although of course not all of them: if b has a pointer, then a can alias that memory. Aliasing is a hard problem, but some aspects of that problem belong to the user-level domain; any language which works directly with memory will be susceptible to it.

But this would be a useful adjunct to pointer-taking, since it guarantees the opposite: that no aliasing can occur. For the usual case, Zig’s combination of pass-by-either-value-or-reference, combined with the immutability of those parameters, strikes a good balance.

Adding the ability to brute-force a copy when that’s what’s needed would help a lot here. If the compiler can prove that there is in fact no reason for a copy to take place, it doesn’t have to do it. But the existence of such a builtin would take the sample code from being ambiguous, to being a straightforward bug if @copy were used in the first case.

1 Like

Yup, that’s a fair point. I also get your point about it being a Heisenbug (I’m going to start using that term, it’s great). Any time I see errors like this, I get the feeling of the cold hand of death, but I also believe @andrewrk when he mentions that they have these in mind and that there are a lot of common cases that can be solved, which would handle the vast majority of problems with this. I can’t prove that, but I’m willing to bet they’ve looked into this deeply.

I was considering this a bit earlier when I was thinking about the function approach. I wonder what the opposition to this is (if there is any).

3 Likes

Maybe @copy is the solution, but we need to start using it now, rather than waiting until we have a whole bunch of legacy code to fix, or worse, hidden bugs. This is why I say that it is past time to put massive effort into studying this problem. Still, I think @copy could end up being less ergonomic than just doing things the C way.
Also, from a language design perspective, @copy goes against what Zig is trying to do, as it’s already Zig’s default to make everything a value (from a logical standpoint, even if they are implemented as references). So const b = a should be enough to create a value, without any intrinsic. Without the intrinsic, what is the type of b? Isn’t it A? That should be a value, not a pointer or, worse, a reference (which doesn’t even exist in Zig).

3 Likes

Builtin, not function. Just to be clear. This would be as much a compiler directive as it would be a function, it isn’t something which can be implemented in userspace, by the current design of the language.

I have a strong preference for the Zig way of doing things over the C one here.

A const is where you don’t want a copy, I would think: given that you can’t mutate it, the distinction shouldn’t matter. Unless b is mutable, in which case the ambiguity starts showing up again.

The most important part of this is that the behavior of the abstract machine is predictable, and predictable means simple rules, consistently followed. For example: if you take a pointer, then you’re sharing memory with the object you took the pointer to. The resulting code may not be easy to understand, but the rule certainly is.
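To make that rule concrete, here is a minimal sketch that reuses A and the global a from the first post. writeThenRead is a hypothetical helper, not something from the thread’s code. Because it takes a *const A, b is explicitly a view of a’s memory, so observing the write through b is the expected behavior rather than a surprise.

```zig
const std = @import("std");

const A = struct {
    x: u8 = 0,
};

var a = A{};

// Hypothetical helper: writes to the global, then reads through the pointer.
fn writeThenRead(b: *const A) u8 {
    a.x = 2;
    return b.x; // b points at a, so this read observes the write above
}

pub fn main() void {
    a.x = 1;
    const seen = writeThenRead(&a);
    std.debug.print("a.x = {d}, seen through b = {d}\n", .{ a.x, seen });
    // prints: a.x = 2, seen through b = 2
}
```

Note that *const only stops writeThenRead from writing through b; it says nothing about the pointee staying unchanged.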

A @copy builtin/intrinsic would also be predictable, and would let the language continue to work with the “as if copied” semantics of constant reference arguments, result location semantics, and all the other good stuff that comes from allowing the compiler to be flexible about how to do those things when it can get away with it.

One of the reasons I decided to get involved with Zig is because of the intention that the language be standardized at some appropriate point. I’ve seen the suffering and chaos (no exaggeration!) that lack of a standard can cause. What a standard brings to the table is that all these questions will be answered because they must be answered.

A @copy intrinsic solves the “how to get the code I intended” problem of the example, but not the “what is this even supposed to do” problem. I think these are both important. But I don’t think regressing to C’s reference model is the way forward. It’s not exactly renowned for its lack of aliasing bugs, and the fact that the abstract machine requires struct arguments to be copies, and the resulting promiscuous use of pass-by-pointer, has a lot to do with that.

1 Like

What do you think about this… if you declare the type as you’re assigning, it would guarantee that type:

const a = b; // up to the compiler

const a: A = b; // must be a type A, not a reference

That seems like magic to me. I would prefer that type declarations not get conflated with value/reference semantics, since there isn’t a logical connection between them. Because at that point it isn’t a type declaration, it’s a type-and-this-is-copied declaration.

I don’t have a solid sense of what the rule should be, or I’d be willing to sketch it out. I don’t even have a very solid sense of what the current behavior entails, I watched the Killer Features video awhile back and perhaps I’ll return to it tonight.

I did hit a bug in my own code, where I took what turned out to be a copy of a slice. I had assumed that this meant that it consistently would be a copy, and fixed the problem by taking a pointer instead, but it does make me nervous if which is which turns out to be up to the whim of the compiler. I considered that a bug on my part, because in C, it would definitely be a copy, and I was just thinking in terms of languages with a more “object” and less “memory” model of program behavior when I wrote it.

It’s tough stuff to get right. Julia, for instance, has mutable and immutable structs, it’s a type-level distinction, and, being a dynamic language, mutable structs always go on the heap. It works… ok, mostly, but there are frequent posts on the forum about ways to work around the resulting lack of flexibility. But it does have the advantage of being a consistent rule: mutable structs have identity semantics, immutable structs have value semantics, and you can’t change them anyway. I don’t think it’s the right rule for a language with pointers, and where const-ness is a variable-level as well as type-level distinction.

So something like a @copy intrinsic might turn out to be a punt, something the language shouldn’t need and which then remains around as a spandrel, or has to be removed again. But it would form a dual of taking a pointer: now you can guarantee shared memory, and you can also guarantee unique memory. That carves the problem at the joints, as it were, giving the design the flexibility to settle on what makes the most sense when neither of these things are the case.

My remaining observation here is that this ambiguity hasn’t created problems for me yet. The example code in this thread, for instance: while I know it’s intended to demonstrate something, making a local variable out of a global variable, passing it into a function, and then mutating the global variable directly in that function, would be anathema to me in any language. The code I write doesn’t have anything vaguely like that, so I haven’t tripped on it.

But I would indeed like to know for sure what happens if I write var a_slice: []const u8 = ref[idx];. My presumption was that this is a copy, but it would be nicer to be sure of it, one way or the other.
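For what it’s worth, here is a small sketch of what the language semantics say for that line, as I understand them. The helper name tail and the sample strings are made up for illustration. A slice is a pointer plus a length, so the assignment copies that pair; the bytes themselves are never copied and stay shared. Re-slicing the local copy should not touch the entry stored in ref.

```zig
const std = @import("std");

// Hypothetical helper: copies the slice header stored at ref[idx],
// then re-slices only the local copy.
fn tail(ref: []const []const u8, idx: usize) []const u8 {
    var a_slice: []const u8 = ref[idx];
    a_slice = a_slice[1..]; // changes a_slice's pointer and length only
    return a_slice;
}

pub fn main() void {
    const ref = [_][]const u8{ "hello", "world" };
    const t = tail(&ref, 0);
    std.debug.print("ref[0] = {s}, tail = {s}\n", .{ ref[0], t });
    // prints: ref[0] = hello, tail = ello
}
```

This only covers the simple case with immutable backing data; it says nothing about what the optimizer does in the aliasing scenarios discussed earlier in the thread.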

3 Likes

Thanks everybody for your replies. There is a lot I didn’t know about aliasing and the avoiding of copies by the compiler. And of course I still don’t understand all of it.

In my real code a is a field of a struct and f is a method. But I don’t depend on passing b as a parameter. Instead I can define b in the function itself. Like this in my simplified example:

fn f() void {
    const b = a;
    a.x = 2;
    print("a.x = {}, b.x = {}\n", .{a.x, b.x});
}

Now b should be really a copy of a, I hope?

Sadly, in the current state of Zig, this can’t be answered… It’s a Heisenbug, it is both correct and incorrect until we look at the generated machine code, and it could change at any moment.
I think everybody here was hoping your example was just synthetic and not real code. If you need to do this for real, maybe use this advice: Zig really hates global variables. Avoiding them makes these problems a lot smaller.

2 Likes

I don’t think globals are the problem: the real code has no globals, but it does the same thing.
Simplified version:

const std = @import("std");
const print = std.debug.print;

const A = struct {
    x: u8 = 0,
};

const S = struct {
    a: A = .{},

    fn f(self: *S, b: A) void {
        self.a.x = 2;
        print("a.x = {}, b.x = {}\n", .{self.a.x, b.x});
    }
};

pub fn main() void {
    var s = S{};
    const b = s.a;
    s.f(b);
    // print("{}\n", .{b});
}

Taking the address of b or uncommenting the print statement helps again.

2 Likes

So it’s clearly not great that the meaning of a line of code can change based on a later line of code. But I do see how the compiler is getting fooled here, because you never use b again after passing it in, and when you do use it in the same block, a copy happens. It’s a tricky bit of alias analysis, and the behavior of all these interactions is currently under-specified.

Basically, you promise the compiler twice that b is immutable: once when you make it const, and again when you pass it to f as a parameter, since parameters are always const. Taking a const of a var is just a bad idea; you’re asking for trouble with that, even with pointers.

Basically, const is a promise that goes both ways: the compiler promises not to let you mutate that variable, but you also promise that it won’t get mutated some other way. If you break your side of the bargain, bad things can and will happen, because codegen will assume that a const value can never change.

Something which should always work is this:


const A = struct {
    x: u8 = 0,
};

const S = struct {
    a: A = .{},

    fn f(self: *S, b: A) void {
        self.a.x = 2;
        std.debug.print("a.x = {}, b.x = {}\n", .{ self.a.x, b.x });
    }
};

test "anti-alias" {
    var s = S{};
    const b = A{ .x = s.a.x };
    s.f(b);
    // std.debug.print("{}\n", .{b});
}

Now, commenting and un-commenting the last print line has no effect, as it shouldn’t. I would be upset if this idiom led to b being a reference to a.

This is manually doing what the proposed @copy built-in would do. I don’t have the time to comprehensively test it, but I’m concerned that building a fresh struct using a function might not be enough, although if the struct is passed as a pointer then it damn well should be. But doing it inside a block like this should work for sure, assigning a primitive to a field or variable is always a copy, no exceptions.

1 Like

I don’t think we can say this for certain, as the compiler might be able to realize that you’re just copying fields from one variable to another and perform the problematic code transformations. You’re drawing a distinction on how the compiler should behave based on lexical rules, like blocks and field access, but the compiler operates at a higher level than this, it thinks in terms of objects. Assigning all the fields of variable a to variable b should be the same to the compiler as copying a to b.
The problem here is that we are dealing with a miscompilation. Trying to reason about the behavior of the compiler based on the rules of the language is not going to work because, by definition, a miscompilation is when the compiler is not following the rules of the language.
Even if what you’re saying is always true, it only works for primitives, so we would have to always expand every field down to primitives when initializing them.
Maybe @kudu can use this temporarily and hope it will work while Zig fixes aliasing… I don’t know what else to suggest here. You could try opening an issue, but they’ll likely just answer “duplicate 12251” and close it.

3 Likes

This is a nice trick with const b = A{ .x = s.a.x };.
Your idea using a copy function seems to work also, at least as of now:

inline fn copy(a: A) A {
    return a;
}

and then

const b = copy(s.a);

This would be simpler if A has many fields.

Anyway, maybe my lesson for now is that I had better avoid passing a copy of a field to a function that changes that same field.

I meant it as a normative rather than a descriptive statement. When I say I would be upset to get aliasing with an idiom like that, I meant it.

The documentation doesn’t directly specify what happens here, but it does say this:

Primitive types such as Integers and Floats passed as parameters are copied, and then the copy is available in the function body. This is called “passing by value”. Copying a primitive type is essentially free and typically involves nothing more than setting a register.

I would file a bug if constructing a fresh struct this way were to still produce aliasing, and I would like the eventual standard to require that it not alias under any circumstances.

The inline copy function? That I would not count on working. It’s extremely fragile: you don’t actually make a copy here. The documentation explicitly reserves the right to pass this by reference if it chooses, so the code you wrote is a straightforward bug in Zig status-quo (and probably it always should be).

Something like this should be reliable, again, normative statement:

fn copy(a: *A) A {
    return A{ .x = a.x };
}

// then

const b = copy(&s.a);

But @LucasSantos91 is suggesting that even this might not be consistently a copy with the current compiler, and he knows more about this particular quirk of Zig’s model than I do.

The documentation strongly supports this as always copying:

fn copyByFields(x: u8) A {
    // x is a primitive, so it was copied on the way in
    return A{ .x = x };
}

// Then
const b = copyByFields(s.a.x);

This holds as long as every field passed to copyByFields is a primitive value type. If you made a member function which took the receiver as a pointer, and had that call an internal non-member function which takes every field and constructs a fresh struct, which it returns, that should always copy according to my best understanding of the documentation and status-quo behavior.

This is why I suggested that a @copy would have to be an intrinsic: it’s possible with great metaprogramming wizardry to make a generic copy method that should be status-quo compliant in terms of definitely making a copy, but until things settle down, it’s always operating on hard-mode to do it. This shouldn’t be as hard as it apparently is.
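As a rough idea of what such a generic helper could look like, here is a minimal sketch. copyFields is a hypothetical name, and the struct A below has an extra field y just to show more than one field being copied. It takes the source by pointer and assigns each field into a fresh value using std.meta.fields and @field. It is shallow, it only handles structs, and, as discussed above, nothing in status-quo Zig guarantees the optimizer won’t see through it, so treat it as an illustration of the idiom rather than a proven fix.

```zig
const std = @import("std");

const A = struct {
    x: u8 = 0,
    y: u32 = 0,
};

// Hypothetical generic helper: builds a fresh T by assigning every field
// individually. Shallow: pointer fields still refer to the same memory.
fn copyFields(comptime T: type, src: *const T) T {
    var dst: T = undefined;
    inline for (std.meta.fields(T)) |field| {
        @field(dst, field.name) = @field(src.*, field.name);
    }
    return dst;
}

pub fn main() void {
    var s = A{ .x = 1, .y = 7 };
    const b = copyFields(A, &s);
    s.x = 2; // mutate the original after taking the copy
    std.debug.print("s.x = {d}, b.x = {d}, b.y = {d}\n", .{ s.x, b.x, b.y });
}
```

The intent is that b keeps the values s had when copyFields was called, whatever happens to s afterwards.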

2 Likes