Apparently, Zig’s handling of function arguments (passing by value or by reference at will) leads to weird side effects: the address of a parameter changes every time it is taken, so that (&v == &v) is false; see issue #18194. To me it looks like a serious bug.
Credit to @tgirod for pointing this out (at least this was how I discovered it)!
I think the original credit goes to @IntegratedQuantum and their issue #16343 from July 7th.
The real problem is that the bug has been out in the wild for at least 6 months and who knows how long it would take for Andrew and the team to fix it.
Well, that is definitely a good issue, but it doesn’t mention that the copying seems (potentially?) unrelated to PRO for big structs and occurs for all parameters. And the behavior has existed for longer than 6 months: I tested going back to 0.10.1 with no -fstage1 (also noted in that original July 7th issue).
I am an experienced coder (C/C++/Rust) but new to Zig. I am not aware of any language where taking the address of a function parameter has side effects and yields a different value each time the address operator is applied. This is an absolute no-go.
It looks like you know more about this bug than I do. Please add your comments to issue #18194.
This is so wild. I have added some prints to the example from the GitHub issue, and I don’t think I understand how or why or even what is going on:
    const std = @import("std");

    fn pro_cmp_address(v: i32) bool {
        std.debug.print("\n", .{});
        // Take the address of the parameter four times in one expression.
        const v_ptrs = [_]*const i32{ &v, &v, &v, &v };
        for (v_ptrs) |p|
            std.debug.print("p = {}, p.* = {}\n", .{ p, p.* });
        // Take the address again, once per statement.
        std.debug.print("\n&v = {}\n", .{&v});
        std.debug.print("&v = {}\n", .{&v});
        std.debug.print("&v = {}\n", .{&v});
        std.debug.print("&v = {}\n", .{&v});
        return (&v == &v);
    }
prints out:
p = i32@7ffe774f1528, p.* = 123
p = i32@7ffe774f152c, p.* = 123
p = i32@7ffe774f1530, p.* = 123
p = i32@7ffe774f1534, p.* = 123
&v = i32@7ffe774f157c
&v = i32@7ffe774f158c
&v = i32@7ffe774f159c
&v = i32@7ffe774f15ac
It does look like a new variable is created and initialized with v’s value each time &v is accessed.
It’s one of those things that compilers do internally that would normally be fully transparent to the user, but that become weird and confusing when they don’t work.
As an example, functions can (and, generally speaking, want to) receive arguments via CPU registers instead of data on the stack. The problem, though, is that you can’t take the address of something in a register, so when a compiler generates a function like that, it must silently copy the value to the stack.
I’m sure there are plenty more “weird” things that compilers do for us behind the scenes that help us maintain the illusion that we’re actually “calling” a function, or taking a pointer to a “variable”.
C compilers (gcc, clang) also try to pass small values in registers. But when the programmer explicitly requests the address of the variable, the compiler puts it on the stack only once and always returns the same address for the duration of the stack frame. What Zig does is a bug.
Oh, I’m not disputing that it’s a bug, just saying that it’s one of those operations that compilers do.
I understand the necessity of creating a variable when the compiler could just use a register, but the programmer wants its address.
This is not weird or confusing.
A variable whose address changes each time you access it, is.
I think @kristoff just meant that this compiler bug is happening due to the fact that the compiler needs to put the variable on the stack in order to take its address. The compiler seems to be forgetting that it already put the variable on the stack, leading to this multiplication of variables on the stack.
I think it’s also fair to point out that for this to be an ongoing issue (it was apparently noticed quite some time ago), it must not be of dire importance or consequence.
I agree with the complaint, however. Caching the variable that was translated from the register makes sense because that matches what people would expect to see when performing well defined operations on addresses.
It’s also important to mention that this still doesn’t do what people think it does. We aren’t getting an address to the parameter itself, which could exist in a register. So exactly what the intended behavior is remains somewhat mysterious. The best we can do is what @kristoff is mentioning and do some compiler sleight-of-hand to silently copy something to the stack.
Once again, it is a bug (an odd one, but still a bug) that could be addressed to make its behavior closer to expectation.
Actually, I consider it of dire importance. I am trying to be a Zig advocate on my team. Now it is getting harder to advocate for a language with a non-deterministic address operator. BTW, &x works fine for var and const. Function arguments seem to be the only weird special case.
And you certainly have the right to assign whatever importance you see fit to whatever issue you find concerning. It’s not non-deterministic though, it’s just unexpected because there’s a misunderstanding about what the compiler is doing - creating new copies isn’t non-deterministic. In fact, if it does create a new address for each operator invoked, then I can determine that equality will always be false.
Again, I agree it should be fixed.
Not directly related to this, but I think it’s worth mentioning that the Carbon language is implementing arguments that cannot have their address taken. While Zig tries to optimize parameter passing by making all arguments const and letting the compiler decide between passing by value or reference, Carbon has value parameters and reference parameters. This is supposed to make it easier for the compiler to pass arguments in registers. Foonathan has a nice article about it.
It’s curious to think that parameter passing is one of the most basic things we do as programmers, and yet we have not found the optimal way of doing it.
I think what @LucasSantos91 posted here is the most compelling insight I’ve heard so far.
Here’s the thing… with what is being proposed, the address operator will have side effects - it still creates a singular copy instead of a new copy for each one. I agree this is a step in the right direction but there’s a bigger discussion behind all this.
There are two ways you can go here: the implicit or the explicit route. The implicit route potentially has a rule like “if you take the address of the parameter, the compiler now has to do x to make it behave according to the average programmer’s expectations”. This is a fine approach and introduces very little cognitive overhead.
The other way to do it is to make the programmer be very explicit about things and make the compiler’s job easier. Now the expectations are aligned but you have a bigger cognitive burden for an issue that most people don’t tend to notice.
At this point, I imagine the first option will be taken (if this gets changed) because the syntax around parameters is fairly well established by now.
You are right. I used sloppy language. It is deterministic. But it is an address operator with side effects, such that (&v == &v) is true for var and const, and false for function parameters (whether passed by value or by reference).
Right, it’s inconsistent and that’s definitely annoying - I’m glad we have people advocating for higher standards with our programming languages such as yourself. I’ll definitely watch the issue on Git and throw in my two cents because I’d like to see this resolved, too.
This is what C compilers do. The moment you take the address of a variable, the compiler makes it an l-value and puts one and only one copy on the stack.