This piece of code is modifying a value and then checking if the result is 0.
With the Zig compiler it produces the following assembly code (see godbolt):
As you can see the address is fetched twice. Why doesn’t the compiler optimize this?
I guess the compiler is assuming that self and self.val may alias, but why does Zig make these assumptions while other languages don’t?
Also I checked with a C compiler, and in C this extra code isn’t there (see godbolt). Why does clang optimize this, but not Zig?
Just curious how much was the performance penalty had to pay. Since it’s a single extra mov instruction normally you wouldn’t expect it to spend so much time, moreover it’s accessing same location so cache line miss is less likely to happen. And if it’s a bug I wonder this type of extra unnecessary mov instruction were appended all over the place.
You’ve correctly identified that this is to do with aliasing. @dimdin seems to have assumed it’s about val pointing to fun, but that’s not true: the concern is with val pointing to val, through a cast pointer, such as like this:
var ex: Example = .{
.val = undefined,
.fun = 123,
};
ex.val = @ptrCast(&ex.val);
In this case, the assignment ex.val[0] -= 1 would change the pointer ex.val, so that the next comparison has to work differently. Thus, since for an export fn the compiler has no useful provenance data for any parameters, the optimizer has no choice but to load the pointer again, in caseval aliased it.
The last piece of the puzzle, then: why does this not happen in C? The answer is a piece of UB in the C standard called Strict Aliasing. This is a form of TBAA (type-based alias analysis) – in short, it states that it is UB to use an A * which “actually” points in memory to a different type B (unless either A or B is a char type, which is an explicit exception). So, in your second Godbolt snippet, Clang notices that a pointer to val has type int ** while the pointer we’re accessing (self->val[idx]) has type int *; since the pointee types int * and int are different and neither is a character type, the optimizer is allowed to assume they won’t alias! Clang and GCC allow using the compiler flag -fno-strict-aliasing to disable this class of optimizations; if you pass that, you’ll see you get the same code as in the Zig case.
One question. val is of type [*]u32 (a 64-bit pointer), but the expression ex.val[0] -= 1 is modifying a 32-bit value. The compiler still thinks that this 32-bit value could overlap with a pointer? Because it sounds like the compiler is being excessively loose with types.
It does think that, because it literally can. Sure, it’d be referring to a 4-byte sub-region of the pointer value, but that still affects the pointer value. The example snippet I sent is exactly the code which would trigger this case (if you passed idx as 0).
I was under the impression that Zig’s type system was stricter than C’s. In particular, type punning is legal in C but illegal in Zig. So, from this point of view, using a [*]u32 as an alias to a sub-region of a pointer would be illegal, wouldn’t it? Shouldn’t Zig leverage that to enable the same optimizations that Clang did?
Where did you get that impression? Zig is generally more permissive than C regarding type punning. C is notoriously strict about this; the only well-defined way to type pun in C is with memcpy (unless the source or destination type is char, in which case pointer casts are fine per the aliasing rules I described above). This is because C has a somewhat bizarre memory model which kind of assumes that values aren’t actually “made of” bytes in general.
The only thing I can think of which you might be talking about is the fact that Zig doesn’t allow you to use a union to type pun. This is because union types in Zig have ill-defined layout. Using an extern union, which is the equivalent of a C union, allows you to type pun (which is notably not allowed in C, even if everyone does it!).
I want to add to this that reworking zig’s rules about aliasing is still one of the big language design areas left. The challenge is to ensure that whatever rules are instated are enforceable as cheap safety checks.
Yeah, type punning through unions is an odd case in C. There is a footnote in the C11 (and C99) standard that specifically states punning through unions is ok. But, as I understand it, footnotes are non-normative and the normative body of the standard seems to state that such type punning is UB a little unclear.
Edit: as pointed out below, the footnote doesn’t fully clarify type punning and the standard body is more clear than I recalled and doesn’t forbid it.
if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined
As often happens, the specific standard-ese on this topic has muddied over time. As a matter of practice, no compiler would dare break type punning through unions, because too many programs depend on it functioning correctly.
The footnote you’re referring to, which does expliticly mention type punning, points to this section:
When a value is stored in a member of an object of union type, the bytes of the object representation
that do not correspond to that member but do correspond to other members take unspecified values.
Which is one of those aggravating negative-space statements which is just less clear than C89 was. But we can connect all of the dots: two members of a union both have well-defined memory layouts (essential in C), so the non-padding bytes common to both unions have a specified value, even when accessed through the non-active tag.
So not quite UB, although it’s possible to reach UB by, in particular, reading a padding byte from a field of another member of the union. But if all members of the union are dense, with no padding, then behavior when accessing any one through any other is implementation defined, which in practice means that it’s predictable given the known endianness of the platform.
We took a bit of a detour here, but returning to OP, it’s possible to generate the optimal machine code in two ways: storing the pointer as a const or using noalias. Godbolt
Thanks for explaining, and thanks for the noalias trick. It works great. I think having noalias by default would be great for cases like this.
Well it was actually a good amount. In total it saved 6 seconds out of a 44 second function. And according to my profiler the second mov + cmp is actually almost as expensive as the jump: