This piece of code is modifying a value and then checking if the result is 0.
With the Zig compiler it produces the following assembly code (see godbolt):
As you can see the address is fetched twice. Why doesn’t the compiler optimize this?
I guess the compiler is assuming that self and self.val may alias, but why does Zig make these assumptions while other languages don’t?
Also I checked with a C compiler, and in C this extra code isn’t there (see godbolt). Why does clang optimize this, but not Zig?
Just curious how much was the performance penalty had to pay. Since it’s a single extra mov instruction normally you wouldn’t expect it to spend so much time, moreover it’s accessing same location so cache line miss is less likely to happen. And if it’s a bug I wonder this type of extra unnecessary mov instruction were appended all over the place.
You’ve correctly identified that this is to do with aliasing. @dimdin seems to have assumed it’s about val pointing to fun, but that’s not true: the concern is with val pointing to val, through a cast pointer, such as like this:
var ex: Example = .{
.val = undefined,
.fun = 123,
};
ex.val = @ptrCast(&ex.val);
In this case, the assignment ex.val[0] -= 1 would change the pointer ex.val, so that the next comparison has to work differently. Thus, since for an export fn the compiler has no useful provenance data for any parameters, the optimizer has no choice but to load the pointer again, in caseval aliased it.
The last piece of the puzzle, then: why does this not happen in C? The answer is a piece of UB in the C standard called Strict Aliasing. This is a form of TBAA (type-based alias analysis) – in short, it states that it is UB to use an A * which “actually” points in memory to a different type B (unless either A or B is a char type, which is an explicit exception). So, in your second Godbolt snippet, Clang notices that a pointer to val has type int ** while the pointer we’re accessing (self->val[idx]) has type int *; since the pointee types int * and int are different and neither is a character type, the optimizer is allowed to assume they won’t alias! Clang and GCC allow using the compiler flag -fno-strict-aliasing to disable this class of optimizations; if you pass that, you’ll see you get the same code as in the Zig case.
One question. val is of type [*]u32 (a 64-bit pointer), but the expression ex.val[0] -= 1 is modifying a 32-bit value. The compiler still thinks that this 32-bit value could overlap with a pointer? Because it sounds like the compiler is being excessively loose with types.
It does think that, because it literally can. Sure, it’d be referring to a 4-byte sub-region of the pointer value, but that still affects the pointer value. The example snippet I sent is exactly the code which would trigger this case (if you passed idx as 0).
I was under the impression that Zig’s type system was stricter than C’s. In particular, type punning is legal in C but illegal in Zig. So, from this point of view, using a [*]u32 as an alias to a sub-region of a pointer would be illegal, wouldn’t it? Shouldn’t Zig leverage that to enable the same optimizations that Clang did?
Where did you get that impression? Zig is generally more permissive than C regarding type punning. C is notoriously strict about this; the only well-defined way to type pun in C is with memcpy (unless the source or destination type is char, in which case pointer casts are fine per the aliasing rules I described above). This is because C has a somewhat bizarre memory model which kind of assumes that values aren’t actually “made of” bytes in general.
The only thing I can think of which you might be talking about is the fact that Zig doesn’t allow you to use a union to type pun. This is because union types in Zig have ill-defined layout. Using an extern union, which is the equivalent of a C union, allows you to type pun (which is notably not allowed in C, even if everyone does it!).
I want to add to this that reworking zig’s rules about aliasing is still one of the big language design areas left. The challenge is to ensure that whatever rules are instated are enforceable as cheap safety checks.
Yeah, type punning through unions is an odd case in C. There is a footnote in the C11 (and C99) standard that specifically states punning through unions is ok. But, as I understand it, footnotes are non-normative and the normative body of the standard seems to state that such type punning is UB a little unclear.
Edit: as pointed out below, the footnote doesn’t fully clarify type punning and the standard body is more clear than I recalled and doesn’t forbid it.
if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined
As often happens, the specific standard-ese on this topic has muddied over time. As a matter of practice, no compiler would dare break type punning through unions, because too many programs depend on it functioning correctly.
The footnote you’re referring to, which does expliticly mention type punning, points to this section:
When a value is stored in a member of an object of union type, the bytes of the object representation
that do not correspond to that member but do correspond to other members take unspecified values.
Which is one of those aggravating negative-space statements which is just less clear than C89 was. But we can connect all of the dots: two members of a union both have well-defined memory layouts (essential in C), so the non-padding bytes common to both unions have a specified value, even when accessed through the non-active tag.
So not quite UB, although it’s possible to reach UB by, in particular, reading a padding byte from a field of another member of the union. But if all members of the union are dense, with no padding, then behavior when accessing any one through any other is implementation defined, which in practice means that it’s predictable given the known endianness of the platform.
We took a bit of a detour here, but returning to OP, it’s possible to generate the optimal machine code in two ways: storing the pointer as a const or using noalias. Godbolt
Thanks for explaining, and thanks for the noalias trick. It works great. I think having noalias by default would be great for cases like this.
Well it was actually a good amount. In total it saved 6 seconds out of a 44 second function. And according to my profiler the second mov + cmp is actually almost as expensive as the jump:
I’m having a 2 AM thought on this I’m not sure where to put, so I’ll write it here.
Right now the language permits any aliasing of types with well-defined layout. I propose instead to allow aliasing side-effects in exactly four circumstances:
through any two pointers of the same type (modulo CV-qualification, precisely what C++ allows) unless forbidden by noalias,
one of the aliasing pointers originates from another and is created inside the function that observes the side effect regardless of the pointer types,
one of the pointers originates from an integer or a pointer that was cast to an integer,
one of the pointers is part of a global variable.
Cases (2-4) also require the load/store to go through types with well-defined memory layout.
The main point is that it allows this optimization to happen by default. Another important consequence is that two many-item pointers of different types cannot alias if they appear as a function’s parameters (or part of parameter structs/unions).
Now, let’s address how the compiler can enforce all this. I propose to check for aliasing aggressively at the function boundary using liveness analysis. Namely, reject at comptime any attempt to create two pointers of different types that are visible (I) to a function at the creation of its frame or (II) visible to its caller after the destruction of function’s frame.
Edit: On second thought, the check (II) is too strict and makes Allocator.remap implementations clunky if not impossible. The other thing that’s impossible to check at all is whether a pointer comes from a global variable. So, calling a function on global variables would make an unchecked strict aliasing assumption.
Note that UB-free C libraries cannot break the aliasing model outlined here (unless done at the API boundary, which is illegal) unless one of the types aliased is c_char. This can also be special cased.
Ho, what is going on here? Reading this thread I immediately think: should I noalias all my (mutable) pointer arguments which are not marked as const?
Am I doing an insane amount of redundant pointer chasing in my code?
…also the situations in C where the restrict keyword (e.g. promising to the compiler that two pointers don’t alias) makes a measurable difference are rare.
It’s another tool in the optimization toolbox, but one that’s deeply buried below other much more useful tools.
I really appreciate that the plan is to avoid aliasing UB by default (in safe modes). It is really valuable to assume reinterpretation via pointer casting cannot cause UB for extern struct types without pointers (plain old data). I can’t find another language that provides this, except perhaps C with -fno-strict-aliasing.