There’s no LTO if you compile to an .so. Yes, LLVM could do more optimizations if you feed it more information. But it should optimize better with the info it already has in this example.
If the call to eff() is dynamic, then LLVM can’t move the function barrier, period.
Why? Because eff could a) exit, b) exhibit undefined behavior, or c) go into an infinite loop.
The C standard defines something called a sequence point: before any of a), b), or c) happens, everything before the last sequence point has to have happened.
Sequence points include:
There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call. Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function.
And it’s good that the standard requires this. In an embedded context, one might write to volatile memory, then enter an infinite loop and wait for a wakeup. If the write didn’t happen before the loop (and volatile isn’t atomic: it guarantees that a write happens, it doesn’t sequence the write), the signal for the wakeup might never be sent.
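A minimal sketch of that embedded pattern (the register address and names here are made-up placeholders, not from any particular platform):

#include <stdint.h>

/* Hypothetical memory-mapped wakeup-request register. */
#define WAKEUP_REQ (*(volatile uint32_t *)0x40001000u)

void request_wakeup_and_wait(void)
{
    WAKEUP_REQ = 1u;   /* volatile write: sequenced before the loop below */
    for (;;) {
        /* spin until an interrupt or external wakeup takes over */
    }
}

If the compiler were free to sink the volatile store past (or into) the loop, the wakeup request would never go out.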
So, no, LLVM can’t reorder that function call if it’s in a .so.
If eff.c and lib.c get linked together into a single .so, then g can be reordered with respect to the eff call. I’ve determined to my own satisfaction that it would do so if it makes sense.
This is incorrect. All side effects have to happen. x + y is side-effect free, so it could be reordered whenever. If this weren’t the case, then it would be incorrect to constant fold f(4, 5) to call eff(); return 9. It isn’t.
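For reference, that assumes f is along the lines of the version discussed earlier in the thread (my reconstruction; the exact original may differ):

extern void eff();

int f(int x, int y) {
    int r = x + y;   /* no side effects */
    eff();           /* opaque external call */
    return r;
}

/* Constant folding f(4, 5) would then yield the equivalent of:
   eff(); return 9; */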
Constant folding is covered by:
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
It can only deduce that the value isn’t used if it knows what eff() does! Otherwise the value gets returned. The uncertainty creates a double bind. Wanting the optimization to be available doesn’t mean that it is.
As an example of what is allowed, the standard has this:
an implementation might perform various optimizations within each translation unit, such that the actual semantics would agree with the abstract semantics only when making function calls across translation unit boundaries. In such an implementation, at the time of each function entry and function return where the calling function and the called function are in different translation units, the values of all externally linked objects and of all objects accessible via pointers therein would agree with the abstract semantics.
“function call across translation unit boundaries”, yes, that sounds familiar.
Moreover, your argument is that a very simple optimization on what becomes five to eight machine instructions, which I’ve demonstrated can be optimized under the right conditions, isn’t being performed under the wrong ones… why, exactly? Because everyone who contributes to both compilers just missed it? But only when the standard doesn’t allow it, I mean, only sometimes? But not when the standard does allow it, that is, other times?
It makes more sense to me not to start with the premise that every contributor to clang and GCC is an idiot, and that leads to the conclusion that, in fact, the function barrier can’t be moved if certain information (specifically, what the function does or can do) isn’t available.
If you prefer to believe that they just screwed this up for no reason, you’re welcome to do so. Submit a patch to fix it! See what they say about it.
What interests me a great deal more is this: Zig, today, doesn’t have an abstract machine, and it doesn’t have a standard. Eventually it will have the latter, I hope, and that probably implies the former as well.
It’s a great opportunity not to repeat mistakes which were set in stone a good 40 years ago.
It doesn’t matter what eff does, because x and y are stack variables and no pointer has been taken. eff can’t modify them, so all optimizations are still valid. Also, even external functions still need to play by C rules. If eff accesses x or y through some stack trickery, then eff is incompatible with C.
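For contrast, here is a minimal sketch (eff_p is a hypothetical variant, not from the thread) of what changes once a pointer to a local does escape into the call:

extern void eff_p(int *p);

int f_escape(int x, int y) {
    int r = x + y;
    eff_p(&r);   /* &r escapes: the callee may legally read or write r,
                    so the store to r cannot be moved past this call */
    return r;
}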
So, you and @matklad have both expressed this interpretation of the C standard. I don’t think it’s correct, and have shared mine.
I got curious what was going on here. I wrote a simple test program and compiled it a few ways, and spent some time reading the latest draft of the C standard, and I’ve come to the conclusion that the simplest explanation is the correct one: a compiler can’t rearrange code across a function call into another translation unit, which is what the various blind-pointer variations of this problem are.
Mostly, I confirmed this experimentally. When the functions are available, rearrangement (generally through inlining or dead code elimination) does happen. When the functions aren’t available, it doesn’t happen. So that demonstrates that the compiler knows how to do it, and leads to my conclusion that it’s making an affirmative choice not to do so, in order to comply with the C standard. That makes more sense to me than “both LLVM and GCC aren’t doing an obvious thing just because …” and you can fill in your reasons there, it’s hard to think of ones which aren’t fairly disparaging of the maintainers.
But the opinions of the three of us on what’s a legal compilation of a C program aren’t really load-bearing, because we don’t work on C compilers. We’re kibitzing.
So if you’d like to resolve the question definitively, there’s a simple way to do that. Two, really, that amount to the same thing.
One is to hop on the GCC or LLVM repo, and find a maintainer who is prominent enough to catch your attention. Send them a polite email, link them to this discussion, and report back what they have to say. I’ve done this kind of thing in other contexts, and I always get a reply, people like talking about what they’re expert in.
The other way is to file the f and g code as a bug report on GCC or LLVM, and see what happens. I suggested the email approach first because I don’t think this is a bug. But if you do: performance bugs are a valid thing to put on an issue tracker. Eventually it will be triaged, and we’ll have our answer.
But I think the two of you are leaning a bit too heavily on how you think C should work. That is in fact much more interesting on a Zig discussion board! Someday Zig will have a standard, I hope: what should the standard say about this kind of rearrangement?
We’ll need an answer to that, but it can’t be based on a simple/obvious program like this one, because it needs to be a rule, not just “any code which you can eyeball and determine is ok to rearrange, should be”. Things which make the program produced from source code less predictable should be approached very carefully.
It certainly looks like it would be good to have a standard where this specific optimization is available. But I also thought it would be good to make semicolons optional in Zig, until I found an earlier issue where some counterexamples of code which is legal now, which would change meaning if semicolons were optional, were exhibited. That doesn’t make it impossible, but it makes it a breaking change: the consequences are different.
So a question like that has to survive adversarial examples, and the process needs to produce a consistent rule, one which multiple compilers can interpret the same way, so that a Zig program compiled with two different systems will always have the same effects. That’s hard to do.
I feel like the inferential distance is too high here for me to explain my thinking in a relatively short number of words, so let me just note that these two functions do in fact optimize to identical machine code under GCC:
extern void eff();

int f(int x, int y) {
    int r = x + y;
    eff();
    return x > 0 ? 0 : r;
}

int g(int x, int y) {
    eff();
    int r = x + y;
    return x > 0 ? 0 : r;
}
That’s certainly a suggestive find. I’m not sure what all it suggests, but it does clearly indicate that moving the calculation across potentially-undefined behavior is happening.
So I withdraw that interpretation; the section I was largely basing it on is about optimizations based on actually-undefined behavior affecting operations before the undefined behavior occurs.
I still wonder why both clang and GCC, on both ARM64 and x86, do the same thing for both examples. Because another thing your example shows is that it isn’t because the compiler is missing some ability to determine when things are independent: if that claim were true, we’d see the same behavior in your new example as in the old one.
I’m leaning toward “annoying cost model which doesn’t generalize well to a three-line function” at this point, but I’m becoming interested in what an actual compiler engineer who works on one of these projects would say about it.
Now that you have two contrasting examples, you should file an issue about it with one of those compilers, and see what they have to say.
Just another data point. MSVC compiles both functions to the same machine code.
All of this is a good reminder to me that searching for a counterexample should come before searching for an explanation. I doubt at this point that there is a more concrete explanation than “the LLVM and GCC heuristics arrive at something slightly suboptimal for one particular pair of identical functions”, possibly because the heuristic is actually better for “real programs”, and possibly because the tail of compiler optimizations is very long indeed.
I think there is a misunderstanding of the role of undefined behavior in C. The C standard doesn’t impose any requirements on programs that contain undefined behavior. So if the program contains even one instance of undefined behavior, the behavior of the entire program is unconstrained.
What does this mean for the compiler? It means the compiler is allowed to assume that the program it compiles doesn’t contain undefined behavior.
- If the program is indeed free of undefined behavior, then this assumption is true and the resulting binary is standard-compliant.
- If it actually has undefined behavior, then the compiler may emit anything, including a binary compiled under the assumption that the program has no undefined behavior.
So this is exactly why undefined behavior enables optimizations: the compiler may simplify the code under assumptions like “signed integer overflow never happens” or “data races never happen”.
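A small, well-known illustration of the first assumption (a sketch, not from the thread): for a signed int, x + 1 > x can only be false if x + 1 overflows, so under the no-overflow assumption the comparison can be folded away.

int always_one(int x) {
    /* Signed overflow is undefined, so the compiler may assume x + 1
       doesn’t overflow and typically folds this to: return 1; */
    return x + 1 > x;
}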
So your point doesn’t imply that rearrangement is prohibited. Indeed, eff may in fact exhibit undefined behavior, but the compiler is free to assume that it does not.
By the way, an infinite loop without side effects is also undefined behavior.
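A sketch of the rule being alluded to (paraphrasing C11 6.8.5p6: a loop whose controlling expression is not a constant expression and whose body performs no I/O, volatile accesses, or synchronization/atomic operations may be assumed by the implementation to terminate):

int collatz_reaches_one(unsigned n) {
    /* The compiler may assume this loop terminates, so the function
       may be optimized as if it simply returned 1. */
    while (n != 1) {
        n = (n % 2) ? 3 * n + 1 : n / 2;
    }
    return 1;
}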
I agree with @LucasSantos91
AFAIK, there is no non-undefined-behavior way to access x, y, and r from inside eff. Thus, the compiler may assume that it doesn’t happen. Even if there is a call to exit or an infinite loop with side effects, the compiler may assume that they can’t depend on x, y, and r. So the compiler is free to rearrange those evaluations.
Regarding sequence points: indeed, here

    int r = x + y;
    eff();

the evaluation of r is sequenced before the call to eff(). But that doesn’t prohibit rearrangement. The compiler is free to rearrange evaluations as long as doing so doesn’t change the semantics of the program. Since r is inaccessible inside eff, and since the evaluation of r has no observable side effects, the semantics won’t change if r is evaluated after the call to eff().
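For contrast, a minimal sketch (my own, not from the thread) of a case where the same reordering would not be valid, because the object is visible to eff:

extern void eff();
extern int r_global;     /* external linkage: eff() may read or write it */

int h(int x, int y) {
    r_global = x + y;    /* visible side effect…                       */
    eff();               /* …so it must not be moved after this call   */
    return r_global;
}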
Well, no, the standard was recently clarified to make it clear that undefined behavior later in the program is not actually allowed to screw up things before the undefined behavior. People got tired of dealing with this.
That’s interesting, I didn’t know that, thank you. But even with that correction, my point stands, if I’m not missing something.
You might enjoy this interview with Ben Titzer where he talks about his experience with sea of nodes.
My recollection is that the message is roughly “if you’re going to delete all the scheduling information from the original program and create your own schedule from scratch, you better be damned good at scheduling. Prepare to spend the next 5 years chasing down weird edge cases where your compiler makes code slower.”