Just heard a rumor that the LLVM toolchain is much better optimized for C++ than for other programming languages, including Rust and our very beloved Zig. Is it true? Please only answer if you understand the LLVM back-end really well.
Likely true, simply because LLVM has been compiling older languages like C and C++ for vastly longer than it has been compiling Rust or Zig.
I would lean towards calling that a nonsense statement. C++, Rust, and Zig are all compiled into LLVM IR. If you express the same thing in each language, it could be translated to the same LLVM IR, but it might not be, depending on the compiler implementation. LLVM might be “used to” C/C++ in the sense that its optimizations are tuned for the specific patterns of IR that Clang emits, and a new language might not emit IR the exact same way Clang does. So the question kind of has it backwards: it’s not about LLVM so much as it is about how each language’s compiler lowers to LLVM IR.
Not an expert on everything but here are some things I’ve encountered:
Clang sets “noundef” on (all?) function parameters, Zig does not, which disables certain optimizations when a function is exported (if it gets inlined, it should optimize fine). (I think Zig is probably the one that’s technically in the right here, not C/C++.)
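To make that concrete, here’s a sketch; the IR lines in the comments are illustrative of the attribute in question, not verbatim compiler output:

```zig
// A function exported across the C ABI boundary.
export fn add(a: i32, b: i32) i32 {
    return a + b;
}
// For the equivalent C function, Clang emits roughly:
//   define i32 @add(i32 noundef %a, i32 noundef %b)
// while Zig leaves off the `noundef` attribute, so passes that need
// "this value is never undef/poison" must stay conservative at the
// exported boundary. After inlining the attribute no longer matters,
// which is why inlined calls optimize fine either way.
```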
Zig also has a few optimization bugs that need to be fixed to get some runtime performance back. E.g. right now packed structs are emitted in the IR with their size rounded up, and the padding bits are not adequately marked as undefined, or even zeroed.
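Here’s a sketch of the kind of type involved (assuming the usual byte-rounded lowering):

```zig
const std = @import("std");

// 5 bits of payload; the in-memory representation is rounded up to a
// whole byte, leaving 3 padding bits that the IR currently neither
// marks undefined nor guarantees are zero.
const Flags = packed struct {
    a: bool, // 1 bit
    b: bool, // 1 bit
    c: u3,   // 3 bits
};

comptime {
    std.debug.assert(@bitSizeOf(Flags) == 5);
    std.debug.assert(@sizeOf(Flags) == 1); // rounded up to 8 bits
}
```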
That said, Zig definitely has faster default semantics than C in some cases. See: Zig: great design for great optimizations - Zig NEWS
Also, Zig effectively gives you what other toolchains call “Link Time Optimization” by default, since the whole program is compiled as one unit, and static binaries are the default too.
Zig also has native vector types, which give a much nicer way of interacting with LLVM’s SIMD primitives than any other language offers.
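For example, a minimal sketch:

```zig
const std = @import("std");

// @Vector maps directly onto LLVM's vector types, so plain operators
// lower to SIMD instructions where the target supports them.
pub fn main() void {
    const V = @Vector(4, f32);
    const a: V = .{ 1.0, 2.0, 3.0, 4.0 };
    const b: V = .{ 4.0, 3.0, 2.0, 1.0 };
    const sum: [4]f32 = a + b; // a single vector add on most targets
    std.debug.print("{any}\n", .{sum});
}
```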
I also think Zig surfaces performance problems that stay hidden in other languages. Recently I was reading complaints that in Zig you have to recursively deallocate complex data structures yourself, because the compiler doesn’t automatically do it for you. Of course, when I read that, I thought: yeah, that sucks, use an arena instead lol.
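As a sketch of what I mean (a hypothetical linked list, freed in one shot):

```zig
const std = @import("std");

pub fn main() !void {
    // Every node comes out of one arena...
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit(); // ...so the whole structure is freed in one call.
    const alloc = arena.allocator();

    const Node = struct { next: ?*@This(), value: u32 };
    var head: ?*Node = null;
    var i: u32 = 0;
    while (i < 1000) : (i += 1) {
        const node = try alloc.create(Node);
        node.* = .{ .next = head, .value = i };
        head = node;
    }
    // No recursive walk-and-free anywhere.
}
```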
In my view you should let Zig push you away from object-oriented thinking: start thinking in terms of groups of objects, deal with large buffers at once instead of one object at a time, maybe try out a MultiArrayList, and think in SIMD terms where possible. So Zig may also be faster in terms of how it pushes you to think/program, and where it applies friction. Of course, if you knew what you were doing already you could have done it in C; I just happen to think it’s way easier to express in Zig.
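A sketch of that struct-of-arrays style, with a made-up Monster type:

```zig
const std = @import("std");

const Monster = struct { hp: u32, x: f32, y: f32 };

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const alloc = gpa.allocator();

    // MultiArrayList keeps each field in its own contiguous array
    // (struct-of-arrays), so a pass over one field touches only that
    // field's memory and is easy for LLVM to vectorize.
    var monsters = std.MultiArrayList(Monster){};
    defer monsters.deinit(alloc);

    try monsters.append(alloc, .{ .hp = 100, .x = 1.0, .y = 2.0 });
    try monsters.append(alloc, .{ .hp = 50, .x = 3.0, .y = 4.0 });

    // Operate on all the hp values as one dense slice.
    for (monsters.items(.hp)) |*hp| hp.* -|= 10;
}
```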
But the pessimist in me would point out that LLVM itself is missing a lot of basic optimizations, uses a number of optimization algorithms that are obviously not the best choice, and doesn’t even try to provide good support for a lot of architectures. It has never included a SWARizer, and only recently, after I’d been nagging people, did it gain what I’d consider the bare minimum support for vector compression and expansion on x86 and RISC-V. All the backends lack support for some of the most basic SIMD operations, with PowerPC’s being the most embarrassing. And, a pet peeve of mine: we’re still in the “thinking about it” phase of adding pdep and pext support as architecture-agnostic primitives.
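For context, pext gathers the bits of a source word selected by a mask into the low bits of the result (pdep is the inverse scatter). Here’s a portable sketch of the semantics; on x86 with BMI2 this entire loop is a single PEXT instruction, which is exactly why a target-agnostic primitive would be worth having:

```zig
fn pext(src: u64, mask: u64) u64 {
    var result: u64 = 0;
    var out_bit: u6 = 0;
    var m = mask;
    while (m != 0) {
        const lowest = m & (~m +% 1); // isolate the lowest set bit of the mask
        if (src & lowest != 0) result |= @as(u64, 1) << out_bit;
        out_bit +%= 1;
        m &= m - 1; // clear that bit and move on
    }
    return result;
}

test "pext gathers masked bits" {
    // mask selects bits 1 and 3; src has bit 3 set and bit 1 clear -> 0b10
    try @import("std").testing.expectEqual(@as(u64, 0b10), pext(0b1000, 0b1010));
}
```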
So people who like to act like LLVM is magic or nearly perfect are wrong on that point too.
The Zig plan is for LLVM to become optional. You cannot assume that Zig will forever be based on LLVM.
YMMD
That’s true, but there’s a wide gap between Zig having a fast debug-mode compiler for the major architectures, and Zig having a native backend that can target everything LLVM targets and emit optimized code competitive with LLVM’s.
We’d love to see both! But I expect that how well LLVM can compile Zig will be relevant for quite some time to come.
For sure, and probably effectively forever because of the huge variety of embedded targets.