a bit unfortunate the camera was on me during most slide transitions - it kinda made things awkward. still, thanks for watching
Excellent talk, and nice touch on the as above, so below "atthair"
The C++ critique was on point.
Great presentation as always.
Really good presentation.
My knowledge is a bit lacking when it comes to discerning the optimal methodology to get the best machine output, but this talk really made it simple to understand for this particular use case, and answered the "why" that my mind was failing to fully grasp.
Great talk, organized, fun and delivered well.
Iām not sure I understand what you mean by this, or how it works.
Sink supports vectors and splats, including together. A splat means to repeat the last buffer n times, which in short means you can logically send a memset across a chain of sinks without redundantly writing out and copying that memory.
I understand that the memory can be pipelined so that memory isn't copied… it kinda makes sense in my head, but I don't think I have a good mental grasp on it. Perhaps playing around with the IO interface will give more intuition on it.
I always enjoy a little ostream of C++ trashing. Really cool presentation
really nice lecture!
instructions unclear, my stdio is now full of shit
Regarding the C + MUSL + putc example and not being able to inline across compilation units, I wonder what it looks like when MUSL is linked statically and LTO is enabled (which is good and "slim" enough these days that I generally use it on all my C projects in release mode).
Usually LTO manages to strike a good balance between inlining across compilation units while at the same time reducing binary size (which sounds like an impossibility, but only until one considers that LTO also strips all dead code and data, and also allows eliminating code that's only discovered to be dead after inlining)…
Speaking of size, I currently have a curious binary size puzzle where this project (GitHub: floooh/sokol-zig-imgui-sample) compiled with --release=small on macOS yields a native executable that is almost twice as big as the WebAssembly "executable" (700 vs 400 KBytes, all uncompressed sizes).
Both link statically with the C++ stdlib and Dear ImGui (which should be the biggest contributors to size). The main difference is that the WASM build is linked with the Emscripten linker (which does much more than linking, like also running wasm-opt), vs the Zig linker.
For comparison, the equivalent C project built with CMake in MinSizeRel mode is 515 KBytes, but this links the Mac's system C++ stdlib dynamically, so it makes sense that it is smaller than the Zig build (but not that it is bigger than the WASM build).
I'll have to investigate why the WASM build is actually so surprisingly small…
…of course WASM byte code might simply be more compact than ARM64 machine code… but nearly twice as much?
FWIW, I think one of the "bugs" of the Rust toolchain is that --release does not imply thin LTO by default: Is "`#[inline(const)]`" possible? - #9 by matklad - compiler - Rust Internals
It just makes sense from the compilation-model point of view.
Can we find the slides anywhere to fix it ourselves?
I was thinking about doing a blog post version of the talk
I have a couple questions about the language comparison part:
-
When talking about the C musl interface, you say
I can confirm that the buffering at least happens before it calls the function pointers
Then later you say this about many languages, including C:
None of these mainstream languages manage to get buffering into their interfaces
Is this a mistake?
-
When analysing Rustās Writer, you say
I did notice that Rust was extraordinarily good at devirtualization […] That's cheating though, this analysis is specifically for the cases where the stream implementation is runtime known.
But you commend C, C++, and Go for avoiding virtual/indirect calls. How would these other languages be able to avoid indirect calls without devirtualization, and how would they perform devirtualization if the stream implementation were runtime known?
Either this is an unfair comparison, or I don't understand devirtualization well enough. I'm hoping it's the latter.
musl libc does get the buffer into the struct, which almost succeeds at being transparent to the optimizer, but the C language then falls short due to all the libc functions being across a compilation unit boundary. Relatedly, we just saw in the news that FILE became opaque in OpenBSD, which takes it even a step further away from being in the interface.
The key consideration is about the hot path of I/O methods that only operate on the buffer and do not make vtable calls. The functions in the vtable will be runtime-known, however the hot path logic that operates on the buffer should be fully concrete, optimized code, with no virtual function calls.
In my analysis, Rust was good at devirtualization when it had access to all the code statically in the same crate. But if the stream implementation was across a crate boundary, then even the hot path accessing the buffer went through an indirect call.
However, there was a bigger thing that I missed in my analysis, pointed out by @matklad, which is io::BufWriter<dyn io::Write>. I guess people don't really do this in Rust since devirtualization usually does the trick, but I imagine it could be a valuable pattern when used across crate boundaries.
This makes things clearer, thank you.
This still confuses me because Rust doesn't buffer in the interface, but I'm guessing you're talking about a provided method like write_all, which presumably uses indirect calls to write/flush but should itself be optimizable?
Shouldn't that be taken care of by LTO though? I don't think that the concept of "compilation units" is all that relevant in C anymore; the only remaining "optimization boundaries" are syscalls and calling into DLLs.
It would be if C compilers provided their standard libraries in source form like Zig does. However, they don't! They ship them pre-compiled, which is also why they can't cross-compile.
They also often dynamically link, and you certainly canāt inline a function that isnāt provided until runtime.
Even if they did provide the source, it would only be available at link time, which I assume is less capable of optimisation than the compiler; linkers presumably don't do nearly as much analysis.
Or am I underestimating linkers?