In TigerBeetle, there’s one problem which bites us again, and again, and again: Zig makes it easy to implicitly copy large structs, and that hurts. Consider this example program:
const Big = struct {
    ballast: [4096]u8 = undefined,
    small: Small,
};

const Small = struct {
    ballast: [128]u8 = undefined,
};

export fn foo(xs_ptr: [*]const Big, xs_len: usize) callconv(.C) void {
    const xs: []const Big = xs_ptr[0..xs_len];
    for (xs) |x| {
        bar(&x.small);
    }
}

noinline fn bar(x: *const Small) void {
    _ = x;
}
If I compile it with
$ zig build-lib stack-copies.zig -Drelease-fast -fstrip -femit-llvm-ir
then the resulting .ll file contains
Block5: ; preds = %Then3
  %16 = extractvalue { ptr, i64 } %10, 0
  %17 = getelementptr inbounds %stack-copies.Big, ptr %16, i64 %12
  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %3, ptr align 1 %17, i64 4224, i1 false)
  %18 = getelementptr inbounds %stack-copies.Big, ptr %3, i32 0, i32 1
  call fastcc void @stack-copies.bar(ptr %18)
  br label %Block6
That’s a needless copy of 4224 bytes! It can be fixed by changing |x| to |*x|, but it’s a subtle thing, hard to notice without looking at the asm / ll.
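For reference, here is a minimal sketch of the fixed example. It is the same program as above with only the loop capture changed, so that x is a pointer into the slice rather than a by-value copy of the element:

```zig
const Big = struct {
    ballast: [4096]u8 = undefined,
    small: Small,
};

const Small = struct {
    ballast: [128]u8 = undefined,
};

export fn foo(xs_ptr: [*]const Big, xs_len: usize) callconv(.C) void {
    const xs: []const Big = xs_ptr[0..xs_len];
    // Capture by pointer: `x` is a `*const Big` pointing into `xs`,
    // so the element does not have to be copied to the stack first.
    for (xs) |*x| {
        bar(&x.small);
    }
}

noinline fn bar(x: *const Small) void {
    _ = x;
}
```

The call site bar(&x.small) stays the same, because field access auto-dereferences the pointer.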
I know that there are some long-term approaches to improve the situation here, like pinned structs, but I am wondering if there are some short-term remedies here?
Q: Can we somehow lint our existing code to warn about large copies?
I think ideally such a lint should live in the Zig compiler and be implemented in a backend-independent way. However, I don’t think it is easy to plug custom code in there: afaiu, Zig doesn’t expose its internal IRs for analysis.
My next thought here is to look at LLVM IR. The .ll files are easy to produce, fairly stable, and easy to work with, because they are text.
Q: Would detecting memcpys on the LLVM level work? Or are there any non-obvious problems on that path, like false positives or something?
Preliminary investigation shows that looking at .ll files can uncover at least some problems:
$ λ rg -F 'llvm.memcpy' tigerbeetle.ll | rg '.*(i64 \d+).*' -r '$1' | sort -n -k 2 | tail -n 32
i64 11904
i64 11904
i64 11904
i64 11904
i64 11904
i64 12032
i64 12032
i64 12032
i64 12032
i64 12032
i64 12032
i64 13568
i64 13568
i64 13568
i64 13568
i64 13568
i64 37376
i64 37376
i64 47616
i64 47616
i64 58832
i64 60624
i64 101888
i64 101888
i64 125136
i64 255808
i64 255808
i64 255856
i64 255856
i64 255856
i64 265248
i64 265248
We do indeed seem to copy a quarter of a megabyte here and there! Manually looking at the last entry even attributes it to this specific line:
What happens here is that we want to initialize a struct with many fields, so we use the self.* = .{} syntax to statically check that all fields are set. But, because the struct is large, and we need stable addresses, some fields are initialized separately using self.state_machine = try StateMachine.init(...), and then are set to themselves in this last line, which produces memcpys.
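To make the pattern concrete, here is a small illustrative sketch. It is not the actual TigerBeetle code: Replica, counter, buffer, and the body of StateMachine.init are made up for the example, and only the self.* = .{} shape and the self.state_machine assignment come from the description above.

```zig
// Hypothetical stand-in for a large component that needs a stable address.
const StateMachine = struct {
    buffer: [64 * 1024]u8,

    fn init() error{OutOfMemory}!StateMachine {
        return StateMachine{ .buffer = undefined };
    }
};

// Hypothetical owner struct; only `state_machine` is named in the post.
const Replica = struct {
    state_machine: StateMachine,
    counter: u32,

    fn init(self: *Replica) !void {
        // The large field is initialized separately, in place.
        self.state_machine = try StateMachine.init();

        // `self.* = .{ ... }` makes the compiler check that every field
        // is listed, but the large field has to be assigned to itself.
        self.* = .{
            .state_machine = self.state_machine,
            .counter = 0,
        };
    }
};
```

The self-assignment of state_machine in the final initializer is the kind of line that shows up as a large memcpy in the .ll output.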
I am thinking about automating this process, for which I need to find all functions in the .ll file which call memcpy with a large comptime-known length argument, demangle them, and print their names and locations. I think llvm-ir-analysis can help here, but that is in Rust, and I’d rather we stick to a single language in TigerBeetle.
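As a rough starting point, the text-based part of this could be sketched in Zig itself. This is only a line-oriented scan, not a real IR parser: it assumes each call sits on one line in the shape shown earlier (length as the third argument, written as i64 N), it does no demangling, and it does not build a call graph. The helper names memcpyLength, definedFunction, and reportLargeMemcpys, and the define line in the sample, are made up for the example.

```zig
const std = @import("std");

/// Returns the literal byte count of an `llvm.memcpy` call on this line,
/// or null if the line is not such a call or the length is not a constant.
fn memcpyLength(line: []const u8) ?u64 {
    const start = std.mem.indexOf(u8, line, "@llvm.memcpy.") orelse return null;
    const open = std.mem.indexOfScalarPos(u8, line, start, '(') orelse return null;
    var args = std.mem.splitSequence(u8, line[open + 1 ..], ", ");
    _ = args.next() orelse return null; // destination pointer
    _ = args.next() orelse return null; // source pointer
    const len_arg = args.next() orelse return null; // e.g. "i64 4224"
    const prefix = "i64 ";
    if (!std.mem.startsWith(u8, len_arg, prefix)) return null;
    // A runtime length such as "%20" fails to parse and is skipped.
    return std.fmt.parseInt(u64, len_arg[prefix.len..], 10) catch null;
}

/// Returns the (still mangled) function name from a `define ... @name(` line.
fn definedFunction(line: []const u8) ?[]const u8 {
    if (!std.mem.startsWith(u8, line, "define ")) return null;
    const at = std.mem.indexOfScalar(u8, line, '@') orelse return null;
    const open = std.mem.indexOfScalarPos(u8, line, at, '(') orelse return null;
    return line[at + 1 .. open];
}

/// Prints every constant-length memcpy of at least `threshold` bytes,
/// attributed to the enclosing function, and returns how many were found.
fn reportLargeMemcpys(ir: []const u8, threshold: u64) usize {
    var found: usize = 0;
    var current_fn: []const u8 = "<unknown>";
    var lines = std.mem.splitScalar(u8, ir, '\n');
    while (lines.next()) |line| {
        if (definedFunction(line)) |name| current_fn = name;
        const len = memcpyLength(line) orelse continue;
        if (len < threshold) continue;
        std.debug.print("{s}: memcpy of {d} bytes\n", .{ current_fn, len });
        found += 1;
    }
    return found;
}

pub fn main() void {
    // Made-up function header around the memcpy line from the example above.
    const sample =
        \\define dso_local void @foo(ptr %0, i64 %1) {
        \\  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %3, ptr align 1 %17, i64 4224, i1 false)
        \\}
    ;
    _ = reportLargeMemcpys(sample, 1024);
}
```

A real tool would read the .ll file instead of an embedded string, and would probably want source locations from the debug metadata, which this sketch ignores entirely.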
Q: Is there an existing Zig library which allows me to conveniently load .ll files, build call-graphs, and run ad-hoc analysis?
Any other thoughts or ideas here?