Documenting the awful code I unintentionally wrote to initialize a giant sparse buffer

I’m not quite sure which category this belongs in; I just want to share a mistake I made while coding.

When I noticed that the binary I had built was hundreds of MB, I knew something was wrong. After some troubleshooting, I tracked it down to code like this:

// sparse_bad.zig
const Sparse = struct {
    array: [1 << 20]struct {
        udf: [120]u8,
        important: u64,
    },
    other: u128 = 0,
    // A constant holding the full initial value: all 2^20 elements are materialized at compile time.
    pub const init: @This() = .{ .array = @splat(.{ .udf = undefined, .important = 0xFFFF_FFFF }) };
};
pub fn main() !void {
    const gpa = std.heap.page_allocator;
    const sparse = gpa.create(Sparse) catch unreachable;
    sparse.* = .init;
    defer gpa.destroy(sparse);
}

const std = @import("std");

pub fn panic(msg: []const u8, error_return_trace: ?*std.builtin.StackTrace, ret_addr: ?usize) noreturn {
    _ = msg;
    _ = error_return_trace;
    _ = ret_addr;
    std.os.linux.exit(1);
}

The panic function overrides the default implementation only to keep code unrelated to the problem out of the binary.
The problem is the line sparse.* = .init;. Because init is a constant, the entire initial value (2^20 elements of 128 bytes each, so 128 MiB, almost all of it undefined) is embedded in the .rodata section of the binary, and the assignment copies all of it into the allocation at run time.
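
As a rough sanity check on that size, the element layout can be reproduced on its own and measured at compile time. This is only a sketch: the name Entry is made up here (the struct in the original is anonymous), and nothing else from the program is needed.

```zig
const std = @import("std");

// Same fields as the anonymous element struct in sparse_bad.zig.
// `Entry` is a hypothetical name used only for this check.
const Entry = struct {
    udf: [120]u8,
    important: u64,
};

comptime {
    // 120 bytes of payload plus one 8-byte u64 gives 128 bytes per element.
    std.debug.assert(@sizeOf(Entry) == 128);
    // 2^20 elements of 128 bytes each is 128 MiB.
    std.debug.assert(@sizeOf([1 << 20]Entry) == 128 * 1024 * 1024);
}

pub fn main() void {
    std.debug.print("{d} bytes per element, {d} MiB total\n", .{
        @sizeOf(Entry),
        @sizeOf([1 << 20]Entry) / (1024 * 1024),
    });
}
```

That matches the 129M reported by ls -lh further down: the constant accounts for essentially the whole file.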

Changing it to the following in-place construction solves the issue.

// sparse_good.zig
const Sparse = struct {
    array: [1 << 20]struct {
        udf: [120]u8,
        important: u64,
    },
    other: u128 = 0,
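    fn init(self: *@This()) void {
        // Only `other` gets a real value here; `array` is left undefined.
        self.* = .{ .array = undefined };
        // Fill in the one field that matters, element by element, directly in the destination.
        for (&self.array) |*a| {
            a.important = 0xFFFF_FFFF;
        }
    }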
};
pub fn main() !void {
    const gpa = std.heap.page_allocator;
    const sparse = try gpa.create(Sparse);
    defer gpa.destroy(sparse);
    sparse.init();
}

const std = @import("std");
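
The in-place pattern is easy to check at a smaller scale. The following is a sketch, not part of the original program: SmallSparse and its length of 4 are hypothetical, chosen so the test runs instantly with zig test.

```zig
const std = @import("std");

// Hypothetical scaled-down variant of Sparse, for illustration only.
const SmallSparse = struct {
    array: [4]struct {
        udf: [120]u8,
        important: u64,
    },
    other: u128 = 0,

    fn init(self: *@This()) void {
        // `other` takes its default; `array` stays undefined for now.
        self.* = .{ .array = undefined };
        // Write only the field that needs a defined value.
        for (&self.array) |*a| {
            a.important = 0xFFFF_FFFF;
        }
    }
};

test "in-place init sets every important field" {
    var s: SmallSparse = undefined;
    s.init();
    for (s.array) |a| {
        try std.testing.expectEqual(@as(u64, 0xFFFF_FFFF), a.important);
    }
    try std.testing.expectEqual(@as(u128, 0), s.other);
}
```

The udf bytes are never read, so leaving them undefined is fine as long as the rest of the program treats them the same way.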

Even ReleaseSmall does not get rid of this huge read-only constant or the copy from it. This was a lesson for me.

npc1054657282@localhost:~/projects/zig-arena-benchmarks$ zig build-exe sparse_bad.zig -OReleaseSmall -fsingle-threaded
npc1054657282@localhost:~/projects/zig-arena-benchmarks$ zig build-exe sparse_good.zig -OReleaseFast
npc1054657282@localhost:~/projects/zig-arena-benchmarks$ ls -lh
total 132M
-rwxrwxr-x 1 npc1054657282 npc1054657282 129M Apr 17 17:59 sparse_bad
-rw-rw-r-- 1 npc1054657282 npc1054657282  628 Apr 17 17:34 sparse_bad.zig
-rwxrwxr-x 1 npc1054657282 npc1054657282 3.4M Apr 17 18:00 sparse_good
-rw-rw-r-- 1 npc1054657282 npc1054657282  498 Apr 17 17:34 sparse_good.zig

This is exactly what I would expect given the code you wrote. Well worth documenting as a warning to others! But you did say “make me a constant of this huge thing, I’m going to use it later”, so Zig did that for you.

A separate question is whether the language should take the opportunity to ‘inline’ the use, given how much of the value is undefined. Maybe?
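
To illustrate the point about constants at a small scale: a declaration like pub const init is evaluated entirely at compile time, so its value can even be inspected in a comptime block. This is a sketch under assumptions: the name Small and the length 4 are hypothetical, and it needs a Zig version recent enough to accept the original code (@splat on arrays and the .init shorthand).

```zig
const std = @import("std");

// Hypothetical miniature of the original struct.
const Small = struct {
    array: [4]u64,
    // The whole value, every element included, is a compile-time constant.
    pub const init: @This() = .{ .array = @splat(0xFFFF_FFFF) };
};

comptime {
    // The constant's contents are already known while compiling.
    std.debug.assert(Small.init.array[3] == 0xFFFF_FFFF);
}

pub fn main() void {
    // Initializing from the constant copies its value into `s`.
    var s: Small = .init;
    s.array[0] = 1;
    std.debug.print("{d} {d}\n", .{ s.array[0], s.array[3] });
}
```

With four elements the constant is 32 bytes and nobody notices; with 2^20 elements of 128 bytes the same construct is what ends up in the binary.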