Why does the tag on my union(enum) take 16 bytes?

netvor · January 17, 2025, 4:22pm

I came to low-level programming only recently (Zig is my gateway), and I still can’t wrap my head around alignment.

This has been bugging me for the last few days.

So I have this tagged union with a few variants, and I just save one in the middle of a buffer:

const std = @import("std");

const Tag = std.meta.Tag;

const Thing = union(enum(u4)) {
    foo: void,
    bar: void,
    baz: u16,
    pyramid: struct {
        cc: u8,
        dd: u16,
        ee: u16,
        ff: u116,
    },
};

const MEM = 48 * 4;

pub fn main() !void {

    // some meta info
    std.log.err("@alignOf(u116)={},        @sizeOf(u116)={}", .{ @alignOf(u116), @sizeOf(u116) });
    std.log.err("@alignOf(u16)={},          @sizeOf(u16)={}", .{ @alignOf(u16), @sizeOf(u16) });
    std.log.err("@alignOf(u8)={},           @sizeOf(u8)={}", .{ @alignOf(u8), @sizeOf(u8) });
    std.log.err("@alignOf(Tag(Thing))={},   @sizeOf(Tag(Thing))={}", .{ @alignOf(Tag(Thing)), @sizeOf(Tag(Thing)) });
    // get memory
    var buff: [MEM]u8 = .{0xAA} ** MEM;
    var fba = std.heap.FixedBufferAllocator.init(&buff);
    // `ts` is never reassigned, so it is `const` (Zig 0.12+ rejects an unmutated `var`)
    const ts = try fba.allocator().alloc(Thing, 4);

    // write a thing
    ts[1] = Thing{ .pyramid = .{
        .cc = 0xCC,
        .dd = 0xDDDD,
        .ee = 0xEEEE,
        .ff = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFF,
    } };

    // dump
    const stdout = std.io.getStdOut();
    try stdout.writeAll(&buff);
}

When I dump the memory and run it through hexdump -vC:

@nauron:~$ zig run utags.zig | hexdump -vC
error: @alignOf(u116)=16,        @sizeOf(u116)=16
error: @alignOf(u16)=2,          @sizeOf(u16)=2
error: @alignOf(u8)=1,           @sizeOf(u8)=1
error: @alignOf(Tag(Thing))=1,   @sizeOf(Tag(Thing))=1
00000000  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000010  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000020  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000030  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff 00 00  |................|
00000040  dd dd ee ee cc 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  03 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000070  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000080  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000090  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
000000a0  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
000000b0  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
000000c0

I wonder: why does the tag (the 03 at address 00000050) take up a whole 16 bytes?

My current understanding of alignment is limited and perhaps even misguided. Please bear with me, and feel free to point me to other materials, etc…

As I understand it, the alignment problem exists because the CPU can effectively only access memory in chunks of a certain size, starting at certain addresses. Therefore a layout where a value crosses a chunk boundary is guaranteed to be inefficient: the CPU would need to load both chunks and work on the combined result.

This does not mean that having several values within one chunk is equally inefficient; if I want to alter just one of the values, I can imagine the CPU loading the chunk and altering only the relevant part. Hence it’s OK for my .dd, .ee, and .cc fields to be stored right next to each other.
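A small sketch of this padding rule, using `extern struct` so that fields stay in declaration order (plain Zig structs may reorder fields). The type names `Mixed` and `Small` are made up for illustration, and the numbers in the comments assume the usual 4-byte alignment of `u32`.

```zig
const std = @import("std");

// `extern struct` keeps C layout (declaration order), so padding is visible.
const Mixed = extern struct {
    a: u8, // offset 0
    b: u32, // offset 4: three padding bytes so `b` starts on a 4-byte boundary
};

const Small = extern struct {
    a: u8, // offset 0
    b: u8, // offset 1: a u8 never needs padding
    c: u16, // offset 2: already 2-byte aligned, so no padding is inserted
};

pub fn main() void {
    std.debug.print("Mixed: size={} align={} offsetOf(b)={}\n", .{ @sizeOf(Mixed), @alignOf(Mixed), @offsetOf(Mixed, "b") });
    std.debug.print("Small: size={} align={} offsetOf(c)={}\n", .{ @sizeOf(Small), @alignOf(Small), @offsetOf(Small, "c") });
}
```

`Mixed` ends up 8 bytes with `b` at offset 4, while `Small` fits three fields into 4 bytes because none of them has to skip ahead to an alignment boundary.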

But what I can’t explain is why the tag 03 is stored separately, taking up a whole chunk for itself.

Why are we not seeing a layout like this instead:

...
00000020  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000030  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff 00 00  |................|
00000040  dd dd ee ee cc 03 00 00  00 00 00 00 00 00 00 00  |................|
00000050  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
00000060  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
...

…which would make the whole Thing take only 32 bytes instead of 48, right?
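For comparison, the compact layout does appear when the discriminant is an ordinary field of the same struct as the `u116`, because then the compiler is free to place it in that struct's padding. This is a hypothetical flattened design (the names `Kind` and `FlatThing` are invented), assuming x86_64 where `@alignOf(u116)` is 16 as in the output above. It gives up the safety of a tagged union: nothing stops you from reading `ff` when `kind` is `.foo`.

```zig
const std = @import("std");

// Hypothetical discriminant, same 4-bit tag type as `union(enum(u4))`.
const Kind = enum(u4) { foo, bar, baz, pyramid };

// Hypothetical flattened layout: the tag is a sibling field of the u116,
// so it can share the padding that the u116's 16-byte alignment creates.
const FlatThing = struct {
    kind: Kind,
    cc: u8,
    dd: u16, // in this sketch, could double as the `baz` payload
    ee: u16,
    ff: u116,
};

pub fn main() void {
    // Fields need 1 + 1 + 2 + 2 + 16 = 22 bytes; rounded up to the
    // 16-byte alignment of u116 that gives 32 bytes on x86_64.
    std.debug.print("@sizeOf(FlatThing)={} @alignOf(FlatThing)={}\n", .{ @sizeOf(FlatThing), @alignOf(FlatThing) });
}
```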

Edit: I forgot to add: this is Zig 0.13.0 on Debian 12, x86_64.

IntegratedQuantum · January 17, 2025, 5:10pm

The problem is that each struct/union is treated as a separate, opaque unit.
Your inner struct has a size of 32 bytes and an alignment of 16 bytes, and the compiler assumes that all 32 bytes are in use.
I think the reason for this is that you might take a pointer &thing.pyramid and write through it, potentially overwriting the padding bytes as well (otherwise the compiler would need to do extra work to preserve the padding bytes on every write through every pointer to any struct).
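The numbers in this explanation can be checked directly. A minimal sketch, assuming x86_64 and Zig 0.13 as in the question; the inner struct is given the hypothetical name `Pyramid` so it can be inspected on its own. It prints the payload's size and alignment, the union's size, and the round-up arithmetic that turns 32 + 1 bytes into 48.

```zig
const std = @import("std");

// Same fields as the anonymous payload struct in the question.
const Pyramid = struct {
    cc: u8,
    dd: u16,
    ee: u16,
    ff: u116,
};

const Thing = union(enum(u4)) {
    foo: void,
    bar: void,
    baz: u16,
    pyramid: Pyramid,
};

pub fn main() void {
    // Payload: 22 bytes of fields, rounded up to its 16-byte alignment.
    std.debug.print("Pyramid: size={} align={}\n", .{ @sizeOf(Pyramid), @alignOf(Pyramid) });
    // The 1-byte tag is placed after the whole 32-byte payload.
    std.debug.print("Thing:   size={} align={}\n", .{ @sizeOf(Thing), @alignOf(Thing) });

    // payload + tag, then rounded up to the union's alignment
    const raw = @sizeOf(Pyramid) + @sizeOf(std.meta.Tag(Thing));
    std.debug.print("alignForward({}, {}) = {}\n", .{ raw, @alignOf(Thing), std.mem.alignForward(usize, raw, @alignOf(Thing)) });
}
```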

In your case you could use packed struct and packed union to ensure that your structs are tightly (bit-) packed in memory.
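A `union(enum)` cannot itself be declared `packed`, so one way to follow this suggestion is to store the tag as an explicit field of a packed struct. This is a sketch of just the `pyramid` case, with the hypothetical names `Kind` and `PackedPyramid`; the other variants would have to be overlaid on these bits by hand. Only `@bitSizeOf` is exactly 160: `@sizeOf` is rounded up to the ABI size of the backing integer (`u160`), so check it on your target rather than assuming 20 bytes.

```zig
const std = @import("std");

const Kind = enum(u4) { foo, bar, baz, pyramid };

// Hypothetical bit-packed `pyramid` case with the tag as an explicit field.
const PackedPyramid = packed struct {
    kind: Kind, // 4 bits
    cc: u8, // 8 bits
    dd: u16, // 16 bits
    ee: u16, // 16 bits
    ff: u116, // 116 bits
};

pub fn main() void {
    // Fields are laid out bit by bit: 4 + 8 + 16 + 16 + 116 = 160 bits.
    std.debug.print("@bitSizeOf={} @sizeOf={} @alignOf={}\n", .{
        @bitSizeOf(PackedPyramid),
        @sizeOf(PackedPyramid),
        @alignOf(PackedPyramid),
    });

    const p = PackedPyramid{ .kind = .pyramid, .cc = 0xCC, .dd = 0xDDDD, .ee = 0xEEEE, .ff = 0xFFFF };
    std.debug.print("kind={s} cc=0x{X}\n", .{ @tagName(p.kind), p.cc });
}
```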