Padding struct to fixed size with usable bytes

Justus2308 · May 16, 2025, 11:10pm

I’m trying to pad a struct with auto layout to a certain size (in this case, a cache line), and I want to use the padding as a buffer for allocations.
My code currently looks like this:

const Location = struct {
    state: std.atomic.Value(u32),
    descriptor: Descriptor, // has `auto` layout

    pub const Allocatable = struct {
        _: void align(std.atomic.cache_line) = {},
        location: Location,
        bytes: [bytes_left_in_cache_line]u8 = undefined,

        pub const bytes_left_in_cache_line = (std.atomic.cache_line - @mod(@sizeOf(Location), std.atomic.cache_line));
    };
    comptime {
        std.debug.assert(@alignOf(Location.Allocatable) == std.atomic.cache_line);
        std.debug.assert(@mod(@sizeOf(Location.Allocatable), std.atomic.cache_line) == 0);
    }
};

This works, but I am not using all the bytes that are actually free (at least on 64-bit systems) since Descriptor is pointer-aligned and I need Allocatable as an otherwise unnecessary wrapper.
I guess the obvious solution would be to make Location extern and use some @sizeOf()/@alignOf() calculations to get bytes_left_in_cache_line and make bytes a field in Location directly, however this would be pretty high-maintenance and I couldn’t have structs with auto layout inside Location.
Constructing a type from scratch at comptime also isn’t really an option because I want to have declarations.

Is there some straightforward way to achieve what I want that I’m missing?

(Apologies for the duplicate topic/double post; the first one was an accident and I’ve already flagged it.)

Sze · May 17, 2025, 1:19am

Why are you putting the align on a void field instead of on the location field directly? With field order being allowed to be arbitrary that seems brittle.

Can you explain in simple terms what your goal is, to make sure this isn’t some XY problem?

Could you use something like this instead?:

const std = @import("std");

const CacheLine = struct {
    bytes: [std.atomic.cache_line]u8 align(std.atomic.cache_line),

    pub const zero: CacheLine = .{ .bytes = @splat(0) };

    pub inline fn get(self: *CacheLine, comptime T: type) *T {
        return std.mem.bytesAsValue(T, &self.bytes);
    }
};

const Location = struct {
    state: std.atomic.Value(u32),
    descriptor: Descriptor, // has `auto` layout

    pub const nowhere: Location = .{ .state = .init(0), .descriptor = .nowhere };
};

const Descriptor = struct {
    foo: u32,
    bar: u32,

    pub const nowhere: Descriptor = .{ .bar = 0, .foo = 0 };
};

pub fn main() !void {
    var cl: CacheLine = .zero;
    cl.get(Location).* = .nowhere;
}

I haven’t done any SIMD programming yet, but it seems easier to me to keep the cache-line aspect separate from what is placed within the cache line, and then use the get method and bytesAsValue to get a view into the cache line with the right type. Similar to the get method, you could have a getBytesAfter method which returns the slice of the remaining bytes.

Or you could even have a method in Location that takes a self: *Location and then accesses the bytes behind itself (assuming this method is only ever called on Location instances that are located at the beginning of a cache line).
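A rough sketch of how get and getBytesAfter could fit together. Header is a hypothetical stand-in for Location, cache_line is just a local alias, and the plain @ptrCast is my substitution for bytesAsValue to keep the sketch small, so treat this as an illustration under those assumptions rather than a finished design:

```zig
const std = @import("std");

const cache_line = std.atomic.cache_line;

// Hypothetical stand-in for Location; any type that fits in a cache line works.
const Header = struct {
    state: u32,
    descriptor: u64,
};

const CacheLine = struct {
    bytes: [cache_line]u8 align(cache_line),

    /// View the start of the cache line as a T.
    pub fn get(self: *CacheLine, comptime T: type) *T {
        comptime std.debug.assert(@sizeOf(T) <= cache_line);
        comptime std.debug.assert(@alignOf(T) <= cache_line);
        return @ptrCast(&self.bytes);
    }

    /// The bytes that follow the T. They lie outside of T entirely,
    /// so storing a T through `get` does not touch them.
    pub fn getBytesAfter(self: *CacheLine, comptime T: type) *[cache_line - @sizeOf(T)]u8 {
        return self.bytes[@sizeOf(T)..];
    }
};

test "header and scratch bytes share one cache line" {
    var cl: CacheLine = .{ .bytes = undefined };
    cl.get(Header).* = .{ .state = 1, .descriptor = 2 };

    const rest = cl.getBytesAfter(Header);
    @memset(rest, 0xAA);

    try std.testing.expectEqual(@as(usize, cache_line - @sizeOf(Header)), rest.len);
    try std.testing.expectEqual(@as(u32, 1), cl.get(Header).state);
}
```

The buffer here starts at @sizeOf(T), so any padding inside or at the end of T is still not usable as storage.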

Justus2308 · May 17, 2025, 6:04am

This is actually something I picked up from the SmpAllocator source code. My rationale is that it would be unreasonable for the compiler to place the void field anywhere but at the base of the struct (I should probably double-check this at comptime). It also gives the compiler maximum flexibility in choosing the optimal field order, because no actual field is ‘attached’ to the alignment of the entire struct. That is fine, since I really don’t care about field order here as long as everything is within the same cache line. I also think it makes the intent easier to read.

I have a bunch of Locations which will be accessed by many threads at once. Every thread is allowed/expected to modify state. To avoid false sharing I want to give every Location its own cache line. I have to encode this into the type itself since the Locations are allocated/managed by a std.HashMap.

Location only takes up 12-20 bytes and interacting with descriptor involves an allocation anyways, so I want to use the padding bytes I produce by giving Location its own cache line as a buffer to allocate into and only fall back to a separate heap allocation if the buffer is too small.
To achieve this I need to know how many bytes I have to work with.

I guess the core of my question is: How can I calculate the amount of bytes left in a struct until it hits a certain size while I am still defining it?
If I first define a struct with all the fields I want and then calculate the amount based on its size (like I’m currently doing) I might be missing out on some bytes the struct is now using for internal padding which I could have used to store data otherwise.
If I try to make bytes a field of Location directly I can never be sure how many bytes I actually have left (or can I?) because the memory layout of Location is not guaranteed.
If I make Location have a defined memory layout by making it extern I can certainly achieve what I want but then I’m back to C and I would have to be very careful if I ever change anything about the struct fields/their types.

Is there a ‘Zig’ way to achieve this?

I think this is an improvement over my code because it’s easier to understand what the goal is but this solution still suffers from the problem that the padding/buffer is not part of Location anymore and cannot use the padding bytes inside Location to store information.
I basically want a memory layout like this:

[Location]----------------------------------------...
[descriptor]     [state]  [bytes]
| _ _ _ _ _ _ |  | _ _ |  | _ _ _ _ _ _ _ _ _ _ _ ...

instead of this:

[CacheLine]------------------------------------------------...
[Location]-----------------------|
[descriptor]     [state]           [bytes]
| _ _ _ _ _ _ |  | _ _ |  x x x x  | _ _ _ _ _ _ _ _ _ _ _ ...

xash · May 17, 2025, 7:47am

You can use a function instead of a field to get the remaining padding.

const std = @import("std");

const Location = struct {
    _: void align(std.atomic.cache_line) = {},
    state: std.atomic.Value(u32),
    descriptor: Descriptor,

    const last_used_byte = blk: {
        var max: usize = 0;
        for (std.meta.fieldNames(Location)) |field_name| {
            const end = @offsetOf(Location, field_name) + @sizeOf(@FieldType(Location, field_name));
            max = @max(max, end);
        }
        break :blk max;
    };
    const trailing_bytes_size = @sizeOf(Location) - last_used_byte;

    pub fn trailingBytes(loc: *Location) *[trailing_bytes_size]u8 {
        const bytes: [*]u8 = @ptrCast(loc);
        return bytes[last_used_byte..@sizeOf(Location)];
    }
};

Hacky, unreliable version that gets it as a field and hopes the auto layout doesn’t do anything too surprising (but then we’re already trusting Andrew for the placement of _: void align(std.atomic.cache_line) = {} too (-: ):

const Location2 = struct {
    _: void align(std.atomic.cache_line) = {},
    state: std.atomic.Value(u32),
    descriptor: Descriptor,
    bytes: [trailing_bytes_size]u8,

    const last_used_byte = blk: {
        const Layout = struct {
            state: std.atomic.Value(u32),
            descriptor: Descriptor,
        };

        var max: usize = 0;
        for (std.meta.fieldNames(Layout)) |field_name| {
            const end = @offsetOf(Layout, field_name) + @sizeOf(@FieldType(Layout, field_name));
            max = @max(max, end);
        }
        break :blk max;
    };
    const trailing_bytes_size = @sizeOf(Location) - last_used_byte;
};

But I’d stick with the first one.

Justus2308 · May 17, 2025, 8:09am

Thanks, this is great, didn’t think of using @offsetOf()! Having a function instead of a field is just fine for my use case.
Welcome to Ziggit btw

Justus2308 · May 17, 2025, 11:48pm

Unfortunately, using a function and @ptrCast() doesn’t work after all, since the compiler is allowed to do whatever it wants to the padding bytes of a struct (as confirmed by the man himself).
However, your second suggestion has inspired me to come up with something which is still pretty cursed but at least avoids playing Russian roulette with the compiler (it’s playing Breakout with it instead):

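const Location = blk: {
    // For good measure. (Quotas below the default of 1000 are ignored, so this is effectively a no-op.)
    @setEvalBranchQuota(std.atomic.cache_line);
    // Start with a full cache line of padding, which must spill into a second cache line.
    var estimated_padding_size = std.atomic.cache_line;
    var Attempt = LocationType(estimated_padding_size);
    const initial_size = @sizeOf(Attempt);
    std.debug.assert(initial_size > std.atomic.cache_line);
    // Shrink the padding one byte at a time until the struct drops to a smaller size.
    while (@sizeOf(Attempt) == initial_size) {
        estimated_padding_size -= 1;
        Attempt = LocationType(estimated_padding_size);
    }
    std.debug.assert(@alignOf(Attempt) == std.atomic.cache_line);
    break :blk Attempt;
};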

fn LocationType(comptime padding_size: usize) type {
    return struct {
        _: void align(std.atomic.cache_line) = {},
        state: std.atomic.Value(u32),
        descriptor: Descriptor,
        padding_bytes: [padding_size]u8 = undefined,
    };
}
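To sanity-check the result, something like the following should work. It is a sketch with a made-up Descriptor (the real one isn’t shown here), a padding_len declaration added only so the outcome can be inspected, and the search rewritten in an equivalent "shrink until it fits in one cache line" form:

```zig
const std = @import("std");

const cache_line = std.atomic.cache_line;

// Hypothetical Descriptor; the real one is not shown in this thread.
const Descriptor = struct {
    ptr: ?*anyopaque,
    len: u32,
};

fn LocationType(comptime padding_size: usize) type {
    return struct {
        _: void align(cache_line) = {},
        state: std.atomic.Value(u32),
        descriptor: Descriptor,
        padding_bytes: [padding_size]u8 = undefined,

        // Added for this sketch so the result of the search can be inspected.
        pub const padding_len = padding_size;
    };
}

// Shrink the padding until the whole struct fits in a single cache line.
const Location = blk: {
    var padding: usize = cache_line;
    while (@sizeOf(LocationType(padding)) > cache_line) padding -= 1;
    break :blk LocationType(padding);
};

comptime {
    // Exactly one cache line, aligned to a cache line.
    std.debug.assert(@sizeOf(Location) == cache_line);
    std.debug.assert(@alignOf(Location) == cache_line);
    // One more byte of padding would spill into a second cache line.
    std.debug.assert(@sizeOf(LocationType(Location.padding_len + 1)) > cache_line);
}
```

The last assertion is the interesting one: it checks that the search stopped at the largest buffer the compiler’s chosen layout allows, without assuming anything about the field order.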