U8 alignment and storing data types as bytes

kALLEBALIK · October 2, 2024, 5:34am

So I’m writing data structure that can store data types in []u8’s as raw bytes. But I’m wondering how to handle alignment.

My idea is to group types by alignment and basically store them in a hashmap where the key is the alignment to reduce fragmentation. This means the type that stores each group can’t be generic because then it can’t go into a hashmap.

Since i have access to the type when i add data to each group i can use alignedAlloc but i cant free that memory later without getting an error: Allocation alignment 4 does not match free alignment 1 (because i no longer have access to the type in the deinit function).

I don’t know if it’s reasonable to maybe just store the []u8 as []align(8) u8 or []align(16) u8 (which seems to be the biggest alignment on my platform).

Right now i just do it like this:

const mem = try allocator.alloc(u8, new_size);

And later i just cast it back

const a1: *A = @alignCast(@ptrCast(the_slice));

When i think about it i don’t even know why this works. Why does @alignCast not assert? Is it because the allocator always return a block thas is the biggest supported alignment an therefore its safe to cast it to a smaller alignment?

I don’t know what the correct way is to go about this and would be thankful for any help you can provide.

LucasSantos91 · October 2, 2024, 9:26am

It does assert.

That’s not guaranteed. You should really use alignedAlloc here. You’re probably getting your memory from either malloc or page allocator. Malloc is always aligned to the worst case primitive, and page allocator is always page aligned, so it’s probably working by accident.

I think the best thing would be to not do this. If you can explain what your program is trying to achieve, we might be able to find a better solution. Maybe what you need is just a bunch of ArrayLists, one for each type.

kALLEBALIK · October 2, 2024, 10:04pm

Thanks for your reply. Yeah i guessed that it worked by accident, thanks for confirming my suspicion.

What I’m trying to do is store data types that might only appear once, is created by the user and is added to the storage during runtime. That is the reason for not creating an ArrayList for each type, i simply don’t know what types might be stored.

I’ve been thinking about getting the alignment from the ptr address and then just pad the slice with a couple of bytes in the beginning.

I feel like creating raw data storage should be a quite common problem. Maybe it’s more common for non zig types where alignment is not a problem, or where you actually parse the data and create a type from that.

I found one hacky solution to free the memory but i don’t like it.

const alignment: Log2PageSizeType = @intCast(std.math.log2(self.alignment));
switch (alignment) {
    inline else => |a| {
        allocator.free(@as([]align(@as(usize, 1) << a) u8, @alignCast(self.memory)));
    },
}

Thanks for taking your time to answer my question.

david_vanderson · October 2, 2024, 10:31pm

What we do in dvui is store the alignment alongside the data. We aren’t trying to super optimize it yet.

Here’s the SavedData struct that shows how we free it:

github.com

david-vanderson/dvui/blob/4e2bbf0210e137e6a105f808aa17476d4296584a/src/dvui.zig#L2194


      
              used: bool = true,
              modal: bool = false,
              stay_above_parent_window: ?u32 = null,
          };
          
          const SavedSize = struct {
              size: Size,
              used: bool = true,
          };
          
          const SavedData = struct {
              used: bool = true,
              alignment: u8,
              data: []u8,
          
              type_str: if (builtin.mode == .Debug) []const u8 else void = undefined,
              copy_slice: if (builtin.mode == .Debug) bool else void = undefined,
          
              pub fn free(self: *const SavedData, allocator: std.mem.Allocator) void {
                  if (self.data.len != 0) {
                      allocator.rawFree(self.data, @ctz(self.alignment), @returnAddress());

And here’s how we create them:

github.com

david-vanderson/dvui/blob/4e2bbf0210e137e6a105f808aa17476d4296584a/src/dvui.zig#L3305


      
                          try pathStrokeRaw(ps.closed, ps.thickness, ps.endcap_style, ps.color);
                          ps.path.deinit();
                      },
                  }
              }
          
              queue.clearAndFree();
          }
          
          // data is copied into internal storage
          pub fn dataSetAdvanced(self: *Self, id: u32, key: []const u8, data_in: anytype, comptime copy_slice: bool, num_copies: usize) void {
              const hash = hashIdKey(id, key);
          
              const dt = @typeInfo(@TypeOf(data_in));
              const dt_type_str = @typeName(@TypeOf(data_in));
              var bytes: []const u8 = undefined;
              if (copy_slice) {
                  bytes = std.mem.sliceAsBytes(data_in);
                  if (dt.Pointer.sentinel != null) {
                      bytes.len += @sizeOf(dt.Pointer.child);
                  }

Hope that helps!

kALLEBALIK · October 2, 2024, 10:52pm

So you are using rawFree?

I took this to heart:

/// This function is not intended to be called except from within the
/// implementation of an Allocator

What are the risks of this and what should i think of if i use rawFree?
Ah and here you are counting traling zeros @ctz(self.alignment) to get the log 2 base?

Sze · October 2, 2024, 11:04pm

I am currently using this in my work in progress ecs:

fn alignedAlloc(allocator: std.mem.Allocator, alignment: u29, n: usize) ![]u8 {
    return switch (alignment) {
        inline 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096 => |a| allocator.alignedAlloc(u8, a, n),
        else => unreachable,
    };
}
fn alignedFree(allocator: std.mem.Allocator, alignment: u29, memory: []u8) void {
    return switch (alignment) {
        inline 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096 => |a| allocator.free(@as([]align(a) u8, @alignCast(memory))),
        else => unreachable,
    };
}

I am also using a trick I learned from the implementation of MultiArrayList where it sorts segments of continous arrays of one size sorted by their alignment size from biggest alignment to smallest alignment.

That way you can just allocate one big batch of memory using the biggest alignment because after the array of values with alignment 8, you will then have the array of values with alignment 4, but the first of those will actually have alignment 8 which is okay.

kALLEBALIK · October 2, 2024, 11:09pm

Yeah this seems similar to my “hacky solution”

david_vanderson · October 2, 2024, 11:09pm

I’m not sure what the risks are - if it breaks we’ll change. I can’t quite remember why I settled on using rawFree - I was having trouble using free (maybe the same issues you are having).

You’re right on how we are using @ctz to get the log2 base.

Also note that if you accept slices, you need to handle the possible sentinel yourself. I have a PR open for this: sliceAsBytes: include sentinel by david-vanderson · Pull Request #19984 · ziglang/zig · GitHub

dimdin · October 2, 2024, 11:32pm

There is an easy way to detect if a number is a power of two. Subtract one from the number to get a binary mask, then use “binary and” between the mask and the number. If the result is zero then the number is a power of two.

    assert(alignment & (alignment - 1) == 0);

gonzo · October 3, 2024, 12:02am

You can also count the number of 1 bits – it will be one for a power of two. @popCount does this efficiently in Zig.

kALLEBALIK · October 3, 2024, 12:13am

Seems like my “hacky solution” is quite ok.

I tested my solution in godbolt to see the generated code.

Since zig doesn’t handle alignment over the minimum page size we can use std.mem.page_size as the maximum alignment.

// The Zig Allocator interface is not intended to solve alignments beyond
// the minimum OS page size. For these use cases, the caller must use OS
// APIs directly.

Link to comment

code (godbolt test)

export fn alignedAlloc(alignment: u32) usize {
    const Log2PageSizeType = std.math.IntFittingRange(0, std.math.log2(std.mem.page_size));
    const aligned: Log2PageSizeType = @intCast(std.math.log2(alignment));
    var b: usize = 0;
    switch (aligned) {
        inline else => |a| {
            b = 1 << a;
        },
    }
    return b;
}

generated assembly

alignedAlloc:
        lzcnt   eax, edi
        and     eax, 15
        xor     eax, 7
        mov     rax, qword ptr [8*rax + .Lswitch.table.alignedAlloc]
        ret

.Lswitch.table.alignedAlloc:
        .quad   256
        .quad   512
        .quad   1024
        .quad   2048
        .quad   4096
        .quad   8192
        .quad   16384
        .quad   32768
        .quad   1
        .quad   2
        .quad   4
        .quad   8
        .quad   16
        .quad   32
        .quad   64
        .quad   128

So the solution could look something like this:

pub fn alignedFree(allocator: Allocator, alignment: u29, memory: []u8) void {
    const Log2PageSizeType = std.math.IntFittingRange(0, std.math.log2(std.mem.page_size));
    const aligned: Log2PageSizeType = @intCast(std.math.log2(alignment));

    assert(std.mem.isValidAlign(alignment));
    assert(alignment <= std.mem.page_size);

    switch (aligned) {
        inline else => |a| {
            allocator.free(@as([]align(@as(usize, 1) << a) u8, @alignCast(memory)));
        },
    }
}

pub fn alignedAlloc(T: type, allocator: Allocator, alignment: u29, n: usize) ![]u8 {
    const Log2PageSizeType = std.math.IntFittingRange(0, std.math.log2(std.mem.page_size));
    const aligned: Log2PageSizeType = @intCast(std.math.log2(alignment));

    assert(std.mem.isValidAlign(alignment));

    switch (aligned) {
        inline else => |a| {
            if (builtin.mode == .Debug or builtin.mode == .ReleaseSafe) {
                if ((1 << a) <= std.mem.page_size) {
                    return try allocator.alignedAlloc(T, 1 << a, n);
                }
                panic("Alignment to big, maximum alignment is: {}.", .{std.mem.page_size});
            } else {
                return try allocator.alignedAlloc(T, 1 << a, n);
            }
        },
    }
}

Thanks for all the input, appriciate it!

Sze · October 3, 2024, 1:50am

I know, I just got some kind of error for alignments greater than 4096, I think it may have been something about not being able to allocate things with an alignment that is bigger than page size, I didn’t really care at the moment and decided to just hard code some sizes, because for that use case I don’t expect any bigger values in the near future anyhow and if that changes that code needs to be investigated in more detail.

I may have misunderstood the error but I was busy looking into other things, so hardcoding it at the time was practical.

kALLEBALIK · October 3, 2024, 1:58am

Yeah, you need a custom allocator if you want alignment bigger than page_size. I cant think of when this would even happen.

dimdin · October 7, 2024, 3:18pm

I always forget how good the zig standard library is: std.math.isPowerOfTwo

mnemnion · October 8, 2024, 3:03am

Pointer compression. If you get a big block of memory aligned to 2^24, you can store the common offset and keep pointers in a u32 with eight bits left over for tagging. Doesn’t come up much† but when you need it it can come in handy.

† Fairly common in JavaScript engines, although they tend to use 32 bits of the pointer.

kALLEBALIK · October 8, 2024, 9:49am

You also have, std.mem.isValidAlign