Allocator.createMany

I’ve been translating some C code, and it makes heavy use of a data design pattern which would be a good addition to Zig.

The way it works is, say you have a Frob type, and that has a pointer to a char * “array”, zero terminated: a string, basically.

So you malloc sizeof Frob + n where n is your chars, all at once, and cast it to a *Frob. Then you populate the Frob, and you get at the byte array next, with (char *)frob[1]. That works because C is crazy.

The point is that you need two blocks of data, which will be traveling around together, and you get them back as one contiguous chunk. Since malloc in C has absolutely no idea what a type is, you just give free the Frob pointer and it junks your string as well.

Now, Zig has an actual type system, so we can’t quite do it that way. Which is where we’re suffering for it: you’d have to request raw bytes, cast regions of them into discrete pointers, then get rid of all of it: and the allocator is going to want the whole thing gone at once, which would be… painful. You can do it, yes (maybe?), but what you’re doing is clearly illegal: you’d cast the lead pointer to a byte slice, then count your slice at the end and add that to the byte slice’s length, then hand it all off to free. It should work but I don’t like it, and it’s not clear that all allocators are even going to put up with that. Violates the contract is what I’m saying.

What I want is to be able to do this:

const frob, const frobname = try allocator.createMany(.{Frob, [14]u8});
frob.kind = .fraggle;
frob.name = frobname;
@memcpy(frob.name, buf[0..14]);
// Then later:
try allocator.destroyMany(.{frob, frob.name});
// That could be disguised within
// frob.destroy(allocator);

Where the contract specifies that any bytes given to a destroyMany must come from a call to createMany of the same tuple type. So the allocator can just add up all the memory and junk it.

Note that create can’t really do this, it would return a pointer to an anonymous tuple, when what you want is two pointers, one skinny one fat. You can dereference that, assign const frob = &retval.@"0";, but it’s ugly stuff, and that’s before you try and get rid of it. I’d rather use an API.

It’s a good technique, minimizes allocations and fragmentation, keeps data together. We should be able to do that in Zig, and it should be easier and cleaner than the C equivalent.

6 Likes

You probably want frob.name to be a function instead that returns pointer to the offset from the struct pointer to avoid paying for a cost of a pointer field.

I think it’s nice idea though.

2 Likes

Indeed, sometimes this technique is used that way and the pointer is created on demand. Other times it isn’t, I didn’t want to distract from the main point by going on that tangent but I’m glad you mentioned it.

The code needed to do this is much more verbose than (char *)frob[1] in Zig :slight_smile: that much is not a bad thing. Zig also likes slices more than zero-terminals, so something needs to keep the length around.

Since this would all be implemented in userland anyway, an approach you could take is writing it as a third-party library first.

As a library I’d imagine the usage would be something like:

const frob, const frobname = try createMany(allocator, .{Frob, [14]u8});

// ...

destroyMany(allocator, .{frob, frob.name});
2 Likes

I have considered this!

…I have a few thousand lines of C left to translate though, so maybe not over this weekend.

2 Likes

This may be of interest

1 Like

That approach wouldn’t work when you have multiple fields that needs to be allocated, a situation where such a function is most helpful.

I would do it like this:

const Avenger = struct {
    real_name: []u8,
    superhero_name: []u8,
    powers: [][]u8,
};

fn createAvenger(...) {
    const avenger = try allocator.createStruct(Avenger, .{
        .real_name = 24,
        .superhero_name = 9,
        .powers = .{ 5, 15, 22 },
    };
    // ...
}

If I understand your post right, C (since C99) has a special feature for this called ‘flexible array member’, e.g. you don’t need the &frob[1] trick to get to the data that follows the frob:

typedef struct {
    uint32_t magic;
    uint32_t width;
    uint32_t height;
    uint16_t payload[];
} header_t;

The payload flexible array must be last and can be accessed as hdr.payload[N].

I’m not sure if such a use case should have special allocator support though, after all it’s quite esoteric and the few situations where it’s needed can be handled via type-agnostic allocation (“give me N bytes please”) and a bit of pointer casting.

2 Likes

PS: Zig (and other languages) not having the concept of flexible array members is actually something I stumbled over in my Sokol shader-cross-compiler.

GLSL has the concept of ‘SSBOs’ which can have header items followed by array data, pretty much a C struct with a flexible array member (see code examples at the end: https://www.khronos.org/opengl/wiki/Shader_Storage_Buffer_Object).

The shader compiler generates CPU-side structs for those GLSL SSBOs, but since most non-C languages don’t have flexible array members, I cannot map this concept to other languages and had to ‘forbid’ this feature on the GLSL side, e.g. GLSL SSBOs cannot have a ‘header’ and instead can only have a flexible array in them (which then maps to a regular array of structs on the CPU side) - which is a bit of a shame, because ‘one-time header data followed by array data of undefined size’ is a very useful thing in shaders.

1 Like

I think It’s pretty straight forward, modified the example for a flexible array member equivalent

const std = @import("std");

const Kind = enum {
    fraggle,
    froggle,
};

const Frob = struct {
    kind: Kind,
    size: usize,

    fn name(frob: *Frob) []u8 {
        const raw = @as([*]u8, @ptrCast(@alignCast(frob))) + @sizeOf(Frob);
        return raw[0..frob.size];
    }
};

pub fn main() !void {
    var gpa = std.heap.DebugAllocator(.{}).init;
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const buf = "hiiii";
    const frob, const frobname = try allocHeader(allocator, Frob, u8, buf.len);
    defer freeHeader(allocator, frob, frobname);

    frob.kind = .fraggle;
    frob.size = frobname.len;
    @memcpy(frob.name(), buf);

    std.debug.print("{}, {s}", .{ frob.kind, frob.name() });
}

pub fn allocHeader(allocator: std.mem.Allocator, Header: type, Elem: type, n: usize) std.mem.Allocator.Error!struct { *Header, []Elem } {
    const size = @sizeOf(Header) + (@sizeOf(Elem) * n);
    const alignment = @max(@alignOf(Header), @alignOf(Elem));
    const raw = try allocator.alignedAlloc(u8, alignment, size);
    const header: *Header = @ptrCast(@alignCast(raw.ptr));
    const data: [*]Elem = @ptrCast(@alignCast(raw.ptr + @sizeOf(Header)));
    return .{ header, data[0..n] };
}

pub fn freeHeader(allocator: std.mem.Allocator, header: anytype, data: anytype) void {
    const HeaderPtr = @typeInfo(@TypeOf(header)).pointer;
    std.debug.assert(HeaderPtr.size == .one);
    const Header = HeaderPtr.child;
    const Data = @typeInfo(@TypeOf(data)).pointer;
    std.debug.assert(Data.size == .slice);
    const Elem = Data.child;
    const alignment = @max(@alignOf(Header), @alignOf(Elem));
    const size = @sizeOf(Header) + (@sizeOf(Elem) * data.len);
    const raw: [*]align(alignment) u8 = @ptrCast(@alignCast(header));
    allocator.free(raw[0..size]);
}
2 Likes

This is intriguing. It’s essentially the flip side of the ArenaAllocator free. A bulk allocation of heterogeneous types. An arena allows for freeing heterogeneous types all at once. This would allow the opposite.

4 Likes

i just checked and am shocked ArenaAllocator doesn’t have initCapacity or ensureCapacity, as that would equally efficient as what I wrote, I think…

1 Like

I like this, but I don’t like that it wastes space on padding, I think it would be better if Kind was declared with enum(u1) and Frob was a packed struct where size is a uint with @bitSizeOf(usize) - @bitSizeOf(Kind), because it seems ridiculous to waste almost an entire word size as padding because of the kind flag.

And it is unlikely that such structs will ever use huge sizes.

1 Like

Totally possible to use the padding, It’s just a bit more math.

however it probably shouldnt be done in safe modes, as its planned to attach more safety info to types which will certainly take advantage of padding

1 Like

I think it is easier to avoid that there will be left over padding (by packing the enum together with the size so no padding will be added), than to do the math to use it; and also there are no guarantees that you are allowed to use the padding, as you have mentioned.

1 Like

I agree, I was thinking more generically.
It might be safe if we had a way to know the unused padding, it’d make sense to be able to do such optimisations in release safe.

3 Likes

Using padding to store data is in fact undefined/illegal behaviour: Is the compiler allowed to clobber struct padding?

2 Likes

As Andrew says, just make it a field :3

though the safety info could still be a problem if its “after” the padding field

Yeah, that’s what I ended up doing. The difficulty here is as you’ve mentioned that there’s no straightforward way to find out how large this field should be (or rather how much unused padding is available).

1 Like