Allocator.createMany

I don’t think that matters if you assert at comptime that your struct’s fields are ordered so that it becomes densely packed, without any unused padding.

Also, if the struct is complex it may have some inner fields where you don’t actually care about a few bits that get wasted.

So maybe it would make sense to have a generic checkNoPadding(T) function for that. Maybe with deep and shallow variants.

2 Likes

I don’t think a padding field with a fixed size is applicable here anyway, since there’s no guarantee that the padding field will actually be at the end of the array; you might well get [padding] [header] [vla]. So if you want to save those extra bits you would indeed need to avoid padding entirely.

2 Likes

Correct, the code I’m translating is ANSI and was in fact written before 1999. But they’ve added said affordance since.

It was one of the things I was thinking about here actually, passing bytes to a C library which uses the zero-length array trick would be possible with this: you get back a pointer to an extern struct, and another pointer to a (nominally sentinel-terminated, actually undefined) byte region guaranteed to be in the same spot (which I think includes padding for ZLAs but I would have to check that).

Then fill it all in, pointer-cast it to the expected type and carry on. Not as easy to set up otherwise.

Very different interface here! Less flexible and general, more convenient if it’s what you happen to need. Not a bad idea, good one I’d say, but not the same idea. Close relative.

I wish I had the time to mull this over right now; I will later. It looks like the right kind of thing. It’s not fully type-generic of course, but for a demo, yes, something like this.

Anything other than “natural alignment” for this kind of witchery is a recipe for insanity. By which I mean, you give this thing a type, it gives you back allocated data of the natural size of that type, as reported by type reflection, and the next block of bytes starts on the smallest normal alignment for that next type, whatever it is. Users who want to tighten things up are responsible for understanding how these things work.

3 Likes

I realized a couple of things sleeping on the question: one is that “pass array, get a slice”, while good, is not sufficiently general. Another is that allocMulti is a better / more accurate name than createMany for what we want.

So: the members of the tuple can be any type, or one of two kinds of value. If the value is any sort of positive integer, you get a []u8 back of that many bytes.

If the value is any sort of slice []T, the pointer is ignored, and you get back a slice of T of the same length. If sentinel-terminated, len + 1, but returned as a sentinel slice so that doesn’t change the .len field. Any other value is a compile error.

This means you can pass the slice you need to copy, as a template, then memcpy the pointer you get back. Probably that also means we don’t check const-ness for slice values, just return the mutable equivalent, as a treat. Returning const pointers from an allocator is always wrong, returning mutable pointers to const pointers is fine though. Meaning: types with const in the name give back pointer-to-const, values with const in the type name (which are only slices) give back mutable slices.

Spot check for why this is ok: the input to a function can always relax a restriction, and the output can always tighten a promise. We honor the promise that pointers returned point to memory which can be used, so we can relax (by not imposing) the restriction that a slice value’s type must specify that the return value be mutable.

For any type, you get back a mutable valid pointer of that type. If the type is not scalar (array, vector†), the pointer is a slice. Note that slices are in fact scalar, so passing literally []u8 (rather than an instance of it) would get back a *[]u8, which is a not-totally-useless thing to do, just unusual.

† Maybe it’s better for a vector type specification to return a singular pointer to a vector of that length. Haven’t worked with them enough to be sure.

An API for allocating multiple dependent buffers as one single contiguous allocation, so that all the buffers can also be freed as one single operation, would be very useful, especially for copying data from awkward internal/external APIs when you don’t want to expose references to internal data structures.

(For a simple example of what I mean by “dependent buffers”, consider a slice of strings [][]u8, which needs one buffer for the character data and another to store the strings (the pointer to the first character + length for each string).)

I played around with the idea for a bit and arrived at the following interface:

pub fn groupAlloc(
    allocator: std.mem.Allocator,
    comptime Types: anytype, // tuple of types
    lengths: anytype, // tuple of usizes
) std.mem.Allocator.Error!AllocatedGroup(Types); // tuple of slices

pub fn groupFree(
    allocator: std.mem.Allocator,
    comptime Types: anytype, // tuple of types
    first_group: []const Types[0],
) void;

Example usage:

pub const Server = struct {
    id: u32,
    name: []u8,
    users: []User,
};

pub const User = struct {
    id: u32,
    name: []u8,
};

/// Returns details about all servers.
/// Pass the result to `freeServers()` to free the allocated memory.
pub fn getServers(allocator: std.mem.Allocator) ![]Server {
    // calculate object counts and buffer lengths...

    const servers: []Server, //
    const users: []User, //
    const strings: [][]u8, //
    const string_bytes_buf: []u8 //
    = try groupAlloc(
        allocator,
        .{ Server, User, []u8, u8 },
        .{ num_servers, num_users, num_strings, string_bytes_buf_required_len },
    );
    errdefer comptime unreachable;

    // initialize, populate and link up the buffers...

    return servers;
}

/// Frees server details previously allocated by `getServers()`.
pub fn freeServers(servers: []const Server, allocator: std.mem.Allocator) void {
    groupFree(allocator, .{ Server, User, []const u8, u8 }, servers);
}
Implementation (not thoroughly tested but seems to work):

pub fn groupAlloc(
    allocator: std.mem.Allocator,
    comptime Types: anytype,
    lengths: anytype,
) std.mem.Allocator.Error!AllocatedGroup(Types) {
    comptime std.debug.assert(Types.len != 0);

    var allocation_len: usize = @sizeOf(usize);
    comptime var alignment: usize = @alignOf(usize);
    inline for (Types, lengths) |T, len| {
        if (len != 0) {
            allocation_len = std.mem.alignForward(usize, allocation_len, @alignOf(T));
        }
        allocation_len += @sizeOf(T) * len;
        alignment = @max(alignment, @alignOf(T));
    }

    const allocation = try allocator.alignedAlloc(u8, .fromByteUnits(alignment), allocation_len);
    errdefer comptime unreachable;

    @as(*usize, @ptrCast(allocation.ptr)).* = allocation_len;
    var offset: usize = @sizeOf(usize);

    var result: AllocatedGroup(Types) = undefined;
    inline for (&result, Types, lengths) |*slice, T, len| {
        if (len != 0) {
            offset = std.mem.alignForward(usize, offset, @alignOf(T));
        }
        slice.ptr = @as([*]T, @ptrCast(@alignCast(allocation.ptr + offset)));
        slice.len = len;
        offset += @sizeOf(T) * len;
    }

    std.debug.assert(offset == allocation_len);

    return result;
}

pub fn groupFree(
    allocator: std.mem.Allocator,
    comptime Types: anytype,
    first_group: []const Types[0],
) void {
    comptime var alignment: usize = @alignOf(usize);
    comptime {
        for (Types) |T| {
            alignment = @max(alignment, @alignOf(T));
        }
    }

    const offset = @intFromPtr(first_group.ptr) - std.mem.alignBackward(usize, @intFromPtr(first_group.ptr) - 1, alignment);

    const allocation_ptr: [*]align(alignment) const u8 = @alignCast(@as([*]const u8, @ptrCast(first_group.ptr)) - offset);
    const allocation_len = @as(*const usize, @ptrCast(@alignCast(allocation_ptr))).*;

    allocator.free(allocation_ptr[0..allocation_len]);
}

fn AllocatedGroup(comptime Types: anytype) type {
    std.debug.assert(Types.len != 0);

    var fields: [Types.len]std.builtin.Type.StructField = undefined;
    for (&fields, Types, 0..) |*field, T, i| {
        field.* = .{
            .name = std.fmt.comptimePrint("{}", .{i}),
            .type = []T,
            .default_value_ptr = null,
            .is_comptime = false,
            .alignment = 0,
        };
    }

    return @Type(.{ .@"struct" = .{
        .layout = .auto,
        .fields = &fields,
        .decls = &.{},
        .is_tuple = true,
    } });
}

Given groupAlloc(allocator, .{ Foo, []u8, u8 }, .{ 2, 4, 256 }), this will allocate a region of memory equivalent to:

extern struct {
    allocation_len: usize = @sizeOf(@This()),
    @"0": [2]Foo,
    @"1": [4][]u8,
    @"2": [256]u8,
}

When the allocation is later freed by groupFree(allocator, .{ Foo, []u8, u8 }, foos), it uses the type information as well as the allocation_len value stored immediately before &foos[0] to reconstruct the size of the original allocation and pass it to allocator.free().

You could probably extend this idea further to support alignments and sentinels like all the regular allocation functions.

5 Likes

That does seem less magical than allowing a mixture of types and values (although Zig handles that kind of thing just fine). I like it.

That’s what I thought of implementing, but I figured a header and data would more clearly show how to do this kind of thing.

If you want to use the padding you of course need to control the order of fields, so you’d use an extern or packed type.

If we just want to make sure there aren’t unused bits or padding we can use std.meta.hasUniqueRepresentation

It would be nice if we don’t have to waste a usize to store the length of the original slice. Allocators generally can free the chunk using the address alone. If memory serves, DebugAllocator does perform a comparison against the length that it has stashed somewhere.

1 Like

EDIT: I misunderstood the implementation, but I think the last part of this answer is still a good idea

I mean, the size has to get stored somewhere; if you don’t store it yourself you’re just moving the responsibility to the allocator. There might be some optimizations, like memory regions that only contain allocations of one fixed size, but this is still generally true. This is generally not expected of implementations of Allocator; DebugAllocator just does it for redundancy.
If you have some other way in your program to know the size of your allocations you can still extract the pointers from the slices returned by groupAlloc() just like you can with regular alloc().
What might make things easier is if groupAlloc() returned something like a std.MultiArrayList(struct { ptr: *anyopaque, len: usize }).Slice but with type safety, then you could still get() single slices but also strip all len fields at once with .items(.ptr) without needing new memory to store the pointers in.

Yes, speaking for myself I was deliberately ignoring the ‘back end’ here, just because there’s a lot to explore in terms of the interface for getting back a useful composite allocation.

But in C, you’re just malloc’ing a slice of bytes and doling it out, and free just wants the pointer to the head of that slice. Ideally we would get that simplicity of freeing memory as well.

That’s one of the reasons I want this in the Allocator interface actually, because it’s not at all clear to me that this is guaranteed to happen. allocator.free gets a slice, after all, not a pointer. So I don’t actually know that the contract includes storing metadata behind the pointer which makes it clear what region of memory has become invalid, given a particular pointer.

The doc comment is not promising, I’m afraid:

/// Free and invalidate a region of memory.
///
/// `memory.len` must equal the length requested from the most recent
/// successful call to `alloc`, `resize`, or `remap`. `alignment` must
/// equal the same value that was passed as the `alignment` parameter to
/// the original `alloc` call.
///
/// `ret_addr` is optionally provided as the first return address of the
/// allocation call stack. If the value is `0` it means no return address
/// has been provided.

That makes it clear that an allocator implementation is free to treat the length field as binding, that is, it does not have to track the region separately.

So we’d probably need to have freeptr, something like:

Frees all memory last allocated for this pointer. The allocator implementation may make this memory available again, and user code must assume it is no longer valid to reference.

That’s phrased so that something like an arena is not obliged to do anything at all, but most allocators will track this with metadata, the way free works now.

Perhaps it could have a nullable length field, so that the VTable would no longer need free as currently written, it could just pass the length back when that’s known. But it would be tricky to specify under what conditions it’s allowed to be null, without making explicit reference to functions in Allocator. Maybe that’s ok though.

All in all, perhaps the most Ziggish thing to do is delegate this to userspace, by making the freeWhatever function more complex as originally suggested.

The Zig std.mem.Allocator interface places the responsibility of keeping track of the allocation size and alignment on the caller instead of the allocator implementation. This is good actually, because it allows for a wider design space of efficient allocator implementations, unlike an interface like malloc/free, which only deals in type-erased void * pointers and therefore by design requires the allocator implementation itself to do this bookkeeping. But it also means that in situations like these, where you want to return the caller a simple pointer without any extra information that they can later pass back to a “free”/“destroy” function, you need to find a clever stateless way to store information about the allocation so that it can be retrieved later.

If you passed all the original buffers to groupFree you technically wouldn’t need to store the allocation size; you could instead reconstruct it by iterating through all the buffers. std.process.argsFree works like this. But I think the additional 8* bytes is an okay tradeoff so that you only need to preserve the pointer to the start of the first buffer and can let your program forget about everything else.

If you wanted to simplify the groupFree signature further and remove the comptime Types parameter, you could also use std.mem.Allocator.rawAlloc/rawFree instead of alloc/free (the latter require the alignment to be comptime-known) and store away the alignment in the same way as the allocation size.

*I think the implementation I posted doesn’t currently handle alignments greater than @alignOf(usize) correctly. I believe a correctly sized “header” must be @max(@sizeOf(usize), alignment) bytes in size in order for the first buffer to be correctly aligned.

2 Likes

I wrote that on my phone while sitting at a bar. Now that I’m reading the source code of DebugAllocator on my computer, I can see that the length is actually necessary for the free operation. No choice but to preserve it somewhere. Kinda sucks since the C allocator obviously wouldn’t need it and its usage will not be infrequent.

But having to keep size metadata in the allocator can suck even more, depending on the situation. I think in the end there are always these two different approaches, and theoretically you could have two different kinds of allocator interfaces to support both.

Personally I much prefer Zig’s default, but I think depending on what kind of code you are dealing with, that can nudge the ideal interface in either direction.

1 Like

I’d probably store only the lengths in that case (or could also zero-terminate)

1 Like

As long as the underlying buffer is aligned to the highest alignment among the types, there is no problem.

1 Like

I think this code that I wrote would be of interest here: message.zig « src - zaprus - do THINGS with NETWORKS in ZIG

I used a field of type void at the end of a packed struct to access the variable length portion of my data.

2 Likes

Indeed, same kind of thing. I like using a zero-sized field to get at the address, very similar to the C99 construct. Those willing to dole out bytes manually can get something like this right now, I’d just like to lower the fiddling-about threshold substantially.

2 Likes

Note that the zero-sized field only works for extern structs, as normal structs have no guaranteed layout.

1 Like