Resizable structs in Zig

16 Likes

I think that more than VLAs this pattern is a substitute for flexible array members (with extra flexibility).

One mostly useless comment on hacker news had one small interesting question in it:

You should (unless you have specific performance data showing otherwise) store the sizes in the object header, not in an obese pointer to it. (It’s bigger than even a fat pointer.)

I’d be curious to know if this was considered and, if not, whether this suggestion could be good or not.

2 Likes

yes, I thought so too. I nearly took the bait on HN, but decided the OP would not be interested in having a useful conversation :stuck_out_tongue:

it wasn’t considered directly as a solution, but the use-case was on my mind. I think maybe the problem here is the expectations that the naming of the type (and the article) created. naming is hard, of course. it got it some attention at the very least :wink:

ultimately it’s just a small but handy utility for calculating the offsets of the data behind the pointer. the lengths certainly could be stored there; but as someone else on HN (or maybe Lobsters) said, people are usually pretty picky about where they put those lengths when they’re writing these data structures. with this non-intrusive approach, they can be as picky as they want.

let’s clumsily name it ObeseSliceUtility and ObeseSliceUtilityField for the sake of removing the bias those names seemed to create, then implement the use-case:

const MyUtil = ObeseSliceUtility(struct {
    head: struct {
        magic_number: u8,
        tail_len: u8,
    },
    tail: ObeseSliceUtilityField(u64),
});

const bytes = MyUtil.init(allocator, .{ tail = 1024 });
defer bytes.deinit(allocator);

const head = bytes.get(.head);
head.magic_number = 0x42;
head.tail_len = tail.len;
const tail = bytes.get(.tail);

from this point on you can just work with the slices and pointers if you want, pass the components elsewhere, whatever. store it like a slice if you want, but like others have pointed out this is chunky.

you might also astutely observe that head is align(1) and tail is align(8). that is for the sake of example. since the struct provided has an .auto layout the utility would actually place the tail array first. passing the bytes.ptr onward would not be very useful, making this incompatible for an ABI use case.

I want to expand the API to support extern struct. with an extern struct in this example, the head field would be kept at the head. the tail array would be unaligned. you could pass the pointer to a C library expecting a defined layout.

unless you have specific performance data showing otherwise

that road goes both ways :slight_smile:

1 Like

well, I engaged there, and it went exactly how I expected

1 Like

Note that even if you move the length metadata into the payload, you haven’t saved any memory. That data still has to live somewhere.

Perhaps you could find use cases where storing that metadata next to the payload is better, but the null hypothesis is storing it next to the base pointer, like how ArrayList, MultiArrayList, and slices of dynamically allocated memory do. This has some clear benefits: same alignment as the base pointer, bounds-checking, and ability to access that metadata without chasing the pointer.

All in all, I’d wager that the commenter is wrong, that the way @tristanpemble already coded it will be better memory layout and performance in most cases, to the point where it’s not even worth having an alternate data structure that puts the metadata in the payload.

But hey, why not, it’s worth a try.

2 Likes

I know that guy is a jerk, but I unironically think ObeseSlice is a better name than ResizableStruct right now…

StructuredSlice maybe?

Oh, I have a relevant link here!

A while ago Rust experimented with changing layouts of vectors and hashmaps (?) to store metadata in the allocation, rather than next to the pointer, and it wasn’t worth it.

5 Likes

Not sure if you had seen the previous discussion here but there does seem to be demand to improve the ergonomics of an improved version of flexible array members.

1 Like

I had not. After reading through it, I still like my approach better than those discussed in that thread for a number of reasons:

  • It is in user-space
  • It is conceptually simple
  • It is generalized and unopinionated about your data
  • It does not increase the Allocator interface surface area
  • As Andrew put better than I could, it’s probably the better data structure than a header

Now that I see this as a bigger, structured form of a slice, I cannot unsee it.

1 Like

I just merged support for extern structs, so you can now create well-defined in-memory layouts. from the tests:

const Bytes = ResizableStruct(extern struct {
    a: u8,
    b: u8 align(4),
    c: ResizableArray(u8),
    d: ResizableArray(u128),
});

var val = try Bytes.init(testing.allocator, .{
    .c = 21,
    .d = 1,
});
defer val.deinit(testing.allocator);

try testing.expectEqual(1, Bytes.sizeOf("a", val.lens));
try testing.expectEqual(0, Bytes.offsetOf("a", val.lens));

try testing.expectEqual(1, Bytes.sizeOf("b", val.lens));
try testing.expectEqual(4, Bytes.offsetOf("b", val.lens));

try testing.expectEqual(21, Bytes.sizeOf("c", val.lens));
try testing.expectEqual(5, Bytes.offsetOf("c", val.lens));

try testing.expectEqual(16, Bytes.sizeOf("d", val.lens));
try testing.expectEqual(32, Bytes.offsetOf("d", val.lens));

I believe this is all that is necessary to support sending the pointer over an ABI, unless someone sees any reason otherwise. the Bytes type in this example is not extern, but as slices aren’t either, I’m not sure I want to make it so anyways.

1 Like

Very cool! I think I’ll probably have some use cases for this :slight_smile:

I also want to share this pattern in case it is overlooked: for the simple case where you only want a single variable length trailing array at the end of a struct, you can use a void value at the end of the struct, and cast the address of the void value to a many-item pointer.

const Relay = packed struct {
    dest: @Vector(4, u8),
    payload: void,

    pub fn getPayload(self: *align(1) Relay) []u8 {
        const len: *u16 = @ptrFromInt(@intFromPtr(self) - @sizeOf(u16));
        return @as([*]u8, @ptrCast(&self.payload))[0 .. len.* - @bitSizeOf(Relay) / 8];
    }
};

Source: message.zig Ā« src - zaprus - do THINGS with NETWORKS in ZIG

In this struct, I only access the payload data through the getPayload method, which casts to a many-item array, and then slices that array based on the metadata (from the containing struct (I couldn’t get @fieldParentPtr to work here for some reason, but probably an ID: 10T error)). I allocate the struct elsewhere, backed by a []u8 of the size that I need.

1 Like

I don’t think you want packed there, you want extern.

Alright, I did some additional work on this. Bikeshedded the naming a bit, added a few utility methods for working with byte slices, including now supporting stack allocated backing with buffer arrays:

    var buf: [1024]u8 align(2) = undefined;

    const Bytes = StructuredSlice(struct {
        len: u16,
        data: FlexibleArray(u16),
    });

    const bytes = try Bytes.fromBuffer(@ptrCast(buf[0..]), .{ .data = 4 });
    const data = bytes.get(.data);
    data[0] = 0xD00D;
    data[1] = 0xCAFE;
    data[2] = 0xBEEF;
    data[3] = 0xDEAD;

I removed the resize method and replaced it with copy - resize managed the memory which wasn’t compatible with the idea of a buffer backed slice. This way the struct is ambivalent about where the backing data is stored in memory. To resize, just init a new one (or create a new buffer backed value), then new.copy(old). I’m open to revisiting that.

The current API is available here; or just read the source code, it’s pretty minimal.

In my eyes this is pretty much close to complete. The only outstanding implementation question is whether it’s worth the bother of supporting packed structs. You can see my thought process on what that even means, but I am leaning towards not doing it.

Nice work! StructuredSlice is a great name.

Question: if using heap-allocated memory, must a separate allocation/alignment be tracked? Or is there enough data in StructuredSlice to be able to free the memory later?

short answer, no, I don’t think so, unless I am misunderstanding your question.

  • init(allocator, lens) - allocates an owned slice
  • fromOwnedBytes(bytes, lens) - takes ownership, returning an owned slice
  • fromBuffer(bytes, lens) - borrows, returning a borrowed slice
  • deinit(allocator) - deallocate an owned slice

assuming you created it with one of the owned methods, the deinit will take care of it with just the pointer and lengths. the from* methods expect an aligned slice, and the ptr is an aligned pointer, both aligned to @alignOf(Layout)

in the case that you used fromBuffer with a slice borrowed from a heap allocation, then yeah, you’re gonna have to hold onto the owned slice and manage it yourself

A couple ideas that might be of interest:

1

While discussing this concept on stream Andrew Kraevskii came up with a draft for a design that doesn’t wrap the entire struct, which has the main upside of allowing you to give decls to your type. Here’s the code

2

antirez/rax is a radix tree implementation whose nodes have this structure:

+----+---+--------+--------+--------+--------+
|HDR |xyz| x-ptr  | y-ptr  | z-ptr  |dataptr |
+----+---+--------+--------+--------+--------+

This one of 2 variants, but this is the one that is more interesting for now. What’s interesting here is that a node has some struct fields (HDR), followed by N characters (x, y, z, …) and then N pointers, one for each char.

This is the struct (HDR) definition (it’s a bitfield):

uint32_t iskey:1;     /* Does this node contain a key? */
uint32_t isnull:1;    /* Associated value is NULL (don't store it). */
uint32_t iscompr:1;   /* Node is compressed. */
uint32_t size:29;     /* Number of children, or compressed string len. */

I think that this code would be significantly more easy to reason about if implemented using Zig, but the C version has two interesting things going for it:

  1. size is not repeated twice: since the two ā€œflexible arraysā€ are always the same length, it gets stored only once
  2. size is a u29 in order to fit nicely in the bitfield

Additionally, this pattern is also incompatible with the idea of keeping lengths stored in the slice (and not in the heap buffer), as the tree needs to point from node to node (i.e. length information cannot be stored elsewhere).

I don’t have any specific suggestion, also because it’s up to debate whether this use case should be solved by StructuredSlice or not (my hunch is that it should not). I just wanted to point out interesting stuff I stumbled upon.

Tomorrow I’m going to spend some time while live on Twitch to flesh out Andew Kraevskii’s design and see how that lends itself to a Zig rewrite of antirez/rax.

1 Like

That’s pretty cool — on my phone right now but I’ll take a look when I’m home later and try to hop on the stream chat tomorrow

what do you think about this?

const Packet = extern struct {
    host_len: u8,
    buf_lens: u8,
    tail: Tail(@This(), struct {
        host: TailSlice(u8, .host_len),
        read_buf: TailSlice(u8, .buf_lens),
        write_buf: TailSlice(u8, .buf_lens),
    }),
};

const host = "foobar";

var buffer: [100]u8 align(@alignOf(Packet)) = undefined;
buffer[0] = @intCast(host.len);
buffer[1] = 16;

const packet: *Packet = @alignCast(@ptrCast(&buffer));
packet.tail.copy(.host, @ptrCast(host[0..]));

try std.testing.expectEqualSlices(u8, host[0..], packet.tail.get(.host));
try std.testing.expectEqual(16, packet.tail.get(.read_buf).len);
try std.testing.expectEqual(16, packet.tail.get(.write_buf).len);

implementation on zigbin: The Zig Pastebin

thoughts on the design:

  • demarcating the real vs ā€œimaginaryā€ (tail) fields is conceptually significant
  • afaict supports extern/packed parent without much work
  • we now have the ability to generate helper structs for utility functions since we’ve wrapped the slice marker types
  • I can imagine expanding support to store length both inside and outside the tail
  • explicit length placement means it supports zero copy de-serialization

I can’t put my finger on how it would work, yet, but I think there might be potential for nesting these types now that the tail is demarcated. to make the implications of that clearer: I see a vague possibility of supporting a fully self describing, type safe zero copy deserialization framework in Zig using an approach like this

edit: moved it into GitHub and added support for placing length in the tail

Here is my take on this idea, after playing around with it for a bit, haven’t tested it super in depth, but I think the basic idea is illustrated. I didn’t like that tail was re-constructed from a struct that serves as a template (because I prefer manual structs because you can put decls/methods on them).

I also thought that packet.tail.get(.read_buf).len wasn’t necesarily better than packet.tail.read_buf.get().len, and with the latter you can put these virtual slices wherever you want.

The basic idea with my version is that you add the slices somewhere within the struct, you pass the type that represents the root-type and a config that defines how to get from the root to the virtual slice and also from the root to the length field.

Another idea was that you could override the offset for where the slice begins explicitly, but I haven’t really implemented / thought through that yet, so I would likely throw it out, for a simplified version.

I also think it could make sense to add methods that allow you to resize lengths, but it would have to consider fields that share lengths and come up with a good api for that. I guess I would have to use this data structure more to find out how that should work.

I like that with this data structure you could potentially put a lot of variable length data into a form that almost looks like a normal struct, without needing to contain any pointers. I guess the downside is that eventually you need to accumulate and calculate a lot to find where the specific subsection of a slice starts, however for data that contains a lot of fields which are rarely used I think this could be pretty useful, for example if you have a bunch of different kinds of strings or a lot of variable length things that can be empty.

zig 0.14.0:

const Packet = extern struct {
    host_len: u8 = 0,
    host: Slice(Packet, ._lenSuffix(u8, &.{.host})),
    tail: extern struct {
        buf_lens: u8 = 0,
        read_buf: Slice(Packet, .lenPath(u8, &.{ .tail, .read_buf }, &.{ .tail, .buf_lens })),
        write_buf: Slice(Packet, .lenPath(u8, &.{ .tail, .write_buf }, &.{ .tail, .buf_lens })),
    },
};

pub fn main() !void {
    const host = "foobar";

    var buffer: [50]u8 align(@alignOf(Packet)) = undefined;
    buffer[0] = @intCast(host.len);
    buffer[1] = 16;

    const packet: *Packet = @alignCast(@ptrCast(&buffer));
    packet.host_len = host.len;
    @memcpy(packet.host.get(), host);

    try std.testing.expectEqualSlices(u8, host[0..], packet.host.get());
    try std.testing.expectEqual(16, packet.tail.read_buf.get().len);
    try std.testing.expectEqual(16, packet.tail.write_buf.get().len);

    std.debug.print("hex buffer: {x}\n", .{buffer});

    const str_read = "readreadreadread";
    const str_write = "writewritewritew";
    @memcpy(packet.tail.read_buf.get()[0..str_read.len], str_read);
    @memcpy(packet.tail.write_buf.get()[0..str_write.len], str_write);

    std.debug.print("hex buffer: {x}\n", .{buffer});
    // Note that this doesn't print non printable characters like our length bytes, look at the hex print for that
    std.debug.print("buffer: {s}\n", .{buffer});

    std.debug.print("Packet: {}\n", .{packet});
    std.debug.print("host: {s}\n", .{packet.host.get()});
    std.debug.print("read_buf: {s}\n", .{packet.tail.read_buf.get()});

    std.debug.print("host offset: {}\n", .{packet.host.offsetOf()});
    std.debug.print("read_buf offset: {}\n", .{packet.tail.read_buf.offsetOf()});
    std.debug.print("write_buf offset: {}\n", .{packet.tail.read_buf.offsetOf()});
}

pub fn Slice(comptime Parent: type, comptime Options_: SliceOptions) type {
    return struct {
        // TODO think about embedding in packed structs, does it make sense, could it be supported?

        pub const Options = Options_;
        pub const Element = Options.Element;
        const path = Options.accessor_path;

        const Self = @This();

        pub fn len(self: *const Self) Options.LenType(Parent) {
            const parent = recursiveFieldParentPtr(Parent, path, self);
            return switch (Options.varying) {
                ._len => |s| @field(parent, s.len_field),
                .other => |s| fieldPath(parent, s.len_field_path).*,
            };
        }
        pub fn sizeOf(self: *const Self) usize {
            return self.len() * @sizeOf(Element);
        }

        pub fn get(self: *Self) []Element {
            Options.check(Parent);
            const parent = recursiveFieldParentPtr(Parent, path, self);
            const bytes: [*]align(@alignOf(Parent)) u8 = @ptrCast(parent);
            const offset = Options.explicit_offset orelse self.offsetOf();
            const size = self.sizeOf();
            std.debug.print("offset: {} size: {}\n", .{ offset, size });
            return @ptrCast(@alignCast(bytes[offset..][0..size]));
        }

        pub fn offsetOf(self: *const Self) usize {
            const parent = recursiveFieldParentPtr(Parent, path, self);
            return accumOffsetOf(
                Parent,
                parent,
                @sizeOf(Parent),
                path,
            );
        }

        pub fn format(
            _: *const Self,
            comptime _: []const u8,
            _: std.fmt.FormatOptions,
            writer: anytype,
        ) !void {
            try writer.print("Slice({s}, ", .{@typeName(Parent)});

            // writes using decl literal syntax
            // writes pathes in simplified syntax .foo.bar instead of
            // &.{ .foo, .bar }
            switch (Options.varying) {
                ._len => {
                    try writer.print("._lenSuffix({s}, ", .{@typeName(Options.Element)});
                    inline for (Options.accessor_path) |a| try writer.print(".{s}", .{@tagName(a)});
                    try writer.writeAll(")");
                },
                .other => |s| {
                    const single = s.len_field_path.len == 1;
                    const fn_name = if (single) "Field" else "Path";
                    try writer.print(".len{s}({s}, ", .{ fn_name, @typeName(Options.Element) });
                    inline for (Options.accessor_path) |a| try writer.print(".{s}", .{@tagName(a)});
                    try writer.writeAll(", ");
                    if (single) {
                        try writer.print(".{s}", .{@tagName(s.len_field_path[0])});
                    } else {
                        inline for (s.len_field_path) |a| try writer.print(".{s}", .{@tagName(a)});
                    }
                },
            }
            try writer.writeAll(")");
            if (Options.explicit_offset) |o| try writer.print(".offset({})", .{o});
        }
    };
}

pub const EnumLiterals = []const @Type(.enum_literal);
pub const SliceOptions = struct {
    Element: type,
    accessor_path: EnumLiterals,
    explicit_offset: ?usize = null,
    varying: union(enum) {
        _len: struct {
            len_field: []const u8,
        },
        other: struct {
            len_field_path: EnumLiterals,
        },
    },

    pub fn offset(self: SliceOptions, offset_: usize) SliceOptions {
        var options = self;
        options.explicit_offset = offset_;
        return options;
    }

    pub fn _lenSuffix(Element: type, accessor_path: EnumLiterals) SliceOptions {
        return .{
            .Element = Element,
            .accessor_path = accessor_path,
            .varying = .{ ._len = .{
                .len_field = @tagName(last(accessor_path)) ++ "_len",
            } },
        };
    }

    pub fn lenField(Element: type, accessor_path: EnumLiterals, len_field: @Type(.enum_literal)) SliceOptions {
        return .lenPath(Element, accessor_path, &.{len_field});
    }
    pub fn lenPath(Element: type, accessor_path: EnumLiterals, len_path: EnumLiterals) SliceOptions {
        return .{
            .Element = Element,
            .accessor_path = accessor_path,
            .varying = .{ .other = .{
                .len_field_path = len_path,
            } },
        };
    }

    pub fn check(comptime self: SliceOptions, Parent: type) void {
        switch (self.varying) {
            ._len => {
                const suffix_name = @tagName(last(self.accessor_path)) ++ "_len";
                if (!@hasField(Parent, suffix_name)) {
                    @compileError(@typeName(Parent) ++ " is missing a field named " ++ suffix_name);
                }
            },
            .other => {},
        }
    }

    pub fn LenType(comptime self: SliceOptions, comptime Parent: type) type {
        return switch (self.varying) {
            ._len => |s| @FieldType(Parent, s.len_field),
            .other => |s| FieldTypePath(Parent, s.len_field_path),
        };
    }
};

pub fn PathTypes(comptime Parent: type, comptime path: []const @Type(.enum_literal)) []const type {
    var current: type = Parent;
    var types: [path.len]type = undefined;
    for (path, 0..) |enum_literal, i| {
        const T = @FieldType(current, @tagName(enum_literal));
        types[i] = T;
        current = T;
    }
    return &types;
}

fn last(slice: anytype) std.meta.Elem(@TypeOf(slice)) {
    if (slice.len == 0) @panic("Slice needs to have at least one element.");
    return slice[slice.len - 1];
}

pub fn ParentType(comptime Parent: type, comptime path: []const @Type(.enum_literal)) type {
    const types = PathTypes(Parent, path);
    return switch (types.len) {
        1 => Parent,
        else => types[types.len - 2],
    };
}

pub inline fn recursiveFieldParentPtr(
    comptime Parent: type,
    comptime path: []const @Type(.enum_literal),
    self: anytype,
) AutoPtr(@TypeOf(self), Parent) {
    return switch (path.len) {
        1 => @fieldParentPtr(@tagName(last(path)), self),
        else => recursiveFieldParentPtr(
            Parent,
            path[0 .. path.len - 1],
            recursiveFieldParentPtr(
                ParentType(Parent, path),
                &.{last(path)},
                self,
            ),
        ),
    };
}

pub fn AutoPtr(input: type, Output: type) type {
    return switch (@typeInfo(input)) {
        .pointer => |p| if (p.is_const) *const Output else *Output,
        else => @compileError("not supported"),
    };
}

pub fn FieldTypePath(comptime T: type, path: EnumLiterals) type {
    const first = @FieldType(T, @tagName(path[0]));
    return switch (path.len) {
        1 => first,
        else => FieldTypePath(first, path[1..]),
    };
}
pub fn fieldPath(parent: anytype, path: EnumLiterals) AutoPtr(@TypeOf(parent), FieldTypePath(std.meta.Child(@TypeOf(parent)), path)) {
    const first = &@field(parent, @tagName(path[0]));
    return switch (path.len) {
        1 => first,
        else => fieldPath(first, path[1..]),
    };
}

pub fn isContainerType(T: type) bool {
    return switch (@typeInfo(T)) {
        .@"struct", .@"enum", .@"union", .@"opaque" => true,
        else => false,
    };
}

pub fn isSliceWithRoot(comptime Root: type, comptime T: type) bool {
    return isContainerType(T) and
        @hasDecl(T, "Options") and
        Slice(Root, T.Options) == T;
}

const builtin = @import("builtin");

pub fn accumOffsetOf(
    comptime Root: type,
    parent: anytype,
    start: usize,
    comptime path: EnumLiterals,
) usize {
    if (path.len == 0) return start;

    var offset = start;
    const Parent = std.meta.Child(@TypeOf(parent));
    inline for (std.meta.fields(Parent)) |f| {
        const field_name = @tagName(path[0]);
        const is_slice = comptime isSliceWithRoot(Root, f.type);
        if (is_slice) {
            offset = std.mem.alignForward(usize, offset, @alignOf(f.type.Element));
            if (comptime std.mem.eql(u8, f.name, field_name)) {
                if (path.len == 1) {
                    return offset;
                } else {
                    if (std.debug.runtime_safety) {
                        @panic("nested slices/accessors aren't currently supported");
                    }
                }
            }
            offset += @field(parent, f.name).sizeOf();
        } else {
            if (comptime std.mem.eql(u8, f.name, field_name)) {
                offset = accumOffsetOf(
                    Root,
                    &@field(parent, f.name),
                    offset,
                    path[1..],
                );
            }
        }
    }
    return offset;
}

// const print = std.debug.print;
fn print(comptime fmt: []const u8, args: anytype) void {
    _ = fmt;
    _ = args;
}
pub fn printPathLn(comptime path: EnumLiterals) void {
    printPath(path);
    print("\n", .{});
}
pub fn printPath(comptime path: EnumLiterals) void {
    inline for (path) |p| {
        print(".", .{});
        print("{s}", .{@tagName(p)});
    }
}

pub fn formatPath(comptime path: EnumLiterals) []const u8 {
    comptime var str: []const u8 = "";
    inline for (path) |p| {
        const n = @tagName(p);
        str = str ++ "." ++ n;
    }
    return str;
}

test formatPath {
    try std.testing.expectEqualStrings(".foo.bar", comptime formatPath(&.{ .foo, .bar }));
}

const std = @import("std");
const Allocator = std.mem.Allocator;
const FieldEnum = std.meta.FieldEnum;
const FieldInfo = std.builtin.Type.FieldInfo;
1 Like

This is what I have so far: GitHub - kristoff-it/zig-flex: Like C's flexible array members but better :^)

Tomorrow I’m going to start implementing antirez/rax using it.

3 Likes