Reading structs (that have Variable Length Arrays) from a buffer

CuckooEXE · July 11, 2025, 12:13pm

I’m using an API that returns a large buffer that contains structures with a variable length array. Take the following C example:

typedef struct PersonHdr {
    int age;
    bool in_school;
    size_t name_len;
    // Immediately After: char* name;
} PersonHdr;

typedef struct Person {
    PersonHdr hdr;
    char* name;
} Person;

// Pretend there's some ArrayList-like API in C since that's what I'll be using in Zig
typedef void ArrayList;

ArrayList* readPersons() {
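    // Assumption: the API also reports how many bytes it returned.
    size_t raw_len = 0;
    unsigned char* raw = someAPICall(&raw_len);
    ArrayList* persons = ArrayList.init();

    unsigned char* cursor = raw;
    unsigned char* end = raw + raw_len;
    while (cursor < end) {
        PersonHdr* ph = (PersonHdr*)cursor;
        Person p = { 0 };
        p.hdr = *ph;
        p.name = malloc(ph->name_len);
        // The name is not NUL-terminated, so copy exactly name_len bytes.
        memcpy(p.name, cursor + sizeof(*ph), ph->name_len);
        persons.append(&p);

        cursor += sizeof(*ph) + ph->name_len;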
    }

    return persons;
}

I know the example is a little contrived, but effectively I want my function to return the “header” and the “data” (PersonHdr and Person; they can also live in a single struct instead of two, that’s totally fine) while I read from this large blob of memory.

_sh · July 11, 2025, 12:27pm

Here is some Zig-like pseudocode (not tested) that you could use as inspiration

const Person = struct {
    age: u32,
    in_school: bool,
    name: []const u8,
    
    pub fn deinit(self: Person, allocator: Allocator) void {
        allocator.free(self.name);
    }
};

fn readPersons(allocator: Allocator) ![]Person {
    const buffer: []const u8 = apiCall();
    var fbs = std.io.fixedBufferStream(buffer);
    const reader = fbs.reader();
    
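    var result: ArrayList(Person) = .empty;
    // On error, free the names read so far and the list itself.
    errdefer {
        for (result.items) |person| person.deinit(allocator);
        result.deinit(allocator);
    }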
    
    while (try readPerson(allocator, reader)) |person| {
        try result.append(allocator, person);
    }
    
    return try result.toOwnedSlice(allocator);
}

fn readPerson(allocator: Allocator, reader: anytype) !?Person {
    // A failed read (for example, end of buffer) ends the sequence: return null.
    const age = reader.readInt(u32, .big) catch return null;
    const in_school = (reader.readByte() catch return null) == 1;
    const name_len = reader.readInt(u32, .big) catch return null;
    
    const name = try allocator.alloc(u8, name_len);
    // readNoEof fails unless the whole name is present in the buffer.
    reader.readNoEof(name) catch {
        allocator.free(name);
        return null;
    };
    
    return Person{ .age = age, .in_school = in_school, .name = name };
}

const Allocator = std.mem.Allocator;
const ArrayList = std.ArrayListUnmanaged;

const std = @import("std");
const apiCall = @import("api").apiCall;

CuckooEXE · July 11, 2025, 12:39pm

std.io.fixedBufferStream is exactly what I was looking for. The entire time I was thinking “wow I wish I could convert this buffer to a store” lol. Thank you!

hvbargen · July 12, 2025, 7:20am

Consider a data-oriented approach

This might be overkill, depending on your application, but hey, you’re using Zig, so I assume performance matters.

If you need all the data, you could also try a zero-copy data-oriented approach.
I mean something similar to what Andrew showed in his video about the topic.

Given that the string representation is OK for you (I suppose the names are not NUL-terminated), you need only a couple of allocations:
one for the blob (you have it anyway; you just need to keep it alive as long as you use the data),
and one for an ArrayList of byte offsets of the headers (or pointers, but byte offsets are usually much smaller, so u32 or even u16 may be enough, depending on your application).

Then you could scan through the blob just once to find the byte offsets of the headers and while doing so, insert them into the ArrayList.
Note that you only store the offsets and don’t need to copy the data.
To access the data, you first create a pointer into the blob at the corresponding offset and read the value from there. For the name, create a u8 slice of length name_len that starts at the header offset plus the constant size of the header.
Just be careful with alignment.
I think this approach uses way less memory and could thus even be faster.
I (as a Zig newbie) would use little functions for accessing the fields, returning a []u8 for the name.
So the only inconvenience when using the data later is basically adding (). The functions will almost certainly be inlined and add negligible overhead.
Thanks to the Zig language built-ins the functions themselves will look trivial.
Maybe Zig pros can come up with something more elegant, but I think that would be good enough for me.
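
To make the idea concrete, here is a minimal sketch in C, reusing the PersonHdr layout from the original question. Everything else is made up for illustration: PersonIndex, index_persons, the person_* accessors, the fixed capacity of 16, and the append_person helper that only builds a sample blob. It assumes the blob was written on the same machine (native byte order and struct padding). Headers are copied out with memcpy instead of being read through a cast pointer, because a header that follows an odd-length name may not be aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct PersonHdr {
    int age;
    bool in_school;
    size_t name_len;
    /* Immediately after: name_len bytes of name, not NUL-terminated */
} PersonHdr;

enum { MAX_PERSONS = 16 }; /* fixed capacity, only to keep the sketch short */

/* The index stores header offsets only; the data stays in the blob. */
typedef struct PersonIndex {
    const unsigned char *blob; /* borrowed: must outlive the index */
    uint32_t offsets[MAX_PERSONS];
    size_t count;
} PersonIndex;

/* Scan the blob once and record where each header starts.
   Returns false if a record is truncated or the index is full. */
static bool index_persons(PersonIndex *ix, const unsigned char *blob, size_t len) {
    ix->blob = blob;
    ix->count = 0;
    size_t pos = 0;
    while (pos < len) {
        PersonHdr h;
        if (len - pos < sizeof h) return false;
        memcpy(&h, blob + pos, sizeof h); /* safe for unaligned headers */
        if (h.name_len > len - pos - sizeof h) return false;
        if (ix->count == MAX_PERSONS || pos > UINT32_MAX) return false;
        ix->offsets[ix->count++] = (uint32_t)pos;
        pos += sizeof h + h.name_len;
    }
    return true;
}

/* Small accessors: each one reads straight from the blob. */
static PersonHdr person_hdr(const PersonIndex *ix, size_t i) {
    PersonHdr h;
    memcpy(&h, ix->blob + ix->offsets[i], sizeof h);
    return h;
}
static int person_age(const PersonIndex *ix, size_t i) { return person_hdr(ix, i).age; }
static bool person_in_school(const PersonIndex *ix, size_t i) { return person_hdr(ix, i).in_school; }
static size_t person_name_len(const PersonIndex *ix, size_t i) { return person_hdr(ix, i).name_len; }
/* Points into the blob; use together with person_name_len. */
static const char *person_name(const PersonIndex *ix, size_t i) {
    return (const char *)(ix->blob + ix->offsets[i] + sizeof(PersonHdr));
}

/* Hypothetical helper that appends one record to a sample blob.
   Returns the new length; the caller must provide enough space. */
static size_t append_person(unsigned char *blob, size_t pos, int age,
                            bool in_school, const char *name) {
    PersonHdr h;
    memset(&h, 0, sizeof h); /* zero the padding bytes too */
    h.age = age;
    h.in_school = in_school;
    h.name_len = strlen(name);
    memcpy(blob + pos, &h, sizeof h);
    memcpy(blob + pos + sizeof h, name, h.name_len);
    return pos + sizeof h + h.name_len;
}
```

After index_persons succeeds, person_age(&ix, i) and friends work like plain field access with a pair of parentheses added, and the only memory used beyond the blob is one u32 per record.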