What's a good fast pattern for outputing data in a serializer?

creationix · May 27, 2023, 6:58pm

I’m porting a serialization format to zig and want the serializer to be really fast. I’m having a hard time figuring out the best way to handle the temporary memory structures. The pattern is something like

recurse into all child nodes and serialize them, collect output as a linked list or something
write the header for this section and return that to my caller somehow (a linked list)?

Is a linked list the best way to generate the data? It’s generated out of order so I can’t just stream the output.

But once I do have the final linked list of chunks (or some other data structure) how do I quickly emit that? I think my library will write to some stream like stdout or a file descriptor.

Also if I have all this heap allocated memory and most of it will be used till the very end, I assume I want to use an arena allocator for the whole operation right? I’m don’t know how much total space I’ll need up though. Do arena allocators support this?

creationix · May 27, 2023, 7:23pm

Also I just realized that Zig doesn’t have FAM (flexible array members). Isn’t it a lot of overhead to alloc twice (for the linked-list node and again for the payload)? In C I would do something like:

struct node {
  struct node* next;
  usize_t len;
  uint8_t data[];
};

Is this possible in zig where I heap allocate various nodes with different runtime lengths?

AndrewCodeDev · June 7, 2023, 10:04pm

Have you looked at the implementation of the arena allocator in the standard library?

github.com

ziglang/zig/blob/master/lib/std/heap/arena_allocator.zig

const std = @import("../std.zig");
const assert = std.debug.assert;
const mem = std.mem;
const Allocator = std.mem.Allocator;

/// This allocator takes an existing allocator, wraps it, and provides an interface
/// where you can allocate without freeing, and then free it all together.
pub const ArenaAllocator = struct {
    child_allocator: Allocator,
    state: State,

    /// Inner state of ArenaAllocator. Can be stored rather than the entire ArenaAllocator
    /// as a memory-saving optimization.
    pub const State = struct {
        buffer_list: std.SinglyLinkedList(usize) = .{},
        end_index: usize = 0,

        pub fn promote(self: State, child_allocator: Allocator) ArenaAllocator {
            return .{
                .child_allocator = child_allocator,

This file has been truncated. show original

From the code:

    fn resize(ctx: *anyopaque, buf: []u8, log2_buf_align: u8, new_len: usize, ret_addr: usize) bool {
        const self = @ptrCast(*ArenaAllocator, @alignCast(@alignOf(ArenaAllocator), ctx));
        _ = log2_buf_align;
        _ = ret_addr;

        const cur_node = self.state.buffer_list.first orelse return false;
        const cur_buf = @ptrCast([*]u8, cur_node)[@sizeOf(BufNode)..cur_node.data];
        if (@ptrToInt(cur_buf.ptr) + self.state.end_index != @ptrToInt(buf.ptr) + buf.len) {
            // It's not the most recent allocation, so it cannot be expanded,
            // but it's fine if they want to make it smaller.
            return new_len <= buf.len;
        }

          if (buf.len >= new_len) {
              self.state.end_index -= buf.len - new_len;
              return true;
          } else if (cur_buf.len - self.state.end_index >= new_len - buf.len) {
              self.state.end_index += new_len - buf.len;
              return true;
          } else {
              return false;
        }

        const bigger_buf_size = @sizeOf(BufNode) + new_end_index;
        const log2_align = comptime std.math.log2_int(usize, @alignOf(BufNode));
        if (self.child_allocator.rawResize(cur_alloc_buf, log2_align, bigger_buf_size, @returnAddress())) {
            cur_node.data = bigger_buf_size;
        } else {
            // Allocate a new node if that's not possible
            cur_node = self.createNode(cur_buf.len, n + ptr_align) orelse return null;
        }

So it looks like it does support what your asking for (unless I have seriously misunderstood your needs… totally possible).