How to free only the overhead for the arena allocator?

kj4tmp · December 26, 2024, 3:21am

I’ve got a recursive descent parser for MessagePack that needs to allocate while decoding.

The user provides their type, and the generics take care of the rest. Whatever I return, the user will just have to clean up themselves.

pub fn decodeAlloc(allocator: std.mem.Allocator, comptime T: type, in: []const u8) error{ OutOfMemory, Invalid }!T {
    var fbs = std.io.fixedBufferStream(in);
    var arena = std.heap.ArenaAllocator.init(allocator);
    errdefer arena.deinit();
    const res = decodeAny(T, fbs.reader(), fbs.seekableStream(), arena.allocator()) catch |err| switch (err) {
        error.OutOfMemory => return error.OutOfMemory,
        error.Invalid => return error.Invalid,
        error.EndOfStream => return error.Invalid,
    };
    if (fbs.pos != fbs.buffer.len) return error.Invalid;
    return res;
}

test "decode slice bools" {
    const decoded = try decodeAlloc(std.testing.allocator, []bool, &.{ 0b10010011, 0xc3, 0xc2, 0xc3 });
    defer std.testing.allocator.free(decoded);
    const expected: []const bool = &.{ true, false, true };
    try std.testing.expectEqualSlices(bool, expected, decoded);
}

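Did you spot the bug? The arena allocator has “overhead”. It gets its memory from the child allocator in buffers of its own, and each buffer carries some extra bookkeeping to track what has been allocated. On the happy path, decodeAlloc returns without ever calling arena.deinit(), so that memory leaks, and the caller cannot release it just by freeing the returned value.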
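
For reference, here is a minimal sketch of the ownership rule involved, assuming a reasonably recent Zig (0.13 or 0.14) and using only std.heap.ArenaAllocator and std.testing. Memory handed out by an arena is released by arena.deinit(), which returns the arena's buffers to the child allocator. The name flags is just for illustration.

```zig
const std = @import("std");

test "arena memory is released by arena.deinit()" {
    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    // Removing this line makes std.testing.allocator report a leak,
    // because the arena's buffers are never handed back to it.
    defer arena.deinit();

    // The slice lives inside a buffer owned by the arena.
    const flags = try arena.allocator().alloc(bool, 3);
    @memcpy(flags, &[_]bool{ true, false, true });

    try std.testing.expectEqualSlices(bool, &.{ true, false, true }, flags);
}
```

Whoever ends up holding the decoded value therefore also needs some way to reach the arena that backs it.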

How do I fix this? The primary reason I am using an arena is that I can happily decode a stream without errors, but if the stream does not end at the correct byte, I want to return an error. Is this a flaw in my decoding API? Should I just not return an error if the stream is too long, i.e. delete this line?

if (fbs.pos != fbs.buffer.len) return error.Invalid;

Maybe I could add an is_end: bool parameter to my recursive descent parser, but then I’d have to clutter every single parsing function with a check to see if it’s at the end.

kj4tmp · December 26, 2024, 4:35am

I see std.json just returns the entire arena along with the parsed value.

https://github.com/ziglang/zig/blob/42dac40b3feeabe39b5f191d1e72d247327133ba/lib/std/json/static.zig#L56

    /// When parsing to a `std.json.Value`, set this option to false to always emit
    /// JSON numbers as unparsed `std.json.Value.number_string`.
    /// Otherwise, JSON numbers are parsed as either `std.json.Value.integer`,
    /// `std.json.Value.float` or left as unparsed `std.json.Value.number_string`
    /// depending on the format and value of the JSON number.
    /// When this option is true, JSON numbers encoded as floats (see `std.json.isNumberFormattedLikeAnInteger`)
    /// may lose precision when being parsed into `std.json.Value.float`.
    parse_numbers: bool = true,
};

pub fn Parsed(comptime T: type) type {
    return struct {
        arena: *ArenaAllocator,
        value: T,

        pub fn deinit(self: @This()) void {
            const allocator = self.arena.child_allocator;
            self.arena.deinit();
            allocator.destroy(self.arena);
        }
    };
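
From the caller's side, this pattern looks roughly like the sketch below. It uses std.json.parseFromSlice, which returns a Parsed(T); the JSON input is made up for illustration, and the default options are assumed to be fine for it.

```zig
const std = @import("std");

test "std.json returns the arena together with the value" {
    const parsed = try std.json.parseFromSlice(
        []bool,
        std.testing.allocator,
        "[true, false, true]",
        .{},
    );
    // One call frees the value and everything the arena allocated for it.
    defer parsed.deinit();

    try std.testing.expectEqualSlices(bool, &.{ true, false, true }, parsed.value);
}
```

The caller never frees parsed.value directly; everything goes through parsed.deinit().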

kj4tmp · December 26, 2024, 5:03am

Well, it makes my API uglier, but maybe the memory management is easier for the user, since they don’t have to write their own deinit methods.

pub fn Decoded(comptime T: type) type {
    return struct {
        arena: *std.heap.ArenaAllocator,
        value: T,
        pub fn deinit(self: @This()) void {
            const allocator = self.arena.child_allocator;
            self.arena.deinit();
            allocator.destroy(self.arena);
        }
    };
}

pub fn decodeAlloc(allocator: std.mem.Allocator, comptime T: type, in: []const u8) error{ OutOfMemory, Invalid }!Decoded(T) {
    var fbs = std.io.fixedBufferStream(in);
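    // The arena itself lives on the heap so it can be returned inside Decoded(T).
    const arena = try allocator.create(std.heap.ArenaAllocator);
    errdefer allocator.destroy(arena);
    arena.* = .init(allocator);
    // errdefers run in reverse order: the arena is deinitialized first, then destroyed.
    errdefer arena.deinit();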
    const res = decodeAny(T, fbs.reader(), fbs.seekableStream(), arena.allocator()) catch |err| switch (err) {
        error.OutOfMemory => return error.OutOfMemory,
        error.Invalid => return error.Invalid,
        error.EndOfStream => return error.Invalid,
    };
    if (fbs.pos != fbs.buffer.len) return error.Invalid;
    return Decoded(T){ .arena = arena, .value = res };
}

test "decode slice bools" {
    const decoded = try decodeAlloc(std.testing.allocator, []bool, &.{ 0b10010011, 0xc3, 0xc2, 0xc3 });
    defer decoded.deinit();
    const expected: []const bool = &.{ true, false, true };
    try std.testing.expectEqualSlices(bool, expected, decoded.value);
}
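
The error path is the part the arena was meant to cover, so it may be worth a test of its own. Below is a self-contained sketch under stated assumptions: decodeBools is a hypothetical toy stand-in for the real decoder (one bool per input byte, 0xc3 for true and 0xc2 for false), not the MessagePack parser from this thread. It fails only after it has already allocated, and std.testing.allocator would flag a leak if the errdefers did not clean up.

```zig
const std = @import("std");

fn Decoded(comptime T: type) type {
    return struct {
        arena: *std.heap.ArenaAllocator,
        value: T,

        fn deinit(self: @This()) void {
            const allocator = self.arena.child_allocator;
            self.arena.deinit();
            allocator.destroy(self.arena);
        }
    };
}

/// Hypothetical toy decoder: 0xc3 decodes to true, 0xc2 to false,
/// and any other byte is rejected.
fn decodeBools(
    allocator: std.mem.Allocator,
    in: []const u8,
) error{ OutOfMemory, Invalid }!Decoded([]bool) {
    const arena = try allocator.create(std.heap.ArenaAllocator);
    errdefer allocator.destroy(arena);
    arena.* = std.heap.ArenaAllocator.init(allocator);
    errdefer arena.deinit();

    // Allocate first, validate afterwards, so a failure happens with
    // memory already handed out by the arena.
    const out = try arena.allocator().alloc(bool, in.len);
    for (in, out) |byte, *flag| {
        flag.* = switch (byte) {
            0xc3 => true,
            0xc2 => false,
            else => return error.Invalid,
        };
    }
    return .{ .arena = arena, .value = out };
}

test "an error after allocating does not leak" {
    try std.testing.expectError(
        error.Invalid,
        decodeBools(std.testing.allocator, &.{ 0xc3, 0xc2, 0x00 }),
    );
}
```

On success the caller owns the arena through Decoded and calls deinit(); on failure the two errdefers release it before the error is returned.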

Zig std lib wins again!