How to structure memory management and JSON parsing

Hello friends of Zig,
I’m toying with Zig trying to implement Gossip Glomers. I’ve managed to implement the JSON parsing using std.json but I’m unsure how to structure memory management around it. Maybe my overall approach is wrong, so I’d be happy to hear feedback on this.

The first challenge was that the Maelstrom protocol creates messages like this:

{"src": "n1", "dst": "c2", "body": {"type": "init", ...}}

You’ll notice that the body has a type field and the structure of the body depends on that type. So I’ve created a RawMsg type:

pub const RawMsg = struct {
    src: []u8,
    dst: []u8,
    body: std.json.Value,
};

I can then extract the type via the body field and parse conditionally. That’s fine, if a bit verbose. Is there a nicer way to do this?

What I’m struggling with now is memory management. I’m using std.json.parseFromValue with an allocator to create structs for the type-specific messages. I’ve placed all this in a method:

    pub fn msgBody(self: *const @This(), allocator: std.mem.Allocator) !MsgBody {
        switch (try self.msgType()) {
            MsgType.Init => {
                const parsedBody = try std.json.parseFromValue(InitMsg, allocator, self.body, .{});
                defer parsedBody.deinit();

                return .{
                    .Init = parsedBody.value,
                };
            },
            MsgType.Echo => {...},
        }
    }

You can probably already spot the bug; at the end of the scope the parsed message is deallocated via defer parsedBody.deinit(). I see two ways of dealing with this:

  • copy the parsed body
  • defer deallocation to the caller

Copying the parsed body seemed obvious but recursively cloning structs is a pain. Maybe I’m missing a useful std lib method to do that? But it was annoying enough that I thought about the second option: delegate the deallocation to the caller by adding a deinit method on the MsgBody, but I don’t see how to do that short of holding parsedBody in MsgBody. Is there a better way? Am I thinking completely wrong about this?

Hi, welcome to Ziggit! If you want a simpler experience, I think a good way to achieve what you are trying to do is to use the leaky variants with your own arena and manage memory through that arena. 🙂
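
A minimal sketch of that suggestion, assuming Zig 0.11 or later, where std.json.parseFromSliceLeaky and std.json.parseFromValueLeaky exist. EchoBody and parseEcho are hypothetical names made up for illustration, and the message shape follows the example earlier in the thread. Note that with default parse options, any field missing from the struct makes parsing fail.

```zig
const std = @import("std");

// Hypothetical message types; field names follow the thread's example.
const RawMsg = struct {
    src: []const u8,
    dst: []const u8,
    body: std.json.Value,
};

const EchoBody = struct {
    @"type": []const u8, // `type` is a keyword, so the field name is quoted
    echo: []const u8,
};

/// Parses one input line into an EchoBody.
/// Every allocation lands in `arena`; the caller frees it all at once.
fn parseEcho(arena: std.mem.Allocator, line: []const u8) !EchoBody {
    const raw = try std.json.parseFromSliceLeaky(RawMsg, arena, line, .{});
    return std.json.parseFromValueLeaky(EchoBody, arena, raw.body, .{});
}

test "one arena per message" {
    var arena_state = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer arena_state.deinit(); // releases everything parsed for this message

    const line =
        \\{"src": "c1", "dst": "n1", "body": {"type": "echo", "echo": "hello"}}
    ;
    const body = try parseEcho(arena_state.allocator(), line);
    try std.testing.expectEqualStrings("hello", body.echo);
}
```

The parsed value stays valid until arena_state.deinit() runs, so no per-field cleanup or deinit method on the body type is needed.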

Thanks for the welcome. 🙂

Oh, that’s what the leaky variant is for? That’s a great idea! I’d have an arena allocator per parsed message and just clean it up at the end! I’ll give that a try, thank you!

If I needed the parsed struct outside of the message processing cycle, is there a better way to copy structs recursively?

You’re welcome. Here is the relevant doc comment:

/// Parses the json document from `s` and returns the result.
/// Allocations made during this operation are not carefully tracked and may not be possible to individually clean up.
/// It is recommended to use a `std.heap.ArenaAllocator` or similar.
pub fn parseFromSliceLeaky(
    comptime T: type,
    allocator: Allocator,
    s: []const u8,
    options: ParseOptions,
) ParseError(Scanner)!T {
    var scanner = Scanner.initCompleteInput(allocator, s);
    defer scanner.deinit();

    return parseFromTokenSourceLeaky(T, allocator, &scanner, options);
}

I know this is not the function you are using, but the meaning of the Leaky suffix is the same everywhere: the “Leaky” variants don’t track memory for you. You are supposed to either leak the memory (please don’t, haha) or pass in an arena allocator and free everything in one batch.

As for the recursive copy, I’m not entirely sure I understand what you meant. 🙂

Got it. I suppose if I were to use an arena allocator, it wouldn’t even matter whether I use a leaky or non-leaky variant, right?

For the recursive copy

Suppose I have a struct:

const MyStruct = struct {
    echo: []u8,
    bla: []u8,
};

If I were to parse the corresponding JSON into this struct, I would see at least two, if not three, allocations: one for each of the []u8 slices, and possibly one for the struct itself (or that might be stack allocated?).

I know I can copy the struct itself by assignment, but the slices in the copy still point at the same memory, right? If they point at something that’s cleaned up elsewhere, I’d get a segfault. So instead, I tried:

const original = try std.json.parseFromSlice(...);
defer original.deinit();

// make a copy (the parsed struct lives in the `value` field of the result):
const copy = MyStruct{
    .echo = try allocator.dupe(u8, original.value.echo),
    .bla = try allocator.dupe(u8, original.value.bla),
};

There must be a better way to do this, right?

It would matter a little, because an arena can only really free an allocation if it was the most recent one, so the individual frees done by the non-leaky variant are mostly wasted work. It’s probably best to use the appropriate version.

I’m by no means an expert on the matter, but all you have to think about is the lifetime of your memory and how much space it takes. If you have a ton of small, short-lived allocations that repeat in cycles, an arena allocator combined with arena.reset(.retain_capacity) is really good. If you have an upper bound, or better, a fixed size for your allocations, then a memory pool (such as std.heap.MemoryPool) is really good.
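
A rough sketch of the arena-plus-reset pattern, assuming Zig 0.11 or later. Ping and handleLine are hypothetical stand-ins for a real message type and handler; only the ArenaAllocator and std.json calls come from the standard library.

```zig
const std = @import("std");

// Hypothetical message type, used only to have something to parse.
const Ping = struct { n: u32 };

// Hypothetical handler: parses one line using scratch memory and returns `n`.
fn handleLine(scratch: std.mem.Allocator, line: []const u8) !u32 {
    const ping = try std.json.parseFromSliceLeaky(Ping, scratch, line, .{});
    return ping.n;
}

test "reuse one arena across messages" {
    var arena_state = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer arena_state.deinit();

    const lines = [_][]const u8{ "{\"n\": 1}", "{\"n\": 2}", "{\"n\": 3}" };
    var sum: u32 = 0;
    for (lines) |line| {
        // Invalidate everything from the previous message, but keep the
        // memory around so the next message can reuse it.
        _ = arena_state.reset(.retain_capacity);
        sum += try handleLine(arena_state.allocator(), line);
    }
    try std.testing.expectEqual(@as(u32, 6), sum);
}
```

Anything that must outlive one iteration has to be copied out of the arena before the next reset.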

So I would try to think about whether you really need to dupe the value, or whether you can just let it live long enough for you to use it. If you are concerned about efficiency and what you want to copy is small enough, you can very cheaply make a copy on the stack of one of your functions by declaring a small buffer and using it to init a FixedBufferAllocator.
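
One way the stack-buffer idea could look, assuming Zig 0.11 or later. The Echo type, the 256-byte buffer size, and the JSON input are made up for illustration; dupe returns error.OutOfMemory if the value does not fit in the buffer.

```zig
const std = @import("std");

// Hypothetical message type for illustration.
const Echo = struct { echo: []const u8 };

test "copy a small value into a stack buffer" {
    var buf: [256]u8 = undefined; // lives on this function's stack frame
    var fba = std.heap.FixedBufferAllocator.init(&buf);
    const stack_alloc = fba.allocator();

    var copy: Echo = undefined;
    {
        const parsed = try std.json.parseFromSlice(Echo, std.testing.allocator, "{\"echo\": \"hi\"}", .{});
        defer parsed.deinit(); // the parser's own allocations end with this block
        copy = .{ .echo = try stack_alloc.dupe(u8, parsed.value.echo) };
    }
    // `copy.echo` points into `buf`, so it remains valid while `buf` is in scope.
    try std.testing.expectEqualStrings("hi", copy.echo);
}
```

The copy must not escape the function that owns the buffer, since its memory goes away when that function returns.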

As for a better way, maybe there is one, but I think the code is fine as is. Of course, with more complex structs it can get a bit tiresome, but then again, if there is too much friction, maybe that’s an invitation to rethink how you do things.

That’s something Zig is really good at. I often find myself implementing what I think is a good way of doing something. Later I begin to feel that there is just too much friction and my implementation is not ergonomic to use. That is when I realize that what I thought was a good approach actually isn’t, so I think harder and get to a better solution.
