Unavoidable memory leak?

namespace_miata · December 30, 2023, 7:50am

Earlier this year I was writing some Zig for a project I had and came up with this code:

fn getJson(uri: std.Uri, allocator: Allocator) !json.Parsed(json.Value) {
    var client: http.Client = .{ .allocator = allocator };
    defer client.deinit();
    var req = try client.request(.GET, uri, .{ .allocator = allocator }, .{});
    defer req.deinit();
    try req.start();
    try req.wait();
    const reader = req.reader();
    var lst: std.ArrayList(u8) = std.ArrayList(u8).init(allocator);
    defer lst.deinit();
    var writer = lst.writer();
    while (true) {
        const byte = reader.readByte() catch |err| switch (err) {
            error.EndOfStream => break,
            else => |e| return e,
        };
        try writer.writeByte(byte);
    }
    const parsed = std.json.parseFromSlice(json.Value, allocator, lst.items, .{ .allocate = .alloc_always });
    return parsed;
}

As far as I was able to figure out, this code has a required memory leak in the lst arraylist. From what I was able to figure out, if lst if freed before the parsed JSON, the JSON becomes corrupt and can no longer be accessed. However, the JSON also does not take ownership of the memory, so if lst is not freed, it will be allocated for the duration of the program. I became very frustrated at this fact when I wrote this code, as there was no documentation regarding how the parsed JSON relates to the slice passed in, and I had a bit of trouble reading the source of the stdlib where it is define (ie I couldn’t figure out what was causing the problem). I’m not sure if this is a bug of some sort or if I am making some sort of obvious blunder, but either way I would really appreciate any insight that one could give me. (Side note, I just recently learned about arena allocators and could see that as a potential solution, however I would still like to know if this could be solved in a general way (not relying on a specific allocator))

markus · December 30, 2023, 8:33am

std.json.parseFromSlice returns a Zig Documentation, which has the function deinit. This will free all memory it has been given.
After the slice has been parsed, everything else can safely be deinitialized (except of course the allocator passed in). Calling deinit on parsed will then free its memory.

There’s a code sample for this on Zig Documentation

Also: Welcome to the forum!

namespace_miata · December 30, 2023, 7:54pm

Unless I’m not understanding you, I think this is where my problem is. This is my understanding right now: the parsed JSON returns a hashmap of values which are the JSON keys and values (or at least it does from the urls I pass in). The hashmap that is created I am able to access with (parsedJSON).value.object.get(key), however with the line defer lst.deinit() (which if I’m understanding you, should be perfectly safe to do), the program crashes with a seg fault. Importantly, without this line, the get works perfectly fine but it leaks the memory of the arraylist.

Also: Thank you!

markus · January 1, 2024, 2:31pm

I would think so too, to be honest. I guess I am wrong too.
I don’t really know if theres another function that behaves as you expect, but a solution would be to either use an arena allocator per json-involving-task or to split up the function and let the requests memory live outside

EDIT:
https://ziglang.org/documentation/master/std/#A;std:json.ParseOptions

Based on this you code should work, too

namespace_miata · January 1, 2024, 7:03pm

Good to know I’m not going crazy then, do you think this is intentional behavior or some sort of bug? For instance, something similar happens with std.Uri.parse, however that function is documented as " The return value will contain unescaped strings pointing into the original text .", so that would be the expected behavior.

nurpax · January 1, 2024, 9:05pm

Sounds almost as if the JSON allocs end up referring to your ArrayList somehow. I wonder if you could reproduce this in a simple standalone zig file? Maybe it’s a bug in the JSON parser.

namespace_miata · January 1, 2024, 9:52pm

This code crashes with line 15, however line 14 runs just fine and with line 15 commented out it also runs fine aside from the memory leak detected by the allocator (line 7).

1 | const std = @import("std");
2 | const json = std.json;
3 | const Allocator = std.mem.Allocator;
4 |
5 | pub fn main() !void {
6 |   var gpa = std.heap.GeneralPurposeAllocator(.{}){};
7 |   defer std.debug.assert(gpa.deinit() == .ok);
8 |   const allocator = gpa.allocator()
9 |   const num = "bc";
10|   var key = "ab";
11|   const json_str = try std.fmt.allocPrint(allocator, "{{\"{s}\":\"{s}\"}}", .{ key, num });
12|   std.debug.print("{s}\n", .{json_str});
13|   const parsed = try json.parseFromSlice(json.Value, allocator, json_str, .{ .allocate = .alloc_always });
14|  std.debug.print("{s}\n", .{parsed.value.object.get("ab").?.string});
15|   allocator.free(json_str);
16|   std.debug.print("{s}\n", .{parsed.value.object.get("ab").?.string});
17|   parsed.deinit();
18| }

It would appear from the documentation that the alloc_always option should eliminate this issue, however that’s not what happens in practice. Not sure if that intended behavior or not but here it is in a standalone file.

squeek502 · January 2, 2024, 2:33am

Seems like this might have been fixed in latest master. I can reproduce the segfault with Zig 0.11.0 but runs fine with the latest master version of Zig.

EDIT: Was likely fixed by std.json: avoid stale pointers when parsing Value by ianprime0509 · Pull Request #16864 · ziglang/zig · GitHub