Getting the next JSON value from a Reader of mixed input

I am trying to parse a protocol that mixes plain text tokens and JSON input. Here is a test that demonstrates what I am trying to do:

test "Partial json stream parse" {
    const T = struct {
        x: ?bool,
        y: ?bool,
    };
    const input =
        \\foo {"x": true, "y": null} bar
    ;

    var reader: std.Io.Reader = .fixed(input);
    var arena: std.heap.ArenaAllocator = .init(std.testing.allocator);
    defer arena.deinit();
    const alloc = arena.allocator();

    var json_reader: std.json.Reader = .init(alloc, &reader);

    try std.testing.expectEqualSlices(u8, "foo", (try reader.takeDelimiter(' ')) orelse "");
    try std.testing.expectEqual(T{ .x = true, .y = null }, try std.json.parseFromTokenSourceLeaky(
        T,
        alloc,
        &json_reader,
        .{},
    ));
}

This fails due to an assertion that the JSON reader has reached the end of the input:

/home/robby/downloads/zig-x86_64-linux-0.16.0-dev.1303+ee0a0f119/lib/std/json/Scanner.zig:1317:34: 0x10611ce in skipWhitespaceCheckEnd (std.zig)
    if (self.stackHeight() == 0) return error.SyntaxError;
                                 ^
/home/robby/downloads/zig-x86_64-linux-0.16.0-dev.1303+ee0a0f119/lib/std/json/Scanner.zig:390:21: 0x106248d in next (std.zig)
                if (try self.skipWhitespaceCheckEnd()) return .end_of_document;
                    ^
/home/robby/downloads/zig-x86_64-linux-0.16.0-dev.1303+ee0a0f119/lib/std/json/Scanner.zig:1717:37: 0x1053f4e in next (std.zig)
                else => |other_err| return other_err,
                                    ^
/home/robby/downloads/zig-x86_64-linux-0.16.0-dev.1303+ee0a0f119/lib/std/json/static.zig:151:32: 0x104939e in parseFromTokenSourceLeaky__anon_3166 (std.zig)
    assert(.end_of_document == try scanner_or_reader.next());
                               ^
/home/robby/src/zits/src/server/message_parser.zig:187:60: 0x1041753 in test.Partial json stream parse (message_parser.zig)
    try std.testing.expectEqual(T{ .x = true, .y = null }, try std.json.parseFromTokenSourceLeaky(
                                                           ^

If I replace input with:

    // note missing 'bar' after the object
    const input =
        \\foo {"x": true, "y": null}
    ;

it runs fine.

This is not an assertion that I care about. What I care about is that the stackHeight() of the scanner has reached zero, but there does not seem to be a parser function that handles this. I looked at trying to collect the tokens manually in an ArrayList and terminating when I read a stackHeight of zero, but there does not seem to be a way to process a token source besides std.json.Scanner or std.json.Reader, or implementing them (which seems like an unnecessary effort).

If I knew the bounds of the JSON in the reader buffer, I could use std.json.parseFromSlice, but there does not seem to be a reasonable way to do that. (Even if I could do this, it would not be preferred since I would have to deal with running into the buffer boundary myself).
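For illustration, here is a minimal sketch of what finding those bounds by hand could look like, assuming the rest of the message is already in memory as a slice and the value is an object or array. balancedValueLen is a hypothetical helper, not part of std; it only balances brackets and skips over strings, it does not validate the JSON, and it does nothing about the buffer-boundary problem mentioned above.

```zig
const std = @import("std");

/// Hypothetical helper (not part of std): returns the length of the JSON
/// object or array that starts at input[0], or null if the input does not
/// start with '{' or '[' or ends before the value is closed.
/// It only balances brackets and skips strings; it does not validate JSON.
fn balancedValueLen(input: []const u8) ?usize {
    if (input.len == 0 or (input[0] != '{' and input[0] != '[')) return null;
    var depth: usize = 0;
    var in_string = false;
    var escaped = false;
    for (input, 0..) |byte, i| {
        if (in_string) {
            if (escaped) {
                escaped = false; // the byte after a backslash is never a terminator
            } else if (byte == '\\') {
                escaped = true;
            } else if (byte == '"') {
                in_string = false;
            }
            continue;
        }
        switch (byte) {
            '"' => in_string = true,
            '{', '[' => depth += 1,
            '}', ']' => {
                depth -= 1;
                if (depth == 0) return i + 1; // one past the closing bracket
            },
            else => {},
        }
    }
    return null; // ran out of input before the value was closed
}

test "find the bounds, then parse only that slice" {
    const T = struct { x: ?bool, y: ?bool };
    const rest = "{\"x\": true, \"y\": null} bar";

    const len = balancedValueLen(rest).?;
    try std.testing.expectEqualStrings(" bar", rest[len..]);

    var arena: std.heap.ArenaAllocator = .init(std.testing.allocator);
    defer arena.deinit();
    const value = try std.json.parseFromSliceLeaky(T, arena.allocator(), rest[0..len], .{});
    try std.testing.expectEqual(T{ .x = true, .y = null }, value);
}
```

A null result here would be the signal to read more bytes and try again, which is exactly the bookkeeping I would rather not own.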

Has anyone run into this problem before? Have I just run into the boundary of what the standard library supports here?

From what I understand, Zig’s JSON parser follows the older, official standard, which does not allow things such as these, or comments either.

EDIT: If you want optional values, you can declare them with ?. That works. You can also give struct fields defaults, so they can be omitted from the JSON and still be used. I guess I should show an example from my own code; it might be helpful to some:

const Config = struct {
    client_endpoint: Socket,
    forwarder_socket: Socket,
    server_socket: SrvSocket = .{},
    switcher: Switcher,
    log_level: ?[]const u8 = null,

    const Socket = struct {
        address: []const u8,
        port: u16,
    };

    const SrvSocket = struct {
        address: []const u8 = "0.0.0.0",
        port: u16 = 0,
    };

    const Switcher = struct {
        enabled: bool,
        id: usize,
        timer: ?usize = null,
        endpoints: []const []const u8,
    };
};

EDIT 2:
You can also use unknown fields as comments, for example; you just need to set the corresponding parse option:

    var parsed = try std.json.parseFromSlice(
        Config,
        allocator,
        buf,
        .{ .ignore_unknown_fields = true },
    );
    defer parsed.deinit();
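To make the two edits above concrete, here is a small self-contained sketch. The Example struct and the "_comment" key are made up for illustration; the only std calls are std.json.parseFromSlice and the .ignore_unknown_fields option shown above.

```zig
const std = @import("std");

// Made-up struct for illustration: both fields have defaults,
// so either may be left out of the JSON text.
const Example = struct {
    port: u16 = 0,
    log_level: ?[]const u8 = null,
};

test "unknown fields are skipped and omitted fields take their defaults" {
    const text =
        \\{"_comment": "not part of the struct", "port": 8080}
    ;

    const parsed = try std.json.parseFromSlice(
        Example,
        std.testing.allocator,
        text,
        .{ .ignore_unknown_fields = true },
    );
    defer parsed.deinit();

    try std.testing.expectEqual(@as(u16, 8080), parsed.value.port);
    // log_level was not in the input, so it keeps its default.
    try std.testing.expect(parsed.value.log_level == null);
}
```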

I’m not really sure what you mean by “things such as these”; this should not be a matter of the JSON standard. The 2013 standard* says “Conforming JSON text is a sequence of Unicode code points that strictly conforms to the JSON grammar”. In my code examples, {"x": true, "y": null} is a sequence of Unicode code points that strictly conforms to the JSON grammar. I’m looking for an API to parse that, without asserting that the end of the std.Io.Reader was reached.

* the second and latest revision from 2017 says the same thing

I think your use case is not directly supported, but with small modifications you should be able to adapt a copy of parseFromTokenSourceLeaky to stop after a valid JSON prefix.

The easy way might simply be to copy that function into your code base and delete the offending assert.

If you use innerParse it should work. It is leaky, though, so you should use an arena.

innerParse does not work for my use case, because it consumes from the Reader beyond the end of the next JSON value.

test "Partial json stream parse" {
    const T = struct {
        x: ?bool,
        y: ?bool,
    };
    const input =
        \\foo {"x": true, "y": null} bar
    ;

    var reader: std.Io.Reader = .fixed(input);
    var arena: std.heap.ArenaAllocator = .init(std.testing.allocator);
    defer arena.deinit();
    const alloc = arena.allocator();

    var json_reader: std.json.Reader = .init(alloc, &reader);

    try std.testing.expectEqualSlices(u8, "foo", (try reader.takeDelimiter(' ')) orelse "");
    try std.testing.expectEqual(T{ .x = true, .y = null }, try std.json.innerParse(
        T,
        alloc,
        &json_reader,
        .{ .max_value_len = std.json.Scanner.default_max_value_len, .allocate = .alloc_always },
    ));

    try std.testing.expectEqualSlices(u8, " bar", reader.buffered());
}

slices differ. first difference occurs at index 0 (0x0)

============ expected this output: =============  len: 4 (0x4)

20 62 61 72                                        bar

============= instead found this: ==============  len: 0 (0x0)

                                                  

================================================

A simple modification to parseFromTokenSourceLeaky is not possible, because the bulk of the parsing is done by innerParse, which runs into the same issue.

You can hack around it this way:

try std.testing.expectEqualSlices(u8, " bar", json_reader.scanner.input[json_reader.scanner.cursor..]);
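In the same spirit, here is a rough sketch of the "stop when stackHeight() is back at zero" idea from the original post, assuming the remaining input is already available as one slice so a plain std.json.Scanner can be used. firstValueLen is a hypothetical helper; it relies on Scanner.initCompleteInput, next(), stackHeight() and the cursor field as I understand them, and it is only meant for objects and arrays, since a top-level string or number can come back as partial tokens while the stack height is still zero.

```zig
const std = @import("std");

/// Hypothetical helper: number of bytes, including leading whitespace,
/// taken up by the first JSON object or array in `input`.
fn firstValueLen(allocator: std.mem.Allocator, input: []const u8) !usize {
    var scanner: std.json.Scanner = .initCompleteInput(allocator, input);
    defer scanner.deinit();

    while (true) {
        const token = try scanner.next();
        // Only whitespace was left: there is no value to measure.
        if (token == .end_of_document) return error.EndOfDocument;
        // The outermost object or array has just been closed.
        if (scanner.stackHeight() == 0) break;
    }
    return scanner.cursor;
}

test "tokenize one value, then look at what is left" {
    const rest = "{\"x\": true, \"y\": null} bar";
    const len = try firstValueLen(std.testing.allocator, rest);
    try std.testing.expectEqualStrings(" bar", rest[len..]);
}
```

The slice rest[0..len] can then go to std.json.parseFromSlice, at the cost of tokenizing the value twice.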