Complex JSON handling

Zig supports JSON serialization. How can this be used for complex real-world tasks?

I need to parse an object that contains strings and arrays of strings. My idea is to use a Zig struct for the object and slices of u8 for the strings. But what should I use for an array of strings? My first thought is std.ArrayList([]u8), but I’m not sure that is what the authors of the stdlib intend me to use here. What is the intended type?

The Programming Without Pointers style comes to mind.
Basically, have a std.ArrayList(u8), and then have something like a second std.ArrayList(u32) for the string lengths, or perhaps offsets. (Offsets can be good; to slice the items, you just use the offset at the current index and the offset at the next index).
If you were storing an array of slices, you’d be storing a pointer-plus-length to a list of further pointers-plus-lengths; with this method, you store a single pointer-plus-length and then a list of plain lengths, with no per-string pointers.
On 64-bit architectures you get a further benefit: each u32 length/offset takes 4 bytes instead of the 8 a usize-based slice length would.

At deserialization time, when you encounter a new string, you appendSlice() it into the std.ArrayList(u8) and append() its length (or end offset) into the std.ArrayList(u32); how you then refer to that string from your representation of the JSON object is basically your own choice.
You can store a slice into the items of the std.ArrayList(u8), or you can store the index of the u32 length/offset as a cheaper type of handle.
In a similar sense, when you encounter an array of strings inside a JSON object, you can return a slice of the std.ArrayList(u32) as a compact representation of that array.
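
For example, a minimal sketch of that layout (the names here are made up, and I’m assuming a recent Zig where std.ArrayList is the unmanaged variant, so append/appendSlice take the allocator):

const std = @import("std");

// All string bytes live in one buffer; ends[i] is the offset one past the
// end of string i, so string i spans bytes[start..ends[i]], where start is
// 0 for the first string and ends[i - 1] otherwise.
const StringPool = struct {
    bytes: std.ArrayList(u8) = .empty,
    ends: std.ArrayList(u32) = .empty,

    fn append(pool: *StringPool, gpa: std.mem.Allocator, s: []const u8) !u32 {
        try pool.bytes.appendSlice(gpa, s);
        const index: u32 = @intCast(pool.ends.items.len);
        try pool.ends.append(gpa, @intCast(pool.bytes.items.len));
        return index; // a compact handle for the string
    }

    fn get(pool: *const StringPool, index: u32) []const u8 {
        const start = if (index == 0) 0 else pool.ends.items[index - 1];
        return pool.bytes.items[start..pool.ends.items[index]];
    }
};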

You can just use slices; the parser/serialiser understands them.

If you need anything it doesn’t understand, like an ArrayList, then you will need a custom jsonParse/jsonStringify function; it can live on the specific type that needs it, on a wrapper, or on the larger type in question.
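
For instance, a wrapper along these lines might do it (just a sketch; StringList is a made-up name, and I’m assuming a recent Zig where std.ArrayList is unmanaged):

const std = @import("std");

// Lets the default logic parse a [][]const u8, then adopts that slice as
// the list's backing storage. The error set is simply inferred.
const StringList = struct {
    list: std.ArrayList([]const u8),

    pub fn jsonParse(
        allocator: std.mem.Allocator,
        source: anytype,
        options: std.json.ParseOptions,
    ) !StringList {
        const strings = try std.json.innerParse([][]const u8, allocator, source, options);
        return .{ .list = std.ArrayList([]const u8).fromOwnedSlice(strings) };
    }
};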


Where can I find documentation on what it understands and how it will generate JSON from it?

What type do I pass so that it generates an object? What type do I pass so that it generates an array?

As always, the std docs are a mess; there is some documentation here and there, but the source is easier to find and quite readable. innerParse is most useful here.

It understands primitive types and has default logic for compound/user-defined types; such types can override that with the functions I mentioned.

The mapping between JSON and Zig types is:
object - struct, tagged union
array - slice, array, tuple, vector
string - slice of u8, enum
bool - bool
null - optional
number (can be in string form) - ints/floats, enum
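
For example, with that mapping a struct like the following parses straight from a JSON object, with slice fields coming from JSON arrays and strings (the type and field names here are made up):

const std = @import("std");

// struct -> object, []const u8 -> string,
// []const []const u8 -> array of strings, ?u32 -> number or null.
const Example = struct {
    name: []const u8,
    members: []const []const u8,
    founded: ?u32,
};

test "default json mapping" {
    const text =
        \\{"name":"core","members":["alice","bob"],"founded":null}
    ;
    const parsed = try std.json.parseFromSlice(Example, std.testing.allocator, text, .{});
    defer parsed.deinit();
    try std.testing.expectEqualStrings("bob", parsed.value.members[1]);
    try std.testing.expect(parsed.value.founded == null);
}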

std.json.Value is a tagged union for dynamic JSON values, implemented with the aforementioned override functions.

jsonParse should have the signature fn(Allocator, token_source: anytype, json.ParseOptions) json.ParseError(@TypeOf(token_source.*))!T, where T is the type being parsed and token_source is either *json.Scanner or *json.Reader (these should be combined eventually). It’s common to just infer the error set.

jsonStringify is fn(json.Stringify) json.Stringify.Error!void.
The stringify API is better documented, as it was more recently overhauled.

I can also answer more specific questions if you need.


I wonder if the core team would be receptive (if not now, then at some later point) to adding some sort of “Recipes” section to the std docs that would include complete examples of how to do things. Loading a JSON file (like in this thread), reading a file line by line, etc. would be super useful. I think it would be possible to test these examples automatically in CI somehow. If not as part of the core compiler CI, maybe some sort of extra CI that wouldn’t block compiler changes.

I’d imagine the community would be willing to contribute examples in this docs section.

Right now this type of documentation is spread out all over the internet, and most of it is probably outdated. Outdated examples that don’t even compile can be more harmful than helpful when you’re looking for information on how to do something.


In which format are the docs written? We could set up a wiki in the same format, so the community has a chance to contribute doc content.

In that case, I dare to take you up on that offer :wink:

I have the following problem:

I need to serialize a struct to and from JSON; it contains slices of structs that come from C code. The C structs use char * as the datatype for strings. Here is a simple example:

C struct:

typedef struct {
    char *name;
    char *from;
} Person;

Zig struct:

const Team = struct {
    leader: c.Person,
    members: []c.Person,
};

How can I parse this from JSON? How can I stringify it to JSON? If there is a way to add member functions to struct Team, that would be preferred. Or is it possible to add one such function pair for the unsupported data type [*c]u8?

For example, here’s what I do to convert JSON into a struct (the reverse is just as easy):

const Sensor = struct {
    frequencies: []f64,
    dir: []u8,
    gain: f64,
};

pub fn main(init: std.process.Init) !void {
    const io = init.io;
    const gpa = init.gpa;

...

    const json_buf = try gpa.alloc(u8, try data_file.length(io));
    defer gpa.free(json_buf);
    _ = try data_file.readPositionalAll(io, json_buf, 0);
    const parsed = try std.json.parseFromSlice(Sensor, gpa, json_buf, .{});
    defer parsed.deinit();
    const sensor = parsed.value;

...

It looks like C pointers ([*c]) are not supported by the API, so you’ll have to implement the override functions.

At the very least, if you make a manual binding for Person that uses [*:0]u8 pointers, then it is supported by the stringify API, but still not by the parsing API.
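
Such a manual binding could be as small as this (a sketch that just mirrors the C layout by hand):

// Hand-written equivalent of the translated C struct, but with
// sentinel-terminated pointers instead of [*c]u8.
const Person = extern struct {
    name: [*:0]u8,
    from: [*:0]u8,
};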

In the custom parse function you’d just allocate the strings, and ensure they have a null terminator.

And in a custom stringify function you’d use std.mem.span(ptr) to turn each pointer into a slice, then call stringify.write(slice) for each string.

In both you would also have to handle the other fields and types, but that should be simple, since the JSON API supports most of them and Zig’s reflection helps a lot.
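
A stringify override along those lines might look roughly like this, on a hand-declared Person (only a sketch; I’m assuming the post-overhaul API where the method receives a *std.json.Stringify with beginObject/objectField/write/endObject):

const std = @import("std");

const Person = extern struct {
    name: [*c]u8,
    from: [*c]u8,

    pub fn jsonStringify(self: Person, jws: *std.json.Stringify) !void {
        try jws.beginObject();
        try jws.objectField("name");
        try jws.write(std.mem.span(self.name)); // span() stops at the null terminator
        try jws.objectField("from");
        try jws.write(std.mem.span(self.from));
        try jws.endObject();
    }
};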

That’s exactly my question. How do I do this?

I did remove the differing behaviour based on options and probably some other functionality to keep it simple.

I’m not sure whether char * will be translated to [*c]u8 or [*c]c_char; if the latter, you may need a pointer cast.

Of course, you would put this on your Zig type, which will require some more logic since the example below is written for the Person fields, unless you are using manual bindings.

const std = @import("std");

const Person = extern struct {
    name: [*c]u8,

    pub fn jsonParse(allocator: std.mem.Allocator, source: anytype, options: std.json.ParseOptions) !Person {
        // a reduced copy of the struct parsing from `innerParse`, with ofc, added c string support

        // defining things that are in scope of the copied code,
        // instead of modifying each reference individually
        const structInfo = @typeInfo(@This()).@"struct";
        const Token = std.json.Token;
        const innerParse = std.json.innerParse;

        if (.object_begin != try source.next()) return error.UnexpectedToken;

        var r: Person = undefined;
        var fields_seen = [_]bool{false} ** structInfo.fields.len;

        while (true) {
            var name_token: ?Token = try source.nextAllocMax(allocator, .alloc_if_needed, options.max_value_len.?);
            const field_name = switch (name_token.?) {
                inline .string, .allocated_string => |slice| slice,
                .object_end => { // No more fields.
                    break;
                },
                else => {
                    return error.UnexpectedToken;
                },
            };

            inline for (structInfo.fields, 0..) |field, i| {
                if (std.mem.eql(u8, field.name, field_name)) {
                    // Free the name token now in case we're using an allocator that optimizes freeing the last allocated object.
                    // (Recursing into innerParse() might trigger more allocations.)
                    if (name_token.? == .allocated_string) allocator.free(field_name);
                    name_token = null;
                    if (fields_seen[i])
                        return error.DuplicateField;

                    @field(r, field.name) = if (field.type == [*c]u8)
                        // it already supports terminated slices, the pointer will coerce
                        (try innerParse([:0]u8, allocator, source, options)).ptr
                    else
                        try innerParse(field.type, allocator, source, options);
                    fields_seen[i] = true;
                    break;
                }
            } else {
                // Didn't match anything.
                if (name_token.? == .allocated_string) allocator.free(field_name);
                return error.UnknownField;
            }
        }
        for (fields_seen) |seen| if (!seen) return error.MissingField;
        return r;
    }
};
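
With an override like this in place, parsing goes through the usual entry point and picks it up automatically; a call site could look something like this (allocator and json_text are placeholders):

const parsed = try std.json.parseFromSlice(Person, allocator, json_text, .{});
defer parsed.deinit();
const person: Person = parsed.value;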

Wow, this is huge. And it brings me to another problem: how can I add this to struct Person, which is included from a C file and (for obvious reasons) can’t be modified? Isn’t there a simpler solution, one that I can use in Team?

How does stringify work?