Parsing JSON and ArrayList

ilx · March 31, 2024, 2:16am

Hi everyone! I’m new to the Zig programming language (three days in!) and manually managing memory in general. I was hoping someone might be kind enough to point me in the right direction regarding this JSON parsing question:

To try and learn some of the language fundamentals, I have an incomplete and contrived example of some code that reads in a JSON object (assumes it exists) that contains an array of todo items, parses it, and then exposes some utility functions to modify the list.

My goal was to end up with the list of todo items in an ArrayList. I was curious to know if my attempt at this is close to correct, or if it is wildly off target. Any advice regarding a correct / idiomatic solution or just general comments (especially if I am way off base about the need for it to end up in an ArrayList) would be greatly appreciated!

const std = @import("std");
const fs = std.fs;
const json = std.json;
const Parsed = json.Parsed;
const Allocator = std.mem.Allocator;
const ArrayList = std.ArrayList;

pub const TodoManager = struct {
    const Self = @This();

    allocator: Allocator,
    path: []const u8,
    todo_list: ArrayList(TodoItem),

    pub const Todo = struct { items: []TodoItem };

    pub fn init(allocator: Allocator, path: []const u8) !Self {
        const file = try fs.cwd().readFileAlloc(allocator, path, 512);
        defer allocator.free(file);

        const parsed = try json.parseFromSlice(Todo, allocator, file, .{ .allocate = .alloc_always });
        defer parsed.deinit();

        var todo_list = ArrayList(TodoItem).init(allocator);

        for (parsed.value.items) |item| {
            const owned_task = try allocator.dupe(u8, item.task);
            const new_item: TodoItem = .{
                .task = owned_task,
                .status = item.status,
            };

            try todo_list.append(new_item);
        }

        return .{
            .allocator = allocator,
            .path = path,
            .todo_list = todo_list,
        };
    }

    pub fn deinit(self: *Self) void {
        for (self.todo_list.items) |item| {
            self.allocator.free(item.task);
        }

        self.todo_list.deinit();
    }

    pub fn remove(self: *Self, idx: usize) void {
        const to_remove = self.todo_list.items[idx];
        _ = self.todo_list.orderedRemove(idx);
        self.allocator.free(to_remove.task);
    }

    pub fn add(self: *Self, task: []const u8, status: Status) !void {
        const owned_task = try self.allocator.dupe(u8, task);
        const item: TodoItem = .{
            .task = owned_task,
            .status = status,
        };

        try self.todo_list.append(item);
    }
};

pub const Status = enum(u2) {
    NotStarted = 0,
    InProgress = 1,
    Completed = 2,
};

pub const TodoItem = struct {
    task: []const u8,
    status: Status,
};

tensorush · March 31, 2024, 5:15am

Hi, welcome to the community!

Yeah, everything makes sense to me, I don’t see any logical errors or redundancies. There are a few minor stylistic changes I’d make, though:

Status can be simplified like this and it’ll implicitly be the same thing:

pub const Status = enum {
    NotStarted,
    InProgress,
    Completed,
};
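A minimal sketch of why the two declarations are equivalent, assuming a Zig version that has @intFromEnum (0.11 or later). ExplicitStatus is a hypothetical name used here only to keep both forms side by side:

```zig
const std = @import("std");

// Tag type and values written out by hand, as in the original post.
const ExplicitStatus = enum(u2) {
    NotStarted = 0,
    InProgress = 1,
    Completed = 2,
};

// Tag type and values left for the compiler to choose.
const Status = enum {
    NotStarted,
    InProgress,
    Completed,
};

pub fn main() void {
    // Values are assigned in declaration order, starting at 0.
    std.debug.assert(@intFromEnum(Status.NotStarted) == @intFromEnum(ExplicitStatus.NotStarted));
    std.debug.assert(@intFromEnum(Status.Completed) == @intFromEnum(ExplicitStatus.Completed));
    // Three fields fit in two bits, so both enums have the same bit size.
    std.debug.assert(@bitSizeOf(Status) == @bitSizeOf(ExplicitStatus));
    std.debug.print("Status uses {d} bits\n", .{@bitSizeOf(Status)});
}
```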

You should avoid defining Self in simple structs and use the name you’ve given them instead; see Don’t Self Simple Structs!
Also, you can reduce nesting for a big, simple struct like TodoManager by turning it into a file struct (every file is already a struct in Zig). You’ll end up with a file named TodoManager.zig with its fields defined at top-level scope and its other internal structs already nested inside its namespace. In that case you do need @This, but you should still bind it to the file struct’s name, like this:

const TodoManager = @This();

todo_list: ArrayList(TodoItem),
allocator: Allocator,
path: []const u8,

pub const Todo = struct {
    items: []TodoItem,
};

pub fn init(allocator: Allocator, path: []const u8) !TodoManager {
...

Otherwise, it looks fine to me. Nice job!

ilx · March 31, 2024, 9:52am

@tensorush Thank you for taking the time to help me, I really appreciate it!

Excellent advice regarding the enum and the unnecessary @This inside the struct. The article you’ve linked was a great read and very insightful.

I actually had no clue it was possible to do a file-level struct, this is very interesting and a cool concept. Thank you a tonne for the example code here, I’ll be sure to go and have a play around with this idea!

As a quick follow up - if my goal was to end up with a mutable list of items after parsing the JSON - then allocating an ArrayList, looping over the slice returned from the call to json.parseFromSlice and populating the list that way seems reasonable? I guess, by reasonable I mean not too many allocations / unnecessary actions in general? I understand why we cannot parse directly into the ArrayList but there was something about the solution I wrote that didn’t seem … optimal.

AndrewCodeDev · March 31, 2024, 10:13am

Hey @ilx, I’m going to link you to a topic you may find interesting: How to use toOwnedSlice() and alternatives. It went into a lot of detail about using ArrayList effectively.

I’ll just add my two cents here that may also help. First, you can do little things to squeeze a bit more performance out:

var todo_list = ArrayList(TodoItem).init(allocator);

One thing that immediately pops out is that you can use initCapacity instead of the standard init to reserve room up front, so the list doesn’t have to reallocate as it grows.
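A rough sketch of that suggestion, assuming a Zig release around 0.12 where std.ArrayList stores its allocator (newer releases changed this API). buildList is a hypothetical helper, and unlike the original init it borrows the task strings instead of duplicating them:

```zig
const std = @import("std");

const Status = enum { NotStarted, InProgress, Completed };
const TodoItem = struct { task: []const u8, status: Status };

// Hypothetical helper: builds a list with room for every task reserved up front.
fn buildList(allocator: std.mem.Allocator, tasks: []const []const u8) !std.ArrayList(TodoItem) {
    // One allocation sized for all items, instead of repeated growth during append.
    var list = try std.ArrayList(TodoItem).initCapacity(allocator, tasks.len);
    for (tasks) |task| {
        // Cannot fail: the capacity was reserved above.
        list.appendAssumeCapacity(.{ .task = task, .status = .NotStarted });
    }
    return list;
}

pub fn main() !void {
    const tasks = [_][]const u8{ "learn Zig", "parse JSON" };
    var list = try buildList(std.heap.page_allocator, &tasks);
    defer list.deinit();
    std.debug.print("{d} items, capacity {d}\n", .{ list.items.len, list.capacity });
}
```

In the original init, the same idea would mean sizing the list from parsed.value.items.len before the loop.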

Another thing is the ordered remove:

_ = self.todo_list.orderedRemove(idx);

I can’t speak for your use case, but if order doesn’t matter, you can use swapRemove, which is much faster: it moves the last element into the freed slot instead of shifting every later element down. I can imagine that the order of tasks may actually be important, though, which brings up the issue of “know thy problem”.
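To make the difference concrete, here is a small sketch, again assuming the allocator-storing std.ArrayList from Zig releases around 0.12. It uses a list of bytes rather than TodoItem so the resulting order is easy to read:

```zig
const std = @import("std");

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    var ordered = std.ArrayList(u8).init(allocator);
    defer ordered.deinit();
    try ordered.appendSlice("abcde");
    // Shifts 'c', 'd' and 'e' one slot left: O(n), order preserved.
    _ = ordered.orderedRemove(1);
    std.debug.print("{s}\n", .{ordered.items}); // prints "acde"

    var swapped = std.ArrayList(u8).init(allocator);
    defer swapped.deinit();
    try swapped.appendSlice("abcde");
    // Moves the last element 'e' into slot 1: O(1), order not preserved.
    _ = swapped.swapRemove(1);
    std.debug.print("{s}\n", .{swapped.items}); // prints "aecd"
}
```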

Let’s say that I have a list of tasks that I need to complete and I know that I’ll never do more than say 20 per day and they have to be in order. I could start by initializing 20 tasks and put them in the ArrayList in reverse. The reason for the reverse is that I can now pop them off the end of the list without needing to shuffle anything around. That advice only works though if I know that my problem allows me to make those assumptions.
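The reversed-list idea could look something like the sketch below, under the same assumption about the std.ArrayList API. takeLast is a hypothetical helper written with plain indexing, because the return type of the built-in pop differs between Zig releases:

```zig
const std = @import("std");

// Hypothetical helper: removes and returns the last element, or null if the list is empty.
fn takeLast(list: *std.ArrayList([]const u8)) ?[]const u8 {
    if (list.items.len == 0) return null;
    const last = list.items[list.items.len - 1];
    // Only the length changes; nothing is shifted and nothing is freed.
    list.shrinkRetainingCapacity(list.items.len - 1);
    return last;
}

pub fn main() !void {
    const todays_tasks = [_][]const u8{ "write report", "review PR", "deploy" };

    // Room for the assumed maximum of 20 tasks per day.
    var list = try std.ArrayList([]const u8).initCapacity(std.heap.page_allocator, 20);
    defer list.deinit();

    // Store the tasks back to front so the first task sits at the end of the list.
    var i: usize = todays_tasks.len;
    while (i > 0) {
        i -= 1;
        list.appendAssumeCapacity(todays_tasks[i]);
    }

    // Taking from the end is O(1) and yields the tasks in their original order.
    while (takeLast(&list)) |task| {
        std.debug.print("{s}\n", .{task});
    }
}
```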

I’d also like to point out that whether the allocator you pass actually gets used depends on whether the parser needs to allocate. The json code decides this from the options it’s given (the allocate option, for example, can be alloc_if_needed instead of alloc_always).

Just some food for thought.

AndrewCodeDev · March 31, 2024, 10:22am

Another thing to consider is the allocator. Again, if you know that you do not need more than N tasks, you can create a FixedBufferAllocator with the amount of memory that you need and enforce that you don’t go over. That way, you can parse all of your tasks on a single allocation. You may need extra memory for the parser, but you can be generous and still get away with a single allocation.

To do that though, you’d need to call parseFromSliceLeaky instead of parseFromSlice, because to make it “not leaky” parseFromSlice creates an arena allocator of its own. You can bypass that by calling the leaky version and giving it a stack-based allocator instead. If you want to use the Arena’d version, then by all means try it out, but you can go pretty far down the performance route without dramatically changing the appearance of your code or the loops you have.
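A minimal sketch of that combination, assuming the std.json API from Zig 0.11 or later. The parseTodos helper, the JSON text and the 4096-byte buffer size are illustrative assumptions, not values from the original program:

```zig
const std = @import("std");

const Status = enum { NotStarted, InProgress, Completed };
const TodoItem = struct { task: []const u8, status: Status };
const Todo = struct { items: []TodoItem };

// Hypothetical helper: parses without the arena that parseFromSlice would create.
// Everything the parser allocates comes from `allocator` and is never freed individually.
fn parseTodos(allocator: std.mem.Allocator, input: []const u8) !Todo {
    return std.json.parseFromSliceLeaky(Todo, allocator, input, .{});
}

pub fn main() !void {
    // Hypothetical input; the original program reads this from a file.
    const input =
        \\{"items": [
        \\  {"task": "learn Zig", "status": "InProgress"},
        \\  {"task": "parse JSON", "status": "NotStarted"}
        \\]}
    ;

    // All parser memory comes out of this stack buffer. The size is a generous guess.
    var buffer: [4096]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);

    const todo = try parseTodos(fba.allocator(), input);

    for (todo.items) |item| {
        std.debug.print("{s}: {s}\n", .{ item.task, @tagName(item.status) });
    }
    // No deinit call: the buffer goes away when main returns.
}
```

With the default options the parsed strings may point into the input rather than into the buffer, so the input has to outlive the result. If the buffer is too small, the parse fails with an out-of-memory error instead of growing.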

ilx · March 31, 2024, 10:56pm

@AndrewCodeDev Thank you for the detailed and really well considered reply brother! This was immensely helpful. The Zig community has been so awesome - I’m very grateful that you all have taken the time to help me like this.

The article you’ve linked is excellent - I will dive deep into it.

You’ve also raised some excellent points that I did not properly consider while I was putting that demo together - being sure to have a clear understanding of the problem so that you can tailor a specific solution and squeeze out some extra gains is an awesome note, thank you! I have not yet explored features like initCapacity on the ArrayList or the FixedBufferAllocator and it sounds for sure that I should dive deeper into the json module. I really appreciate the direction.

AndrewCodeDev · March 31, 2024, 11:16pm

No problem, and welcome to Ziggit!