Small command-line application reading ZON config

Hi everyone, I tried to create a small command-line application which reads a ZON config file. My config file (named config.zon) looks like this:

.{
    .a = 42,
}

And then my program (named usesconfig.zig) looks like this:

const std = @import("std");

const Config = struct {
    a: i32 = 1,
    b: i32 = 2,
};

pub fn main() !void {
    // TODO: use default allocator when available
    const allocator = std.heap.smp_allocator;

    const stdout = std.io.getStdErr().writer();
    const stderr = std.io.getStdErr().writer();

    // Any way to write this easier?
    // See also https://github.com/ziglang/zig/issues/2792
    const config: Config = loadconf: {
        const args: [][:0]u8 = try std.process.argsAlloc(allocator);
        defer std.process.argsFree(allocator, args);
        if (args.len != 2) {
            // Should I use std.debug.print instead of stderr?
            // It's easier, but this is not debugging.
            try stderr.print(
                "Expecting config file name as single argument.\n",
                .{},
            );
            return std.process.exit(1);
        }
        const cfg_filename = args[1];

        // This seems straight forward:
        var cfg_string = std.fs.cwd().readFileAlloc(
            allocator,
            cfg_filename,
            1024 * 1024,
        ) catch |err| {
            try stderr.print(
                "Could not read config file \"{s}\": {s}\n",
                .{ cfg_filename, @errorName(err) },
            );
            return std.process.exit(1);
        };
        defer allocator.free(cfg_string);

        // But std.zon.parse.fromSlice requires a NULL-terminated slice.
        // So there is some overhead.
        // Is this the right way to add a zero byte? Feels verbose.
        cfg_string = try allocator.realloc(cfg_string, cfg_string.len + 1);
        cfg_string[cfg_string.len - 1] = 0;
        const cfg_string_term = cfg_string[0 .. cfg_string.len - 1 :0];

        var diagnostics = std.zon.parse.Diagnostics{};
        defer diagnostics.deinit(allocator);
        break :loadconf std.zon.parse.fromSlice(
            Config,
            allocator,
            cfg_string_term,
            &diagnostics,
            .{},
        ) catch {
            try stderr.print("Could not parse config file \"{s}\":\n{any}", .{
                cfg_filename,
                diagnostics,
            });
            return std.process.exit(1);
        };
    };
    defer std.zon.parse.free(allocator, config);
    try stdout.print("Configuration is {any}\n", .{config});
}

I can run with zig run usesconfig.zig -- config.zon, and then I get the following output:

Configuration is usesconfig.Config{ .a = 42, .b = 2 }

My questions in this context are:

  • Is it a good or bad idea to use ZON for configuring a program?
  • Is my code idiomatic? (Ignoring the choice of allocator)
  • When reporting errors, should I use std.debug.print or std.io.getStdErr? Is reporting error messages considered “debugging” in that matter?
  • Is it a good idea to read a configuration file all at once by using std.fs.Dir.readFileAlloc, or are there better alternatives?
  • Why does std.zon.parse.fromSlize require a null-terminated slice? Any easier way to go from having a filename of my configuration file to the parsed struct without manual reallocation to add the NULL byte?

Your help and opinions are appreciated.

Edit: I forgot defer diagnostics.deinit(allocator); which I added in the code above.

Edit #2: Apparently my code requires the current master branch as the 0.14.1 release has a different signature for std.zon.parse.fromSlice. The version I use is 0.15.0-dev.847+850655f06.

Edit #3: I also forgot this (and added it above), even if it seems to be a no-op with the given Config type:

     };
+    defer std.zon.parse.free(allocator, config);
     try stdout.print("Configuration is {any}\n", .{config});

Yes, zon is a good config format.

I would suggest to turn this into a function, ideally a member function of Config and if you want to be idiomatic, then you would call it init(). Additionally I would put the std.zon.parse.free(allocator, config); into a config.deinit() function, since at that point it’s no longer obvious that the config was allocated using the zon parser.

The convenient solution for this is to use allocator.dupeZ() which will make a null-teminated copy.
But in this case you could also just use readFileAllocOptions and pass 0 as the sentinel.

I personally always use std.log which covers more use-cases and gives you more options to configure its behavior, e.g. you can write your own log function and for example log things to a file as well.

Unless you are working with really large files, or you know that you won’t read the whole file, I think this is the right approach.

That for sure is an odd decision. I’m surprised as well that it does need null termination.
I think it’s probably some implementation detail of Zig’s parser.

2 Likes

You meant getStdOut.

If you’re printing on error and then immediately quitting, it’s fine to use debug.print. it’s unbuffered, so it’s not great to call it multiple times. It also silently ignores errors, which is probably fine in this case, as it’s unlikely you’ll handle an error inside an error.

Why not just return an error? Zig will pretty print it, which is nicer than just 1. It will also return a failure code.

Zon is intended to compete with JSON, so this is a supported use case.

It’s usually better to use a reader, but the Zon API doesn’t take readers. :man_shrugging:

I don’t know why it takes a null-terminated slice, but the easier way you asked is to call readFileAllocOptions.

1 Like

That sounds nice, but when I try, I run into several complications: I would need to store the allocator somehow (as it’s needed by deinit). I assume storing it directly in Config would confuse the parser. Also, error reporting (and accessing the file name for the error message) becomes more complicated. It sounds nice in theory, but feels like it brings overhead upon it, which doesn’t seem worth it. (But I’m okay to be convinced otherwise.)

Maybe I should take time to look into that. I haven’t done it yet.

Oh yes, that was an error. Thanks.

I assume that the way I wrote my code, there is also no buffering. I would have to manually call std.io.bufferedWriter as a wrapper, I assume.

Interesting question: How do handle I/O errors when reporting errors. :thinking: Probably ignoring them is best. However, what if some messages get silently dropped, while others don’t. That could cause confusion. Not sure if that ever could happen in practice in a reasonable scenario. Still, I feel like the proper reaction would be to terminate the program when error message output fails.

I would like to first print the config file name, then that it couldn’t be parsed, and then the details. I don’t see how I could do that with the default error handling.

I’ll keep that in mind when I write own functions that process a byte stream of some sort.

Perfect, thanks!

You could just pass it as a parameter to deinit as well.

Well, if you open the file and parse the arguments there as well, then you also have the file name there as well. In the end both command line parameters and associated files are part of the application’s configuration.

2 Likes

Ignoring seems to be standard, that’s what std.log does. If you really want to know if your logs errored as well, you could do the following:

errReturner() catch |err| {
    errReturningPrint("oh no!: {}\n", .{err}) catch return err;

    return err;
};

If the logging fails, the original error will still be returned, but its stack trace will point to the line with errReturningPrint.

You could still do all of that logging, but just replace the std.process.exit(1) with an error return.

1 Like

It simplifies the tokenizer implementation since it allows EOF (signified by NUL) to be handled as a switch case (example)

3 Likes

I tried to follow most advice given here, and this is my rewrite:

const std = @import("std");
const Allocator = std.mem.Allocator;

const Config = struct {
    a: i32 = 1,
    b: i32 = 2,
    const MaxFileLen = 1024 * 1024;
    const Error = error{ OutOfMemory, ConfigRead, ConfigParse };
    pub fn init(
        // What's the best name for the Allocator argument:
        allocator: Allocator,
        filename: []const u8,
        // Is it idiomatic to accept the log scope this way:
        log: anytype,
    ) Error!Config {
        const str = std.fs.cwd().readFileAllocOptions(
            allocator,
            filename,
            MaxFileLen,
            null,
            std.mem.Alignment.@"1",
            0,
        ) catch |err| {
            // I expect the caller to know that OutOfMemory isn't logged,
            // and that ConfigRead and ConfigParse errors are logged.
            // Is that a good idea?
            if (err == error.OutOfMemory) return Error.OutOfMemory;
            log.err(
                "Could not read config file \"{s}\": {s}",
                .{ filename, @errorName(err) },
            );
            return Error.ConfigRead;
        };
        defer allocator.free(str);
        var diag = std.zon.parse.Diagnostics{};
        defer diag.deinit(allocator);
        return std.zon.parse.fromSlice(
            Config,
            allocator,
            str,
            &diag,
            .{},
        ) catch |err| switch (err) {
            error.OutOfMemory => return Error.OutOfMemory,
            error.ParseZon => {
                // Now this is annoying, I have to strip the trailing
                // newline. Any easier way to do that, or a way to avoid
                // having to do it at all?
                var diag_str = std.ArrayList(u8).init(allocator);
                defer diag_str.deinit();
                const diag_writer = diag_str.writer();
                try diag_writer.print("{any}", .{diag});
                const diag_str_trimmed = std.mem.trim(u8, diag_str.items, "\n");
                log.err("Could not parse config file \"{s}\":\n{s}", .{
                    filename, diag_str_trimmed,
                });
                return Error.ConfigParse;
            },
        };
    }
    pub fn deinit(
        // Accepting a pointer here avoids being able to use "const"
        // later. Is that alright?
        self: *Config,
        allocator: Allocator,
    ) void {
        _ = allocator;
        self.* = undefined;
    }
};

pub fn main() !void {
    // TODO: use default allocator when available
    var debug_alloc = std.heap.DebugAllocator(.{}).init;
    defer _ = debug_alloc.deinit();
    const allocator = debug_alloc.allocator();

    var stdout_buf = std.io.bufferedWriter(std.io.getStdErr().writer());
    // Flush automatically? A good idea?
    defer stdout_buf.flush() catch {};
    const stdout = stdout_buf.writer();

    // I feel like this "blocklabel: {}" syntax becomes
    // unhandy if I set more than one value.
    var config: Config = loadconf: {
        const args: [][:0]u8 = try std.process.argsAlloc(allocator);
        defer std.process.argsFree(allocator, args);
        if (args.len != 2) {
            std.log.err(
                "Expecting config file name as single argument.",
                .{},
            );
            return std.process.exit(1);
        }
        break :loadconf Config.init(allocator, args[1], std.log) catch |err| {
            switch (err) {
                // Using the knowledge that CondigRead and ConfigParse do
                // not require any further debug output.
                error.ConfigRead => return std.process.exit(1),
                error.ConfigParse => return std.process.exit(1),
                error.OutOfMemory => return error.OutOfMemory,
            }
        };
    };
    defer config.deinit(allocator);
    try stdout.print("Configuration is {any}\n", .{config});
}

This brought up some more issues/questions:

  • How do you usually name the allocator argument? It goes as first argument (after any type arguments) usually? I have seen “gpa” in the standard lib, but that seems to be odd!?
  • Is it correct to use log: anytype to accept a logger (with a scope)?
  • Is it a good or bad idea to log only some errors and expect the caller to panic/escalate on other errors (e.g. OutOfMemory) or do some logging at the caller site for certain errors if needed?
  • The logging function, e.g. std.log.err, doesn’t seem to expect a newline (\n) at the end of the message. Now this got me in trouble as the string representation of the ZON parser Diagnostics contains a trailing newline. How to deal with that in an easy and idiomatic way?
  • Should I always expect a pointer passed to deinit? Or what should I do in case of my Config struct? (See also: When to use a struct vs a struct pointer)

Thanks for all your advice so far.

Don’t overthink it. It doesn’t really matter. The std library sometimes calls the allocator a gpa to mean it expects the allocator to be able to free memory out of order, like any general purpose allocator. In certain places, the library explicitly won’t keep track of its objects, and then it names the allocator arena. You don’t really need to pass a GeneralPurposeAllocator to an argument called gpa, any allocator that can allocate and free out of order is fine. Same for an allocator called arena, it doesn’t necessarily means an Arena, it could be, for instance, a FixedBufferAllocator.

Yes.

In libraries, usually just returning the error is fine. Logging is fine as long as you take the logger as a parameter, like you did, because the user could just pass a no-op if they don’t want logging.

Maybe don’t worry about it?

That depends a lot on architecture of your struct. If it’s reasonable to create a const instance of your struct, then your deinit should be able to handle a const instance, which means it needs to take either a const pointer or a value. If a const version of your struct is nonsensical, then it’s reasonable to take a mutable pointer to it.

I’m just trying to understand typical conventions as of yet. Maybe they will still change over time anyway as Zig evolves. It’s interesting to know that gpa here refers to the usage pattern of the allocator (allocating and freeing out of order).

Well, I’m also asking because I wonder if it is correct to pass multiline messages to the logging system. Maybe the interface doesn’t really define whether it’s okay or not. :thinking:

I overall wonder if for a stand alone command line application, using std.log.err is the best way to go, or if I should instead write directly to stderr (buffered or unbuffered) as I did in my first attempt.

1 Like

Multine logs are fine.

std.log is usually the way to go. You can enable and disable it as necessary, which is handy.
Writing to srderr is usually only necessary if you need to build your message piecewise. std.log can’t do this because it prints a header and a new line every time you call it.