Diagnostics Factory

Something I’ve been wondering how to do in Zig is create a library with diagnostics that’s meant to be used in different contexts. For example, say I have both a CLI and a server with a common dependency on my library foo.

One way to write this is to say “foo should be agnostic to which context it’s used in”, in which case it probably shouldn’t be doing any of the error reporting. For example, it shouldn’t just write to stderr: in the server context that output could get mixed in with logs unintentionally, or, if logs are emitted to specific files, the error could be lost entirely.

Another way to structure this would be a build-time config, an optionally populated Writer like the one shown in the post, etc.

In general I’m so used to conflating error handling with error reporting that it’s taking some effort to pick them apart :sweat_smile:

@zmitchell If you just want to log, you can customize logFn at comptime. In matklad’s post, I’m not sure if the Errors struct really needs to have a custom emit fn, since that’s just what logFn already is. (I could be wrong about a detail here.)
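
A minimal sketch of the logFn route, for reference. It assumes the `std.Options` form of `std_options` used by recent Zig releases (older releases declared it differently), and `fooLogFn` and `log_calls` are hypothetical names made up for the example:

```zig
const std = @import("std");

// std.log only picks this up when it is declared in the root source file.
pub const std_options: std.Options = .{
    .logFn = fooLogFn,
};

// Hypothetical counter so the application can tell that something was logged.
// Not thread-safe; a real server would want an atomic or a lock.
var log_calls: usize = 0;

// Hypothetical application-side log function. The CLI could keep printing to
// stderr like this, while the server routes the same calls to its log files.
fn fooLogFn(
    comptime level: std.log.Level,
    comptime scope: @TypeOf(.enum_literal),
    comptime format: []const u8,
    args: anytype,
) void {
    log_calls += 1;
    const prefix = "[" ++ comptime level.asText() ++ "] (" ++ @tagName(scope) ++ ") ";
    std.debug.print(prefix ++ format ++ "\n", args);
}
```

The library itself would only call `std.log.scoped(.foo).err(...)` and friends; which function ends up on the other side is decided by whoever owns the root file.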

If you need diagnostics info to possibly handle errors, not just log them, and the place where you handle them is a few layers removed from the source of the error, you may want to check out my diagnostics gist - it has some sample code in tests which explains the idea pretty well. (Mind you, Andy and matklad don’t seem to approve of this pattern.)

fn emit(
    errors: *Errors,
    comptime fmt: []const u8,
    args: anytype,
) void {
    comptime assert(fmt[fmt.len - 1] == '\n');

This seems kind of stern, no?

Yes, we can use comptime to compel the computer to make us do work, but why not use it to compel the computer to work for us?

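fn lastIdx(fmt: []const u8) usize {
    // One past the last non-newline byte; 0 if fmt is empty or all newlines.
    var last = fmt.len;
    while (last > 0 and fmt[last - 1] == '\n') : (last -= 1) {}
    return last;
}

// later

    const fmt = format[0..comptime lastIdx(format)];
    // "\n" must be a string literal: ++ cannot concatenate the char literal '\n'.
    try writer.print(fmt ++ "\n", args);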

Or just

const fmt = if (comptime format[format.len - 1] == '\n')
    format
else
    format ++ "\n";

The first version was taken from a logging suite which sometimes uses blank lines for spacing things out, so part of the brief is to normalize several \n to exactly one.

I’ve also been mulling over a way to synthesize diagnostics with orthodox Zig control flow, in a logging-inspired way. “Make a scoped logger, log errors copiously, return error unions” is pretty close to the sweet spot, but you end up with a string, which is less flexible than I’d like.

I don’t like returning an error with a payload, for all the usual reasons, and I like the way that logging uses compile-time overrides to specialize the far side of the interface into whatever you want it to be. What I want is some kind of way for library code to shove any extra information in a usable form into a bag of holding, so that user code can a) choose to make the bag of holding real and b) reach in and retrieve error stuff, sometimes, or just have it get logged automatically, or what have you. An anytype tuple, like args, is not quite the thing, I’d like the error to imply a type known in advance.

Basically an error union ‘pattern’, but the payload gets shoved sideways instead of up, and the error itself serves double-duty as a way of alerting user code that there’s more available. But in a way which compiles out completely if that code doesn’t take advantage of said opportunity, as logging does (or, at least, is supposed to).

Anyway. I think there’s something to this notion, but I haven’t figured out how to make it nice.
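
For concreteness, here is one rough approximation of "the payload goes sideways". It is only a sketch: the sink is passed as an `anytype` parameter rather than selected through a compile-time override the way logging is, and every name in it (`InvalidDigitInfo`, `Collect`, `Discard`, `parseDigits`) is hypothetical.

```zig
const std = @import("std");

const ParseError = error{ InvalidDigit, Overflow };

// Hypothetical payload type that error.InvalidDigit implies.
const InvalidDigitInfo = struct { index: usize, byte: u8 };

// A sink that makes the "bag of holding" real: it keeps the payload.
const Collect = struct {
    invalid_digit: ?InvalidDigitInfo = null,

    fn report(self: *Collect, info: InvalidDigitInfo) void {
        self.invalid_digit = info;
    }
};

// A sink that ignores the payload. The body is empty, so an optimizing
// build has nothing to keep for these calls.
const Discard = struct {
    fn report(_: *Discard, _: InvalidDigitInfo) void {}
};

// Library code: returns a plain error set and pushes details into the sink.
fn parseDigits(input: []const u8, sink: anytype) ParseError!u32 {
    var n: u32 = 0;
    for (input, 0..) |c, i| {
        if (c < '0' or c > '9') {
            sink.report(.{ .index = i, .byte = c });
            return error.InvalidDigit;
        }
        n = try std.math.mul(u32, n, 10);
        n = try std.math.add(u32, n, c - '0');
    }
    return n;
}
```

User code that cares passes `&collect` and, on `error.InvalidDigit`, reads `collect.invalid_digit`; user code that doesn't passes a `Discard`. What this does not solve is the nicer part of the wish: the library author still has to thread the sink through every call.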

Hear me out, just because args is anytype doesn’t mean it can’t be type checked. You could make a collector logFn which, at compile time, tries to pull a tag out from the args according to some conventional field name, then tries to stash the args data in a type checked place determined by the tag. If the collector sees an unrecognized tag or if the type of the storage area doesn’t match you get a compile error.

That seems to be pretty much the best version of what you want (let’s not bring external code gen into the discussion.)

But now we have other issues/questions:

  • Is this an abuse of the logging system? (yes, but a similar non log interface could be added.)
  • How to manage memory of the collector?
  • Multiple threads?
  • How does code pull data from this collector? Maybe it can expose a slice of tagged extern struct or something.
  • How does code associate a call with data in the collector?
  • Who clears the data, the caller or the library? and when?
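
A minimal sketch of the compile-time checking part of that collector idea. To keep it short, the tag is passed as its own comptime parameter instead of being pulled out of `args` by field name, and the storage is a plain struct; `Storage`, `FileNotFound`, `BadPort` and `collect` are all hypothetical names. It leaves the memory, threading and ownership questions above unanswered.

```zig
const std = @import("std");

// Hypothetical payload types, one per diagnostic tag.
const FileNotFound = struct { path: []const u8 };
const BadPort = struct { value: u32 };

// Hypothetical storage area: the field names double as the recognized tags.
const Storage = struct {
    file_not_found: ?FileNotFound = null,
    bad_port: ?BadPort = null,
};

fn collect(
    storage: *Storage,
    comptime tag: @TypeOf(.enum_literal),
    payload: anytype,
) void {
    const name = @tagName(tag);
    if (!@hasField(Storage, name)) {
        @compileError("unrecognized diagnostic tag: " ++ name);
    }
    // A payload whose type does not coerce to the field's type is rejected
    // by the compiler at this assignment.
    @field(storage.*, name) = payload;
}
```

Calling `collect(&storage, .bad_port, BadPort{ .value = 70000 })` compiles; `collect(&storage, .bad_prot, ...)` or a `FileNotFound` payload under `.bad_port` does not.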

Error messages can be one of two flavors:

  • Either it is a phrase, intended to be decorated by prepending context:, like in Go:

    tracking parcel location: fetching order status: connecting to the DB
    
  • Or it is a multiline error message which is either not decorated, or is preceded by another multiline message.

Forcing \n at the call-site to emit makes it clear to the reader that this is the second case, without needing to consult the emit implementation.
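
A small self-contained sketch of that convention, assuming a fixed-size buffer as the sink. `ErrorBuffer` is a hypothetical stand-in, not the Errors type from the post:

```zig
const std = @import("std");

// Hypothetical fixed-size sink for multiline error messages.
const ErrorBuffer = struct {
    buf: [512]u8 = undefined,
    len: usize = 0,

    fn emit(errors: *ErrorBuffer, comptime fmt: []const u8, args: anytype) void {
        // Reject, at compile time, any format string that does not end in '\n'.
        comptime std.debug.assert(fmt.len > 0 and fmt[fmt.len - 1] == '\n');
        // Messages that do not fit in the buffer are dropped in this sketch.
        const out = std.fmt.bufPrint(errors.buf[errors.len..], fmt, args) catch return;
        errors.len += out.len;
    }

    fn text(errors: *const ErrorBuffer) []const u8 {
        return errors.buf[0..errors.len];
    }
};
```

With this in place, `errors.emit("cannot open '{s}'", .{path})` fails to compile, so every call site visibly carries its own terminator.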

That’s also the same reason why I use @import("./foo.zig"); rather than @import("foo.zig"): there, ./ is an obvious marker of a local path, which makes it clear that no lookup path of any kind is involved.


There’s also a related micro-shift in my programming style at TigerBeetle, where we do a ton of assertions. Before TB, I used a “defensive programming” style where I tried to make my functions behave reasonably under all circumstances, returning some kind of monoid neutral value if appropriate. With assertions, I often find it more advantageous to just assert that the function input is non-degenerate, and force an if check at the call-site, to make it clearer to the reader what is happening.
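
To illustrate the two styles side by side, here is a tiny sketch with made-up functions (`maxDefensive`, `maxAsserting`, `describe` are hypothetical, not TigerBeetle code):

```zig
const std = @import("std");
const assert = std.debug.assert;

// Defensive style: an empty slice quietly yields 0, the neutral element for
// max over unsigned integers.
fn maxDefensive(xs: []const u32) u32 {
    var m: u32 = 0;
    for (xs) |x| m = @max(m, x);
    return m;
}

// Assertion style: an empty slice is treated as a programmer error.
fn maxAsserting(xs: []const u32) u32 {
    assert(xs.len > 0);
    var m = xs[0];
    for (xs[1..]) |x| m = @max(m, x);
    return m;
}

// The call site now has to spell out what happens in the empty case.
fn describe(xs: []const u32) ?u32 {
    if (xs.len == 0) return null;
    return maxAsserting(xs);
}
```

In the defensive version a reader of the call site cannot tell "the maximum is 0" apart from "there was nothing to look at"; in the asserting version the `if` makes that decision visible.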

I was quite sure you (pl.) had your reasons; my reply was half bait to draw them out (success!), and half a thin excuse to show off a bit of loosely-related comptimery which I’ve been pleased with.

I do something similar with non-terminated and terminated error messages, which is one of the reasons why when errors end up in the log, instead of a panic, I need to normalize them. Horses for courses.

It’s not so much about type checking, which always happens in Zig, so much as making the payloads pleasant to use. If it’s a big hassle then no one will be inclined to use it. I have some hand-wavey notions about registering error type / payload duals but there are issues with retrieval anyway, because you can’t anytype a return value. That property also means that it’s easy to shove an arg-style anytype blob into the diagnostics hammerspace, but retrieving it is not so simple.

There are also problems with comptime global state, unfortunately.

None of this has gotten out of the hand-waving stage, I just think it has potential. I like the balance it strikes: code has to deal with the error itself, while the bundle of possibly-useful information is opt-in, with some control over default behavior when it isn’t wanted, including (and this is important to me) no runtime cost whatsoever if the decision is “do nothing at all with the payload”.

Why is this important to you?

An error should be an unlikely event. Given that, a little bit of overhead (even allocating) shouldn’t matter, at least unless you assume that it will be abused for DoS attacks against your program.

The heavy lifting is usually the formatting for humans, which you’ll need anyway somewhere.

Because paying for what you don’t use is unsatisfactory. At zero overhead, people like to use it; at epsilon overhead, they wonder “hmm well am I / is someone else really going to use this”.

Economists have an aphorism, “free is not a price”. There’s a qualitative distinction which affects people’s behavior, whether or not it should.

This seems particularly true for the Zig community, and this is a bit concerning.

To sacrifice error diagnostics for the last quantum of performance is a very bad choice, at least for professional software.

The time humans spend debugging an error in production systems is incredibly more expensive than CPU time.

We’re talking about error diagnostics here, not about debugging messages (which are more often than not unhelpful and superfluous in my experience).