Recommended ways of handling errors across TU boundaries

cancername · August 12, 2024, 2:07am

I am building an extensible Zig project that heavily relies on runtime interfaces which are intended to work across translation unit boundaries. My concern is error handling. I am in agreement with Zig’s approach to error handling: control flow first, optional diagnostics separately. Ordinarily, Zig’s error sets and unions would be great for this, but they are not reliable across TU boundaries. Here are some approaches I’ve thought of, with upsides and downsides:

Enum

The simple way: replicate Zig errors in userspace.

const Error = enum(u16) {
    foo,
    bar,
    baz,
};

Pros

very fast
simple
in line with existing Zig method
no dynamic allocation

Cons

not extensible

String table

Use freeform strings as errors:

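const std = @import("std");

const Errors = struct {
    // One slot per possible u16 code; null marks a free slot.
    map: [1 << 16]?[]const u8 = [_]?[]const u8{null} ** (1 << 16),
    // Next sequential code to hand out; saturates at maxInt(u16).
    nb_errors: u16 = 0,

    pub fn createError(errors: *Errors, error_desc: []const u8) u16 {
        const index: u16 = if (errors.nb_errors != std.math.maxInt(u16))
            errors.nb_errors
        else
            findFirstNull(&errors.map) orelse @panic("skill issue");

        errors.map[index] = error_desc;
        errors.nb_errors +|= 1;
        return index;
    }

    // Reading a description consumes the error and frees its slot.
    pub fn getErrorDesc(errors: *Errors, value: u16) []const u8 {
        const desc = errors.map[value] orelse @panic("skill issue");
        errors.map[value] = null;
        errors.cleanup();
        return desc;
    }

    // Shrink nb_errors past any trailing free slots.
    fn cleanup(errors: *Errors) void {
        var index = errors.nb_errors;
        while (index != 0 and errors.map[index - 1] == null) : (index -= 1) {}
        errors.nb_errors = index;
    }

    // Linear scan for a free slot once the sequential codes run out.
    fn findFirstNull(map: []const ?[]const u8) ?u16 {
        for (map, 0..) |slot, i| {
            if (slot == null) return @as(u16, @intCast(i));
        }
        return null;
    }
};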

Pros

extensible
reasonably fast

Cons

either wastes a bunch of memory or requires dynamic allocation
somewhat complex
(semi-global) state

Error IDs

Using small arrays as errors:

const Error = [4]u8;

Pros

no dynamic allocation
fast
pretty simple

Cons

not very readable
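
A rough sketch of how such IDs might be used, assuming printable four-character tags in the FourCC style. The tag names and the helpers isError and toInt are made up for illustration:

```zig
const std = @import("std");

// Four-byte error ID, as declared above.
const Error = [4]u8;

// Hypothetical IDs for a parser; the tags are invented for this sketch.
const err_bad_header: Error = "BHDR".*;
const err_truncated: Error = "TRNC".*;

// IDs are plain bytes, so comparing them needs no table shared between modules.
fn isError(actual: Error, expected: Error) bool {
    return std.mem.eql(u8, &actual, &expected);
}

// Pack an ID into an integer, e.g. to return it through a C ABI function.
fn toInt(id: Error) u32 {
    return @bitCast(id);
}

pub fn main() void {
    const got = err_truncated;
    if (isError(got, err_truncated)) {
        // The ID doubles as a short printable tag.
        std.debug.print("parse failed: {s}\n", .{&got});
    }
}
```

Printable tags soften the readability problem a little, but nothing prevents two independent modules from picking the same four bytes.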

I’m not quite satisfied with any of these approaches. Any suggestions?

squeek502 · August 12, 2024, 2:34am

What does “not extensible” mean?

My first thought would be to just go with what’s often done in C: an enum + a function that takes an enum value and returns a string literal (example from a C library I wrote: the enum and the function).
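
In Zig, that C pattern might look roughly like the sketch below. The explicit values, the non-exhaustive `_` field, and the errorString name are assumptions for illustration, not taken from the linked library:

```zig
const std = @import("std");

// Explicit values keep the codes stable once they are published.
const Error = enum(u16) {
    foo = 1,
    bar = 2,
    baz = 3,
    _, // non-exhaustive: codes from newer modules stay representable
};

// Map a code to a static description; unknown codes get a fallback string.
fn errorString(err: Error) []const u8 {
    return switch (err) {
        .foo => "foo failed",
        .bar => "bar failed",
        .baz => "baz failed",
        _ => "unknown error",
    };
}

pub fn main() void {
    // A raw code as it might arrive from another module.
    const code: Error = @enumFromInt(2);
    std.debug.print("{s}\n", .{errorString(code)});
}
```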

LucasSantos91 · August 12, 2024, 2:45am

What do you mean by this? Are you referring to error name clashes?

AndrewCodeDev · August 12, 2024, 3:03am

Minor point here about the performance issues related to dynamic allocations.

How often do you expect to be hitting errors? If it’s a “once in a while” problem, then I’m curious about whether or not you can actually afford to dynamically allocate. I also have a reflexive allergic reaction to overusing dynamic memory… but context matters.

Can you provide some context about how often you expect to hit these errors and how much information needs to be gathered/transported when you have an error?

For example - maybe the most useful thing is to log information when you hit an error instead of handing it back to the program to deal with. If it’s something like parsing an integer and someone includes the letter q in their string, that may not require as much information as a malformed header in an HTTP request or a failure to authorize a request.

cancername · August 12, 2024, 7:56am

Yes, exactly. Error sets aren’t stable across translation units: even when two units declare the same names, the order in which the errors are numbered isn’t defined.

Sze · August 12, 2024, 8:03am

You write that your Enum variant isn’t extensible. In what way would that be different from defining an explicit, hard-coded (closed to extension) error set and then using that everywhere?

Something like:

const PluginError = error{
    InitFailure,
    LoadFailure,
    ReadFailed,
    RecompilePlugin,
    CustomError, // writes error to some predetermined buffer/memory or logfile
    OutOfMemory,
};

const Plugin = struct {
    pub fn init() PluginError!Plugin {
        return error.InitFailure;
    }
};

Specific error values could signal to the error handling end that extra diagnostics got written to somewhere else.

cancername · August 12, 2024, 8:06am

I appreciate the suggestion, but unfortunately, this wouldn’t work in this case, since one goal of mine is extensibility, meaning the library user should be able to load a shared library and use its functionality with the library seamlessly.

cancername · August 12, 2024, 8:07am

If I’m not mistaken, that wouldn’t work, since the values in the error set aren’t guaranteed to have specific values, right? Casting to anyerror doesn’t do some complicated name-based lookup, instead, the errors are just assigned integers from 1 “sequentially” (which depends on the other stuff in the TU).
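
One way around this, sketched under assumptions: keep error unions inside each module and translate them by name into fixed integer codes at the boundary. The Code enum, the toCode helper, and the plugin_parse entry point are hypothetical names for illustration:

```zig
const std = @import("std");

// Hypothetical stable codes that every module would agree on.
const Code = enum(u16) { ok = 0, invalid_input = 1, out_of_memory = 2, unknown = 0xffff };

const ParseError = error{ InvalidInput, OutOfMemory };

// Ordinary Zig code inside the module keeps using error unions.
fn parse(data: []const u8) ParseError!void {
    if (data.len == 0) return error.InvalidInput;
}

// Errors are matched by name here, never through @intFromError,
// so the numbering chosen by each compilation does not matter.
fn toCode(err: anyerror) Code {
    return switch (err) {
        error.InvalidInput => .invalid_input,
        error.OutOfMemory => .out_of_memory,
        else => .unknown,
    };
}

// Exported entry point that only ever returns the fixed codes.
export fn plugin_parse(ptr: [*]const u8, len: usize) u16 {
    parse(ptr[0..len]) catch |err| return @intFromEnum(toCode(err));
    return @intFromEnum(Code.ok);
}
```

This still leaves the extensibility question open, since new modules can only report codes the shared enum already knows about.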

cancername · August 12, 2024, 8:14am

I agree, that’s important to consider. In my case, the library would be parsing user-provided data, so errors would happen somewhat frequently, and handling them gracefully matters.

Generally, I’m fine with logging more granular information about the error to some kind of diagnostics struct. The error value itself would just indicate the kind of failure, for example, which parsing step failed.
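
A minimal sketch of that split, assuming a small fixed code for the failing step plus an optional, caller-provided diagnostics struct. The Step enum, the Diagnostics fields, and the 4-byte header rule are invented for illustration:

```zig
const std = @import("std");

// Hypothetical failure kinds: one per parsing step.
const Step = enum(u16) { none = 0, header = 1, body = 2 };

// Optional detail, filled in only when the caller passes a pointer.
// `extern struct` gives it a C-compatible layout for crossing module boundaries.
const Diagnostics = extern struct {
    offset: usize = 0,
    message_len: usize = 0,
    message: [64]u8 = [_]u8{0} ** 64,
};

fn parse(data: []const u8, diag: ?*Diagnostics) Step {
    if (data.len < 4) {
        if (diag) |d| {
            d.offset = data.len;
            // Format into the fixed buffer; no dynamic allocation involved.
            if (std.fmt.bufPrint(&d.message, "header needs 4 bytes, got {d}", .{data.len})) |written| {
                d.message_len = written.len;
            } else |_| {
                d.message_len = 0;
            }
        }
        return .header;
    }
    return .none;
}

pub fn main() void {
    var diag = Diagnostics{};
    const step = parse("ab", &diag);
    if (step != .none) {
        std.debug.print("{s} step failed: {s}\n", .{ @tagName(step), diag.message[0..diag.message_len] });
    }
}
```

Callers that only care about control flow can pass null and switch on the returned step.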

Sze · August 12, 2024, 8:15am

I was thinking that there was some way to define a specific error set across multiple translation units, but now I can’t find anything definite for that.

Almost seems like there should be some way to export and add error set definitions from and to compile steps.