Recommended ways of handling errors across TU boundaries

I am building an extensible Zig project that heavily relies on runtime interfaces which are intended to work across translation unit boundaries. My concern is error handling. I am in agreement with Zig’s approach to error handling: control flow first, optional diagnostics separately. Ordinarily, Zig’s error sets and unions would be great for this, but they are not reliable across TU boundaries. Here are some approaches I’ve thought of, with upsides and downsides:

Enum

The simple approach is to replicate Zig errors in userspace:

const Error = enum(u16) {
    foo,
    bar,
    baz,
};

Pros

  • very fast
  • simple
  • in line with Zig’s existing approach
  • no dynamic allocation

Cons

  • not extensible
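A minimal sketch of how the enum variant could cross a boundary, assuming the values are pinned explicitly and the raw u16 is what actually gets exported. The names Status, plugin_step and isKnown are hypothetical. Making the enum non-exhaustive is one way to soften the extensibility problem, since codes from a newer plugin still decode instead of being illegal values:

```zig
const std = @import("std");

/// Hypothetical status codes. The values are pinned so that every
/// translation unit agrees on them.
pub const Status = enum(u16) {
    ok = 0,
    foo = 1,
    bar = 2,
    baz = 3,
    _, // non-exhaustive: unknown values are still valid to hold
};

/// Hypothetical exported entry point. Only the raw integer crosses the boundary.
export fn plugin_step(fail: bool) u16 {
    const status: Status = if (fail) .bar else .ok;
    return @intFromEnum(status);
}

/// Returns true if the host knows the name of this code.
pub fn isKnown(status: Status) bool {
    return switch (status) {
        .ok, .foo, .bar, .baz => true,
        _ => false,
    };
}
```

The host would decode with @enumFromInt and treat unknown codes as a generic failure.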

String table

Use freeform strings as errors:

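const std = @import("std");

const Errors = struct {
    // One slot per possible u16 value; null marks a free slot.
    map: [1 << 16]?[]const u8 = [_]?[]const u8{null} ** (1 << 16),
    nb_errors: u16 = 0,

    pub fn createError(errors: *Errors, error_desc: []const u8) u16 {
        // Append while there is room; otherwise reuse a slot freed below nb_errors.
        const index: usize = if (errors.nb_errors != std.math.maxInt(u16))
            errors.nb_errors
        else
            findFirstNull(errors.map[0..errors.nb_errors]) orelse @panic("skill issue");

        errors.map[index] = error_desc;
        errors.nb_errors +|= 1;
        return @intCast(index);
    }

    // Fetching a description consumes the error and frees its slot.
    pub fn getErrorDesc(errors: *Errors, value: u16) []const u8 {
        if (value >= errors.nb_errors) @panic("skill issue");
        defer errors.cleanup();
        defer errors.map[value] = null;
        return errors.map[value] orelse @panic("skill issue");
    }

    // Shrink nb_errors past any trailing free slots.
    fn cleanup(errors: *Errors) void {
        var count = errors.nb_errors;
        while (count != 0 and errors.map[count - 1] == null) : (count -= 1) {}
        errors.nb_errors = count;
    }

    fn findFirstNull(map: []const ?[]const u8) ?usize {
        for (map, 0..) |slot, i| {
            if (slot == null) return i;
        }
        return null;
    }
};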

Pros

  • extensible
  • reasonably fast

Cons

  • either wastes a bunch of memory or requires dynamic allocation
  • somewhat complex
  • (semi-global) state

Error IDs

Using small arrays as errors:

const Error = [4]u8;

Pros

  • no dynamic allocation
  • fast
  • pretty simple

Cons

  • not very readable
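A rough sketch of how the small-array variant might be used, assuming the four bytes are chosen as ASCII tags. The specific IDs and the helper names sameError and describe are made up for illustration:

```zig
const std = @import("std");

pub const Error = [4]u8;

// Hypothetical IDs. Each plugin can define its own without a central registry,
// as long as the tags do not collide.
pub const err_header: Error = .{ 'H', 'D', 'R', '!' };
pub const err_body: Error = .{ 'B', 'O', 'D', 'Y' };

/// Arrays cannot be compared with ==, so compare the bytes.
pub fn sameError(a: Error, b: Error) bool {
    return std.mem.eql(u8, &a, &b);
}

/// Maps the IDs this side knows about to a description.
pub fn describe(id: Error) []const u8 {
    if (sameError(id, err_header)) return "malformed header";
    if (sameError(id, err_body)) return "malformed body";
    return "unknown error";
}
```

If the bytes are kept printable, an ID can also be logged directly with the {s} format specifier, which helps a little with readability.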

I’m not quite satisfied with any of these approaches. Any suggestions?

What does “not extensible” mean?

My first thought would be to just go with what’s often done in C: an enum + a function that takes an enum value and returns a string literal (example from a C library I wrote: the enum and the function).
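In Zig, that C pattern could look roughly like the sketch below. ErrorCode and errorString are hypothetical names, not taken from the C library mentioned above:

```zig
const std = @import("std");

/// Hypothetical error codes with pinned values, as a C header would define them.
pub const ErrorCode = enum(u16) {
    ok = 0,
    invalid_input = 1,
    out_of_memory = 2,
    _, // codes this build does not know about
};

/// strerror-style lookup: every branch returns a string literal,
/// so nothing is allocated.
pub fn errorString(code: ErrorCode) []const u8 {
    return switch (code) {
        .ok => "no error",
        .invalid_input => "invalid input",
        .out_of_memory => "out of memory",
        _ => "unknown error",
    };
}
```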

What do you mean by this? Are you referring to error name clashes?

Minor point here about the performance issues related to dynamic allocations.

How often do you expect to be hitting errors? If it’s a “once in a while” problem, then I’m curious about whether or not you can actually afford to dynamically allocate. I also have a reflexive allergic reaction to overusing dynamic memory… but context matters.

Can you provide some context about how often you expect to hit these errors and how much information needs to be gathered/transported when you have an error?

For example - maybe the most useful thing is to log information when you hit an error instead of handing it back to the program to deal with. If it’s something like parsing an integer and someone includes the letter q in their string, that may not require as much information as a malformed header in an HTTP request or a failure to authorize a request.

Yes, exactly. Error sets aren’t stable across different translation units: even if two sets contain the same names, the order of their values isn’t defined.


You write that your Enum variant isn’t extensible. In what way would that be different from defining an explicit, hard-coded (closed to extension) error set and then using that everywhere?

Something like:

const PluginError = error{
    InitFailure,
    LoadFailure,
    ReadFailed,
    RecompilePlugin,
    CustomError, // writes error to some predetermined buffer/memory or logfile
    OutOfMemory,
};

const Plugin = struct {
    pub fn init() PluginError!Plugin {
        return error.InitFailure;
    }
};

Specific error values could signal to the error handling end that extra diagnostics got written to somewhere else.

I appreciate the suggestion, but unfortunately, this wouldn’t work in this case, since one goal of mine is extensibility, meaning the library user should be able to load a shared library and use its functionality with the library seamlessly.

If I’m not mistaken, that wouldn’t work, since the errors in an error set aren’t guaranteed to have specific values, right? Casting to anyerror doesn’t do some complicated name-based lookup. Instead, the errors are just assigned integers from 1 “sequentially”, and the order depends on the other stuff in the TU.
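One way to sidestep the unstable integers, sketched here under the assumption that each side keeps using ordinary Zig errors internally and only translates at the boundary. ParseError, Code, toCode and plugin_parse are hypothetical names:

```zig
const std = @import("std");

// Internal error set. Its integer values are private to this translation unit.
const ParseError = error{ BadHeader, BadBody };

// Stable codes with pinned values. This is the only thing both sides agree on.
const Code = enum(u16) { ok = 0, bad_header = 1, bad_body = 2 };

// The switch is exhaustive, so adding an error to ParseError without
// mapping it is a compile error.
fn toCode(err: ParseError) Code {
    return switch (err) {
        error.BadHeader => .bad_header,
        error.BadBody => .bad_body,
    };
}

// Toy parser standing in for the real work.
fn parse(input: []const u8) ParseError!void {
    if (input.len == 0) return error.BadHeader;
    if (input[0] != '{') return error.BadBody;
}

// Exported wrapper: Zig errors never cross the boundary, only the code does.
export fn plugin_parse(ptr: [*]const u8, len: usize) u16 {
    parse(ptr[0..len]) catch |err| return @intFromEnum(toCode(err));
    return @intFromEnum(Code.ok);
}
```

The caller can map the code back into its own error set with a matching switch, so neither side depends on the other’s error numbering.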

I agree, that’s important to consider. In my case, the library would be parsing user-provided data, so errors would happen somewhat frequently, and handling them gracefully matters.

Generally, I’m fine with logging more granular information about the error to some kind of diagnostics struct. The error value itself would just indicate the kind of failure, for example which parsing step failed.
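Roughly what I have in mind, as a sketch only. Diagnostics, Step and plugin_parse_diag are placeholder names, and the “parser” just rejects the letter q to keep the example short:

```zig
const std = @import("std");

// Which parsing step failed. Values are pinned so both sides agree.
pub const Step = enum(u16) { none = 0, header = 1, body = 2 };

// Optional extra detail, filled in only when the caller passes a pointer.
pub const Diagnostics = extern struct {
    step: u16 = 0,
    offset: usize = 0,
};

// Returns 0 on success and 1 on failure. The return value drives control
// flow, the diagnostics struct carries the details.
export fn plugin_parse_diag(ptr: [*]const u8, len: usize, diag: ?*Diagnostics) u16 {
    const input = ptr[0..len];
    for (input, 0..) |byte, i| {
        if (byte == 'q') {
            if (diag) |d| d.* = .{ .step = @intFromEnum(Step.body), .offset = i };
            return 1;
        }
    }
    return 0;
}
```

Callers that only care about control flow pass null and pay nothing for the diagnostics.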

I was thinking that there was some way to define a specific error set across multiple translation units, but now I can’t find anything definitive on that.

It almost seems like there should be some way to export error set definitions from compile steps and add them to others.
