A derived global "Diagnostics" pattern

npc1054657282 · July 24, 2025, 3:46am

Recently, I’m learning the “Diagnostics” pattern through std.tar. I want to use Diagnostics as my structured error handling paradigm.
To be honest, I am not fully satisfied with this implementation. I think it has the following problems:
1.Enabling or not enabling Diagnostics will cause changes in control flow other than error handling. When Diagnostics is enabled, some control flows that would have returned immediately due to an error will continue to execute.
2.After enabling Diagnostics, errors that would have been thrown, such as TarComponentsOutsideStrippedPrefix, will no longer be thrown, but will be implicitly hidden in Diagnostics, and the caller cannot determine whether an error has occurred based on the error union. Now only “double error” (errors that occur during diagnostics) will be thrown.
3. Error codes are for control flow.I agree with this statement, and further, I also think that “error” should always be the center of error control flow, rather than other things, such as an enumTag that has nothing to do with error codes. The current Diagnostics pattern’s control flow center has changed from the error type to a tagged union tag. This is what I am particularly dissatisfied with.
These problems lead to the “Diagnostics” pattern being used for small-scale local error diagnosis, and not a substitute for global, holistic, structural errors. But in practice, the need for errors to carry context may be far greater than you think. Therefore, I tried to make a derived global “Diagnostics” pattern.
In this pattern, the last_diagnostic pointer and the thrown error code together form the latest error context. If you want to override the underlying error with an upper-layer error that carries upper-layer information, you only need to push the captured underlying error together with its last_diagnostic, then rewrite last_diagnostic and throw a new error.
Tagged union is no longer used, because “error” is the only core of error control flow, so error code + ordinary union is used, supplemented by reflection between union field name and error name. Unfortunately, I have not found a way to convert errors through strings at compile time, nor have I found a solution to traverse errors in the error set at compile time (in this way, I can create a “diagnosable” error set to implement compile-time reflection), so currently, this reflection still relies on runtime string comparison, which deviates from the “error code” being the only core of error control flow. Now “error name” participates in controlling the control flow and has some impact on runtime performance.
Note: Use std.debug.print only to make the test pass, and actually use std.log.err

const std = @import("std");

pub const Diagnostics = struct {
    arena: std.heap.ArenaAllocator,
    error_stack: std.ArrayListUnmanaged(Error) = .empty,
    last_diagnostic: Diagnostic = undefined,
    double_error: ?anyerror = null,
    pub const Error = struct {
        code: anyerror,
        diagnostic: Diagnostic,
    };
    pub fn clear(self: *Diagnostics) void {
        _ = self.arena.reset(.free_all);
        self.error_stack = .empty;
        self.last_diagnostic = undefined;
        self.double_error = null;
    }
    pub fn log_all(self: *Diagnostics, last_error: ?anyerror) void {
        if (last_error) |err| {
            if (self.double_error) |double_error| {
                std.debug.print("double error!{s}", .{@errorName(double_error)});
            }
            self.last_diagnostic.log(err);
            var it = std.mem.reverseIterator(self.error_stack.items);
            while (it.nextPtr()) |item| {
                item.diagnostic.log(item.code);
            }
            if (@errorReturnTrace()) |trace| {
                std.debug.dumpStackTrace(trace.*);
            }
        } else return;
    }
};

pub const Diagnostic = union {
    TarUnableToCreateSymLink: struct {
        file_name: []const u8,
        link_name: []const u8,
        fn log(self: @This()) void {
            std.debug.print("file_name: {s} link_name: {s}\n", .{ self.file_name, self.link_name });
        }
    },
    TarComponentsOutsideStrippedPrefix: struct {
        file_name: []const u8,
        fn log(self: @This()) void {
            std.debug.print("file_name: {s}\n", .{self.file_name});
        }
    },
    pub fn enterStack(last_diagnostic: *@This(), last_error: anyerror) !void {
        var diagnostics: *Diagnostics = @fieldParentPtr("last_diagnostic", last_diagnostic);
        if (diagnostics.double_error) |_| {
            return last_error;
        }
        diagnostics.error_stack.append(diagnostics.arena.allocator(), .{ .code = last_error, .diagnostic = last_diagnostic.* }) catch |double_error| {
            diagnostics.double_error = double_error;
            return last_error;
        };
        last_diagnostic.* = undefined;
    }
    pub fn getAllocator(last_diagnostic: *@This()) std.mem.Allocator {
        const diagnostics: *Diagnostics = @fieldParentPtr("last_diagnostic", last_diagnostic);
        return diagnostics.arena.allocator();
    }
    pub fn unableToConstructDiagnostic(last_diagnostic: *@This(), err: anyerror) !void {
        const diagnostics: *Diagnostics = @fieldParentPtr("last_diagnostic", last_diagnostic);
        diagnostics.double_error = err;
        return error.UnableToConstructDiagnostic;
    }

    pub fn log(self: *Diagnostic, err: anyerror) void {
        inline for (@typeInfo(Diagnostic).@"union".fields) |field| {
            if (std.mem.eql(u8, @errorName(err), field.name)) {
                if (@hasDecl(@FieldType(Diagnostic, field.name), "log")) {
                    @field(self, field.name).log();
                } else {
                    std.debug.print("{s}:{}\n", .{ field.name, @field(self, field.name) });
                }
                return;
            }
        }
        std.debug.print("{s}\n", .{@errorName(err)});
    }
};

// Creates a symbolic link at path `file_name` which points to `link_name`.
fn createDirAndSymlink(dir: std.fs.Dir, link_name: []const u8, file_name: []const u8) !void {
    dir.symLink(link_name, file_name, .{}) catch |err| {
        if (err == error.FileNotFound) {
            if (std.fs.path.dirname(file_name)) |dir_name| {
                try dir.makePath(dir_name);
                return try dir.symLink(link_name, file_name, .{});
            }
        }
        return err;
    };
}

fn stripComponents(path: []const u8, count: u32) []const u8 {
    var i: usize = 0;
    var c = count;
    while (c > 0) : (c -= 1) {
        if (std.mem.indexOfScalarPos(u8, path, i, '/')) |pos| {
            i = pos + 1;
        } else {
            i = path.len;
            break;
        }
    }
    return path[i..];
}

fn foo(last_diagnostic: *Diagnostic) !void {
    const path_names = [_][]const u8{ "hello", "world" };
    for (path_names) |path_name| {
        const file_name = stripComponents(path_name, 1);
        if (file_name.len == 0) {
            last_diagnostic.* = .{ .TarComponentsOutsideStrippedPrefix = .{
                .file_name = last_diagnostic.getAllocator().dupe(u8, file_name) catch |err| {
                    return last_diagnostic.unableToConstructDiagnostic(err);
                },
            } };
            return error.TarComponentsOutsideStrippedPrefix;
        }
        var root = std.testing.tmpDir(.{});
        defer root.cleanup();
        const link_name = "link";
        createDirAndSymlink(root.dir, link_name, file_name) catch |err| {
            try last_diagnostic.enterStack(err);
            last_diagnostic.* = .{ .TarUnableToCreateSymLink = .{
                .file_name = last_diagnostic.getAllocator().dupe(u8, file_name) catch |e| {
                    return last_diagnostic.unableToConstructDiagnostic(e);
                },
                .link_name = last_diagnostic.getAllocator().dupe(u8, link_name) catch |e| {
                    return last_diagnostic.unableToConstructDiagnostic(e);
                },
            } };
            return error.TarUnableToCreateSymLink;
        };
    }
}

fn bar(last_diagnostic: *Diagnostic) !void {
    const path_names = [_][]const u8{ "hello/world", "world/hello" };
    for (path_names) |path_name| {
        const file_name = stripComponents(path_name, 1);
        if (file_name.len == 0) {
            last_diagnostic.* = .{ .TarComponentsOutsideStrippedPrefix = .{
                .file_name = last_diagnostic.getAllocator().dupe(u8, file_name) catch |err| {
                    return last_diagnostic.unableToConstructDiagnostic(err);
                },
            } };
            return error.TarComponentsOutsideStrippedPrefix;
        }
        var root = std.testing.tmpDir(.{});
        defer root.cleanup();
        const link_name = "link";
        createDirAndSymlink(root.dir, link_name, file_name) catch |err| {
            try last_diagnostic.enterStack(err);
            last_diagnostic.* = .{ .TarUnableToCreateSymLink = .{
                .file_name = last_diagnostic.getAllocator().dupe(u8, file_name) catch |e| {
                    return last_diagnostic.unableToConstructDiagnostic(e);
                },
                .link_name = last_diagnostic.getAllocator().dupe(u8, link_name) catch |e| {
                    return last_diagnostic.unableToConstructDiagnostic(e);
                },
            } };
            return error.TarUnableToCreateSymLink;
        };
    }
}

fn parse_csv(last_diagnostic: *Diagnostic) !void {
    // now in the middle of this mess we have a int parsing error.
    try last_diagnostic.enterStack(error.InvalidCharacter);
    return error.ParseIntError;
}

fn baz(last_diagnostic: *Diagnostic) !void {
    parse_csv(last_diagnostic) catch |err| {
        try last_diagnostic.enterStack(err);
        return error.ParseCsvFailed;
    };
}

test "new diagnostics" {
    const root_allocator = std.testing.allocator;
    var arena = std.heap.ArenaAllocator.init(root_allocator);
    defer arena.deinit();
    var diagnostics: Diagnostics = .{ .arena = arena };
    foo(&diagnostics.last_diagnostic) catch |err| {
        diagnostics.log_all(err);
        diagnostics.clear();
    };
    bar(&diagnostics.last_diagnostic) catch |err| {
        diagnostics.log_all(err);
        diagnostics.clear();
    };
    baz(&diagnostics.last_diagnostic) catch |err| {
        diagnostics.log_all(err);
        diagnostics.clear();
    };
}

andrewrk · July 24, 2025, 6:48am

Can you share your particular use case? Unfortunately, while it generally gets the idea, I’m not sure I’d recommend std.tar as the perfect example of the Diagnostics pattern.

I’d start by setting some flags in your context struct. If that’s not enough, then appending some enum tags to a list. If that’s not enough, then appending tagged unions to a list that provide context to the enum tags. If that’s not enough, then full blown formatted strings could be acceptable.

Generally, we see people (myself included, in the Zig compiler for example), skipping these steps because ultimately we want to communicate via human languages to the user. But a disciplined approach will expose the semantic diagnostic information with an appropriately minimal state representation to the API user, which will then have the opportunity to convert semantic meaning into human language (possibly while taking into account locale and translation features).

vulpesx · July 24, 2025, 6:56am

when using zigs method call syntax sugar, the instance will be implicitly referenced if the function takes a pointer to it.
use a defined error set if possible instead of anyerror,
you are storing redundant information with last_diagnostic and double_error
why are you storing an allocator? Is it not the same as what the caller would have passed? If not, wouldn’t it be pointing to stack memory? If not, how would you free the heap allocated implementation?
what’s the point of the empty variant since the lack of an error could indicate that.
why have print functions? You don’t know where the api is going to be used, if someone wants anything other than blocking unbuffered stderr printing they have to reimplement it anyway.

That being your concerns with the existing implementation seem valid.

I think std.json provides a much simpler and better diagnostic implementation, though it aborts immediately on an error that ofc simplifies the implementation.

npc1054657282 · July 24, 2025, 7:26am

Consider a complex scenario that may occur in practice. The main process is a ruthless task processor that processes tasks from the task queue in a loop. If an error occurs, it packages its diagnostic information into a log and then continues to process other tasks without stopping. The top-level error is “FetchResourceFailed”, “AssetCookingFailed”, e.g., which marks which task the error comes from, and the diagnostics information at this level contains metadata about the task, such as who published it and to whom the log should be given. Among them, for “AssetCookingFailed”, its next-level error may be “TextureCompressionFailed”, “ShaderCompilationError”, e.g. For “TextureCompressionFailed”, it carries information about which texture resource needs to be compressed and what the compression mode is. Its next-level error may be “TextureTooLarge”, “TextureInvalidFormat”, “TextureDecodingFailed”. For “TextureDecodingFailed”, it carries information about where the decoding error occurred in the texture resource, and its underlying error may be OutOfMemory.

It is certainly possible to use overlapping unions to record each layer of information in this process, but I think it is easier to express it using the idea of a stack.

npc1054657282 · July 24, 2025, 7:49am

last_diagnostic and double_error may exist at the same time. Consider such a scenario: when pushing last_diagnostic and the latest thrown error into the stack, due to OOM, ArrayListUnmanaged fails to expand, so the stacking process is aborted. Therefore, it is necessary to always keep its original information in last_diagnostic, and the currently thrown error always corresponds to last_diagnostic. At this time, double_error is responsible for saving the error that caused the stacking behavior to be aborted.
Because last_diagnostic must be pushed into the stack before saving a new diagnostic, and the existence of double_error is always checked when pushing the stack, and if it exists, it is thrown directly immediately, so when double_error occurs, the current diagnosis will abandon the upper layer error and throw it all the way to the top layer. When performing diagnostic processing at the top layer, first check double_error to make sure it has not occurred, and then perform regular diagnosis.

The purpose of passing in the allocator was to mimic the behavior of std.tar, which saves an independent allocator for the memory allocation and release of its internal ArrayListUnmanaged. Later, I noticed that some lower-level work may indeed use a temporary local allocator (such as the arena allocator), and when it constructs diagnostic information, if it uses the task’s own allocator to construct it, since these diagnostic information will eventually be handed over to the top layer, and at this time the underlying memory allocator may have disappeared, it is necessary for the diagnostic to hold a dedicated allocator.

Initially, I considered such a scenario: the bottom-level error of our function is an independent error code without diagnostic information. Or in the middle of the function, it is indeed possible to overload the bottom-level error with diagnostic information to an error without diagnostic information. At this time, I feel that setting last_diagnostic to empty is the most natural. But now I reflect on your opinion, it does make sense, undefined is enough, specifying it as empty does not have any substantial effect except readability.

You are right, print here is just a somewhat lame example to express “general processing”. I just want to use the print example to express how a general processing can be done under this pattern, but the print example is indeed not very good.

andrewrk · July 24, 2025, 7:56am

In this case I think the “Diagnostics” pattern does not apply. Instead you are batch processing and should return batch processing results.

vulpesx · July 24, 2025, 7:59am

Sorry, I wasn’t clear, I meant both of them can be found by looking at error_stack. last_diagnostic is completely redundant, and the cost of checking for a double_error every time you add a diagnostic is more expensive than the caller checking once if they need it.

npc1054657282 · July 24, 2025, 8:06am

In general, the goal of “Diagnostics” here is more to save all error contexts of the entire control flow, rather than to handle them (of course, some relatively simple default handling solutions can be implemented under this pattern), so whether it is batch processing or not is relatively irrelevant. Whether it is batch processing or not, there is a need to save the control flow error context, and whether it is batch processing or not determines how they handle these contexts at a certain key node.

npc1054657282 · July 24, 2025, 8:17am

When the diagnostic is saved in last_diagnostic, it has not been pushed on the stack. It is pushed on the stack when a new last_diagnostic overwrites the original last_diagnostic. There is an overhead of ArrayListUnmanaged allocating heap memory for stacking, so this can be regarded as a lazy mechanism. Logically, last_diagnostic is always considered as a whole with the currently thrown error, and when it enters the stack, it is pushed on the stack together with the captured error.

I admit that when it is not a “happy path”, it takes some extra overhead to push the error on the stack each time, but this is after all based on the premise that an error has occurred. In contrast to the Diagnostics implemented in std.tar, even in the happy path, when no error is thrown, the upper layer needs to check whether there is an error in Diagnostics, because if no double error occurs, it will not throw an error regardless of whether there is an error or not, which results in the need to check whether Diagnostics is wrong even if there is no error. I think the happy path overhead is worse than the error path overhead.

vulpesx · July 24, 2025, 8:38am

that makes sense, I was confused why you were storing such information.
With that, last_diagnostic also makes sense.

I am not familiar enough with tar to question if that behaviour should be changed or not.