Using ComptimeStringMap for Errors with Payloads

AndrewCodeDev · March 19, 2024, 10:08am

I was genuinely surprised to see that this compiled, but it does… you can use error names as handles to look up values in comptime string maps. That basically implies that you can look them up anywhere you want, at any point, and provide as much additional information as you’d like (including function pointers), so I’m curious how far we can take this.

Here’s the setup:

```zig
const std = @import("std");

const ErrorEntry = struct {
    message: []const u8,
    // other stuff...
};

const Errors = error{
    A, // has additional info
    B, // no additional info
};

// Keys are error names (strings); values carry the extra information.
const error_map = std.ComptimeStringMap(ErrorEntry, .{
    .{ @errorName(Errors.A), ErrorEntry{ .message = "Error Message A" } },
});
```

And you can retrieve your error like this:

```zig
foo() catch |e| {
    if (error_map.get(@errorName(e))) |err| {
        std.log.err("{s}", .{err.message});
        // maybe handle error further...
    } else {
        return e; // no entry: forward the error unchanged
    }
};
```

This is cross-cutting because different error sets don’t really matter here - it’s the names that get looked up. That carries a risk of name collisions, but it lets errors act as handles while still working with the try/catch syntax, which union types don’t.

Lookup can fail harmlessly, so not every error needs to be accounted for and we don’t lose functionality.
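Putting the setup and the catch together, a minimal end-to-end sketch might look like the following. It assumes Zig 0.11/0.12, where std.ComptimeStringMap exists (later releases replace it with std.StaticStringMap). The fail and handle functions are hypothetical, and the message is returned rather than logged so the behaviour is easy to check, since std.log.err can cause a zig test run to fail.

```zig
const std = @import("std");

const ErrorEntry = struct {
    message: []const u8,
};

const Errors = error{
    A, // has additional info
    B, // no additional info
};

// Keyed by error *name*, so any error set containing `A` hits this entry.
const error_map = std.ComptimeStringMap(ErrorEntry, .{
    .{ @errorName(Errors.A), ErrorEntry{ .message = "Error Message A" } },
});

// Hypothetical operation that always fails with one of the two errors.
fn fail(which: bool) Errors!void {
    return if (which) Errors.A else Errors.B;
}

// Returns the attached message when an entry exists; otherwise the
// lookup "fails harmlessly" and the original error is forwarded.
fn handle(which: bool) Errors!?[]const u8 {
    fail(which) catch |e| {
        if (error_map.get(@errorName(e))) |entry| {
            return entry.message;
        }
        return e;
    };
    return null;
}
```

Errors without an entry (B here) keep flowing through the normal try/catch path unchanged.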

AndrewCodeDev · March 19, 2024, 10:29am

This technique works with runtime maps, too - there, you could actually set values which could be kinda handy.
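A sketch of the runtime variant, assuming std.StringHashMap keyed by @errorName; setMessage and lookup are hypothetical helper names. Because @errorName returns a string with static lifetime, storing it as a key without copying should be safe, and entries can be inserted or overwritten while the program runs.

```zig
const std = @import("std");

const ErrorEntry = struct {
    message: []const u8,
};

// Runtime counterpart of the comptime map: still keyed by error name.
const RuntimeErrorMap = std.StringHashMap(ErrorEntry);

// Inserts or replaces the entry for `err`.
fn setMessage(map: *RuntimeErrorMap, err: anyerror, message: []const u8) !void {
    // @errorName points into static memory, so the key outlives the map.
    try map.put(@errorName(err), .{ .message = message });
}

// Returns null when no entry was registered for `err`.
fn lookup(map: *const RuntimeErrorMap, err: anyerror) ?ErrorEntry {
    return map.get(@errorName(err));
}
```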

tensorush · March 19, 2024, 11:15am

Nice, although for runtime maps it’d probably be nicer to key the payloads by the error value directly.

```zig
const std = @import("std");

const Error = error{
    Value1,
    Value2,
};

const ErrorPayload = struct {
    message: []const u8,
    code: u8 = 1,
};

test ErrorPayload {
    var err_map = std.AutoHashMap(Error, ErrorPayload).init(std.testing.allocator);
    defer err_map.deinit();

    try err_map.put(error.Value1, .{ .message = "Error!" });
    try err_map.put(error.Value2, .{ .message = "Another error!" });

    std.debug.print("{s}\n", .{err_map.get(error.Value1).?.message});
    std.debug.print("{s}\n", .{err_map.get(error.Value2).?.message});
}
```

Sze · March 19, 2024, 2:32pm

I guess to disable payloads you could have a build option that toggles between a real implementation and a dummy one where put does nothing and get always returns null.
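A minimal sketch of that toggle, assuming a plain comptime bool in place of an actual build option (a build option could feed the same parameter). PayloadMap is a hypothetical name; both variants expose the same init/deinit/put/get surface, so call sites don't change.

```zig
const std = @import("std");

const Error = error{ Value1, Value2 };

const ErrorPayload = struct {
    message: []const u8,
};

// Hypothetical: real map when `enabled`, a do-nothing stand-in otherwise.
fn PayloadMap(comptime enabled: bool) type {
    if (enabled) return struct {
        inner: std.AutoHashMap(Error, ErrorPayload),

        pub fn init(allocator: std.mem.Allocator) @This() {
            return .{ .inner = std.AutoHashMap(Error, ErrorPayload).init(allocator) };
        }
        pub fn deinit(self: *@This()) void {
            self.inner.deinit();
        }
        pub fn put(self: *@This(), err: Error, payload: ErrorPayload) !void {
            try self.inner.put(err, payload);
        }
        pub fn get(self: *const @This(), err: Error) ?ErrorPayload {
            return self.inner.get(err);
        }
    };

    // Dummy: `put` does nothing and `get` always returns null.
    return struct {
        pub fn init(allocator: std.mem.Allocator) @This() {
            _ = allocator;
            return .{};
        }
        pub fn deinit(self: *@This()) void {
            _ = self;
        }
        pub fn put(self: *@This(), err: Error, payload: ErrorPayload) !void {
            _ = self;
            _ = err;
            _ = payload;
        }
        pub fn get(self: *const @This(), err: Error) ?ErrorPayload {
            _ = self;
            _ = err;
            return null;
        }
    };
}
```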

AndrewCodeDev · March 19, 2024, 3:28pm

Exactly - granted, you’d have to fill a runtime map a bit differently (you’d probably want a helper function, etc.). The one thing I had not considered here is this line:

```zig
try err_map.put(error.Value1, .{ .message = "Error!" });
```

I take it that you’re getting the values out of the global error set here? I haven’t tried this approach - I seem to recall that errors in the global set can have surprising integer values, so I’d like to see how that turns out.
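As far as the language goes, error.Value1 coerces to the key type Error, and every error name maps to a single value in the global error set, so the same name from two different sets compares equal once both are widened to anyerror. A small sketch (OtherError, asError and sameGlobalValue are hypothetical); the concrete integer behind each name is assigned by the compiler and is best not relied upon.

```zig
const std = @import("std");

const Error = error{ Value1, Value2 };
// A different set that happens to reuse the name `Value1`.
const OtherError = error{Value1};

// `error.Value1` coerces to `Error`, just like the key passed to `put`.
fn asError() Error {
    return error.Value1;
}

// Same name, different sets: one value in the global error set.
fn sameGlobalValue() bool {
    const x: anyerror = Error.Value1;
    const y: anyerror = OtherError.Value1;
    return x == y and @intFromError(x) == @intFromError(y);
}
```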

For a project, I’d pick either strings or the error value directly, so that all your error maps share the same lookup type (you could have a mixture of maps where certain errors are lethal at some points, and some maps could be static or dynamic like the one you’ve presented here). Small detail, but ya know.

tauoverpi · March 19, 2024, 5:13pm

This isn’t a good use of it though as you’re making a round-trip through []const u8 for what is effectively declaring a new error set and converting it to an enum. This is not a good pattern at all.

AndrewCodeDev · March 19, 2024, 5:49pm

Well

The example of []const u8 is really just a toy example. You could have multiple maps with anything, really. In other words, you could tie functions to errors in different contexts, register them in one place, and have that picked up in multiple locations without having to add new cases to switch statements.

Additionally, you could have multiple maps. It’s not just about []const u8, it’s about being able to extend what functionality an error can reach while still working with the native keywords. Essentially, whatever you can do with a hash table, you can do here. Also, enums don’t work with the current try syntax.

If we were just talking about getting a string message, I’d agree - it’s a lot of machinery just for that one string. That’s why I’ve put this under brainstorming. I’d like to see if there’s more we can do.

Here’s an example - say you have an OOM error. I dug up an old version of tcmalloc that GitHub was using in its early days. If an allocation failed, it would try to clear a global cache, try again, and only treat it as a hard error if the second attempt failed too.

You can get this same behavior here but under multiple contexts - you could attach an optional function pointer that’s able to clear a cache, for instance - then you’d get that everywhere you use this pattern (which is optional too, you could just forward the error using try) without having to add more “if this / switch that” logic.
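A rough sketch of that idea, assuming std.ComptimeStringMap (Zig 0.11/0.12) keyed by error name and holding an optional recovery hook. ErrorHandler, clearCache, allocate and allocateWithRecovery are hypothetical stand-ins for a real allocator and cache; the retry happens at most once, mirroring the tcmalloc behaviour described above.

```zig
const std = @import("std");

// Hypothetical entry: an optional recovery hook attached to an error.
const ErrorHandler = struct {
    recover: ?*const fn () void = null,
};

// Hypothetical cache state, only here to make the retry observable.
var cache_cleared: bool = false;

fn clearCache() void {
    cache_cleared = true;
}

const handlers = std.ComptimeStringMap(ErrorHandler, .{
    .{ @errorName(error.OutOfMemory), ErrorHandler{ .recover = &clearCache } },
});

// Stand-in allocation that fails until the cache has been cleared.
fn allocate() error{OutOfMemory}!u32 {
    if (!cache_cleared) return error.OutOfMemory;
    return 42;
}

// Try once; if the error has a hook, run it and try exactly once more.
// Errors without a hook, or a failed second attempt, are forwarded.
fn allocateWithRecovery() !u32 {
    return allocate() catch |e| {
        const handler = handlers.get(@errorName(e)) orelse return e;
        const recover = handler.recover orelse return e;
        recover();
        return try allocate();
    };
}
```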

Sze · March 19, 2024, 6:01pm

I think error.Value1 gets coerced to Error, the key type in std.AutoHashMap(Error, ErrorPayload) so I think it is the explicit error set named Error.

Do you mean something else?

AndrewCodeDev · March 19, 2024, 6:09pm

Ah, it’s just interesting to see it coming from the error instead of Error but hey, I’ll take it lol. That’s one of the reasons I like the tag-name idea because we can have a situation like the following:

```zig
const MyErrors = error{A};
const MyOtherErrors = error{B};

pub fn foo(something: bool) !void { // inferred error set: probably the most common way
    if (something) {
        return MyErrors.A;
    } else {
        return MyOtherErrors.B;
    }
}
```

So that one function can return both but they come from different error sets. The name is one way to get around this (since both names are one type, you can use a single map for both), but I’m curious if there’s a more generic way to retrieve these besides going to strings.

@Sze, I seem to recall there’s some kind of syntax for making a parameter an enum_literal… I forget where I saw that… I wonder if this could apply here?

Sze · March 19, 2024, 6:48pm

I am not sure I understand your intent.

```zig
const MyErrors = error{A};
const MyOtherErrors = error{B};
const BiggerErrorSet = MyErrors || MyOtherErrors;

pub fn foo(something: bool) BiggerErrorSet!void {
    if (something) {
        return MyErrors.A;
    } else {
        return MyOtherErrors.B;
    }
}
```

The way I saw it was that error.A is just a way to let the compiler infer the type.
In both cases the actual returned type is BiggerErrorSet just that you can let that be inferred as well if you don’t specify it. If you specify anyerror then it becomes the global error set.

Do you want a function to return two separate error sets, instead of it merging into one bigger set? I am a bit lost.

When you declare a comptime Arg: enum{A} parameter, you can call it with a .A enum literal, but you can also pass it a comptime var or a constant.
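For illustration, a sketch using a named enum (Choice, describe and preset are hypothetical) instead of the anonymous enum{A}; the same parameter accepts both a literal like .A and a comptime-known constant.

```zig
const std = @import("std");

const Choice = enum { A, B };

// Accepts `.A` / `.B` literals or any comptime-known `Choice` value.
fn describe(comptime arg: Choice) []const u8 {
    return switch (arg) {
        .A => "got A",
        .B => "got B",
    };
}

// A named constant works just as well as a literal.
const preset: Choice = .B;
```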

Well I think you can check it with @typeInfo/comptime whether it is a literal or an enum, so I guess you could assert that it is a literal using comptime, I am just not sure why it should specifically be a literal.

AndrewCodeDev · March 19, 2024, 6:54pm

True, you could compose up a larger error set - that’s one way to solve the issue. So long as the type in your map is expansive enough to check anything handed to it, then that works.

That’s basically what attracted me to comptime string maps - since we’re dealing with strings, you can kind of look past the types of the individual errors and just go by name. That keeps the map very generic, but I also like the strongly typed approach - plenty of good options here, it seems.

Sze · March 19, 2024, 7:16pm

Yes, it seems like these are valid possibilities for different usage scenarios. I think using anyerror would be the most generic runtime equivalent of the comptime string map.
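A sketch of an anyerror-keyed runtime map, assuming std.AutoHashMap accepts error-set keys (as in the AutoHashMap example earlier in the thread). AnyErrorMap, messageFor and failsWith are hypothetical names. Errors from unrelated sets land in one map without going through strings.

```zig
const std = @import("std");

const MyErrors = error{A};
const MyOtherErrors = error{B};

const Payload = struct {
    message: []const u8,
};

// Keyed by the global error set, so any error value fits.
const AnyErrorMap = std.AutoHashMap(anyerror, Payload);

fn messageFor(map: *const AnyErrorMap, err: anyerror) ?[]const u8 {
    const payload = map.get(err) orelse return null;
    return payload.message;
}

// Returns an error from one of two unrelated sets.
fn failsWith(which: bool) (MyErrors || MyOtherErrors)!void {
    return if (which) MyErrors.A else MyOtherErrors.B;
}
```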

I think another thing that would need to be considered is what else besides ComptimeStringMap and AutoHashMap could be used, basically you just need some kind of mapping that allows you to implement a get operation and maybe a put if it’s used.

Basically this allows you to create arbitrary error response “handlers” by associating the error value with something else. Then you could switch between different maps to, for example, switch between logging to a file, panicking on error, or triggering @breakpoint in a debug session.
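Following the point that any mapping with a get operation will do, here is a sketch where the "map" is just a function pointer from an error to an optional handler. Policy, lenientGet, strictGet, record and crash are hypothetical; swapping lenient for strict changes the response everywhere report is called.

```zig
const std = @import("std");

const Handler = *const fn (anyerror) void;

// Lets the sketch observe which handler ran.
var recorded: ?anyerror = null;

fn record(err: anyerror) void {
    recorded = err;
}

fn crash(err: anyerror) void {
    std.debug.panic("fatal: {s}", .{@errorName(err)});
}

// Any "map" only needs a get-like operation: error -> optional handler.
const Policy = struct {
    get: *const fn (anyerror) ?Handler,
};

fn lenientGet(err: anyerror) ?Handler {
    return switch (err) {
        error.FileNotFound => &record,
        else => null,
    };
}

fn strictGet(err: anyerror) ?Handler {
    _ = err;
    return &crash;
}

const lenient = Policy{ .get = &lenientGet };
const strict = Policy{ .get = &strictGet };

// Runs whatever handler the active policy maps the error to, if any.
fn report(policy: Policy, err: anyerror) void {
    if (policy.get(err)) |handler| handler(err);
}
```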

It seems like this could be a nice way to provide users with ways in which they can customize what should happen.

I think comptime parameters or build options are alternative ways to allow similar customization, but they have different tradeoffs / use-cases.

tauoverpi · March 20, 2024, 7:24pm

This is effectively solved by Support error sets in switch cases · Issue #2473 · ziglang/zig · GitHub. Adding additional maps only adds more complexity to error handling, obscures code, and introduces more points of failure. Sometimes friction is an indication that the current path may not be the best one.

The ComptimeStringMap(T) wants to be ComptimeMap(anyerror, T) which is effectively just switch (any_err) { ... } of result type T but it’s done through a convoluted path. To see why, expand the comptime function then simplify until you cannot simplify any further. anyerror is the set of all errors thus @field(anyerror, @errorName(err)) == err for all values err making the string path redundant. But this is really just a nitpick.
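To make the reduction concrete, the single-entry map from the first post can be written as a switch on anyerror, with the missing-entry case becoming the else prong. This is a sketch, and lookup is a hypothetical name.

```zig
const std = @import("std");

const ErrorEntry = struct {
    message: []const u8,
};

// The string map from the first post, minus the strings.
fn lookup(err: anyerror) ?ErrorEntry {
    return switch (err) {
        error.A => ErrorEntry{ .message = "Error Message A" },
        else => null,
    };
}
```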

The switch logic forces you to make a choice at that particular point in the program, makes it clear what you are doing, and keeps any configurable behaviour statically visible. By using a map you’re moving all of this to runtime, thus increasing the number of possible configurations without making a visible change in the code dealing with it, which will be a source of bugs.

Note that error.SomethingWentWrong often doesn’t give the context required to make choices further removed from the call-site. The tcmalloc example illustrates this: m.clearCache() catch |err| switch (err) { error.ExampleCannotClearCache => try m.clearCache(), ... } is a significant reduction in complexity compared with the map, and it contains all the required information at the call-site. Configurable behaviour here only needs to concern extra logic for the failure switch prong, if so desired, and the set of possible strategies is most certainly small enough for this to be a viable solution (if not, there is a structural problem with the code or you’re trying to solve problems you don’t have).
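A sketch of the call-site version of the tcmalloc retry, with a hypothetical Pool type standing in for the allocator (its identifiers differ from the inline snippet, which is kept as written). On OutOfMemory the cache is cleared and the allocation retried once; a second failure propagates as a hard error.

```zig
const std = @import("std");

// Hypothetical allocator-like type: allocation fails until the cache is cleared.
const Pool = struct {
    cache_full: bool = true,

    fn alloc(self: *Pool) error{OutOfMemory}!u32 {
        if (self.cache_full) return error.OutOfMemory;
        return 7;
    }

    fn clearCache(self: *Pool) void {
        self.cache_full = false;
    }
};

// The whole recovery policy is visible right here at the call site.
fn allocWithRetry(pool: *Pool) error{OutOfMemory}!u32 {
    return pool.alloc() catch |err| switch (err) {
        error.OutOfMemory => {
            pool.clearCache();
            return try pool.alloc();
        },
    };
}
```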

Payloads are also better as explicit out parameters for functions that need to provide them or as part of the container. With a map it’s more likely that past payloads will be overwritten before they’re read or read within the wrong context causing difficult to track bugs within the application error path. The lifetime of an error payload should be short (if not committed to a log) instead of recreating a local variant of errno with all of the same issues.
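A sketch of the explicit out-parameter approach (often called the diagnostics pattern). parseDigits and Diagnostics are hypothetical. The payload lives in caller-owned storage, so an unrelated failure elsewhere can't overwrite it, and callers who don't need it pass null.

```zig
const std = @import("std");

// Hypothetical payload, owned by the caller for as long as it needs it.
const Diagnostics = struct {
    index: usize = 0,
    byte: u8 = 0,
};

// Parses an unsigned decimal number, filling `diag` on a bad character.
fn parseDigits(input: []const u8, diag: ?*Diagnostics) error{ InvalidCharacter, Overflow }!u32 {
    var value: u32 = 0;
    for (input, 0..) |c, i| {
        if (c < '0' or c > '9') {
            if (diag) |d| d.* = .{ .index = i, .byte = c };
            return error.InvalidCharacter;
        }
        // Checked arithmetic so long inputs report Overflow instead of trapping.
        value = try std.math.add(u32, try std.math.mul(u32, value, 10), c - '0');
    }
    return value;
}
```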

Composing a larger error set to handle the errors you expect is the correct way to go, which also means a switch is likely all you need.
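A sketch of the composed set with an exhaustive switch, reusing the MyErrors/MyOtherErrors names from earlier in the thread (describe is hypothetical). With no else prong, adding a new error to either set should produce a compile error here until the caller handles it.

```zig
const std = @import("std");

const MyErrors = error{A};
const MyOtherErrors = error{B};
const BiggerErrorSet = MyErrors || MyOtherErrors;

// Exhaustive: every member of the merged set must have a prong.
fn describe(err: BiggerErrorSet) []const u8 {
    return switch (err) {
        error.A => "recoverable",
        error.B => "fatal",
    };
}
```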

One should design for that which exists now (or being written now) and avoid adding dynamic structures in-case someone later adds more error values to their code. The correct way to handle such changes is to update the caller to account for the new error values (or even redesign if needed) and not work around the type system to completely erase errors.

I understand that this is the brainstorming category however pointing out issues with the approach is also part of exploration.

AndrewCodeDev · March 20, 2024, 10:31pm

To be clear, I’m not picking a side and saying “this is a great option, you should use this” - I’m interested in what ways can error handling be expanded and what the costs are (especially when data structures get involved). You are certainly right, there are costs and we should get into those.

Yes, this is correct. That’s a clear benefit of direct switch statements. On this point, total agreement.

Better is a relative term here - better than how Zig handles things now? Maybe, but I don’t consider this to be ideal as out-parameters have plenty of problems on their own. It would be very nice if we could work your idea here into the current syntax and be able to return a more robust error directly.

@tauoverpi I know you’ve looked into this quite a bit, do you have any insight into why errors with payloads are not being considered? Perhaps they are, I’m just not aware of the discussion around that.

So again, we’ll agree here on principle. That said, for any table that has statically known and fixed size data, this is always true. You could enumerate the keys, run your comparisons, and then return the one where the keys match. Switches don’t work over strings however, so there are a few non-trivial caveats here, but that doesn’t distract from the point you’re making.

I think at this point, I’m relatively convinced this isn’t a pattern I would use, but it has an interesting set of implications. I’m still out on what pattern I think actually makes the most sense and I’d have to do some serious thinking about it.

tauoverpi · March 21, 2024, 6:41am

Allow returning a value with an error · Issue #2647 · ziglang/zig · GitHub is the existing ticket for payloads.

Paying for collecting and copying error payloads for every call-site would be expensive and it encourages one particular way of handling it rather than taking a step back to add error diagnostics to the design. The amount of diagnostic information desired will only ever be known at the call-site.

Those, along with it being unclear how this is meant to work with the existing error set inference, are the main concerns from what I’ve gathered.

What I meant by the reduction to switch is that the string isn’t doing anything you can’t do with anyerror, other than passing error values to another Zig program (which is also something one should most certainly not do). You can place anyerror in a hashmap, cast it to an int, and have all the same issues, just without the trip through []const u8.

@errorName (and []const u8) only came up because the focus was on ComptimeStringMap, which is constrained to []const u8 (it doesn’t need to be; see Add comptime hashmap by Vexu · Pull Request #5359 · ziglang/zig · GitHub), rather than on the pattern itself, thus confusing a particular implementation with the whole picture. Strings not working with switches is an issue created by taking a detour through @errorName, and thus a limitation of a particular implementation rather than part of the actual pattern under discussion.

AndrewCodeDev · March 21, 2024, 7:14am

Comptime Hashmap would be a great addition. At this point, I’m pretty sure we’re on the same page - like I mentioned, the string comment doesn’t distract from the point you were making.

Most of the use cases I was able to come up with could probably be exchanged with function calls and those could be swapped out at compile time directly. I’m getting the impression that they’re trying to avoid what this brainstorming thread is looking into.

tauoverpi · March 21, 2024, 8:36am

ComptimeMap would most likely be the wrong thing for most situations given that you know more than it can infer about the type you provide. If you need a comptime map for anything other than strings you should design it around the data you have rather than rely on a generic ComptimeMap as this enables you to take advantage of the structure of the given type. This is exactly what happened in the PR and why there’s only ComptimeStringMap today as there was a much better implementation for this particular case.

This is why ComptimeMap would likely be a negative addition rather than a positive one for code quality.

AndrewCodeDev · March 21, 2024, 9:16am

Sure, standard utilities have to make a trade-off and if you really spend the time, you can often hand-roll things that are faster. We are getting pretty off-topic at this point though.

tauoverpi · March 21, 2024, 4:52pm

What I mean here is that ComptimeMap is a perfect example of something that shouldn’t be included as a standard utility as that which you gain is much lower than the cost of not having it. HashMap on the other hand is more useful than it costs more often than not. The friction above with having to use @errorName() to force a pattern which leads to issues is a great example of a case where making ComptimeMap standard would mostly cause issues.

Not including something due to the unintended effect it may have is also a valid design choice.

This wasn’t really about hand-rolling everything but rather that some things should have a bit more friction such that one gets a bit of time to question the approach. If they never question it then there’s nothing one can do to stop them creating a mess but it helps to not provide tools to make the wrong thing even easier. Here hand-rolling other maps is the more desired option as this is part of your code and not data you load at runtime.

Sze · March 21, 2024, 7:33pm

I do agree with you, regarding long term language direction.

I also kind of like the diagnostic pattern (see Allow returning a value with an error · Issue #2647 · ziglang/zig · GitHub).

However, while I like how pragmatic and practical the diagnostic pattern is, it doesn’t really compose well and is fairly ad hoc, which is why that ticket is still pretty interesting / relevant.

I also think that the moment a language accumulates too many patterns you need to ask yourself “why?”, it often is because the language lacks a feature and the pattern is the workaround.

I think these payload topics spring up again and again precisely because we don’t have a great solution yet. We can say “let’s see how that issue develops”, and I think it makes sense to point out that a good solution will likely come from that issue, but I don’t think that should prevent people from discussing whatever less-than-satisfying workarounds we can come up with in the meantime.

I think people would be glad if they can replace their hacks with something proper, but currently there are only hacks and ad-hoc workarounds. Personally I will probably try to stay with the diagnostic pattern until we have some kind of language support, if we get it.
But I find it useful to know that there are possibly other ways to structure things, even if those have their own downsides.

I think pointing out the downsides is perfectly fine, and you have done that well; I just don’t think it should stop the conversation about different possibilities. After all, this was posted in the brainstorming section. In the explain category these kinds of explorations would be off-topic, but here they aren’t.