Are global integer error values a good idea?

Olvilock · March 26, 2025, 10:11am

I was reading about some cursed Zig and came across Auguste’s repo showcasing global comptime error state. I am now wondering whether assigning integer values to errors globally results in worse codegen for error handling.

Suppose a program relies on several pure Zig libraries (for example, std, mach, zig-network), all compiled statically into a single compilation unit. Those libraries may have overlapping errors, and the errors of any one library may end up with non-consecutive integer values. The unpredictable sparsity of integer error codes would result in inconsistent (or impossible to predict) performance of switch (err) { ... } inside those libraries.

My argument here hinges on the compiler’s inability, in principle, to lower a sparse switch statement as well as a packed one (which can use a jump table). Is it not correct that a sparse switch is best lowered to a chain of conditional branches?
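To make the question concrete, here is a minimal sketch of the kind of switch being discussed. FileOpenError and describe are hypothetical names standing in for one library's error set; the program prints whatever integer values the compiler picked, which depend on every error name in the whole compilation, so no particular output is assumed.

```zig
const std = @import("std");

// Hypothetical error set standing in for one library's errors.
const FileOpenError = error{ AccessDenied, OutOfMemory, FileNotFound };

// A switch over the library's own error set. The integers behind these
// names are assigned for the whole program, not per set, so they are not
// guaranteed to be consecutive.
fn describe(err: FileOpenError) []const u8 {
    return switch (err) {
        error.AccessDenied => "access denied",
        error.OutOfMemory => "out of memory",
        error.FileNotFound => "file not found",
    };
}

pub fn main() void {
    // Print the integer chosen for each error name in this build.
    std.debug.print("AccessDenied={d} OutOfMemory={d} FileNotFound={d}\n", .{
        @intFromError(error.AccessDenied),
        @intFromError(error.OutOfMemory),
        @intFromError(error.FileNotFound),
    });
    std.debug.print("{s}\n", .{describe(error.FileNotFound)});
}
```

Whether the switch becomes a jump table or a compare chain is up to the backend; the sketch only shows where the sparsity would come from.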

What do I suggest changing? The return statement (as well as try) is already quite non-trivial if the enclosing block uses defer and/or errdefer. We might change the integer representation of errors there, too. One way to define the semantics more concretely is to allow the compiler to re-encode error values on try and return whenever the error set broadens. That way, we can assign consecutive integer values to the errors of every named error type and turn switch (err) into a guaranteed O(1) jump.
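The re-encoding idea can be modeled in today's Zig with plain enums. This is only a sketch of the proposal, not how the compiler works: ReadCode, WideCode and widen are hypothetical names, and enums stand in for error sets so that each set can be numbered densely from zero.

```zig
const std = @import("std");

// Hypothetical dense per-set encodings: each set numbers its members from 0.
const ReadCode = enum(u8) { out_of_memory, end_of_stream };
const WideCode = enum(u8) { access_denied, out_of_memory, file_not_found, end_of_stream };

// The re-encoding step that would have to run wherever a narrow set is
// widened (on `try` or `return` in the proposal).
fn widen(code: ReadCode) WideCode {
    return switch (code) {
        .out_of_memory => .out_of_memory,
        .end_of_stream => .end_of_stream,
    };
}

pub fn main() void {
    const narrow = ReadCode.end_of_stream;
    const wide = widen(narrow);
    // The same logical error has a different integer in each encoding.
    std.debug.print("narrow={d} wide={d}\n", .{ @intFromEnum(narrow), @intFromEnum(wide) });
}
```

The dense numbering makes a switch over either enum a good jump-table candidate, at the price of the extra widen step on every boundary where the set grows.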

IntegratedQuantum · March 26, 2025, 4:10pm

It is required for anyerror to work, which can be somewhat useful in generic code.

Furthermore, keep in mind that an error is intended to be an exceptional value. E.g. error handling has a default branchHint of .unlikely, so I would expect any reasonable optimizer to basically do
if(err != 0) goto :endOfFunctionSection; before the switch anyway, so the happy path should have the same performance in both cases. The only difference is the error path performance, but that’s degraded anyway, and if you want maximum performance, then don’t use errors as status codes.
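For comparison, the same kind of hint can be written by hand on an ordinary branch. A minimal sketch, assuming a Zig version that has the @branchHint builtin (0.14 or later); firstOrZero is a hypothetical function used only for illustration.

```zig
const std = @import("std");

// Returns the first element, or 0 for an empty slice.
fn firstOrZero(items: []const u32) u32 {
    if (items.len == 0) {
        // Tell the optimizer this branch is expected to be rare, which can
        // move its code away from the fall-through path.
        @branchHint(.unlikely);
        return 0;
    }
    return items[0];
}

pub fn main() void {
    const values = [_]u32{ 7, 8, 9 };
    std.debug.print("{d}\n", .{firstOrZero(&values)});
}
```

The hint affects code layout only; the function returns the same results with or without it.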

And even without all that, what you propose would make try more expensive, and I see try far more often than catch |err| switch(err) {...}. Furthermore, remapping things on return could also make the happy path more expensive. E.g. consider return try xyz(); (in a function that has no errdefer): with your proposal we’d need to check the return value, but with the status quo it can just return the entire thing unchecked to the caller.
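A small sketch of that forwarding case under the status quo. The body of xyz and the fail parameter are assumptions made so the example can run; the point is that widening error{Failed} to anyerror needs no remapping because the integer behind error.Failed is the same in both sets.

```zig
const std = @import("std");

// Hypothetical callee with a narrow error set.
fn xyz(fail: bool) error{Failed}!u32 {
    if (fail) return error.Failed;
    return 42;
}

// Forwards the whole error union; the narrow set coerces to anyerror.
fn forward(fail: bool) anyerror!u32 {
    return xyz(fail);
}

// Same observable behavior, spelled with try as in the post.
fn forwardWithTry(fail: bool) anyerror!u32 {
    return try xyz(fail);
}

pub fn main() void {
    const narrow: error{Failed} = error.Failed;
    const wide: anyerror = narrow;
    // Same integer before and after widening.
    std.debug.print("{d} {d}\n", .{ @intFromError(narrow), @intFromError(wide) });
}
```

With per-set encodings, both forwarding functions would have to inspect the result and translate the error value before returning.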

Olvilock · March 27, 2025, 10:16am

Errors as status codes are used in some places in the Zig std (for example, std.io.Reader.skipUntilDelimiterOrEof), although very rarely and always in the unlikely branches. Status quo implies to me that handling multiple errors is suboptimal by design in the hot path. I’d like to at least establish relaxed requirements around @intFromError and @errorFromInt so that the integer representation of error sets remains in the compiler’s hands. In particular, incremental compilation requires knowing how shuffling the declaration order of error sets impacts the behavior of distant users of those error sets. Am I allowed to sneakily switch on integers from @intFromError at comptime? We don’t really want any guarantees from the compiler about that, just as we forgo guarantees about wrapping integer arithmetic and about pointers being just integers.
Another possible way to assign error values is to decouple the error values of primitive (not merged) named error types from one another, e.g. making const ReadError = error { OutOfMemory, EndOfStream } and const FileOpenError = error{ AccessDenied, OutOfMemory, FileNotFound } have different integer representations of OutOfMemory. That way, the compiler can pack the error values of every module close together and make the cost of error handling consistent for all users of the module. I’d like to allow that.
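For reference, this is what the status quo does with those two sets: the shared name has one integer value everywhere. A minimal sketch; AnyIoError is a hypothetical merged set added for illustration.

```zig
const std = @import("std");

const ReadError = error{ OutOfMemory, EndOfStream };
const FileOpenError = error{ AccessDenied, OutOfMemory, FileNotFound };

// Hypothetical merged set, to show that merging needs no translation today.
const AnyIoError = ReadError || FileOpenError;

pub fn main() void {
    const from_read = @intFromError(ReadError.OutOfMemory);
    const from_open = @intFromError(FileOpenError.OutOfMemory);
    // Both sets share one integer for OutOfMemory under the status quo.
    std.debug.print("same value: {}\n", .{from_read == from_open});

    // A value from either set coerces into the merged set unchanged.
    const merged: AnyIoError = ReadError.EndOfStream;
    std.debug.print("{s}\n", .{@errorName(merged)});
}
```

Under the decoupling proposal the first comparison would no longer be guaranteed to hold, and the coercion into AnyIoError would need a translation step.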

IntegratedQuantum · March 27, 2025, 2:28pm

Status quo implies to me that handling multiple errors is suboptimal by design in the hot path.

Handling errors in general is not intended for the hot path.

Do you have a specific use case in mind, where you must use and handle errors in the hot path?
If not then I’d say this is bikeshedding.

Olvilock · March 27, 2025, 4:47pm

I did use errors in the hot path of particle simulations to handle cancellations uniformly. I needed to select multiple distinct particles from the pool, so when a particle interaction did not take place, I returned the selected particles to the pool using errdefer. The calling code handled error.DidNotInteract and propagated every other error up the call stack.
TL;DR: Cancellations
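A rough sketch of the pattern described above, not the actual simulation code. Pool, take, give, interact, step and error.PoolEmpty are all hypothetical; only error.DidNotInteract and the use of errdefer come from the description.

```zig
const std = @import("std");

// Hypothetical particle pool that only counts available particles.
const Pool = struct {
    available: u32,

    fn take(self: *Pool) error{PoolEmpty}!void {
        if (self.available == 0) return error.PoolEmpty;
        self.available -= 1;
    }

    fn give(self: *Pool) void {
        self.available += 1;
    }
};

// Selects two particles; any error return hands them back via errdefer.
fn interact(pool: *Pool, will_interact: bool) error{ PoolEmpty, DidNotInteract }!void {
    try pool.take();
    errdefer pool.give();
    try pool.take();
    errdefer pool.give();
    if (!will_interact) return error.DidNotInteract;
    // On success both particles stay consumed.
}

// The caller treats DidNotInteract as a cancellation and propagates the rest.
fn step(pool: *Pool, will_interact: bool) error{PoolEmpty}!bool {
    interact(pool, will_interact) catch |err| switch (err) {
        error.DidNotInteract => return false,
        else => |e| return e,
    };
    return true;
}
```

Here every cancelled interaction travels through the error path, which is the part of the generated code the compiler treats as unlikely.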

mnemnion · March 27, 2025, 5:03pm

This sounds like the sort of thing one can do, but not necessarily the sort of thing one should do.

Every now and then someone comes to the forum with the great idea to make optionals and errors into tagged unions, the way it works in certain other languages.

To me this looks like the opposite of that: you have a special case which should probably be solved with a tagged union, and you used error handling pathways instead because of the tempting control flow.

I think you’ll be happier if you go back and turn this into ordinary control flow, using tagged unions and/or an enum to signal your cancellation pathway. Zig’s error-handling system is designed for the cold path, because that’s what errors should be. It goes deeper than just non-consecutive error numbers: the compiler is allowed to assume that errors almost never happen and optimize the code accordingly.

It’s very unlikely that major changes to error handling will be made to accommodate the use of errors in ordinary control flow. Going with the existing grain of the language is going to lead to better results for you.
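One possible shape for the tagged-union version, as a sketch under assumptions: Pool, Outcome, interact and the u32 payload are hypothetical, and the cancellation is returned as an ordinary value while real failures stay in the error set.

```zig
const std = @import("std");

// Hypothetical particle pool that only counts available particles.
const Pool = struct {
    available: u32,

    fn take(self: *Pool) error{PoolEmpty}!void {
        if (self.available == 0) return error.PoolEmpty;
        self.available -= 1;
    }

    fn give(self: *Pool) void {
        self.available += 1;
    }
};

// Cancellation is an ordinary result; the payload is the number of
// particles consumed by a successful interaction.
const Outcome = union(enum) {
    interacted: u32,
    did_not_interact,
};

fn interact(pool: *Pool, will_interact: bool) error{PoolEmpty}!Outcome {
    try pool.take();
    errdefer pool.give(); // still covers a failure of the second take
    try pool.take();
    if (!will_interact) {
        // Explicit cleanup on the non-error cancellation path.
        pool.give();
        pool.give();
        return .did_not_interact;
    }
    return .{ .interacted = 2 };
}
```

The caller can then switch on the Outcome directly, and only genuinely exceptional conditions such as error.PoolEmpty go through try or catch.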

IntegratedQuantum · March 27, 2025, 5:32pm

I would also like to point out that it seems weird that you seem to have other errors in the hot path too. Generally you should try to ensure beforehand (e.g. via ensureCapacity) that the errors cannot happen, or you should handle the error locally (e.g. catch some_default) instead of pushing it up to the root.
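Both suggestions can be sketched briefly. FixedList and its methods are hypothetical stand-ins for a real container with a capacity check, written out so the example does not depend on any particular std container API; std.fmt.parseInt is the only library call used.

```zig
const std = @import("std");

// Hypothetical fixed-capacity list, used only to show the pattern.
const FixedList = struct {
    buf: [8]u32 = undefined,
    len: usize = 0,

    // The single fallible step: checks that n more items will fit.
    fn ensureUnusedCapacity(self: *const FixedList, n: usize) error{OutOfMemory}!void {
        if (self.buf.len - self.len < n) return error.OutOfMemory;
    }

    // Infallible; the caller must have ensured capacity beforehand.
    fn appendAssumeCapacity(self: *FixedList, value: u32) void {
        self.buf[self.len] = value;
        self.len += 1;
    }
};

// One check up front, then a loop that cannot fail.
fn fill(list: *FixedList, values: []const u32) error{OutOfMemory}!void {
    try list.ensureUnusedCapacity(values.len);
    for (values) |v| list.appendAssumeCapacity(v);
}

// Handles the error locally with a default instead of propagating it.
fn parseOrZero(text: []const u8) u32 {
    return std.fmt.parseInt(u32, text, 10) catch 0;
}
```

In both cases the hot loop itself contains no error handling; the fallible part either runs once before it or is resolved on the spot.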

chung-leong · March 27, 2025, 6:20pm

On the hot path, branch hints would be redundant, since the CPU can predict based on actual data.

IntegratedQuantum · March 27, 2025, 6:27pm

They still have an impact on the assembly code, e.g. where the code of the success/failure branch is located or whether to use a cmov.

chung-leong · March 27, 2025, 8:05pm

If the path is hot, the CPU would be working off the micro-op cache.