Try (try ...) vs @panic style for initialisation

anticrisis · August 6, 2024, 9:38pm

I guess this is a matter of opinion, but I wanted to ask what style you think is preferable for code which needs to allocate right at the start of your program. Failing to allocate means the program can’t run. It’s not a recoverable error. So I lean to @panic in that case.

For example, my little command line argument parser needs to allocate. If it can’t, there’s no point continuing. For a generally useful library, is it reasonable to panic in that case?

My practice in other projects is to use errors and exceptions only for recoverable conditions, where the program can continue. Unrecoverable conditions should terminate as quickly as possible, which aids debugging.

Appreciate your thoughts about zig style on this topic.

kristoff · August 6, 2024, 9:53pm

A library should only panic in case of programming errors.

If you are in control of the full program, then killing the program early is fair game but then ask yourself the question:

Will the user care which line failed and what was the stack trace at the moment of failure?

The answer is usually: no. The user cares about what happened (e.g. program ran out of memory), but not the line where that happened, which means that you should prefer this over a panic:

std.debug.print("out of memory\n", .{});
std.process.exit(1);

Note that this makes sense to do as the application. In a library you should return error.OutOfMemory and let the caller do the printing instead.

Also beware that sometimes your library might be given an allocator that purposefully has a limited amount of memory so running out of memory in one situation doesn’t necessarily mean that all is lost.
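To make the split between library and application concrete, here is a minimal sketch. The `parseArgs` function is a hypothetical stand-in for an argument parser, and the two fixed buffers are an assumption chosen to imitate a caller that hands the library a deliberately limited allocator. The library only returns error.OutOfMemory; the application decides whether to retry or to print and exit.

```zig
const std = @import("std");

/// Library side (hypothetical): report allocation failure to the caller
/// instead of panicking or exiting.
fn parseArgs(allocator: std.mem.Allocator, raw: []const u8) error{OutOfMemory}![]u8 {
    return allocator.dupe(u8, raw);
}

/// Application side: decides what an allocation failure means here.
pub fn main() void {
    // A purposefully tiny budget: running out of it does not mean
    // the whole system is out of memory.
    var small: [8]u8 = undefined;
    var small_fba = std.heap.FixedBufferAllocator.init(&small);

    // A larger fallback budget.
    var large: [64]u8 = undefined;
    var large_fba = std.heap.FixedBufferAllocator.init(&large);

    const raw = "--verbose --output=file";
    const args = parseArgs(small_fba.allocator(), raw) catch blk: {
        // First attempt failed; retry with the larger budget.
        break :blk parseArgs(large_fba.allocator(), raw) catch {
            // Nothing left to try: tell the user what happened and exit.
            std.debug.print("out of memory\n", .{});
            std.process.exit(1);
        };
    };
    std.debug.print("parsed: {s}\n", .{args});
}
```

Because the library never panics, the same `parseArgs` works for a caller that wants to retry and for one that wants to exit immediately.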

anticrisis · August 6, 2024, 10:16pm

Excellent guidance, thank you. This is what I heard, extended a bit:

Only use @panic for logic errors, not for runtime errors such as memory, missing configuration files, etc.
In the case of unrecoverable errors such as memory, missing files, the program should prefer to print a message to stderr and exit(1) the process.
In libraries intended for general use, a runtime error should always return an error condition, even if the error is unrecoverable. The library’s client will be responsible for printing to stderr and exiting the process, if appropriate. For logic errors, a library may still @panic.

mnemnion · August 6, 2024, 11:44pm

Libraries should panic only if they absolutely have to, yes. There are some clear cases of ‘operator error’ where it’s defensible, like if you’re passed two indices meant to represent a region, and they’re in the wrong order, maybe you want to signal that as a bug. But even there it’s worth considering returning something like error.InvalidBoundaries or something like that. I err on the side of making even errors like that into a condition for library code, and also like to explicitly document errors in library code, rather than leaning on !T inferred errors. It means they can see in the documentation that a failure mode of the function is two arguments being in an invalid order, and decide what they want to do about that fact.
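As a rough illustration of the explicit-error-set idea, the sketch below spells out the failure mode in the function signature rather than relying on an inferred `!T`. The names `RegionError` and `region` are hypothetical, made up for this example only.

```zig
const std = @import("std");

/// Every way `region` can fail, named explicitly so that it shows up in
/// generated documentation and in the function signature.
pub const RegionError = error{
    /// `start` is greater than `end`, or `end` is past the end of `text`.
    InvalidBoundaries,
};

/// Returns the bytes of `text` in the half-open range [start, end).
pub fn region(text: []const u8, start: usize, end: usize) RegionError![]const u8 {
    if (start > end or end > text.len) return error.InvalidBoundaries;
    return text[start..end];
}

test "swapped indices become an error, not a crash" {
    try std.testing.expectError(error.InvalidBoundaries, region("hello", 3, 1));
    try std.testing.expectEqualStrings("el", try region("hello", 1, 3));
}
```

A caller who knows the indices are always ordered can still write `region(...) catch unreachable` on their side, which keeps that decision out of the library.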

OutOfMemory especially shouldn’t be escalated into a halt-and-catch-fire state. @kristoff linked to one way to trigger a failure like that; I wanted to add std.testing.checkAllAllocationFailures to that link, because it’s a way to guarantee that every single allocation failure in a test will happen, and that the code properly deals with its memory when it does.

It’s a good thing to know about, because you can use it to check the validity of your own library’s memory handling, and also, because a user of that library might be calling checkAllAllocationFailures on their own code, and crashing on them wouldn’t be friendly.
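A small sketch of how such a test can look. The `doubleWord` function is hypothetical and exists only to perform two allocations; the helper expects a function whose first parameter is the allocator and whose return type is an error union of void, and it re-runs that function while inducing a failure at each allocation in turn.

```zig
const std = @import("std");

/// Hypothetical function under test: makes two allocations and releases
/// both on every path, including the error paths.
fn doubleWord(allocator: std.mem.Allocator, word: []const u8) !void {
    const copy = try allocator.dupe(u8, word);
    defer allocator.free(copy);

    const doubled = try std.mem.concat(allocator, u8, &[_][]const u8{ copy, copy });
    defer allocator.free(doubled);

    try std.testing.expectEqualStrings("abab", doubled);
}

test "every allocation failure is handled without leaking" {
    try std.testing.checkAllAllocationFailures(
        std.testing.allocator,
        doubleWord,
        .{@as([]const u8, "ab")},
    );
}
```

If `doubleWord` panicked on OutOfMemory, or leaked `copy` when the second allocation failed, this test would not pass.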

One last reason: sometimes the only thing code can do if it hits OutOfMemory is release resources and quit, but that first part is important, and it’s only true sometimes. Zig stands alone in deeply integrating with the reality that allocation can fail, and this has a decidedly positive effect on program design and robustness.

kj4tmp · August 7, 2024, 12:35am

unreachable may also be useful!

mnemnion · August 7, 2024, 1:25am

Hitting an unreachable in library code should always mean that you uncovered a bug in that library. If it’s reachable, don’t mark it unreachable.

“Always” is slightly overstating it: if you document that a certain field in a struct should never be changed by the user, and they change it anyway, that might hit an unreachable. If someone uses @constCast to make the body of a RuneSet mutable, and starts flipping bits, they’ll hit unreachable code.

But the question to ask is “if someone hits this branch, should they file a bug?”. A yes answer here indicates unreachable is being used correctly.

dee0xeed · August 7, 2024, 7:05am

For long-running services managed by systemd I use this approach:

- In some.service I set Restart=on-failure.
- If an error is encountered on startup (misconfiguration or similar), do exit(0), so systemd will not try to restart the service; it’s more than likely that the error will occur again.
- If an error occurs when the service is already doing its regular job, do exit(1) and it will be restarted.
- I print a backtrace (into a log file) only when I get SIGSEGV, SIGBUS or SIGILL.

anticrisis · August 7, 2024, 8:02am

Interesting, I haven’t learned systemd yet. Kind of odd there’s no status code that says “I failed to start, please call an operator, because something is seriously amiss.”

dee0xeed · August 7, 2024, 8:13am

There are systems that may run autonomously for weeks/months without a link to external world.

dimdin · August 7, 2024, 8:27am

For configuration errors consider using exit(2) and setting RestartPreventExitStatus=2.

dimdin · August 7, 2024, 8:32am

There is OnFailure=failure-handler.service, when a service fails, it calls a list of other services to handle the failure. The failure-handler.service can be a oneshot service that sends an alert/email/sms to the operator.
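Putting the settings mentioned in this exchange together, a unit file might look roughly like the sketch below. The unit name, description and binary path are hypothetical; only Restart=, RestartPreventExitStatus= and OnFailure= come from the posts above.

```ini
# my-daemon.service (hypothetical unit name and binary path)
[Unit]
Description=Example long-running service
# Units to start when this unit enters the failed state.
OnFailure=failure-handler.service

[Service]
ExecStart=/usr/local/bin/my-daemon
# Restart after an unclean exit, e.g. exit(1) during normal operation.
Restart=on-failure
# Exit status 2 is reserved here for configuration errors: do not restart.
RestartPreventExitStatus=2
```

With this layout the program would use exit(2) for startup or configuration errors and exit(1) for failures during normal operation, instead of overloading exit(0).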

dee0xeed · August 7, 2024, 9:02am

Cool, thanx! Did not know that.

kristoff · August 7, 2024, 12:32pm

No, hitting an unreachable could be a bug in the library or a programming error by the consumer of the library.

A library can define preconditions that cannot be enforced statically, and assert that those conditions are being upheld. Asserts and unreachable are the same exact thing (std.debug.assert is implemented using unreachable).

For example a text rendering library might need valid utf8 text, so it could have two functions for it:

pub fn renderUtf8Checked(src: []const u8) !Wordart {
  if (!isValidUtf8(src)) return error.InvalidUtf8;
  return renderUtf8(src);
}

///Expects `src` to be valid utf8
pub fn renderUtf8(src: []const u8) Wordart {
  if (builtin.mode == .Debug) std.debug.assert(isValidUtf8(src));
  // do the hard work of generating a wordart
}

The first function can be used by consumers to delegate the act of validating utf8 to the library, but in some cases this validation might have already happened upstream (ie you expect the string to always be valid utf8 by that point) so you can use the second function to avoid a redundant check.

In debug mode the second function will still test that the provided input is valid but in release modes that check will be elided in favor of performance.

This is an example of a situation where hitting an unreachable in a library is not a bug in the library, but in the consumer’s code.

So, no, in this example unreachable was used correctly and hitting it does not mean that you should open a PR to the library.

mnemnion · August 7, 2024, 3:08pm

kristoff:

For example a text rendering library might need valid utf8 text, so it could have two functions for it:
pub fn renderUtf8Checked(src: []const u8) !Wordart {
  if (!isValidUtf8(src)) return error.InvalidUtf8;
  return renderUtf8(src);
}

///Expects `src` to be valid utf8
pub fn renderUtf8(src: []const u8) Wordart {
  if (builtin.mode == .Debug) std.debug.assert(isValidUtf8(src));
  // do the hard work of generating a wordart
}
The first function can be used by consumers to delegate the act of validating utf8 to the library, but in some cases this validation might have already happened upstream (ie you expect the string to always be valid utf8 by that point) so you can use the second function to avoid a redundant check.

I had considered discussing the ‘paired variant functions’ pattern, so I’m glad you brought it up.

This is roughly what I was getting at here:

Violating documented assumptions of a library is another case where unreachable comes into play, but in retrospect, my example was too specific, and should be generalized.

This is specifically covered in the documentation on style

Use the word assume to indicate invariants that cause Undefined Behavior when violated.

Use the word assert to indicate invariants that cause safety-checked Undefined Behavior when violated.

Although the terminology is changing to illegal behavior, which I think is great.

I might suggest that the functions in question should be named renderUtf8 and renderUtf8Unchecked, but this is also a heuristic. runeset has the Rune struct, and the documentation takes pains to explain that it is based on the concept of a conformant Rune, which is the only kind you get back from the API. So it has toCodepoint, which can return error.InvalidUnicode, because a conformant Rune may contain invalid Unicode, but it must be in a specific format. There’s also toCodepointAssumeValid, which uses catch unreachable.

These pair with fromSlice and fromSliceAllowInvalid: fromSlice will only return a Rune-encoded generalized UTF-8 codepoint, so it has the return signature ?Rune, while fromSliceAllowInvalid has the return signature Rune and encodes invalid bytes one at a time. This is a big part of why I wrote that part of runeset, actually: codepoints can’t directly represent ill-formed sequences, and there are times when you want to be able to do that.

The functions to test validity come in two flavors: isCodepoint and isCodepointAnyRune, and the same for isScalarValue. One of these asserts that the rune is conformant in safety modes, one of them is legal to call on any u32 packed into the Rune struct, in any mode.

So yes, you’re right, “should they file a bug” is not the whole question. A “yes” is definitely an indication that unreachable has been used correctly, but conversely, a “no” doesn’t tell us much.

So I propose this modification, two questions for unreachable in libraries:

If the user hits this branch, should they file a bug?
- If yes, good use of unreachable.
If the user hits this branch, have I clearly documented why they have a bug?
- If yes, possibly a good use of unreachable. Returning an error or panicking are also options here.
- If no, you can turn this into a yes by providing that documentation.

And it’s a good practice to provide an error-returning variant which performs checks for the user, which gates the fast-path “assume” or “assert” variant, or might be completely separate code, depending on the depths of the assumptions/assertions in question.

Which one gets the shorter name depends on the details of the API, but if you’re not sure, make the one which can crash longer: so in std.ArrayList we have append and appendAssumeCapacity, for example.
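The naming convention can be sketched without depending on any particular std.ArrayList version. `SmallStack` below is a hypothetical fixed-capacity container invented for this example: the checked `push` gets the short name and gates the `pushAssumeCapacity` fast path, which asserts its precondition instead of returning an error.

```zig
const std = @import("std");

/// Hypothetical fixed-capacity stack, used only to show the naming pattern.
const SmallStack = struct {
    items: [4]u8 = undefined,
    len: usize = 0,

    /// Checked variant: a full stack is reported as an error.
    pub fn push(self: *SmallStack, value: u8) error{Full}!void {
        if (self.len == self.items.len) return error.Full;
        self.pushAssumeCapacity(value);
    }

    /// Asserts that there is room for one more item. Calling this on a
    /// full stack is a bug in the caller, not in SmallStack.
    pub fn pushAssumeCapacity(self: *SmallStack, value: u8) void {
        std.debug.assert(self.len < self.items.len);
        self.items[self.len] = value;
        self.len += 1;
    }
};

test "checked variant reports what the fast path asserts" {
    var stack = SmallStack{};
    var i: u8 = 0;
    while (i < 4) : (i += 1) try stack.push(i);
    try std.testing.expectError(error.Full, stack.push(4));
    try std.testing.expectEqual(@as(usize, 4), stack.len);
}
```

A caller that has already checked `len` can use `pushAssumeCapacity` directly and skip the redundant test; everyone else gets an ordinary error from `push`.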