I’ve realized that I have a tendency to hyper-focus on the happy path and neglect error handling, and whenever dealing with errors is an afterthought, I end up with programs that are not as reliable as I would like.
So I was wondering if you could share some of the strategies, patterns, design decisions, ideas, mental models, anything… that you use to properly handle errors! I would also be happy to read about what not to do (anti-patterns)!
(not sure if “Help” is the correct category)
(I guess this is more of an engineering question than a Zig-specific question, and I would love to see examples in other languages if relevant)
What really worked for me was to not allow `main` to return an error (or, depending on your program, it may make sense to allow certain errors, like `OutOfMemory`). This forces me to handle all errors locally, which is usually the right choice. I rarely bubble up errors, and if I do, it’s part of an internal API (and here I suggest using an explicit error set). Overall (not counting tests) I have about as many statements with `try` as statements with `catch`.
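A minimal sketch of what that can look like, assuming a made-up config loader (`Config`, `parseVolume`, and `loadConfig` are hypothetical names, not from anyone's actual project). Because `main` returns `void`, a stray `try` in it does not compile, so the error has to be dealt with where it happens:

```zig
const std = @import("std");

// Hypothetical types and functions, for illustration only.
const Config = struct { volume: u8 };

fn parseVolume(text: []const u8) error{InvalidVolume}!u8 {
    // Collapse the parser's errors into one error the caller cares about.
    return std.fmt.parseInt(u8, text, 10) catch return error.InvalidVolume;
}

fn loadConfig(text: []const u8) Config {
    // Handle the error locally: log it and fall back to a default.
    const volume = parseVolume(text) catch |err| blk: {
        std.log.warn("bad volume '{s}' ({s}), using default", .{ text, @errorName(err) });
        break :blk 50;
    };
    return .{ .volume = volume };
}

// `void`, not `!void`: using `try` directly in main is a compile error.
pub fn main() void {
    const config = loadConfig("loud");
    std.debug.print("volume = {d}\n", .{config.volume});
}
```

Whether falling back to a default is the right reaction depends entirely on the program; the point is only that the decision is made next to the failing call.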
As for the concrete strategy of actually handling the error, it depends a lot on what you want to do. I make a game, and it often makes sense to only log the error and then attempt to go on, so the player can keep playing uninterrupted.
This is something I’ve come to work with as well. While I still might bubble errors up, I’ve concluded that it is almost always a bad idea to use the inferred error set in return signatures (`!ReturnType`). I always return a specified subset of errors, even if it is made up inline (`error{SpecificError}!ReturnType`). This forces me to think about what is reasonable for a function to bubble up vs. what should just be handled.
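To illustrate the explicit-set style, here is a small sketch with a hypothetical save-file reader (`LoadError`, `readVersion`, `describe`, and `half` are invented for the example). One benefit that falls out of it: the caller can `switch` exhaustively on the error, so adding a new error to the set later becomes a compile error at every call site that has not handled it yet.

```zig
const std = @import("std");

// Hypothetical named error set: the complete list of ways this API can fail.
const LoadError = error{ Truncated, BadMagic };

fn readVersion(bytes: []const u8) LoadError!u8 {
    if (bytes.len < 5) return error.Truncated;
    if (!std.mem.eql(u8, bytes[0..4], "SAVE")) return error.BadMagic;
    return bytes[4];
}

// The set can also be written inline in the signature.
fn half(n: u32) error{Odd}!u32 {
    if (n % 2 != 0) return error.Odd;
    return n / 2;
}

fn describe(bytes: []const u8) []const u8 {
    // No `else` prong: every member of LoadError must be listed here.
    _ = readVersion(bytes) catch |err| switch (err) {
        error.Truncated => return "file too short",
        error.BadMagic => return "not a save file",
    };
    return "ok";
}
```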
Another thing I’m trying is to embrace that fatal errors are legitimate, and to accept that they are sometimes the best way to handle things. For the most part I am starting to exit the process right at that spot rather than try to bubble the fatal error up to some centralized error handling system.
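One way to sketch the "exit right there" approach is a small `noreturn` helper. The `fatal` and `requirePort` functions below are hypothetical, written just for this example; the only library calls assumed are `std.debug.print`, `std.process.exit`, and `std.fmt.parseInt`.

```zig
const std = @import("std");

// Hypothetical helper: report the problem and exit where it was detected.
fn fatal(comptime fmt: []const u8, args: anytype) noreturn {
    std.debug.print("fatal: " ++ fmt ++ "\n", args);
    std.process.exit(1);
}

// Returns a plain u16: the failure case never reaches the caller.
fn requirePort(text: []const u8) u16 {
    return std.fmt.parseInt(u16, text, 10) catch |err|
        fatal("invalid port '{s}': {s}", .{ text, @errorName(err) });
}
```

Since `fatal` is `noreturn`, it can sit on the right-hand side of `catch`, and the callers of `requirePort` have no error union to deal with at all.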
Be sensitive to side effects: for each statement that produces a side effect, use `errdefer` to restore the state in case of an error. If you are not sure whether the current function may start returning errors in the future, you should do this even if it does not return one at present.
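A small sketch of the pattern, using a hypothetical `Inventory` type invented for the example: the `errdefer` is written directly after the side effect it undoes, so a failure further down leaves the state as it was.

```zig
const std = @import("std");

// Hypothetical game state, for illustration only.
const Inventory = struct {
    gold: u32 = 100,
    potions: u32 = 0,
    capacity: u32 = 2,

    fn buyPotion(self: *Inventory, price: u32) error{ TooExpensive, BagFull }!void {
        if (price > self.gold) return error.TooExpensive;

        self.gold -= price; // side effect
        errdefer self.gold += price; // undone if anything below fails

        try self.addPotion();
    }

    fn addPotion(self: *Inventory) error{BagFull}!void {
        if (self.potions == self.capacity) return error.BagFull;
        self.potions += 1;
    }
};

pub fn main() void {
    var inv = Inventory{ .capacity = 0 };
    inv.buyPotion(30) catch |err| {
        std.debug.print("purchase failed: {s}\n", .{@errorName(err)});
    };
    // The failed purchase was rolled back, so the gold is untouched.
    std.debug.print("gold = {d}\n", .{inv.gold});
}
```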
Unless you actually know how to handle an error correctly, I think concentrating on the happy path is a perfectly valid approach.
Just be sure that if you don’t know how to correctly handle the error, you log it somewhere together with some key data, then roll back your program’s effects on the environment/world (in particular, DBs) so that you always leave it in a consistent state, and only then panic.
Just trying to keep going may be OK for a game, but I have seen too many strange data issues (read: inconsistencies) in DBs caused by “error handling” that did not keep transactions in mind. Unfortunately, handling errors is often confused with logging errors.
Panicking doesn’t necessarily mean your program stops forever: you can put a little wrapper script around it which restarts it (or let e.g. systemd handle this). That way, the OS can clean up resources.
Thinking about how to correctly handle each and every error which is theoretically possible is a waste of time in many cases (well, unless you develop an OS or a medical device).
Many errors, like OOM, are so rare that your program can afford to just panic with an error message. If one of them turns out to happen regularly, you can still develop a strategy for coping with it then.
I’m going to quibble with this: it’s almost always a mistake to write code which only applies to a hypothetical future.
Instead of a state-restoring `errdefer`, use an `errdefer comptime unreachable` in that location. It’s a placeholder which has to be filled in if any `try` statements show up below it. That’s better than a load-bearing `errdefer` which will never run because it’s unreachable. Write that one later, if it comes up.
Overall, what I agree with is that every side effect point should have an `errdefer` as a marker for where a side effect occurs. However, when nothing after that point can fail, it is best to use `errdefer comptime unreachable` at each side effect location instead of a specific rollback statement, as it does not introduce code that will never be executed, while still marking the side effect location.
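A minimal sketch of the `errdefer comptime unreachable` placeholder, using a hypothetical fixed-size `Stack` made up for the example. As I understand it, the body of an `errdefer` is only compiled for error-return paths that pass through it, so as long as nothing below the marker can fail, the `comptime unreachable` is never evaluated:

```zig
// Hypothetical fixed-capacity stack, for illustration only.
const Stack = struct {
    items: [4]u32 = undefined,
    len: usize = 0,

    fn push(self: *Stack, value: u32) error{Full}!void {
        // All fallible checks happen before any state is touched.
        if (self.len == self.items.len) return error.Full;

        // From here on the function mutates state and must not fail.
        // If a `try` or `return error.X` is ever added below this line,
        // compilation fails, which is the cue to write the real rollback.
        errdefer comptime unreachable;

        self.items[self.len] = value;
        self.len += 1;
    }
};
```

The `return error.Full` above the marker is fine, because an `errdefer` only applies to error returns that come after it in its scope.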