I have been wondering if there have been proposals or discussions on including an early return syntax for optional, akin to try, instead of orelse return null. I looked in here and in GitHub issues and I couldn’t find any discussion or rationale on the exclusion of it. I was wondering if anyone can link me some background discussion on the subject.
I do not think it is needed as much. Returning an error is very common compared to returning null; in my opinion, the more common pattern for optionals is “orelse default_val”. I think it is much better to handle the null case locally instead of suggesting that it should have been handled higher up.
The first mentions of the current syntax are here and here if you want a starting point.
The orelse keyword is symmetric with catch. They used to be ?? and %%. There used to be a symmetric ??a and %%a pair as well for “if null/error, crash”.
With error unions in the language but no language support for figuring out which functions return what errors, “crashing with a stack-trace in debug builds” (%%a, currently a catch unreachable) is a strictly better developer experience than returning the error value. With the error value, the caller only sees the value. With a crash/panic, you also get a stack trace with information about where it came from.
This was fixed with error return traces: An error returned from the top level gives you a complete trace of the called functions all the way down to where the error originated. Now catch |err| return err is the correct thing to do if you want crashes with good error information. You have a new problem, though: both %%a and a catch unreachable are easier to read and write than a catch |err| return err.
The try syntax addresses that problem, which is unique to errors.
Remove the shorthand for a catch unreachable and introduce a shorthand for a catch |err| return err, and now try is the least-effort error handling mechanism, which means most unhandled errors result in nice crashes with a helpful function trace.
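To make the three forms discussed above concrete, here is a minimal sketch. The function parseDigit and its error set are hypothetical, invented only for illustration; the point is that try behaves like the longhand catch |err| return err, while catch unreachable is what the old %%a shorthand corresponds to today.

```zig
const std = @import("std");

// Hypothetical fallible function, used only for illustration.
fn parseDigit(c: u8) error{NotADigit}!u8 {
    if (c < '0' or c > '9') return error.NotADigit;
    return c - '0';
}

// Longhand propagation: hand the error to the caller.
fn doubleLonghand(c: u8) error{NotADigit}!u8 {
    const d = parseDigit(c) catch |err| return err;
    return d * 2;
}

// `try` is the shorthand for the same propagation.
fn doubleWithTry(c: u8) error{NotADigit}!u8 {
    const d = try parseDigit(c);
    return d * 2;
}

// Today's spelling of the old `%%a`: assert that no error can occur.
// In safe build modes a violated assertion panics instead of propagating.
fn doubleOrCrash(c: u8) u8 {
    const d = parseDigit(c) catch unreachable;
    return d * 2;
}
```

The two propagating versions have the same signature and behavior, so the only difference a reader sees is how much there is to type.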
It isn’t needed for orelse because null values rarely propagate like that. You handle the null case locally; if you can’t, you return an error. When you received it, the null indicated “something didn’t happen”. If you don’t know what to do in that situation, that’s an error, not a null.
Thanks for the write up. I will go through the links to read more.
People keep repeating that to me, but I am not convinced. If anything, I believe the causation is backwards: it is not used often because it is three keywords instead of just one.
Oftentimes either the data is there or it is not. There is no point in returning an error if it is not actionable by the user.
Edit: if anything, .? is the one out of place here.
I have functions that return ?T and I don’t have any problems. The orelse return null seems clear and easy to me. So maybe it’s a matter of becoming accustomed to it? It also helps to know that the language is very unlikely to change in this regard.
The Zig standard library confirms that null rarely propagates the way error values do (errors propagate about 300 times more often than nulls).
I counted 26902 try and 87 orelse return null.
I think this is just something you get used to, and once you do, you appreciate the flexibility. Commonly, when unwrapping an optional, you want to:
Return an error value
Use a default other than null
Etc.
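The options in the list above can be sketched as follows. The lookup function, its key, and the error name MissingKey are assumptions made up for this example; only the orelse forms themselves come from the discussion.

```zig
const std = @import("std");

// Hypothetical config lookup; null means the key is absent.
fn lookup(key: []const u8) ?u32 {
    if (std.mem.eql(u8, key, "retries")) return 3;
    return null;
}

// Return an error value when the optional is null.
fn required(key: []const u8) error{MissingKey}!u32 {
    const value = lookup(key) orelse return error.MissingKey;
    return value;
}

// Use a default other than null.
fn withDefault(key: []const u8) u32 {
    return lookup(key) orelse 5;
}

// Propagate the null: the three-keyword form this thread is about.
fn doubled(key: []const u8) ?u32 {
    const value = lookup(key) orelse return null;
    return value * 2;
}
```

All three functions use the same orelse operator; only the right-hand side changes, which is the flexibility being described.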
This seems like unnecessary syntax sugar to me.
This next part is meant to be very light-hearted, and keep in mind it’s coming from someone who is also still very new to Zig. Zig doesn’t like adding little convenience features to the syntax just to save typing in a few cases. try is an exception, I think simply because of how commonly the compiler authors saw that pattern. This is the same reason I didn’t end up asking for a feature such as aliasing, so that I could have std.mem.Allocator alias to Allocator without having to write the type name twice: it goes against the Zig philosophy. In other words, hang in there; Zig will likely grow on you, and writing orelse return null will feel like a tiny thing.
I agree and also believe that common is not the same as best. But at least it explains why there is no shorthand convenient syntax for orelse return null, and why there won’t be in the future, right?
“People keep repeating that to me, but I am not convinced. If anything, I believe the causation is backwards: it is not used often because it is three keywords instead of just one. Oftentimes either the data is there or it is not.”
I think you’re missing the second half of the current null semantics.
In Zig the function fn findData() ?T returning null indicates two things:
The data wasn’t found and
Not finding the data is a valid result
“Not finding the data is a valid result” may be true locally, but as you continue up the call chain, it becomes increasingly less likely to remain true. At some point the application is expecting some data to either exist or not exist, and at that point, the null should either cause something valid to happen, or become an error.
Cases where repeated propagation of a null value remains valid program state are pretty rare, while cases where repeated propagation of an error value remains invalid program state are the vast majority of cases.
I think that’s why there hasn’t really been any impetus for try to have a counterpart for null.
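The argument above, that a null is valid near the lookup but turns into either a valid action or an error further up the call chain, might look like this in code. The names findData, isKnown, and requireData, the key table, and the error name are hypothetical and chosen only for this sketch.

```zig
const std = @import("std");

// Hypothetical data set for the example.
const known_keys = [_][]const u8{ "ada", "grace" };

// Bottom layer: not finding the data is a valid result, so return null.
fn findData(key: []const u8) ?usize {
    for (known_keys, 0..) |k, i| {
        if (std.mem.eql(u8, k, key)) return i;
    }
    return null;
}

// One caller makes the null cause something valid to happen.
fn isKnown(key: []const u8) bool {
    return findData(key) != null;
}

// Another caller expects the data to exist, so the null becomes an error.
// From here upward, `try` takes over the propagation.
fn requireData(key: []const u8) error{UnknownKey}!usize {
    const index = findData(key) orelse return error.UnknownKey;
    return index;
}
```

Neither caller needs to pass the null further up, which is the situation the post argues is typical.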
“Edit: if anything, .? is the one out of place here.”
I think the decision to keep it was made to be ergonomic with known-good nullable pointers. Both .* and .? are IB (illegal behavior) when used on invalid/null data, which makes .?.* shorthand for a raw pointer access.
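As a small illustration of the .?.* shorthand: a.? is equivalent to a orelse unreachable, so chaining it with .* reads like a plain dereference of a pointer that is assumed to be non-null. The function bump is a hypothetical helper written for this sketch.

```zig
const std = @import("std");

// Hypothetical helper: increments through an optional pointer that the
// caller promises is non-null, then reads the value back.
fn bump(maybe_ptr: ?*u32) u32 {
    // `.?` asserts non-null, `.*` dereferences the unwrapped pointer.
    maybe_ptr.?.* += 1;
    // The same unwrap written out in full.
    return (maybe_ptr orelse unreachable).*;
}
```

Passing null to bump would be illegal behavior, which safe build modes detect as a panic.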
For my own code, if I want to return an optional type, I would prefer to return a tagged union type.
A common assumption among programmers is that most of the time, the reason a value cannot be retrieved is obvious. But this assumption is often not true. The reason a value cannot be retrieved may not be as apparent as the programmer originally thought, and users really need to understand why the value could not be retrieved.
Optional types cannot provide this information, but tagged unions can.
Moreover, in the future, if a second reason for a value being unavailable arises, a tagged union can clearly distinguish between the two situations.
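A sketch of what this could look like, assuming a made-up settings lookup: the union Lookup, its tags, and lookupSetting are all hypothetical. The tags without a payload are void, so they carry only the reason and no data.

```zig
const std = @import("std");

// Hypothetical result type: each "no value" reason gets its own tag,
// and a new reason can be added later without changing `found`.
const Lookup = union(enum) {
    found: u32,
    not_configured,
    expired,
};

// Hypothetical lookup used only for illustration.
fn lookupSetting(name: []const u8) Lookup {
    if (std.mem.eql(u8, name, "timeout")) return .{ .found = 30 };
    if (std.mem.eql(u8, name, "token")) return .expired;
    return .not_configured;
}

// The switch is exhaustive, so the caller has to consider every reason.
fn describe(result: Lookup) []const u8 {
    return switch (result) {
        .found => "found",
        .not_configured => "not configured",
        .expired => "expired",
    };
}
```

With ?u32 the last two cases would collapse into a single null, and the caller could not tell them apart.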
There are several reasons why I prefer tagged union rather than error union.
A tagged union is as small as possible. When I use an error union, there is a risk that the global error set will grow because of this function’s error codes, and the current return value may also become larger because of the error union.
Semantically, tagged union conveys the meaning that “this is a completely reasonable expected phenomenon rather than an unexpected result, and the caller must consider and handle this situation seriously.”
Of course, I admit that there is some contradiction here. For instance, if I were to strictly distinguish between expected semantics and errors, !union(enum) might lead to a larger return value than merely using error union. I can only hope that the error union optimization for tagged union can avoid this.
Errors are u16 under the hood, and that size does not change dynamically based on the number of errors in the program. You can change it if you want, though: --error-limit n or exe/lib/obj.error_limit = n.
Errors are not necessarily ‘serious’ or ‘unexpected’. error.EndOfStream is a great example of that: most streams will end eventually, and when they do, it is usually not a serious problem.
Errors are just for alternate control flow.
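As an illustration of an error used purely for control flow, here is a sketch with a hand-written byte source. ByteSource and countBytes are hypothetical and not part of the standard library; the error name EndOfStream mirrors the example mentioned above.

```zig
const std = @import("std");

// Hypothetical minimal byte source, written for this example only.
const ByteSource = struct {
    data: []const u8,
    pos: usize = 0,

    fn next(self: *ByteSource) error{EndOfStream}!u8 {
        if (self.pos >= self.data.len) return error.EndOfStream;
        const byte = self.data[self.pos];
        self.pos += 1;
        return byte;
    }
};

// Reaching the end is the normal way out of the loop, not a failure.
fn countBytes(src: *ByteSource) usize {
    var n: usize = 0;
    while (true) {
        _ = src.next() catch |err| switch (err) {
            error.EndOfStream => break,
        };
        n += 1;
    }
    return n;
}
```

The function itself cannot fail: the error is consumed where it occurs and only ends the loop.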
Tagged unions are semantically different: they carry both control flow and data, which is certainly preferred at times.
Thank you for pointing out this underlying detail. However, if the error code is artificially set to a lower level, such as 1 byte, the entire project is only allowed to accommodate 256 types of errors, which would be unacceptable, or most of the current errors must be rewritten as tagged union to avoid occupying extremely limited error code positions.
I can understand this idea, but perhaps because I’m a lazy person who especially likes inferred error sets, I always expect my caller to be as lazy as I am and not to look at which errors I might return, which ones I anticipated and which ones I didn’t. So through the tagged union I tell my caller: whatever try cannot automatically propagate is the part you have to think about and handle. As for the part that can be propagated, if you are as lazy as I am, you can propagate it too, because it indicates that something unexpected, such as OOM, has occurred.
The tag of a tagged union can also be used only for control flow if the corresponding type is void.
My point was that a u16 is already quite small, but more than capable for the vast majority of projects.
Even with just 255 values (one value is used to indicate the non-error case), I don’t expect that to be a problem for small to mid-sized projects. Remember that errors are identified by their name; in many cases the names (and by extension their values) are reused.
The way this is worded sounds like you are talking about library code… which should absolutely not be using inferred error sets.
It’s your code, but TBH, I would not want to interact with code like this.
Yes, but my point was that they can carry data where error unions can’t, so if you need that, then a tagged union is the way to go.
However, if that data is usually unnecessary, then I would prefer error unions with a diagnostic pattern.
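One common shape of the diagnostic pattern is an optional out-parameter that the function fills in just before returning a plain error. The sketch below assumes a made-up digitSum function and Diagnostics struct; it shows the idea rather than any specific library's API.

```zig
const std = @import("std");

// Hypothetical diagnostics struct: extra detail about a failure.
const Diagnostics = struct {
    offset: usize = 0,
};

// Sums ASCII digits. On failure it returns a plain error and, if the
// caller passed a Diagnostics pointer, records where the problem was.
fn digitSum(text: []const u8, diag: ?*Diagnostics) error{InvalidCharacter}!u32 {
    var sum: u32 = 0;
    for (text, 0..) |c, i| {
        if (c < '0' or c > '9') {
            if (diag) |d| d.offset = i;
            return error.InvalidCharacter;
        }
        sum += c - '0';
    }
    return sum;
}
```

Callers that do not care about the details pass null and just use try; callers that want to report the position pass a Diagnostics pointer and read it after the error.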
Communication with the caller does not always occur on the library code. I think every API we write down is a conversation with a caller, who can be a multi-person collaborator on the same project, or even yourself maintaining the project again a few months later.
I favor inferring error sets partly because it minimizes dead code information as much as possible. If we use an error set and remove a potentially returned error from the current API, theoretically, this error set should be problematic because it mistakenly contains errors that should not be included. However, in reality, this can basically be compiled successfully. Then, this redundant error becomes a legacy dead code that provides mistaken information to the reader without being detected. Inferring the error set yields the result of an error set that is as concise as possible, minimizing the situation where the error set of the current API contains errors that are actually impossible to return.
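The stale-error situation described above can be reproduced in a few lines. Everything here is hypothetical (LoadError, loadExplicit, loadInferred and the describe helpers); the sketch only shows that an explicit set may list an error that is never returned, and that callers with an exhaustive switch must still handle it.

```zig
const std = @import("std");

// Explicit set that still lists an error the function no longer returns.
const LoadError = error{ NotFound, PermissionDenied };

// Compiles even though `PermissionDenied` is never returned.
fn loadExplicit(ok: bool) LoadError!u8 {
    if (!ok) return error.NotFound;
    return 1;
}

// Inferred set: contains only what the body can actually return.
fn loadInferred(ok: bool) !u8 {
    if (!ok) return error.NotFound;
    return 1;
}

// An exhaustive switch must cover the stale error as well.
fn describeExplicit(ok: bool) []const u8 {
    _ = loadExplicit(ok) catch |err| switch (err) {
        error.NotFound => return "not found",
        error.PermissionDenied => return "denied", // never taken in practice
    };
    return "ok";
}

// With the inferred set, the same switch needs only one prong.
fn describeInferred(ok: bool) []const u8 {
    _ = loadInferred(ok) catch |err| switch (err) {
        error.NotFound => return "not found",
    };
    return "ok";
}
```

The extra prong in describeExplicit is the kind of misleading leftover the post is concerned about.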
I agree, I just think about it less when working on an application than a library.
If I am aware I am removing an error, I would remove it from the sets first, then I get compile errors telling me where to go next.
This also contradicts your communication with the caller: it hides the possible results of the function. While you can get that information with an empty switch, I won’t even think of doing that at first if I don’t see an error I want/need to handle. That being said, I clearly don’t code like this, so that certainly biases my response.
To tackle your side of this issue, I think it would be reasonable for the compiler to complain about not returning all the errors you declared to be possible.
I agree that declaring an error that will not actually occur is miscommunication with the caller, though I don’t think it is as bad as not communicating which errors can occur at all.
An error that doesn’t occur could be dead/forgotten code, or if you have multiple implementations that use the same API, it might just be that one of the implementations never returns that error, while others may.
So if this unused error is for the sake of implementing the same API, then it is still needed so that implementations can be switched. This also means the compiler couldn’t easily say where it is OK and where it is not, so maybe it shouldn’t be a compiler error and should instead be some kind of project-specific validation step that can distinguish between API and non-API cases.