Optional vs tagged union vs error union

I’m finding myself increasingly disliking the use of optional types.

E.g., consider a program’s output file pathname. This pathname might be provided manually when parsing the command line, or if not provided, it might be dynamically generated based on subsequent program execution.
In the past, I would express this pathname using the optional type ?[]u8. When determining the output pathname, I’d check whether it’s null and, if so, automatically generate the pathname.
However, I’ve come to realize that this approach confuses ownership. A path supplied by the user and a path generated by the program come from two completely different owners, and if requirements change in the future, I can’t guarantee that both will be allocated with the same allocator.
Now I prefer to express this field as union(enum) { manual: []u8, auto: []u8 }, and to manage each variant with the ownership model that matches its source.
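
A minimal sketch of that idea, assuming the manual path is borrowed from the command line and the generated one is heap-allocated. OutputPath, resolveOutputPath, the ".out" suffix and the []const u8 type on the manual variant are all illustrative assumptions, not part of any real program:

```zig
const std = @import("std");

/// Hypothetical output-path field: the tag records where the path came from,
/// so cleanup can follow the matching ownership rule.
const OutputPath = union(enum) {
    /// Assumed to be borrowed from the command line; not freed by us.
    manual: []const u8,
    /// Generated at runtime and owned by the allocator passed to deinit.
    auto: []u8,

    fn slice(self: OutputPath) []const u8 {
        return switch (self) {
            .manual => |p| p,
            .auto => |p| p,
        };
    }

    fn deinit(self: OutputPath, allocator: std.mem.Allocator) void {
        switch (self) {
            .manual => {}, // borrowed: nothing to free
            .auto => |p| allocator.free(p),
        }
    }
};

/// Hypothetical resolver: use the user's path if given, otherwise generate one.
fn resolveOutputPath(
    allocator: std.mem.Allocator,
    cli_path: ?[]const u8,
    stem: []const u8,
) !OutputPath {
    if (cli_path) |p| return .{ .manual = p };
    return .{ .auto = try std.fmt.allocPrint(allocator, "{s}.out", .{stem}) };
}
```

The optional still appears at the parsing boundary (cli_path), but once the path is resolved, the tag carries the ownership decision, so deinit never has to guess.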

For another example, consider a value that needs to be parsed, and the result of the parsing might be null.
In the past, I’d express this using ?T. If it hasn’t been parsed yet, it’s null. After parsing, it might remain null or have a specific value.
Now I prefer to use union(enum) { unparsed: void, parsed: T, parsed_null_because_some_reason: void, parsed_null_because_another_reason: void }. This not only distinguishes “not parsed yet” from “parsed to null” more precisely, it also lets me split the null result into separate cases that carry richer information when I need it.
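
As a rough illustration, here is the same shape with T = u16. The PortSetting type, its variant names, the "off" keyword and the InvalidPort error are hypothetical and chosen only for the example:

```zig
const std = @import("std");

/// Hypothetical parse state for an optional port setting.
const PortSetting = union(enum) {
    unparsed,
    parsed: u16,
    null_because_empty_input,
    null_because_disabled,

    /// Expected outcomes are returned as tags; malformed input is treated
    /// as the unexpected case and reported through the error set.
    fn parse(text: []const u8) error{InvalidPort}!PortSetting {
        if (text.len == 0) return .null_because_empty_input;
        if (std.mem.eql(u8, text, "off")) return .null_because_disabled;
        const port = std.fmt.parseInt(u16, text, 10) catch return error.InvalidPort;
        return .{ .parsed = port };
    }
};
```

A field of this type can start out as .unparsed, and a switch over it is checked for exhaustiveness by the compiler, so adding another "null because ..." variant later flags every place that needs to handle it.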

At this point, I realized something else: from a semantic and performance perspective, this approach competes with error union, especially when used as a function return value.
A naive rule is to use a tagged union whenever every outcome is expected, even the outcomes that aren’t the regular return value. This approach also offers better performance, since the set of expected tags is small, while error codes are drawn from a single global set. Error unions should then be reserved for unexpected situations. However, because error unions have better syntactic support (try, catch), they are sometimes used to represent expected return conditions as well. While this is more convenient, it’s not the most correct code.

But I also noticed that if the return value contains both expected and unexpected conditions, using !union(enum) {...} is less performant than simply using the error union and treating the expected but not regular condition as a catchable “error.”
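
For reference, the two shapes being compared look roughly like this. It is a minimal sketch with hypothetical names (Lookup, lookupTagged, lookupError and the two callers); it only shows how each shape reads at the call site and makes no claim about which one is faster:

```zig
const std = @import("std");

/// Hypothetical lookup outcomes that the caller is expected to handle.
const Lookup = union(enum) {
    found: u32,
    not_found,
    expired,
};

/// Shape A: expected outcomes live in the tag; the error set is reserved
/// for the unexpected case.
fn lookupTagged(key: []const u8) error{Corrupt}!Lookup {
    if (key.len == 0) return error.Corrupt;
    if (std.mem.eql(u8, key, "a")) return .{ .found = 1 };
    if (std.mem.eql(u8, key, "old")) return .expired;
    return .not_found;
}

/// Shape B: the expected-but-not-regular outcomes are folded into the error set.
fn lookupError(key: []const u8) error{ NotFound, Expired, Corrupt }!u32 {
    if (key.len == 0) return error.Corrupt;
    if (std.mem.eql(u8, key, "a")) return 1;
    if (std.mem.eql(u8, key, "old")) return error.Expired;
    return error.NotFound;
}

/// Caller for shape A: `try` forwards only the unexpected error.
fn valueOrZeroTagged(key: []const u8) error{Corrupt}!u32 {
    return switch (try lookupTagged(key)) {
        .found => |v| v,
        .not_found, .expired => 0,
    };
}

/// Caller for shape B: the expected "errors" are caught, the rest is forwarded.
fn valueOrZeroError(key: []const u8) error{Corrupt}!u32 {
    const v: u32 = lookupError(key) catch |err| switch (err) {
        error.NotFound, error.Expired => 0,
        error.Corrupt => return error.Corrupt,
    };
    return v;
}
```

Both callers end up with the same behavior; the difference is only whether "not found" and "expired" travel as tags or as error values.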

However, if I prioritize performance when designing return values, I might end up using the tagged union’s tag to represent an expected outcome in some cases, and the error union’s error code to represent these expected but not regular conditions in other cases, which makes the semantics feel inconsistent.


I feel like this opinion is somewhat similar to the Roc language’s answer to “Why doesn’t Roc have a Maybe or Option or Optional type, or null or nil or undefined?”. On one hand it is convincing and I like that they are doing it this way, but I feel like it’s only appropriate in a higher-level language like that one.

Regarding performance, I’m trusting that the “optional type optimization” issue (ziglang/zig #104) will also cover errors, and in the famed 1.0 I’ll just write what feels right.


Personally, I’ve hit a few cases similar to yours, but in virtually every one I chose to simply make the memory semantics identical in both cases. In your example, that would mean duplicating the user-supplied path with the same allocator that would have constructed a generated one.
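
Something along these lines, as a rough sketch. The resolveOutputPath name and the ".out" suffix are made up for illustration:

```zig
const std = @import("std");

/// Hypothetical resolver: the result is always owned by `allocator`,
/// whether it came from the user or was generated.
fn resolveOutputPath(
    allocator: std.mem.Allocator,
    cli_path: ?[]const u8,
    stem: []const u8,
) ![]u8 {
    // Copy the user's path so both branches return caller-owned memory.
    if (cli_path) |p| return allocator.dupe(u8, p);
    return std.fmt.allocPrint(allocator, "{s}.out", .{stem});
}
```

The caller then frees the result with the same allocator in every case, at the cost of one extra copy when the path was supplied by the user.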

I find that preferable to any use of unions or optionals: eliminate differences in how a return value needs to be handled whenever possible.

There are certainly instances where that kind of thing isn’t feasible. Some can be solved by simply enforcing the use of an arena, others by having the caller preallocate. Situations where the allocation might be huge and very wasteful come to mind.

Edit: your second example is a good case too. There the function has three distinct and useful outcomes, so a union is evidently beneficial.
