To be clear, I’m not trying to set a default value. I’m trying to partially initialize the value, piecewise, by first setting the tag, and later setting a value.
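For context, the pattern being referenced looks roughly like this (an illustrative reconstruction in the spirit of the langref example, not the exact snippet; the union and field names are made up):
const std = @import("std");

const Foo = union(enum) {
    float: f32,
    int: u32,
};

pub fn main() void {
    var f = Foo{ .float = undefined }; // only the tag is meaningfully set; the payload stays undefined
    bar(&f);
    std.debug.print("value: {}\n", .{f.float});
}

fn bar(f: *Foo) void {
    f.float = 12.34; // only the value is set; the tag is already .float
}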
You can see it sets the tag (and only the tag) on line 10, and the value (and only the value) on line 16. Now let’s modify that example to use an optional instead:
const std = @import("std");
const Foo = ?f32;

pub fn main() void {
    var f: Foo = null;
    f = ??? // this is the missing row in the chart
    bar(&f);
    std.debug.print("value: {}\n", .{f.?});
}

fn bar(f: *Foo) void {
    f.? = 12.34;
}
How would you preserve the original semantics here?
You cannot preserve the original semantics, because optionals are not simply tagged unions. They happen to be implemented that way in most cases, but not all (see optional type optimization · Issue #104 · ziglang/zig · GitHub). Because of this, Zig does not provide a way to access the tag of an optional value.
Your example would work if you removed the ??? line and changed the optional unwrap in bar to a pointer dereference: f.* = 12.34;
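In other words, a minimal sketch of the corrected program (same structure as yours, with only bar changed):
const std = @import("std");
const Foo = ?f32;

pub fn main() void {
    var f: Foo = null;
    bar(&f);
    std.debug.print("value: {}\n", .{f.?});
}

fn bar(f: *Foo) void {
    f.* = 12.34; // assigns the whole optional, making it non-null
}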
Furthermore, the .? operator does not do what you seem to think it does. It is equivalent to orelse unreachable and can only be used to access (or “unwrap”) the value, not mutate it.
const std = @import("std");

pub fn main() void {
    var foo: ?u8 = 0; // you may find it instructive to replace 0 with null here as well
    foo.? = 10;
    std.debug.print("{any}\n", .{foo});
}
I didn’t actually know that you can use a .? unwrap in an lvalue position; you learn something new every day.
I suppose there’s no reason for it to be illegal, although I also can’t come up with a practical use for it. All it appears to do is create an unnecessary opportunity to crash the program.
The bottom line is that Zig optionals aren’t syntactically or semantically tagged unions. As an implementation detail, they are sometimes. So when you’re trying to figure out a way to apply the “Some tag”, you’re using another language, or trying to use Zig as though it were another language. That’s not how this works.
I wouldn’t say that. Zig is about optimal code, and setting only the bytes you need to set is a very Zig thing to do. If you have ?[4096]u8, it would be wasteful to memset 4097 bytes instead of setting 1 byte.
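To make the size argument concrete (a small check; the exact layout is an implementation detail, but this is the typical result):
const std = @import("std");

pub fn main() void {
    // Typically 4096 payload bytes plus 1 tag byte:
    std.debug.print("{} {}\n", .{ @sizeOf([4096]u8), @sizeOf(?[4096]u8) });
}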
No matter how the niche is stored, the compiler knows what bits are interpreted as the tag, so of course it’s possible for the compiler to set those bits. The compiler has full knowledge of the data layout. It feels like you’re just saying “this isn’t possible, therefore it’s not possible”, justifying the status quo circularly.
Since the langref calls out this exact pattern as valuable enough to demonstrate, if this indeed isn’t possible, I’m tempted to say it’s an oversight in the design of the language.
It certainly is undefined behavior. You’re merely getting lucky that the first = undefined happens to produce 0xaaaa… (a non-null bit pattern) in Debug mode. If you compile in release mode and the OS zeroes the stack, you might end up with 0x0000… (the null bit pattern) instead. Change the = undefined to = null to see what happens then.
You’re right, I take it back. Should have tried it in ReleaseFast before replying.
@as([4096]u8, undefined) does work. This is a case of illegal behavior which isn’t safety-checked; I’ll look for an existing issue and otherwise file one.
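For reference, the workaround under discussion looks like this (a sketch; as noted above, whether the resulting non-null state is guaranteed is exactly what’s in question, and the payload stays undefined):
const std = @import("std");

pub fn main() void {
    var buf: ?[4096]u8 = null;
    // Assign a payload-typed undefined: in Debug builds this leaves the optional
    // non-null, without saying anything about the 4096 payload bytes.
    buf = @as([4096]u8, undefined);
    std.debug.print("is null: {}\n", .{buf == null});
}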
To @LucasSantos91’s point, the semantics of an optional are that the value is either a valid T or null. Trying to add a third possibility ruins the semantics of being optional.
Using the example from the langref you cite, why can’t the signature be like this (per @n0s4’s suggestion)?
Actually, in Debug mode, this works for all types T.
const expect = @import("std").testing.expect;

fn bar(comptime T: type) !void {
    var x: ?T = null;
    x = @as(T, undefined);
    try expect(x != null);
}
But in ReleaseFast, it fails for pointers and function pointers, because the undefined value for those types at runtime is (not guaranteed to be, but is usually) 0, which is the same as the null value. But I think this code should work for all types, so this is a language design flaw.
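A hypothetical test driver for the bar above (the comments describe the behavior reported in this thread, not a guarantee):
test "undefined payload vs. the null niche" {
    try bar(u32); // non-pointer payload: the optional has a separate tag, so this passes
    try bar(*u32); // pointer payload: may fail in ReleaseFast if undefined materializes as 0, the null bit pattern
}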
Why should it work for all types?
Optionals don’t need to be assigned (some undefined), because you can assign null to them instead. undefined is for when you don’t have a valid value, but an optional can always be set to null.
It doesn’t make sense to me to set an optional to a non-null value that is undefined; that just means you don’t have a proper value for it, and in that case it should be set to null.
I don’t understand how having an optional set to (some undefined) makes any sense. The only way it makes sense to set an optional to undefined is if the whole optional is set to undefined, and in that case it later needs to be initialized to either null or (some valid value).
But I think this code should work for all types, so this is a language design flaw.
I’m not sure why it should work. null is a defined value. It means “The value is not present” when we look at optional types. undefined is not a defined value. To quote the language reference:
undefined means the value could be anything, even something that is nonsense according to the type. Translated into English, undefined means “Not a meaningful value. Using this value would be a bug. The value will be unused, or overwritten before being used.”
So when you set X to be undefined, even if you mask it behind a cast, there is no guaranteed data there, because the promise with undefined is that you will never use the variable again, or that you will set the value to something meaningful before using it.
I understand that there is some semantic overlap, in that for an optional type, using null looks a lot like using undefined. But that further underscores the point that setting an optionally typed variable to undefined makes little sense. Just set it to null if we know that there isn’t a valid value for it.
null is a defined value. It means “The value is not present”
This part is absolutely correct. On the other hand, undefined means “the value is present, but unknown.” undefined and null are two orthogonal concepts, and the distinction between them should be upheld.
The fact that my code block works for nearly all types except for pointers is almost certainly an oversight caused by Zig’s null pointer optimization. Somewhere in the world, someone’s generic code is probably broken because of it.
But that further underscores the point that setting an optionally typed variable to undefined makes little sense.
I understand that the use cases are somewhat rare, but for consistency’s sake, I think it would be good if it was supported by the language properly.
The use case is partial initialization, just like in the langref example. One could also ask of the langref example, “why do we need to know it’s a float before we assign it”, and the answer is Zig’s design goal of optimality. You have an uninitialized value, then set some bytes now, and some later, such that all were set exactly once. Sure we could assign the tag over and over again, wasting CPU cycles, but that would not be optimal, and that’s why Zig’s design gives us a way to avoid this waste (for unions, but seemingly not for optionals without an ugly cast).
I think you’re getting the layers mixed up. The “undefined” is strictly within the payload of the optional, and does not affect its null-ness or lack thereof, just like a struct field being undefined does not “break” the other members of the struct. Leaning again on the langref example:
In status quo Zig you have to distinguish between optionals that are implemented via a sentinel value that represents null and those that are implemented via a tag. For the former there is no separation between the value of the payload and the null value, because there is no separate tag.
You can call that a bug, but I call it a feature that allows me to use optionals with pointers in a useful way that doesn’t waste memory in the majority of cases (if we didn’t have that, optionals would be too costly). With something like enum-backed address spaces · Issue #21870 · ziglang/zig · GitHub you might get your wish, if that opens up enough bits to do tag-based tracking of optionals in all (or enough) cases, without making optionals something that users would need to avoid.
So I guess I can agree that it may be nice to have optionals which are cheap and always tag-based, but if we can’t have that, I am fine with what we already have and with having to distinguish between sentinel-based and tag-based optionals in a few corner cases.
You’re describing a world where we have to choose between space-efficient optionals and the ability to assign the tag, but that dichotomy doesn’t exist in reality. The tag doesn’t need to be directly addressable in order to merely assign to it. For example, imagine on x86_64 that the compiler chooses to store a ?*Foo (64 bits) in rax, with null being 0x0, and then you run:
var p: ?*Foo = ...;
// ...
p = @as(*Foo, undefined);
It would be valid for the compiler to output:
mov rax, 1
Now p is non-zero, and thus non-null. We fixed the bug, and kept efficient optionals. There’s no reason we can’t have both. This doesn’t depend on the address space proposal.
I understand where this is coming from in theory, but I struggle to see how this is worthwhile. Adding such specific, niche behaviour increases the complexity of the language, which is a significant price.
On top of this being a very rare use case (one nobody has been able to come up with an example of), even if it ever were relied on, I would be surprised if the performance savings were ever observable, never mind a performance priority in any sanely written application.
These are just my two cents. I think there are better battles to fight than this one.