To be clear, I’m not trying to set a default value. I’m trying to partially initialize the value, piecewise, by first setting the tag, and later setting a value.
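For context, the pattern being referenced looks roughly like this (an illustrative reconstruction in the spirit of the langref example, not the exact snippet; the union and field names are made up):
const std = @import("std");

const Foo = union(enum) {
    float: f32,
    int: u32,
};

pub fn main() void {
    var f = Foo{ .float = undefined }; // only the tag is meaningfully set; the payload stays undefined
    bar(&f);
    std.debug.print("value: {}\n", .{f.float});
}

fn bar(f: *Foo) void {
    f.float = 12.34; // only the value is set; the tag is already .float
}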
You can see it sets the tag (and only the tag) on line 10, and the value (and only the value) on line 16. Now let’s modify that example to use an optional instead:
const std = @import("std");
const Foo = ?f32;

pub fn main() void {
    var f: Foo = null;
    f = ??? // this is the missing row in the chart
    bar(&f);
    std.debug.print("value: {}\n", .{f.?});
}

fn bar(f: *Foo) void {
    f.? = 12.34;
}
How would you preserve the original semantics here?
You cannot preserve the original semantics, because optionals are not simply tagged unions. They happen to be implemented that way in most cases, but not all (see optional type optimization · Issue #104 · ziglang/zig · GitHub). Because of this, Zig does not provide a way to access the tag of an optional value.
Your example would work if you removed the ??? line and changed the optional unwrap in bar to a pointer dereference: f.* = 12.34;
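In other words, a minimal sketch of the corrected program (same structure as yours, with only bar changed):
const std = @import("std");
const Foo = ?f32;

pub fn main() void {
    var f: Foo = null;
    bar(&f);
    std.debug.print("value: {}\n", .{f.?});
}

fn bar(f: *Foo) void {
    f.* = 12.34; // assigns the whole optional, making it non-null
}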
Furthermore, the .? operator does not do what you seem to think it does. It is equivalent to orelse unreachable and can only be used to access (or “unwrap”) the value, not mutate it.
const std = @import("std");

pub fn main() void {
    var foo: ?u8 = 0; // you may find it instructive to replace 0 with null here as well
    foo.? = 10;
    std.debug.print("{any}\n", .{foo});
}
I didn’t actually know that you can use a .? unwrap in an lvalue position; you learn something new every day.
I suppose there’s no reason for it to be illegal, although I also can’t come up with a practical use for it. All it appears to do is create an unnecessary opportunity to crash the program.
The bottom line is that Zig optionals aren’t syntactically or semantically tagged unions. As an implementation detail, they are sometimes. So when you’re trying to figure out a way to apply the “Some tag”, you’re using another language, or trying to use Zig as though it were another language. That’s not how this works.
I wouldn’t say that. Zig is about optimal code, and setting only the bytes you need to set is a very Zig thing to do. If you have ?[4096]u8, it would be wasteful to memset 4097 bytes instead of setting 1 byte.
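To make the size argument concrete (a small check; the exact layout is an implementation detail, but this is the typical result):
const std = @import("std");

pub fn main() void {
    // Typically 4096 payload bytes plus 1 tag byte:
    std.debug.print("{} {}\n", .{ @sizeOf([4096]u8), @sizeOf(?[4096]u8) });
}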
No matter how the niche is stored, the compiler knows what bits are interpreted as the tag, so of course it’s possible for the compiler to set those bits. The compiler has full knowledge of the data layout. It feels like you’re just saying “this isn’t possible, therefore it’s not possible”, justifying the status quo circularly.
Since the langref calls out this exact pattern as valuable enough to demonstrate, if this indeed isn’t possible, I’m tempted to say it’s an oversight in the design of the language.
It certainly is undefined behavior. You’re merely getting lucky that the first = undefined happens to produce 0xaaaa… (a non-null bit pattern) in Debug mode. If you compile in release mode and the OS zeroes the stack, you might end up with 0x0000… (the null bit pattern) instead. Change the = undefined to = null to see what happens then.
You’re right, I take it back. Should have tried it in ReleaseFast before replying.
@as([4096]u8, undefined) does work. This is a case of illegal behavior which isn’t safety-checked; I’ll look for an existing issue and otherwise file one.
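For reference, the workaround under discussion looks like this (a sketch; as noted above, whether the resulting non-null state is guaranteed is exactly what’s in question, and the payload stays undefined):
const std = @import("std");

pub fn main() void {
    var buf: ?[4096]u8 = null;
    // Assign a payload-typed undefined: in Debug builds this leaves the optional
    // non-null, without saying anything about the 4096 payload bytes.
    buf = @as([4096]u8, undefined);
    std.debug.print("is null: {}\n", .{buf == null});
}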
To @LucasSantos91’s point, the semantics of an optional are that the value is either a valid T or null. Trying to add a third possibility ruins the semantics of being optional.
Using the example from the langref you cite, why can’t the signature be like this (per @n0s4’s suggestion)?
Actually, in Debug mode, this works for all types T.
const expect = @import("std").testing.expect;

fn bar(comptime T: type) !void {
    var x: ?T = null;
    x = @as(T, undefined);
    try expect(x != null);
}
But in ReleaseFast, it fails for pointers and function pointers, because the undefined value for those types at runtime is (not guaranteed to be, but is usually) 0, which is the same as the null value. But I think this code should work for all types, so this is a language design flaw.
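A hypothetical test driver for the bar above (the comments describe the behavior reported in this thread, not a guarantee):
test "undefined payload vs. the null niche" {
    try bar(u32); // non-pointer payload: the optional has a separate tag, so this passes
    try bar(*u32); // pointer payload: may fail in ReleaseFast if undefined materializes as 0, the null bit pattern
}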
Why should it work for all types?
Optionals don’t need to be assigned (some undefined), because you can assign null to them instead. undefined is for when you don’t have a valid value, but an optional can always be set to null.
It doesn’t make sense to me to set an optional to a non-null value that is undefined; that just means you don’t have a proper value for it, and in that case it should be set to null.
I don’t understand how having an optional set to (some undefined) makes any sense. The only way it makes sense to set an optional to undefined is if the whole optional is set to undefined, and in that case it later needs to be initialized to either null or (some valid value).
But I think this code should work for all types, so this is a language design flaw.
I’m not sure why it should work. null is a defined value. It means “The value is not present” when we look at optional types. undefined is not a defined value. To quote the language reference:
undefined means the value could be anything, even something that is nonsense according to the type. Translated into English, undefined means “Not a meaningful value. Using this value would be a bug. The value will be unused, or overwritten before being used.”
So when you set X to be undefined, even if you mask it behind a cast, there is no guaranteed data there, because the promise with undefined is that you will never use the variable again, or that you will set the value to something meaningful before using it.
I understand that there is some semantic overlap, in that for an optional type, using null looks a lot like using undefined. But that further underscores the point that setting an optionally typed variable to undefined makes little sense. Just set it to null if we know that there isn’t a valid value for it.
null is a defined value. It means “The value is not present”
This part is absolutely correct. On the other hand, undefined means “the value is present, but unknown.” undefined and null are two orthogonal concepts, and the distinction between them should be upheld.
The fact that my code block works for nearly all types except for pointers is almost certainly an oversight caused by Zig’s null pointer optimization. Somewhere in the world, someone’s generic code is probably broken because of it.
But that further underscores the point that setting an optionally typed variable to undefined makes little sense.
I understand that the use cases are somewhat rare, but for consistency’s sake, I think it would be good if it was supported by the language properly.
The use case is partial initialization, just like in the langref example. One could also ask of the langref example, “why do we need to know it’s a float before we assign it”, and the answer is Zig’s design goal of optimality. You have an uninitialized value, then set some bytes now, and some later, such that all were set exactly once. Sure we could assign the tag over and over again, wasting CPU cycles, but that would not be optimal, and that’s why Zig’s design gives us a way to avoid this waste (for unions, but seemingly not for optionals without an ugly cast).
I think you’re getting the layers mixed up. The “undefined” is strictly within the payload of the optional, and does not affect its null-ness or lack thereof, just like a struct field being undefined does not “break” the other members of the struct. Leaning again on the langref example:
In status quo Zig you have to distinguish between optionals that are implemented via a sentinel value that represents null and those that are implemented via a tag. For the former there is no separation between the value of the payload and the null value, because there is no separate tag.
You can call that a bug, but I call it a feature that allows me to use optionals with pointers in a useful way that doesn’t waste memory in the majority of cases (if we didn’t have that, optionals would be too costly). With something like enum-backed address spaces · Issue #21870 · ziglang/zig · GitHub you might get your wish, if that opens up enough bits to do tag-based tracking of optionals in all (or enough) cases, without making optionals something that users would need to avoid.
So I guess I can agree that it may be nice to have optionals which are cheap and always tag-based, but if we can’t have that, I am fine with what we already have and with having to distinguish between sentinel-based and tag-based optionals in a few corner cases.
You’re describing a world where we have to choose between space-efficient optionals and the ability to assign the tag, but that dichotomy doesn’t exist in reality. The tag doesn’t need to be directly addressable in order to merely assign to it. For example, imagine on x86_64 that the compiler chooses to store a ?*Foo (64 bits) in rax, with null being 0x0, and then you run:
var p: ?*Foo = ...;
// ...
p = @as(*Foo, undefined);
It would be valid for the compiler to output:
mov rax, 1
Now p is non-zero, and thus non-null. We fixed the bug, and kept efficient optionals. There’s no reason we can’t have both. This doesn’t depend on the address space proposal.
I understand where this is coming from in theory, but I struggle to see how this is worthwhile. Adding such specific, niche behaviour increases the complexity of the language, which is a significant price.
On top of this being a very rare use case (one nobody has been able to come up with an example of), even if it ever were relied on, I would be surprised if the performance savings were ever observable, never mind a performance priority in any sanely written application.
These are just my two cents. I think there are better battles to fight than this one.