Optional tagged union

Suppose I have a tagged union A and then an array of optionals ([N]?A).

To save the extra memory used by the optional's own flag, ?A could be replaced
by a tagged union B that has an additional .null tag.
Maybe this would make sense in some situations?

const std = @import("std");
const print = std.debug.print;

const A = union(enum) {
    i: i64,
    f: f64,
};
const B = union(enum) {
    null,
    i: i64,
    f: f64,
};

const aa: [3]?A = .{ null, .{ .i = 1 }, .{ .f = 2.3 } };
const bb: [3]B = .{ .null, .{ .i = 1 }, .{ .f = 2.3 } };

pub fn main() void {
    // Size in bytes of one array element for each representation.
    print("{} {}\n", .{ @sizeOf(?A), @sizeOf(B) }); // 24 16

    // ?A: unwrap the optional first, then switch on the union tag.
    for (&aa) |opt| if (opt) |a| switch (a) {
        .i => |x| print("a.i = {}\n", .{x}),
        .f => |x| print("a.f = {}\n", .{x}),
    };
    // B: "no value" is just another tag handled by the same switch.
    for (&bb) |b| switch (b) {
        .null => {},
        .i => |x| print("b.i = {}\n", .{x}),
        .f => |x| print("b.f = {}\n", .{x}),
    };
}
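
If other code still hands out ?A, one way to bridge the two representations is a pair of conversion helpers. This is only a sketch: toB and toOptionalA are hypothetical names, not something from std, and A and B are repeated so the snippet stands alone.

```zig
const std = @import("std");

const A = union(enum) { i: i64, f: f64 };
const B = union(enum) { null, i: i64, f: f64 };

// Hypothetical helper: fold the optional into the union's own tag.
fn toB(opt: ?A) B {
    const a = opt orelse return .null;
    return switch (a) {
        .i => |x| B{ .i = x },
        .f => |x| B{ .f = x },
    };
}

// Hypothetical helper: recover the optional form for callers that expect ?A.
fn toOptionalA(b: B) ?A {
    return switch (b) {
        .null => null,
        .i => |x| A{ .i = x },
        .f => |x| A{ .f = x },
    };
}

pub fn main() void {
    const stored = toB(A{ .i = 1 });
    const restored = toOptionalA(stored);
    std.debug.print("round trip keeps the value: {}\n", .{restored.?.i == 1});
    std.debug.print("null maps to null: {}\n", .{toOptionalA(toB(null)) == null});
}
```

Both lines should report true. The array would then store B, and the conversion happens only at the edges where ?A is needed.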

Yes, it does save space, which is useful in some situations.

The plan is for Zig to eventually do this kind of optimisation on its own.

Umm, can you elaborate on how it saves space? I feel like there still needs to be some sort of “null” identifier to detect the null variant, right?

In Rust, this is called the “niche optimization”. The rule is: if there is even one invalid state for data to be in, that state is used to represent None. If there are many, one of them is chosen.

Example: a struct SomeThing has a *Foo field. Since it isn’t ?*Foo, it can’t be null / 0, so a ?SomeThing can use the number 0 on that field to represent null.
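
Zig already does this for plain pointers, which can be checked directly. A small sketch, assuming a made-up Foo with a single field (the source only names the types, so the field is hypothetical):

```zig
const std = @import("std");

// Hypothetical payload type; only its existence matters here.
const Foo = struct { x: u32 };

// The struct from the example: holds a pointer that can never be null.
const SomeThing = struct { foo: *Foo };

comptime {
    // Optional pointers reuse address 0 for null, so no extra flag is stored.
    std.debug.assert(@sizeOf(?*Foo) == @sizeOf(*Foo));
}

pub fn main() void {
    std.debug.print("*Foo: {}  ?*Foo: {}\n", .{ @sizeOf(*Foo), @sizeOf(?*Foo) });
    // Whether ?SomeThing also stays the same size depends on the compiler;
    // print it rather than assume.
    std.debug.print("SomeThing: {}  ?SomeThing: {}\n", .{ @sizeOf(SomeThing), @sizeOf(?SomeThing) });
}
```

The first pair is guaranteed to match. The second pair shows whether your compiler version applies the same trick to a struct that wraps the pointer.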

Okay, so in the OP’s post, union B needs space for the larger of i/f plus space for the tag? But ?A needs space for the tag and the largest field, plus additional tracking space because it’s optional?

The enum tag of the tagged union already needs some space, but it only needs a single bit to distinguish between the two fields. By using another bit alongside the tag, it should be possible to store ?A with almost no additional overhead. In the end, that optimization would be similar to what is done manually with B, but performed automatically in the background by the compiler.
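
To see where the bytes go, the tag width and the total sizes can be printed side by side. This is a sketch that assumes std.meta.Tag to get at the auto-generated tag enum; A and B are copied from the first post.

```zig
const std = @import("std");

const A = union(enum) { i: i64, f: f64 };
const B = union(enum) { null, i: i64, f: f64 };

pub fn main() void {
    // Bits needed by the tag enum alone, before any padding.
    std.debug.print("tag bits: A={} B={}\n", .{
        @bitSizeOf(std.meta.Tag(A)),
        @bitSizeOf(std.meta.Tag(B)),
    });
    // Total sizes: payload plus tag, rounded up to the payload's alignment.
    std.debug.print("sizes: A={} ?A={} B={}\n", .{ @sizeOf(A), @sizeOf(?A), @sizeOf(B) });
}
```

The payload is 8 bytes with 8-byte alignment, so on a typical 64-bit target a tag of any small width rounds the union up to 16 bytes. A separate optional flag on top of that rounds ?A up again, which matches the 24 vs 16 from the first post. The extra tag value in B fits into padding that was already there.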