Optional tagged union

kudu · May 1, 2025, 7:53am

Suppose I have a tagged union A and then an array of optionals ([N]?A).

To save the extra memory for the optional tag, ?A could be replaced
by a tagged union B with an additional .null tag.
Maybe this would make sense in some situations?

const std = @import("std");
const print = std.debug.print;

const A = union(enum) {
    i: i64,
    f: f64,
};
const B = union(enum) {
    null,
    i: i64,
    f: f64,
};

const aa: [3]?A = .{ null, .{.i = 1}, .{.f = 2.3} };
const bb: [3]B = .{ .null, .{.i = 1}, .{.f = 2.3} };

pub fn main() void {
    print("{} {}\n", .{@sizeOf(?A), @sizeOf(B)}); // 24 16

    for(&aa) |opt| if (opt) |a| switch(a) {
        .i => |x| print("a.i = {}\n", .{x}),
        .f => |x| print("a.f = {}\n", .{x}),
    };
    for(&bb) |b| switch(b) {
        .null => {},
        .i => |x| print("b.i = {}\n", .{x}),
        .f => |x| print("b.f = {}\n", .{x}),
    };
}

vulpesx · May 1, 2025, 8:09am

Yes, it does save space, which is useful in some situations.

The plan is for Zig to eventually do this kind of optimisation on its own.

korke · May 1, 2025, 1:21pm

Umm, can you elaborate on how it saves space? I feel like there needs to be some sort of “null” identifier to detect the null variant, right?

Sze · May 1, 2025, 1:23pm

github.com/ziglang/zig
optional type optimization
opened 03:25AM - 02 Feb 16 UTC · andrewrk · optimization proposal accepted

Track which values are used when data sizes are bigger than necessary and use those states for maybe types. For example, `?bool` could use value `2` of the bool as the null value, and `??bool` can use value `3` for the null value, and so on. If we have spaces in structs for alignment purposes, that space could be used for the null value, or if any of the fields have null space available, such as a struct with a bool field in it somewhere.
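
To make the `?bool` case from the issue concrete, here is a minimal sketch of doing the same thing by hand. `OptBool` and its helper functions are hypothetical names made up for this illustration; the third enum value stands in for the spare bit pattern `2` that the proposal would let the compiler use.

```zig
const std = @import("std");

// Hypothetical hand-rolled "optional bool" (not a std type).
// A bool only uses the values 0 and 1 of its byte, so a third
// enum value can represent "no value" without any extra storage.
const OptBool = enum(u8) {
    no, // corresponds to false
    yes, // corresponds to true
    none, // corresponds to null

    fn fromOptional(opt: ?bool) OptBool {
        const b = opt orelse return .none;
        return if (b) .yes else .no;
    }

    fn toOptional(self: OptBool) ?bool {
        return switch (self) {
            .no => false,
            .yes => true,
            .none => null,
        };
    }
};

pub fn main() void {
    // Round-trip every possible state through the one-byte encoding.
    const inputs = [_]?bool{ null, false, true };
    for (inputs) |opt| {
        const small = OptBool.fromOptional(opt);
        std.debug.assert(std.meta.eql(small.toOptional(), opt));
    }

    // Without the proposed optimisation, ?bool is expected to be larger
    // than the hand-rolled one-byte enum.
    std.debug.print("?bool: {}, OptBool: {}\n", .{ @sizeOf(?bool), @sizeOf(OptBool) });
}
```

The drawback of the manual version is that the call sites lose the `if (opt) |value|` syntax unless they convert back with `toOptional` first.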

mnemnion · May 1, 2025, 2:49pm

In Rust, this is called the “niche optimization”. The rule is: if there is even one invalid state for data to be in, that state is used to represent None. If there are many, one of them is chosen.

Example: a struct SomeThing has a *Foo field. Since it isn’t ?*Foo, it can’t be null / 0, so a ?SomeThing can use the number 0 on that field to represent null.
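
Zig already does the pointer part of this: an optional pointer uses address 0 as its null state, so it is the same size as the plain pointer. The sketch below prints the sizes so you can compare that with the struct case from the example. `Foo` and `SomeThing` are placeholder definitions for illustration only, and what the second line prints depends on the compiler version you run it with.

```zig
const std = @import("std");

// Placeholder types for illustration only.
const Foo = struct { x: u32 };
const SomeThing = struct { foo: *Foo }; // non-optional pointer: never null

pub fn main() void {
    // ?*Foo reuses the otherwise invalid address 0 for null,
    // so no separate "is present" flag is stored.
    std.debug.print("*Foo: {}, ?*Foo: {}\n", .{ @sizeOf(*Foo), @sizeOf(?*Foo) });

    // If the compiler also used the invalid pointer value inside the struct
    // as the null state, these two sizes would be equal as well.
    std.debug.print("SomeThing: {}, ?SomeThing: {}\n", .{ @sizeOf(SomeThing), @sizeOf(?SomeThing) });
}
```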

korke · May 2, 2025, 7:04pm

Okay, so from the OP’s post, B needs space for i/f plus space for the identifier? But ?A needs space for the identifier and the max size of the fields, plus additional tracking space because it’s optional?

Sze · May 2, 2025, 7:32pm

The enum tag of the tagged union already needs some space, but it only needs a single bit to distinguish between the two fields. By using another bit alongside the tag enum, it should be possible to store ?A with almost no additional overhead. In the end that optimization would be similar to what is done manually with B, but done automatically in the background by the compiler.
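
Until the compiler does this on its own, the manual version can be hidden behind two small conversion functions. This is only a sketch: A and B are copied from the first post, while `toB` and `toOptionalA` are hypothetical helper names invented here. It stores the compact B in the array and converts back to ?A only at the point of use.

```zig
const std = @import("std");

// Same definitions as in the first post.
const A = union(enum) {
    i: i64,
    f: f64,
};
const B = union(enum) {
    null,
    i: i64,
    f: f64,
};

// Hypothetical helper: fold the optional's "absent" state into B's own tag.
fn toB(opt: ?A) B {
    const a = opt orelse return .null;
    return switch (a) {
        .i => |x| B{ .i = x },
        .f => |x| B{ .f = x },
    };
}

// Hypothetical helper: recover the optional view from the compact form.
fn toOptionalA(b: B) ?A {
    return switch (b) {
        .null => null,
        .i => |x| A{ .i = x },
        .f => |x| A{ .f = x },
    };
}

pub fn main() void {
    const aa = [_]?A{ null, .{ .i = 1 }, .{ .f = 2.3 } };

    // Store the compact representation.
    var bb: [aa.len]B = undefined;
    for (aa, &bb) |opt, *slot| slot.* = toB(opt);

    // Converting back yields the original optionals.
    for (aa, bb) |opt, b| std.debug.assert(std.meta.eql(opt, toOptionalA(b)));

    // Total bytes used by each array.
    std.debug.print("{} {}\n", .{ @sizeOf(@TypeOf(aa)), @sizeOf(@TypeOf(bb)) });
}
```

The tag of B has three possible values where the tag of A has two, which still fits in the storage the tag already occupies. That is why B comes out no larger than A.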