About optional union(enum) niche optimization

It actually echoes my initial question—if we could implement niche optimization in Zig (perhaps requiring significant effort to achieve), allowing the error and boolean flag in error!?T to be placed into T’s unused bits or shared the same bytes like Rust (playground) does, then we wouldn’t need to manually flat sum types into one tagged union for compact memory layout.

2 Likes

I agree that optimizing issues could also be helpful for tagged unions; for example, we can actually do it this way.:

const MaybeNode = union(enum) {
    unparsed: void,
    unspecified: void,
    parsed_null: void,
    node: Node,
};
const Node = union(enum) {
    a: A,
    b: B,
    c: C,
};

Niche optimization is still effective here.

But I am not sure that the lack of this optimization is the only explanation for this problem. It is also possible to look at this problem from another perspective, for example, considering it as a ‘subtype’ issue of an enum.
There may also be some situations, for example, we only need to consider the Node of a and b but not c. At this point, we face the same choice as with optional types: either construct a new type specifically for this, or continue using the Node type, but internally use unreachable to mark which tags need not be considered. I believe this problem is equivalent to using the markers of optional types, so I am not sure that the lack of niche optimization is the only interpretation of this problem.

1 Like

Seems to be the same basic thing, no? Instead of an unused niche in just the enum, use an unused niche in the enum tag to represent the null case.

I do agree that this isn’t as nice as having niche optimization built in to Zig, but in the meantime?

I think the idea here is that, instead of having ?Union, you have MaybeUnion and Union kinds: converting from MaybeUnion to Union has the return type ?Union, going the other way always succeeds, with a builder to create the “null” version of MaybeUnion. If you ensure that the two enum types have the same values except for the null case, conversion should be free (the language doesn’t guarantee this, I mean in practice) from Union to MaybeUnion, and as cheap as using an optional in the other direction.

One does have to call functions to get the nice optional-style control flow, but the type-safety is still there: code can distinguish cleanly between the cases which might be null, and the ones which must not be.

I’ve done this with enums (manually, not using comptime wizardry): one kind is the “at rest” enum, which includes the value “.nothing”, the other is the “in flight” enum, which includes all the other values, only.

It’s adequate, let me put it that way. Zig is certain to have niche optimization eventually, it’s just that it’s a lot of work and no one has decided to do it. Great project for someone who wants to hack the compiler though.

4 Likes

This niche enum and a bare union would work.

Looking at the question again, I think some talks the Zig team have given may be helpful to you. Eg, this one where mlugg goes into how the Zig compiler efficiently stores an ast. It’s not a direct answer to your question (niche optimization), but it probably helps solve what you’re actually trying to solve (efficient ast storage).

I’ve been a fan of “flattened AST” ever since I learned about it a few years ago. I’ve implemented some analysis tools with parser components in Rust using this concept, but it doesn’t fit my current scenario.
Sorry I didn’t make this clear earlier: I’m writing a JS optimizer that will have several passes, and within each pass there will be extensive modifications to the AST structure (creating nodes, replacing nodes, moving nodes, merging nodes…). I believe a traditional pointer-based AST would make life much easier.

1 Like