SubType: a mechanism to catch impossible cases at comptime

I occasionally run into situations where it seems like the compiler should be able to infer that particular switch cases are impossible. Here’s an example:

const Abc = enum { a, b, c };
fn example(abc: Abc) void {
    switch (abc) {
        .a => {},
        .b, .c => |bc| {

            //
            // ...common code for b and c...
            //

            switch (bc) {
                .a => unreachable,
                .b => {
                    // b-specific code
                },
                .c => {
                    // c-specific code
                },
            }
        },
    }
}

In this example we know .a is impossible in the nested switch, but we’re switching on a full Abc enum value, so the compiler doesn’t know that. Interestingly, if our Abc enum were instead an error set error{ a, b, c }, then we could capture a new “sub error” value of type error{ b, c } and we’d no longer need the .a => unreachable case.

const AbcError = error{ a, b, c };
fn errorExample(abc: AbcError) void {
    switch (abc) {
        error.a => {},
        error.b, error.c => |bc| {
            switch (bc) {
                // compiler knows .a is impossible
                error.b => {},
                error.c => {},
            }
        },
    }
}

Unlike enums, Zig can do this with errors because it’s able to generate error subtypes. Errors are globally namespaced, so creating an error{ b, c } type from an error{ a, b, c } type is no problem. Not so with enums. Zig doesn’t have a way to create “subtypes” of other types that limit the set of possible values, but what if it did? Here’s what that could look like:

@SubType(comptime T: type, comptime Values: type) type

i.e.

const Abc = enum{ a, b, c };
@SubType(Abc, enum {b, c}) // a sub type of Abc that can only be b or c

The first example would become this:


const Abc = enum { a, b, c };
fn example(abc: Abc) void {
    switch (abc) {
        .a => {},
        .b, .c => |bc| {
            std.debug.assert(@TypeOf(bc) == @SubType(Abc, enum{ b, c }));
            switch (bc) {
                // .a is impossible, not a member of our sub type
                .b => {},
                .c => {},
            }
        },
    }
}

Note that you could create a SubType function in userspace today, but it would have to generate a new, unique type with no association to the parent type. With @SubType, values are implicitly convertible to the parent type, and the subtype has the same declarations as the parent. More importantly, without language support the compiler couldn’t integrate the subtype into the switch statement.
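For reference, here’s roughly what such a userspace SubType could look like: a minimal sketch that reifies a new enum via @Type, assuming a 0.14-era std.builtin.Type layout (the field names differ across Zig versions). UserSubType and Bc are made-up names for illustration.

const std = @import("std");

fn UserSubType(comptime Parent: type, comptime values: []const Parent) type {
    var fields: [values.len]std.builtin.Type.EnumField = undefined;
    for (values, 0..) |v, i| {
        // Reuse the parent's tag names and values so manual conversion stays cheap.
        fields[i] = .{ .name = @tagName(v), .value = @intFromEnum(v) };
    }
    return @Type(.{ .@"enum" = .{
        .tag_type = @typeInfo(Parent).@"enum".tag_type,
        .fields = &fields,
        .decls = &.{},
        .is_exhaustive = true,
    } });
}

const Abc = enum { a, b, c };
const Bc = UserSubType(Abc, &.{ .b, .c });

// Bc is a brand-new enum { b, c }; nothing ties it back to Abc, so converting
// a Bc back to an Abc has to be done by hand, e.g. with @enumFromInt on the
// shared tag value.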

Note that this idea also applies to tagged unions:

const AbcTaggedUnion = union(enum) {
    a: i32,
    b: []const u8,
    c: f32,
};
fn example(abc: AbcTaggedUnion) void {
    switch (abc) {
        .a => |int| {},
        .b, .c => |bc| {
            std.debug.assert(@TypeOf(bc) == @SubType(AbcTaggedUnion, enum{ b, c }));
            switch (bc) {
                // .a is impossible, not a member of our sub type
                .b => |string| { },
                .c => |float| { },
            }
        },
    }
}
8 Likes

I know little about how feasible this is for the compiler to implement, or whether it’s reasonable to expect the compiler team to spend any time on it, but I really do like the idea.
With the tagged union part especially, since there have been a few times where I had a larger type that holds a union, plus a function that takes data specifically from that type but only acts on a subset of the possible union values.
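To give a hypothetical illustration of that situation (the Event type and render function below are made up): the function only ever receives the draw_* payloads, but status-quo Zig still forces an unreachable prong for the rest.

const Event = union(enum) {
    quit: void,
    key: u8,
    draw_rect: struct { w: u32, h: u32 },
    draw_text: []const u8,
};

// Callers promise to only pass draw_* events, but the type system
// can't express that today, so the other cases become `unreachable`.
fn render(event: Event) void {
    switch (event) {
        .draw_rect => |r| {
            _ = r; // rasterize the rectangle
        },
        .draw_text => |s| {
            _ = s; // shape and draw the text
        },
        .quit, .key => unreachable,
    }
}

With something like @SubType(Event, enum{ draw_rect, draw_text }) as the parameter type, that last prong could simply disappear.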

I think it would be really nice to have a feature like this. But I can see it causing issues in generic code:

const Abc = enum { a, b, c };
fn example(abc: Abc, comptime special_value: Abc) void {
    switch (abc) {
        .a => {},
        .b, .c => |bc| {
            switch (bc) {
                special_value => {}, // What happens here?
                else => {},
            }
        },
    }
}

If special_value == .a, then this would cause a compile error in the inner switch, whereas it’s fine in status quo. You could try to prevent it with something like if (special_value != .a) {...}, but then you’re basically back to status quo.

If the compiler could infer subtypes for enums and tagged unions, this would be handy.
Not sure if I like the name of the built-in function, though I don’t know a better one.

I had another weird thought. As far as syntax goes, I realized Zig already has a way to specify value subsets. The switch statement itself already does this. What would it look like if we re-used that same syntax to specify the resulting capture type for each prong?

For context here’s our normal switch syntax:

switch (VALUE) {
    CASE => HANDLER ,
}

// i.e.
switch (foo) {
    0 => bar(),
}

The CASE syntax in a switch is where Zig already supports specifying sets of values. Let’s imagine new syntax around it. We could re-use the switch keyword, but let’s pick a new one to keep the examples clear:

switchtype(TYPE){ CASE }

// i.e. our Abc enum limited to just b and c
switchtype(Abc){ .b, .c }

// i.e. a u8 limited to values 0 to 100, 110 and 120
switchtype(u8){ 0 ... 100, 110, 120 }

Our Abc example becomes this:

const Abc = enum { a, b, c };
fn example(abc: Abc) void {
    switch (abc) {
        .a => {},
        .b, .c => |bc| {
            std.debug.assert(@TypeOf(bc) == switchtype(Abc){ .b, .c });
            switch (bc) {
                .b => {},
                .c => {},
            }
        },
    }
}

In the general case, the following should hold:

switch (VALUE) {
    CASE => |CAPTURE| {
        assert(@TypeOf(CAPTURE) == switchtype(@TypeOf(VALUE)){ CASE });
    },
}

//i.e.
switch (@as(u8, 0)) {
    0 ... 100 => |int| assert(@TypeOf(int) == switchtype(u8){ 0 ... 100 }),
    110, 120, 200...255 => |int| assert(@TypeOf(int) == switchtype(u8){ 110, 120, 200...255 }),
    else => |int| assert(@TypeOf(int) == switchtype(u8){ 101 ... 109, 111 ... 119, 121 ... 199 }),
}

The u8 example reminded me of the generic integer range proposal (allow integer types to be any range · Issue #3806 · ziglang/zig · GitHub); however, that proposal goes a step further, as it decouples the underlying integer type from the range. Although maybe that proposal would also benefit from the same syntax idea: int{ 0 ... 100, 110, 120 }? Then again, if Zig had already accepted the integer range proposal, maybe the switch statement should just use those ranges for the capture types and use other mechanisms for the other small set of types that switch supports (e.g. @TagSubType for enums/tagged unions).

1 Like

One partial alternative that you can already use in some cases is to encode semantic meaning in the bits of the enum/tag value.

For example, you could make b and c share a bit that isn’t shared by any other enum value; then you could even convert that bit/enum into a flags enum and switch on whether the bc flag is set or not (sketched below).

So one way to deal with those situations is to switch on a different tag that is somehow derived/computed from the original tag. Similar to this post: Curious comptime pattern - #4 by Sze
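Here is a minimal sketch of that bit-encoding idea, assuming we are free to pick the enum’s tag values; group_mask and inBcGroup are made-up names for illustration:

const Abc = enum(u2) {
    a = 0b00,
    b = 0b10, // b and c share the high bit...
    c = 0b11, // ...and differ only in the low bit

    const group_mask: u2 = 0b10;

    fn inBcGroup(self: Abc) bool {
        return @intFromEnum(self) & group_mask != 0;
    }
};

fn example(abc: Abc) void {
    if (!abc.inBcGroup()) return; // the .a case

    // ...common code for b and c...

    // The low bit alone distinguishes b from c, so this switch is exhaustive.
    switch (@as(u1, @truncate(@intFromEnum(abc)))) {
        0 => {}, // b-specific code
        1 => {}, // c-specific code
    }
}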


Personally, I think it would be most satisfying if the compiler could statically recognize impossible cases and then maybe even give you errors for trying to handle/write impossible cases. That would be complicated by comptime parameters, as already pointed out by @chadwain, so maybe there would be a category of always-impossible cases, while the rest would remain as needing to be handled with unreachable, like now.

I am not sure I like the explicit @SubType that much; I think I would prefer it if the compiler could keep/analyze enough context information to know that certain cases aren’t possible, i.e. if there was more/deeper inference for switch cases.

Basically, I wouldn’t want the subtype to be an actual type; I would want to keep the original type, just with added information about which cases are still possible. (Mostly because I don’t want my code paths to have to deal with a combinatorial explosion in the number of concrete types they have to handle; subtypes could also make comptime/generic programming more complicated.)

Maybe we could instead have something like this:

const Abc = enum { a, b, c };
fn example(abc: Abc) void {
    switch (abc) {
        .a => {},
        .b, .c => |bc| {
            switch (bc) {
                // .a is impossible, because it isn't reachable (inferred)
                .b => {},
                .c => {},
            }
        },
    }
}

fn example2(abc: Abc) void {
    switch (abc) {
        .a => {},
        .b, .c => |bc| handleBC(bc), // works
    }
    
    handleBC(abc);
    // compile error:
    // handleBC(abc: @reachable(Abc, .{.b, .c}))
    //   parameter abc called with .a reachable
}

// you only need to declare reachability
// to be able to put it inside a separate function
fn handleBC(bc: @reachable(Abc, .{.b, .c})) void {
    // bc still is of type Abc, but calling handleBC
    // with values that could be .a is a compile error
    switch (bc) {
        // .a is impossible, because it isn't part of what is reachable
        .b => {},
        .c => {},
    }
}

The amount of reachability information kept by the compiler could vary quite a bit based on tradeoffs between explicitness and allowing things to be inferred. For example, you could have only switch statements produce reduced reachability, or you could also have if conditions propagate reachability information to their then and else bodies.

Or you could also have escape analysis for blocks reduce the reachability of specific variables in the rest of the block; that way, early returns like if (abc == .a) return; would leave only the remaining cases reachable. But I think it would always be about what can be known statically, so things that are only known at runtime wouldn’t reduce reachability and would still need cases within switch statements.
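To make that concrete, here’s a hypothetical sketch of what such inference could allow (this is not valid Zig today; example3 is just an illustrative name):

fn example3(abc: Abc) void {
    if (abc == .a) return;
    // Hypothetically, the compiler would now know abc can only be .b or .c,
    // so this switch could be considered exhaustive without an .a prong.
    switch (abc) {
        .b => {},
        .c => {},
    }
}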

1 Like

Related proposal: proposal: enum switch narrowing · Issue #12863 · ziglang/zig · GitHub

1 Like

Isn’t this essentially describing a specific case of data flow analysis, or am I misunderstanding something?

To be fair, I am quite new to compilers specifically, but from how I understand both things (data flow analysis and this thread), this seems to be exactly that, doesn’t it?

It might be a rather ugly solution, but I am considering an approach that regards enums as integers and complex control flow as a state machine.

const std = @import("std");

const Abc = enum(i3) { a = -1, b = -2, c = 1 };

/// Widens an enum's tag into an unsigned integer with `widen_bit` extra
/// high bits, so those bits can be used as flags when switching.
fn EnumTagWider(comptime E: type, comptime widen_bit: comptime_int) type {
    return struct {
        pub const WidenedTag = std.meta.Int(.unsigned, @bitSizeOf(E) + widen_bit);
        const UsTag = std.meta.Int(.unsigned, @bitSizeOf(E));
        /// Reinterpret the (possibly signed) tag as unsigned and zero-extend it.
        pub fn widen(tag: E) WidenedTag {
            return @as(UsTag, @bitCast(@intFromEnum(tag)));
        }
    };
}

fn example(abc: Abc) void {
    const W = EnumTagWider(Abc, 1);
    const specmask: W.WidenedTag = 1 << @bitSizeOf(Abc);
    sw: switch (W.widen(abc)) {
        W.widen(.a) => {},
        W.widen(.b) | specmask => {
            // b-specific code
        },
        W.widen(.c) | specmask => {
            // c-specific code
        },
        // Plain b or c values land here first: run the common code, then
        // re-dispatch with the flag bit set to reach the specific prongs above.
        else => |bc| {
            //
            // ...common code for b and c...
            //
            continue :sw bc | specmask;
        },
    }
}