Partially Matching Zig Enums

Backstory: I needed a partial enum match for "cli: refactor main.zig" (Pull Request #3152 by matklad in tigerbeetle/tigerbeetle on GitHub), got annoyed enough to start writing a bug report about the compiler complaining about non-exhaustive switches over comptime-known values, and then it dawned on me…

Not a meaningful difference, but note that you can also use @compileError for this:

    switch (u) {
        inline .a, .b, .c => |_, ab| {
            handle_ab();
            switch (ab) {
                .a => handle_a(),
                .b => handle_b(),
                else => @compileError("must be a or b"),
            }
        },
    }

    switch.zig:14:25: error: must be a or b
                    else => @compileError("must be a or b"),
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
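
For comparison, here is a minimal self-contained sketch of the same pattern with comptime unreachable in place of @compileError. The union U and the function describe are invented for illustration and are not from the thread; the sketch assumes a tagged union, which is what the |_, tag| capture form expects.

```zig
const std = @import("std");

// Hypothetical tagged union, invented for this sketch.
const U = union(enum) {
    a: u32,
    b: u32,
    c: void,
};

fn describe(u: U) []const u8 {
    return switch (u) {
        // One shared prong, instantiated once per listed tag; `tag` is comptime-known.
        inline .a, .b => |_, tag| switch (tag) {
            .a => "a",
            .b => "b",
            // Not analyzed for .a or .b; it only fails the build if another tag joins the prong.
            else => comptime unreachable,
        },
        .c => "c",
    };
}

test "partial match over a and b" {
    try std.testing.expectEqualStrings("a", describe(.{ .a = 1 }));
    try std.testing.expectEqualStrings("c", describe(.c));
}
```

Because the inner switch runs on a comptime-known tag, only the matching prong is analyzed in each instantiation, so the else branch costs nothing at runtime.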

this is really cool! I had no idea you could use inline on anything other than else. bytecode VMs are top of mind right now, which is a place where this would be really valuable for me:

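  dispatch: switch (instruction) {
      // Assuming Instruction is a plain enum: the capture itself is the
      // comptime-known tag (the |_, tag| form is for tagged unions).
      inline .load0, .load1, .load2, .load3, .load => |tag| {
          const slot = switch (tag) {
              .load => self.read(u8),
              // @intFromEnum needs a typed enum value, not a bare enum literal.
              else => @intFromEnum(tag) - @intFromEnum(Instruction.load0),
          };
          self.push(self.locals[slot]);
          continue :dispatch self.read(Instruction);
      },
      // ...
  }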

This reminds me of a thread from the end of last year: Sub switch pattern - #4 by joed

I threw together a type-safe way of doing something very similar to this, without having to spell out every case for the inline prong, and without having to explicitly add a comptime unreachable, allowing you to write code like this:

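const std = @import("std");
// groupBy is the helper described above; its definition is not shown in this post.

const Operator = enum {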
    pub const Arity = enum { nullary, unary, binary };

    halt,
    neg,
    inc,
    add,
    sub,

    pub fn arity(self: Operator) Arity {
        return switch (self) {
            .halt => .nullary,
            .neg, .inc => .unary,
            .add, .sub => .binary,
        };
    }
};

pub fn main() !void {
    var op: Operator = .neg;
    _ = .{&op};

    switch (groupBy(op, Operator.arity)) {
        .nullary => |o| {
            switch (o) {
                .halt => std.debug.print("halt\n", .{}),
            }
        },
        .unary => |o| {
            switch (o) {
                .neg => std.debug.print("negation\n", .{}),
                .inc => std.debug.print("increment\n", .{}),
            }
        },
        .binary => |o| {
            switch (o) {
                .add => std.debug.print("addition\n", .{}),
                .sub => std.debug.print("subtraction\n", .{}),
            }
        },
    }
}

I have a modified and hacky version in joedavis/metax on GitHub ("Miscellaneous metaprogramming facilities for Zig") that works with tagged unions as well as enums, but I haven’t tested it in a few months on recent Zig versions.

I was going to argue for this instead:

  dispatch: switch (instruction) {
      // Capture the matched value: after a continue, `instruction` still
      // holds the original operand, not the one being dispatched.
      .load0, .load1, .load2, .load3 => |tag| {
          const slot = @intFromEnum(tag) - @intFromEnum(Instruction.load0);
          self.push(self.locals[slot]);
          continue :dispatch self.read(Instruction);
      },
      .load => {
          const slot = self.read(u8);
          self.push(self.locals[slot]);
          continue :dispatch self.read(Instruction);
      },
      // ...
  }

which, in general, I think would be preferable. If the self.push(self.locals[slot]); logic were nontrivial, you could call a common function from the two prongs.

This avoids generic bloat for the loadX tags. However… given that this is a hot dispatch loop, and the logic is trivial, the inline bloat might actually be helping you out since there will be 5 continue :dispatch sites rather than 2, potentially improving branch prediction. Furthermore, having a separate prong for each tag could help the optimizer with lowering to a jump table.

You’ll have to measure and report back!

this was exactly my thinking! in any case, it’s neat that the tools are there. like in @joed’s linked post - you can imagine using this for operators, especially when typed like .int32_add/.int64_add/.f32_add, etc

In real code, I’d recommend not using my solution, and instead using @matklad’s from the original post. Looking at the dates, I was more than likely a bit tipsy during some downtime while prepping New Year’s dinner for my partner and in-laws when I came up with this solution.

For posterity, the post I was actually writing when I got distracted by tagged union matching:

First of all – really excellent article, I enjoyed reading it end to end.

This form used to be rather important, as Zig lacked a counting loop. It has for(0..10) |i| form now, so I am tempted to call the while-with-increment redundant.

Annoyingly,

while (condition) {
    defer increment;

    body
}

is almost equivalent to

while (condition) : (increment) {
  body
}

But not exactly: if body contains a return, break or try, the defer version would run the increment one extra time, which is useless and might be outright buggy. Oh well.
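
A small sketch of that difference, using two hypothetical search helpers (findWithContinue and findWithDefer are invented for illustration). Both stop at the first match, but the defer version still bumps the index on its way out.

```zig
const std = @import("std");

// Continue-expression form: the increment is skipped when the body returns.
fn findWithContinue(items: []const u8, needle: u8, index: *usize) bool {
    while (index.* < items.len) : (index.* += 1) {
        if (items[index.*] == needle) return true; // index stays on the match
    }
    return false;
}

// Defer form: the increment runs on every exit from the body, including return.
fn findWithDefer(items: []const u8, needle: u8, index: *usize) bool {
    while (index.* < items.len) {
        defer index.* += 1;
        if (items[index.*] == needle) return true; // index ends one past the match
    }
    return false;
}

test "defer runs the increment one extra time" {
    var i: usize = 0;
    var j: usize = 0;
    try std.testing.expect(findWithContinue("abc", 'b', &i));
    try std.testing.expect(findWithDefer("abc", 'b', &j));
    try std.testing.expectEqual(@as(usize, 1), i);
    try std.testing.expectEqual(@as(usize, 2), j);
}
```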

I’ve run into this before. I was wondering if this could be solved by a continuedefer-like keyword.

From Declaration Literals:

@coerceTo

This builtin doesn’t exist. I’m assuming you meant @as.

That’s pseudocode! I am talking specifically about the actual coercion operation applied by the compiler (eg, widening from u8 to u32). In my mental model, @as doesn’t do coercion (the user-space certainly doesn’t), it only sets the result type. Coercion is then inserted by compiler when it notices that the actual and expected types are different, but compatible.
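
One way to picture that mental model, as a rough sketch (the three function names are invented for illustration): the same u8 to u32 widening happens whether the result type comes from a return type, a declaration, or @as.

```zig
const std = @import("std");

// The return type acts as the result type; the widening is implicit.
fn widenByReturn(x: u8) u32 {
    return x;
}

// The declared type of `wide` is the result type here.
fn widenByDecl(x: u8) u32 {
    const wide: u32 = x;
    return wide;
}

// @as only names the result type; the same widening coercion applies.
fn widenByAs(x: u8) u32 {
    return @as(u32, x);
}

test "all three spellings coerce the same way" {
    try std.testing.expectEqual(@as(u32, 200), widenByReturn(200));
    try std.testing.expectEqual(@as(u32, 200), widenByDecl(200));
    try std.testing.expectEqual(@as(u32, 200), widenByAs(200));
}
```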

  • How is a reader supposed to know that?
  • Using pseudocode (without mentioning it) in a post talking specifically about the syntax of a language seems like a strange choice

Nice article. I agree Zig’s syntax is lovely. (and also agree that one while with incrementer syntax is a bit bleh.)

While pointer type is prefix, pointer dereference is postfix, which is a more natural subject-verb order to read:

I do think there’s a better way to phrase the reason than “natural subject-verb” ordering as a justification for this: as you read from left to right, the operations work on the type from left to right. I believe this ordering is true of all unconditional reductions of a more complicated type to a simpler type in Zig.

E.g., if value is a *const ?[3]u32 and you write value.*.?[0], then .* removes the *const, .? removes the ?, and [0] removes the [3], leaving only u32. This is also true of field access on structs and unions.
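
The same reduction, spelled out one step at a time in a small sketch (firstElement is a hypothetical helper, not from the thread). The type annotations on each local show which layer the operator removed.

```zig
const std = @import("std");

// Hypothetical helper: each postfix operator strips one layer of the type.
fn firstElement(value: *const ?[3]u32) u32 {
    const opt: ?[3]u32 = value.*; // .* removes *const
    const arr: [3]u32 = opt.?; // .? removes ? (asserts non-null)
    const elem: u32 = arr[0]; // [0] removes [3]
    std.debug.assert(elem == value.*.?[0]); // the chained form reads the same way
    return elem;
}

test "left-to-right reduction" {
    const v: ?[3]u32 = [3]u32{ 7, 8, 9 };
    try std.testing.expectEqual(@as(u32, 7), firstElement(&v));
}
```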

Indeed, in languages that follow C’s pointer syntax, pointer following is self-inconsistent. In Zig, a single-item pointer, a multi-item pointer, and a slice (all forms of pointers) have their accessor on the right.

Where this pattern doesn’t extend shows a different pattern: Accesses which do invoke control flow tend to wrap the expression: if and while for options, for for arrays and slices, and switch for enums and numbers.

orelse, catch, and, and or are kind of together in their own third class, as binary operators with control flow, where they sit between two expressions.

I like how while syntax lends itself to pointer chasing.

var current: ?*Node = head;
while (current) |c| : (current = c.next) {
  // ...
}
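
A runnable version of that loop, as a minimal sketch. The Node type and the sum function are assumptions made for illustration; the thread does not define them.

```zig
const std = @import("std");

// Hypothetical singly linked node, invented for this sketch.
const Node = struct {
    value: u32,
    next: ?*const Node = null,
};

fn sum(head: ?*const Node) u32 {
    var total: u32 = 0;
    var current = head;
    // The optional capture unwraps the node; the continue expression follows the link.
    while (current) |c| : (current = c.next) {
        total += c.value;
    }
    return total;
}

test "walks the whole list" {
    const c = Node{ .value = 3 };
    const b = Node{ .value = 2, .next = &c };
    const a = Node{ .value = 1, .next = &b };
    try std.testing.expectEqual(@as(u32, 6), sum(&a));
}
```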

As for the value.* syntax, here’s how I think about it:

(assuming value is a pointer)
Want to set field foo of value? value.foo = ...
Want to set field bar of value? value.bar = ...
Want to set all of what value points to? value.* = .{ ... }
value is a (pointer to a) number? value.* = 42

So to me the .* is like saying “all the fields / content” :^)
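
The same reading in a compilable sketch (Point and demo are hypothetical names used only for illustration):

```zig
const std = @import("std");

const Point = struct { x: i32, y: i32 };

fn demo(point: *Point, number: *u32) void {
    point.x = 1; // set one field through the pointer
    point.* = .{ .x = 10, .y = 20 }; // replace everything the pointer refers to
    number.* = 42; // same syntax when the pointee is a plain number
}

test "assigning through pointers" {
    var p = Point{ .x = 0, .y = 0 };
    var n: u32 = 0;
    demo(&p, &n);
    try std.testing.expectEqual(@as(i32, 10), p.x);
    try std.testing.expectEqual(@as(u32, 42), n);
}
```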

Nice post!

if body contains a return, break or try, the defer version would run the increment one extra time, which is useless and might be outright buggy.

fwiw I find it useful for custom iterators:

pub const FilteredIterator = struct {
    items: []Actor,

    // iterator state
    index: usize = 0,

    // filters
    flags: Actor.FlagSet = .initEmpty(),

    pub fn next(it: *FilteredIterator) ?Actor.Handle {
        while (it.index < it.items.len) {
            defer it.index += 1;
            if (it.filterAccept()) |handle| return handle;
        }
        return null;
    }

    /// return current handle iff it passes the filters
    fn filterAccept(it: *FilteredIterator) ?Actor.Handle { ... }
}
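
A stripped-down, self-contained variant of the same idea, as a sketch: EvenIterator is a hypothetical stand-in for the iterator above, filtering plain integers instead of actors.

```zig
const std = @import("std");

// Hypothetical iterator over even numbers, invented for this sketch.
const EvenIterator = struct {
    items: []const u32,
    index: usize = 0,

    fn next(it: *EvenIterator) ?u32 {
        while (it.index < it.items.len) {
            // Runs on every exit from the body, including the early return below,
            // so the next call resumes after the item just returned.
            defer it.index += 1;
            const item = it.items[it.index];
            if (item % 2 == 0) return item;
        }
        return null;
    }
};

test "yields each even item once" {
    var it = EvenIterator{ .items = &.{ 1, 2, 3, 4 } };
    try std.testing.expectEqual(@as(?u32, 2), it.next());
    try std.testing.expectEqual(@as(?u32, 4), it.next());
    try std.testing.expectEqual(@as(?u32, null), it.next());
}
```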

Before I used defer I kept forgetting this crucial line, causing an infinite loop because the iterator would return the same actor over and over again:

        while (it.index < it.items.len) : (it.index += 1) {
            if (it.filterAccept()) |handle| {
                it.index += 1; // don't forget this line
                return handle;
            }
        }

Thanks, that’s another great reason for why this syntax form is bad =]

Related reading: Tagged Union Subsets with Comptime in Zig – Mitchell Hashimoto
