Advanced use of comptime: Tagged Union Subsets

dimdin · September 24, 2024, 8:09am

https://mitchellh.com/writing/zig-comptime-tagged-union-subset

Tosti · September 25, 2024, 7:52pm

I always try to stick to the rule “a function should take as parameters only what it really requires and nothing more”. The scope function takes an Action, but its body makes it apparent that it never looks at the union’s payload. What it really requires is the tag type, so I would do something like this:

const ActionTag = @typeInfo(Action).@"union".tag_type.?;
pub fn scope(action_tag: ActionTag) Scope {
  return switch (action_tag) {
    .quit, .close_all_windows, .open_config, .reload_config => .app,
    .new_window, .close_window, .scroll_lines => .terminal,
  };
}

There is an additional benefit: you don’t have to construct a variable of type Action just to pass it to scope. @unionInit(Action, field.name, undefined) will become @field(ActionTag, field.name), which is simpler and doesn’t involve undefined.
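A minimal sketch of that benefit, assuming the Zig 0.14 type-info field names. The Action and Scope definitions are trimmed stand-ins for the ones in the blog post, and countInScope is a hypothetical helper that exists only for illustration:

```zig
const std = @import("std");

// Trimmed stand-ins for the blog post's types (assumed, not copied verbatim).
const Scope = enum { app, terminal };
const Action = union(enum) {
    quit,
    open_config,
    new_window,
    scroll_lines: i16,
};

const ActionTag = @typeInfo(Action).@"union".tag_type.?;

// scope only inspects the tag, so the tag is all it takes.
fn scope(action_tag: ActionTag) Scope {
    return switch (action_tag) {
        .quit, .open_config => .app,
        .new_window, .scroll_lines => .terminal,
    };
}

// Hypothetical helper: counts the actions in a scope without ever
// constructing an Action value or touching undefined.
fn countInScope(comptime s: Scope) usize {
    var n: usize = 0;
    inline for (@typeInfo(ActionTag).@"enum".fields) |field| {
        // @field on an enum type yields the tag with that name.
        const tag = @field(ActionTag, field.name);
        if (scope(tag) == s) n += 1;
    }
    return n;
}
```

The loop never needs a payload, so there is nothing to initialize with undefined.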

I wanted to test my suggestion, but it turned out that switching over ScopedAction(.app) and ScopedAction(.terminal) is not allowed.

const elem = @unionInit(ScopedAction(.app), "quit", {});
switch (elem) {
    .quit => std.debug.print("quit", .{}),
    else => {},
}

src\main.zig:81:13: error: switch on union with no attached enum
    switch (elem) {
            ^~~~
src\main.zig:46:10: note: consider 'union(enum)' here
  return @Type(.{ .@"union" = .{
         ^~~~~

@"union" is initialized with .tag_type = null. ScopedAction also has to construct a tag type and use it here instead of this null. This is my solution. Maybe it can be simplified somehow.

pub fn ScopedAction(comptime s: Scope) type {
  const action_tag_info = @typeInfo(ActionTag).@"enum";
  const action_info = @typeInfo(Action).@"union";

  const all_enum_fields = action_tag_info.fields;
  const all_union_fields = action_info.fields;

  var i: usize = 0;
  var enum_fields: [all_enum_fields.len]std.builtin.Type.EnumField = undefined;
  var union_fields: [all_union_fields.len]std.builtin.Type.UnionField = undefined;
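  for (all_enum_fields, all_union_fields) |enum_field, union_field| {
    const action_tag = @field(ActionTag, enum_field.name);
    if (scope(action_tag) == s) {
      // Renumber the tags from 0 so they fit the narrower backing integer;
      // keeping the original values can exceed its range.
      enum_fields[i] = .{ .name = enum_field.name, .value = i };
      union_fields[i] = union_field;
      i += 1;
    }
  }

  // Number of bits needed to represent i, which covers tag values 0..i-1.
  const log2_i = @bitSizeOf(@TypeOf(i)) - @clz(i);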

  const ScopedActionTagType = @Type(.{ .int = .{
    .signedness = .unsigned,
    .bits = log2_i,
  } });

  const ScopedActionTag = @Type(.{ .@"enum" = .{
    .tag_type = ScopedActionTagType,
    .fields = enum_fields[0..i],
    .decls = &.{},
    .is_exhaustive = action_tag_info.is_exhaustive,
  } });

  return @Type(.{ .@"union" = .{
    .layout = action_info.layout,
    .tag_type = ScopedActionTag,
    .fields = union_fields[0..i],
    .decls = &.{},
  } });
}

Edit: make ScopedActionTag’s backing type an unsigned integer of optimal size.
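A small sketch of the width calculation, for anyone checking the arithmetic. bitLength mirrors the log2_i expression above. bitsForCount is a hypothetical alternative that assumes the scoped tags are numbered 0..i-1 and that i is at least 1:

```zig
const std = @import("std");

// Same expression as log2_i in the post: the bit length of i itself.
fn bitLength(i: usize) u16 {
    return @bitSizeOf(usize) - @as(u16, @clz(i));
}

// Hypothetical alternative: bits needed for the largest tag value, i - 1.
// Assumes tags are numbered 0..i-1 and i >= 1 (log2_int_ceil asserts i != 0).
fn bitsForCount(i: usize) u16 {
    return std.math.log2_int_ceil(usize, i);
}
```

The two agree except when the field count is a power of two: for four fields, bitLength gives 3 while the values 0..3 fit in 2 bits. Either width only works if the field values actually fit in it.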

mnemnion · October 13, 2024, 8:04pm

I happen to be working on something which uses this technique, and hit a snag. Not a big one, but I’m adding some notes here on how to adapt the technique in the blog post to the 0.14 master branch target.

Running an adaptation of the subsetting code gave this error:

error: switch on union with no attached enum

I assume this is a new development: the code in the blog uses the old syntax for Type enums, so 0.13 presumably did not impose this requirement.

No problem! We need an enum… let’s make one.

If you refer to the original post the problem arises here:

  // Build our union
  return @Type(.{ .Union = .{
    .layout = .auto,
    .tag_type = null,
    .fields = fields[0..i],
    .decls = &.{},
  } });
}

See that null tag_type? It reifies an untagged union, and that is exactly what the compiler error is complaining about.
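The difference can be reproduced in isolation. This is a minimal sketch assuming Zig 0.14; Tag, Reified and isQuit are hypothetical names, not part of the blog post's code:

```zig
const std = @import("std");

const Tag = enum { quit, open_config };

// Hypothetical helper: reifies the same two-field union, with or without
// an enum attached as its tag type.
fn Reified(comptime tag_type: ?type) type {
    return @Type(.{ .@"union" = .{
        .layout = .auto,
        .tag_type = tag_type,
        .fields = &[_]std.builtin.Type.UnionField{
            .{ .name = "quit", .type = void, .alignment = @alignOf(void) },
            .{ .name = "open_config", .type = void, .alignment = @alignOf(void) },
        },
        .decls = &.{},
    } });
}

const Tagged = Reified(Tag);
// Still a type, but a value of it cannot be the operand of a switch.
const Untagged = Reified(null);

// Compiles only because Tagged carries an enum tag.
fn isQuit(elem: Tagged) bool {
    return switch (elem) {
        .quit => true,
        .open_config => false,
    };
}
```

Changing the parameter type of isQuit to Untagged should bring back the "switch on union with no attached enum" error.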

But we can easily adapt the technique used to make a union subset, to make an enum subset as well:

pub fn ScopedActionEnum(comptime s: Scope) type {
    const e_info = @typeInfo(ActionEnum);
    const all_fields = e_info.@"enum".fields;
    var i: usize = 0;
    var fields: [all_fields.len]std.builtin.Type.EnumField = undefined;
    for (all_fields) |field| {
        const action = @unionInit(Action, field.name, undefined);
        if (action.scope() == s) {
            fields[i] = field;
            i += 1;
        }
    }
    return @Type(.{ .@"enum" = .{
        .tag_type = e_info.@"enum".tag_type,
        .fields = fields[0..i],
        .decls = &.{},
        .is_exhaustive = true,
    } });
}

This is very similar code, and works for a hopefully obvious reason: an enum and a union tagged with it have the same field names, so we can use the same scope() function to match to the scope.

Then modify the original ScopedAction code accordingly:

/// Returns a union type that only contains actions that are scoped to
/// the given scope.
pub fn ScopedAction(comptime s: Scope) type {
  const all_fields = @typeInfo(Action).@"union".fields;

  // Find all fields that are scoped to s
  var i: usize = 0;
  var fields: [all_fields.len]std.builtin.Type.UnionField = undefined;
  for (all_fields) |field| {
    const action = @unionInit(Action, field.name, undefined);
    if (action.scope() == s) {
      fields[i] = field;
      i += 1;
    }
  }

  // Construct the corresponding enum
  const ScopedEnumType = ScopedActionEnum(s);

  // Build our union
  return @Type(.{ .@"union" = .{
    .layout = .auto,
    .tag_type = ScopedEnumType,
    .fields = fields[0..i],
    .decls = &.{},
  } });
}

With that, we’re back in business.

This has a nice additional property: the enum fields carry the value, so tags of the same name carry the same value, of the same type, whether scoped or global.
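A sketch of that property, assuming Zig 0.14. The post does not show how ActionEnum is defined, so std.meta.Tag(Action) is assumed here. Action is a trimmed stand-in, scope takes the tag directly to keep the example short, and widen is a hypothetical helper:

```zig
const std = @import("std");

const Scope = enum { app, terminal };
const Action = union(enum) {
    quit,
    open_config,
    new_window,
    scroll_lines: i16,
};
// Assumption: ActionEnum is the tag type of Action.
const ActionEnum = std.meta.Tag(Action);

fn scope(tag: ActionEnum) Scope {
    return switch (tag) {
        .quit, .open_config => .app,
        .new_window, .scroll_lines => .terminal,
    };
}

fn ScopedActionEnum(comptime s: Scope) type {
    const info = @typeInfo(ActionEnum).@"enum";
    var i: usize = 0;
    var fields: [info.fields.len]std.builtin.Type.EnumField = undefined;
    for (info.fields) |field| {
        if (scope(@field(ActionEnum, field.name)) == s) {
            fields[i] = field; // keeps the original name and value
            i += 1;
        }
    }
    return @Type(.{ .@"enum" = .{
        .tag_type = info.tag_type, // same backing integer as ActionEnum
        .fields = fields[0..i],
        .decls = &.{},
        .is_exhaustive = true,
    } });
}

// Hypothetical helper: because values are shared, a scoped tag converts
// to the global tag through its integer value.
fn widen(tag: anytype) ActionEnum {
    return @enumFromInt(@intFromEnum(tag));
}
```

With these definitions, @intFromEnum gives the same integer for scroll_lines in ScopedActionEnum(.terminal) and in ActionEnum.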

Challenge to the interested reader: use this ability to write a generic function which casts any ScopedAction back to the superset Action.
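One possible answer, sketched under assumptions rather than offered as the intended solution. TerminalAction is a hand-written stand-in for ScopedAction(.terminal), and toAction is a hypothetical name. This version matches fields by name with an inline switch prong instead of relying on the shared tag values:

```zig
const std = @import("std");

const Action = union(enum) {
    quit,
    new_window,
    scroll_lines: i16,
};

// Hand-written stand-in for ScopedAction(.terminal); a reified union with
// an attached enum tag should behave the same way here.
const TerminalAction = union(enum) {
    new_window,
    scroll_lines: i16,
};

// Hypothetical helper: widens any tagged union whose fields are a subset
// of Action's fields, with matching names and payload types.
fn toAction(scoped: anytype) Action {
    return switch (scoped) {
        // In an inline prong the tag is comptime-known, so its name can
        // select the matching field of Action.
        inline else => |payload, tag| @unionInit(Action, @tagName(tag), payload),
    };
}
```

A call such as toAction(TerminalAction{ .scroll_lines = -3 }) yields an Action whose active field is scroll_lines with the same payload. A scoped field that has no counterpart in Action is a compile error.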

mnemnion · October 13, 2024, 8:06pm

I see I got scooped on this. Hah!

I would weakly argue that being able to create the enum itself separately from the union has some advantages. But fundamentally these are the same technique.

Now if only there were a way to add declarations to a reified Type…

Sze · October 14, 2024, 12:05am

My suspicion is that not having it is a feature: it keeps comptime metaprogramming within the realm of “more understandable meta programming logic”.

Adding the possibility of declarations would lead towards things being done with metaprogramming that fewer and fewer people are able to understand and maintain.

I am also not convinced that those more complex forms of metaprogramming would provide enough “bang for buck” to be worth it. For example, if Zig allowed that, it would drift back towards C++ operator overloading territory, because people would use it to create ugly math DSLs and other redundant ways to do the same thing.

At least that is my current view/stance on that topic.

I think if people create DSLs, those are better done as small languages via the build system and build steps.

mnemnion · October 14, 2024, 1:27am

My suspicion is that it’s not obvious how to add it, more than anything. It’s a variation on why the Declaration type is just a name.

When you can reify a type by constructing it from parts, I see no obvious reason to have that ability, but not be able to associate behaviors with that type. I see no principled “here, but no farther” on this. Zig has reified types, it has declarations on not-reified types, I don’t think combining these two features would make things hard to understand.

I think this has approximately nothing to do with operator overloading. You would have ReifiedType: type, you’d stick pub fn doSomething(rt: ReifiedType) void on it (this is the challenging part!), and call it with reified_type.doSomething(). That doesn’t hide control flow, or anything really, it’s just a type built from parts, a thing we already have, and a member function, another thing we already have.

You’d never be left scratching your head wondering what + means, which is the main problem with operator overloading. This is just a function you can look up, attached to a type you can also look up.

Again, I don’t think DSLs play a role here.

Like, concretely, it would be nice to add a declaration function to ScopedCommand(.text) or whatever, so I can turn it back into an ordinary Command. That isn’t going to break the bank on language complexity. I just end up writing it as a not-member function, so it’s toCommand(cmd: ScopedCommand(.text)) Command, but I have to call it with toCommand(cmd) instead of cmd.toCommand().

Clearly not a big deal either way, but the latter version is more idiomatic Zig, I’m sure you’d agree. I don’t understand the objection basically.

Like this syntax probably doesn’t work, ok, but forgetting that for a second:

const NewType =  @Type(.{ .@"enum" = .{
        .tag_type = e_info.@"enum".tag_type,
        .fields = fields[0..i],
        .is_exhaustive = true,
        .decls = struct {
             pub fn doSomething(thing: NewType) []const u8 {
                 return "This did something";
             }
        },
    } });

This isn’t any harder to understand than the version without the anonymous struct container, it just does something. A language server should be able to do go-to-definition and hover actions on this no problem.