"Upgrading" union to union(enum)

The AIs are telling me (and based on a quick doc review, I can’t deny) that there is neither a comptime way to generate nor a syntax for upgrading an untagged union to a tagged union. This becomes relevant when you are handling struct serializations in communications settings, and you want to be able to sometimes pass around a tagged message as a union(enum) of the structs on the wire, but at other times you want to pass around a pointer to an untagged message union (e.g., because you either implicitly know the tag or are doing byte-layout operations).

My question is a bit theoretical because I am just playing around with the language to get a sense for its idioms. What I’m thinking about is: If I’ve defined a bunch of packed struct{} for the messages, and I have a union{} of those … do I really have to copy-pasta the body of the union to get a union(enum){} of those?

Hey, welcome to Ziggit!

If you’re serializing the union directly and your structs are packed anyway, you’ll probably want to use a packed union. A regular union doesn’t have a guaranteed memory layout (a safety tag is added in Debug and ReleaseSafe modes).

Something like this should work:

const Packed = packed union { ... };
const Tagged = ty: {
    var info = @typeInfo(Packed).@"union";
    info.layout = .auto;
    info.tag_type = std.meta.FieldEnum(Packed);
    // Necessary because packed fields all have alignment 0 which is not allowed in normal fields
    // If you're using a regular union you can skip this step
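    var new_fields: [info.fields.len]std.builtin.Type.UnionField = undefined;
    for (info.fields, &new_fields) |field, *new_field| {
        new_field.* = field;
        new_field.alignment = @alignOf(field.type);
    }
    // Use the realigned copies; without this the loop above has no effect
    info.fields = &new_fields;
    // @Type rejects declarations, so drop any that Packed may have
    info.decls = &.{};
    break :ty @Type(.{ .@"union" = info });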
};

const std = @import("std");

Note that it’s not possible to have declarations or member functions in Tagged if you do it this way because reified types can’t have those.

You could also keep only the tagged union and serialize it like this (assuming all of your union members are the same size and your backing int is a u32; this works for other backing ints too):

const as_int: u32 = switch (un) {
    inline else => |data| @bitCast(data),
};

This way you’ll just end up with a raw int you can write wherever.
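For context, here is a self-contained sketch of that idea. The `Ping` and `Ack` messages, the `toBits` and `writeMessage` names, and the little-endian byte order are all assumptions made for illustration, not anything from the original post.

```zig
const std = @import("std");

// Hypothetical messages, both backed by a u32.
const Ping = packed struct(u32) { seq: u32 };
const Ack = packed struct(u32) { seq: u16, flags: u16 };

const Message = union(enum) {
    ping: Ping,
    ack: Ack,

    // Reinterpret whichever payload is active as its backing integer.
    pub fn toBits(self: Message) u32 {
        const as_int: u32 = switch (self) {
            inline else => |data| @bitCast(data),
        };
        return as_int;
    }
};

// Write the raw integer into a byte buffer in little-endian order
// (the byte order is an assumption of this sketch).
pub fn writeMessage(buf: *[4]u8, msg: Message) void {
    std.mem.writeInt(u32, buf, msg.toBits(), .little);
}
```

Packed struct fields are laid out starting from the least significant bits, so the first field of `Ack` ends up in the low half of the integer.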

When you’re dealing with a lot of instances at once and maybe buffering them it probably makes sense to store the tag either implicitly or separately to save some memory bandwidth.
You could do that by e.g. having one ArrayList per tag or with a MultiArrayList(struct { tag: std.meta.FieldEnum(Union), data: [backing int] }) and then @bitCasting the data entries according to their tag (if they’re all the same size).
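A rough sketch of the MultiArrayList variant, assuming every payload fits a u32 backing int. `Ping`, `Ack`, `Entry`, `toEntry` and `sumPingSeqs` are hypothetical names used only for this example.

```zig
const std = @import("std");

// Hypothetical messages, both backed by a u32.
const Ping = packed struct(u32) { seq: u32 };
const Ack = packed struct(u32) { seq: u16, flags: u16 };
const Message = union(enum) { ping: Ping, ack: Ack };

// Tag and payload bits live in separate columns of the MultiArrayList.
const Entry = struct {
    tag: std.meta.Tag(Message),
    data: u32,
};

// Split a tagged message into its tag and raw payload bits.
pub fn toEntry(msg: Message) Entry {
    const bits: u32 = switch (msg) {
        inline else => |data| @bitCast(data),
    };
    return .{ .tag = std.meta.activeTag(msg), .data = bits };
}

// Walk both columns and reinterpret only the entries tagged as ping.
pub fn sumPingSeqs(list: std.MultiArrayList(Entry)) u64 {
    var total: u64 = 0;
    for (list.items(.tag), list.items(.data)) |tag, bits| {
        if (tag == .ping) {
            const ping: Ping = @bitCast(bits);
            total += ping.seq;
        }
    }
    return total;
}
```

Because the tags sit in their own contiguous array, a pass that only filters by tag never touches the payload column.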

Thanks for this. So, it seems at minimum that the @typeInfo way of building a tagged version of the union at comptime is in fact supported. You called this a reified type, so I will have to do a bit more reading to understand runtime type information in Zig, because I would assume there is a reified type for textually defined types that have member functions and declarations. If the latter is true, I imagine there is type reflection to discover those member functions and declarations, and you’re saying there’s just no mechanism to “dynamically” build those structures at comptime (perhaps the type reflection data are only available non-comptime; sorry, I’m forgetting the terminology for non-comptime).

I’ve so far assumed that Zig’s “comptime” approach yields true parity between what you can express as text source code and what you can do at comptime … but I guess there are some exceptions.

I said “serialization” but truly, I mean type punning (network gives you byte array, you point the packed union to an offset within that array and use the members to read the data). The wire layout is the memory layout.

I guess you could put the tagged union in as a member to another struct that carries the member functions…
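A minimal sketch of that wrapper idea, assuming the 0.14-style `@typeInfo` field names used earlier in the thread. `Raw`, `Message` and `isPing` are hypothetical names, and a regular (non-packed) union is used so the alignment fix-up is not needed.

```zig
const std = @import("std");

// Hypothetical untagged union standing in for the message union.
const Raw = union { ping: u32, ack: u16 };

// Reified tagged counterpart; it cannot carry declarations itself.
const Tagged = blk: {
    var info = @typeInfo(Raw).@"union";
    info.tag_type = std.meta.FieldEnum(Raw);
    break :blk @Type(.{ .@"union" = info });
};

// Hand-written wrapper struct that owns the helper methods.
const Message = struct {
    inner: Tagged,

    pub fn isPing(self: Message) bool {
        return switch (self.inner) {
            .ping => true,
            .ack => false,
        };
    }
};
```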

Zig doesn’t have any runtime type information, types are a comptime-only concept and have to be fully resolved by the time the program is done compiling. The @typeInfo builtin is a way to get some information about a type at comptime. Its inverse is @Type, which takes a std.builtin.Type (e.g. one returned by @typeInfo) and turns it into a real type. You can check whether a type has some declaration or a field (there are even dedicated builtins for this, @hasDecl and @hasField), but only a textually defined type can have them. That’s an intentional limitation of the system.
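To make the limitation concrete, here is a small sketch that round-trips a hand-written struct through @typeInfo and @Type, again assuming 0.14-style field names. `Written` and `Reified` are hypothetical names for this example.

```zig
const std = @import("std");

// A textually defined type with one field and one declaration.
const Written = struct {
    x: u32,

    pub fn double(self: Written) u32 {
        return self.x * 2;
    }
};

// Round-trip through @typeInfo and @Type. The declaration list has to be
// cleared first, because @Type does not accept declarations.
const Reified = blk: {
    var info = @typeInfo(Written).@"struct";
    info.decls = &.{};
    break :blk @Type(.{ .@"struct" = info });
};

// All of these checks run at compile time.
comptime {
    std.debug.assert(@hasField(Written, "x"));
    std.debug.assert(@hasDecl(Written, "double"));
    std.debug.assert(@hasField(Reified, "x"));
    std.debug.assert(!@hasDecl(Reified, "double"));
}
```

The fields survive the round trip, but the member function exists only on the hand-written type.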

I don’t think I understand why you need a tagged union here. I assume you either already know how you want to interpret the bytes by the time you receive them or the information on that is somewhere in the byte stream. Either way I don’t quite see the point in first converting it to a tagged union and then interpreting the data. Why not just interpret it as soon as you know the layout?

Thank you for the reminder on that. I read the docs several weeks/months ago and forgot that fact – I’m looking forward to finding out how that affects (or doesn’t) certain use cases. I ran through a couple of scenarios in my head just now to explore how not having concrete type data at runtime might affect capabilities, but comptime was enough.

Relative to type punning: Right, that’s not where I would use a tagged union. Rather, it’s why I would define the packed union{} vs the packed union(enum){}. Having declared the untagged union, I can see scenarios where having a tagged union would lead to more readable code – e.g. for writing test cases, i.e. for code where you want to pass the resolved variant type and either have copied the data into it or built the instance without punning.

I can’t really justify this yet. As I mentioned in my original post, my question is a bit theoretical. I’m just starting to write in Zig, and also just started this experiment.

I may find, as I build with the language, that the cases I’m thinking about are entirely resolvable without the tagged union. However, I’m assuming there are cases where the ability to switch on the union type tag at runtime will be useful, that you can’t do that with as much syntax sugar without the tagged union, and that there will be use cases where the performance of explicit switching (vs implicit codepath) is acceptable.


I think it would make more sense to have a tagged union that is manually typed in source code with convenience methods etc. and then write a function that generates the packed union from that. So instead of upgrading a packed union to a tagged one, downgrade a tagged one to a packed one (which then may be used to more compactly store the tag info somewhere separately and serialize the union data).

So basically the whole idea of upgrading seems questionable to me, when you could instead go from the more general/descriptive format to the more specialized/optimized one.

I would explore something more along those lines:

const PackedUnion = Packed(TaggedUnion);
const TaggedUnion = union(enum) {
    ...

    pub const toPacked = PackedMethods(@This()).toPacked;
    pub const fromPacked = PackedMethods(@This()).fromPacked;
};
pub fn PackedMethods(comptime Tagged: type) type {
    return struct {
        pub fn toPacked(self: Tagged) Packed(Tagged) {
            ...
        }
        pub fn fromPacked(p: Packed(Tagged), tag: std.meta.Tag(Tagged)) Tagged {
            ...
        }
    };
}
pub fn Packed(comptime Tagged: type) type {
    ...
}

I think it is more likely that something like a packed union ends up being treated as some number of bytes, so I would be fine with it being modeled by a packed union that is generated through reification. The higher-level tagged union, by contrast, is more likely to be used in contexts where you need to manipulate or inspect the actual data, so I would want it to be a manually written type that can have helper and convenience methods.

But I think in the end it depends on how you like to bundle and structure things and what the program actually focuses on.
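As one hypothetical way to flesh out that skeleton (not necessarily what the author had in mind), the sketch below generates the packed union from a hand-written tagged union. It assumes 0.14-style `@Type` field names and that every payload is a packed struct. `Ping`, `Ack` and `Message` are illustrative names, and the conversions are written as free functions to keep the example short.

```zig
const std = @import("std");

// Hypothetical messages, both backed by a u32.
const Ping = packed struct(u32) { seq: u32 };
const Ack = packed struct(u32) { seq: u16, flags: u16 };

// The hand-written tagged union is the source of truth.
const Message = union(enum) { ping: Ping, ack: Ack };

// Generate an untagged packed union with the same fields.
pub fn Packed(comptime Tagged: type) type {
    const info = @typeInfo(Tagged).@"union";
    var fields: [info.fields.len]std.builtin.Type.UnionField = undefined;
    for (info.fields, &fields) |field, *out| {
        out.* = field;
        out.alignment = 0; // packed fields carry alignment 0
    }
    return @Type(.{ .@"union" = .{
        .layout = .@"packed",
        .tag_type = null,
        .fields = &fields,
        .decls = &.{},
    } });
}

// Drop the tag and keep only the payload.
pub fn toPacked(comptime Tagged: type, value: Tagged) Packed(Tagged) {
    return switch (value) {
        inline else => |data, tag| @unionInit(Packed(Tagged), @tagName(tag), data),
    };
}

// Re-attach a tag that the caller knows from elsewhere.
pub fn fromPacked(comptime Tagged: type, p: Packed(Tagged), tag: std.meta.Tag(Tagged)) Tagged {
    return switch (tag) {
        inline else => |t| @unionInit(Tagged, @tagName(t), @field(p, @tagName(t))),
    };
}
```

Calls to `Packed(Message)` with the same argument are memoized, so every use refers to the same generated type.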


Yes. Thanks!