I’m working through ziglings and stumbled on this example:
const TripItem = union(enum) {
    place: *const Place,
    path: *const Path,

    // This is a little helper function to print the two different
    // types of item correctly.
    fn printMe(self: TripItem) void {
        switch (self) {
            // Oops! The hermit forgot how to capture the union values
            // in a switch statement. Please capture both values as
            // 'p' so the print statements work!
            .place => |p| print("{s}", .{p.name}),
            .path => |p| print("--{}->", .{p.dist}),
        }
    }
};
This code works, but I’m confused by two things.
- When would you ever NOT declare a union(enum)? It seems like you would always want it, for the convenience of having the tag available if you need it later. What use case is there for not having that set automatically?
- In previous Ziglings exercises, I had to explicitly declare an enum type and then assign that enum to the union. Following that convention, the example above would look like this:
const TripEnum = enum { place, path };

const TripItem = union(TripEnum) {
    place: *const Place,
    path: *const Path,

    // This is a little helper function to print the two different
    // types of item correctly.
    fn printMe(self: TripItem) void {
        switch (self) {
            // Oops! The hermit forgot how to capture the union values
            // in a switch statement. Please capture both values as
            // 'p' so the print statements work!
            .place => |p| print("{s}", .{p.name}),
            .path => |p| print("--{}->", .{p.dist}),
        }
    }
};
Both snippets compile and run, and I get the same results from each. So it seems like all you really need is just enum, and not a separate TripEnum, which is what I was led to believe.
What’s going on here, and what is actually needed? Why does this work without declaring a TripEnum, and when does it make sense to declare an explicit enum type?
You’re paying a cost for the tag. Even if your union only has two variants, you’ll need at least a bool inside there. This bool, in turn, might increase the size of the type by up to a whole word, because of padding. If you can determine which variant is active by some other means, then you don’t want to pay this price. That other means can be a variable stored somewhere else, or simply the place in the code where the union is used. Sometimes you know that two variants’ lifetimes will never overlap, so you don’t need the tag to determine the active variant.
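To make the size cost concrete, here is a minimal sketch, assuming a recent Zig release. The type names and payloads are hypothetical and chosen only so the sizes are easy to reason about. It uses an extern union for the untagged case because a plain union gets a hidden safety tag in Debug and ReleaseSafe builds, which would blur the comparison.

```zig
const std = @import("std");

// Hypothetical untagged union. `extern` gives it a C-compatible layout,
// so no hidden safety tag is added in any build mode.
const Untagged = extern union {
    a: u64,
    b: u32,
};

// The same payloads, but with a compiler-generated tag stored alongside.
const Tagged = union(enum) {
    a: u64,
    b: u32,
};

pub fn main() void {
    // The tagged version has to store the tag somewhere, and padding
    // rounds the total up to a multiple of the payload's alignment.
    std.debug.print("untagged: {d} bytes\n", .{@sizeOf(Untagged)});
    std.debug.print("tagged:   {d} bytes\n", .{@sizeOf(Tagged)});
}
```

The exact numbers depend on the target, so the sketch prints them rather than assuming them; the point is only that the tagged size is larger than the largest payload.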
union(enum) is syntactic sugar. It will create an anonymous enum in the background. A lot of times you actually need the tag enum for other things, in which case you want to explicitly name it.
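As a rough illustration of the anonymous enum, the sketch below recovers it with std.meta.Tag and reads the active tag with std.meta.activeTag. Shape and ShapeTag are hypothetical names that do not come from the exercise.

```zig
const std = @import("std");

// Hypothetical tagged union; the compiler generates the tag enum.
const Shape = union(enum) {
    circle: f32,
    square: f32,
};

// The generated enum has no name of its own, but it can be recovered.
const ShapeTag = std.meta.Tag(Shape);

pub fn main() void {
    const s = Shape{ .circle = 1.5 };
    const tag: ShapeTag = std.meta.activeTag(s);
    std.debug.print("{s}\n", .{@tagName(tag)}); // prints "circle"
}
```

If you find yourself writing std.meta.Tag(Shape) in many places, that is usually a hint that declaring the enum explicitly would read better.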
Here are some concrete examples of both scenarios:
- Untagged union: zig/lib/std/zig/Ast.zig at 27d4bf753467894836e960bced73740c95e61db8 · ziglang/zig · GitHub
  Here, the extra field is an untagged union, where the expected_tag field of the union is active if and only if the tag is expected_token. As @LucasSantos91 mentioned, there would be a memory cost to redundantly storing a union tag in this case.
  Also, in safe build modes (Debug and ReleaseSafe), a hidden union tag is added to untagged unions so that a runtime check can be inserted for using the wrong field, so you still get some safety when testing.
- Tagged union with explicit tag type: zig/src/Package/Fetch/git.zig at 27d4bf753467894836e960bced73740c95e61db8 · ziglang/zig · GitHub
  Here, the Type enum is also used on its own, and specific integer values are assigned to each field of Type to mirror the underlying data structure and help in parsing the data: zig/src/Package/Fetch/git.zig at 27d4bf753467894836e960bced73740c95e61db8 · ziglang/zig · GitHub
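The second scenario can be sketched roughly as follows. This is not the code from git.zig: Kind, Header, kindFromByte, makeHeader, and the integer values are all hypothetical, and the sketch assumes Zig 0.11 or later for @intFromEnum.

```zig
const std = @import("std");

// Hypothetical type codes with explicit integer values, standing in for
// values that appear in some binary format.
const Kind = enum(u8) {
    commit = 1,
    tree = 2,
    blob = 3,
};

// The same enum doubles as the union's tag.
const Header = union(Kind) {
    commit: u32, // hypothetical payload: a length field
    tree: u32,
    blob: u32,
};

// The enum used on its own: map a raw byte to a Kind before any
// payload has been read. Unknown bytes yield null.
fn kindFromByte(byte: u8) ?Kind {
    return switch (byte) {
        @intFromEnum(Kind.commit) => .commit,
        @intFromEnum(Kind.tree) => .tree,
        @intFromEnum(Kind.blob) => .blob,
        else => null,
    };
}

// Build the union once the kind and the payload are both known.
fn makeHeader(kind: Kind, len: u32) Header {
    return switch (kind) {
        .commit => .{ .commit = len },
        .tree => .{ .tree = len },
        .blob => .{ .blob = len },
    };
}

pub fn main() void {
    if (kindFromByte(2)) |kind| {
        const header = makeHeader(kind, 128);
        std.debug.print("{s}\n", .{@tagName(std.meta.activeTag(header))}); // prints "tree"
    }
}
```

With union(enum) there would be no named type for kindFromByte to return and no way to pin the integer values, which is the practical reason to declare the enum yourself here.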
- Untagged union: …link… Here, the extra field is an untagged union, where the expected_tag field of the union is active if and only if the tag is expected_token. As @LucasSantos91 mentioned, there would be a memory cost to redundantly storing a union tag in this case. Also, in safe build modes (Debug and ReleaseSafe), a hidden union tag is added to untagged unions so that a runtime check can be inserted for using the wrong field, so you still get some safety when testing.
Looking at that code, if I’m understanding this correctly, the extra union seems to be a clever trick to save a bit of memory. So if there aren’t any parsing errors, the extra is just a none: void, which I’m assuming means the extra union isn’t taking up any memory. Conversely, if extra was a union(Tag) or even just Tag, it would take up space. In that case you’d probably have to specify a Tag.NotApplicable enum value as the default case, so it’d be taking up memory in a situation where it wasn’t called for.
- Tagged union with explicit tag type: …link… Here, the Type enum is also used on its own, and specific integer values are assigned to each field of Type to mirror the underlying data structure and help in parsing the data: …link…
This example makes sense, ‘read’ looks like a factory for EntryHeaders. EntryHeader is explicitly declared a union with the Type enum u8 because we know the first byte of data is going to be the EntryHeader Type.
Thanks for your help in understanding all of this, you too @LucasSantos91. I hope I got this right.
There is ALWAYS a slot in the memory of the struct in question for the extra field, and it can fit whatever the biggest option is in the union. Yes, one of the options is void, meaning that the active value might be a 0-bit value, but the memory slot that can fit a Token.Tag is still there and still in the same location, regardless of whether the none: void is the active field or not.
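A small sketch of that point, with hypothetical names (Extra, Error, and a u32 standing in for the Token.Tag payload): the size of the type is fixed at compile time, so both values below occupy the same number of bytes no matter which field is active.

```zig
const std = @import("std");

// Hypothetical stand-in for the `extra` union discussed above.
const Extra = union {
    none: void,
    expected: u32, // stand-in for a Token.Tag-like payload
};

// Hypothetical struct that embeds the union.
const Error = struct {
    code: u8,
    extra: Extra = .{ .none = {} },
};

pub fn main() void {
    const plain = Error{ .code = 1 };
    const detailed = Error{ .code = 2, .extra = .{ .expected = 42 } };

    // Both values have the same type, so they have the same size,
    // even though `plain` only has the zero-bit `none` field active.
    std.debug.print("{d} {d}\n", .{ @sizeOf(@TypeOf(plain)), @sizeOf(@TypeOf(detailed)) });
}
```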
When you have a union, you typically also have a separate field (an enum) which tells you which field is active. So you basically have a struct like struct { kind: enum { ... }, data: { ... } } where you treat the data differently based on kind, but data is always the same size, and kind also occupies memory, at least a byte.
Untagged unions are specifically for those cases where having a kind like I had in my example is redundant. Maybe I have struct { num_interesting_things: u32, data: { ... } } and I can tell how to interpret data based on num_interesting_things, so it’s not necessary to store an extra kind field to tell me how to interpret data. In safe build modes (Debug and ReleaseSafe) the compiler will add that extra field no matter what and make sure you don’t accidentally interpret the data as the wrong type, but for a correctly written program it should be redundant.
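Here is one way the num_interesting_things idea could look in practice. It is only a sketch: Data, Record, and sum are hypothetical, and the rule that a count of 1 means single and a count of 2 means pair is an assumption made up for the example.

```zig
const std = @import("std");

// Untagged union: nothing inside it records which field is active.
const Data = union {
    single: u32,
    pair: [2]u32,
};

const Record = struct {
    // Assumed convention: 1 means `single` is active, 2 means `pair`.
    num_interesting_things: u32,
    data: Data,

    fn sum(self: Record) u32 {
        // The existing field already tells us how to read `data`,
        // so a separate tag would be redundant.
        return switch (self.num_interesting_things) {
            1 => self.data.single,
            2 => self.data.pair[0] + self.data.pair[1],
            else => 0,
        };
    }
};

pub fn main() void {
    const r = Record{ .num_interesting_things = 2, .data = .{ .pair = .{ 3, 4 } } };
    std.debug.print("{d}\n", .{r.sum()}); // prints 7
}
```

If the count and the active field ever disagree, a safe build should panic on the wrong field access instead of silently reading garbage, which is the safety check described above.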