I could turn this union into a packed union, to get rid of the automatically added safety checking tag in safe modes.
However if I do that T has to be a data type with defined memory layout, but I want to allow the user to use whatever they want.
Basically I want to opt out of the safety check even in safe modes, because the tag causes more trouble than it is worth.
This is part of a set of data structures meant to store things compactly. If I can't disable the tag, then I guess I have to avoid the union completely and calculate addresses and offsets directly instead, so that I can have both compactness and let the user choose an arbitrary type for T.
One of the problems with the tag is that it doubles the size of the nodes.
Having vastly different memory layouts while testing than in release also seems unhelpful: while the checks are on, things get tested, but once they are off, everything has a different layout.
Another problem is that I no longer can reliably cast the [*]Node pointer to a [*]align(@alignOf(Node)) T pointer, because the beginning of the node is the tag and thus T reinterprets the debug tag info as T, instead of the real value behind the tag.
Ironically the safety feature makes it more difficult to use the union in a way that is helpful to the goal.
I am wondering whether this is just a case where Zig doesn't want me to use a union?
I guess I might just change it to use head: *Head and then add @sizeOf(Head) to that and forwardAlign to the next [*]T to get to the data.
It just seems a bit unsatisfying having to do this so manually.
If it currently needs to be done manually, I would also be interested if anybody knows about plans to make things like this easier.
So, it is the correct answer: the question was how to remove the safety checking tag from a bare union, without forcing the entire build to turn off runtime safety checks. That's how.
You appear to be concerned that this answer will lead someone who reads it to the wrong conclusions about what to do. So I'll explain what's going on here as best I can, and if I get anything wrong, you can correct it.
Zig has several kinds of unions; the one we're discussing is a "bare" union. This means that the different types inside the union are accessed using the field name. There's an allocated area of memory which is big enough to hold any one variant of the union, but only one of them at a time.
Like with Zig structs (not extern or packed), a union of this type has no defined memory layout. This makes trying to access the wrong field/type of the union a serious mistake. C-style unions, extern in Zig, do have a defined memory layout, so they're used for type punning sometimes, although for turning an i64 into an f64 and back, Zig has @bitCast, which is what you should use.
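To make the difference concrete, here is a minimal sketch of the two defined ways to reinterpret an f64 as an i64. The names `Pun` and `floatBitsViaUnion` are made up for illustration, and the builtin syntax assumes Zig 0.11 or later, where `@bitCast` takes its destination type from the result location.

```zig
const std = @import("std");

// Hypothetical extern union, used only for illustration.
// Extern unions have C layout, so reading the "other" field is defined.
const Pun = extern union {
    int: i64,
    float: f64,
};

// Reads the bits of an f64 through the extern union.
fn floatBitsViaUnion(x: f64) i64 {
    const p = Pun{ .float = x };
    return p.int;
}

test "reinterpreting an f64 as an i64" {
    const f: f64 = 1.5;

    // Preferred: @bitCast, no union involved.
    const bits: i64 = @bitCast(f);
    const back: f64 = @bitCast(bits);
    try std.testing.expectEqual(f, back);

    // The extern union gives the same bits.
    try std.testing.expectEqual(bits, floatBitsViaUnion(f));
}
```

Neither form is valid for a bare union, which is the whole point of the paragraph above.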
If you tried that type pun on UnsafeUnion, in Fast/Small release modes, it might work today, it might not. But it might not work tomorrow. It's undefined behavior, because the type has no defined layout in memory. So it's really important that no one does this.
So important that, in safe builds, Zig adds a hidden tag to each instance of a union type. You can't write code which discriminates on this union, like a switch; that's what a tagged union is for. The tag is there, and if you access the wrong field of the union, you'll get a runtime panic.
So the bare union is only to be used if the program you're writing has some way to distinguish whether an instance of that union will be a specific member. As a toy example, you could allocate a slice of SafeUnion, and just declare that even numbered elements of the slice are .float, and odd numbered ones are .int.
Then you could do this:
for (union_slice, 0..) |u, i| {
    if (i % 2 == 0) {
        // do float stuff:
        _ = u.float;
    } else {
        // do int stuff:
        _ = u.int;
    }
}
And everything is fine.
Thing is, with safety checking, that slice takes twice as much memory as it would without the hidden tag. Since the largest elements (both of them) are word-aligned, each address of an i64 or f64 needs to be align(8), so with the tag, the stride is 16.
Unlike many runtime safety checks, which you might not ever want to turn off, you'll want to turn this one off eventually. Because if you didn't want to, you could use a tagged union instead, and be able to switch on the enum, rather than have to track which field applies in some other way. It would be the same size, and more flexible: the slice could be any random mix of ints and floats, and the switch would discriminate correctly between them.
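For comparison, a minimal sketch of the tagged-union alternative. `Number` and `toFloat` are illustrative names, not something from the earlier posts; the syntax assumes Zig 0.11 or later.

```zig
const std = @import("std");

// Illustrative tagged union: the enum tag is part of the type,
// so a switch can discriminate between the variants.
const Number = union(enum) {
    int: i64,
    float: f64,
};

fn toFloat(n: Number) f64 {
    return switch (n) {
        .int => |i| @as(f64, @floatFromInt(i)),
        .float => |f| f,
    };
}

test "a tagged union can hold any mix of variants" {
    const mixed = [_]Number{ .{ .int = 2 }, .{ .float = 0.5 }, .{ .int = 40 } };
    var total: f64 = 0;
    for (mixed) |n| total += toFloat(n);
    try std.testing.expectEqual(@as(f64, 42.5), total);
}
```

No even/odd convention is needed here, because each element carries its own discriminant.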
Actually combines several of my favorite things about Zig: block scope with labeled breaks, fine-grained control of memory, and flexible, scope-limited runtime safety behavior.
I'm working on a VM right now, and I managed to squeeze all the opcodes into a tagged union which fits into a single machine word. But some of the "opcodes" are actually bitmasks, so they take up the entire word, and there's no room for the tag in the enum. But because of how the VM functions, the instruction pointer is never pointed at a bitmask; they just live after instructions which know what to do with them, including advancing the instruction pointer to another opcode which has the tag.
Therefore I made a second, bare union: one field is an opcode, and the other is just a u64. As with the SafeUnion in the first post, this ends up with a stride of 16 bytes.
I'm porting this from my earlier draft in another language, and while I'm rewriting the core VM loop, I've kept the safety-checked mode on for the untagged union. That way, if code tries to access a bitmask, thinking it's an opcode, I'll get a crash instead of whatever happens when that memory is treated as an opcode (nothing good, surely).
But this will be bad for performance if I leave it that way, because it will double the code size, and the distance of each jump instruction, which is bad for cache locality. What I like about the construct above is that I'll be able to turn off just the safety tag of the union, and get my properly-sized instructions without giving up on runtime panics of any other sort.
The TigerStyle talk was a good reminder to me that, just because you can turn off runtime safety in production, doesn't mean you have to, or even should, necessarily. It seems wisest to flip it off carefully, a scope at a time, while benchmarking the result, and only keep the check-free blocks if there's a clear performance boost. The compiler for the VM, for instance, can probably afford to be, say, 20% slower, if it means that buffer overruns will crash rather than escalate towards pwning the program.
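A minimal sketch of what "a scope at a time" can look like. `checksum` is a made-up stand-in for a hot loop, not code from the VM; whether switching the checks off actually buys anything would still have to be measured.

```zig
const std = @import("std");

// Hypothetical hot loop, used only to show the scoping.
fn checksum(bytes: []const u8) u32 {
    var sum: u32 = 0;
    {
        // Applies only to this inner block; the rest of the function
        // keeps whatever safety setting the build mode provides.
        @setRuntimeSafety(false);
        for (bytes) |b| sum += b;
    }
    return sum;
}

test "scoped safety toggle does not change the result for valid input" {
    try std.testing.expectEqual(@as(u32, 294), checksum("abc"));
}
```

With safety off, an overflow of `sum` inside that block would be undefined behavior instead of a panic, which is exactly the trade being benchmarked.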
But the core dispatch loop has to be as fast as it can be, and I definitely can't afford to have the instructions be 16 bytes when the program only needs them to be 8.
So I certainly didn't mean to give the impression that you should just switch off the safety check and do bad things to undefined memory, where you'll have no idea what the program will do. Andrew's right, don't do that.
A big question here is whether runtime safety is an attribute of the union. If it's not, then using a type defined within a runtime-safe-off context outside that context will break the compiler.
extern is too limiting. i deal with this constantly where i just want a fixed, well defined layout but regular/auto structs don't give that to you and extern has too much extra baggage. There needs to be a better solution besides the current offering.
it is basically struct coloring, and it is just as annoying with structs as with functions.
For this use case, the restrictions on what you can put in an extern union or packed union make the program more complicated, reduce the usefulness of the data structure (the user is disallowed from using types that should work) and/or require complicated casting that makes using unions unattractive.
To the point where I am likely to calculate my addresses manually instead of using any union, because unions bring more restrictions than I have when I don't use them.
Basically I would want an unsafe union that has the same size as its biggest member and the highest alignment any member needs, and that puts every member at offset 0. So I guess it would be considered as having a defined memory layout. (And if it was named differently and had safety checks, those either wouldn't be intrinsically implemented, or would be at the end of the union)
So that you can have [*]UnsafeUnion, contain a bitmask as the first member (thus you know it is always the bitmask), then you can use that bitmask to be able to tell what is the length of the multi-item slice via @popCount of the set bits, and then by having specific bits mean specific types you can have arbitrary types as members of the slot, you check the bit, index into the multi-item pointer and access the correct active field.
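The bitmask bookkeeping described here can be sketched as follows. The function names `slotCount` and `slotIndex`, and the choice that bit `n` maps to the `n`-th occupied slot in ascending order, are assumptions made for the example; the post only says that `@popCount` gives the length and that specific bits stand for specific types.

```zig
const std = @import("std");

/// Number of occupied data slots: one per set bit in the mask.
fn slotCount(mask: u64) usize {
    return @popCount(mask);
}

/// Index of the data slot belonging to `bit`, or null if that bit is clear.
/// Assumption: slots are stored in ascending bit order, so the index is
/// the number of set bits below `bit`.
fn slotIndex(mask: u64, bit: u6) ?usize {
    const one: u64 = 1;
    if ((mask & (one << bit)) == 0) return null;
    const below = mask & ((one << bit) - 1);
    return @as(usize, @popCount(below));
}

test "mask lookups" {
    const mask: u64 = 0b1011;
    try std.testing.expectEqual(@as(usize, 3), slotCount(mask));
    try std.testing.expectEqual(@as(?usize, 2), slotIndex(mask, 3));
    try std.testing.expectEqual(@as(?usize, null), slotIndex(mask, 2));
}
```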
The trouble only starts when you know that everything after the head slot is just data slots and you want to use a slice with the right alignment to provide a view into only these data slots:
const head: [*]UnsafeUnion = try allocAndInitialize(...);
// NOTE this only works for defined memory layouts which bare unions lack
// NOTE this also requires that @sizeOf(Data) == @sizeOf(UnsafeUnion)
const view: []align(slot_alignment) Data = @ptrCast(@alignCast(head[1..getSizeFromMask(head)]));
With the theoretical UnsafeUnion that happens to work because the element is guaranteed to be put at offset 0 (it doesn't have an unaligned stride) and the size of Data and the union is the same.
But with a safety checked union, the safety check gets aligned and my data is located after it. That ruins the ability to properly align the data within the slot or provide a view into it via a slice (basically we would need a slice with a stride).
And if turning off safety checks to get an unsafe union is invalid, then unions are useless for this use case.
Because this becomes easier (without any of the restrictions of packed or extern):
const head: [*]Head = try allocAndInitialize(...);
const data: [*]Data = calculateDataPointer(head); // basically advances forward and aligns by whatever amount is necessary
const view: []Data = data[0..getDataSizeFromMask(head)];
So I think I will try to change my code towards using that.
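A rough sketch of how those two helpers might look. Everything about `Head` is assumed (here it holds only the mask), `calculateDataPointer` takes the element type as an extra comptime parameter that the snippet above doesn't show, and `std.mem.alignForward` is used with its Zig 0.11+ signature.

```zig
const std = @import("std");

// Assumption: the head holds nothing but the bitmask.
const Head = struct { mask: u64 };

/// Skips past the head and rounds up to the alignment of Data.
fn calculateDataPointer(comptime Data: type, head: *Head) [*]Data {
    const end_of_head = @intFromPtr(head) + @sizeOf(Head);
    const aligned = std.mem.alignForward(usize, end_of_head, @alignOf(Data));
    return @ptrFromInt(aligned);
}

/// One data element per set bit in the mask.
fn getDataSizeFromMask(head: *const Head) usize {
    return @popCount(head.mask);
}

test "view over the data that follows a head" {
    // Backing storage: aligned for Head, with room for the head plus 3 u32.
    var buffer: [32]u8 align(@alignOf(Head)) = undefined;
    const head: *Head = @ptrCast(&buffer);
    head.* = .{ .mask = 0b111 };

    const data = calculateDataPointer(u32, head);
    const view: []u32 = data[0..getDataSizeFromMask(head)];
    for (view, 0..) |*slot, i| slot.* = @intCast(i * 10);

    try std.testing.expectEqual(@as(usize, 3), view.len);
    try std.testing.expectEqual(@as(u32, 20), view[2]);
}
```

The allocation has to reserve the alignment padding as well as the elements; the test sidesteps that by using a fixed buffer that is large enough.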
I think this answers the topic: you can't disable safety as a property of the type.
(And have the compiler treat that type in an unsafe way everywhere)
I don't know whether this is generally the case, but that is what @chung-leong's code seems to hint at.
I think technically with bare unions the observed behavior is different, but the language does not allow treating bare unions as if their memory layout were defined.
Even when turning off safety seems to make the layout empirically predictable, it still is considered UndefinedBehavior by the language.
I can understand the restrictions of extern union, but I don't know any restrictions for packed union.
A packed union has the size of its biggest member and puts every member at offset 0.
I don't know about the alignment handling.
What I am understanding is that you have a slice and the stride is needed to move to the next element when the size of the data is smaller than the union.
If I am understanding correctly this works:
packedunion.zig:10:15: error: packed unions cannot contain fields of type 'packedunion.main.Data'
        data: Data,
              ^~~~
packedunion.zig:10:15: note: only packed structs layout are allowed in packed types
packedunion.zig:4:18: note: struct declared here
    const Data = struct {
                 ^~~~~~
With packed union everything is packed, I want to give the user the choice whether their data is packed or a normal struct or a primitive type, or even an enum. The data is only as packed as the type they supply.
I also tried your extern union suggestion, instead with a packed union and using data: std.meta.Int(.unsigned, @bitSizeOf(T)), and then bit casting to it, but then I noticed that you can't use @bitCast on an enum and instead have to use @intFromEnum, which makes the code more complicated.
I don't think this code should require any bitcasts anyway. I don't want to reinterpret any memory, I just want to assign the correct values to specific slots of specific sizes, and I don't really care about the internal layout of the user-supplied type. They get a slot where they put their data and can read it back out again.
I shouldn't have to type-erase their data to be able to put the blob of data into my data field, but if I don't type-erase and use a union, then I am forced to know more about the user-supplied type than I want to.
It does work, but only for certain allowed types. (Sidenote: I don't want to do type punning, at least not between different field types. I don't know if using a zero-overhead union and casting that to the type of the active field is also called type punning; the way I see it, I want to strip away the union, which only works if it is of the same size.)
With the packed union the stride isn't a problem, because the size is well defined, but it doesn't allow normal structs.
The union allows normal structs, but you don't have a well defined size for the union, and you would need a slice with a stride to construct a view that only shows the data field. (It instead shows you the debug tag as if it were data, if you try to pointer cast it)
So this definitely isn't going to work with a bare union, because it has a size, but it doesn't have a layout. So you won't be able to get a single offset inside the union to have a consistent interpretation between the field types of the union. You should be able to make a Zig struct with a field (that would have a consistent offset) that also has another field which is a bare union, and use a tagging scheme on the offset-defined field to decide on the value of the union part of the struct.
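A minimal sketch of that last suggestion, with made-up names (`Kind`, `Payload`, `Slot`). The discriminant lives in an ordinary struct field and the program uses it to decide which field of the bare union to read. In safe build modes the bare union still carries its hidden tag, so this shows the structure only, not a size guarantee.

```zig
const std = @import("std");

// Illustrative discriminant kept outside the union.
const Kind = enum(u8) { int, float };

// Bare union: no language-level tag to switch on.
const Payload = union {
    int: i64,
    float: f64,
};

const Slot = struct {
    kind: Kind,
    payload: Payload,

    // The struct field decides which union field is read.
    fn asFloat(self: Slot) f64 {
        return switch (self.kind) {
            .int => @as(f64, @floatFromInt(self.payload.int)),
            .float => self.payload.float,
        };
    }
};

test "discriminating a bare union through a sibling field" {
    const a = Slot{ .kind = .int, .payload = .{ .int = 3 } };
    const b = Slot{ .kind = .float, .payload = .{ .float = 0.25 } };
    try std.testing.expectEqual(@as(f64, 3.0), a.asFloat());
    try std.testing.expectEqual(@as(f64, 0.25), b.asFloat());
}
```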
Would you be able to write an example of the type you're looking for in C? Obviously not generic, but it isn't clear to me the specific kind of memory-punning you're trying to do. I did assume when you asked about disabling the safety check tag, that you had some external way of discriminating the union.
I don't think that @chung-leong's example is doing anything which usefully generalizes. Because for a union, you absolutely can disable the tag.
Run this (it won't segfault):
const std = @import("std");

// safe union
// (assumed to match the SafeUnion from the first post: same fields, default safety)
pub const SafeUnion = union {
    int: i64,
    float: f64,
};

// unsafe union
pub const UnsafeUnion = blk: {
    @setRuntimeSafety(false);
    break :blk union {
        int: i64,
        float: f64,
    };
};

const union_array: [2]UnsafeUnion = .{ UnsafeUnion{ .int = 64 }, UnsafeUnion{ .float = 1.5 } };

// returns a slice so that the access below is not resolved at comptime
fn runtimePass() []const UnsafeUnion {
    return union_array[0..];
}

test "safe and unsafe" {
    std.debug.print("\nsize of SafeUnion is {}\n", .{@sizeOf(SafeUnion)});
    std.debug.print("size of UnsafeUnion is {}\n", .{@sizeOf(UnsafeUnion)});
    std.debug.print("size of array of UnsafeUnion is {}\n", .{@sizeOf([2]UnsafeUnion)});
    const union_int = UnsafeUnion{ .int = 64 };
    std.debug.print("value of union_int is {}\n", .{union_int.int});
    const union_float = UnsafeUnion{ .float = 3.14159 };
    std.debug.print("value of union_float is {}\n", .{union_float.float});
    const union_slice = runtimePass();
    // this is VERY ILLEGAL and is meant to illustrate the actual behavior
    // I DISAVOW
    std.debug.print("Does a bad thing happen? {}\n", .{union_slice[1].int});
}
The important question for me is this: say we have a bare union just like the one in the example. If a type like this is used only correctly in safety-checked blocks, is it guaranteed to be correct? Or is the whole program undefined?
Because the union type evidently doesn't have the tag. That much is very clear. As long as the compiler knows that, and doesn't write safety checks it can't make, then this is ok.
So just empirically, we've confirmed that: the block removes the tag, and we get the "expected" undefined behavior, in a block which is safety checked. If you try this with the SafeUnion variant, it will panic. With UnsafeUnion, it just prints random stuff (from the perspective of the standard).
But is this property guaranteed to hold for any union defined this way, which is used properly: that is, always defined and accessed through the same field? I know that it risks invoking undefined behavior, the big question for me is whether it guarantees the expected behavior given that the type invariant is upheld.
It seems to me like it has to work that way, as in, the future standard would have to forbid ever getting this wrong. Otherwise block level runtime safety isn't well-defined, or, it would need to be impossible to disable safety checking of bare unions. In which case, as I said in my first post, why have them? If the tag is mandatory you may as well be able to use it.
I'm glad @chung-leong tried that example, but I wouldn't expect it to work, because it doesn't change the type. Unions might be unique here in that the physical type of a runtime-checked union is literally different, so (as we've seen) it is possible to compile it in a context which isn't safety checked, and then use it in one which is.
Wrong Union Field Access sure looks to me like the only case where the runtime safety setting active at type creation can follow the type around.
I'm intending to build an entire program on this premise, so it matters to me whether what I'm trying to do is well defined, provided that use of the union is correct.
How does that work? To get a consistent offset, you need to make it extern, then you need to have an extern union, and you're back to where you can't have arbitrary structs inside it?
I should have said a defined offset, you're right. It doesn't need to be consistent in this case, as long as it refers to a well-defined part of memory. Which actual offset that is could change between versions of the compiler, or build modes, or what have you.
The idea is that a struct with an internal union will have at least one field which points to consistent memory, and that can be e.g. pointer tagged to discriminate the inner union.
Or it could be a usize-backed packed struct (inside the struct) where the tag is an enum, that has a receiver method which casts it to the correct pointer when it's needed, masking off the tag bits in some fashion. The VM I'm working on now doesn't need this, but I'll need something like that down the line, so I've been experimenting a bit with how to make it work.
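One way such a usize-backed packed struct could look, as a sketch only. `Tag`, `TaggedPtr`, `init` and `as` are invented names, the tag lives in the low 3 bits, and the code assumes every pointee is at least 8-byte aligned so those bits are free (which also makes it a 64-bit-target sketch for a `u64` pointee).

```zig
const std = @import("std");

// Illustrative tag; 3 bits are free when pointees are 8-byte aligned.
const Tag = enum(u3) { opcode, mask };

const TaggedPtr = packed struct(usize) {
    tag: Tag, // least significant 3 bits
    addr: std.meta.Int(.unsigned, @bitSizeOf(usize) - 3), // address >> 3

    fn init(tag: Tag, ptr: *align(8) anyopaque) TaggedPtr {
        return .{ .tag = tag, .addr = @intCast(@intFromPtr(ptr) >> 3) };
    }

    // Shifts the tag bits away and restores the original address.
    fn as(self: TaggedPtr, comptime T: type) *T {
        const raw: usize = @as(usize, self.addr) << 3;
        return @ptrFromInt(raw);
    }
};

test "tag travels with the pointer in a single word" {
    var value: u64 = 99;
    const tp = TaggedPtr.init(.mask, &value);
    try std.testing.expectEqual(@sizeOf(usize), @sizeOf(TaggedPtr));
    try std.testing.expectEqual(Tag.mask, tp.tag);
    tp.as(u64).* += 1;
    try std.testing.expectEqual(@as(u64, 100), value);
}
```

The caller still has to pick the right `T` for `as`, based on the tag; nothing in the type enforces that.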