Since the Zig team is not accepting proposals right now, I wanted to at least leave this here while it's fresh in my brain.
Background
- Bitcasting enums was removed in favor of @intFromEnum: #3647
- Bitcasting packed structs containing enums can create invalid enums: #21372
- Proposal to allow bitcasting non-exhaustive enums: #14367
Proposal
- @bitCast to and from enums should be allowed.
- All of the existing properties of bitcasting stay the same. @bitCast will still require that the two types have the same bit width.
- @bitCast to / from enums should require that the tag type be specified on the enum.
// this enum can be bitcasted
const MemeEnum = enum(u32) {
    dead_beef = 0xdeadbeef,
    great_band = 0xABBAABBA,
};
// this enum can't be bitcasted
const MemeEnum = enum {
    dead_beef = 0xdeadbeef,
    great_band = 0xABBAABBA,
};
The tag type is required because the compiler is allowed to pick an arbitrary backing integer for an enum when it is not specified.
@bitCast int to enum should invoke safety-checked undefined behavior when the bit pattern is not found in the target enum. Int to non-exhaustive enum will never panic.
const MemeEnum = enum(u32) {
    dead_beef = 0xdeadbeef,
    great_band = 0xABBAABBA,
};
// this panics in debug mode
const ahhhh: MemeEnum = @bitCast(@as(u32, 0xAAAA_AAAA));
// this doesn't panic in debug mode
const dead_beef: MemeEnum = @bitCast(@as(i32, @bitCast(@as(u32, 0xdeadbeef))));
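For comparison, the proposed semantics can be approximated in today's Zig with a small helper. This is only a sketch: `enumFromBits` is a hypothetical name (not part of the proposal or the standard library), and it assumes the Zig 0.14-style `@typeInfo(...).@"enum"` field name used elsewhere in this post.

```zig
const std = @import("std");

const MemeEnum = enum(u32) {
    dead_beef = 0xdeadbeef,
    great_band = 0xABBAABBA,
};

/// Hypothetical helper: reinterpret `bits` as the enum `E`.
/// Like @bitCast, it requires both types to have the same bit width.
fn enumFromBits(comptime E: type, bits: anytype) E {
    const Tag = @typeInfo(E).@"enum".tag_type;
    comptime std.debug.assert(@bitSizeOf(@TypeOf(bits)) == @bitSizeOf(Tag));
    // @enumFromInt is safety-checked for exhaustive enums:
    // a value with no matching tag panics in safe build modes.
    return @enumFromInt(@as(Tag, @bitCast(bits)));
}

test "enumFromBits accepts any same-width integer" {
    const from_i32 = enumFromBits(MemeEnum, @as(i32, @bitCast(@as(u32, 0xdeadbeef))));
    try std.testing.expectEqual(MemeEnum.dead_beef, from_i32);
}
```

The proposal would make this helper unnecessary by letting @bitCast do the same thing directly.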
Why?
- Bitcast is allowed for packed structs containing enums. Allowing bitcasting enums makes the language more consistent here.
- Specifying a tag type changes what an enum means: it becomes “a set of bit patterns”.
// this means:
// * a set of unique integers I am storing in a u32
// * a set of unique 32-bit-long bit patterns
const MemeEnum = enum(u32) {
    dead_beef = 0xdeadbeef,
    great_band = 0xABBAABBA,
};
// this means:
// * a set of unique integers, I don't care about the type used to store them.
const State = enum {
    started,
    in_progress,
    done,
};
Concrete Use Case
Bit-packed Binary Protocols
Packed structs can be bitcasted with enums inside.
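As a minimal illustration of that statement, the sketch below wraps an enum in a one-field packed struct and bitcasts an integer into it. `Color` and `Wrapper` are hypothetical names invented for this example, not types from my code base.

```zig
const std = @import("std");

// Hypothetical enum with an explicit tag type, as packed structs require.
const Color = enum(u8) {
    red = 1,
    green = 2,
};

// Hypothetical wrapper: the same 8 bits, but as a packed struct.
const Wrapper = packed struct(u8) {
    color: Color,
};

test "bitcast reaches an enum through a packed struct" {
    // Allowed today, even though bitcasting to Color directly is rejected.
    const w: Wrapper = @bitCast(@as(u8, 2));
    try std.testing.expectEqual(Color.green, w.color);
}
```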
I have defined a generic deserializer for the EtherCAT industrial automation protocol, a bit-packed binary communication protocol.
I can directly define what the representation should be on the wire using the little-endian representation of packed structs:
/// EtherCAT command, present in the EtherCAT datagram header.
pub const Command = enum(u8) {
    NOP = 0x00,
    APRD,
    APWR,
    APRW,
    FPRD,
    FPWR,
    FPRW,
    BRD,
    BWR,
    BRW,
    LRD,
    LWR,
    LRW,
    ARMW,
    FRMW,
    _, // here to prevent panic on deserialization
};
/// Datagram Header
///
/// Ref: IEC 61158-4-12:2019 5.4.1.2
pub const Header = packed struct(u80) {
    command: Command,
    idx: u8 = 0,
    address: u32,
    length: u11,
    reserved: u3 = 0,
    circulating: bool,
    next: bool,
    irq: u16,
};
const std = @import("std");
const builtin = @import("builtin");
// Assumed definition: the snippet uses native_endian without showing where it comes from.
const native_endian = builtin.cpu.arch.endian();

/// Convert little endian packed bytes from EtherCAT to host representation.
///
/// Supports enums, packed structs, and most primitive types. All must have
/// bitlength divisible by 8.
pub fn packFromECat(comptime T: type, ecat_bytes: [@divExact(@bitSizeOf(T), 8)]u8) T {
    switch (native_endian) {
        .little => {
            if (@typeInfo(T) == .@"enum") {
                return @enumFromInt(@as(@typeInfo(T).@"enum".tag_type, @bitCast(ecat_bytes)));
            }
            return @bitCast(ecat_bytes);
        },
        .big => {
            var bytes_copy = ecat_bytes;
            std.mem.reverse(u8, &bytes_copy);
            if (@typeInfo(T) == .@"enum") {
                // Use the byte-reversed copy here too, not the original bytes.
                return @enumFromInt(@as(@typeInfo(T).@"enum".tag_type, @bitCast(bytes_copy)));
            }
            return @bitCast(bytes_copy);
        },
    }
    unreachable;
}
I have to special-case enums here in my deserializer, which is not intuitive, especially since I am specifying the backing integer of my enum.
Zig should allow me to throw around bits and reason about the in-memory representation.
Other developers may suggest using readers / writers etc, but I know that if I am on a little-endian platform, I can deserialize with a single memcopy, and I don’t want to rely on a complex optimizer to make sure my individual reads are transformed into a single memcopy. In this real-time software use case, the deserialization time will take away valuable CPU time from every control cycle.
Memory Mapped Input-Output (MMIO)
I don’t have personal experience with MMIO, but I hear packed structs are useful for it, and they usually contain enums.
Unanswered Questions
- What do we do about @typeInfo?
This is the current @typeInfo for an enum:
pub const Enum = struct {
    tag_type: type,
    fields: []const EnumField,
    decls: []const Declaration,
    is_exhaustive: bool,
};
This will likely need to be changed to reflect whether an enum is explicitly tagged, perhaps an is_explicitly_tagged: bool field?
Counter-Arguments
- The deserialization use-case is not compelling, readers / writers should be used for endianness conversion.
  I should not need an abstraction to re-interpret non-byte-aligned memory (see std.io.BitReader). Zig should allow me to reason directly about the expected number of memcopies to produce optimal code.
- Users will use @bitCast when they should use @intFromEnum.
  This risk can be mitigated with a warning in the docs.
What do you think?