Bitcast should be allowed on enums with tag types

Since the Zig team is not accepting proposals right now, I wanted to at least leave this here while it's fresh in my brain.

Background

  • Bitcasting enums was removed in favor of @intFromEnum: #3647
  • Bitcasting packed structs containing enums can create invalid enums: #21372
  • Proposal to allow bitcasting non-exhaustive enums: #14367

Proposal

  • @bitCast to and from enums should be allowed.

  • All existing properties of @bitCast remain the same: @bitCast still requires the two types to have the same bit width.

  • @bitCast to / from enums should require the tag type to be specified on the enum.

// this enum can be bitcasted
const MemeEnum = enum(u32) {
    dead_beef = 0xdeadbeef,
    great_band = 0xABBAABBA,
};

// this enum can't be bitcasted
const UntaggedMemeEnum = enum {
    dead_beef = 0xdeadbeef,
    great_band = 0xABBAABBA,
};

The tag type is required because the compiler is allowed to pick an arbitrary backing integer for an enum when it is not specified.

  • @bitCast from an integer to an enum should invoke safety-checked undefined behavior when the bit pattern does not match any field of the target enum. @bitCast from an integer to a non-exhaustive enum will never panic.
const MemeEnum = enum(u32) {
    dead_beef = 0xdeadbeef,
    great_band = 0xABBAABBA,
};

// this panics in debug mode
const ahhhh: MemeEnum = @bitCast(@as(u32, 0xAAAA_AAAA));

// this doesn't panic in debug mode: the i32 has the same bit pattern as 0xdeadbeef
const dead_beef: MemeEnum = @bitCast(@as(i32, @bitCast(@as(u32, 0xdeadbeef))));
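Under current Zig, the closest equivalent of these semantics is a two-step cast: reinterpret the bits as the enum's tag type with @bitCast, then convert the integer with @enumFromInt. The sketch below assumes Zig 0.14 syntax; `enumFromBits`, `checkedEnumFromBits` and `OpenMemeEnum` are hypothetical names, not std functions. The first helper relies on @enumFromInt's existing safety check; the second returns null for a bit pattern that names no field of an exhaustive enum, while a non-exhaustive enum accepts every pattern.

```zig
const std = @import("std");

const MemeEnum = enum(u32) {
    dead_beef = 0xdeadbeef,
    great_band = 0xABBAABBA,
};

// Hypothetical non-exhaustive variant: every u32 bit pattern is a valid value.
const OpenMemeEnum = enum(u32) {
    dead_beef = 0xdeadbeef,
    _,
};

/// Hypothetical helper approximating the proposed `@bitCast` to an enum.
/// `@bitCast` enforces that `bits` has the same bit width as the tag type;
/// `@enumFromInt` is safety-checked for exhaustive enums.
fn enumFromBits(comptime E: type, bits: anytype) E {
    const Tag = @typeInfo(E).@"enum".tag_type;
    const tag: Tag = @bitCast(bits);
    return @enumFromInt(tag);
}

/// Hypothetical checked variant: returns null instead of tripping the
/// safety check when `bits` matches no named field of an exhaustive enum.
fn checkedEnumFromBits(comptime E: type, bits: anytype) ?E {
    const info = @typeInfo(E).@"enum";
    const tag: info.tag_type = @bitCast(bits);
    // Non-exhaustive enums accept any value of the tag type.
    if (!info.is_exhaustive) return @as(E, @enumFromInt(tag));
    // Compare against every named field's value (unrolled at comptime).
    inline for (info.fields) |field| {
        if (tag == field.value) return @as(E, @enumFromInt(tag));
    }
    return null;
}
```

Note that `bits` must have a concrete type such as `u32` or `i32`: an untyped integer literal has no bit width to reinterpret, so @bitCast rejects it.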

Why?

  1. Bitcasting is already allowed for packed structs containing enums. Allowing enums themselves to be bitcast makes the language more consistent.
  2. Specifying a tag type changes the meaning of an enum to “a set of bit patterns”.
// this means:
// * a set of unique integers I am storing in a u32
// * a set of unique 32-bit-long bit patterns
const MemeEnum = enum(u32) {
    dead_beef = 0xdeadbeef,
    great_band = 0xABBAABBA,
};

// this means:
// * a set of unique integers, I don't care about the type used to store them.
const State = enum {
    started,
    in_progress,
    done,
};

Concrete Use Case

Bit-packed Binary Protocols

Packed structs containing enums can already be bitcasted.

I have defined a generic deserializer for the EtherCAT industrial automation protocol, a bit-packed binary communication protocol.

I can directly define what the representation should be on the wire using the little-endian representation of packed structs:

/// EtherCAT command, present in the EtherCAT datagram header.
pub const Command = enum(u8) {
    NOP = 0x00,
    APRD,
    APWR,
    APRW,
    FPRD,
    FPWR,
    FPRW,
    BRD,
    BWR,
    BRW,
    LRD,
    LWR,
    LRW,
    ARMW,
    FRMW,
    _, // here to prevent panic on deserialization
};

/// Datagram Header
///
/// Ref: IEC 61158-4-12:2019 5.4.1.2
pub const Header = packed struct(u80) {
    command: Command,
    idx: u8 = 0,
    address: u32,
    length: u11,
    reserved: u3 = 0,
    circulating: bool,
    next: bool,
    irq: u16,
};

const std = @import("std");
const native_endian = @import("builtin").cpu.arch.endian();

/// Convert little endian packed bytes from EtherCAT to host representation.
///
/// Supports enums, packed structs, and most primitive types. All must have
/// a bit length divisible by 8.
pub fn packFromECat(comptime T: type, ecat_bytes: [@divExact(@bitSizeOf(T), 8)]u8) T {
    switch (native_endian) {
        .little => {
            // Enums cannot be bitcast directly, so go through the tag type.
            if (@typeInfo(T) == .@"enum") {
                return @enumFromInt(@as(@typeInfo(T).@"enum".tag_type, @bitCast(ecat_bytes)));
            }
            return @bitCast(ecat_bytes);
        },
        .big => {
            // Reverse the wire bytes so the bitcast sees a big-endian integer.
            var bytes_copy = ecat_bytes;
            std.mem.reverse(u8, &bytes_copy);
            if (@typeInfo(T) == .@"enum") {
                return @enumFromInt(@as(@typeInfo(T).@"enum".tag_type, @bitCast(bytes_copy)));
            }
            return @bitCast(bytes_copy);
        },
    }
}

I have to special-case enums in my deserializer, which is unintuitive, especially since I have already specified the backing integer of my enum.
Zig should allow me to throw around bits and reason about the in-memory representation.
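The asymmetry can be reproduced in a few lines. In this sketch (assuming Zig 0.14; `Command` here is a trimmed, hypothetical subset of the EtherCAT enum, and `CommandBits` / `commandFromByte` are illustrative names), a direct @bitCast from `u8` to the enum is rejected by the compiler, yet the same byte passes through a one-field packed struct unchanged.

```zig
const std = @import("std");

/// Trimmed stand-in for the EtherCAT `Command` enum (hypothetical subset).
const Command = enum(u8) {
    NOP = 0x00,
    APRD = 0x01,
    APWR = 0x02,
    _, // non-exhaustive: any byte is a valid Command
};

/// One-field packed struct whose layout is exactly one Command.
const CommandBits = packed struct(u8) {
    command: Command,
};

fn commandFromByte(byte: u8) Command {
    // `const c: Command = @bitCast(byte);` is a compile error today,
    // but bitcasting to a packed struct that contains the enum is accepted.
    const wrapped: CommandBits = @bitCast(byte);
    return wrapped.command;
}
```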

Other developers may suggest using readers/writers etc., but I know that on a little-endian platform I can deserialize with a single memcpy, and I don't want to rely on a complex optimizer to merge my individual reads into a single memcpy. In this real-time use case, deserialization time comes directly out of the CPU budget of every control cycle.
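For packed structs, the same single whole-width read can also be written without an explicit endianness switch. This sketch swaps in `std.mem.readInt` on the struct's backing integer in place of the `std.mem.reverse` approach: on a little-endian host it is a plain reinterpretation of the bytes, and on a big-endian host it adds one @byteSwap. `MiniHeader` and `decodeLittle` are hypothetical names, and Zig 0.14 is assumed. A nested enum field decodes fine this way; a top-level enum type would still need the special case, because it has no struct backing integer.

```zig
const std = @import("std");

/// Trimmed stand-in for the EtherCAT `Command` enum.
const Command = enum(u8) {
    NOP = 0x00,
    BRD = 0x07,
    _,
};

/// Hypothetical 4-byte header; packed struct fields are laid out LSB-first.
const MiniHeader = packed struct(u32) {
    command: Command,
    length: u11,
    reserved: u3 = 0,
    circulating: bool,
    next: bool,
    irq: u8,
};

/// Decode a packed struct from little-endian wire bytes with one
/// whole-width integer read followed by a bitcast.
fn decodeLittle(comptime T: type, bytes: *const [@divExact(@bitSizeOf(T), 8)]u8) T {
    const Backing = @typeInfo(T).@"struct".backing_integer.?;
    return @bitCast(std.mem.readInt(Backing, bytes, .little));
}
```

For example, the wire bytes `07 23 81 AB` form the backing integer 0xAB812307, which decodes to `command = .BRD`, `length = 0x123`, `next = true` and `irq = 0xAB`.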

Memory Mapped Input-Output (MMIO)

I don't have personal experience with MMIO, but I hear packed structs are useful for it, and they often contain enums.

Unanswered Questions

  1. What do we do about @typeInfo?

This is the current @typeInfo for an enum:

pub const Enum = struct {
    tag_type: type,
    fields: []const EnumField,
    decls: []const Declaration,
    is_exhaustive: bool,
};

This will likely need to be changed to reflect whether an enum is explicitly tagged, perhaps an is_explicitly_tagged: bool field?
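The reason a new field seems necessary: with the current compiler, an enum whose tag type is inferred reports a `tag_type` exactly like one whose tag type was written out, so generic code cannot tell them apart. In this sketch (Zig 0.14 assumed; `Explicit` and `Inferred` are illustrative names), the inferred tag type of a three-field enum is `u2`. That is what the compiler currently chooses, not a language guarantee.

```zig
const std = @import("std");

// Tag type written out by the author: the layout is part of the contract.
const Explicit = enum(u2) { started, in_progress, done };

// Tag type chosen by the compiler: currently also u2, but not guaranteed.
const Inferred = enum { started, in_progress, done };

// Both currently report the same tag_type and exhaustiveness, so nothing
// in @typeInfo records whether the author chose the backing integer.
comptime {
    const explicit_info = @typeInfo(Explicit).@"enum";
    const inferred_info = @typeInfo(Inferred).@"enum";
    std.debug.assert(explicit_info.tag_type == inferred_info.tag_type);
    std.debug.assert(explicit_info.is_exhaustive == inferred_info.is_exhaustive);
}
```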

Counter-Arguments

  • The deserialization use case is not compelling; readers/writers should be used for endianness conversion.

    I should not need an abstraction to reinterpret non-byte-aligned memory (see std.io.BitReader). Zig should let me reason directly about the expected number of memcpy calls to produce optimal code.

  • Users will use @bitCast when they should use @intFromEnum.

    This risk can be mitigated with a warning in the docs.

What do you think?


I really like your proposal, I think it’s a big improvement in ergonomics, and also in readability.


In issue #18882 on ziglang/zig (“add `@intFromStruct` and `@structFromInt` for converting between a packed struct and its integer representation”), Andrew, I think, mentions a useful metric:

Generally, the more specific conversion is preferred, because it is more resilient to code churn.

which is “resilience to code churn”. We want to enable or encourage the developer to write code that is easier to maintain. If a type changes but keeps the same bit width, @bitCast will provide fewer guarantees than @intFromEnum.

I think allowing @bitCast here is still fine, even though it perhaps makes “resilience to code churn” worse. I think it’s still useful to provide multiple levels of “sharp” tools like @ptrCast, @bitCast, @enumFromInt etc.

The key difference in “sharpness” here is the difference between casting the value of an integer (@enumFromInt) and casting the bit pattern of the underlying memory (@bitCast).
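A small sketch of that difference in sharpness, assuming Zig 0.14 (`Level`, `raw`, `by_value` and `by_bits` are hypothetical names): the byte 0xFF held in an `i8` has the value -1, so a value-preserving conversion to `u8` fails, which `std.math.cast` reports as null. Reinterpreting the same bits instead yields 255, which @enumFromInt then maps to a field.

```zig
const std = @import("std");

const Level = enum(u8) {
    off = 0x00,
    max = 0xFF,
};

// Eight set bits, viewed as a signed integer: the value is -1.
const raw: i8 = -1;

// Value-based: -1 has no u8 representation, so std.math.cast returns null.
const by_value: ?u8 = std.math.cast(u8, raw);

// Bit-pattern based: reinterpret the bits as u8 (255), then convert that
// integer value to the enum.
const by_bits: Level = @enumFromInt(@as(u8, @bitCast(raw)));
```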


Yes, and I think builtins do improve readability; the added friction often makes it easy to focus your attention on the potentially harmful areas. On top of that, low-level or embedded work in C often amounts to encoding magic numbers with #define. I think leveraging the type system through enums is better and safer, and it makes things easier for the tooling and the reader; you also get all the nice things that come with enums, like exhaustive switching. So I really like your proposal; I think it improves readability and ergonomics at the same time.


I think it would be fine to propose this on Github.
