Catching "invalid enum value" errors

I’m playing with network servers sending and receiving binary data. I convert incoming binary data into Zig structs with @as(MyFormat, @bitCast(bytes[0..@sizeOf(MyFormat)].*)). If MyFormat has an enum in it, and the value in the binary data is not one of the assigned values, I get:

15006 panic: invalid enum value
???:?:?: 0x27333a in formatType__anon_8749 (read)
???:?:?: 0x262bd9 in formatType__anon_8358 (read)
???:?:?: 0x2621f8 in format__anon_8346 (read)
???:?:?: 0x2518d0 in print__anon_7059 (read)
???:?:?: 0x22101c in print__anon_3812 (read)
???:?:?: 0x21ee62 in main (read)
???:?:?: 0x21e385 in posixCallMainAndExit (read)
???:?:?: 0x21de71 in _start (read)
???:?:?: 0x0 in ??? (???)
zsh: done             ./hand-crafted-binary | 
zsh: IOT instruction  ./read

The Internet being a jungle, a network server must of course be robust in the face of erroneous messages. Therefore, I would like to catch the error and continue. The problem is that I cannot find a way to do so in Zig (try/catch does not seem to be usable with @bitCast, according to the @bitCast section of the Zig language documentation).

Note that the error isn’t coming from @bitCast, but from the usage of the result later.
As far as I know, there are only two ways to solve this: replace the @bitCast with manual initialization that converts the raw integer with std.meta.intToEnum, or use a non-exhaustive enum. The latter would require handling the faulty values everywhere the enum is used later.
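A minimal sketch of the manual-initialization route, assuming Zig 0.11 or later (for `std.meta.intToEnum`). The wire format and the names `Kind`, `Header`, `ParseError`, and `parseHeader` are hypothetical. The idea is to read the enum byte as a plain integer and convert it with `std.meta.intToEnum`, which returns `error.InvalidEnumTag` instead of producing an invalid enum value, so the failure becomes an ordinary error that can be handled with `try`/`catch`.

```zig
const std = @import("std");

// Hypothetical wire format (illustration only):
// byte 0 = message kind, byte 1 = flags, bytes 2..3 = big-endian length.
const Kind = enum(u8) { hello = 1, data = 2, bye = 9 };

const Header = struct {
    kind: Kind,
    flags: u8,
    len: u16,
};

const ParseError = error{ TooShort, InvalidEnumTag };

fn parseHeader(bytes: []const u8) ParseError!Header {
    if (bytes.len < 4) return error.TooShort;
    return .{
        // intToEnum fails with error.InvalidEnumTag for unknown values
        // instead of creating an enum value with no matching tag.
        .kind = try std.meta.intToEnum(Kind, bytes[0]),
        .flags = bytes[1],
        // Assemble the length by hand so host endianness does not matter.
        .len = (@as(u16, bytes[2]) << 8) | bytes[3],
    };
}
```

The server can then `catch` the error, drop the malformed message, and keep serving other clients.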

I hadn’t noticed non-exhaustive enums before. There seems to be no easy way to test whether a number is one of the named values (@enumFromInt never fails for a non-exhaustive enum). The only solution seems to be a switch that lists all the named values.
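For reference, a sketch of the switch-based check on a non-exhaustive enum, assuming Zig 0.11 or later (for `@enumFromInt`); `Opcode` and `isKnown` are hypothetical names. One upside of the `_` prong: the compiler requires every named tag to be handled explicitly, so forgetting a newly added tag is a compile error rather than a silent bug.

```zig
const std = @import("std");

// Hypothetical protocol opcodes. The trailing `_` makes the enum
// non-exhaustive, so every u8 value is a legal Opcode.
const Opcode = enum(u8) {
    ping = 1,
    pong = 2,
    data = 7,
    _,
};

fn isKnown(op: Opcode) bool {
    return switch (op) {
        // Every named tag must be listed here...
        .ping, .pong, .data => true,
        // ...and `_` matches all the unnamed values.
        _ => false,
    };
}
```

With this, `@enumFromInt(42)` still succeeds, but `isKnown` reports it as unknown.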

There’s an old C-trick that may help here…

const std = @import("std");

const Nums = enum {
    one, 
    two,
    end // always the last value
};

pub fn main() !void {

    // get the numeric value of the "end" tag
    const end: usize = @intFromEnum(Nums.end);

    std.debug.print("\nend: {}\n", .{ end });
}

You can then see if an integer is greater than or equal to that value. If it is, it’s outside the range of your enum.
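A sketch of that range check, assuming default numbering that starts at 0 with no explicit values. The explicit `u8` tag type and the `numsFromByte` helper are illustrative additions, not part of the original snippet. Note that `.end` itself is rejected, since it is only a sentinel.

```zig
const std = @import("std");

// Contiguous enum with default numbering (0, 1, 2, ...); `end` is a
// sentinel and must stay last.
const Nums = enum(u8) {
    one,
    two,
    end, // always the last value
};

// Hypothetical helper: converts a raw byte, or returns null if the byte
// is at or past the sentinel (i.e. outside the enum's range).
fn numsFromByte(b: u8) ?Nums {
    if (b >= @intFromEnum(Nums.end)) return null;
    return @as(Nums, @enumFromInt(b));
}
```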

This is a cool trick. I believe I’ve spotted it in the Go standard library too.

Non-exhaustive enums don’t work with EnumSet. It’s too big of a trade-off in my opinion. Given that you’ll likely need to perform other types of validation on the data anyway, it’s probably better to code a mechanism that uses std.meta.intToEnum to check whether the value would lead to an error.

Here’s an example that I whipped up:

const std = @import("std");

const Pet = enum { Cat, Dog, Camel };
const Struct = packed struct {
    pet: Pet,
    number: u17,
};

fn validateStructBytes(comptime T: type, bytes: []const u8, comptime validators: ?type) bool {
    const info = @typeInfo(T).Struct;
    const NoEnum = @Type(.{
        .Struct = .{
            .layout = info.layout,
            .backing_integer = info.backing_integer,
            .is_tuple = info.is_tuple,
            .decls = &.{},
            .fields = create: {
                comptime var fields: [info.fields.len]std.builtin.Type.StructField = undefined;
                inline for (info.fields, 0..) |field, index| {
                    fields[index] = .{
                        .name = field.name,
                        .default_value = null,
                        .is_comptime = field.is_comptime,
                        .alignment = field.alignment,
                        .type = switch (@typeInfo(field.type)) {
                            .Enum => |em| em.tag_type,
                            else => field.type,
                        },
                    };
                }
                break :create &fields;
            },
        },
    });
    if (bytes.len != @sizeOf(T)) {
        return false;
    }
    const ptr = std.mem.bytesAsValue(NoEnum, bytes[0..@sizeOf(T)]);
    inline for (info.fields) |field| {
        const raw_value = @field(ptr, field.name);
        const conversion: anyerror!field.type = switch (@typeInfo(field.type)) {
            .Enum => std.meta.intToEnum(field.type, raw_value),
            else => raw_value,
        };
        if (conversion) |value| {
            if (validators) |ns| {
                if (@hasDecl(ns, field.name)) {
                    const callback = @field(ns, field.name);
                    if (!callback(value)) {
                        return false;
                    }
                }
            }
        } else |_| {
            return false;
        }
    }
    return true;
}

test "validateStructBytes" {
    std.debug.print("\n", .{});
    const test_struct: Struct = .{
        .pet = .Dog,
        .number = 23,
    };
    const bytes1 = std.mem.asBytes(&test_struct);
    const bytes2: []const u8 = &.{ 0xFF, 0xFF, 0xFF, 0xFF };
    const result1 = validateStructBytes(Struct, bytes1, null);
    std.debug.print("{s}\n", .{if (result1) "valid" else "invalid"});
    const result2 = validateStructBytes(Struct, bytes2, null);
    std.debug.print("{s}\n", .{if (result2) "valid" else "invalid"});
    const validators = struct {
        pub fn number(value: u17) bool {
            return value < 20;
        }
    };
    const result3 = validateStructBytes(Struct, bytes1, validators);
    std.debug.print("{s}\n", .{if (result3) "valid" else "invalid"});
}

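The function creates a parallel struct type in which each enum field is replaced by its integer tag type, reinterprets the bytes as that type, and then checks each enum field with std.meta.intToEnum. It also accepts an optional namespace of per-field validator functions for additional checks.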

Yes, it’s a useful trick, but it only works if the values are contiguous (in several network protocols, they are not).

True, this does assume contiguity. I’ll play around with some metaprogramming tonight and see if I can cook up something more direct. You can iterate over the fields of an enum, and I’m pretty sure this can be used to create runtime checks (I do it all the time for tuples). I’ll write an example tonight, but it should actually be pretty straightforward for the non-contiguous case.

    /// This data structure is used by the Zig language code generation and
    /// therefore must be kept in sync with the compiler implementation.
    pub const EnumField = struct {
        name: []const u8,
        value: comptime_int,
    };

    /// This data structure is used by the Zig language code generation and
    /// therefore must be kept in sync with the compiler implementation.
    pub const Enum = struct {
        tag_type: type,
        fields: []const EnumField,
        decls: []const Declaration,
        is_exhaustive: bool,
    };

Here you can see that the Enum builtin type has a member called “fields”, and each field has a name and a value. You can unroll this with an inline loop and check each field’s numeric value against a provided value.

Here’s a simple way to see if a value is inside of an enum that does not require contiguous values:

const std = @import("std");

fn isEnum(comptime E: type) bool {
    return switch(@typeInfo(E)) {
        .Enum => true,
        else => false,
    };
}

pub fn inEnum(comptime E: type, t: usize) bool {

    if (comptime !isEnum(E)) {
        @compileError("inEnum requires enum type as first argument.");
    }
    
    return inline for (@typeInfo(E).Enum.fields) |f| {
        if (f.value == t) {
            break true;
        }
    } else false;
}

const TagsA = enum {
    a,
    b,
};

const TagsB = enum(usize) {
    a = 10, 
    b = 50,
};

pub fn main() !void {
    // the inferred case:
    std.debug.print("\nIn enum: {}\n", .{ inEnum(TagsA, 0) });
    std.debug.print("\nIn enum: {}\n", .{ inEnum(TagsA, 1) });
    std.debug.print("\nIn enum: {}\n", .{ inEnum(TagsA, 2) });

    // the explicit case:
    std.debug.print("\nIn enum: {}\n", .{ inEnum(TagsB, 10) });
    std.debug.print("\nIn enum: {}\n", .{ inEnum(TagsB, 50) });
    std.debug.print("\nIn enum: {}\n", .{ inEnum(TagsB, 42) });

}

std.meta.intToEnum has been mentioned above, but here’s a simple example that may make it clearer how it’s useful in this case:

const std = @import("std");

const E = enum(u8) {
    a = 10,
    b = 2,
    c = 33,

    fn fromU8(n: u8) ?E {
        return std.meta.intToEnum(E, n) catch null;
    }
};

pub fn main() void {
    std.debug.print("33? {?}\n", .{E.fromU8(33)});
    std.debug.print("42? {?}\n", .{E.fromU8(42)});
}

BTW, a fix for this has been on my TODO list for a long time (for now, I just re-implement the small subset of std.enums that I need for non-exhaustive enums). If I understand correctly, this restriction was relevant when one kind of enum was allowed to have field aliases (the now-removed extern enums); since those no longer exist, the restriction can be lifted.

Related:
Add some enum utilities by SpexGuy · Pull Request #8171 · ziglang/zig · GitHub (original PR that introduced std.enums, merged March 19, 2021)
enums should disallow multiple enumerations with the same value · Issue #2115 · ziglang/zig · GitHub (proposal to remove extern enums, April 28, 2021; also contains a note about the time complexity of std.enums.directEnumArray on an extern enum)

This code is almost exactly what std.meta.intToEnum does.
Yesterday, I was writing a performance-critical loop. I had a bunch of functions in an array (not function pointers, but actual functions, which are comptime-only). I was iterating over this array, doing a bunch of work, and at the end of each iteration I would call one of the functions. Because functions are comptime-only, I had to inline the loop, as you did in your code. I decided to test what would happen if I placed function pointers in the array instead, so that I could remove the inline from the loop. This resulted in a 2x speedup.
I had previously looked at the std.meta.intToEnum implementation, and I noticed yours was very similar. Primed by yesterday’s experience, I decided to test whether inlining the loop could be causing an inefficiency. I placed the values of the enum in an array, which I could iterate at runtime without inlining the loop. The generated code was much better. I wrote a pull request about this.
So, here’s the lesson I got from these past couple of days: never inline loops that are going to be executed at runtime. You can always find a way to make the iteration work with a value that is not comptime-only, while still preserving comptime knowledge about the loop bounds.
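A minimal sketch of the array-based alternative described above, assuming Zig 0.11 or later (for `std.enums.values` and `std.meta.Tag`). `Tag` and `enumFromIntChecked` are hypothetical names, and this is not the code from the pull request. I have not measured it, so whether it actually beats an `inline for` will depend on the compiler version and optimization mode.

```zig
const std = @import("std");

// Hypothetical non-contiguous protocol tags.
const Tag = enum(u16) {
    a = 10,
    b = 50,
    c = 1000,
};

/// Hypothetical helper: converts an integer to `E` without an `inline for`.
/// `std.enums.values(E)` yields a comptime-known slice of all tags, so the
/// loop bound is known at compile time, but the loop itself is an ordinary
/// runtime loop that is not unrolled.
fn enumFromIntChecked(comptime E: type, n: std.meta.Tag(E)) ?E {
    const values = comptime std.enums.values(E);
    for (values) |v| {
        if (@intFromEnum(v) == n) return v;
    }
    return null;
}
```

Unlike the sentinel trick, this works for non-contiguous values such as the ones above.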

That’s great information. Thanks for sharing your find! I’ll have to test this at some point myself and maybe I’ll start defaulting to something similar.