Determining if a string corresponds to a builtin type

biosbob · November 11, 2024, 5:11pm

given strings such as "bool" or "u8" or "short", is there a library function that tells me whether a particular string corresponds to a builtin type???

i’m already handling this in a TextMate grammar using a regex, but i also need a local implementation in zig itself…

is it time for me to bite the bullet and start using a zig regex library???

Dok8tavo · November 11, 2024, 6:19pm

My project isn’t out yet, but in it I do a very primitive kind of parsing of Zig, and I keep a StaticStringMap of some identifiers. Here’s the part for the builtin types:

const map = std.StaticStringMap(TokenKind).initComptime(.{
    // various things here
    ... 

    // various builtin types
    .{ "anyerror", TokenKind.builtin_type },
    .{ "anyframe", TokenKind.builtin_type },
    .{ "anyopaque", TokenKind.builtin_type },
    .{ "anytype", TokenKind.builtin_type }, // this isn't technically a type, but it makes sense for me
    .{ "bool", TokenKind.builtin_type },
    .{ "noreturn", TokenKind.builtin_type },
    .{ "type", TokenKind.builtin_type },
    .{ "void", TokenKind.builtin_type },

    // builtin integers
    .{ "c_char", TokenKind.builtin_type },
    .{ "c_int", TokenKind.builtin_type },
    .{ "c_long", TokenKind.builtin_type },
    .{ "c_longlong", TokenKind.builtin_type },
    .{ "c_short", TokenKind.builtin_type },
    .{ "c_uint", TokenKind.builtin_type },
    .{ "c_ulong", TokenKind.builtin_type },
    .{ "c_ulonglong", TokenKind.builtin_type },
    .{ "c_ushort", TokenKind.builtin_type },
    .{ "comptime_int", TokenKind.builtin_type },
    .{ "isize", TokenKind.builtin_type },
    .{ "usize", TokenKind.builtin_type },

    // builtin floating points
    .{ "c_longdouble", TokenKind.builtin_type },
    .{ "comptime_float", TokenKind.builtin_type },
    .{ "f16", TokenKind.builtin_type },
    .{ "f32", TokenKind.builtin_type },
    .{ "f64", TokenKind.builtin_type },
    .{ "f80", TokenKind.builtin_type },
    .{ "f128", TokenKind.builtin_type },

    // other various things here
    ... 
});

Please note that this list does not include every integer type, since Zig allows arbitrary bit widths matching \b(u|i)[0-9]+\b. The bit width cannot exceed 65535, though I don’t know if that’s an edge case you care about.
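For what it’s worth, the arbitrary-width case can probably be covered without a regex library. Below is a minimal sketch, not code from my project: `isSizedIntName` is a hypothetical helper name, and it assumes you want to reject widths above 65535. Note that it accepts leading zeros such as "u08", which may or may not be what you want.

```zig
const std = @import("std");

/// Hypothetical helper: returns true for names shaped like "u8" or "i128".
/// It mirrors the regex (u|i)[0-9]+ and additionally rejects bit widths
/// above 65535.
fn isSizedIntName(name: []const u8) bool {
    // Need a signedness prefix plus at least one digit.
    if (name.len < 2) return false;
    if (name[0] != 'u' and name[0] != 'i') return false;

    // Everything after the prefix must be an ASCII digit.
    for (name[1..]) |c| {
        if (!std.ascii.isDigit(c)) return false;
    }

    // Parsing into a u16 fails on overflow, which rules out widths > 65535.
    _ = std.fmt.parseInt(u16, name[1..], 10) catch return false;
    return true;
}
```

A name would then count as a builtin type if it is either found in the map or accepted by this helper.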

Hope it helps.

biosbob · November 11, 2024, 6:26pm

unfortunately, i need the arbitrary-sized u and i types… i really DO need \b(u|i)[0-9]+\b

i’m seeing a lot of recent activity here, so maybe that’s the way to go???

mnemnion · November 11, 2024, 6:33pm

Mvzr should handle that regex just fine, Bob. It isn’t full-featured, but it covers the basics. I encourage you to give it a spin and see if it meets your needs.

dimdin · November 11, 2024, 7:38pm

If that is the only reason to use regular expressions, perhaps std.zig.isPrimitive is suitable for your needs.
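A quick way to try it on the strings from the original question is sketched below. This assumes `std.zig.isPrimitive` takes a `[]const u8` and returns a `bool`, which may vary between Zig versions, so check it against your standard library. As far as I can tell it also accepts primitive value names such as "true" and "null", so filter those out if you only want types.

```zig
const std = @import("std");

pub fn main() void {
    // Names taken from the original question; "short" is not a Zig builtin.
    const names = [_][]const u8{ "bool", "u8", "short" };
    for (names) |name| {
        // Assumed signature: fn isPrimitive(name: []const u8) bool
        std.debug.print("{s}: {}\n", .{ name, std.zig.isPrimitive(name) });
    }
}
```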