Support switch on string variable

Sze · February 26, 2025, 6:23am

I think a similar approach would be to generate a small enum from the big enum that uses identical values and that only has the wanted values and then apply stringToEnum to the small enum.

vulpesx · February 26, 2025, 11:24am

That also works, but imo that would be less ergonomic

chung-leong · February 26, 2025, 9:45pm

It’s the wrong cure, in my opinion, for what’s essentially a craving for syntactic sugar. There’s nothing wrong with using a series of if statements. It just doesn’t look very good and is somewhat more error-prone.

For situations where a hash table is warranted, a hash table of function pointers would make better sense. Looking up a value just to send it immediately to a switch is kinda silly.
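A minimal sketch of the function-pointer table idea, assuming Zig 0.13 or later (where `std.StaticStringMap` is available). The handler names and the `dispatch` helper are hypothetical, made up for illustration:

```zig
const std = @import("std");

// Hypothetical handlers; each returns a code so the result is observable.
fn onStart() u8 {
    return 1;
}

fn onStop() u8 {
    return 2;
}

const Handler = *const fn () u8;

// Comptime-built map from command string to handler.
const handlers = std.StaticStringMap(Handler).initComptime(.{
    .{ "start", &onStart },
    .{ "stop", &onStop },
});

// Looks up the handler and calls it; returns null for unknown commands.
fn dispatch(cmd: []const u8) ?u8 {
    const handler = handlers.get(cmd) orelse return null;
    return handler();
}

test "dispatch calls the matching handler" {
    try std.testing.expectEqual(@as(?u8, 1), dispatch("start"));
}
```

Each handler can be unit tested on its own, which is the navigability argument made above; whether the indirect call is cheaper or costlier than a switch is something to measure rather than assume.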

Sze · February 27, 2025, 5:25am

It can do more work than necessary, have worse performance and high variance in execution time between the first and last case.

I am very skeptical, I think function pointers can be a lot more costly than using a switch on a value. I don’t see why you would use function pointers when you instead could leave it up to the compiler.

Why?
I think it is quite a common thing to create or get a value to switch over.

I don’t really see it as a lookup, I see it as a mostly comptime transformation that reduces a big type to a small type, similar to this Advanced use of comptime: Tagged Union Subsets, but if it is done with enums that both have the same backing integer and use the same values it should essentially be a @enumFromInt(@intFromEnum(x)), because there is no payload that needs to be copied.
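A small sketch of that narrowing, assuming both enums share the backing integer `u8`. The `Big`, `Small`, and `narrow` names are hypothetical:

```zig
const std = @import("std");

const Big = enum(u8) { get, put, delete, patch };

// Small keeps only the wanted tags and reuses Big's integer values.
const Small = enum(u8) {
    get = @intFromEnum(Big.get),
    delete = @intFromEnum(Big.delete),
};

// Converts Big to Small when the tag exists in Small, otherwise null.
// The conversion itself is only an integer round trip, no payload is copied.
fn narrow(x: Big) ?Small {
    return switch (x) {
        .get, .delete => @as(Small, @enumFromInt(@intFromEnum(x))),
        else => null,
    };
}

test "narrow keeps shared tags" {
    try std.testing.expectEqual(@as(?Small, .delete), narrow(.delete));
}
```

Here the small enum is written by hand; generating it at comptime from a list of wanted tags is the variant described above and is not shown.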

With “Bitcast should be allowed on enums with tag types” it could be done with a bitcast.

vulpesx · February 27, 2025, 6:40am

While hashing is O(1) assuming no collisions, it’s often slower for small data sets.
If you care about performance, profile multiple solutions for your specific use case. Also check if that’s even the bottleneck.

chung-leong · February 27, 2025, 3:02pm

I don’t think there’s going to be much of a difference thanks to branch prediction. The last case would be in the execution pipeline already before the check is finished.

A large switch would employ a jump table. It’s an indirect branch either way. Separating code into smaller functions obviously makes it easier to navigate and easier to unit test. It’s easier to “fall through” to another case too: you just call the other function.

floooh · February 27, 2025, 3:36pm

FWIW, any halfway decent compiler should output the exact same code for a switch-case vs an equivalent if-else, e.g.: Compiler Explorer

(PS: I’m not a fan of the original thread topic “switching-on-strings” though, too much hidden compiler-magic to make that efficient - not to mention that Zig doesn’t even have a string type (which IMHO is also a good thing).)

Sze · February 27, 2025, 4:14pm

You still do more work. With StaticStringMap (which is used by stringToEnum for small enough enums) the input needle length is used to switch to the group of keys that have exactly that length, immediately discarding all other keys as possibility which can cut down on the number of strings that need to be checked significantly. There is also a nice optimization in the works Replace StaticStringMap with far more optimized version by RetroDev256 · Pull Request #21498 · ziglang/zig · GitHub

@floooh a compiler won’t magically group up branches and share common work between the conditions of those branches (scattered across different function calls), if that was the case the pr wouldn’t be necessary. Once you have done those length based groupings you can implement the switching on that length in an arbitrary way that is fastest, but that isn’t what I am talking about.

stringToEnum already falls back to using if and letting the compiler optimize that (in cases where comptime is too slow to use the StaticStringMap), so I don’t get why I would use if-else chains manually instead of letting the compiler/standard-library generate it when it wants to. From my point of view using stringToEnum + switch is strictly better, I can wait for the pr to get merged and automatically get better performance, if comptime gets faster I get better performance too (because more stringToEnum calls would use StaticStringMap for algorithmic improvements), I also get exhaustiveness checking for the code that is within the switch prongs.
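A minimal sketch of the stringToEnum + switch combination. The `Command` enum and `run` function are hypothetical names for illustration:

```zig
const std = @import("std");

const Command = enum { start, stop, status };

// Converts the input to an enum once, then switches on the enum.
fn run(input: []const u8) []const u8 {
    const cmd = std.meta.stringToEnum(Command, input) orelse return "unknown";
    // The switch is exhaustive: adding a tag to Command without handling it
    // here is a compile error.
    return switch (cmd) {
        .start => "starting",
        .stop => "stopping",
        .status => "running",
    };
}

test "run maps a known string" {
    try std.testing.expectEqualStrings("stopping", run("stop"));
}
```

How `stringToEnum` does the lookup internally is left to the standard library, which is the point being made: the calling code does not change if that implementation improves.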

I don’t see how that is an issue: you can have a function call in a prong, and for “fall through” you can continue to a different prong via Labeled-switch and continue.
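A short sketch of “fall through” with a labeled switch, assuming Zig 0.14 or later (where labeled switch with `continue` is available). The `Step` enum and `countSteps` are hypothetical:

```zig
const std = @import("std");

const Step = enum { prepare, execute, done };

// Counts how many prongs do work when starting from `first`.
fn countSteps(first: Step) u32 {
    var steps: u32 = 0;
    sw: switch (first) {
        .prepare => {
            steps += 1;
            // Jump to the .execute prong, like a fall through.
            continue :sw .execute;
        },
        .execute => {
            steps += 1;
            continue :sw .done;
        },
        .done => {},
    }
    return steps;
}

test "prepare continues into execute" {
    try std.testing.expectEqual(@as(u32, 2), countSteps(.prepare));
}
```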

I get the feeling like we all argue about slightly different problems.

chung-leong · February 27, 2025, 5:16pm

TUSF · February 27, 2025, 5:52pm

I found the approach that the std.http module uses for Method funny.

Just re-interpret the bytes of the string as an integer. Only works if your string is short enough to fit into your integer type, tho. And I don’t think std.http ever actually uses this in a switch?
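A rough sketch of the general idea, not the actual std.http code: pack the first bytes of a short string into a u64 and switch on that. The `pack` and `isSafe` helpers are hypothetical, and this version builds the integer with shifts instead of reinterpreting memory:

```zig
const std = @import("std");

// Packs up to 8 bytes of `s` into a u64, first byte in the lowest bits.
// Longer strings are truncated, so this is only suitable for short keys.
fn pack(s: []const u8) u64 {
    var x: u64 = 0;
    const len = @min(s.len, 8);
    for (s[0..len], 0..) |b, i| {
        x |= @as(u64, b) << @intCast(8 * i);
    }
    return x;
}

// Switch prong values are evaluated at comptime, so pack("GET") costs nothing at runtime.
fn isSafe(method: []const u8) bool {
    return switch (pack(method)) {
        pack("GET"), pack("HEAD"), pack("OPTIONS") => true,
        else => false,
    };
}

test "GET is matched" {
    try std.testing.expect(isSafe("GET"));
}
```

Besides the length limit, strings that differ only in trailing zero bytes pack to the same integer, so the input needs to be constrained accordingly.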

chung-leong · February 27, 2025, 10:08pm

Switching on a []const u8 is a bit wonky, since that’s a pointer. Switching on a fixed-size u8 array is worth implementing, I think. Four-C signatures are used quite frequently in file formats. Being able to do something like this would be genuinely useful:

    const sig: [4]u8 = .{ ... };
    switch (sig) {
        "hhea" => {
            // ...
        },
        "hmtx" => {
            // ...
        },
        "cmap" => {
            // ...
        },
        "OS/2" => {
            // ...
        },
        else => {},
    }

In C, we typically treat the signature as a uint32_t. More often than not that means the code ends up being dependent on a little-endian arrangement.

castholm · February 27, 2025, 11:18pm

You can switch on static-length strings reinterpreted as integers in user space:

    const sig: [4]u8 = .{ ... };
    switch (std.mem.readInt(u32, &sig, .little)) {
        std.mem.readInt(u32, "hhea", .little) => {
            // ...
        },
        std.mem.readInt(u32, "hmtx", .little) => {
            // ...
        },
        std.mem.readInt(u32, "cmap", .little) => {
            // ...
        },
        std.mem.readInt(u32, "OS/2", .little) => {
            // ...
        },
        else => {},
    }
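A self-contained variant of the same approach, assuming Zig 0.12 or later for this `std.mem.readInt` signature. The `Table` enum and the `tag` and `classify` helpers are hypothetical names added for illustration:

```zig
const std = @import("std");

const Table = enum { hhea, hmtx, cmap, os2, unknown };

// Reads a four-byte signature as a little-endian u32.
fn tag(bytes: *const [4]u8) u32 {
    return std.mem.readInt(u32, bytes, .little);
}

// The same byte order is used for the input and for every prong, so the
// result does not depend on the endianness of the target.
fn classify(sig: [4]u8) Table {
    return switch (tag(&sig)) {
        tag("hhea") => .hhea,
        tag("hmtx") => .hmtx,
        tag("cmap") => .cmap,
        tag("OS/2") => .os2,
        else => .unknown,
    };
}

test "classify recognizes cmap" {
    try std.testing.expectEqual(Table.cmap, classify(.{ 'c', 'm', 'a', 'p' }));
}
```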

ajoino · February 28, 2025, 11:52am

Would @bitCast work here as well?

castholm · February 28, 2025, 8:38pm

Yes, @as(u32, @bitCast("abcd".*)) also works.

semperfidelis · March 2, 2025, 7:28pm

Another way to tackle this would be pattern matching for arrays similar to that in C#. Maybe a robust pattern matching syntax could be considered for future versions of Zig but for now I’d personally rather see Zig 1.0 come to fruition before too much syntactic sugar gets added in.

const-void · March 14, 2025, 10:41am

what we really want is a multi conditional if

    if str == "foo" {...}
           == "bar" {...}
           == "xyz" {...}
           >= "zzz" {...}
           <= "aaa" {...}
     else {
       ...
    }

vulpesx · March 14, 2025, 12:29pm

That’s literally what a switch is, minus the >= and <=, which you can get with ranges. I’m also not sure how that would work for non-number types like a string.

mnemnion · March 14, 2025, 8:07pm

I like that Zig doesn’t hide the expense of code. A switch on strings is not at all the same thing as a switch on enums or native integer values, in terms of implementation, it’s something more expensive. Yes it’s true that one can write a switch on very large non-native integers, but how often would someone want to do that? Whereas switching on a string is natural enough if you’re accustomed to it from other languages.

The problem can be solved very much as the compiler would solve it, as others have noted, by using a StaticStringMap to turn the strings of interest into enums, perhaps with a default value if the string isn’t found in the map. Or, maybe, the string isn’t needed at all and enums could simply be used directly.
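A minimal sketch of that two-step approach, assuming Zig 0.13 or later for `std.StaticStringMap`. The `Color` enum, the map contents, and the helper names are hypothetical:

```zig
const std = @import("std");

const Color = enum { red, green, blue, unknown };

// Built at comptime; maps the strings of interest to enum tags.
const colors = std.StaticStringMap(Color).initComptime(.{
    .{ "red", .red },
    .{ "green", .green },
    .{ "blue", .blue },
});

// Converts once at the edge, with a default for strings not in the map.
fn parseColor(s: []const u8) Color {
    return colors.get(s) orelse .unknown;
}

// The rest of the program works with the enum only.
fn channelIndex(c: Color) ?u2 {
    return switch (c) {
        .red => 0,
        .green => 1,
        .blue => 2,
        .unknown => null,
    };
}

test "string is converted and then switched on" {
    try std.testing.expectEqual(@as(?u2, 1), channelIndex(parseColor("green")));
}
```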

I’m playing along with the premise that Zig even has strings, but when you come down to it, it doesn’t. The reasons for that are not unrelated to the question at hand.

So breaking the problem into two parts like this has several advantages. For one, it’s a bit of friction which nudges toward using enums instead, which is optimally fast in a way which chained string comparisons can’t reach. For another, once you have the StaticStringMap, you can use it in other places. For a third, if you notice a lot of conversion of strings to their enum equivalents in many places, you can do that once, on the edge, and operate with enums in the heart of the program.

That’s what you want to be doing, so overall I would say that the lack of this feature is working as intended.

const-void · March 15, 2025, 12:28pm

well…not so much:

switch is a construct that is expected to result in an optimized lookup table when compiled to machine code.

also

[switch strings] It’s the wrong cure, in my opinion, for what’s essentially a craving for syntactic sugar. There’s nothing wrong with using a series of if statements. It just doesn’t look very good and is somewhat more error-prone.

Sze · March 15, 2025, 1:05pm

Did you intend to repeat this quote 2 times? Seems like you meant to quote something different the second time, or I don’t quite understand what you mean.

I think Zig isn’t big on syntactic sugar in general (using a tiny bit, but not more than that), macros are a way to add endless variations of syntactic sugar (and Zig has deliberately decided against those), at first macros seem great, but personally I found the resulting hyper-individualization of code constructs, where everyone uses their own little gimmicks and increasingly more complex macros and macro generating macros, etc., (that also comes with lots of bike-shedding, redundancy and a community where nobody follows the same principles, making it difficult to read and understand code) a bit exhausting after a while.

I mention macros, because a lot of the feature requests I see, that are just wanting to “make something easier and more convenient” remind me of macros that I have seen, that essentially add a new syntactic way that adds some more constrained and specialized way to do something, that already could be done, just slightly less “pretty”, at the expense of an increasingly exploding vocabulary of new constructs that have their own weird edge-cases and often don’t even compose well together.

If anything I would want people to explore and use macro systems in different languages, until they either have found a way to improve and eliminate these issues, or become disillusioned with the fancy and superficial heaps of syntactic sugar. One reason I like Zig is that it gives a lot more weight to semantics over syntax, I think it is much better to find good semantics before considering additional syntactic additions.

Unless somebody can find something better than existing macro systems and comptime combined, that doesn’t destroy the nice properties of comptime, I definitely prefer Zig’s more manual, explicit and clear style, over something that looks fancy, but hides complexity from you, by adding a mountain of constructs.