Support switch on string variable

I think a similar approach would be to generate a small enum from the big enum that uses identical values and that only has the wanted values and then apply stringToEnum to the small enum.

That also works, but imo that would be less ergonomic

It’s the wrong cure, in my opinion, for what’s essentially a craving for syntactic sugar. There’s nothing wrong with using a series of if statements. It just doesn’t look very good and is somewhat more error-prone.

For situations where a hash table is warranted, a hash table of function pointers would make better sense. Looking up a value just to send it immediately to a switch is kinda silly.
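A minimal sketch of that idea, assuming a Zig version that ships std.StaticStringMap (0.13 or later). The handler functions, the `Handler` alias and `dispatch` are hypothetical names invented for illustration:

```zig
const std = @import("std");

// Hypothetical handlers; a real program would do actual work here.
fn start() u8 {
    return 1;
}

fn stop() u8 {
    return 2;
}

const Handler = *const fn () u8;

// Comptime-built map from command name to function pointer.
const handlers = std.StaticStringMap(Handler).initComptime(.{
    .{ "start", &start },
    .{ "stop", &stop },
});

// Look the command up and call its handler directly.
// Returns null when the command is not in the map.
fn dispatch(command: []const u8) ?u8 {
    const handler = handlers.get(command) orelse return null;
    return handler();
}
```

The lookup result is called right away, so there is no intermediate value to switch on.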

It can do more work than necessary, have worse performance and high variance in execution time between the first and last case.

I am very skeptical. I think function pointers can be a lot more costly than using a switch on a value. I don’t see why you would use function pointers when you could instead leave it up to the compiler.

Why?
I think it is quite a common thing to create or get a value to switch over.

I don’t really see it as a lookup. I see it as a mostly comptime transformation that reduces a big type to a small type, similar to the one in “Advanced use of comptime: Tagged Union Subsets”. If it is done with enums that both have the same backing integer and use the same values, it should essentially be a @enumFromInt(@intFromEnum(x)), because there is no payload that needs to be copied.
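Roughly what I mean, as a hand-written sketch. `Big`, `Small` and `narrow` are hypothetical names, and here the small enum is written out manually instead of being generated at comptime:

```zig
// Both enums share the same backing integer and the same values
// for the fields they have in common.
const Big = enum(u8) { a = 0, b = 1, c = 2, d = 3 };
const Small = enum(u8) { b = 1, c = 2 };

// Narrow a Big value to Small; null if the value has no counterpart.
fn narrow(x: Big) ?Small {
    return switch (x) {
        // Same integer value in both enums, so this is just a cast.
        .b, .c => @as(Small, @enumFromInt(@intFromEnum(x))),
        else => null,
    };
}
```

The cast is only reached for values that exist in `Small`, so it never produces an invalid enum value.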

With the proposal “Bitcast should be allowed on enums with tag types”, it could be done with a bitcast.

While hashing is O(1) assuming no collisions, it’s often slower for small data sets.
If you care about performance, profile multiple solutions for your specific use case. Also check whether that’s even the bottleneck.

I don’t think there’s going to be much of a difference thanks to branch prediction. The last case would be in the execution pipeline already before the check is finished.

A large switch would employ a jump table. It’s an indirect branch either way. Separating code into smaller functions obviously makes it easier to navigate and easier to unit test. It’s easier to “fall through” to another case too: you just call the other function.

FWIW, any halfway decent compiler should output the exact same code for a switch-case vs an equivalent if-else, e.g.: Compiler Explorer

(PS: I’m not a fan of the original thread topic “switching-on-strings” though. There is too much hidden compiler magic needed to make that efficient, not to mention that Zig doesn’t even have a string type, which IMHO is also a good thing.)

You still do more work. With StaticStringMap (which is used by stringToEnum for small enough enums) the input needle length is used to switch to the group of keys that have exactly that length, immediately discarding all other keys as possibilities, which can significantly cut down on the number of strings that need to be checked. There is also a nice optimization in the works: Replace StaticStringMap with far more optimized version by RetroDev256 · Pull Request #21498 · ziglang/zig · GitHub

@floooh a compiler won’t magically group up branches and share common work between the conditions of those branches (scattered across different function calls). If that were the case, the PR wouldn’t be necessary. Once you have done those length-based groupings, you can implement the switching on that length in whatever way is fastest, but that isn’t what I am talking about.

stringToEnum already falls back to using if and letting the compiler optimize that (in cases where comptime is too slow to use the StaticStringMap), so I don’t get why I would use if-else chains manually instead of letting the compiler/standard-library generate it when it wants to. From my point of view using stringToEnum + switch is strictly better, I can wait for the pr to get merged and automatically get better performance, if comptime gets faster I get better performance too (because more stringToEnum calls would use StaticStringMap for algorithmic improvements), I also get exhaustiveness checking for the code that is within the switch prongs.
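As a minimal sketch of the stringToEnum + switch pattern (the `Command` enum and `run` function are hypothetical, made up for illustration):

```zig
const std = @import("std");

// Hypothetical command set.
const Command = enum { start, stop, status };

fn run(input: []const u8) []const u8 {
    // stringToEnum returns null when the input matches no field name.
    const command = std.meta.stringToEnum(Command, input) orelse return "unknown";
    // No else prong: adding a field to Command without handling it
    // here becomes a compile error.
    return switch (command) {
        .start => "starting",
        .stop => "stopping",
        .status => "running",
    };
}
```

How the string is matched stays an implementation detail of the standard library, so the calling code does not change if that implementation improves.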

I don’t see how that is an issue. You can have a function call in a prong, and for “fall through” you can continue to a different prong via a labeled switch and continue.
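A small sketch of that “fall through”, assuming a Zig version with labeled switch (0.14 or later). The `Step` enum and `run` function are hypothetical:

```zig
const Step = enum { prepare, execute, finish };

// Counts how many prongs run when starting from `first`.
fn run(first: Step) u32 {
    var executed: u32 = 0;
    sw: switch (first) {
        .prepare => {
            executed += 1;
            // Jump to the .execute prong.
            continue :sw .execute;
        },
        .execute => {
            executed += 1;
            // Jump to the .finish prong.
            continue :sw .finish;
        },
        .finish => {
            executed += 1;
        },
    }
    return executed;
}
```

Each `continue :sw` re-dispatches the switch with a new operand, so a prong can hand off to any other prong, not just the next one.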


I get the feeling that we are all arguing about slightly different problems.

I found the approach that the std.http module uses for Method funny.

Just re-interpret the bytes of the string as an integer. Only works if your string is short enough to fit into your integer type, tho. And I don’t think std.http ever actually uses this in a switch?

Switching on a []const u8 is a bit wonky, since that’s a pointer. Switching on a fixed-size u8 array is worth implementing, I think. Four-character (FourCC) signatures are used quite frequently in file formats. Being able to do something like this would be genuinely useful:

    const sig: [4]u8 = .{ ... };
    switch (sig) {
        "hhea" => {
            // ...
        },
        "hmtx" => {
            // ...
        },
        "cmap" => {
            // ...
        },
        "OS/2" => {
            // ...
        },
        else => {},
    }

In C, we typically treat the signature as a uint32_t. More often than not, that means the code ends up being dependent on a little-endian arrangement.

You can switch on static-length strings reinterpreted as integers in user space:

const sig: [4]u8 = .{ ... };
switch (std.mem.readInt(u32, &sig, .little)) {
    std.mem.readInt(u32, "hhea", .little) => {
        // ...
    },
    std.mem.readInt(u32, "hmtx", .little) => {
        // ...
    },
    std.mem.readInt(u32, "cmap", .little) => {
        // ...
    },
    std.mem.readInt(u32, "OS/2", .little) => {
        // ...
    },
    else => {},
}

Would @bitCast work here as well?

Yes, @as(u32, @bitCast("abcd".*)) also works.

Another way to tackle this would be pattern matching for arrays similar to that in C#. Maybe a robust pattern matching syntax could be considered for future versions of Zig but for now I’d personally rather see Zig 1.0 come to fruition before too much syntactic sugar gets added in.

What we really want is a multi-conditional if:

if str == "foo" {...}
       == "bar" {...}
       == "xyz" {...}
       >= "zzz" {...}
       <= "aaa" {...}
 else { 
   ...
}

That’s literally what a switch is, minus the >= and <=, but you can get those with ranges. I’m also not sure how that would work for non-number types, like a string.
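For integer-like values, ranges in a switch look like this (a minimal sketch; `classify` is a hypothetical function):

```zig
// Classify a single byte using inclusive ranges in switch prongs.
fn classify(c: u8) []const u8 {
    return switch (c) {
        'a'...'z' => "lowercase",
        'A'...'Z' => "uppercase",
        '0'...'9' => "digit",
        else => "other",
    };
}
```

This covers the bounded comparisons for single bytes, but it does not extend to whole strings.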

I like that Zig doesn’t hide the expense of code. A switch on strings is not at all the same thing as a switch on enums or native integer values, in terms of implementation, it’s something more expensive. Yes it’s true that one can write a switch on very large non-native integers, but how often would someone want to do that? Whereas switching on a string is natural enough if you’re accustomed to it from other languages.

The problem can be solved very much as the compiler would solve it, as others have noted, by using a StaticStringMap to turn the strings of interest into enums, perhaps with a default value if the string isn’t found in the map. Or, maybe, the string isn’t needed at all and enums could simply be used directly.
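A minimal sketch of that approach, assuming a Zig version that ships std.StaticStringMap (0.13 or later). The `Table` enum, `table_names`, `parseTable` and `describe` are hypothetical names, with the keys borrowed from the font table signatures mentioned earlier in the thread:

```zig
const std = @import("std");

// Hypothetical enum with an explicit default for unrecognized names.
const Table = enum { hhea, hmtx, cmap, os2, unknown };

// Keys can be arbitrary strings, such as "OS/2".
const table_names = std.StaticStringMap(Table).initComptime(.{
    .{ "hhea", .hhea },
    .{ "hmtx", .hmtx },
    .{ "cmap", .cmap },
    .{ "OS/2", .os2 },
});

// Convert once, at the edge; fall back to .unknown if not found.
fn parseTable(name: []const u8) Table {
    return table_names.get(name) orelse .unknown;
}

// The rest of the program switches on the enum, not on the string.
fn describe(table: Table) []const u8 {
    return switch (table) {
        .hhea => "horizontal header",
        .hmtx => "horizontal metrics",
        .cmap => "character map",
        .os2 => "OS/2 metrics",
        .unknown => "unrecognized table",
    };
}
```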

I’m playing along with the premise that Zig even has strings, but when you come down to it, it doesn’t. The reasons for that are not unrelated to the question at hand.

So breaking the problem into two parts like this has several advantages. For one, it’s a bit of friction which nudges toward using enums instead, which is optimally fast in a way which chained string comparisons can’t reach. For another, once you have the StaticStringMap, you can use it in other places. For a third, if you notice a lot of conversion of strings to their enum equivalents in many places, you can do that once, on the edge, and operate with enums in the heart of the program.

That’s what you want to be doing, so overall I would say that the lack of this feature is working as intended.
