Support switch on string variable

Why does the switch syntax not support string variables like other languages do? Surely any ambiguity in the type can be resolved at compile time. It would be great to avoid a complex if-else-if ladder.

switch is a construct that is expected to result in an optimized lookup table when compiled to machine code. If you want to look up strings, you should either use a hash map, or do something like:

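const std = @import("std");

const Thing = enum {
    accept,
    decline,
};

// `handle` is a hypothetical wrapper; `mystr` is the runtime string being matched.
fn handle(mystr: []const u8) void {
    const thing = std.meta.stringToEnum(Thing, mystr) orelse @panic("welp");
    switch (thing) {
        .accept => {},
        .decline => @panic("declined"),
    }
}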

There is also still ambiguity in the identity of a string. Are two strings equal only if they share the same pointer? Is a null-terminated string equal to an unterminated string with the same content?
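
To make the identity question concrete, here is a small Zig test (a sketch; the strings are arbitrary). It shows two slices holding the same bytes at different addresses, and a null-terminated `[:0]const u8` comparing equal to a plain slice under `std.mem.eql`, which looks at content only. A built-in string switch would have to commit to one of these notions of equality.

```zig
const std = @import("std");

test "pointer identity vs. content equality" {
    const gpa = std.testing.allocator;
    const literal: []const u8 = "abc";
    const copy = try gpa.dupe(u8, literal); // same bytes, separate heap allocation
    defer gpa.free(copy);
    const terminated: [:0]const u8 = "abc"; // null-terminated, but len is still 3

    // Different addresses, so a pointer-identity rule would call these unequal...
    try std.testing.expect(@intFromPtr(copy.ptr) != @intFromPtr(literal.ptr));
    // ...while a content rule (std.mem.eql) calls them equal.
    try std.testing.expect(std.mem.eql(u8, copy, literal));

    // The sentinel is not counted in len, so content comparison treats it as equal too.
    try std.testing.expect(terminated.len == 3);
    try std.testing.expect(std.mem.eql(u8, terminated, copy));
}
```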

In addition to the above answers, the FAQ page in the ziglang/zig Wiki directly addresses this question:

Why is switching on []u8 (strings) not supported?

In summary, Jimmi made a good attempt at implementing a StringSwitch at comptime and concluded that good old chained if statements were fastest.

For details, see match.zig. Note that if the strings are known at compile time, switching on them as identifiers is easy with the @"" syntax.
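
A minimal sketch of the @"" approach, assuming the strings are known at compile time: each string becomes an enum field name, `std.meta.stringToEnum` maps the runtime input onto it, and the switch then works on the enum. The `Mime` enum and `isText` function are hypothetical names for illustration.

```zig
const std = @import("std");

// Hypothetical set of MIME types; @"..." lets each string be an enum field name.
const Mime = enum {
    @"text/html",
    @"image/png",
    @"application/json",
};

fn isText(s: []const u8) bool {
    // Unknown strings have no matching field, so stringToEnum returns null.
    const mime = std.meta.stringToEnum(Mime, s) orelse return false;
    return switch (mime) {
        .@"text/html", .@"application/json" => true,
        .@"image/png" => false,
    };
}
```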

Go supports this:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	switch os := runtime.GOOS; os {
	case "darwin":
		fmt.Println("OS X.")
	case "linux":
		fmt.Println("Linux.")
	default:
		// freebsd, openbsd,
		// plan9, windows...
		fmt.Printf("%s.\n", os)
	}
}

Comparing strings is never going to be as fast as a jump table. With runtime-defined strings, an if-then-else chain can involve quite a lot of processing, and it can lead to uneven behaviour, such as the first case being matched faster than the last one.

For Go, it sounds like they’re happy with that compromise, but it wouldn’t be appropriate for every kind of system, which is why I think Zig makes you take an extra step.

https://groups.google.com/g/golang-nuts/c/IURR4Z2SY7M/m/R7ORD_yDix4J?pli=1

One thing I did in a C++ code base at work was to hash the strings into integers and then switch on that. I think it would translate pretty well into Zig.
Probably wouldn’t want it in a hot loop, though.
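
A rough Zig version of the hash-then-switch idea might look like the following. This is a sketch: `Command`, `parseCommand` and the choice of `std.hash.Fnv1a_64` are assumptions, not details of the C++ code base. Because a runtime string that is not one of the keys could still hash to the same value as a key, the sketch confirms the match with `std.mem.eql` afterwards.

```zig
const std = @import("std");

// Assumption: FNV-1a is used here only as an example hash function.
const Hash = std.hash.Fnv1a_64;

// Hypothetical set of commands to recognize.
const Command = enum { start, stop, status };

// Switch prongs must be comptime-known; container-level consts are evaluated at comptime.
const h_start = Hash.hash("start");
const h_stop = Hash.hash("stop");
const h_status = Hash.hash("status");

fn parseCommand(s: []const u8) ?Command {
    const cmd: Command = switch (Hash.hash(s)) {
        h_start => .start,
        h_stop => .stop,
        h_status => .status,
        else => return null,
    };
    // An input that is not a key could still collide with a key's hash,
    // so verify the actual bytes before trusting the match.
    if (!std.mem.eql(u8, s, @tagName(cmd))) return null;
    return cmd;
}
```

Duplicate prong values are a compile error in Zig, so a collision between two of the keys themselves would be caught at build time.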

While it isn’t appropriate for every scenario, if you only need to map a string to a value of another type, you can take advantage of std.StaticStringMap, a lookup map that is built at comptime (it groups keys by length rather than hashing them).

Here is an example:

const std = @import("std");

/// An enum representing video ranges.
pub const VideoRange = enum {
    unknown,
    sdr,
    hdr,

    /// Parses a string into a VideoRange, returning `null` upon failure.
    /// Accepts both the lowercase names and the display names returned by `name`.
    pub fn parse(str: []const u8) ?VideoRange {
        const hashmap = std.StaticStringMap(VideoRange).initComptime(.{
            .{ "Unknown", .unknown },
            .{ "unknown", .unknown },
            .{ "SDR", .sdr },
            .{ "sdr", .sdr },
            .{ "HDR", .hdr },
            .{ "hdr", .hdr },
        });

        return hashmap.get(str);
    }

    /// Returns the display name of the enumeration value.
    pub fn name(self: VideoRange) []const u8 {
        return switch (self) {
            .unknown => "Unknown",
            .sdr => "SDR",
            .hdr => "HDR",
        };
    }
};

Seems like you’ve written this from memory since there are a few typos in your code, but just to clarify, this should be:

const hashmap = std.StaticStringMap(VideoRange).initComptime(.{
    .{ "Unknown", .unknown },
    // ...
});

Seems like you’ve written this from memory

Actually it was just generated from a few lines of Ruby in a little one-off script. Thanks for the spot, I fixed it up.

How does this compare in terms of performance to using a chain of if statements with std.mem.eql?

Why are you switching on strings anyway? Are you sure it’s not better to use an enum?

I don’t see how that conclusion was reached. The evidence presented seems to support the opposite. Supporting switch on []u8 is actually easy: the compiler just needs to produce the equivalent “good old chained if statements”, and the result would be fairly optimal.
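
For comparison, the “good old chained if statements” version of the `VideoRange.parse` example above would be something like this (a hand-written sketch; `parseIfChain` is a hypothetical name). `std.mem.eql` returns early when the lengths differ, so most non-matching candidates are cheap to reject.

```zig
const std = @import("std");

const VideoRange = enum { unknown, sdr, hdr };

/// Chained-if equivalent of a string switch: each candidate is tried in order,
/// so later cases pay for every comparison that came before them.
fn parseIfChain(str: []const u8) ?VideoRange {
    const eql = std.mem.eql;
    if (eql(u8, str, "Unknown") or eql(u8, str, "unknown")) return .unknown;
    if (eql(u8, str, "SDR") or eql(u8, str, "sdr")) return .sdr;
    if (eql(u8, str, "HDR") or eql(u8, str, "hdr")) return .hdr;
    return null;
}
```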

I don’t think it needs to be hardcoded into the compiler as a language feature when the combination of std.meta.stringToEnum + switch gives you essentially the same thing, but packaged in a way where you can still easily change it to a different implementation based on your actual inputs and set of strings.

I think making it a language feature would make it less likely that people explore alternatives (e.g. perfect hash function + stencilvector, custom SIMD code).

Note that std.meta.stringToEnum already constructs a lookup table using StaticStringMap for small enums (up to 100 fields) and falls back to a chain of if statements for bigger ones:

// Using StaticStringMap here is more performant, but it will start to take too
// long to compile if the enum is large enough, due to the current limits of comptime
// performance when doing things like constructing lookup maps at comptime.
// TODO The ‘100’ here is arbitrary and should be increased when possible:
// - improve comptime performance to roughly, generally the same as CPython execution speed of equivalent Python code · Issue #4055 · ziglang/zig · GitHub
// - Optimise stringToEnum · Issue #3863 · ziglang/zig · GitHub

Some alternative approaches as well: The Zig Pastebin

I think one caveat with the HashMatcher is that you can’t use a run-time needle that wasn’t provided as one of the comptime keys, because otherwise a hash collision at run time could give you a wrong enum value instead of null.

So I don’t think it is as general: it doesn’t work with any string as input, so it doesn’t solve the same problem.
Are the keys meant to be the set of valid inputs, with the hashing assuming no collisions at runtime?
Otherwise I think the code would have to check that the needle matches the key once the hash matches.

Also, I am not sure hashing is all that useful, because it is a constant overhead you always pay upfront. When comparing two strings you can exit early if they have different lengths or differ along the way (basically what StaticStringMap does); with hashing you have to compute the full hash first.

I think a better approach may be to use interned strings / symbols, which are basically enums that can be extended at run time. That way you just need a symbolFromString function that interns a string; after that you can compare symbols by identity, because all equal strings share the same symbol.

The string interning itself would need some way to unify/store the symbols. Some implementations of that may use hashing, but you could also use a prefix trie (with or without hashing).
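
A minimal interning sketch, assuming a hash map is an acceptable backing store (a prefix trie would work too, as mentioned above). `Symbol` and `Interner` are hypothetical names, and `symbolFromString` is borrowed from the post. Each distinct string is copied once and given the next integer id, so after interning, equality is an integer comparison and symbols can be switched on like any enum.

```zig
const std = @import("std");

/// Hypothetical symbol type: a small integer id handed out by the interner.
const Symbol = enum(u32) { _ };

/// Minimal interner sketch: equal strings always get the same Symbol.
const Interner = struct {
    map: std.StringHashMapUnmanaged(Symbol) = .{},

    fn deinit(self: *Interner, gpa: std.mem.Allocator) void {
        // Keys are owned copies made in symbolFromString, so free them first.
        var it = self.map.keyIterator();
        while (it.next()) |key| gpa.free(key.*);
        self.map.deinit(gpa);
    }

    /// Returns the existing symbol for `s`, or assigns the next free id.
    fn symbolFromString(self: *Interner, gpa: std.mem.Allocator, s: []const u8) !Symbol {
        if (self.map.get(s)) |sym| return sym;
        // Store an owned copy so the caller's buffer may be freed or reused.
        const owned = try gpa.dupe(u8, s);
        errdefer gpa.free(owned);
        const sym: Symbol = @enumFromInt(self.map.count());
        try self.map.put(gpa, owned, sym);
        return sym;
    }
};
```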

Personally, I’m more interested in being able to use switch on a tuple. It does happen that one needs to switch on a pair of values. Right now, the case where the primary value matches but the secondary value does not has to be handled with a labeled switch. That’s ugly as hell, especially because you have to deliberately continue to an unhandled value in order to get into the else clause.

Since we can switch on a u1024, the compiler clearly has the ability to handle comparisons wider than the machine word. Switching on strings is just a matter of making it variable length.
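
For strings with a small upper bound on length, one way to get this today is to pack the bytes into a fixed-width integer and switch on that. The sketch below assumes inputs of at most 8 bytes; `pack`, `Method` and `parseMethod` are hypothetical names. The length is stored above the 64 data bits so that `"GET"` and `"GET\x00"` do not map to the same integer.

```zig
const std = @import("std");

/// Hypothetical helper: packs a string of at most 8 bytes into a u72.
/// Low 64 bits hold the zero-padded bytes, the top bits hold the length.
fn pack(s: []const u8) ?u72 {
    if (s.len > 8) return null;
    var buf = [_]u8{0} ** 8;
    @memcpy(buf[0..s.len], s);
    const bytes = std.mem.readInt(u64, &buf, .little);
    return (@as(u72, s.len) << 64) | bytes;
}

const Method = enum { get, put, delete, other };

// Evaluated at comptime, so they can be used as switch prongs.
const get_key = pack("GET").?;
const put_key = pack("PUT").?;
const delete_key = pack("DELETE").?;

fn parseMethod(s: []const u8) Method {
    const k = pack(s) orelse return .other;
    return switch (k) {
        get_key => .get,
        put_key => .put,
        delete_key => .delete,
        else => .other,
    };
}
```

Because the integer encodes every byte plus the length, a prong match here is an exact string match, with no follow-up comparison needed.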

stringToEnum() is not a good solution when the set is large and you’re only interested in a few of the possible values. This happens frequently enough: file extensions, for example, or MIME types, or language codes. Values you’re not interested in would needlessly show up in the string table.

There is a proposal to allow switching on packed structs. std.Http bitcasts the HTTP method to a u32 and switches on that.
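
Until something like that proposal lands, the same bitcast trick works for the pair-of-values case mentioned earlier: put both values in a packed struct and switch on its backing integer. A sketch, with hypothetical `Shape`, `Color` and `Pair` types:

```zig
const std = @import("std");

// Hypothetical pair of enums to match on together.
const Shape = enum(u8) { circle, square };
const Color = enum(u8) { red, green, blue };

/// The packed struct gives the pair a u16 backing integer.
const Pair = packed struct(u16) { shape: Shape, color: Color };

fn key(shape: Shape, color: Color) u16 {
    return @bitCast(Pair{ .shape = shape, .color = color });
}

// Prong values must be comptime-known; container-level consts are.
const red_circle = key(.circle, .red);
const blue_square = key(.square, .blue);

fn describe(shape: Shape, color: Color) []const u8 {
    return switch (key(shape, color)) {
        red_circle => "red circle",
        blue_square => "blue square",
        // A matching shape with a non-matching color falls straight through to else.
        else => "something else",
    };
}
```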

Yes, that is indeed a problem: the hash matcher would not work if the runtime input can collide with one of the keys.

Not the most relevant to the discussion, but regarding stringToEnum() being a poor fit when you only care about a few values of a large enum:

const std = @import("std");
const BigE = enum {
    a,
    b,
    c,
    d,
    // super big enum :3
};

pub fn main() !void {
    const str = "c";
    const e = stringToEnumSet(BigE, str, &.{ .a, .b, .d });
    std.debug.print("{?}", .{e});
}

fn stringToEnumSet(comptime E: type, s: []const u8, comptime set: []const E) ?E {
    if (set.len <= 100) {
        const kvs = comptime build_kvs: {
            const EnumKV = struct { []const u8, E };
            var kvs_array: [set.len]EnumKV = undefined;
            for (set, 0..) |enumField, i| {
                kvs_array[i] = .{ @tagName(enumField), enumField };
            }
            break :build_kvs kvs_array[0..];
        };
        const map = std.StaticStringMap(E).initComptime(kvs);
        return map.get(s);
    } else {
        inline for (set) |enumField| {
            if (std.mem.eql(u8, s, @tagName(enumField))) {
                return enumField;
            }
        }
        return null;
    }
}

tada :3