Can I generate switch cases at compile time?

TL;DR: Can I generate multiple cases of a switch expression at comptime based on a table/array?

I’m writing a simple string escaping function, which will expand particular characters from an input string into longer sequences. I’m currently doing this in two passes: first, I calculate the length of the output string. Then, I copy and expand as necessary.

This approach means I have to hardcode information about the special characters to be expanded in two different switch statements (in this example, it’s XML character entities):

for (source) |c| {
    out_chars += switch (c) {
        '<', '>' => 4,
        '"' => 6,
        '&' => 5,
        else => 1,
    };
}
for (source) |c| {
    switch (c) {
        '<' => { @memcpy(out[out_i..], "&lt;"); out_i += 4; },
        '>' => { @memcpy(out[out_i..], "&gt;"); out_i += 4; },
        '"' => { @memcpy(out[out_i..], "&quot;"); out_i += 6; },
        '&' => { @memcpy(out[out_i..], "&amp;"); out_i += 5; },
        else => { out[out_i] = c; out_i += 1; },
    }
}

Is there any way I could write a single table defining the special characters and their resultant expansions, i.e.. .{ .{ ‘<‘, “&lt;” }, .{ ‘>’, “&gt;” }, … }, and generate both of the appropriate switch cases entirely at compile time? I’ve found a few mentions of ‘inline else’ while searching around for this, but I’m not sure I fully understand how it’s useful for this sort of case.

Have a look here : Creating switch cases programmatically - #2 by ScottRedig

I get the impression that this would generate a case for every possible character, rather than just each “special” character I want to match, plus the default case. Does the compiler have some way of limiting this explosion of cases?

You can do something like this:

  inline ' '...'/', ':'...'@', '['...'`', '{'...'~' => |c| {

You can call it like this

switch (c) {
     inline  '<', '>', '"', '&' => |t| out_chars += comptime function(t),
     else  => out_chars += 1,
}

edit: ninja’d

2 Likes

I like this, it’s not quite as elegant as I’d hoped as I still have to list all the cases, but it’s much better than nothing.

Another Idea: Don’t use switch at all. Instead generate a comptime known table:

const expansions = [_]struct { u8, []const u8 }{
    .{ '<', "&lt;" }, .{ '>', "&gt;" },
};

const expansion_table = blk: {
    var table: [256][]const u8 = undefined;

    for (&table, 0..) |*string, c| {
        string.* = &.{c};
    }

    for (expansions) |expansion| {
        table[expansion[0]] = expansion[1];
    }
    break :blk table;
};

Now expansion_table[c] just contains the string. No need for unnecessary branches.

1 Like

This is certainly an option, but the table would span >4 cache lines, which is probably more expensive to load than a few branches.

In retrospect to the original question, though, I think I’d’ve been best off just writing a single switch statement in a function which returns the expanded sequence, or a slice in some temporary buffer containing the single character passed in, and then using that in both passes.

There’s also issue #21507 which I opened for ‘splatting’ arrays into switch prong options.

That will do exactly what you want it to, in the event that it’s implemented. Seems to be some support for the idea, I agree (obviously) that it would be a great addition to the language.

1 Like

I’d like to note that in retrospect, I really didn’t need to do this at all - I could always have just written a single switch statement in an inline function, rather than writing a table. This is probably usually the cleanest and most intuitive solution, so long as the desired else case should have similar behaviour to the special cases being switched on.

This is the function I wrote:

inline fn xmlExpandedEnt(char: *const u8) []const u8 {
    return switch (char.*) {
        '<' => "&lt;",
        '>' => "&gt;",
        '"' => "&quot;",
        '&' => "&amp;",
        else => @as(*const [1]u8, char),
    };
}

And then just used it in both places where I wanted to switch on the character:

for (source) |c| {
    out_chars += xmlExpandedEnt(&c).len;
}
for (source) |c| {
    const expansion = xmlExpandedEnt(&c);
    @memcpy(out[out_i..out_i + expansion.len], expansion);
    out_i += expansion.len;
}

I’m not sure if the performance will quite be equivalent to what I originally intended to do, but it should be “good enough.”

3 Likes

That’s basically how I did it, yep. For this case it’s ideal, it’s not like the XML / HTML escapables are ever going to change.

But for larger collections used in more switches, and especially when we start thinking about conditional variation, it would be pretty handy to be able to splat an array into a prong.

In fact… now I’m thinking, maybe, there should be an inline splat, where you can capture the index as a second capture, analogous to capturing the enum of a union in inline switches:

inline splatted[...] => |_, i| expansions[i],

Hmm. Maybe too much? Maybe not.

1 Like

As I stated in the topic linked, why use a switch statement when a for statement will do? My general rule is that if the optimizer has a decent chance of doing a better job than me, then let it to its job until proven with performance testing.

const assert = @import("std").debug.assert;
const std = @import("std");

const replacements: []const struct { u8, []const u8 } = &.{
    .{ '<', "&lt;" },
    .{ '>', "&gt;" },
    .{ '"', "&quot;" },
    .{ '&', "&amp;" },
};

const max_replacement_length = blk: {
    var max: usize = 1;
    for (replacements) |replacement| {
        max = @max(max, replacement[1].len);
    }
    break :blk max;
};

fn replace(source: []const u8, out: []u8) []u8 {
    assert(source.len * max_replacement_length <= out.len);

    var out_i: usize = 0;
    for (source) |c| {
        for (replacements) |replacement| {
            if (replacement[0] == c) {
                @memcpy(out[out_i .. out_i + replacement[1].len], replacement[1]);
                out_i += replacement[1].len;
                break;
            }
        } else {
            out[out_i] = c;
            out_i += 1;
        }
    }
    return out[0..out_i];
}

test {
    const source = "<script>alert(\"xss\");</script>";
    var out: [source.len * max_replacement_length]u8 = undefined;
    const result = replace(source, &out);

    try std.testing.expectEqualStrings("&lt;script&gt;alert(&quot;xss&quot;);&lt;/script&gt;", result);
}
3 Likes

Fyi, in inline is about comptime semantics, not function inlining like in c. The general advice is you should avoid using it to force function inlining unless proven by performance testing, as it actually removes options from the optimizer. I do think the xmlExpandedEnt solution is good, though.

Also, I realized after my previous post that the real answer for performance here is doing something with simd/@Vector, if you really wanted to go fast.

1 Like

You make a great point, the optimizer can almost definitely unroll that loop and generate code similar to what it would for a switch statement.

Regarding your second reply about vectorization, what do you have in mind? I do love a little unnecessary optimization sometimes.

I don’t understand why you would do that.

I also don’t understand the values implicit in the question: is a switch statement somehow fancier than a for loop? Is there some other reason to favor them?

If not, then we can ask the question the other way too, right: why use a for loop when a switch statement will do? But those can’t both make sense as a heuristic, so maybe neither statement is really going to help us.

Agreed, I would expect a worthwhile optimizer to see a call like this:

for (source) |c| {
    out_chars += xmlExpandedEnt(&c).len;
}

And “think”: ok, probably a good candidate for inlining, and hey, look at that, now we don’t need the strings here, just their lengths.

They both do the job. I think the one with the switch is several times clearer, but that’s personal preference. Something about how you presented it seems as though you think the one with the for loop is obviously better though. I’m not seeing it.

I think what is happening here is that my reply was to the OP question and the immediate responses. (I got notified because my reply to a different thread was linked.) I started writing my reply before the xmlExpandedEnt solution was proposed, which I think is a perfectly reasonable solution.

But to directly answer your first question anyways: I think a for loop checking a few cases with an if statement is simpler and more clear than trying to generate prongs to a switch statement. No I don’t think switch statements are fancier than for loops.

1 Like

Ah, a forum leapfrog. Yes that explains it completely.