Tracking issue:
Hi, perhaps you've already studied this, but I looked into it a little and have some findings, so I'll share them just in case. I found the function in LLVM that I believe is responsible for this optimisation.
In particular, there is this little section that attempts to make one jump table for the whole range. Note that this seems to be the only thing it does in debug mode; any attempt at splitting the switch into subsections appears to happen only with optimisations on.
// Cheap case: the whole range may be suitable for jump table.
if (TLI->isSuitableForJumpTable(SI, NumCases, Range, PSI, BFI)) {
  CaseCluster JTCluster;
  if (buildJumpTable(Clusters, 0, N - 1, SI, SL, DefaultMBB, JTCluster)) {
    Clusters[0] = JTCluster;
    Clusters.resize(1);
    return;
  }
}

// The algorithm below is not suitable for -O0.
if (TM->getOptLevel() == CodeGenOptLevel::None)
  return;
In theory, isSuitableForJumpTable is a virtual function, but there seems to be only one implementation of it, and it's very simple.
It mostly just checks whether the density of the switch is greater than (or equal to) the minimum density, where the density is the ratio of the number of cases to the range of values. Looking further, the density limit seems to be 10% by default (or 40% when optimising for size):
/// Minimum jump table density for normal functions.
static cl::opt<unsigned>
    JumpTableDensity("jump-table-density", cl::init(10), cl::Hidden,
                     cl::desc("Minimum density for building a jump table in "
                              "a normal function"));

/// Minimum jump table density for -Os or -Oz functions.
static cl::opt<unsigned> OptsizeJumpTableDensity(
    "optsize-jump-table-density", cl::init(40), cl::Hidden,
    cl::desc("Minimum density for building a jump table in "
             "an optsize function"));
In order to test this I made the following program:
pub fn main() !void {
    var input: u64 = 5;
    var output: u64 = 129847139810; // arbitrary number to search for
    switch (input) {
        1 => {
            output = 1;
        },
        2 => {
            output = 2;
        },
        3 => {
            output = 3;
        },
        4 => {
            output = 4;
        },
        5 => {
            output = 5;
        },
        6 => {
            output = 6;
        },
        7 => {
            output = 7;
        },
        8 => {
            output = 8;
        },
        9 => {
            output = 9;
        },
        100 => {
            output = 100;
        },
        else => {
            output = 200;
        },
    }
    input = 6;
}
There are 10 cases (the default case is not included in the count, according to my testing) and the range of values is 100 (max_value - min_value + 1), so the density is exactly 10%, which should be just enough to pass the >= condition. I compile with zig build-exe main.zig -fllvm -femit-asm (zig version 0.14.1) and, sure enough, LLVM creates a jump table for the switch.

However, when I change the last case from 100 to 101, the density becomes slightly lower than 10%, which indeed leads LLVM to fall back to generating a separate branch for each case. Note that even though the first few cases are consecutive, LLVM doesn't look for such runs with optimisations off; it just goes all or nothing.
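To spell the arithmetic out (using my paraphrased check from above, so the exact form is an assumption on my part):

const std = @import("std");

test "density of the example switch" {
    // 10 cases over the range 1..100, i.e. 100 values: density is exactly 10%.
    try std.testing.expect(10 * 100 >= 100 * 10); // jump table expected
    // Changing the last case to 101 widens the range to 101 values: ~9.9%.
    try std.testing.expect(!(10 * 100 >= 101 * 10)); // falls below the 10% threshold
}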
Conclusions:
- LLVM is happy to make a jump table for the whole switch, gaps and all, as long as at least 10% of the value range is covered by cases (or 40% when optimising for size)
- if optimisations are off, LLVM does not attempt to split sparse switches into dense subsections; it either makes a jump table for the whole thing or none at all (see the sketch below)
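To illustrate the second point, here is a hypothetical example of the kind of switch I would expect to suffer at -O0 (I have not run this exact one through -femit-asm): two dense clusters separated by a large gap. By the logic above, 8 cases over a range of 1004 values is well under 10% density, so by my reading the whole switch should get no jump table in debug mode, even though each cluster on its own easily passes the density check:

// Hypothetical example: two dense clusters separated by a large gap,
// giving an overall density of roughly 0.8%.
fn classify(x: u64) u64 {
    return switch (x) {
        // first dense cluster
        1 => 10,
        2 => 20,
        3 => 30,
        4 => 40,
        // second dense cluster, far away from the first
        1001 => 50,
        1002 => 60,
        1003 => 70,
        1004 => 80,
        else => 0,
    };
}

With optimisations on, the partitioning code below the -O0 early return would at least get a chance to consider the two clusters separately.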
Perhaps this is 'old news' to the many who have been using the master branch, but as someone who waited until 0.15 to make the switch from 0.14 (I wanted to wait for the WriterGate dust to settle), I am surprised that there isn't more discussion about the significant speed increase in the new backend, or perhaps I just didn't see it beyond a few mentions.
A couple of weeks into using it, and I am still delighted every time I run a build to see just how much it has improved compile times. As someone who is so often guilty of printf-debugging, I am so used to the "compile → wait a bit → run → view output" routine that I am still surprised each time zig build run feels nearly instantaneous and seems more like "run → view output".
Can't wait for it to hit the platforms I actually work with.
Do you know if there's already a ticket for this problem? It might make sense to just copy Clang's behaviour. I've been relying on fast switch-case in a lot of my C code, and Clang always seems to do a stellar job optimizing big switch-case statements in C, even in debug mode (and I've spent a lot of time staring at the assembler output, lol).