Self-Hosted x86 Backend is Now Default in Debug Mode

Tracking issue:

12 Likes

Hi, perhaps you’ve already studied this, but I looked into it a bit and have some findings, so I’ll share them just in case. I found the function that I believe is responsible for this optimisation in LLVM.

In particular, there is this little section, which attempts to build one jump table for the whole range. This seems to be the only thing it does in debug mode; any splitting of the switch into subsections appears to happen only with optimisations on:

  // Cheap case: the whole range may be suitable for jump table.
  if (TLI->isSuitableForJumpTable(SI, NumCases, Range, PSI, BFI)) {
    CaseCluster JTCluster;
    if (buildJumpTable(Clusters, 0, N - 1, SI, SL, DefaultMBB, JTCluster)) {
      Clusters[0] = JTCluster;
      Clusters.resize(1);
      return;
    }
  }

  // The algorithm below is not suitable for -O0.
  if (TM->getOptLevel() == CodeGenOptLevel::None)
    return;

In theory isSuitableForJumpTable is a virtual function, but there seems to be only one implementation of it, and it’s very simple.

It mostly just checks whether the density of the switch is greater than (or equal to) the minimum density, where the density is the ratio of the number of cases to the range of values. Looking further, the density limit seems to be 10% by default (or 40% if optimising for size):

/// Minimum jump table density for normal functions.
static cl::opt<unsigned>
    JumpTableDensity("jump-table-density", cl::init(10), cl::Hidden,
                     cl::desc("Minimum density for building a jump table in "
                              "a normal function"));

/// Minimum jump table density for -Os or -Oz functions.
static cl::opt<unsigned> OptsizeJumpTableDensity(
    "optsize-jump-table-density", cl::init(40), cl::Hidden,
    cl::desc("Minimum density for building a jump table in "
             "an optsize function"));

In order to test this I made the following program:

pub fn main() !void {
    var input: u64 = 5;
    var output: u64 = 129847139810; // arbitrary number to search for
    switch (input) {
        1 => {
            output = 1;
        },
        2 => {
            output = 2;
        },
        3 => {
            output = 3;
        },
        4 => {
            output = 4;
        },
        5 => {
            output = 5;
        },
        6 => {
            output = 6;
        },
        7 => {
            output = 7;
        },
        8 => {
            output = 8;
        },
        9 => {
            output = 9;
        },
        100 => {
            output = 100;
        },
        else => {
            output = 200;
        },
    }
    input = 6;
}

There are 10 cases (the default case is not included in the count, according to my testing) and the range of values is 100 (max_value - min_value + 1), so the density is exactly 10%, which should be just enough to pass the >= condition. I compile with zig build-exe main.zig -fllvm -femit-asm (zig version 0.14.1) and, sure enough, LLVM creates a jump table for the switch. However, when I change the last case from 100 to 101, the density becomes slightly lower than 10%, which indeed leads LLVM to fall back to generating a separate branch for each case. Note that even though the first few cases are consecutive, LLVM doesn’t look for such runs with optimisations off; it’s all or nothing.
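
Just to spell the arithmetic out, here is the (assumed) NumCases * 100 >= Range * MinDensity comparison evaluated with the numbers from the experiment:

const std = @import("std");

pub fn main() void {
    const num_cases: u64 = 10; // the else branch is not counted
    const min_density: u64 = 10; // default jump-table-density at -O0
    for ([_]u64{ 100, 101 }) |range| {
        const dense_enough = num_cases * 100 >= range * min_density;
        std.debug.print("range {d}: dense enough = {}\n", .{ range, dense_enough });
    }
    // range 100: 1000 >= 1000 -> true  (exactly 10%, jump table)
    // range 101: 1000 >= 1010 -> false (just under 10%, branch per case)
}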

Conclusions:

  • LLVM is happy to build a jump table for the whole switch, gaps and all, as long as at least 10% of the value range is covered by cases (or 40% with size optimisations)
  • with optimisations off, LLVM does not attempt to split sparse switches into dense subsections; it either builds a jump table for the whole thing or none at all
8 Likes

Perhaps this is “old news” to the many who have been using the master branch, but as someone who waited until 0.15 to make the switch from 0.14 (wanted to wait for the WriterGate dust to settle), I am surprised that there isn’t more discussion about the significant speed increase in the new backend, or perhaps I just didn’t see it beyond a few mentions.

A couple of weeks into using it, I am still delighted every time I run a build to see just how much it has improved compile times. As someone who is so often guilty of printf-debugging, I am so used to the “compile → wait a bit → run → view output” routine that I am still surprised each time zig build run feels nearly instantaneous and seems more like “run → view output”.

12 Likes

Can’t wait for it to hit the platforms I actually work with.

Do you know if there’s already a ticket for this problem? It might make sense to just copy Clang’s behaviour. I’ve been relying on fast switch-case in a lot of my C code, and Clang always seems to do a stellar job of optimizing big switch-case statements in C, even in debug mode (and I spent a lot of time staring at the assembler output, lol).