Self-Hosted x86 Backend is Now Default in Debug Mode

Tracking issue:

12 Likes

Hi, perhaps you’ve already studied this, but I looked into it a bit and have some findings, so I’ll share them just in case. I found the function that I believe is responsible for this optimisation in LLVM.

In particular, there is this little section, which attempts to build one jump table for the whole range. This seems to be the only thing it does in debug mode; any splitting of the switch into subsections appears to happen only with optimisations on:

  // Cheap case: the whole range may be suitable for jump table.
  if (TLI->isSuitableForJumpTable(SI, NumCases, Range, PSI, BFI)) {
    CaseCluster JTCluster;
    if (buildJumpTable(Clusters, 0, N - 1, SI, SL, DefaultMBB, JTCluster)) {
      Clusters[0] = JTCluster;
      Clusters.resize(1);
      return;
    }
  }

  // The algorithm below is not suitable for -O0.
  if (TM->getOptLevel() == CodeGenOptLevel::None)
    return;

In theory isSuitableForJumpTable is a virtual function, but there seems to be only one implementation of it, and it’s very simple.

It mostly just checks whether the density of the switch is greater than (or equal to) the minimum density, where the density is the ratio of the number of cases to the range of values. Looking further, the density limit seems to be 10% by default (or 40% if optimising for size):

/// Minimum jump table density for normal functions.
static cl::opt<unsigned>
    JumpTableDensity("jump-table-density", cl::init(10), cl::Hidden,
                     cl::desc("Minimum density for building a jump table in "
                              "a normal function"));

/// Minimum jump table density for -Os or -Oz functions.
static cl::opt<unsigned> OptsizeJumpTableDensity(
    "optsize-jump-table-density", cl::init(40), cl::Hidden,
    cl::desc("Minimum density for building a jump table in "
             "an optsize function"));

In order to test this I made the following program:

pub fn main() !void {
    var input: u64 = 5;
    var output: u64 = 129847139810; // arbitrary number to search for
    switch (input) {
        1 => {
            output = 1;
        },
        2 => {
            output = 2;
        },
        3 => {
            output = 3;
        },
        4 => {
            output = 4;
        },
        5 => {
            output = 5;
        },
        6 => {
            output = 6;
        },
        7 => {
            output = 7;
        },
        8 => {
            output = 8;
        },
        9 => {
            output = 9;
        },
        100 => {
            output = 100;
        },
        else => {
            output = 200;
        },
    }
    input = 6;
}

There are 10 cases (the default case is not included in the count, according to my testing) and the range of values is 100 (max_value - min_value + 1), so the density is exactly 10%, which should be just enough to pass the >= condition. I compile with zig build-exe main.zig -fllvm -femit-asm (zig version 0.14.1) and, sure enough, LLVM creates a jump table for the switch. However, when I change the last case from 100 to 101, the density becomes slightly lower than 10%, which indeed leads LLVM to fall back to generating a separate branch for each case. Note that even though the first few cases are consecutive, LLVM doesn’t look for such runs with optimisations off; it’s all or nothing.
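
Just to spell the arithmetic out, here is the (assumed) NumCases * 100 >= Range * MinDensity comparison evaluated with the numbers from the experiment:

const std = @import("std");

pub fn main() void {
    const num_cases: u64 = 10; // the else branch is not counted
    const min_density: u64 = 10; // default jump-table-density at -O0
    for ([_]u64{ 100, 101 }) |range| {
        const dense_enough = num_cases * 100 >= range * min_density;
        std.debug.print("range {d}: dense enough = {}\n", .{ range, dense_enough });
    }
    // range 100: 1000 >= 1000 -> true  (exactly 10%, jump table)
    // range 101: 1000 >= 1010 -> false (just under 10%, branch per case)
}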

Conclusions:

  • LLVM is happy to build a jump table for the whole switch, gaps and all, as long as at least 10% of the value range is covered by cases (or 40% with size optimisations)
  • with optimisations off, LLVM does not attempt to split sparse switches into dense subsections; it either builds a jump table for the whole thing or none at all
8 Likes

Perhaps this is “old news” to the many who have been using the master branch, but as someone who waited until 0.15 to make the switch from 0.14 (wanted to wait for the WriterGate dust to settle), I am surprised that there isn’t more discussion about the significant speed increase in the new backend, or perhaps I just didn’t see it beyond a few mentions.

A couple of weeks into using it, I am still delighted every time I run a build to see just how much it has improved compile times. As someone who is so often guilty of printf-debugging, I am so used to the “compile → wait a bit → run → view output” routine that I am still surprised each time zig build run feels nearly instantaneous and seems more like “run → view output”.

12 Likes

Can’t wait for it to hit the platforms I actually work with.

Do you know if there’s already a ticket for this problem? It might make sense to just copy Clang’s behaviour. I’ve been relying on fast switch-case in a lot of my C code, and Clang always seems to do a stellar job of optimizing big switch-case statements in C, even in debug mode (and I spent a lot of time staring at the assembler output, lol).