Find the nth set bit

ericlang · February 20, 2026, 1:48pm

I don’t think this is possible but I will ask anyway.
In my program I loop through the bits of a u64 with:

pub fn bitloop(bitboard: *u64) ?u6 {
    if (bitboard.* == 0) return null;
    defer bitboard.* &= (bitboard.* - 1);
    return @intCast(@ctz(bitboard.*));
}

in a lot of routines.

Now I need a trick (for a crazy compression trick).
Is it actually possible to directly get the nth set bit?

for example for this reading from LSB-MSB from left to right for simplicity:

00001101 // our bits
01234567 // bitnumbers

I would like to get the bitindex for the second set bit, which should give 5.
or the bitindex for the third set bit which is 7.

without looping through the bits. Is that possible?

cryptocode · February 20, 2026, 1:58pm

This will be easy if pdep/pext gets merged

enzogagarin · February 20, 2026, 2:09pm

// Note: This requires x86-64 and the BMI2 instruction set.
pub fn getNthSetBit(bitboard: u64, n: u6) ?u6 {
    // Return null if we are asking for a bit that doesn't exist
    if (@popCount(bitboard) <= n) return null;
    
    const nth_bit = @as(u64, 1) << n;
    
    // PDEP (Parallel Bit Deposit) magic via inline assembly
    const isolated = asm (
        "pdep %[mask], %[val], %[out]"
        : [out] "=r" (-> u64),
        : [val] "r" (nth_bit),
          [mask] "r" (bitboard),
    );
    
    return @intCast(@ctz(isolated));
}

This uses the hardware PDEP instruction to deposit a single shifted bit directly into the n-th set bit of your bitboard, then counts the trailing zeros. It’s $O(1)$ and insanely fast, but it requires a CPU with BMI2 support (Intel Haswell+, AMD Zen 3+).

ericlang · February 20, 2026, 3:25pm

Fantastic!
asm far beyond my skillset

taylorh140 · February 20, 2026, 3:43pm

I might be wrong but i feel like there is a native method to get the first set bit in a integer.

The count leading zeros function will give you the index of the next 1 bit. well you’d have to use the bit size of the type to clear it. but this would be fairly efficient to put into a loop. and it uses fast assembly instructions if they are available.

ericlang · February 20, 2026, 5:50pm

@clz is the opposite of @ctz. What I needed it the nth bit. Not the LSB or MSB.

jumpnbrownweasel · February 20, 2026, 9:01pm

You can use @ctz to find the nth 1 bit. See the next function in BitSetIterator here:
https://ziglang.org/documentation/master/std/#src/std/bit_set.zig

If you’re using the std bit set, you could just call the iterator’s next function n times. Or you could copy and modify the iterator code to fit your needs.

ericlang · February 20, 2026, 11:53pm

Thanks. I know bit_set.
But my bitloop is faster and just one little function.
It is called many million times per second.

jumpnbrownweasel · February 20, 2026, 11:59pm

Ok. But my reply was to your statement about @ctz not giving you the right results.

WeeBull · February 22, 2026, 3:17pm

If you’re looking for the nth-bit without assembly…

fn getNthSetBit(bitboard: u64, n: u6) u7 {
    var bb = bitboard;
    for (0..n-1) |_| {
        bb = bb & (bb - 1);
    }   
    return @ctz(bb);
}

For small fixed values of n this optimises pretty well.

mnemnion · February 23, 2026, 5:07pm

For some value of ‘well’, I suppose.

It just unrolls the loop. High values of n are less efficient than low ones.

ericlang · February 25, 2026, 11:06am

Yep true. the nth is variable.

jjrv · February 25, 2026, 4:58pm

I implemented branchless rank and select in C a long time ago, 2009. Parallel algorithm without loops. Not claiming it actually makes sense though, especially given hardware alternatives!

The requested algorithm would be select, and rank is the reverse: how many set bits up to a position.

jjrv · February 25, 2026, 6:02pm

Here, I just ported it to Zig and tested. However, this counts from MSB instead of from LSB like the earlier pdep version does, and r_in and the result differ by one after reversing due to zero-based vs one-based indexing. So my code effectively returns:

    const result = getNthSetBit(reverse(v), @truncate(r_in - 1)) orelse 63;
    return result + 1;

Where reverse() will reverse the bits so MSB becomes LSB and vice versa. Here it is:

pub fn selectBitRank64_msb(v: u64, r_in: u7) u7 {
    var r: u32 = r_in;

    const all: u64 = ~@as(u64, 0);

    const a: u64 = v -% ((v >> 1) & (all / 3));
    const b: u64 = (a & (all / 5)) +% ((a >> 2) & (all / 5));
    const c: u64 = (b +% (b >> 4)) & (all / 0x11);
    const d: u64 = (c +% (c >> 8)) & (all / 0x101);

    var t: u32 = @intCast((d >> 32) +% (d >> 48));
    var s: u7 = 64;

    var diff: u32 = t -% r;
    s -= @intCast((diff & 256) >> 3);
    r -= (t & (diff >> 8));

    t = @intCast((d >> @intCast(s - 16)) & 0xff);
    diff = t -% r;
    s -= @intCast((diff & 256) >> 4);
    r -= (t & (diff >> 8));

    t = @intCast((c >> @intCast(s - 8)) & 0x0f);
    diff = t -% r;
    s -= @intCast((diff & 256) >> 5);
    r -= (t & (diff >> 8));

    t = @intCast((b >> @intCast(s - 4)) & 0x07);
    diff = t -% r;
    s -= @intCast((diff & 256) >> 6);
    r -= (t & (diff >> 8));

    t = @intCast((a >> @intCast(s - 2)) & 0x03);
    diff = t -% r;
    s -= @intCast((diff & 256) >> 7);
    r -= (t & (diff >> 8));

    t = @intCast((v >> @intCast(s - 1)) & 0x01);
    diff = t -% r;
    s -= @intCast((diff & 256) >> 8);

    return 65 - s;
}

I think it can be made to count from LSB without having to reverse the bits, if you need it for some reason

taylorh140 · March 12, 2026, 3:15pm

An optimization here you could count the first N 1 bits so you can skip them:
Mask = ((1<<63)>>N)); // gives a n leading 1s eg 1111100000…
Nopt = N - @popCount(Mask & value) //Counts 1 bits in that region
getNthSetBit((~Mask) & value, Nopt)
this isn’t quite right but you get the idea.

floooh · March 13, 2026, 9:12am

My goto page for such things is the ‘Bit Twiddling Hacks’ page:

…and the patterns described there also often get special compiler treatment, e.g. note how this dumb ‘find MSB bit’ code is optimized to replace the loop with an lzcnt instruction:

(this optimization happens down in LLVM and should also work in Zig - assuming LLVM backend).

cryptocode · March 13, 2026, 9:33am

Yeah that’s a great source of tricks. I maintain a Zig version of that fwiw.

ericlang · March 13, 2026, 9:41am

Oho. I will save that one if I need more bit tricks.