Find the nth set bit

I don’t think this is possible but I will ask anyway.
In my program I loop through the bits of a u64 with:

pub fn bitloop(bitboard: *u64) ?u6 {
    if (bitboard.* == 0) return null;
    defer bitboard.* &= (bitboard.* - 1);
    return @intCast(@ctz(bitboard.*));
}

in a lot of routines.

Now I need a trick (for a crazy compression trick).
Is it actually possible to directly get the nth set bit?

for example for this reading from LSB-MSB from left to right for simplicity:

00001101 // our bits
01234567 // bitnumbers

I would like to get the bitindex for the second set bit, which should give 5.
or the bitindex for the third set bit which is 7.

without looping through the bits. Is that possible?

3 Likes

This will be easy if pdep/pext gets merged

2 Likes
// Note: This requires x86-64 and the BMI2 instruction set.
pub fn getNthSetBit(bitboard: u64, n: u6) ?u6 {
    // Return null if we are asking for a bit that doesn't exist
    if (@popCount(bitboard) <= n) return null;
    
    const nth_bit = @as(u64, 1) << n;
    
    // PDEP (Parallel Bit Deposit) magic via inline assembly
    const isolated = asm (
        "pdep %[mask], %[val], %[out]"
        : [out] "=r" (-> u64),
        : [val] "r" (nth_bit),
          [mask] "r" (bitboard),
    );
    
    return @intCast(@ctz(isolated));
}

This uses the hardware PDEP instruction to deposit a single shifted bit directly into the n-th set bit of your bitboard, then counts the trailing zeros. It’s $O(1)$ and insanely fast, but it requires a CPU with BMI2 support (Intel Haswell+, AMD Zen 3+).

6 Likes

Fantastic!
asm far beyond my skillset :wink:

1 Like

I might be wrong but i feel like there is a native method to get the first set bit in a integer.

The count leading zeros function will give you the index of the next 1 bit. well you’d have to use the bit size of the type to clear it. but this would be fairly efficient to put into a loop. and it uses fast assembly instructions if they are available.

@clz is the opposite of @ctz. What I needed it the nth bit. Not the LSB or MSB.

You can use @ctz to find the nth 1 bit. See the next function in BitSetIterator here:
https://ziglang.org/documentation/master/std/#src/std/bit_set.zig

If you’re using the std bit set, you could just call the iterator’s next function n times. Or you could copy and modify the iterator code to fit your needs.

Thanks. I know bit_set.
But my bitloop is faster and just one little function.
It is called many million times per second.

Ok. But my reply was to your statement about @ctz not giving you the right results.

If you’re looking for the nth-bit without assembly…

fn getNthSetBit(bitboard: u64, n: u6) u7 {
    var bb = bitboard;
    for (0..n-1) |_| {
        bb = bb & (bb - 1);
    }   
    return @ctz(bb);
}

For small fixed values of n this optimises pretty well.

For some value of ‘well’, I suppose.

It just unrolls the loop. High values of n are less efficient than low ones.

Yep true. the nth is variable.

I implemented branchless rank and select in C a long time ago, 2009. Parallel algorithm without loops. Not claiming it actually makes sense though, especially given hardware alternatives!

The requested algorithm would be select, and rank is the reverse: how many set bits up to a position.

2 Likes

Here, I just ported it to Zig and tested. However, this counts from MSB instead of from LSB like the earlier pdep version does, and r_in and the result differ by one after reversing due to zero-based vs one-based indexing. So my code effectively returns:

    const result = getNthSetBit(reverse(v), @truncate(r_in - 1)) orelse 63;
    return result + 1;

Where reverse() will reverse the bits so MSB becomes LSB and vice versa. Here it is:

pub fn selectBitRank64_msb(v: u64, r_in: u7) u7 {
    var r: u32 = r_in;

    const all: u64 = ~@as(u64, 0);

    const a: u64 = v -% ((v >> 1) & (all / 3));
    const b: u64 = (a & (all / 5)) +% ((a >> 2) & (all / 5));
    const c: u64 = (b +% (b >> 4)) & (all / 0x11);
    const d: u64 = (c +% (c >> 8)) & (all / 0x101);

    var t: u32 = @intCast((d >> 32) +% (d >> 48));
    var s: u7 = 64;

    var diff: u32 = t -% r;
    s -= @intCast((diff & 256) >> 3);
    r -= (t & (diff >> 8));

    t = @intCast((d >> @intCast(s - 16)) & 0xff);
    diff = t -% r;
    s -= @intCast((diff & 256) >> 4);
    r -= (t & (diff >> 8));

    t = @intCast((c >> @intCast(s - 8)) & 0x0f);
    diff = t -% r;
    s -= @intCast((diff & 256) >> 5);
    r -= (t & (diff >> 8));

    t = @intCast((b >> @intCast(s - 4)) & 0x07);
    diff = t -% r;
    s -= @intCast((diff & 256) >> 6);
    r -= (t & (diff >> 8));

    t = @intCast((a >> @intCast(s - 2)) & 0x03);
    diff = t -% r;
    s -= @intCast((diff & 256) >> 7);
    r -= (t & (diff >> 8));

    t = @intCast((v >> @intCast(s - 1)) & 0x01);
    diff = t -% r;
    s -= @intCast((diff & 256) >> 8);

    return 65 - s;
}

I think it can be made to count from LSB without having to reverse the bits, if you need it for some reason :smiley:

2 Likes

An optimization here you could count the first N 1 bits so you can skip them:
Mask = ((1<<63)>>N)); // gives a n leading 1s eg 1111100000…
Nopt = N - @popCount(Mask & value) //Counts 1 bits in that region
getNthSetBit((~Mask) & value, Nopt)
this isn’t quite right but you get the idea.

My goto page for such things is the ‘Bit Twiddling Hacks’ page:

…and the patterns described there also often get special compiler treatment, e.g. note how this dumb ‘find MSB bit’ code is optimized to replace the loop with an lzcnt instruction:

(this optimization happens down in LLVM and should also work in Zig - assuming LLVM backend).

Yeah that’s a great source of tricks. I maintain a Zig version of that fwiw.

Oho. I will save that one if I need more bit tricks.

1 Like