// Note: This requires x86-64 and the BMI2 instruction set.
pub fn getNthSetBit(bitboard: u64, n: u6) ?u6 {
// Return null if we are asking for a bit that doesn't exist
if (@popCount(bitboard) <= n) return null;
const nth_bit = @as(u64, 1) << n;
// PDEP (Parallel Bit Deposit) magic via inline assembly
const isolated = asm (
"pdep %[mask], %[val], %[out]"
: [out] "=r" (-> u64),
: [val] "r" (nth_bit),
[mask] "r" (bitboard),
);
return @intCast(@ctz(isolated));
}
This uses the hardware PDEP instruction to deposit a single shifted bit directly into the n-th set bit of your bitboard, then counts the trailing zeros. It’s $O(1)$ and insanely fast, but it requires a CPU with BMI2 support (Intel Haswell+, AMD Zen 3+).
I might be wrong but i feel like there is a native method to get the first set bit in a integer.
The count leading zeros function will give you the index of the next 1 bit. well you’d have to use the bit size of the type to clear it. but this would be fairly efficient to put into a loop. and it uses fast assembly instructions if they are available.
If you’re using the std bit set, you could just call the iterator’s next function n times. Or you could copy and modify the iterator code to fit your needs.
I implemented branchless rank and select in C a long time ago, 2009. Parallel algorithm without loops. Not claiming it actually makes sense though, especially given hardware alternatives!
The requested algorithm would be select, and rank is the reverse: how many set bits up to a position.
Here, I just ported it to Zig and tested. However, this counts from MSB instead of from LSB like the earlier pdep version does, and r_in and the result differ by one after reversing due to zero-based vs one-based indexing. So my code effectively returns:
const result = getNthSetBit(reverse(v), @truncate(r_in - 1)) orelse 63;
return result + 1;
Where reverse() will reverse the bits so MSB becomes LSB and vice versa. Here it is:
pub fn selectBitRank64_msb(v: u64, r_in: u7) u7 {
var r: u32 = r_in;
const all: u64 = ~@as(u64, 0);
const a: u64 = v -% ((v >> 1) & (all / 3));
const b: u64 = (a & (all / 5)) +% ((a >> 2) & (all / 5));
const c: u64 = (b +% (b >> 4)) & (all / 0x11);
const d: u64 = (c +% (c >> 8)) & (all / 0x101);
var t: u32 = @intCast((d >> 32) +% (d >> 48));
var s: u7 = 64;
var diff: u32 = t -% r;
s -= @intCast((diff & 256) >> 3);
r -= (t & (diff >> 8));
t = @intCast((d >> @intCast(s - 16)) & 0xff);
diff = t -% r;
s -= @intCast((diff & 256) >> 4);
r -= (t & (diff >> 8));
t = @intCast((c >> @intCast(s - 8)) & 0x0f);
diff = t -% r;
s -= @intCast((diff & 256) >> 5);
r -= (t & (diff >> 8));
t = @intCast((b >> @intCast(s - 4)) & 0x07);
diff = t -% r;
s -= @intCast((diff & 256) >> 6);
r -= (t & (diff >> 8));
t = @intCast((a >> @intCast(s - 2)) & 0x03);
diff = t -% r;
s -= @intCast((diff & 256) >> 7);
r -= (t & (diff >> 8));
t = @intCast((v >> @intCast(s - 1)) & 0x01);
diff = t -% r;
s -= @intCast((diff & 256) >> 8);
return 65 - s;
}
I think it can be made to count from LSB without having to reverse the bits, if you need it for some reason
An optimization here you could count the first N 1 bits so you can skip them:
Mask = ((1<<63)>>N)); // gives a n leading 1s eg 1111100000…
Nopt = N - @popCount(Mask & value) //Counts 1 bits in that region
getNthSetBit((~Mask) & value, Nopt)
this isn’t quite right but you get the idea.
My goto page for such things is the ‘Bit Twiddling Hacks’ page:
…and the patterns described there also often get special compiler treatment, e.g. note how this dumb ‘find MSB bit’ code is optimized to replace the loop with an lzcnt instruction:
(this optimization happens down in LLVM and should also work in Zig - assuming LLVM backend).