Best representation for a 225-bit bitboard

I am making a scrabble program, where the board representation is a bit like the one in a chess program.
A chess board is very conveniently 64 squares, so there we can use a 64-bit number for all kinds of processing.
For example, in chess we can have a so-called 64-bit bitboard attacked_squares.
Looping over attacked_squares is convenient and fast: get @ctz for the next set bit, then clear that bit (or in Zig we can use a std bitset iterator for that).
Furthermore, all kinds of and / xor / or tricks can be done on these.

The scrabble board, however, is 225 squares (15x15). What would be the fastest equivalent "bitboard" for that in Zig?

  • u225
  • some @Vector
  • something else?

Updating these bitboards should be as fast as possible, because that happens many millions of times per second.

Unsure how it’d perform for your use case, but have you looked at std.bit_set.ArrayBitSet?

(if you use std.bit_set.StaticBitSet(225), it’ll use ArrayBitSet(usize, 225) internally)

If nothing else, might give you some ideas.


Tbh I would just use a u225 and write a couple of helper functions to set and clear bits instead of futzing around with fancy stdlib types :wink:

Performance-wise every option should compile down to the same bit-twiddling code though (there might at best be a perf difference in debug mode with more 'abstract' approaches)

PS: what might affect performance though is how you arrange the bits, since that u225 will be split into 4 u64 values (on 64-bit CPUs). E.g. it might help to arrange bits so that common access patterns work within the same 64-bit cluster.


True, I did something like that in C# when writing something for 'Blokus', that tile board game.

I also expected this, but if you look at the implementation of ArrayBitSet, they use an array of usize instead of one large integer type. I wonder why that is.

The generated code for operations on large ints isn’t actually that clever a lot of the time, here’s a comparison of the u225 vs @Vector(4, u64) approaches: https://godbolt.org/z/88hfEdKn9


I suspected so. Quite some difference. Sometimes you have to do the bit fiddling yourself :slight_smile: Thanks for that code.

Hmm, when I experimented with wide unsigned integers in my emulator project, the generated code for bit twiddling looked just as I would have expected (e.g. as if the operations were applied manually on multiple u64s; or, if you stay within a single u64 'cluster', the code looks the same as working on a single u64). E.g. see here and search for 'Using wide integers with bit twiddling code is fast':

Oh yeah, there's definitely a lot of cases where the compiler optimizes wide ints well, it's just that the 'reasoning' required to optimize the shift + xor in my example is a lot more complex.

In the cases you describe, the range of bits in the wide int that are accessed/modified by the operation is known at compile time, making the reduction of the wide operations into narrower ones easy.

With my example on the other hand, the compiler would have to:

  • See that the right hand side of the xor was produced by shifting 1 and infer that it only has one set bit and that you could easily compute the set bit’s location using the shift amount
  • Based on this, infer that the operation only affects a single 64-bit chunk of the wide int, and that the location of said chunk and the portion of the right hand side that aligns with the chunk can be easily computed
  • Calculate the offset of the affected 64-bit chunk in terms of the shift amount
  • Calculate the value to be xored with the 64-bit chunk in terms of the shift amount
  • Ensure that all of this is cheaper than the regular approach

This certainly isn’t impossible for a compiler to do, but it may be hard to justify putting in all the work to implement this for a relatively niche case.
