Compressing 64 bytes into 32

I am looking for a fast algorithm to move 64 byte sized enums into 32 bytes, knowing that each of the 64 source enums <= 15.
I can do it with a manual loop but looking for something better. Maybe SIMD?

fn compress(src: []const Piece) [32]byte 
{
   // splat it?
}

This is the enum

pub const Piece = enum(u4)
{
    no_piece = 0,
    w_pawn = 1,
    w_knight = 2,
    w_bishop = 3,
    w_rook = 4,
    w_queen = 5,
    w_king = 6,

    b_pawn = 9,
    b_knight = 10,
    b_bishop = 11,
    b_rook = 12,
    b_queen = 13,
    b_king = 14,
};

I have actually found myself having to do the opposite (expanding packed bits like this) recently, and I believe the algorithm should be easy to reverse.

In my search I have come across this blog post: How fast can you bit-interleave 32-bit integers? – Daniel Lemire's blog the linked source code even contains an implementation of the reverse algorithm, and it can be easily changed to work with packs of 4 bits instead of (de-)interleaving single bits.
Furthermore there also is a SIMD version of the same blog in case you need to go that far: How fast can you bit-interleave 32-bit integers? (SIMD edition) – Daniel Lemire's blog

Aha, yes I need expanding as well.
I cannot read c-code / asm very well :slight_smile: This is too difficult.
I should really dive into @Vector etc…

Tbh, this is one of those situations where I would trust the compiler (…but of course check the assembly output).

At least Clang turns this simple straightforward loop into an impressively looking unrolled chunk of SIMD code (depending on the target cpu model) - my SIMD foo is not enough to tell whether the code isn’t doing anything stupid though :wink:

…I would expect that Zig with the LLVM backend produces identical code.

PS: wait … bug… I forgot the shift, but except for a bigger lookup table in the march=native version the code hasn’t grown.

1 Like

Aha aha. I will do some check there as well.