I am looking for a fast algorithm to pack 64 byte-sized enums into 32 bytes, knowing that each of the 64 source values is <= 15 (so each one fits in 4 bits).
I can do it with a manual loop, but I am looking for something better. Maybe SIMD?
I have actually found myself having to do the opposite (expanding packed bits like this) recently, and I believe the algorithm should be easy to reverse.
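To make the reverse direction concrete, here is a minimal scalar sketch in C of expanding 32 packed bytes back into 64 values. The function name `unpack_nibbles` and the nibble order (even index in the low 4 bits, odd index in the high 4 bits) are my assumptions, not something stated in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Expands 32 packed bytes into 64 values in the range 0..15.
   Assumed layout (hypothetical): the low nibble of src[i] holds element 2*i,
   the high nibble holds element 2*i + 1. */
void unpack_nibbles(const uint8_t src[32], uint8_t dst[64]) {
    for (size_t i = 0; i < 32; i++) {
        dst[2 * i]     = (uint8_t)(src[i] & 0x0F); /* low 4 bits  */
        dst[2 * i + 1] = (uint8_t)(src[i] >> 4);   /* high 4 bits */
    }
}
```

If the packing side uses the opposite nibble order, the two assignments just swap.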
Tbh, this is one of those situations where I would trust the compiler (…but of course check the assembly output).
At least Clang turns this simple, straightforward loop into an impressive-looking unrolled chunk of SIMD code (depending on the target CPU model). My SIMD-fu is not good enough to tell whether the code is doing anything stupid, though.
…I would expect that Zig with the LLVM backend produces identical code.
PS: wait … bug … I forgot the shift, but apart from a bigger lookup table in the -march=native version, the code hasn’t grown.
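For anyone who wants to try this themselves, a straightforward packing loop with the shift included might look like the sketch below. This is not the exact code from the post above; the name `pack_nibbles` and the nibble order (even index in the low 4 bits, odd index in the high 4 bits) are assumptions on my part.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs 64 values (each assumed to be <= 15) into 32 bytes.
   Assumed layout (hypothetical): src[2*i] goes into the low nibble of dst[i],
   src[2*i + 1] is shifted left by 4 into the high nibble. */
void pack_nibbles(const uint8_t src[64], uint8_t dst[32]) {
    for (size_t i = 0; i < 32; i++) {
        uint8_t lo = (uint8_t)(src[2 * i] & 0x0F);
        uint8_t hi = (uint8_t)(src[2 * i + 1] & 0x0F);
        dst[i] = (uint8_t)(lo | (hi << 4));
    }
}
```

The masks are only a safety net in case an input is larger than 15. Whether a compiler vectorizes this loop depends on the optimization level and target flags, so it is still worth checking the assembly output.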