There is an accepted proposal, "Indexing arrays with vectors (gather)" (#12815), which, if I understand correctly, would allow one to shuffle (or permute) a @Vector with a runtime control.
However, I cannot find any active development on that issue. Meanwhile, the workaround posted on that page no longer works with the latest Zig version.
I tried to implement a runtime shuffle in C for my situation (x86_64 with instruction sets up to AVX2), to then use it as a library in Zig.
I found it was not entirely trivial. For AVX2 there are only two intrinsics that allow a cross-lane permute with a runtime variable: _mm256_permutevar8x32_ps and _mm256_permutevar8x32_epi32. Shuffling any other element type therefore requires more work. Some examples are shown below.
#include <immintrin.h>

__m256d my_shuffle256_D(__m256d V, __m256i mask) {
    // Runtime shuffle of 4 x double using the intrinsic for 8 x float.
    // Each 64-bit index i (0..3) becomes the pair of 32-bit indices
    // (2*i, 2*i + 1), so the two f32 slots that form one f64 stay together.
    __m256i masklo = _mm256_add_epi64(mask, mask);
    __m256i maskhi = _mm256_add_epi64(masklo, _mm256_set1_epi64x(1));
    // Shift the odd index into the upper 32 bits and OR it with the even index.
    __m256i shufmask = _mm256_or_si256(masklo, _mm256_slli_epi64(maskhi, 32));
    // Do the shuffle; the cast intrinsics only reinterpret the bits.
    __m256d out = _mm256_castps_pd(_mm256_permutevar8x32_ps(_mm256_castpd_ps(V), shufmask));
    return out;
}
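To make the index arithmetic above easier to check, here is a minimal scalar sketch of the same expansion, without any intrinsics. The helper name expand_indices_64_to_32 is hypothetical and only for illustration; it assumes every input index is in the range 0..3.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not part of any library): expand four 64-bit lane
// indices into the eight 32-bit lane indices an 8 x 32-bit permute needs.
void expand_indices_64_to_32(const uint64_t idx64[4], uint32_t idx32[8]) {
    for (int i = 0; i < 4; i++) {
        assert(idx64[i] < 4);                            // assumed precondition
        idx32[2 * i]     = (uint32_t)(2 * idx64[i]);     // low 32-bit half of the chosen f64
        idx32[2 * i + 1] = (uint32_t)(2 * idx64[i] + 1); // high 32-bit half of the chosen f64
    }
}
```

For example, the 64-bit indices {3, 0, 2, 1} expand to the 32-bit indices {6, 7, 0, 1, 4, 5, 2, 3}.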
__m256i my_shuffle256_B(__m256i in, __m256i index) {
    // Runtime shuffle of 32 x byte; indices are expected to be in 0..31.
    // Broadcast each 128-bit half of the input to both lanes.
    __m256i in_lolo = _mm256_permute2x128_si256(in, in, 0x00); // low half in both lanes
    __m256i in_hihi = _mm256_permute2x128_si256(in, in, 0x11); // high half in both lanes
    // In-lane shuffle of both copies; this uses the low 4 bits of each index.
    __m256i from_lo = _mm256_shuffle_epi8(in_lolo, index);
    __m256i from_hi = _mm256_shuffle_epi8(in_hihi, index);
    // Indices above 15 refer to the high half, so blend those bytes in from from_hi.
    __m256i mask = _mm256_cmpgt_epi8(index, _mm256_set1_epi8(0x0F));
    __m256i out = _mm256_blendv_epi8(from_lo, from_hi, mask);
    return out;
}
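As a way to check the byte shuffle above, a scalar reference that mirrors its two steps (in-lane lookup, then selection by half) might look like the sketch below. The name shuffle32_bytes_ref is hypothetical, and the sketch assumes all indices are in 0..31; it is meant for comparing results, not as a replacement.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical scalar reference: out[i] = in[index[i]] for 32 bytes,
// written to mirror the shuffle-then-blend structure of the AVX2 version.
void shuffle32_bytes_ref(const uint8_t in[32], const uint8_t index[32], uint8_t out[32]) {
    for (int i = 0; i < 32; i++) {
        assert(index[i] < 32);                  // assumed precondition
        uint8_t within_lane = index[i] & 0x0F;  // the part the in-lane byte shuffle uses
        int from_high = index[i] > 0x0F;        // the decision made by the compare and blend
        out[i] = from_high ? in[16 + within_lane] : in[within_lane];
    }
}
```

Running both versions on the same input and index vectors and comparing the 32 output bytes should expose any mix-up between the low and high halves.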
This approach, however, currently doesn't work, because translate-c is unable to process the required <immintrin.h> header (as asked in my other question here: Translate C lib using <immintrin.h>).
I created an issue for translate-c, but does anybody have other ideas on how to do this?