SIMD Vector to Mask for Substring Search

wrapitup · December 22, 2023, 2:09am

Hi, I’m trying to implement a SIMD substring search based on the following post: simd-strfind. There’s a particular part of the code where the following intrinsic is called: _mm256_movemask_epi8.

My understanding is that you can take a vector and create a mask from the most significant bit of each byte (or of whichever element type you use).

// For example:
const x: @Vector(8, u8) = [_]u8{ 1, 1, 1, 1, 0, 0, 0, 0 };
// expected output: u8 = 0b11110000

Now, I’m honestly just a beginner with SIMD, and I found openmymind and std/simd.zig very helpful for learning the basics. I think I came up with a solution with the following routine:

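inline fn movemask(v: @Vector(32, bool)) u32 {
    // Lane i contributes bit (31 - i), so lane 0 maps to the most significant bit.
    const mask: @Vector(32, u32) = comptime blk: {
        var out: [32]u32 = undefined;
        for (0..32) |i| {
            out[i] = 1 << (31 - i);
        }
        break :blk out;
    };
    const zeros: @Vector(32, u32) = @splat(0);
    // Keep a lane's bit only where that lane is true, then OR all lanes together.
    const r = @select(u32, v, mask, zeros);
    return @reduce(.Or, r);
}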

The thing is, this really is a single instruction on x86 (vpmovmskb), but I can’t seem to get that instruction emitted when I put this into Compiler Explorer.

Is there any way I can adjust my routine to get the instruction I want? Or will it require inline assembly?

(Title edited to reflect that the question is mostly about the mask.)

Validark · December 23, 2023, 2:45pm

Movmask doesn’t typically need to be written so explicitly. Just produce a vector of bools and @bitCast it to an integer.

wrapitup · December 23, 2023, 7:32pm

Thanks! I validated that it indeed compiles to the instruction I was looking for.

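const std = @import("std");

// Returns one bit per lane: bit i is set when lane i is nonzero.
fn movemask(mask: @Vector(8, u8)) u8 {
    const zeros: @Vector(8, u8) = [_]u8{0} ** 8;
    return @bitCast(mask != zeros);
}

test movemask {
    // Lanes 0, 1, 4 and 5 are nonzero, so bits 0, 1, 4 and 5 are set.
    const a: @Vector(8, u8) = [_]u8{ 1, 1, 0, 0, 1, 1, 0, 0 };
    try std.testing.expect(movemask(a) == 0b00110011);
}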

movemask:
        vpxor  xmm1, xmm1, xmm1
        vpcmpeqb        xmm0, xmm0, xmm1
        vpmovmskb       eax, xmm0
        not     eax
        ret
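
To connect this back to the substring search in the original question, here is a minimal sketch of the same compare-and-@bitCast approach on a 32-byte chunk. The function name matchMask and the sample input are hypothetical, not from the linked post. It assumes Zig 0.11 or later (single-argument @splat) and a little-endian target such as x86, where lane 0 ends up in the least significant bit.

```zig
const std = @import("std");

// Hypothetical helper (not from the thread): returns a bitmask in which
// bit i is set when chunk[i] == needle.
fn matchMask(chunk: *const [32]u8, needle: u8) u32 {
    const haystack: @Vector(32, u8) = chunk.*;
    const needles: @Vector(32, u8) = @splat(needle);
    // Comparing two vectors yields a @Vector(32, bool), which is 32 bits
    // wide and can therefore be bit-cast straight to a u32.
    return @bitCast(haystack == needles);
}

test "matchMask locates the first matching byte" {
    const mask = matchMask("the quick brown fox jumps over a", 'q');
    // @ctz returns the index of the lowest set bit, i.e. the first match.
    try std.testing.expect(@ctz(mask) == 4);
}
```

A full search would presumably run this on the first (and often the last) byte of the needle, AND the resulting masks together, and only then verify the candidate positions byte by byte.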