Sharing code between scalar and vector types

A couple of questions about how folks are approaching vector types for SIMD work.

Say I have some complex function that takes a number of f64 arguments. I want to avoid writing two implementations of the same logic, one for scalar f64 and one for @Vector(n, f64). I have a few questions / thoughts on this below.

One solution: use a signature that takes anytype, and use type reflection to detect vector types. This is doable, but something like scalar multiplication is scalar * x in scalar-land and the lovely @as(Vec, @splat(3.0)) * x in vector-land, so you end up doing a lot of these comptime checks.
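One way to keep those comptime checks in a single place is a small broadcast helper. This is only a sketch: `broadcast` and `scaleAndShift` are hypothetical names, and it assumes Zig 0.14 or later, where the type-info tags are spelled `.vector` and `.float` (older releases use `.Vector` and `.Float`).

```zig
const std = @import("std");

// Hypothetical helper: turn an f64 constant into T, where T is either
// f64 or @Vector(n, f64). Assumes Zig 0.14+ tag names (.vector / .float).
fn broadcast(comptime T: type, value: f64) T {
    return switch (@typeInfo(T)) {
        // Vector case: splat the constant across every lane.
        .vector => @as(T, @splat(value)),
        // Scalar case: the constant is already the right shape.
        .float => value,
        else => @compileError("expected f64 or a vector of f64"),
    };
}

// Hypothetical example body: computes 3 * x + 1 for scalars and vectors alike.
fn scaleAndShift(x: anytype) @TypeOf(x) {
    const T = @TypeOf(x);
    return broadcast(T, 3.0) * x + broadcast(T, 1.0);
}
```

With this, the function body itself contains no type checks; only the constants go through the helper.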

Another solution I’ve been playing with is writing everything as vector-only code, and special-casing scalar calls as vectors of length 1. For example:

fn typedMultiply(comptime T: type, x: T, y: T) T {
    return x * y;
}

export fn multiplyAsVec(x: f64, y: f64) f64 {
    return typedMultiply(@Vector(1, f64), @splat(x), @splat(y))[0];
}

export fn multiplyAsScalar(x: f64, y: f64) f64 {
    return typedMultiply(f64, x, y);
}
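Taking the vector-only idea one step further, the body can be written against @Vector(n, f64) with constants included, and the scalar entry point becomes a thin length-1 wrapper. A rough sketch, where `scaleAndShiftVec` and `scaleAndShiftScalar` are made-up names used purely for illustration:

```zig
const std = @import("std");

// Hypothetical vector-only body: computes 3 * x + 1 lane by lane.
fn scaleAndShiftVec(comptime n: comptime_int, x: @Vector(n, f64)) @Vector(n, f64) {
    const V = @Vector(n, f64);
    return @as(V, @splat(3.0)) * x + @as(V, @splat(1.0));
}

// Scalar wrapper: treat the f64 as a vector of length 1, then unwrap lane 0.
fn scaleAndShiftScalar(x: f64) f64 {
    const V1 = @Vector(1, f64);
    return scaleAndShiftVec(1, @as(V1, @splat(x)))[0];
}
```

The wrapper is the only place that knows about the scalar case, so the logic is written once.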

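I was interested in whether there is overhead in doing the vector-to-scalar conversion, so I ran this through the compiler to look at the assembly. Interestingly, it’s almost exactly the same (to my very, very untrained eye): both versions end up with the same scalar vmulsd, and the only difference I can spot is an extra store and reload of the result in multiplyAsVec.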

example.multiplyAsScalar:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 304
        vmovsd  qword ptr [rbp - 296], xmm0
        vmovsd  qword ptr [rbp - 288], xmm1
        lea     rax, [rbp - 280]
        mov     qword ptr [rbp - 16], rax
        mov     qword ptr [rbp - 8], 32
        mov     qword ptr [rbp - 24], 0
        lea     rdi, [rbp - 24]
        call    example.typedMultiply__anon_482
        add     rsp, 304
        pop     rbp
        ret

example.typedMultiply__anon_482:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        vmovsd  qword ptr [rbp - 16], xmm0
        vmovsd  qword ptr [rbp - 8], xmm1
        vmulsd  xmm0, xmm0, xmm1
        add     rsp, 16
        pop     rbp
        ret

example.multiplyAsVec:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 304
        vmovsd  qword ptr [rbp - 304], xmm0
        vmovsd  qword ptr [rbp - 296], xmm1
        lea     rax, [rbp - 288]
        mov     qword ptr [rbp - 24], rax
        mov     qword ptr [rbp - 16], 32
        mov     qword ptr [rbp - 32], 0
        lea     rdi, [rbp - 32]
        call    example.typedMultiply__anon_491
        vmovsd  qword ptr [rbp - 8], xmm0
        vmovsd  xmm0, qword ptr [rbp - 8]
        add     rsp, 304
        pop     rbp
        ret

example.typedMultiply__anon_491:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        vmovsd  qword ptr [rbp - 16], xmm0
        vmovsd  qword ptr [rbp - 8], xmm1
        vmulsd  xmm0, xmm0, xmm1
        add     rsp, 16
        pop     rbp
        ret

My question then is: is this a sensible pattern? Or am I going to run into gotchas, such as vector-based code using slower instructions when it is really operating on scalars?

Please do throw in any other tips for this kind of work as well :slight_smile: thanks all!