Does Zig IR have Sum of Absolute Differences operation for vectors?

grep SAD -r didn’t show anything in zig. Question came from learning about GCC’s tree vectors with SAD_EXPR.

I’m not aware of a builtin that does this, so I’d assume there is no functionality for it.
What do you need this for? Is this an actual instruction on some particular hardware that you own?

I only ask out of curiosity. I don’t need this for my zig projects for now.

Zig uses LLVM as a backend. If you are asking if it’s possible to emit a sum of absolute difference instruction, there are three ways of accomplishing that. You can rely on LLVM to optimize a set of operations that can become a single instruction, you can use an intrinsic, or you can use inline assembly. If there’s a specific instruction you want to get, feel free to contact me at any time and I’ll help you.

4 Likes

I literally just ran into this use case, where I’m trying to get Zig + LLVM to compile to reasonable SAD instructions, but I just can’t quite trick it to do the right thing.

I’m not sure if this is a Zig optimization issue, an LLVM optimization issue, or perhaps both.

Here are my two attempts to get an optimized result, and both are failing (extra instructions are emitted than should be necessary): Compiler Explorer

Here’s an inline example of the code I’m trying to optimize:

const block_width = 8;
const block_height = 1;
const T = u8;
const VTT = @Vector(block_width, T);

fn load(comptime VT: type, src: []const @typeInfo(VT).vector.child, offset: usize) VT {
    return src[offset..][0..@typeInfo(VT).vector.len].*;
}

pub fn absDiff(a: anytype, b: anytype) @TypeOf(a, b) {
    return @max(a, b) - @min(a, b);
}

export fn sad(noalias srcp: [*]const T, noalias refp: [*]const T, height: usize, stride: usize) u64 {
   const src = srcp[0..height * stride];
   const ref = refp[0..height * stride];
   var sum: u64 = 0;

   for (0..block_height) |row| {
      const s = load(VTT, src, row * stride);
      const r = load(VTT, ref, row * stride);

      sum += @reduce(.Add, absDiff(s,r));
   }
   return sum;
}

and here’s what LLVM spits out:

sad:
        push    rbp
        mov     rbp, rsp
        vmovq   xmm0, qword ptr [rdi]
        vmovq   xmm1, qword ptr [rsi]
        vpminub xmm2, xmm0, xmm1 // unnecessary?
        vpmaxub xmm0, xmm0, xmm1 // unnecessary?
        vpxor   xmm1, xmm1, xmm1 // unnecessary?
        vpsubb  xmm0, xmm0, xmm2 // unnecessary?
        vpsadbw xmm0, xmm0, xmm1
        vpextrb eax, xmm0, 0
        pop     rbp
        ret

The issue (I believe) is that there’s a whole chunk of instructions dedicated to computing the absolute difference, but since vpsadbw computes the absolute difference itself as part of the operation, we’re essentially computing the absolute difference twice.

Am I reading this code correctly? Any ideas on how to optimize this further? Or should I open an issue with Zig or LLVM?

1 Like

It seems LLVM isn’t figuring out that you are wanting SAD, vpxor ... vpsadbw is what any @reduce(.Add) gets compiled to (in my messing of arounds)

In general, you are welcome to open an issue if there is a vector operation that you are unable to express in the language. Pretty much any vector operation is intended to be directly expressible either by arithmetic operations on vectors, or via builtins such as @reduce and @shuffle.

Thank you very much for your responses, I’ve gone ahead and opened an issue on the Zig tracker to get the ball rolling: SAD operation cannot be optimally expressed · Issue #24125 · ziglang/zig · GitHub

I did find that LLVM has some SAD-pattern matching already, so we’re likely close: Detects the SAD pattern on X86 so that much better code will be emitt… · llvm/llvm-project@6f879d9 · GitHub