Is it possible to use LLVM instructions for vector operations not implemented in the standard library?

pauljurczak · May 31, 2026, 7:27am

Zig provides a basic set of operations on vectors (see Documentation - The Zig Programming Language). Is it possible to use LLVM instructions, or some other form of direct and perhaps platform-dependent intrinsics, to access CPU SIMD instructions that are not implemented in the standard library? One example is gather/scatter instructions, as described in the LLVM Language Reference Manual (LLVM 23.0.0git documentation).

I have not used Zig yet, just kicking the tires.

lalinsky · May 31, 2026, 7:41am

Here is an example of dynamic shuffle that Zig does not support, but LLVM does:

github.com/acoustid/acoustid-index

src/streamvbyte.zig

main

```zig
// Backend detection - only use LLVM intrinsics with LLVM backend
const use_llvm_intrinsics = switch (builtin.zig_backend) {
    .stage2_llvm => true,
    else => false,
};
const use_inline_asm = true;

// StreamVByte shuffle implementation with multi-tier fallback
// Uses pshufb/vpshufb behavior: high bit set in mask -> output 0
fn shuffle(x: Vu8x16, m: Vu8x16) Vu8x16 {
    if (use_llvm_intrinsics and has_ssse3) {
        // Use LLVM intrinsic - compiles to single pshufb/vpshufb
        const builtin_fn = struct {
            extern fn @"llvm.x86.ssse3.pshuf.b.128"(Vu8x16, Vu8x16) Vu8x16;
        }.@"llvm.x86.ssse3.pshuf.b.128";
        return builtin_fn(x, m);
    } else if (use_inline_asm and has_ssse3) {
        // Use inline assembly fallback
        var result = x;
        asm ("pshufb %[mask], %[result]"
```
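For contrast, here is a minimal sketch of what the built-in vector operations look like, assuming a recent Zig where `@splat` takes a single argument. The names `V4`, `reverseLanes`, and `addAndSum` are made up for illustration. The point is that the mask passed to `@shuffle` must be known at compile time, which is why a data-dependent shuffle like `pshufb` needs an intrinsic or inline asm.

```zig
const std = @import("std");

// Hypothetical vector type for this sketch.
const V4 = @Vector(4, i32);

// Reverse the lanes with the built-in @shuffle.
// The mask is a comptime-known constant; a runtime mask is not accepted.
fn reverseLanes(v: V4) V4 {
    const mask = @Vector(4, i32){ 3, 2, 1, 0 };
    return @shuffle(i32, v, undefined, mask);
}

// Element-wise add followed by a horizontal sum.
fn addAndSum(a: V4, b: V4) i32 {
    return @reduce(.Add, a + b);
}

pub fn main() void {
    const a = V4{ 1, 2, 3, 4 };
    const b: V4 = @splat(10);
    const r = reverseLanes(a);
    std.debug.print("{d} {d}\n", .{ r[0], addAndSum(a, b) });
}
```

Arithmetic operators, `@reduce`, `@splat`, and `@shuffle` cover the portable cases; anything that depends on a runtime lane selector falls outside this set.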

rpkak · May 31, 2026, 7:49am

Maybe this will be possible with builtins in the future.

github.com/ziglang/zig

Add SIMD Support

opened 12:08PM - 07 Apr 18 UTC

tiehuis

contributor friendly proposal accepted

[Current Progress](https://github.com/ziglang/zig/issues/903#issuecomment-459508…820)

-----

SIMD is very useful for fast processing of data, and given Zig's goal of going fast, I think we need to look at exposing some way of using these instructions easily and reliably.

### Status-Quo

#### Inline Assembly

It is possible to do SIMD in inline assembly as is. This is a bit cumbersome though, and I think we should strive to get the same performance from the Zig language itself.

#### Rely on the Optimizer

The optimizer is good, and comptime unrolling and support help a lot, but it doesn't guarantee that any specific code will be vectorized. You are at the mercy of LLVM, and you don't want to see your code take a huge hit in performance simply due to a compiler upgrade/change.

### LLVM Vector Intrinsics

LLVM supports [vector types](https://llvm.org/docs/LangRef.html#vector-type) as first-class objects in its IR. These correspond to SIMD instructions. This provides the bulk of the work; we simply need to expose a way to construct these vector types. This would be analogous to the `__attribute__((vector))__` builtin found in C compilers.

---

If anyone has any thoughts on the implementation or usage, that would be great, since I'm not very familiar with how these are exposed by LLVM. It would be great to get some discussion going in this area, since I'm sure people would like to be able to match the performance of C in all areas with Zig.

pauljurczak · May 31, 2026, 9:37pm

If I understand this correctly, there are currently two ways to use an unsupported SIMD instruction: `extern fn` and `asm()`, right?

lalinsky · May 31, 2026, 9:41pm

Yes. I’m honestly not sure there is any advantage to using the LLVM intrinsics over plain inline asm. Maybe you could just use asm and have it work on all compiler backends.
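A minimal sketch of the asm-only route, not the code from the repository above. It reuses the names `Vu8x16` and `has_ssse3` from the excerpt, but their definitions here are assumptions, as are the operand constraints (`"=x"` for an SSE register and `"0"` to tie the input to the output). The scalar fallback is an illustrative addition, and the asm form may need adjusting on backends other than LLVM.

```zig
const std = @import("std");
const builtin = @import("builtin");

const Vu8x16 = @Vector(16, u8);

// Assumption: SSSE3 availability is taken from the compile target.
const has_ssse3 = builtin.cpu.arch == .x86_64 and
    std.Target.x86.featureSetHas(builtin.cpu.features, .ssse3);

// Byte shuffle with pshufb semantics: if the high bit of a mask byte is
// set, the output byte is 0; otherwise the low 4 bits select a byte of x.
fn shuffle(x: Vu8x16, m: Vu8x16) Vu8x16 {
    if (has_ssse3) {
        // pshufb overwrites its destination, so the input x is tied to
        // the output register through the "0" constraint.
        return asm ("pshufb %[mask], %[dst]"
            : [dst] "=x" (-> Vu8x16),
            : [src] "0" (x),
              [mask] "x" (m),
        );
    }
    // Portable scalar fallback with the same semantics.
    const xs: [16]u8 = x;
    const ms: [16]u8 = m;
    var out = [_]u8{0} ** 16;
    for (ms, 0..) |sel, i| {
        if ((sel & 0x80) == 0) out[i] = xs[sel & 0x0f];
    }
    return out;
}
```

Because the branch on `has_ssse3` is resolved at compile time, the asm block is only analyzed on targets where the instruction exists.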

pzittlau · May 31, 2026, 11:57pm

In theory, LLVM could know a lot about the intrinsics and optimize around them as it sees fit: reordering them, choosing registers, or even constant folding, all while preserving semantics. Raw asm blocks are, AFAIK, treated as a black box, and the optimizations a compiler can perform around them depend largely on the clobbers and parameters you specify.