Zig provides a basic set of operations on vectors (see the Zig language documentation). Is it possible to use LLVM intrinsics, or some other form of direct, perhaps platform-dependent, intrinsics to get access to CPU SIMD instructions that are not exposed by the language or the standard library? For example, gather/scatter instructions, as described in the LLVM Language Reference Manual.
I have not used Zig yet, just kicking the tires.
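For reference, a gather can always be written portably by going through arrays, at the cost of relying on the optimizer to turn it into a hardware gather (which is not guaranteed). The following is only a minimal sketch, assuming Zig 0.11 or later; the `gather` helper and its signature are hypothetical and not part of the language or the standard library:

```zig
const std = @import("std");

// Hypothetical helper: loads base[idx[i]] into lane i of the result.
// Bounds are checked per lane in safe build modes.
fn gather(comptime N: usize, base: []const u32, idx: @Vector(N, u32)) @Vector(N, u32) {
    // Vectors and fixed-length arrays coerce to each other.
    const indices: [N]u32 = idx;
    var out: [N]u32 = undefined;
    for (indices, 0..) |ix, i| {
        out[i] = base[ix];
    }
    return out;
}
```

Whether this compiles down to a single gather instruction depends on the target and the backend, which is exactly the uncertainty the question is about.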
Here is an example of dynamic shuffle that Zig does not support, but LLVM does:
const builtin = @import("builtin");
// Vu8x16 and has_ssse3 are defined elsewhere in the original source (not shown).

// Backend detection - only use LLVM intrinsics with LLVM backend
const use_llvm_intrinsics = switch (builtin.zig_backend) {
    .stage2_llvm => true,
    else => false,
};
const use_inline_asm = true;

// StreamVByte shuffle implementation with multi-tier fallback
// Uses pshufb/vpshufb behavior: high bit set in mask -> output 0
fn shuffle(x: Vu8x16, m: Vu8x16) Vu8x16 {
    if (use_llvm_intrinsics and has_ssse3) {
        // Use LLVM intrinsic - compiles to single pshufb/vpshufb
        const builtin_fn = struct {
            extern fn @"llvm.x86.ssse3.pshuf.b.128"(Vu8x16, Vu8x16) Vu8x16;
        }.@"llvm.x86.ssse3.pshuf.b.128";
        return builtin_fn(x, m);
    } else if (use_inline_asm and has_ssse3) {
        // Use inline assembly fallback
        var result = x;
        asm ("pshufb %[mask], %[result]"
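
The excerpt above is cut off before the last tier of the fallback chain. Purely as an illustration of what a portable final tier could look like (this is not the original author's code), here is a scalar sketch of the pshufb semantics described in the comment: a mask byte with its high bit set yields 0, otherwise the low four bits select a byte of `x`. The name `shuffleScalar` is hypothetical, and `Vu8x16` is assumed to be `@Vector(16, u8)`:

```zig
const std = @import("std");

// Assumption: matches the Vu8x16 type used in the excerpt.
const Vu8x16 = @Vector(16, u8);

// Hypothetical portable fallback with pshufb-like behavior.
fn shuffleScalar(x: Vu8x16, m: Vu8x16) Vu8x16 {
    const xs: [16]u8 = x;
    const ms: [16]u8 = m;
    var out: [16]u8 = undefined;
    for (ms, 0..) |sel, i| {
        // High bit set -> lane is zeroed; otherwise index with the low 4 bits.
        out[i] = if ((sel & 0x80) != 0) 0 else xs[sel & 0x0F];
    }
    return out;
}
```

This version works on any backend and target, but nothing guarantees it is lowered to a single shuffle instruction.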
rpkak
May 31, 2026, 7:49am
Maybe this will be possible with builtins in the future.
ziglang/zig issue #903 (opened 12:08PM - 07 Apr 18 UTC; labels: contributor friendly, proposal, accepted):
[Current Progress](https://github.com/ziglang/zig/issues/903#issuecomment-459508… 820)
-----
SIMD is very useful for fast processing of data, and given Zig's goal of going fast, I think we need to look at exposing some way of using these instructions easily and reliably.
### Status-Quo
#### Inline Assembly
It is possible to do SIMD in inline assembly as is. This is a bit cumbersome, though, and I think we should strive to get that performance from the Zig language itself.
#### Rely on the Optimizer
The optimizer is good, and comptime unrolling support helps a lot, but it provides no guarantee that any specific code will be vectorized. You are at the mercy of LLVM, and you don't want to see your code take a huge performance hit simply because of a compiler upgrade or change.
### LLVM Vector Intrinsics
LLVM supports [vector types](https://llvm.org/docs/LangRef.html#vector-type) as first-class objects in its IR. These correspond to SIMD instructions. This provides the bulk of the work; for us, we simply need to expose a way to construct these vector types. This would be analogous to the `__attribute__((vector_size(N)))` attribute found in C compilers.
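
In current Zig, this idea is exposed through the `@Vector` builtin. As a rough sketch (assuming Zig 0.11 or later; the `V4` alias and the helper names `axpy` and `reverse` are made up for illustration), element-wise arithmetic works directly on vector values, while `@shuffle` requires its mask to be known at compile time:

```zig
const std = @import("std");

const V4 = @Vector(4, f32);

// Element-wise a*x + y; @splat broadcasts the scalar to every lane.
fn axpy(a: f32, x: V4, y: V4) V4 {
    const av: V4 = @splat(a);
    return av * x + y;
}

// Reverses the lanes. The mask is a comptime-known constant, which is
// what @shuffle requires; a runtime mask is not accepted here.
fn reverse(v: V4) V4 {
    const mask = @Vector(4, i32){ 3, 2, 1, 0 };
    return @shuffle(f32, v, undefined, mask);
}
```

The compile-time mask restriction is why a runtime (dynamic) shuffle such as pshufb needs an intrinsic, inline assembly, or a scalar loop.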
---
If anyone has any thoughts on the implementation and/or usage, that would be great, since I'm not very familiar with how these are exposed by LLVM. It would be great to get some discussion going in this area, since I'm sure people would like to be able to match the performance of C in all areas with Zig.
If I understand this correctly, there are currently two ways to use an unsupported SIMD instruction: `extern fn` and `asm`, right?
Yes. I'm honestly not sure whether there is an advantage to using the LLVM intrinsics over plain inline asm; maybe you could just use asm and have it work on all compiler backends.
In theory, LLVM could know a lot about the intrinsics and optimize around them: reordering them as it sees fit, choosing registers, or even constant folding, all while keeping the semantics. Raw asm blocks are, AFAIK, just treated as a black box, and the optimizations a compiler can do with them depend largely on the clobbers and operands you declare.