Oma - runtime SIMD dispatch for Zig

Zig’s @Vector is quite nice, but it’s not without its issues. The most “in my face” one I ran into recently was trying to target multiple architectures / CPU instruction sets with a single function invocation.

oma (one-man-army; s/o all my mw2 veterans) is an attempt to automate the painful parts of targeting multiple architectures, and hopefully make using pure Zig for these things a bit easier.

Repo: GitHub - ATTron/oma: Runtime SIMD dispatch for Zig. Compile once per CPU level, pick the best at startup


Why this exists

When I wrote about my SIMD experience with my other project, astroz, it made its way onto lobste.rs, and @mitchellh pointed out that Zig still lacks an easy way to compile functions for multiple targets. For Ghostty, he ended up dropping into C++ and using Google’s Highway because it handles this headache pretty well, but as he said himself:

“I’d really prefer to use Zig, though. :)”

@desttinghim’s Dispatching SIMD functions at runtime worked out the core technique I piggybacked on: compile with different targets, link them together, select at runtime. The discussion there points to the long-standing issue: Proposal: Function multi-versioning · Issue #1018 · ziglang/zig · GitHub. Rather than attempting a big compiler change (which tbh would terrify me), I wanted to see how much work it would be to do this as a library dependency instead.

oma attempts to package all of that into a single build.zig dependency.
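The "select at runtime" part of that technique boils down to a table of function pointers that gets resolved once. Here is a minimal single-file sketch of just that step. `Level`, `Variant`, `select`, `dotBaseline`, and `dotV3` are made-up names for illustration and are not oma’s API. In a real setup each variant would be the same source compiled for a different CPU target and linked in as a separate symbol, and the level passed to `select` would come from an actual CPU feature query rather than a hard-coded value.

```zig
const std = @import("std");

// Hypothetical names throughout; this is not oma's API.
const Level = enum(u8) { baseline = 0, v2 = 1, v3 = 2, v4 = 3 };

const DotFn = *const fn (a: @Vector(4, f32), b: @Vector(4, f32)) callconv(.c) f32;

// Stand-in for the variant built with the baseline CPU target.
fn dotBaseline(a: @Vector(4, f32), b: @Vector(4, f32)) callconv(.c) f32 {
    const xs: [4]f32 = a;
    const ys: [4]f32 = b;
    var sum: f32 = 0;
    for (xs, ys) |x, y| sum += x * y;
    return sum;
}

// Stand-in for the variant built with a wider CPU target (e.g. x86_64_v3).
fn dotV3(a: @Vector(4, f32), b: @Vector(4, f32)) callconv(.c) f32 {
    return @reduce(.Add, a * b);
}

const Variant = struct { min_level: Level, func: DotFn };

// Ordered from most to least demanding; the last entry is the fallback.
const variants = [_]Variant{
    .{ .min_level = .v3, .func = &dotV3 },
    .{ .min_level = .baseline, .func = &dotBaseline },
};

/// Returns the most demanding variant that the given CPU level can still run.
fn select(cpu: Level) DotFn {
    for (variants) |v| {
        if (@intFromEnum(cpu) >= @intFromEnum(v.min_level)) return v.func;
    }
    return &dotBaseline;
}

pub fn main() void {
    // Pretend detection reported v2: there is no v2 variant in the table,
    // so selection falls through to the baseline one.
    const dot = select(.v2);
    const a: @Vector(4, f32) = .{ 1, 2, 3, 4 };
    const b: @Vector(4, f32) = .{ 5, 6, 7, 8 };
    std.debug.print("{d}\n", .{dot(a, b)});
}
```

Resolving the pointer once at startup keeps the per-call cost to a single indirect call, which is the reason this pattern tends to be preferred over checking CPU features inside the hot function.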


Supported levels

x86-64: x86_64 → x86_64_v2 → x86_64_v3 → x86_64_v4 (SSE2 through AVX-512)
AArch64: aarch64 → aarch64_sve → aarch64_sve2 (NEON through SVE2)

I chose to target what I figured was probably 90% of the market, but if I’m missing something, please open an issue or a PR to add it! :)
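To get a feel for what one of these levels means for a single compiled variant, you can ask Zig which features the current compilation target enables. This is a rough sketch and an assumption on my part about how to map features to level names: it checks only one representative feature per level (the real levels each bundle several features), and it reports what the build was compiled for, not what the machine running it supports. `compiledLevel` is a made-up helper, not part of oma.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Name of the highest level from the list above that the *compilation
/// target* appears to satisfy, judged by one representative feature per level.
fn compiledLevel() []const u8 {
    const f = builtin.cpu.features;
    switch (builtin.cpu.arch) {
        .x86_64 => {
            const has = std.Target.x86.featureSetHas;
            if (has(f, .avx512f)) return "x86_64_v4";
            if (has(f, .avx2)) return "x86_64_v3";
            if (has(f, .sse4_2)) return "x86_64_v2";
            return "x86_64"; // baseline: SSE2 is always present
        },
        .aarch64 => {
            const has = std.Target.aarch64.featureSetHas;
            if (has(f, .sve2)) return "aarch64_sve2";
            if (has(f, .sve)) return "aarch64_sve";
            return "aarch64"; // baseline: NEON
        },
        else => return "unsupported",
    }
}

pub fn main() void {
    std.debug.print("compiled for: {s}\n", .{compiledLevel()});
}
```

Building the same file with different `-mcpu` values (for example `-mcpu=x86_64_v3`) should change what it reports, which is essentially what "compile once per CPU level" means for each variant.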


What it looks like

  1. Write a function:
// src/dot_product.zig
pub fn dot(a: @Vector(4, f32), b: @Vector(4, f32)) callconv(.c) f32 {
    return @reduce(.Add, a * b);
}
  2. Wire it up in build.zig:
// build.zig
oma.addMultiVersion(oma_dep, exe, .{
    .source = b.path("src/dot_product.zig"),
    .name = "dot_product",
});
  3. Dispatch at runtime:
const dot = oma.resolveFrom(dot_product, "dot", io);
const result = dot(a, b); // uses AVX2/AVX-512/SVE/… depending on the CPU

Requires Zig 0.16.0-dev* or later
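Since the function in step 1 is plain Zig, it can be sanity-checked on its own before any dispatch is involved. A small sketch that reproduces the same `dot` body in a standalone file and checks it against a hand-computed value with `zig test`; nothing here touches oma, and the input values are just an example.

```zig
const std = @import("std");

// Same body as the function in step 1, reproduced so this file stands alone.
pub fn dot(a: @Vector(4, f32), b: @Vector(4, f32)) callconv(.c) f32 {
    // a * b multiplies lane by lane; @reduce(.Add, ...) sums the four lanes.
    return @reduce(.Add, a * b);
}

test "dot multiplies lane-wise, then sums the lanes" {
    const a: @Vector(4, f32) = .{ 1, 2, 3, 4 };
    const b: @Vector(4, f32) = .{ 5, 6, 7, 8 };
    // 1*5 + 2*6 + 3*7 + 4*8 = 5 + 12 + 21 + 32 = 70
    try std.testing.expectEqual(@as(f32, 70), dot(a, b));
}
```

Whichever variant the dispatcher ends up picking should agree with this result for these inputs, so a test like this doubles as a reference when comparing levels.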

To be clear: none of the individual pieces here are new. All of the techniques this uses already exist and have been tried out. That’s exactly what Mitchell described doing for Ghostty before switching to Highway. My goal with oma was to take that known path and make it a zig fetch away, instead of needing a bespoke implementation each time you want to use it.

This is still very new and I haven’t given it a proper kick around, so I’m sure I’m missing something or something is broken. Please open PRs, issues, etc. if you run into anything.

Thanks for checking it out! :)

