Hey everyone,
I’d like to share a hobby project I’ve been working on for a while – zig-torch.
I started this project about a year ago with the idea of creating a PyTorch extension written in Zig. Back then, fueled by enthusiasm (and perhaps some favorable micro-benchmarks), I even posted on Reddit claiming I achieved a 94% performance improvement over the original implementation.
Reality, and a deeper dive into the subject (especially after my university exams), forced me to revise those initial results. Currently, I treat this project as a testing ground for learning Zig, exploring low-level optimization (SIMD), and handling FFI with Python.
What is zig-torch?
It’s an attempt to write tensor operations (primarily matrix multiplication, mm) in pure Zig and expose them to Python so they can function independently or alongside PyTorch.
What’s currently working:
- **Build system:** `build.zig` compiles the code into a shared library (`.so`/`.dll`), which Python loads via `ctypes`. This is probably the most enjoyable part of working with Zig: the tooling just works.
- **Matrices:** The `zig_mm` implementation uses tiling (cache blocking) and SIMD vectorization (`@Vector(8, f32)`).
- **Integration:** A simple C API (`callconv(.c)`) allows painless pointer exchange with NumPy/PyTorch.
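To make the tiling idea concrete, here is a minimal pure-Python sketch of the cache-blocking scheme. The real kernel is in Zig and adds SIMD on top; `TILE` and `tiled_mm` are illustrative names I made up, not the repo's actual API, and a tile of 4 is far smaller than what you'd size for a real L1 cache:

```python
# Pure-Python sketch of cache-blocked (tiled) matrix multiplication.
# The loop over TILE-sized blocks keeps a small working set of A, B, and C
# hot in cache while it is reused, which is the whole point of tiling.
TILE = 4  # illustrative; a real kernel sizes this to the L1 cache

def tiled_mm(a, b, m, k, n):
    """C = A @ B where A (m*k) and B (k*n) are flat row-major lists."""
    c = [0.0] * (m * n)
    for i0 in range(0, m, TILE):
        for j0 in range(0, n, TILE):
            for p0 in range(0, k, TILE):
                # One TILE x TILE block; min() clamps at matrix edges,
                # which is exactly the "edge case" handling that gets fiddly.
                for i in range(i0, min(i0 + TILE, m)):
                    for p in range(p0, min(p0 + TILE, k)):
                        a_ip = a[i * k + p]
                        for j in range(j0, min(j0 + TILE, n)):
                            c[i * n + j] += a_ip * b[p * n + j]
    return c
```

For example, `tiled_mm([1, 2, 3, 4], [5, 6, 7, 8], 2, 2, 2)` computes [[1,2],[3,4]] @ [[5,6],[7,8]] and returns `[19.0, 22.0, 43.0, 50.0]`. Hoisting `a_ip` out of the inner loop mirrors what the Zig version does with a broadcast SIMD vector.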
Performance (The Elephant in the Room):
Right now, my implementation beats pure NumPy in certain scenarios, but going up against PyTorch’s optimized C++ backend (utilizing MKL/BLAS) is a tough battle.
For 1024x1024 matrices, Zig currently lags behind Torch, which really highlights just how complex numerical engineering is “under the hood.” Despite this, for smaller operations the results are promising, and the satisfaction of writing my own kernel is huge.
Roadmap:
- Implement multithreading (currently single-threaded).
- Better handling of edge cases in memory blocking.
- Refactor comments (some are currently in Polish; I’m migrating everything to English).
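For the multithreading item, one common scheme is to partition the output rows across workers, since each row of C depends only on one row of A and all of B, so workers never write to the same memory. The sketch below uses hypothetical names and Python threads purely to show the partitioning; CPython's GIL prevents a real speedup here, but the same split maps directly onto `std.Thread` in Zig:

```python
# Sketch of row-partitioned parallel matmul. Each worker owns a disjoint
# set of output rows, so no synchronization is needed on writes to C.
from concurrent.futures import ThreadPoolExecutor

def _mm_rows(a, b, k, n, rows):
    """Compute the given output rows of C = A @ B (flat row-major lists)."""
    out = []
    for i in rows:
        row = [0.0] * n
        for p in range(k):
            a_ip = a[i * k + p]
            for j in range(n):
                row[j] += a_ip * b[p * n + j]
        out.append((i, row))
    return out

def parallel_mm(a, b, m, k, n, workers=4):
    c = [0.0] * (m * n)
    # Round-robin split of row indices across workers.
    chunks = [range(s, m, workers) for s in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        for part in ex.map(lambda rows: _mm_rows(a, b, k, n, rows), chunks):
            for i, row in part:
                c[i * n:(i + 1) * n] = row
    return c
```

A real kernel would hand each thread a contiguous block of rows rather than a round-robin stride, so each thread's tile loop still gets good cache locality.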
The code is available here: https://github.com/kitajusSus/zig-torch
I’d love to get some feedback on the project structure and `build.zig`, as well as any tips on how to squeeze more out of `@Vector` for matrix multiplication.
Cheers!
Zig wins against NumPy alone

> [!IMPORTANT]
> Benchmarked without Torch installed, NumPy only.

`python tests/benchmark.py`
| Size (M×K × K×N) | Torch (ms) | NumPy (ms) | Zig (ms) | Zig vs Torch | Zig vs NumPy | Correct |
|---|---|---|---|---|---|---|
| 32×32 × 32×32 | n/a | 0.019 | 0.018 | n/a | 1.04x | True |
| 64×64 × 64×64 | n/a | 0.143 | 0.055 | n/a | 2.62x | True |
| 128×128 × 128×128 | n/a | 1.178 | 0.358 | n/a | 3.29x | True |
| 256×256 × 256×256 | n/a | 9.086 | 2.778 | n/a | 3.27x | True |
| 512×512 × 512×512 | n/a | 69.996 | 23.019 | n/a | 3.04x | True |
| 1024×1024 × 1024×1024 | n/a | 553.027 | 196.582 | n/a | 2.81x | True |
| 1024×512 × 512×256 | n/a | 72.071 | 22.113 | n/a | 3.26x | True |