Model2vec-zig: static text embeddings in pure Zig, in a single binary


I was inspired by MinishLab’s Model2Vec (MinishLab/model2vec on GitHub) and wanted to push it into the application layer. That led me to build model2vec-zig (PaytonWebber/model2vec-zig on GitHub), which does Model2Vec inference in pure Zig. The goal is to compile their potion embedding models into a single static binary, something Zig is especially well suited for.

The reason I’m posting is the 4-bit quantization result, which I think is worth sharing. I read Google’s TurboQuant paper and applied it to the potion models: each embedding row is rotated by a fixed random orthonormal matrix, then stored as 4-bit values with one scale per row. The rotation matrix is never stored. Every row is rotated by the same matrix, and cosine similarity is unchanged when both vectors undergo the same rotation, so the runtime never needs to know it happened. This takes the 129 MB retrieval-tuned potion model down to 16.4 MB. Before trusting any quality numbers, I reproduced MinishLab’s published MTEB scores per task on the same harness, then measured the cost of quantization: a loss of 0.0020 NDCG@10 at worst, and 0.0005 or less on the other three suites. Mean pooling averages away per-token quantization noise, which is why static embedders compress this well. Full tables and methodology are in docs/turboquant.md in the repo.
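To make the rotate-then-quantize idea concrete, here is a minimal sketch in plain Python (not the Zig code from the repo). It assumes a symmetric uniform 4-bit grid with codes in [-7, 7] and a max-abs scale per row; the repo and the TurboQuant paper may use a different codebook. All function names here are hypothetical and chosen only for illustration.

```python
import math
import random


def random_orthonormal(dim, seed=0):
    """Build a dim x dim orthonormal matrix via modified Gram-Schmidt
    on Gaussian random rows (an illustrative choice of construction)."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip the (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix, vec):
    """Matrix-vector product: apply the rotation to one embedding row."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def quantize_row_4bit(row):
    """Assumed scheme: symmetric uniform codes in [-7, 7], one scale per row."""
    max_abs = max(abs(x) for x in row)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(x / scale))) for x in row]
    return codes, scale


def dequantize_row(codes, scale):
    """Recover approximate float values from the codes and the row scale."""
    return [c * scale for c in codes]


def pack_nibbles(codes):
    """Store two codes per byte; codes are offset by 8 to be unsigned."""
    padded = codes + [0] * (len(codes) % 2)
    out = bytearray()
    for lo, hi in zip(padded[0::2], padded[1::2]):
        out.append(((hi + 8) << 4) | (lo + 8))
    return bytes(out)


def unpack_nibbles(data, count):
    """Inverse of pack_nibbles; count drops any padding code."""
    codes = []
    for byte in data:
        codes.append((byte & 0x0F) - 8)
        codes.append((byte >> 4) - 8)
    return codes[:count]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

Rotating two vectors with the same matrix from `random_orthonormal` leaves `cosine` between them unchanged up to floating-point error, which is why only the packed codes and the per-row scales need to ship. At 4 bits per value, each packed row takes half a byte per dimension plus one scale.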

Supported Zig versions

0.16.0

AI / LLM usage disclosure

I used an LLM heavily for the implementation and the eval harness, under my direction. Rather than relying on my review alone, I verified the work against external references: output vectors match the Python implementation to within 1e-5, the i8 quantizer’s output is byte-identical to the reference quantizer’s, and MinishLab’s published MTEB scores were reproduced per task. Everything is reproducible with scripts/mteb_eval.py in the repo.