Scalar-valued reverse mode autodiff in Zig

Sharing my weekend project that re-implements Andrej Karpathy’s Micrograd in Zig.

You can find the source code here: GitHub - nurpax/zigrograd: Micrograd in Zig

It’s a neural network engine that can automatically compute gradients for arbitrary scalar-valued expressions. I provide an example that trains a 3-layer neural network to classify hand-written digits (MNIST).
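For anyone unfamiliar with Micrograd-style engines, here is a minimal, self-contained sketch of scalar reverse-mode autodiff in Zig. This is not zigrograd's actual API or node representation; it only illustrates the core idea of building an expression graph and pushing gradients back through it with the chain rule (a real engine would topologically sort the graph instead of recursing naively):

```zig
const std = @import("std");

// Each node stores its value, its gradient, and the inputs it was built from.
const Op = enum { leaf, add, mul };

const Value = struct {
    data: f32,
    grad: f32 = 0,
    op: Op = .leaf,
    lhs: ?*Value = null,
    rhs: ?*Value = null,
};

fn add(arena: std.mem.Allocator, a: *Value, b: *Value) !*Value {
    const out = try arena.create(Value);
    out.* = .{ .data = a.data + b.data, .op = .add, .lhs = a, .rhs = b };
    return out;
}

fn mul(arena: std.mem.Allocator, a: *Value, b: *Value) !*Value {
    const out = try arena.create(Value);
    out.* = .{ .data = a.data * b.data, .op = .mul, .lhs = a, .rhs = b };
    return out;
}

// Apply the chain rule recursively. A proper implementation visits nodes in
// reverse topological order so shared subexpressions are handled correctly.
fn backward(v: *Value) void {
    switch (v.op) {
        .leaf => {},
        .add => {
            v.lhs.?.grad += v.grad;
            v.rhs.?.grad += v.grad;
            backward(v.lhs.?);
            backward(v.rhs.?);
        },
        .mul => {
            v.lhs.?.grad += v.rhs.?.data * v.grad;
            v.rhs.?.grad += v.lhs.?.data * v.grad;
            backward(v.lhs.?);
            backward(v.rhs.?);
        },
    }
}

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const alloc = arena.allocator();

    // f(x, y) = (x + y) * y, with x = 2, y = 3.
    var x = Value{ .data = 2 };
    var y = Value{ .data = 3 };
    const s = try add(alloc, &x, &y);
    const f = try mul(alloc, s, &y);

    f.grad = 1;
    backward(f);

    // Expected: df/dx = y = 3, df/dy = x + 2y = 8.
    std.debug.print("f={d} df/dx={d} df/dy={d}\n", .{ f.data, x.grad, y.grad });
}
```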

It’s not intended for real use; it was mainly a learning exercise for me. I think it turned out quite nicely in Zig. I especially like how well the “arena allocator” pattern works for a machine learning training loop: use two separate arenas, an “init” arena for model parameters and other long-lived state, and a “forward” arena used within the training loop for the forward pass and backpropagation. The forward arena is reset after each trained minibatch (see the sketch below).
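Here is a rough illustration of the two-arena idea, with placeholder allocations standing in for zigrograd's real model and graph types, and assuming a recent Zig where `std.heap.ArenaAllocator.reset` is available:

```zig
const std = @import("std");

// Placeholder node type, not zigrograd's actual Value type.
const Value = struct { data: f32, grad: f32 };

pub fn main() !void {
    // "init" arena: long-lived allocations such as model parameters.
    var init_arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer init_arena.deinit();
    const params = try init_arena.allocator().alloc(Value, 1024);
    for (params) |*p| p.* = .{ .data = 0.01, .grad = 0.0 };

    // "forward" arena: expression-graph nodes for one minibatch only.
    var fwd_arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer fwd_arena.deinit();

    var batch: usize = 0;
    while (batch < 100) : (batch += 1) {
        const fwd = fwd_arena.allocator();

        // The forward pass and backprop would allocate graph nodes from `fwd` here.
        const nodes = try fwd.alloc(Value, 4096);
        _ = nodes;

        // ... update `params` from the accumulated gradients ...

        // Free every per-minibatch allocation in one shot.
        _ = fwd_arena.reset(.retain_capacity);
    }
}
```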

A good place to start reading the code is the training loop: https://github.com/nurpax/zigrograd/blob/main/src/main.zig#L89

I hope you find it interesting.


It’s great to see more projects like this in Zig. I’ve just completed an autograd system myself that I’m thinking about sharing at some point in the future.

Machine learning is stuck in dependency hell and the build systems are a nightmare. Many of the large open-source frameworks regularly ship broken components that crash unpredictably at runtime. I honestly hope Zig makes some more moves in this direction.

Great stuff :)


BTW, I upgraded the scalar-valued “zigrograd” to vector-valued: GitHub - nurpax/zigrograd: Micrograd in Zig

It implements a “mini numpy” in src/ndarray.zig that’s used for all the math ops. The math lib mostly implements what I happened to need in zigrograd and may be buggy if you use it for something else, but it might be interesting to some. I’d also be curious to hear any suggestions for making it nicer to use.
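For anyone curious what a “mini numpy” can look like in Zig, here is a hypothetical sketch of a tiny row-major 2-D array type with a naive matmul. The real ndarray.zig interface is almost certainly different; treat this only as a flavor of the idea:

```zig
const std = @import("std");

// Hypothetical 2-D array type, not the actual ndarray.zig API.
const Mat = struct {
    rows: usize,
    cols: usize,
    data: []f32, // row-major storage

    fn init(alloc: std.mem.Allocator, rows: usize, cols: usize) !Mat {
        const data = try alloc.alloc(f32, rows * cols);
        @memset(data, 0);
        return .{ .rows = rows, .cols = cols, .data = data };
    }

    fn at(self: Mat, r: usize, c: usize) *f32 {
        return &self.data[r * self.cols + c];
    }

    // out = a * b (naive triple loop).
    fn matmul(out: Mat, a: Mat, b: Mat) void {
        std.debug.assert(a.cols == b.rows and out.rows == a.rows and out.cols == b.cols);
        for (0..a.rows) |i| {
            for (0..b.cols) |j| {
                var acc: f32 = 0;
                for (0..a.cols) |k| acc += a.at(i, k).* * b.at(k, j).*;
                out.at(i, j).* = acc;
            }
        }
    }
};

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const alloc = arena.allocator();

    const a = try Mat.init(alloc, 2, 3);
    const b = try Mat.init(alloc, 3, 2);
    const c = try Mat.init(alloc, 2, 2);
    a.at(0, 0).* = 1;
    b.at(0, 0).* = 2;
    Mat.matmul(c, a, b);
    std.debug.print("c[0,0] = {d}\n", .{c.at(0, 0).*});
}
```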

Backprop operates on tensors, so it’s a lot faster than the original scalar-based implementation.