# Neural Networks in Zig

Hello again, I hope everyone is doing well!

I’m aiming to do a series of posts where we build well-established machine learning models and utilities. These could include anything from semantic search to transformers (I’m particularly excited about building MAMBA). I’d like to kick it off today with a simple one: Neural Networks.

Instead of making this a detailed tutorial on how they work, I’d like to show you how easy one is to build using Metaphor. If you’d like to follow along, you can find the full example file in my library: Metaphor/src/examples/feedforward.zig at main · andrewCodeDev/Metaphor · GitHub

# Feed-Forward

A neural network is a series of matrix multiplications, each followed by a vector addition and then run through an activation function. Activation functions bring non-linearity into the calculation, which allows for modelling things that can’t be captured by straight lines alone. For instance, think of the parabola created by `f(x) = x^2`: the curve bends upwards at both ends, so we need that kind of behaviour if we want to capture that shape.

Basically, we need something that goes like `y = f(A.x + b)`, where `A` is a matrix (`W` in the code), `x` is the input vector, `b` is the bias vector and `f` is the activation function. We can repeat this operation as many times as we like, and each repetition is called a layer. Here’s how that looks:

``````
// `mp` is the Metaphor library import (see the linked example file for the setup).
pub fn FeedForward(comptime Tag: mp.scalar.Tag) type {
    return struct {
        const Self = @This();
        const T = mp.scalar.Tag.asType(Tag);

        W: mp.types.LeafTensor(T, .wgt), // weights, shape (m, n)
        b: mp.types.LeafTensor(T, .wgt), // bias, shape (m)
        y: mp.types.NodeTensor(T) = undefined, // set by forward()
        alpha: f16, // scale factor, 1/n

        pub fn init(G: *mp.Graph, m: usize, n: usize) Self {
            return .{
                .W = G.tensor(.wgt, Tag, mp.Rank(2){ m, n }),
                .b = G.tensor(.wgt, Tag, mp.Rank(1){ m }),
                .alpha = 1.0 / @as(f16, @floatFromInt(n)),
            };
        }

        pub fn forward(self: *Self, x: anytype) mp.types.NodeTensor(T) {
            self.y = mp.ops.selu(mp.ops.linearScaled(self.W, x, self.alpha, self.b, "ij,j->i"));

            // cut the graph here so this layer forms its own sub-graph
            self.y.detach();

            return self.y;
        }

        pub fn reverse(self: *Self, cleanup: bool) void {
            self.y.reverse(if (cleanup) .free else .keep);
        }
    };
}
``````

That’ll do it! Not too shabby.

You may notice that I’m setting alpha to `1/n`, where `n` is the length of the incoming vector. That’s a common trick to trim down the size of the resulting values: each output is a sum of `n` products, so the scaling stops them from skyrocketing. I’m using `selu` (Scaled Exponential Linear Unit) as the activation function, but that choice is optional. Selu has good reversal properties, only takes an exponential of negative inputs (so there are no worries about overflow), and is cheap to compute.
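To make the arithmetic concrete, here’s a minimal plain-Zig sketch of what one layer computes, with no Metaphor involved. The names `selu`, `layerForward` and the fixed-size array signature are made up for illustration, and I’m assuming the scale is applied to the matrix-vector product before the bias is added. The constants are the standard SELU ones.

```zig
const std = @import("std");

// Standard SELU constants.
const selu_lambda: f32 = 1.0507009873554805;
const selu_alpha: f32 = 1.6732632423543772;

/// SELU: lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise.
fn selu(x: f32) f32 {
    return if (x > 0.0) selu_lambda * x else selu_lambda * selu_alpha * (@exp(x) - 1.0);
}

/// One layer: y[i] = selu(scale * sum_j(W[i][j] * x[j]) + b[i]), with scale = 1/n.
/// Assumption: the scale multiplies the product only, not the bias.
fn layerForward(
    comptime m: usize,
    comptime n: usize,
    W: [m][n]f32,
    b: [m]f32,
    x: [n]f32,
) [m]f32 {
    const scale: f32 = 1.0 / @as(f32, @floatFromInt(n));
    var y: [m]f32 = undefined;
    for (W, b, 0..) |row, bias, i| {
        var acc: f32 = 0.0;
        for (row, x) |w, xj| acc += w * xj;
        y[i] = selu(scale * acc + bias);
    }
    return y;
}
```

With a `(2, 3)` weight matrix this maps a 3-vector to a 2-vector, which is the same `"ij,j->i"` contraction used in `forward` above.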

# Building the Network

To give our network more capacity, we can chain multiple feed-forwards one after another, passing the output of one as the input to the next. Sounds like an array (if you ask me, lol). Sure enough, an array of these will certainly do the job. I’ve also added two additional blocks called `head` and `tail` (the array is `body`). The `head` block first projects the incoming vector to a bigger one: if I multiply a matrix with dimensions (N, M) against a vector of size M, the resulting vector will have size N, which could be larger. The `tail` does the opposite and returns us to the vector size we originally started with. Here’s the code for that:

``````
pub fn NeuralNet(
    comptime Tag: mp.scalar.Tag,
    comptime layers: usize,
) type {
    if (comptime layers == 0) {
        @compileError("NeuralNet needs at least 1 layer.");
    }

    return struct {
        const Self = @This();
        const T = mp.scalar.Tag.asType(Tag);

        head: FeedForward(Tag),
        body: [layers]FeedForward(Tag),
        tail: FeedForward(Tag),
        cleanup: bool,

        pub fn init(G: *mp.Graph, m: usize, n: usize, cleanup: bool) Self {
            var body: [layers]FeedForward(Tag) = undefined;

            for (0..layers) |i| {
                body[i] = FeedForward(Tag).init(G, m, m);
            }
            return .{
                // head projects the n-sized input up to m
                .head = FeedForward(Tag).init(G, m, n),
                .body = body,
                // tail projects back down from m to n
                .tail = FeedForward(Tag).init(G, n, m),
                .cleanup = cleanup,
            };
        }

        pub fn forward(self: *Self, x: mp.types.LeafTensor(T, .inp)) mp.types.NodeTensor(T) {
            var h = self.head.forward(x);

            for (0..layers) |i| {
                h = self.body[i].forward(h);
            }
            return self.tail.forward(h);
        }

        pub fn reverse(self: *Self) void {
            // walk the blocks in the opposite order of forward()
            self.tail.reverse(self.cleanup);

            var i: usize = layers;
            while (i > 0) {
                i -= 1;
                self.body[i].reverse(self.cleanup);
            }

            self.head.reverse(self.cleanup);
        }
    };
}
``````

Again, nothing to it. Believe it or not, that’s all we need!
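If the `head`/`body`/`tail` sizes are hard to keep straight, here’s a small plain-Zig sketch that just tracks vector sizes through the stack. `Shape`, `outputSize` and `netShapes` are hypothetical helpers for illustration, not part of Metaphor. A layer with a `(rows, cols)` matrix takes a `cols`-sized vector and produces a `rows`-sized one.

```zig
const std = @import("std");

/// Shape of one layer's weight matrix: maps a `cols`-vector to a `rows`-vector.
const Shape = struct { rows: usize, cols: usize };

/// Walks a stack of layers and returns the final vector size,
/// or an error if a layer is fed a vector of the wrong size.
fn outputSize(input: usize, shapes: []const Shape) error{ShapeMismatch}!usize {
    var size = input;
    for (shapes) |s| {
        if (s.cols != size) return error.ShapeMismatch;
        size = s.rows;
    }
    return size;
}

/// head (m, n), then `layers` body blocks (m, m), then tail (n, m).
fn netShapes(comptime layers: usize, m: usize, n: usize) [layers + 2]Shape {
    var shapes: [layers + 2]Shape = undefined;
    shapes[0] = .{ .rows = m, .cols = n };
    for (shapes[1 .. layers + 1]) |*s| s.* = .{ .rows = m, .cols = m };
    shapes[layers + 1] = .{ .rows = n, .cols = m };
    return shapes;
}
```

For example, with `m = 16` and `n = 4`, a 4-sized input goes to 16 in the head, stays at 16 through the body, and comes back out of the tail at 4.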

Let me explain one thing you may have picked up on in `FeedForward.forward`: the call to `detach`… what’s up with that? Detach helps us form sub-graphs. Each sub-graph can be independently freed, allowing us to clean up as we go along. These freed values are caught by the caching allocator, meaning we can reuse them as we go backwards (we can offload values to the CPU and copy them back over going forward, giving us an additional massive reduction in memory usage). To see more about sub-graphs, check out: Metaphor/src/examples/subgraphs.zig at main · andrewCodeDev/Metaphor · GitHub

In the main file, I did some testing with toy data to see how it all turns out, and the results look good.
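For anyone who hasn’t seen a caching allocator before, here’s a rough sketch of the idea in plain Zig. This is not Metaphor’s allocator: `BufferCache` and its fixed slot count are made up for illustration, and a real one would work with device memory and smarter size matching. The point is only that a released buffer is parked and handed back to the next request of the same size instead of being freed.

```zig
const std = @import("std");

/// A tiny buffer cache: released buffers are parked and handed back to the
/// next request of the same length instead of going back to the allocator.
const BufferCache = struct {
    const max_slots = 8;

    backing: std.mem.Allocator,
    slots: [max_slots]?[]f32 = [_]?[]f32{null} ** max_slots,

    /// Returns a parked buffer of exactly `len` elements, or allocates a new one.
    fn acquire(self: *BufferCache, len: usize) ![]f32 {
        for (&self.slots) |*slot| {
            if (slot.*) |buf| {
                if (buf.len == len) {
                    slot.* = null;
                    return buf;
                }
            }
        }
        return self.backing.alloc(f32, len);
    }

    /// Parks `buf` for reuse; frees it for real if every slot is taken.
    fn release(self: *BufferCache, buf: []f32) void {
        for (&self.slots) |*slot| {
            if (slot.* == null) {
                slot.* = buf;
                return;
            }
        }
        self.backing.free(buf);
    }

    /// Hands every parked buffer back to the backing allocator.
    fn deinit(self: *BufferCache) void {
        for (&self.slots) |*slot| {
            if (slot.*) |buf| self.backing.free(buf);
            slot.* = null;
        }
    }
};
```

In a reverse pass, freeing a sub-graph’s intermediates would correspond to `release`, and the next sub-graph asking for a same-sized buffer would get that memory straight back from `acquire`.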

First couple runs…

``````
info: score - 3.8343
info: score - 3.7344
info: score - 3.6351
info: score - 3.5362
info: score - 3.4379
``````

Towards the end…

``````
info: score - 0.1938
info: score - 0.1906
info: score - 0.1875
info: score - 0.1845
info: score - 0.1815
``````

Anyhow, if this kind of thing interests people here, I’ll make a point of picking out the best networks (or even take suggestions) and making publicly available versions of them with Metaphor that you can use in the comfort of your own Zig projects.

Thanks! Let me know if you want me to build any models in particular!

Awesome progress! Don’t know how you got so deep in the weeds of deep learning to be able to kick off your own library from scratch, but I’m really glad to see someone making such devoted efforts to expand the Zig ecosystem in this direction. Seriously impressive!

Thanks @tensorush, honestly means a lot - especially coming from a developer that I seriously respect as much as yourself.

If you ever get the motivation, I’d love to have some extra sets of eyes on the code and I’d appreciate your review. Thanks again - really appreciate your support.

Hey @jaime, good to hear from ya

You are correct. You can see the definition of it here (line 85): Metaphor/src/graph.zig at main · andrewCodeDev/Metaphor · GitHub

They are what’s often referred to as a “handle” type. The graph contains all the information, but the handle knows where its information is in the graph (it’s just an index into contiguous arrays). It also knows what type of tensor was created, so it can directly pull out information from unions (it does this via type-deduction, so there’s no switch involved). They are very lightweight (same size as a slice).
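If the handle idea is new, here’s a stripped-down sketch of the pattern in plain Zig. The `Graph` and `Handle` types below are hypothetical stand-ins, not the definitions from `graph.zig`, and I’m assuming a handle is a graph pointer plus an index. The union/type-deduction part is left out.

```zig
const std = @import("std");

/// Hypothetical stand-in for a graph that owns all tensor data in flat arrays.
const Graph = struct {
    values: [4][]const f32,
    names: [4][]const u8,
};

/// A handle: a pointer to the graph plus an index into its arrays.
const Handle = struct {
    graph: *const Graph,
    idx: usize,

    /// Looks the tensor's values up in the graph.
    fn data(self: Handle) []const f32 {
        return self.graph.values[self.idx];
    }

    /// Looks the tensor's name up in the graph.
    fn name(self: Handle) []const u8 {
        return self.graph.names[self.idx];
    }
};

comptime {
    // Pointer + index: the same two machine words as a slice (pointer + length).
    std.debug.assert(@sizeOf(Handle) == @sizeOf([]const f32));
}
```

Copying a `Handle` around copies two words, while everything it refers to stays put inside the graph.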

Omg that is a lot of work over there! I missed the latest commits to your Metaphor library. What a great job you are doing… I tried to catch up with the CUDA part, but I didn’t have enough time due to another work project atm.
Keep up the good work. Going through your code is part of my Zig learning.
Thanks for your quick clarification @AndrewCodeDev!
