Writing a Brainf**k compiler in Zig (solved)

sbrow · April 6, 2025, 1:21pm

Brief aside: I apologize if f-bombs go against Ziggit’s TOS, but given that is the name of the language I am targeting, I am not sure what else I can do except for refer to it as BF for the remainder of the post.

I’ve decided that for my next exercise I’d like to try to write a compiler for the BF programming language in Zig.

Ideally I’d like it to be a freestanding binary with no dependencies, but given I’ve never written something like this before, that may require more effort than I’m willing to invest.

I’m not interested in cross-platform support, just targeting either x86_64 linux, or potentially arm linux if the assembly is eaiser.

With that in mind, here are the options I’ve come up with, in descending order of preference:

Hand-roll an ELF at the machinecode level.
Translate the BF code into Zig, and then build the Zig file.
Translate the BF code into assembly, and then build that (with something).
Translate the BF code into LLVM IR.

#2 seems like a natural starting point, but given that I’d like to avoid runtime dependencies, if I can transition into #1 I think that would be ideal. But again, having never done this, it could be that it is way more complicated than I’m giving it credit for.

Any advice would be greatly appreciated! So far I’ve written a simple (naive) tokenizer:

pub fn char_to_token(src: u8) ?Token {
    return switch (src) {
        '<' => .left,
        '>' => .right,
        '+' => .inc,
        '-' => .sub,
        ',' => .input,
        '.' => .output,
        '[' => .loop,
        ']' => .loopback,
        else => null,
    };
}

const Token = enum {
    //
    left,
    right,
    inc,
    dec,
    input,
    output,
    loop,
    loopback,
};

const std = @import("std");

pub fn tokenize(src: []u8) std.ArrayList(Token) {
    var tokens: std.ArrayList(Token) = undefined;
    tokens.initCapacity(src.len / 2);

    for (src) |char| {
        if (char_to_token(char)) |token| {
            tokens.addOne(token);
        }
    }

    return tokens;
}

cryptocode · April 6, 2025, 3:35pm

Since it’s such a simple language, another cool approach is to JIT it straight to machine code, as Luuk did here:

github.com/Luukdegram/bfj

src/main.zig

master

const std = @import("std");
const os = std.os;
const mem = std.mem;
const Allocator = mem.Allocator;

const page_size = std.mem.page_size;
const Memory = []align(page_size) u8;

pub const Program = []const u8;

const program_memory: [4096]u8 = undefined;
const bracket_stack: std.ArrayList(u32);

/// Parses the given source code into 
pub fn parse(gpa: *Allocator, source: []const u8) error{OutOfMemory}!Program {
    var i: usize = 0;
    var list = std.ArrayList(u8).init(gpa);
    defer list.deinit();
    return while (true) : (i += 1) {
        if (i == source.len) return list.toOwnedSlice();

This file has been truncated. show original

Do NOT run this on macOS without adapting the machine code, because of the way syscalls numbers are mapped, this will forkbomb on macOS

sbrow · April 6, 2025, 4:04pm

Thanks for the suggestion, this is a great starting point! As this is purely an educational exercise for me, I’d like to take it a step further, but this should get me on the right track.

sbrow · April 6, 2025, 4:17pm

Alright, 30 seconds of googling found me a detailed article about this exact topic.

I need to stop posting problems to Ziggit until I’ve done my due diligence