Brief aside: I apologize if f-bombs go against Ziggit’s TOS, but given that it is the name of the language I am targeting, I am not sure what else I can do except refer to it as BF for the remainder of the post.
I’ve decided that for my next exercise I’d like to try to write a compiler for the BF programming language in Zig.
Ideally I’d like it to be a freestanding binary with no dependencies, but given I’ve never written something like this before, that may require more effort than I’m willing to invest.
I’m not interested in cross-platform support, just targeting either x86_64 Linux, or potentially ARM Linux if the assembly is easier.
With that in mind, here are the options I’ve come up with, in descending order of preference:
1. Hand-roll an ELF at the machine-code level.
2. Translate the BF code into Zig, and then build the Zig file.
3. Translate the BF code into assembly, and then build that (with something).
4. Translate the BF code into LLVM IR.
#2 seems like a natural starting point, but given that I’d like to avoid runtime dependencies, if I can transition into #1 I think that would be ideal. But again, having never done this, it could be that it is way more complicated than I’m giving it credit for.
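To make #2 a bit more concrete, here is a rough sketch of how I imagine each token turning into a line of Zig. Everything here is an assumption on my part rather than a settled design: `zigFor` is a made-up helper name, and the generated program is assumed to declare a `tape` array of `u8`, a `ptr: usize` index, and hypothetical `readByte()` / `writeByte()` helpers. The `Token` enum is repeated only so the snippet compiles on its own.

```zig
const std = @import("std");

// Same token set as the tokenizer in this post, repeated so the snippet stands alone.
const Token = enum { left, right, inc, dec, input, output, loop, loopback };

/// Hypothetical helper: the Zig statement a single BF token could be translated to.
/// Assumes the emitted file already declares `tape`, `ptr`, `readByte` and `writeByte`.
fn zigFor(token: Token) []const u8 {
    return switch (token) {
        .left => "ptr -= 1;",
        .right => "ptr += 1;",
        // `+%=` and `-%=` are Zig's wrapping operators, so cells wrap around at 0 and 255.
        .inc => "tape[ptr] +%= 1;",
        .dec => "tape[ptr] -%= 1;",
        .input => "tape[ptr] = readByte();",
        .output => "writeByte(tape[ptr]);",
        // A BF loop maps directly onto a while loop; the closing bracket just closes the block.
        .loop => "while (tape[ptr] != 0) {",
        .loopback => "}",
    };
}

test "brackets become a while loop" {
    try std.testing.expectEqualStrings("while (tape[ptr] != 0) {", zigFor(.loop));
    try std.testing.expectEqualStrings("}", zigFor(.loopback));
}
```

With this mapping, the BF program `+[-]` would come out as `tape[ptr] +%= 1;`, `while (tape[ptr] != 0) {`, `tape[ptr] -%= 1;`, `}`, which would still need to be wrapped in a `main` and handed to `zig build-exe`. Unbalanced brackets would simply produce Zig that fails to compile, so a real version would probably want to check for those first.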
Any advice would be greatly appreciated! So far I’ve written a simple (naive) tokenizer:
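/// Maps one source byte to its BF command, or null for any other byte
/// (BF treats everything that is not a command as a comment).
pub fn char_to_token(src: u8) ?Token {
    return switch (src) {
        '<' => .left,
        '>' => .right,
        '+' => .inc,
        '-' => .dec,
        ',' => .input,
        '.' => .output,
        '[' => .loop,
        ']' => .loopback,
        else => null,
    };
}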
const Token = enum {
    //
    left,
    right,
    inc,
    dec,
    input,
    output,
    loop,
    loopback,
};
const std = @import("std");
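/// Collects every BF command in `src`, skipping all other bytes.
/// The caller owns the returned list.
pub fn tokenize(allocator: std.mem.Allocator, src: []const u8) !std.ArrayList(Token) {
    // A source of n bytes holds at most n tokens, so reserving src.len up front
    // means the appends below never need to grow the list.
    var tokens = try std.ArrayList(Token).initCapacity(allocator, src.len);
    for (src) |char| {
        if (char_to_token(char)) |token| {
            tokens.appendAssumeCapacity(token);
        }
    }
    return tokens;
}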