Allocating AST nodes

joelreymont · January 12, 2025, 5:13am

I wrote a source-to-source compiler in Zig and may add a VM / JIT.

AST trees are self-referential and nodes often need allocation (boxing).

Consider the following snippet…

pub const Expr = union(enum) {
    unary: Unary,
    id: []const u8,

    pub const Unary = struct {
        op: UnaryOp,
        rhs: *Expr,
    };

    pub const UnaryOp = enum {
        @"!",
        @"~",
    };
};

fn parsePrefixExpr(self: *Self) Error!AST.Expr {
    var p = self.parser;
    const token = p.peekToken();
    const tag: AST.Expr.UnaryOp = switch (token.tag) {
        .bang => .@"!",
        .eof => try p.fail(AST.Expr.UnaryOp, .unexpected),
        else => return self.parsePrimaryExpr(),
    };
    _ = p.popToken();
    const expr = try self.parsePrefixExpr();
    const rhs = try p.alloc.create(AST.Expr);
    rhs.* = expr;
    return .{ .unary = .{ .op = tag, .rhs = rhs } };
}

parsePrefixExpr returns Expr but Expr.Unary.rhs is a pointer. Is this the most efficient way to create the Expr.Unary node?

    const expr = try self.parsePrefixExpr();
    const rhs = try p.alloc.create(AST.Expr);
    rhs.* = expr;
``

P.S. Yes, I know that the Zig compiler does it very differently but I'm not ready to go down that path just yet.

squeek502 · January 12, 2025, 10:24am

Having a mix of stack and heap allocated AST nodes sounds like it’d be complicated. When I wrote a parser, I just went with heap allocating everything. Also worth noting that an AST is a good candidate for an arena allocator, since all the nodes will have the same lifetime and using an arena will make all that heap allocation much faster.

I, too, didn’t do any data-oriented design for my AST, so it might be worth looking at as a reference if you’re interested:

joelreymont · January 12, 2025, 10:43am

Allocating AST nodes has been working fine for me but…

I need to generate C++ in my transpiler. I could put together the output syntax tree by hand but this gets old quickly.

Yes, I can just format C++ using std.fmt but the output code won’t look nice and I would have to run clang-format or similar to get it into shape.

What I’d really like to do is comptime templating where I parse the C++ code at compile time, convert that into AST and plug the missing bits into the AST at runtime. Then I can pretty-print the AST and be happy!

My concern is that the parser needs to run without allocation since allocating at comptime seems to be disallowed at the moment. This is why I was looking at flattening the AST into array the way the Zig parser does it.

squeek502 · January 12, 2025, 11:04am

I’m not totally sure I understand what you’re trying to do, so I’ll give a general answer (I have a feeling there might be some sort of misunderstanding of comptime involved here, but I can’t tell).

You can get around the lack of heap allocation at comptime by either precomputing the size of the array you’ll need, using an upper bound of the array length you need, or concatenating arrays together.

However, doing anything as intensive as you’re talking about will hit comptime currently being absurdly slow very quickly.

If you really want to do something intensive at compile-time, you might end up needing to write something that generates Zig code, either via a custom build step or just a program that outputs a .zig file that gets built and run during the build process. Here’s an example of a custom build step: