Pretty printing the Zig AST to JSON?

I wonder if anyone’s written code to pretty print the Zig AST (as parsed by e.g. std.zig.Ast.parse()) into JSON? Or actually, pretty printing to any human-readable format would be welcome. :slight_smile:

I’d like to walk the AST in Python to play with some binding generation and using JSON as an intermediate format seems like a good way to get started.
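A minimal sketch of the Python side, assuming the Zig program’s JSON output has been captured (the `SAMPLE` string below stands in for it) and that every AST node is a JSON object with a `tag` key. The names `iter_nodes` and `tag_histogram` are hypothetical. A schema-agnostic depth-first walk like this keeps working as more node types get handled on the Zig side.

```python
import json
from collections import Counter
from typing import Any, Iterator


def iter_nodes(value: Any) -> Iterator[dict]:
    """Yield every AST node (a dict with a "tag" key), depth-first, pre-order."""
    if isinstance(value, dict):
        if "tag" in value:
            yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_nodes(item)


def tag_histogram(tree: Any) -> Counter:
    """Count how often each node tag occurs anywhere in the tree."""
    return Counter(node["tag"] for node in iter_nodes(tree))


# Hypothetical input: in practice this would be the Zig program's stdout,
# e.g. loaded with json.load() from a file it was redirected to.
SAMPLE = """[{"tag": "simple_var_decl", "type_node": null,
             "init_node": {"tag": "number_literal", "main_token": {"value": "0"}}}]"""

tree = json.loads(SAMPLE)
print([node["tag"] for node in iter_nodes(tree)])
print(tag_histogram(tree).most_common())
```

Token objects such as `{"value": "0"}` have no `tag`, so they are walked into but never yielded as nodes.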

If not, I guess I’ll write my own, just feels like something someone might’ve already done and shared.


I went ahead and wrote a quick prototype. I’m copy-pasting the source and its output below in case it’s helpful to someone.

I should mention that including the Zig AST (abstract syntax tree) and a Zig syntax parser in the Zig standard library is a seriously smart thing to do. The Zig developers must be geniuses. :slight_smile:

The Stream comptime parameter of AstEmitter is a little ugly, but I couldn’t figure out a more straightforward way to get the WriteStream’s Error type. Zig can’t infer error sets for recursive functions, so the emit functions need an explicit error set. Otherwise, WriteStream is pretty handy for this type of thing!

```zig
// See https://mitchellh.com/zig/parser for information on AST

const std = @import("std");

fn AstEmitter(comptime Stream: type) type {
    return struct {
        ast: *std.zig.Ast,
        ws: Stream,

        const EmitError = Stream.Error;
        const Self = @This();

        pub fn init(ws: Stream, ast: *std.zig.Ast) @This() {
            return @This(){
                .ws = ws,
                .ast = ast,
            };
        }

        fn write(self: *Self, v: anytype) EmitError!void {
            try self.ws.write(v);
        }

        fn objectField(self: *Self, f: []const u8) EmitError!void {
            try self.ws.objectField(f);
        }

        fn beginObject(self: *Self) EmitError!void {
            try self.ws.beginObject();
        }

        fn endObject(self: *Self) EmitError!void {
            try self.ws.endObject();
        }

        fn beginArray(self: *Self) EmitError!void {
            try self.ws.beginArray();
        }

        fn endArray(self: *Self) EmitError!void {
            try self.ws.endArray();
        }

        pub fn emitRoot(self: *Self) EmitError!void {
            try self.beginArray();
            for (self.ast.rootDecls()) |idx| {
                try self.emitNode(idx);
            }
            try self.endArray();
        }

        fn visibToken(self: *Self, tok_idx: ?std.zig.Ast.TokenIndex) EmitError!void {
            try self.objectField("visib_token");
            if (tok_idx) |idx| {
                try self.emitToken(null, idx);
            } else {
                try self.write(null);
            }
        }

        fn fnProto(self: *Self, proto: std.zig.Ast.full.FnProto) EmitError!void {
            try self.emitToken("name_token", proto.ast.fn_token + 1);

            try self.objectField("params");
            try self.beginArray();
            var param_it = proto.iterate(self.ast);
            while (param_it.next()) |param| {
                try self.beginObject();
                try self.objectField("name_token");
                if (param.name_token) |nt| {
                    try self.emitToken(null, nt);
                } else {
                    try self.write(null);
                }

                try self.objectField("type_expr");
                try self.emitNode(param.type_expr);

                try self.endObject();
            }
            try self.endArray();

            try self.visibToken(proto.visib_token);
        }

        fn fnDecl(self: *Self, node_idx: std.zig.Ast.Node.Index) EmitError!void {
            var buffer: [1]std.zig.Ast.Node.Index = undefined;
            if (self.ast.fullFnProto(&buffer, node_idx)) |p| {
                try self.fnProto(p);
            }
        }

        fn typeNode(self: *Self, node_idx: std.zig.Ast.Node.Index) EmitError!void {
            try self.objectField("type_node");
            try self.write(@tagName(self.ast.nodes.items(.tag)[node_idx]));
        }

        fn simpleVarDecl(self: *Self, node_idx: std.zig.Ast.Node.Index) EmitError!void {
            const vd = self.ast.simpleVarDecl(node_idx);

            try self.objectField("type_node");
            if (vd.ast.type_node != 0) {
                try self.emitNode(vd.ast.type_node);
            } else {
                try self.write(null);
            }

            try self.visibToken(vd.visib_token);

            try self.objectField("init_node");
            try self.emitNode(vd.ast.init_node);
        }

        fn containerFieldInit(self: *Self, node_idx: std.zig.Ast.Node.Index) EmitError!void {
            const cfi = self.ast.containerFieldInit(node_idx);

            try self.emitToken("main_token", cfi.ast.main_token);

            try self.objectField("type_expr");
            try self.emitNode(cfi.ast.type_expr);

            try self.objectField("value_expr");
            if (cfi.ast.value_expr != 0) {
                try self.emitNode(cfi.ast.value_expr);
            } else {
                try self.write(null);
            }
        }

        fn containerDecl(self: *Self, node_idx: std.zig.Ast.Node.Index) EmitError!void {
            const cd = self.ast.containerDecl(node_idx);

            try self.emitToken("main_token", cd.ast.main_token);

            try self.objectField("members");
            try self.beginArray();
            for (cd.ast.members) |m| {
                try self.emitNode(m);
            }
            try self.endArray();
        }

        fn emitToken(self: *Self, field: ?[]const u8, token_idx: std.zig.Ast.TokenIndex) EmitError!void {
            if (field) |f| {
                try self.objectField(f);
            }
            try self.beginObject();
            try self.objectField("value");
            try self.write(self.ast.tokenSlice(token_idx));
            try self.endObject();
        }

        fn emitNode(self: *Self, node_idx: std.zig.Ast.Node.Index) EmitError!void {
            const n = self.ast.nodes.get(node_idx);
            //std.debug.print("AST node {d}: {any}\n", .{ node_idx, n });
            try self.beginObject();
            try self.objectField("tag");
            try self.write(@tagName(n.tag));
            switch (n.tag) {
                .fn_decl => {
                    try self.fnDecl(node_idx);
                },
                .simple_var_decl => {
                    try self.simpleVarDecl(node_idx);
                },
                .container_decl => {
                    try self.containerDecl(node_idx);
                },
                .container_field_init => {
                    try self.containerFieldInit(node_idx);
                },
                .identifier => {
                    try self.emitToken("main_token", n.main_token);
                },
                .number_literal => {
                    try self.emitToken("main_token", n.main_token);
                },
                else => {
                    // do nothing for unhandled ast nodes
                },
            }
            try self.endObject();
        }
    };
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    const src =
        \\const std = @import("std");
        \\
        \\pub const Player = struct {
        \\    x: i32,
        \\    score: i32 = 0,
        \\
        \\    pub fn init(self: *@This()) void {
        \\        self.score = 0;
        \\    }
        \\    pub fn print(self: @This()) void {
        \\        std.debug.print("player: {}\n", .{self.score});
        \\    }
        \\};
    ;

    // Ast.parse only returns an error on allocation failure; syntax errors
    // are collected in ast.errors and have to be checked separately.
    var ast = try std.zig.Ast.parse(gpa.allocator(), src, .zig);
    defer ast.deinit(gpa.allocator());
    if (ast.errors.len != 0) {
        std.debug.print("failed to parse source file ({d} errors)\n", .{ast.errors.len});
        return;
    }

    const stdout = std.io.getStdOut().writer();
    var ws = std.json.writeStream(stdout, .{ .whitespace = .indent_2 });
    defer ws.deinit();

    var emit = AstEmitter(@TypeOf(ws)).init(ws, &ast);
    try emit.emitRoot();
}
```

If you run it, it outputs something like this:

```json
[
  {
    "tag": "simple_var_decl",
    "type_node": null,
    "visib_token": null,
    "init_node": {
      "tag": "builtin_call_two"
    }
  },
  {
    "tag": "simple_var_decl",
    "type_node": null,
    "visib_token": {
      "value": "pub"
    },
    "init_node": {
      "tag": "container_decl",
      "main_token": {
        "value": "struct"
      },
      "members": [
        {
          "tag": "container_field_init",
          "main_token": {
            "value": "x"
          },
          "type_expr": {
            "tag": "identifier",
            "main_token": {
              "value": "i32"
            }
          },
          "value_expr": null
        },
        {
          "tag": "container_field_init",
          "main_token": {
            "value": "score"
          },
          "type_expr": {
            "tag": "identifier",
            "main_token": {
              "value": "i32"
            }
          },
          "value_expr": {
            "tag": "number_literal",
            "main_token": {
              "value": "0"
            }
          }
        },
        {
          "tag": "fn_decl",
          "name_token": {
            "value": "init"
          },
          "params": [
            {
              "name_token": {
                "value": "self"
              },
              "type_expr": {
                "tag": "ptr_type_aligned"
              }
            }
          ],
          "visib_token": {
            "value": "pub"
          }
        },
        {
          "tag": "fn_decl",
          "name_token": {
            "value": "print"
          },
          "params": [
            {
              "name_token": {
                "value": "self"
              },
              "type_expr": {
                "tag": "builtin_call_two"
              }
            }
          ],
          "visib_token": {
            "value": "pub"
          }
        }
      ]
    }
  }
]
```
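Walking this exact output from Python is mostly dictionary lookups. The sketch below assumes the key names shown in the sample (`init_node`, `members`, `main_token`, `name_token`, `value`); `ContainerInfo` and `collect_containers` are made-up names. The prototype doesn’t emit the declared name of a `simple_var_decl` (the output above has no `"Player"`), so containers are identified by their position among the root declarations. Only node tags the emitter expands are handled; anything else shows up as a bare `tag`.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContainerInfo:
    root_index: int  # position among root decls (decl names aren't emitted)
    kind: Optional[str]  # container keyword from main_token, e.g. "struct"
    fields: list = field(default_factory=list)  # (name, type text)
    methods: list = field(default_factory=list)  # (name, [param names], is_pub)


def token(tok: Optional[dict]) -> Optional[str]:
    """Source text of an emitted token object ({"value": ...}), or None."""
    return tok["value"] if tok else None


def type_text(type_expr: dict) -> str:
    # Only identifiers carry a token in the prototype; other type
    # expressions (e.g. ptr_type_aligned) are reported by their tag.
    if type_expr["tag"] == "identifier":
        return type_expr["main_token"]["value"]
    return f"<{type_expr['tag']}>"


def collect_containers(root: list) -> list:
    out = []
    for i, decl in enumerate(root):
        if decl["tag"] != "simple_var_decl":
            continue
        init = decl["init_node"]
        if init["tag"] != "container_decl":
            continue  # e.g. builtin_call_two for @import
        info = ContainerInfo(root_index=i, kind=token(init["main_token"]))
        for m in init["members"]:
            if m["tag"] == "container_field_init":
                info.fields.append((token(m["main_token"]), type_text(m["type_expr"])))
            elif m["tag"] == "fn_decl":
                params = [token(p["name_token"]) for p in m["params"]]
                is_pub = m["visib_token"] is not None
                info.methods.append((token(m["name_token"]), params, is_pub))
        out.append(info)
    return out


# Trimmed copy of the output above.
SAMPLE = """[
  {"tag": "simple_var_decl", "type_node": null, "visib_token": null,
   "init_node": {"tag": "builtin_call_two"}},
  {"tag": "simple_var_decl", "type_node": null, "visib_token": {"value": "pub"},
   "init_node": {"tag": "container_decl", "main_token": {"value": "struct"},
     "members": [
       {"tag": "container_field_init", "main_token": {"value": "x"},
        "type_expr": {"tag": "identifier", "main_token": {"value": "i32"}},
        "value_expr": null},
       {"tag": "fn_decl", "name_token": {"value": "init"},
        "params": [{"name_token": {"value": "self"},
                    "type_expr": {"tag": "ptr_type_aligned"}}],
        "visib_token": {"value": "pub"}}]}}
]"""

for c in collect_containers(json.loads(SAMPLE)):
    print(c)
```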

I wonder, would it make sense to include something like this in std.zig.Ast? For new contributors, understanding what the AST actually looks like is tremendously important. I’m frankly surprised there isn’t such a function already; I wouldn’t be able to work on the compiler without a way to visualise the AST :rofl:
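In the meantime, the JSON output from the prototype already gives a quick way to eyeball a tree. Here is a small Python sketch (the function name `render_tree` is hypothetical) that turns it into an indented outline with one line per node. Each line shows the node’s tag, and token objects show their source text. The field names are assumed from the sample output in this thread. Untagged objects, such as function parameters, print as `{}`, and `null` fields are skipped.

```python
import json
from typing import Any


def render_tree(value: Any, label: str = "", depth: int = 0) -> list:
    """Render emitted AST JSON as indented outline lines."""
    pad = "  " * depth
    name = f"{label}: " if label else ""
    if value is None:
        return []  # absent optional field, e.g. "value_expr": null
    if isinstance(value, list):
        return [line for item in value for line in render_tree(item, label, depth)]
    if isinstance(value, dict) and set(value) == {"value"}:
        return [f"{pad}{name}{value['value']!r}"]  # token object
    if isinstance(value, dict):
        head = value.get("tag", "{}")  # fn params have no tag
        lines = [f"{pad}{name}{head}"]
        for key, child in value.items():
            if key != "tag":
                lines += render_tree(child, key, depth + 1)
        return lines
    return [f"{pad}{name}{value!r}"]


SAMPLE = """{"tag": "container_field_init", "main_token": {"value": "score"},
  "type_expr": {"tag": "identifier", "main_token": {"value": "i32"}},
  "value_expr": {"tag": "number_literal", "main_token": {"value": "0"}}}"""

print("\n".join(render_tree(json.loads(SAMPLE))))
```

This prints:

```
container_field_init
  main_token: 'score'
  type_expr: identifier
    main_token: 'i32'
  value_expr: number_literal
    main_token: '0'
```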


I’m new to Zig and want to use it with Arduino. Pretty printing the Zig AST is exactly what I was looking for, since one of the things I’d like to see is the LLVM dependency removed in this area. This post is very useful to me, @nurpax! Thank you!


I don’t know for a fact that there isn’t an AST pretty printer, maybe I just haven’t found it. :slight_smile:

FWIW, I need the AST to generate Lua binding glue code from Zig APIs, for use with Ziglua. I generate JSON and then walk the tree in Python to produce Zig code and potentially type declarations for Luau. I don’t intend to release a general tool that claims to support a wide range of use cases, but I’m going to open-source this code anyway, and the JSON part will be included.
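The Python half of such a pipeline could look roughly like this sketch. It collects the `pub fn` members of an emitted `container_decl` and formats Zig source from them. The generated declaration is a deliberately generic placeholder (a list of method names to bind), not Ziglua’s API. The function names are hypothetical. The struct name is passed in by hand because the prototype emitter doesn’t output declaration names.

```python
import json


def pub_methods(container: dict) -> list:
    """Names of `pub fn` members of an emitted container_decl node."""
    return [
        m["name_token"]["value"]
        for m in container.get("members", [])
        if m["tag"] == "fn_decl" and m.get("visib_token") is not None
    ]


def gen_method_table(struct_name: str, container: dict) -> str:
    """Emit a hypothetical Zig declaration listing the methods to bind.

    Real binding glue would be produced the same way: walk the JSON,
    then format Zig source text from what was found.
    """
    names = ", ".join(f'"{n}"' for n in pub_methods(container))
    # "Player" -> "player_methods"; multi-word names aren't snake_cased.
    return f"pub const {struct_name.lower()}_methods = [_][]const u8{{ {names} }};"


CONTAINER = """{"tag": "container_decl", "main_token": {"value": "struct"}, "members": [
  {"tag": "fn_decl", "name_token": {"value": "init"}, "params": [], "visib_token": {"value": "pub"}},
  {"tag": "fn_decl", "name_token": {"value": "helper"}, "params": [], "visib_token": null},
  {"tag": "fn_decl", "name_token": {"value": "print"}, "params": [], "visib_token": {"value": "pub"}}]}"""

print(gen_method_table("Player", json.loads(CONTAINER)))
```

The non-`pub` `helper` function is filtered out, so the generated line is `pub const player_methods = [_][]const u8{ "init", "print" };`.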

I found this blog post very useful for understanding std.zig.Ast. In practice it’s a little easier than the blog post suggests, because you don’t need to “decompress” the node structure yourself: almost every tricky node type has a helper function (e.g. fullFnProto, simpleVarDecl, containerDecl) that extracts all its fields into an easy-to-use structure.


+1 to this being a highly desired feature. I’m just starting to dip my toes in, and I’m debugging an issue where something undesirable happens somewhere along the translate-c chain:
translate C code into a Clang AST -> translate the Clang AST into a Zig AST -> render the Zig AST as Zig code

It would be super nice to examine a JSON dump of the Zig AST!

Is there a reverse function? How would I get the JSON back into Zig?
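A full reverse mapping isn’t possible from this particular JSON, because the prototype is lossy. Function bodies aren’t emitted, unhandled nodes (such as `builtin_call_two`) are emitted as a bare `tag`, and declaration names are dropped. The subset that is fully captured can still be turned back into Zig source. That subset covers struct fields whose types are identifiers and whose defaults are number literals. The Python sketch below works under those assumptions. It also assumes the declaration used `const`, since the JSON doesn’t record `const` vs `var`, and it takes the name as a parameter. The function names are hypothetical.

```python
import json


def render_expr(node: dict) -> str:
    """Render the only expression kinds the prototype emits with their text."""
    if node["tag"] in ("identifier", "number_literal"):
        return node["main_token"]["value"]
    raise ValueError(f"{node['tag']!r} is emitted without enough detail to render")


def render_container_decl(name: str, decl: dict) -> str:
    """Turn a simple_var_decl whose init is a container_decl back into Zig.

    Assumes `const` (the JSON doesn't record const/var) and drops methods,
    since their bodies aren't in the JSON.
    """
    init = decl["init_node"]
    if init["tag"] != "container_decl":
        raise ValueError("init_node is not a container_decl")
    pub = "pub " if decl.get("visib_token") else ""
    lines = [f"{pub}const {name} = {init['main_token']['value']} {{"]
    for m in init["members"]:
        if m["tag"] != "container_field_init":
            continue
        line = f"    {m['main_token']['value']}: {render_expr(m['type_expr'])}"
        if m.get("value_expr") is not None:
            line += f" = {render_expr(m['value_expr'])}"
        lines.append(line + ",")
    lines.append("};")
    return "\n".join(lines)


DECL = """{"tag": "simple_var_decl", "type_node": null, "visib_token": {"value": "pub"},
  "init_node": {"tag": "container_decl", "main_token": {"value": "struct"}, "members": [
    {"tag": "container_field_init", "main_token": {"value": "x"},
     "type_expr": {"tag": "identifier", "main_token": {"value": "i32"}}, "value_expr": null},
    {"tag": "container_field_init", "main_token": {"value": "score"},
     "type_expr": {"tag": "identifier", "main_token": {"value": "i32"}},
     "value_expr": {"tag": "number_literal", "main_token": {"value": "0"}}}]}}"""

print(render_container_decl("Player", json.loads(DECL)))
```

For the `Player` fields from the example, this prints `pub const Player = struct {`, then `x: i32,` and `score: i32 = 0,` on their own indented lines, then `};`. To round-trip more of the language, the Zig emitter would have to record more tokens per node.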