I need to traverse the various versions of the syntax tree in my compiler and I’d like to do this in a generic way.
I have implemented this previously but I’m unsatisfied with my solution.
Is there a better way?
I need to traverse the various versions of the syntax tree in my compiler and I’d like to do this in a generic way.
I have implemented this previously but I’m unsatisfied with my solution.
Is there a better way?
In rust-analyzer, I’ve found that just a pull-based iterator over syntax nodes work best. Spitballing Zig version of the API:
var traversal = tree.descendants(root);
while (traversal.next()) |event| switch (event) {
.enter => |node| {
if (condition) traversal.skip_subtree();
}
.exit => |node| {
}
}
Can you elaborate a bit, please?
What does tree.descendants return for a given tree?
Would it return separate events for the lhs and rhs of a binary node? Where would op go then?
Would the root of an expression tree return the binary tree itself so that I would call descendants for lhs and rhs where does recursion happen then?
I guess I don’t understand how descendants flattens a tree into events.
What does
tree.descendantsreturn for a given tree?
An iterator of all the nodes that are descendants of the given root node. If the tree has parent pointers, such an iterator can be implemented in O(1) space:
If the tree lacks parent pointers, then the iterator itself would have to maintain a stack of the parents.
Though I imagine if the tree is laid out in memory as a flat array, the iterator could use the underlying slice? You could also do the pointer-reversal trick (Patterns for failure-free, bounded-space, and bounded-time programming ⁑ Dercuano)
It doesn’t really matter how you implement the iterator, the idea here is that Visitor=internal iterator + double dispatch, which is equivalent to external iterator + pattern matching, and the latter version doesn’t require call-site to use closures. As sketch (didn’t try to compile, probably buggy):
const std = @import("std");
const assert = std.debug.assert;
pub const Node = enum(u32) {
_,
pub const Data = union(enum) {
Int: u32,
Binary: struct {
op: enum { @"+", @"-", @"*", @"/" },
operands: [2]Node,
},
};
};
pub const Tree = struct {
root: Node,
nodes: []Node.Data,
depth: u32,
pub fn get(tree: *const Tree, node: Node) *const Node.Data {
return tree.nodes[@intFromEnum(node)];
}
pub fn descendats(tree: *const Tree, gpa: std.mem.Allocator, node: Node) !Iterator {
const result: Iterator = .{
.tree = tree,
.gpa = gpa,
.stack = .{},
.start = node,
.event = .{ .enter = node },
.done = false,
};
errdefer result.deinit(gpa);
try result.stack.ensureTotalCapacity(tree.depth);
return result;
}
const Iterator = struct {
tree: *const Tree,
stack: std.ArrayListUnmanaged(Node),
start: Node,
event: Event,
done: bool = false,
const Event = union(enum) {
enter: Node,
exit: Node,
};
pub fn next(it: *Iterator) ?Event {
if (it.done) return null;
const result = it.event;
it.event = switch (it.state) {
.enter => |node| switch (it.tree.get(node)) {
.Int => .{ .exit = node },
.Binary => |binary| blk: {
// NB: all allocation in init, iteration is infallible;
it.stack.appendAssumeCapacity(it.gpa, node);
break :blk .{ .enter = binary.operands[0] };
},
},
.exit => |node| blk: {
if (node == it.start) {
it.done = true;
break :blk undefined;
}
const parent = it.stack.getLast();
const parent_data = it.tree.get(parent).Binary;
if (parent_data.operands[0] == node) {
break :blk .{ .enter = parent_data.operands[1] };
} else {
assert(parent_data.operands[1] == node);
it.stack.pop();
break :blk .{ .exit = parent };
}
},
};
return result;
}
pub fn deinit(it: *Iterator, gpa: std.mem.Allocator) void {
it.stack.deinit(gpa);
it.* = undefined;
}
};
};
fn sum_up(gpa: std.mem.Allocator, tree: *Tree) u64 {
var result: u64 = 0;
var it = tree.descendats(gpa, tree.root);
defer it.deinit(gpa);
while (it.next) |event| switch (event) {
.enter => |node| switch (tree.get(node)) {
.Int => |value| result += value,
else => {},
},
else => {},
};
return result;
}
What you are proposing, essentially, is a zipper data structure for Zig.
It should be possible to write a generic one, I think, as long as nodes implement certain fields or methods.