Comptime segfault on recursive types

Hi everyone! I’m maintaining pretty lib that recursively pretty prints arbitrary data structures. Someone reported a case (Fails to compile if `max_depth` option is 0 and printed type is recursive · Issue #4 · timfayz/pretty · GitHub) where the printer fails to traverse the recursive type like this:

const A = union(enum) {
    int: u8,
    a: *const @This(),
};
try pretty.print(alloc, A{ .int = 10 }, .{ .max_depth = 0 }); // max_depth controls the recursion depth, 0 = no limit 

Running the above gives:

zsh: segmentation fault  zig run main.zig

I was trying to reduce the noise behind full pretty logic into a slim reproducible example but when I do so, it begins to work just fine :). Anyway, here is the “scheme”:

const std = @import("std");

fn traverse(val: anytype) void {
    const T = @TypeOf(val);
    switch (@typeInfo(T)) {
        .Union => {
            if (@typeInfo(T).Union.tag_type) |Tag| {
                switch (@as(Tag, val)) {
                    inline else => |tag| {
                        traverse(@field(val, @tagName(tag)));
                    },
                }
            }
        },
        .Pointer => {
            switch (@typeInfo(T).Pointer.size) {
                .One => {
                    traverse(val.*); // this is what in the original code leads to segfault (commenting it out makes the failure disappear)
                },
                else => |load| std.log.err("{}", .{load}),
            }
        },
        else => |load| std.log.err("{}", .{load}),
    }
}

pub fn main() !void {
    const Union = union(enum) {
        int: u8,
        self: *const @This(),
    };
    const val = Union{ .int = 10 };
    traverse(val);
}

I put a comment above on the line which, in the original code, leads to segfault. Here is the link to that line in source: pretty/src/pretty.zig at 794e0408c39781a8fdf4b3f070e786526c7f3eca · timfayz/pretty · GitHub

Zig version: 0.12.0.

My best guess is that it happens because zig, during comptime, attempts to unroll the chain of calls to set up proper types for anytype arguments in every traverse(val: anytype) call. I think so because even if I do @comptimeLog(1) at the very first line of the pretty.print(), zig cannot reach it out because it does some other analysis first and crashes. I’m kinda stuck in the loop. Any ideas where can I start?

I may be misunderstanding something here, but it seems like we shouldn’t be following that pointer in this example?

try pretty.print(alloc, A{ .int = 10 }, .{ .max_depth = 0 });

In this case, you’ve built a union with the integer member initialized. Why are we trying to dereference a: *const @This() in this case?

In your union branch, I’m seeing this:

.Union => {
    // Tagged union
    if (@typeInfo(T).Union.tag_type) |tag_type| {
        try self.writeBracket(.Open, c); // inline mode only
        self.idx = 0;
        switch (@as(tag_type, val)) {
            inline else => |tag| {
                try self.traverse(@field(val, @tagName(tag)), c.setField(@tagName(tag)));
            },
        }
        try self.writeBracket(.Closed, c); // inline mode only
        return;
    }

I’m taking a stab here, but it seems like we’re getting into an invalid branch. I’d probably rewrite it with std.meta.activeTag.

I’d throw in some print statements to see where you’re getting into and decide if those are paths you want to go down. That would certainly segfault in this example if they aren’t. Again, I may be misunderstanding something about your design but that’s where I’d start.

3 Likes

I’m starting to think that’s what’s going on here. In godbolt, I can remove the logic of your branches and just see what branches we’re getting into for your minimal example:

const std = @import("std");

fn traverse(comptime val: anytype) void {
    const T = @TypeOf(val);
    switch (@typeInfo(T)) {
        .Union => {
            if (@typeInfo(T).Union.tag_type) |Tag| {
                switch (@as(Tag, val)) {
                    inline else => |tag| {
                        traverse(@field(val, @tagName(tag)));
                    },
                }
            }
        },
        .Pointer => {
            @compileError("In pointer branch");
        },
        .Int => {
            @compileLog("In integer branch");
        },
        else => {}
    }
}

For the following code, we hit the integer branch only:

pub fn main() !void {
    const Union = union(enum) {
        int: u8,
        self: *const @This(),
    };
    const val = Union{ .int = 10 };
    comptime traverse(val);
}
2 Likes

I think this can be written as:

switch (val) {
    inline else => |field_val| traverse(field_val),
}
3 Likes

Right, thank you! I’ll take the @Sze approach:

...
// Before:
// switch (@as(Tag, val)) {
//     inline else => |tag| {
//         try self.traverse(@field(val, @tagName(tag)), c.setField(@tagName(tag)));
//     },
// }

// After:
switch (val) {
    inline else => |v, tag| { 
        try self.traverse(v, c.setField(@tagName(tag)));
    },
}
...

Unfortunately, no prints or comtime logging available, zig just immediately crashes. I think I run out of options :slight_smile: . This is certainly a comptime recursion problem, which is related to “static analysis”. As soon as I increase max_depth from 100 to 1000, zig segfaults. Or as soon as I remove dereferencing from the .Pointer branch, things go back to normal. It feels like zig can’t stop dereferencing pointer during analysis. What else could it be? The union value is certainly non-pointer (we can see it), and I don’t see any issues in .Union branch that dispatches tagged unions into appropriate path.

Wouldn’t it be better to handle recursion properly? Given that there is a practical limit to far deep the tree can be–namely the wide of a typical screen–it’s not unreasonable to use an array on the stack to keep track of pointers already seen.

Adding a pointer to this struct to traverse() might make the comptime problem go away since the function is runtime only now.

Could you please be more specific here? (I’m not picking; I may just be misunderstanding)

I totally agree here wrt keeping track of pointers (even though I’ve never done it in my practice, wanted to do exactly what you’ve suggested). The point is, however, I don’t think it would help. I assume pointers are runtime things already and even if I put some comparison before going into the depths of traversing, zig will still crash:

.Pointer => {
    switch (@typeInfo(T).Pointer.size) {
        .One => {
             ...
            // [Option] Follow the pointer
            if (opt.ptr_deref) {
                _ = &val; // afaik this is a standard way to force val runtime
                if (val == @as(T, @ptrFromInt(1024))) // phony comparison
                    try self.traverse(val.*, c); // zig still crashes (tested)

@chung-leong I’m open to hear comments as I may misinterpret things in complete different/dumb way.

I think an important note here is that max_depth applies to comptime depth of recursion, not a runtime.

What I meant is something like this:

pub fn print(value: anytype) void {
    const Traverser = struct {
        pointers: [80]*anyopaque = undefined,
        pointer_count: usize = 0,
        current_depth: usize,

        fn find(self: *@This(), ptr: *anyopaque) bool {
            return for (0..self.pointer_count) |index| {
                if (self.pointers[index] == ptr) {
                    break true;
                }
            } else false;
        }

        fn push(self: *@This(), ptr: *anyopaque) bool {
            if (self.pointer_count < self.pointers.len) {
                self.pointers[self.pointer_count] = ptr;
                self.pointer_count += 1;
                return true;
            } else {
                return false;
            }
        }

        fn pop(self: *@This()) void {
            self.pointer_count -= 1;
        }

        fn print(self: *@This(), val: anytype) void {
            const T = @TypeOf(val);
            switch (@typeInfo(T)) {
                .Pointer => |pt| {
                    switch (pt.size) {
                        .One => {
                            if (self.find(@ptrCast(val))) {
                                // print "recursion" or something
                            } else if (self.push(@ptrCast(val))) {
                                self.print(val.*);
                                self.pop();
                            } else {
                                // print ellipsis or something
                            }
                        },
                        // ...other cases...
                    }
                },
                // ...other cases...
            }
        }
    };
    var traverser: Traverser = .{};
    traverser.print(value);
}

The pointer to a var makes it clear to the compiler that it’s dealing with a runtime function.

1 Like