Why does using reader.interface as a variable cause a segfault?

Hi.

I have recently noticed something.

Here is my code snippet:

const std = @import("std");

pub var writer: std.Io.File.Writer = undefined;
pub var reader: std.Io.File.Reader = undefined;
pub var allocator: std.mem.Allocator = undefined;
pub fn main(init: std.process.Init) !void {
    const io = init.io;

    const args = init.minimal.args;

    if (args.vector.len == 0) {
        std.debug.print("yay!", .{});
    }

    allocator = init.gpa;

    const writerbuffer: []u8 = try allocator.alloc(u8, 4096);
    defer allocator.free(writerbuffer);

    const readerbuffer: []u8 = try allocator.alloc(u8, 1);
    defer allocator.free(readerbuffer);

    writer = std.Io.File.stdout().writer(io, writerbuffer);

    try writer.file.writeStreamingAll(io, "Hello world!\n");

    reader = std.Io.File.stdin().reader(io, readerbuffer);

    const w = writer.interface;

    var r = reader.interface;

    _ = try r.takeByte();
...

If I compile and run this I get a segfault:

Hello world!
Segmentation fault at address 0x1289f71
???:?:?: 0x1289f71 in ??? (./zig-out/bin/stevi)
Unwind error at address `./zig-out/bin/stevi:0x1289f71` (unwind info unavailable), remaining frames may be incorrect
/home/simon/Downloads/zig-x86_64-linux-0.16.0/lib/std/Io/File.zig:475:27: 0x10390f7 in readStreaming (std.zig)
    return (try io.operate(.{ .file_read_streaming = .{
                          ^
/home/simon/Downloads/zig-x86_64-linux-0.16.0/lib/std/Io/File/Reader.zig:283:35: 0x1038bfb in readVecStreaming (std.zig)
    const n = r.file.readStreaming(io, dest) catch |err| switch (err) {
                                  ^
/home/simon/Downloads/zig-x86_64-linux-0.16.0/lib/std/Io/File/Reader.zig:236:65: 0x10376a0 in readVec (std.zig)
        .streaming, .streaming_simple => return readVecStreaming(r, data),
                                                                ^
/home/simon/Downloads/zig-x86_64-linux-0.16.0/lib/std/Io/Reader.zig:1124:56: 0x1031fdd in fillUnbuffered (std.zig)
    while (r.end < r.seek + n) _ = try r.vtable.readVec(r, &bufs);
                                                       ^
/home/simon/Downloads/zig-x86_64-linux-0.16.0/lib/std/Io/Reader.zig:1110:26: 0x1031d28 in fill (std.zig)
    return fillUnbuffered(r, n);
                         ^
/home/simon/Downloads/zig-x86_64-linux-0.16.0/lib/std/Io/Reader.zig:1150:13: 0x107ec30 in peekByte (std.zig)
    try fill(r, 1);
            ^
/home/simon/Downloads/zig-x86_64-linux-0.16.0/lib/std/Io/Reader.zig:1158:32: 0x107e99f in takeByte (std.zig)
    const result = try peekByte(r);
                               ^
/home/simon/programming/stevi/main.zig:41:23: 0x11d25e7 in main (main.zig)
    _ = try r.takeByte();
                      ^
/home/simon/Downloads/zig-x86_64-linux-0.16.0/lib/std/start.zig:737:30: 0x11d32be in callMain (std.zig)
    return wrapMain(root.main(.{
                             ^
/home/simon/Downloads/zig-x86_64-linux-0.16.0/lib/std/start.zig:190:5: 0x11d20e1 in _start (std.zig)
    asm volatile (switch (native_arch) {
    ^

But if I don’t use var r = reader.interface; and just run reader.interface.takeByte() directly, it doesn’t cause a segfault.

Any clue?

This better fits in the Explain category


Because the reader and writer interfaces don’t contain a pointer to the implementation data. Instead, the interface is embedded as a field in the implementation, and the implementation uses @fieldParentPtr to convert a pointer to the interface into a pointer to its own state.

Currently, Zig does not, and cannot, safety-check that the interface pointer really points to the expected field of the implementation, or even that it points into a value of the expected type at all.

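So when you copy the interface into a variable and call through the copy, the implementation’s functions have no way of knowing. They convert the pointer anyway, which means they treat arbitrary data on the stack as if it were the implementation state. That is illegal, undefined behaviour. It is also why calling reader.interface.takeByte() directly works: there the pointer really does point at the field inside reader.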
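To make that concrete, here is a minimal sketch of the pattern. Counter and StepCounter are made-up types for illustration, not anything from std, and the @fieldParentPtr call uses the two-argument form from Zig 0.12 and later. It only exercises the valid path (a pointer to the embedded field) and prints the address of a copy to show that the copy lives somewhere else.

```zig
const std = @import("std");

// Hypothetical intrusive interface: it holds a function pointer,
// but no pointer to the implementation.
const Counter = struct {
    nextFn: *const fn (c: *Counter) u32,

    fn next(c: *Counter) u32 {
        return c.nextFn(c);
    }
};

// Hypothetical implementation that embeds the interface as a field.
const StepCounter = struct {
    step: u32,
    total: u32 = 0,
    interface: Counter = .{ .nextFn = next },

    fn next(c: *Counter) u32 {
        // Recover the implementation from the address of its `interface` field.
        // This is only valid if `c` really points at a field inside a StepCounter.
        const self: *StepCounter = @alignCast(@fieldParentPtr("interface", c));
        self.total += self.step;
        return self.total;
    }
};

pub fn main() void {
    var impl: StepCounter = .{ .step = 5 };

    // Correct: a pointer to the embedded field.
    const c = &impl.interface;
    _ = c.next();
    _ = c.next();
    std.debug.print("total = {d}\n", .{impl.total}); // prints "total = 10"

    // A copy is a separate object at a different address. Calling next() on it
    // would make @fieldParentPtr compute a pointer to memory that is not a
    // StepCounter, which is undefined behaviour, so this sketch only prints addresses.
    const copy = impl.interface;
    std.debug.print("field at 0x{x}, copy at 0x{x}\n", .{
        @intFromPtr(&impl.interface),
        @intFromPtr(&copy),
    });
}
```

In the question, var r = reader.interface; plays the role of copy here.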

This style of interface is called an “intrusive interface”. It is quite common in C and less so in Zig, although it used to be more common. Another example in the Zig standard library is the build system’s steps, which are heap allocated, so you run into this footgun less often. In a quite old version of Zig, Allocator was also intrusive.

The other common interface pattern stores a pointer to the implementation inside the interface. Examples include Allocator and Io, which are effectively fat pointers.
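For comparison, a rough sketch of the fat-pointer style. Again, Counter and StepCounter are hypothetical names, and this is a simplification (std.mem.Allocator uses a vtable pointer rather than a single function pointer). The interface carries its own ptr, so copying the interface value is harmless.

```zig
const std = @import("std");

// Hypothetical fat-pointer interface: it carries a pointer to the implementation.
const Counter = struct {
    ptr: *anyopaque,
    nextFn: *const fn (ptr: *anyopaque) u32,

    fn next(c: Counter) u32 {
        return c.nextFn(c.ptr);
    }
};

// Hypothetical implementation; the interface is created on demand, not embedded.
const StepCounter = struct {
    step: u32,
    total: u32 = 0,

    fn counter(self: *StepCounter) Counter {
        return .{ .ptr = self, .nextFn = next };
    }

    fn next(ptr: *anyopaque) u32 {
        // Cast the type-erased pointer back to the implementation.
        const self: *StepCounter = @ptrCast(@alignCast(ptr));
        self.total += self.step;
        return self.total;
    }
};

pub fn main() void {
    var impl: StepCounter = .{ .step = 5 };
    const a = impl.counter();
    const b = a; // copying is fine: both copies hold the same `ptr`
    _ = a.next();
    _ = b.next();
    std.debug.print("total = {d}\n", .{impl.total}); // prints "total = 10"
}
```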

This footgun is an obvious downside. Zig has plans to detect it at runtime, among other pointer-cast safety features.

But intrusive interfaces have a big advantage as well! When the interface has state of its own, like the buffer of a reader or writer, and the implementation needs to interact with that state in order to extend the interface, intrusive interfaces are the best fit.

Take File.Reader as an example. It supports seeking, which by nature requires discarding buffered data; otherwise the next bytes you read would not come from the position you just moved to. Since File.Reader has the interface state directly accessible as a field, this is trivial. Try to think about how you would do that with the fat-pointer style that Allocator uses: any solution would be convoluted and error-prone.
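A toy version of that idea, assuming a made-up BufReader interface and a MemFileReader implementation that reads from an in-memory slice. This is not how std’s File.Reader is actually written; it only shows why having the buffer state as a plain field makes seeking easy.

```zig
const std = @import("std");

// Hypothetical buffered interface: the buffer bookkeeping lives in the interface.
const BufReader = struct {
    buffer: []u8,
    seek: usize = 0, // index of the next unread byte in `buffer`
    end: usize = 0, // number of valid bytes in `buffer`

    fn buffered(r: *const BufReader) []const u8 {
        return r.buffer[r.seek..r.end];
    }
};

// Hypothetical implementation over an in-memory "file".
const MemFileReader = struct {
    data: []const u8,
    pos: usize = 0,
    interface: BufReader,

    // Refill the buffer from the current position.
    fn fill(self: *MemFileReader) void {
        const r = &self.interface;
        const n = @min(r.buffer.len, self.data.len - self.pos);
        @memcpy(r.buffer[0..n], self.data[self.pos..][0..n]);
        self.pos += n;
        r.seek = 0;
        r.end = n;
    }

    fn seekTo(self: *MemFileReader, offset: usize) void {
        self.pos = @min(offset, self.data.len);
        // Drop whatever is buffered: it belongs to the old position.
        self.interface.seek = 0;
        self.interface.end = 0;
    }
};

pub fn main() void {
    var storage: [4]u8 = undefined;
    var file: MemFileReader = .{
        .data = "abcdefgh",
        .interface = .{ .buffer = &storage },
    };

    file.fill();
    std.debug.print("{s}\n", .{file.interface.buffered()}); // prints "abcd"

    file.seekTo(6);
    file.fill();
    std.debug.print("{s}\n", .{file.interface.buffered()}); // prints "gh"
}
```

With a fat-pointer interface, the buffer would live in a separate interface value that the implementation has no direct handle on, so seekTo would need some extra channel to invalidate it.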


To add to this, in case vulpesx’s amazing explanation seems too technical and doesn’t directly answer the question: you have to use std.Io.Reader only through a pointer:

const r = &reader.interface;

Another gotcha: in general you can’t keep these pointers around. If the original reader or writer implementation is moved or copied to another location, the pointers become invalid.
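A small sketch of that gotcha, using made-up Counter and StepCounter types rather than the real std.Io ones. Here the old location stays alive, so the stale pointer is not dangling, it just keeps talking to the old copy. If the old location had gone out of scope, using the pointer would be undefined behaviour.

```zig
const std = @import("std");

// Hypothetical intrusive interface, not the real std.Io.Reader.
const Counter = struct {
    nextFn: *const fn (c: *Counter) u32,

    fn next(c: *Counter) u32 {
        return c.nextFn(c);
    }
};

// Hypothetical implementation embedding the interface.
const StepCounter = struct {
    step: u32,
    total: u32 = 0,
    interface: Counter = .{ .nextFn = next },

    fn next(c: *Counter) u32 {
        const self: *StepCounter = @alignCast(@fieldParentPtr("interface", c));
        self.total += self.step;
        return self.total;
    }
};

pub fn main() void {
    var original: StepCounter = .{ .step = 5 };
    const stale = &original.interface;

    // "Relocate" the implementation by copying it to a new variable.
    var moved = original;

    // The old pointer still refers to `original`, not to `moved`.
    _ = stale.next();
    // prints "original.total = 5, moved.total = 0"
    std.debug.print("original.total = {d}, moved.total = {d}\n", .{ original.total, moved.total });

    // Take a fresh pointer after the implementation has reached its final location.
    const fresh = &moved.interface;
    _ = fresh.next();
    std.debug.print("moved.total = {d}\n", .{moved.total}); // prints "moved.total = 5"
}
```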

EDIT: changed var to const


Make it const, not var.
