Understanding common std.Io.Reader/Writer mistakes (0.15.1)

I am trying to understand common mistakes we can make when using the new std.Io.Reader/Writer interfaces, but I am struggling to grasp the behavior of several. I have this basic program:

const std = @import("std");

pub fn main() !void {
    var stdin_buffer: [1024]u8 = undefined;
    // mistake 1:
    var stdin_ioreader: std.Io.Reader = std.fs.File.stdin().reader(&stdin_buffer).interface;
    //
    //mistake 2
    // const reader = std.fs.File.stdin().reader(&stdin_buffer);
    // var stdin_ioreader: std.Io.Reader = reader.interface;

    while (stdin_ioreader.takeByte()) |char| {
        if (char == '\n') continue;
        std.debug.print("you typed: {c}\n", .{char});
        if (char == 'q') break;
    } else |err| {
        std.debug.print("an error occured: {any}\n", .{err});
    }
}

Mistake 1: I make a copy of the interface field of a temporary value. This compiles (debug) and prints “an error occured: error.ReadFailed”. I don’t know why, I was expecting it to crash, like mistake 2.

Mistake 2: It first stores an std.fs.File.Reader, and then copies the interface field. Compiles and crashes. Makes sense to me, it tries to do @fieldParentPtr and end up with some bogus address

(mistake 3: it’s 3am and sleep deprived)

I know there are some lifetime and memory location issues, but I can’t really confidently tell what’s happening

anyway thank you for you attention! have a good night, good evening, good morning, good afternoon :slight_smile:

1 Like

Because the interface is a field of the implementation and fields are just an offset from a structs position, the functions passed to the interface vtable can subtract the field offset from the interface pointer and get a pointer to the implementation.

That’s all fine until you copy the interface out of the implementation, now the calculated pointer isn’t actually pointing to the implementation but some arbitrary memory which it will then try to use.

What then happens is determined by two things, the location of memory that’s being pointed to and the data in that location.

Welcome to undefined behaviour.

If you are lucky the location being pointed to is not assigned to your program, trying to access such memory will cause your program to be terminated, this is called a segfault (SIGSEGV).

If you’re unlucky the location being pointed to is assigned to your program, whatever happens next is determined by the data in that location, it will be treated as the implementation.

If you are lucky, the data in that location will lead to some operation crashing or causing an error.

If you are unlucky, your program will continue to run with bogus data.

If you’re lucky, the result of your program will be incomprehensible, leading to you trying to find the problem.

If you’re unlucky, the result of your program will look normal, leading to you not knowing there is a problem, resulting in all kinds of bad things happening in the future.

5 Likes

When you copy an interface and find that it needs to be var to use it, be careful: you are probably making a mistake and should use its pointer.

They clearly understand that they are mistakes, but don’t understand what the exact problems are.

1 Like

I think the underlying reason is very simple. It may be because the OP is used to the object-oriented language that all objects are pointers, and virtual table interfaces such as allocator, and forgets that directly modifying a copied part of the structure in order to modify the state of the structure is always wrong.

always, is an exaggeration.

For example, if you want to compute the entire modification before applying it to the original field, you would make a copy, modify the copy, then write the new state back to the field.

In the specific case of intrusive interfaces, you’re correct that you shouldn’t copy the interface out.

Though, I’m sure there is some niche reason to do such sketchy things.

I agree that this usage exists, but on the one hand, it necessarily involves a writeback; without it, it’s definitely incorrect. On the other hand, using this approach requires the user to clearly understand what they’re doing. If a user simply attempts to use an interface and discovers that the copied interface is mutable, and they didn’t anticipate any writebacks during use, this is already an error, regardless of whether the interface internally expects itself to belong to another parent structure.

Whether a copied interface is mutable is an effective way for users, upon first encountering an interface, to understand whether they’re using it correctly simply by looking at its API implementation, without having to examine the entire source code for every interface.

2 Likes

I was talking more generally, in the specific case of intrusive interfaces your advice is correct, I just wanted to ensure it isn’t misunderstood to apply to cases where no intrusive interface is involved.

if an error occurs during the modification, I think most cases would be easier to recover/fall back if the state hasn’t changed during the failed attempt.

Again speaking more generally.

1 Like

Good point.
BTW this is quite similar to why the NOCOPY hint for IN OUT or OUT parameters in PL/SQL exists. Without the hint, in case of an exception, the original value is restored, with the hint, it’s not restored (I don’t know the exact details), so technically a pointer can be used without the need to create a copy the data, giving better performance.

Regarding non-intrusive interfaces suitable for copying, my personal opinion is that different designs and usage methods for intrusive and non-intrusive interfaces from the outset can effectively avoid confusion between their usage.
For intrusive interfaces, operations are designed based on pointers:

pub fn interfaceMethod(interface: *Interface) void {
    ...
}
...
// use case
const interface = &super.interface;
interface.interfaceMethod();
interface.anotherMethod();
interface.thirdMethod();

For non-intrusive interfaces suitable for copy-modify-writeback, operations are designed based on chained copying. This design is based on the assumption that if an interface is suitable for independent copying and writeback, it is always more suitable for pass-by-value rather than pass-by-reference.

pub fn interfaceMethod(interface: Interface) Interface {
    ...
}
...
// Use case
var interface = super.interface;
interface = interface.interfaceMethod();
interface = interface.anotherMethod().thirdMethod();
super.interface = interface;

However, I admit that for users accustomed to object-oriented languages, the chaining syntax doesn’t clearly indicate that this is a non-intrusive interface, as many object-oriented languages ​​also use the chaining syntax to return an immutable pointer.
Furthermore, parameter reference optimization may break this usage.

1 Like

im willing to bet the details is a copy of some kind, maybe a little smarter in that it only copies what it needs to.

We’ll never know, because it’s closed source.

Thank you for your replies, I was trying to understand the difference between the two observed behaviors (crashing vs error), but since this is UB, the actual behaviors are irrelevant!

I’m very sorry. After reflection, I confirmed that my assertion that “copying const can guarantee correctness” was wrong.

I have been following this discussion with very little attention, but now I find myself needing to read data from stdin. Is the following correct, or should I not assign the interface field to the stdin constant? It seems to work fine, but I don’t know if it will explode later on.

var buf: [1024]u8 = undefined;
var reader = std.fs.File.stdin().reader(&buf);
const stdin = &reader.interface;
const line = try stdin.takeDelimiterExclusive('\n');

If I am not supposed to assign the interface field, am I supposed to write try reader.interface.takeDelimiterExclusive('\n') every time I need to call a method on that reader?

Also, how would I write a loop to read stdin line by line? takeDelimiterExclusive() returns EndOfStream as a possible error, but not an optional – how do I use it ergonomically from a while loop?

2 Likes

how do I use it ergonomically from a while loop?

Summary

Totally untested but taking a crack at it :wink:

while (true) read: {
  const line = stdin.takeDelimiterExclusive('\n') catch |err| {
    if (err == error.EndOfStream) break :read;
    return err;
  }

  // Do stuff with the line
}

Not the prettiest, but should do the job I think?

EDIT: @phatchman has the correct approach which takes advantage of while with error unions.

Also:

I think the main thing here is to understand the semantics of @fieldParentPtr and what it’s doing, and given that what would be known known at the callsite when @fieldParentPtr is called. Here, you’re correctly taking the address of the interface field instead of assigning a new value. So when the method is called, the receiver field will be already correctly set to the pointer value of the std.Io.Reader, for which @fieldParentPtr would be able to correctly look up off of. Given that the amount of info that takeDelimiterExclusive has will be the same regardless of whether or not it’s called your way or via reader.interface, there should be no harm in doing it this way. Just make sure that reader does not go out of scope.

PS: Don’t forget you can always read the stdlib docs and follow the code which should be pretty helpful in understanding what you can and can’t do. :slight_smile: If you go down far enough you’ll get to std.fs.File’s implementation of stream where you can see how @fieldParentPtr is used, as an example.

PPS: @kj4tmp has a great writeup that goes more into detail about this stuff: https://ziggit.dev/t/zig-0-15-1-reader-writer-dont-make-copies-of-fieldparentptr-based-interfaces/

1 Like

I think this is the most common pattern:

while (stdin.takeDelimiterExclusive('\n')) |line| {
        // do stuff with line
    } else |err| {
        if (err != error.EndOfStream) return err;
        // handle other errors if needed.
    }
}
2 Likes

Use the pointer here is ok.