I am trying to understand common mistakes we can make when using the new std.Io.Reader/Writer interfaces, but I am struggling to grasp the behavior of several. I have this basic program:
Mistake 1: I make a copy of the interface field of a temporary value. This compiles (debug) and prints âan error occured: error.ReadFailedâ. I donât know why, I was expecting it to crash, like mistake 2.
Mistake 2: It first stores an std.fs.File.Reader, and then copies the interface field. Compiles and crashes. Makes sense to me, it tries to do @fieldParentPtr and end up with some bogus address
(mistake 3: itâs 3am and sleep deprived)
I know there are some lifetime and memory location issues, but I canât really confidently tell whatâs happening
anyway thank you for you attention! have a good night, good evening, good morning, good afternoon
Because the interface is a field of the implementation and fields are just an offset from a structs position, the functions passed to the interface vtable can subtract the field offset from the interface pointer and get a pointer to the implementation.
Thatâs all fine until you copy the interface out of the implementation, now the calculated pointer isnât actually pointing to the implementation but some arbitrary memory which it will then try to use.
What then happens is determined by two things, the location of memory thatâs being pointed to and the data in that location.
Welcome to undefined behaviour.
If you are lucky the location being pointed to is not assigned to your program, trying to access such memory will cause your program to be terminated, this is called a segfault (SIGSEGV).
If youâre unlucky the location being pointed to is assigned to your program, whatever happens next is determined by the data in that location, it will be treated as the implementation.
If you are lucky, the data in that location will lead to some operation crashing or causing an error.
If you are unlucky, your program will continue to run with bogus data.
If youâre lucky, the result of your program will be incomprehensible, leading to you trying to find the problem.
If youâre unlucky, the result of your program will look normal, leading to you not knowing there is a problem, resulting in all kinds of bad things happening in the future.
I think the underlying reason is very simple. It may be because the OP is used to the object-oriented language that all objects are pointers, and virtual table interfaces such as allocator, and forgets that directly modifying a copied part of the structure in order to modify the state of the structure is always wrong.
For example, if you want to compute the entire modification before applying it to the original field, you would make a copy, modify the copy, then write the new state back to the field.
In the specific case of intrusive interfaces, youâre correct that you shouldnât copy the interface out.
Though, Iâm sure there is some niche reason to do such sketchy things.
I agree that this usage exists, but on the one hand, it necessarily involves a writeback; without it, itâs definitely incorrect. On the other hand, using this approach requires the user to clearly understand what theyâre doing. If a user simply attempts to use an interface and discovers that the copied interface is mutable, and they didnât anticipate any writebacks during use, this is already an error, regardless of whether the interface internally expects itself to belong to another parent structure.
Whether a copied interface is mutable is an effective way for users, upon first encountering an interface, to understand whether theyâre using it correctly simply by looking at its API implementation, without having to examine the entire source code for every interface.
I was talking more generally, in the specific case of intrusive interfaces your advice is correct, I just wanted to ensure it isnât misunderstood to apply to cases where no intrusive interface is involved.
if an error occurs during the modification, I think most cases would be easier to recover/fall back if the state hasnât changed during the failed attempt.
Good point.
BTW this is quite similar to why the NOCOPY hint for IN OUT or OUT parameters in PL/SQL exists. Without the hint, in case of an exception, the original value is restored, with the hint, itâs not restored (I donât know the exact details), so technically a pointer can be used without the need to create a copy the data, giving better performance.
Regarding non-intrusive interfaces suitable for copying, my personal opinion is that different designs and usage methods for intrusive and non-intrusive interfaces from the outset can effectively avoid confusion between their usage.
For intrusive interfaces, operations are designed based on pointers:
pub fn interfaceMethod(interface: *Interface) void {
...
}
...
// use case
const interface = &super.interface;
interface.interfaceMethod();
interface.anotherMethod();
interface.thirdMethod();
For non-intrusive interfaces suitable for copy-modify-writeback, operations are designed based on chained copying. This design is based on the assumption that if an interface is suitable for independent copying and writeback, it is always more suitable for pass-by-value rather than pass-by-reference.
pub fn interfaceMethod(interface: Interface) Interface {
...
}
...
// Use case
var interface = super.interface;
interface = interface.interfaceMethod()ďź
interface = interface.anotherMethod().thirdMethod();
super.interface = interface;
However, I admit that for users accustomed to object-oriented languages, the chaining syntax doesnât clearly indicate that this is a non-intrusive interface, as many object-oriented languages ââalso use the chaining syntax to return an immutable pointer.
Furthermore, parameter reference optimization may break this usage.
Thank you for your replies, I was trying to understand the difference between the two observed behaviors (crashing vs error), but since this is UB, the actual behaviors are irrelevant!
I have been following this discussion with very little attention, but now I find myself needing to read data from stdin. Is the following correct, or should I not assign the interface field to the stdin constant? It seems to work fine, but I donât know if it will explode later on.
var buf: [1024]u8 = undefined;
var reader = std.fs.File.stdin().reader(&buf);
const stdin = &reader.interface;
const line = try stdin.takeDelimiterExclusive('\n');
If I am not supposed to assign the interface field, am I supposed to write try reader.interface.takeDelimiterExclusive('\n') every time I need to call a method on that reader?
Also, how would I write a loop to read stdin line by line? takeDelimiterExclusive() returns EndOfStream as a possible error, but not an optional â how do I use it ergonomically from a while loop?
while (true) read: {
const line = stdin.takeDelimiterExclusive('\n') catch |err| {
if (err == error.EndOfStream) break :read;
return err;
}
// Do stuff with the line
}
I think the main thing here is to understand the semantics of @fieldParentPtr and what itâs doing, and given that what would be known known at the callsite when @fieldParentPtr is called. Here, youâre correctly taking the address of the interface field instead of assigning a new value. So when the method is called, the receiver field will be already correctly set to the pointer value of the std.Io.Reader, for which @fieldParentPtr would be able to correctly look up off of. Given that the amount of info that takeDelimiterExclusive has will be the same regardless of whether or not itâs called your way or via reader.interface, there should be no harm in doing it this way. Just make sure that reader does not go out of scope.
PS: Donât forget you can always read the stdlib docs and follow the code which should be pretty helpful in understanding what you can and canât do. If you go down far enough youâll get to std.fs.Fileâs implementation of stream where you can see how @fieldParentPtr is used, as an example.
while (stdin.takeDelimiterExclusive('\n')) |line| {
// do stuff with line
} else |err| {
if (err != error.EndOfStream) return err;
// handle other errors if needed.
}
}