Zig 0.15.1 reader/writer: Don't make copies of @fieldParentPtr()-based interfaces

Here is a dedicated write-up for those not familiar with @fieldParentPtr-based interfaces, which is how the new reader/writer interface is implemented.

These two programs produce drastically different results:

bad_program.zig:

const std = @import("std");

pub fn main() !void {
    const my_struct: struct { hello: []const u8 } = .{ .hello = "world" };

    var buffer: [1024]u8 = undefined;
    var file_writer = std.fs.File.stdout().writer(&buffer);
    var writer_instance = file_writer.interface;
    try std.json.Stringify.value(
        my_struct,
        .{ .whitespace = .indent_2 },
        &writer_instance,
    );
    try file_writer.interface.flush();
}
$ zig run bad_program.zig

Oops! Looks like nothing is getting printed here!

good_program.zig:

const std = @import("std");

pub fn main() !void {
    const my_struct: struct { hello: []const u8 } = .{ .hello = "world" };

    var buffer: [1024]u8 = undefined;
    var file_writer = std.fs.File.stdout().writer(&buffer);
    const writer_ptr = &file_writer.interface;
    try std.json.Stringify.value(
        my_struct,
        .{ .whitespace = .indent_2 },
        writer_ptr,
    );
    try file_writer.interface.flush();
}
$ zig run good_program.zig
{
  "hello": "world"
}

Great, we got the expected output!

Whats the problem?

Short answer:

The vtable in the writer interface is implemented using @fieldParentPtr. When we made a copy of the interface with var writer_instance = file_writer.interface;, @fieldParentPointer uses a byte offset from our copy (called writer_instance) instead of a byte offset from the interface field within file_writer. This gives us bogus results / undefined behavior. To get the correct behavior, we must be careful to not make a copy of the interface field, and only directly reference the interface field when passing it around.

Long Answer:

Lets add some type annotations and comments to good_program.zig:

pub fn main() !void {
    const my_struct: struct { hello: []const u8 } = .{ .hello = "world" };

    var buffer: [1024]u8 = undefined;

    // get a File that corresponds to stdout.
    // On linux, "everything is a file", including stdout.
    // Writing to this file shows text on the screen.
    const stdout: std.fs.File = std.fs.File.stdout();

    // Its confusing that both of these types are called Writer.
    // Not much I can do about that.
    // What we really care about is how `writer_ptr` is implemented.
    var file_writer: std.fs.File.Writer = stdout.writer(&buffer);
    const writer_ptr: *std.Io.Writer = &file_writer.interface;

    try std.json.Stringify.value(
        my_struct,
        .{ .whitespace = .indent_2 },
        writer_ptr,
    );

    try file_writer.interface.flush();
}

Lets dive into how writer_ptr is implemented, after a few function calls, the .interface field of our file_writer is populated with the following function:

pub fn initInterface(buffer: []u8) std.Io.Writer {
    return .{
        .vtable = &.{
            .drain = drain,
            .sendFile = switch (builtin.zig_backend) {
                else => sendFile,
                .stage2_aarch64 => std.Io.Writer.unimplementedSendFile,
            },
        },
        .buffer = buffer,
    };
}

It constructs a vtable (a struct of function pointers), using the drain implementation in std.fs.File.Writer. drain is what actually writes bytes to our stdout file. Here is drain:

pub fn drain(io_w: *std.Io.Writer, data: []const []const u8, splat: usize) std.Io.Writer.Error!usize {
    const w: *Writer = @alignCast(@fieldParentPtr("interface", io_w));
    // lots more code after this...

The drain implementation is given only a pointer to the Writer interface (std.Io.Writer), but it needs access to the surounding std.fs.File.Writer.

@fieldParentPtr("interface", io_w) tells the compiler, "I promise this io_w I’m working with is the interface field of a std.fs.File.Writer, please compute a byte offset from the the address at io_w to get to a std.fs.File.Writer so I can manipulate it later in this function`.

Lets visualize this byte offset jump:

std.fs.File.Writer might be arranged like this in memory:

pub const Writer = struct {   // memory address 12
    file: File,               // memory address 12

    // some more fields...
    // ...

    interface: std.Io.Writer, // memory address 40
}

The actual memory locations / alignment details don’t matter, what’s important to understand is that the compiler knows the memory layout of the std.fs.File.Writer type. It knows that given the location of the interface field at address 40, it can walk 28 bytes up to reach where the file field is.

This is what @fieldParentPtr("interface", io_w) is doing. We are telling the compiler: “Walk 28 bytes up, I promise you can reinterpret the memory at that location as a std.fs.File.Writer.”.

The problem with bad_program.zig is that we are breaking our promise to the compiler, we are making a copy of the interface, such that our interface (std.Io.Writer) is no longer a member of a surrounding struct (std.fs.File.Writer).

Here is bad_program.zig again with comments:

const std = @import("std");

pub fn main() !void {
    const my_struct: struct { hello: []const u8 } = .{ .hello = "world" };

    var buffer: [1024]u8 = undefined;
    var file_writer = std.fs.File.stdout().writer(&buffer);

    // oops! I made a copy here.
    // Now I have two instances of `std.Io.Writer`.
    // I have an instance here (called "writer_instance"), and
    // another instance stored in file_writer.interface.
    var writer_instance: std.Io.Writer = file_writer.interface;


    try std.json.Stringify.value(
        my_struct,
        .{ .whitespace = .indent_2 },
        &writer_instance,
    );
    try file_writer.interface.flush();
}

The memory layout might look like this:

struct {                           // memory address 12
    file: File,                    // memory address 12

    // some more fields...
    // ...

    interface: std.Io.Writer,      // memory address 40
}

writer_instance: std.Io.Writer,    // memory address 42

So when std.json.Stringify.value() attempts to use writer_instance, it will compute a byte offset to some garbage location that shouldn’t be interpretted as a std.fs.File.Writer.

The solution is to not make copies of @fieldParentPointer-based interfaces. As we demonstrated in good_program.zig.

12 Likes

I sometimes wonder if Zig should have a struct attribute which forbids assignment (e.g. only taking the address is allowed) - but making this waterproof might be a challenge (e.g. taking a pointer and then copying the struct out of the pointer - e.g. this ‘no-copies-allows’ attribute would at least also need to carry over to pointers of such structs.

There’s probably a proposal for this already somewhere :slight_smile:

3 Likes
9 Likes

The more relevant proposal is, I think, adding a safety check to illegal pointer casts: add safety checks for pointer casting ¡ Issue #2414 ¡ ziglang/zig ¡ GitHub

As it is, pinning structs of the reader/writer interfaces would require other changes to coding practices, or changes to bringing return location optimization into the semantics of the language. Eg, std.fs.File.write() returns a std.fs.File.Writer, copying it through the return value. Indeed there are plenty of legal copies of Writer/Reader. Even just the interface can be copied in the case of std.Io.Writer.fixed().

3 Likes

I’m actually a little surprised that so many people stumble on this issue, because in fact, even if an interface does not use @fieldParentPtr(), directly copying the substructure inside a structure is obviously wrong behavior, because the substructure often records the state of the parent structure. Copying them out for use means that the parent structure will lose correct tracking of the state.

This mistake may be due to the influence of Allocator, but I believe fat pointer structures are relatively rare.

Compared to the pinned struct solution, I actually believe that most structures should not be copied. Instead, keyword designations should be added to basic types or pseudo-pointer structures designed to be suitable for copying.

Another point worth discussing is that copying pointers within a structure should be similarly restricted - this is often the source of dangling pointers and will render poisoning the structure itself ineffective. Only specified pointers within a structure should be allowed to be copied. However, it must be admitted that this is often inconvenient to use, damn, and it makes it a bit like Rust!

5 Likes

I totally got owned by this and didn’t realize; I restarted my project anyway, but hopefully I will get it right the second time around! Thank you so much for writing this up.

I had totally assumed that the interface had the pointer in it like Allocator and didn’t even know something like this could happen and also not trigger a crash! Not having used any of the @fieldParentPointer interfaces before, this definitely threw me. Makes sense in hindsight with interface being a child field of the struct :woman_facepalming:.

2 Likes

Most programmers have more experience with object-identity languages, as opposed to a value-identity language like Zig.

In languages like Python, a value which isn’t immutable is implicitly passed by pointer / reference, and if you want a copy, you have to do explicitly make a new object and fill it up. Which is the opposite of how things work in value-identity languages.

In any case, I’m glad @andrewrk linked to the pinned-attribute issue, because this is going to be a repeated stumbling block for using the new read/write interface until we gain the ability to specify that the location of a type in memory is an invariant property of that type instance, which cannot be moved somewhere else without annotating that the move is on purpose (casting, more or less).

I know some of us with Rust experience have, idk, call it Pin fatigue, but the underlying concept is admirably simple and it will be good for the language to gain it. In Rust, pinning connects to a bunch of gnarly concepts which Zig has declined to add to the type system; in my experience it’s the interaction of those things which makes Pin such a headache, not the fundamental idea of specifying that data must remain where it’s put in order to be useful.

5 Likes

I think a combination of both pinned structs and the safety checks for pointer casting is necessary.

The reason this example does not print anything (and doesn’t segfault), is because I never exceeded the buffer, so drain() was never called in my std.json.Stringify.value() call.

The pinned structs proposal will cause a compile error on the copy, and the safety checks for pointer casting will cause a runtime panic in debug mode if my std.json.Stringify.value() exceeded the capacity of the buffer and called drain.

Forgive me if it’s not pertinent, but wouldn’t it be simpler to consider it a problem of the standard library, rather than of the language?

If you use @fieldParentPtr yourself, you know where the troubles lie, but if it’s in std, you have to read std, which also changes across versions.

By reading I'm too dumb for Zig's new IO interface, I noticed that in std.net there is actually a interface method

            pub fn interface(r: *Reader) *Io.Reader {
                return &r.interface_state;
            }

By introducing the practice of getting interfaces by methods and not directly you’d bypass the problem.

Moreover, if std exposes methods to access the interfaces, then the implementation could be changed without breaking existing code, because the methods could be modified to adapt to the changes.

So in std.fs.File, interface could become io_reader, and the method would be

    pub fn interface(r: *Reader) *std.Io.Reader {
        return &r.io_reader;
    }

std wouldn’t even need to change its code, since these methods would be made for the users. Honestly I think it’s a bit hard to require everybody to study std to avoid pitfalls.

It’s basically what I suggested here without the pseudo-private field.

Again, forgive me if not pertinent.

3 Likes

I maintain that this issue isn’t related to @fieldParentPtr() itself, as attempting to copy a substructure is generally an error, regardless of whether the substructure uses @fieldParentPtr() to access the parent structure. A substructure can only be safely copied in the very limited case where all its members are pointers.

However, I agree that adding an inline interface() API to obtain an interface pointer is preferable to requiring users to directly manipulate internal structures. This is especially true given the inconsistent naming of different Writer interfaces— std.Io.Writer.Allocating, std.Io.Writer.Discarding, std.Io.Writer.Hashed, and std.Io.Writer.Hashing all have writer as their interface name, while std.fs.File.Writer has interface as its interface name. This inconsistent naming is indeed confusing.

The interface() API doesn’t always work. For example, std.Io.Writer.fixed() doesn’t obtain a parent structure. However, this could be achieved by simply adding a wrapper structure with no fields other than writer to align usage with other writer types.

In general, the API wrapper could be further redesigned to make its usage more consistent.

EDIT: Actually, allowing users to manipulate interfaces directly does have some advantages, namely flexibility. For example, a parent structure can contain any number of Writers. Of course, I don’t have any real-world examples demonstrating this flexibility is actually useful.
Regarding a consistent interface() interface, I believe the standard library’s role is simply to establish a “convention” to help users establish a consistent user experience. Therefore, even if interface() is designed, it doesn’t necessarily limit the expressive power of multiple Writers.

I’m in the opposite camp, most structs should be trivially copyable and passed around by value - such structs often don’t have a heap location in the first place, they often only exist on the stack. This also means that such structs should be self-contained, not have any references to external data and are fairly small.

The other kind of structs which lives on the heap, may have references to external data, needs special care when copying or moving opens up a whole can of worms and before you know you it you either invented C++ RAII or Rust’s strict ownership model. At the least I would make such special objects module-private and only hand out opaque handles to the outside world which don’t allow direct access from outside the module (e.g. GitHub - zig-gamedev/zpool: Generic pool & handle implementation for Zig.), the big downside is though that you lose the obj.method() syntax, unless such handles become a language feature.

4 Likes

It is true that saying “most structures should not be copied” is a bit arbitrary, but I think “most substructures within structures should not be copied” is basically correct.

2 Likes

I am in that camp too.

1 Like

Also arguable tbh - although it may depend on the direction of the copy (copying into the struct is more common than copying out).

It often is useful to compose structs from smaller nested structs so that those nested structs can be set with a single assignment. For instance in sokol-gfx I have a render pass struct which is composed from three nested structs: a ‘pass action’, an ‘attachments struct’ and a ‘swapchain struct’, and then you can have helper functions which return a pass action or a swapchain struct, e.g.:

sg.beginPass(.{
    .pass_action = getPassAction(),
    .swapchain = getSwapChain(),
});

…but even copying out maybe useful e.g. you might have another composite struct which is a pass-action and attachments, but not swapchain, and this could look like this:

const passActionAndAttachments = ...;
sg.beginPass(.{
    .pass_action = passActionAndAttachments.pass_action,
    .attachments = passActionAndAttachments.attachments,
});

…e.g. long story short, seeing structs not just as a flat collection of data items, but as ‘composable data building blocks’ which are built from nested structs that can be populated with a few struct assignments instead of copying lots of individual items allows to create (for lack of a better name) ‘data construction lego kits’) - think of functions that return little struct building blocks which are then used to populate bigger structs, and those bigger structs in turn are inputs to other functions. This allows to build very expressive yet small APIs, it’s quite similar to pluggable data pipes or ‘visual programming node editors’, just as code.

PS: what’s maybe notable is that this kind of struct usage is very ‘functional’, e.g. those structs don’t hold mutable state, they are only shortlived immutable data containers, their existence starts as a function result and ends when they are assigned to another struct or as direct input into a function.

I feel like when we discuss structures, we’re actually discussing something fundamentally different.
For example, sokol-gfx, in my understanding, is “data itself,” similar to an extension of basic types like u8 and i64, but providing richer semantics beyond the basic data.

Other times, structures are used as “self-contained records of behavioral state,” such as Writer, which records the state of a writer. When it’s copied, the state is effectively recorded in the copy. However, the old copy, unable to keep up with the state record, is effectively invalid. If the old copy isn’t destroyed, using it is an error. Using copies of its substructures is even more often an error: you update a small portion of its state, but the main structure can’t track the updated state.

This usage scenario is independent of whether it contains external pointers. As long as a completely self-contained structure involves recording behavioral state, copying parts of it is problematic.

Yeah, there seem to be two main scenarios:

  1. structs as long-lived, mutable state holders
  2. structs as short-lived, immutable ‘data pipes’

Type (1) usually lives on the heap and should not be copied, type (2) usually lives on the stack and its entire purpose in life is to be copied :slight_smile:

2 Likes

If type 1 does not involve variable length and cross-thread life cycle, I prefer to put it on the stack, because allocating memory on the heap always consumes more performance, and access is more likely to cause cache misses. But the use of type 1 structures should be based on pointers. When designing an API for a structure, the function signature of the API reflects their difference. If the type of self is designed as *Self, it is type 1, and if it is designed as Self, it is sometimes type 2, but often it is actually type 3: this structure is actually used purely as a pointer, with only a wrapper added.

1 Like

From a more theoretical point of view, one could say that a substructure which depends on its parent structure is an anti-pattern.
In many cases, it should not matter if copying or referencing it. In game programming, a structure for an object might consist of a position and a velocity, which of course can be copied independently, and in case of a 2D game, are small enough to not even waste a thought about using a pointer instead of copying.

But Zig is different in many ways .

I think once the compiler prevents this kind of issue, we don’t need to talk about this any longer.

In hindsight, the new I/O should have been introduced together with error prevention support from the compiler. Afterwards you’re always wiser, as we say in Germany.

Regarding this issue, I think it’s relatively unrelated to the “child structure’s dependence on the parent structure”—even if only the parent structure depends on the child structure, and the child structure’s updates have no dependency on the parent structure, copying the child structure is still an error—the parent structure loses track of the child structure’s state updates.
As for the scenarios you mentioned where it’s safe to copy parts of the parent structure, they all have one thing in common: they are read-only. You can copy the position and velocity of a parent structure and use them as parameters for other procedures. However, once you want to modify them, you can’t copy them out and modify them. You must modify the position and velocity in the parent structure, not a copy.
So, when you copy a member of a parent structure as var, alarm bells should ring; doing so is always wrong.

Taken together, perhaps a more comprehensive conclusion is this:

A const struct is safe to copy, any part of it can be copied.

Any part of a var struct can be copied as a const. Read-only access to a part of it is fine, as long as it doesn’t involve state changes.

No part of a var struct should be copied as a var. Modifying any part of a mutable struct should be done directly on the struct itself; modifying a copy of any part of it is likely an error.

If these rules are followed, errors are unlikely.
For substruct APIs that use their parent struct, regardless of whether they involve state changes, self must be a mutable pointer, not a const pointer. This, combined with the previously mentioned copy restrictions on var structs, ensures that usage of structs involving @fieldParentPtr() will not fail due to incorrect copies.

1 Like