Zig: Why does removing the type field from a vtable-based struct changes the behaviour

Hello,

I’ve noticed this difference of behaviour and would like to get explanations from seasoned Zig devs.

With the following code

const std = @import("std");

pub const Scheduler = struct {
    type: type,
    ptr: *anyopaque,
    vtab: *const struct {
        start: *const fn (ptr: *anyopaque) void,
        stop: *const fn (ptr: *anyopaque) void,
    },

    pub fn start(self: Scheduler) void {
        self.vtab.start(self.ptr);
    }

    pub fn stop(self: Scheduler) void {
        self.vtab.stop(self.ptr);
    }

    fn init(obj: anytype) Scheduler {
        const object_type = @TypeOf(obj);

        const impl = struct {
            fn start(ptr: *anyopaque) void {
                const self: *object_type = @ptrCast(@alignCast(ptr));
                self.start();
            }
            fn stop(ptr: *anyopaque) void {
                const self: *object_type = @ptrCast(@alignCast(ptr));
                self.stop();
            }
        };
        return .{
            .type = object_type,
            .ptr = @constCast(&obj),
            .vtab = &.{
                .start = impl.start,
                .stop = impl.stop,
            },
        };
    }

    pub fn get_default() Scheduler {
        const implementation = DefaultScheduler.init();
        return Scheduler.init(implementation);
    }
};

const DefaultScheduler = struct {
    name: []const u8,

    pub fn start(self: DefaultScheduler) void {
        std.debug.print("Hidden start implementation: {s} \n", .{self.name});
    }

    pub fn stop(self: DefaultScheduler) void {
        const n = self.name;
        std.debug.print("Hidden stop implementation: {s} \n", .{n});
    }

    fn init() DefaultScheduler {
        return .{ .name = "DefaultScheduler" };
    }
};

I have the following output

Hidden start implementation: DefaultScheduler 
Hidden stop implementation: DefaultScheduler 

However, if I remove the field type in Scheduler struct; as shown below

const std = @import("std");

pub const Scheduler = struct {
    ptr: *anyopaque,
    vtab: *const struct {
        start: *const fn (ptr: *anyopaque) void,
        stop: *const fn (ptr: *anyopaque) void,
    },

    pub fn start(self: Scheduler) void {
        self.vtab.start(self.ptr);
    }

    pub fn stop(self: Scheduler) void {
        self.vtab.stop(self.ptr);
    }

    fn init(obj: anytype) Scheduler {
        const object_type = @TypeOf(obj);

        const impl = struct {
            fn start(ptr: *anyopaque) void {
                const self: *object_type = @ptrCast(@alignCast(ptr));
                self.start();
            }
            fn stop(ptr: *anyopaque) void {
                const self: *object_type = @ptrCast(@alignCast(ptr));
                self.stop();
            }
        };
        return .{
            .ptr = @constCast(&obj),
            .vtab = &.{
                .start = impl.start,
                .stop = impl.stop,
            },
        };
    }

    pub fn get_default() Scheduler {
        const implementation = DefaultScheduler.init();
        return Scheduler.init(implementation);
    }
};

const DefaultScheduler = struct {
    name: []const u8,

    pub fn start(self: DefaultScheduler) void {
        std.debug.print("Hidden start implementation: {s} \n", .{self.name});
    }

    pub fn stop(self: DefaultScheduler) void {
        const n = self.name;
        std.debug.print("Hidden stop implementation: {s} \n", .{n});
    }

    fn init() DefaultScheduler {
        return .{ .name = "DefaultScheduler" };
    }
};

I’ve the following behaviour in the console

Hidden start implementation: Hidden stop implementation: 
1 Like

Hi @LoicCode, welcome to Ziggit!

I tried your second code example and depending on the Zig version I get different results.
On 0.15.2, I get a segfault. On 0.16.0-dev.1976+8e091047b it runs fine.
I think you are running into some illegal behavior that just "happenss’ to work fine depending on struct sizes and the stack.
That being said, I don’t think you want to go down this route.

This is probably part that is causing the issue. You stack allocate an implementation and the return a value that points at the stack. This becomes invalid memory after get_default returns.

Typically (in the std lib anyway), you want your concrete implementation to create the interface, not the other way around. So you want your DefaultScheduler to have a method that returns a Scheduler interface, not the other way around. The Scheduler interface needs an internal pointer, and you can’t get it from a function. You could, however, use a global constant for the scheduler for the default. This has it’s own caveats for use (may not be thread safe).

Here’s how I would try to rewrite it:

const std = @import("std");

pub const Scheduler = struct {
    ptr: *anyopaque,
    vtab: *const struct {
        start: *const fn (ptr: *anyopaque) void,
        stop: *const fn (ptr: *anyopaque) void,
    },
 

    pub fn start(self: Scheduler) void {
        self.vtab.start(self.ptr);
    }

    pub fn stop(self: Scheduler) void {
        self.vtab.stop(self.ptr);
    }
};

const DefaultScheduler = struct {
    name: []const u8,

    pub fn start(ptr: *anyopaque) void {
        const self: *DefaultScheduler = @ptrCast(@alignCast(ptr));
        std.debug.print("Hidden start implementation: {s} \n", .{self.name});
    }

    pub fn stop(ptr: *anyopaque) void {
        const self: *DefaultScheduler = @ptrCast(@alignCast(ptr));
        const n = self.name;
        std.debug.print("Hidden stop implementation: {s} \n", .{n});
    }

    fn init() DefaultScheduler {
        return .{ .name = "DefaultScheduler" };
    }

    pub fn scheduler(self: *DefaultScheduler) Scheduler {
        return .{
            .ptr = self,
            .vtab = &.{
                .start = start,
                .stop = stop,
            },
        };
    }
};

pub fn main() void {
    var default = DefaultScheduler.init();
    const sched = default.scheduler();
    sched.start();
    sched.stop();
}

You’ll notice that the Default scheduler knows how to conform to the scheduler interface. You’ll see this pattern in the allocators (DebugAllocator.allocator) and readers and writers (File.reader).

5 Likes

I want to add that type is only allowed at comptime, if you somehow get it at runtime it is a bug and should be reported if not already.

You shouldn’t need to know the type that is implementing your interface, but if you do for some reason, you might prefer using a tagged union instead or some other system to associate values with types.

2 Likes

Thank you for sharing your input and explanations :slight_smile:

But in my scenario, how would be possible to have the DefaultScheduler as a hidden implementation?

I did not do anything special than running this code using zig 0.15.2 on macOS.

I am not sure if it is a valid bug

if you used var it is, var is always runtime unless otherwise forced with the comptime keyword.

1 Like

Something like this maybe?

const std = @import("std");

pub const Scheduler = struct {
    ptr: *anyopaque,
    vtab: *const struct {
        start: *const fn (ptr: *anyopaque) void,
        stop: *const fn (ptr: *anyopaque) void,
    },

    pub fn start(self: Scheduler) void {
        self.vtab.start(self.ptr);
    }

    pub fn stop(self: Scheduler) void {
        self.vtab.stop(self.ptr);
    }

    fn init(obj: anytype) Scheduler {
        const object_type = @TypeOf(obj);

        const impl = struct {
            fn start(ptr: *anyopaque) void {
                const self: *object_type = @ptrCast(@alignCast(ptr));
                self.start();
            }
            fn stop(ptr: *anyopaque) void {
                const self: *object_type = @ptrCast(@alignCast(ptr));
                self.stop();
            }
        };
        return .{
            .ptr = @constCast(&obj),
            .vtab = &.{
                .start = impl.start,
                .stop = impl.stop,
            },
        };
    }

    const default: Scheduler = .init(DefaultScheduler{});
};

const DefaultScheduler = struct {
    name: []const u8 = "DefaultScheduler",

    pub fn start(self: DefaultScheduler) void {
        std.debug.print("Hidden start implementation: {s} \n", .{self.name});
    }

    pub fn stop(self: DefaultScheduler) void {
        std.debug.print("Hidden stop implementation: {s} \n", .{self.name});
    }
};

pub fn main() void {
    var sched = Scheduler.default;
    sched.start();
    sched.stop();
}

This also should work if you want to keep it ‘hidden’ as you say (which I interpretate as internal to Scheduler).

const std = @import("std");

pub const Scheduler = struct {
    ptr: *anyopaque,
    vtab: *const struct {
        start: *const fn (ptr: *anyopaque) void,
        stop: *const fn (ptr: *anyopaque) void,
    },

    pub fn start(self: Scheduler) void {
        self.vtab.start(self.ptr);
    }

    pub fn stop(self: Scheduler) void {
        self.vtab.stop(self.ptr);
    }

    fn init(obj: anytype) Scheduler {
        const object_type = @TypeOf(obj);

        const impl = struct {
            fn start(ptr: *anyopaque) void {
                const self: *object_type = @ptrCast(@alignCast(ptr));
                self.start();
            }
            fn stop(ptr: *anyopaque) void {
                const self: *object_type = @ptrCast(@alignCast(ptr));
                self.stop();
            }
        };
        return .{
            .ptr = @constCast(&obj),
            .vtab = &.{
                .start = impl.start,
                .stop = impl.stop,
            },
        };
    }

    const default: Scheduler = .init(struct {
        name: []const u8 = "DefaultScheduler",

        fn start(self: @This()) void {
            std.debug.print("Hidden start implementation: {s} \n", .{self.name});
        }

        fn stop(self: @This()) void {
            std.debug.print("Hidden stop implementation: {s} \n", .{self.name});
        }
    }{});
};

pub fn main() void {
    var sched = Scheduler.default;
    sched.start();
    sched.stop();
}

Thank you for sharing this. I was thinking about it but one downside of this approach is that it prevents the user to pass a custom allocator or a configuration file.

Maybe it is my background in Java programming which influences me. Since you are allowed to have package scope Class so you can swap/have a very complexe implementation which is not exposed to the users.

The pub identifier gives you the same result. That why I’ve followed this approach.

That would be great if it was feasible.

What I am finding difficult to understand is why by removing adding/removing the type field in the Scheduler struct gives me 2 different behaviours, which compile using Zig 0.15.2 on macOS.

It is also weird that the code gives different result, based on @Southporter comment: Zig: Why does removing the type field from a vtable-based struct changes the behaviour - #2 by Southporter

@Southporter told you what was wrong, what you did causes undefined behavior, it’s called like that for a reason, you could have had different results, and the first version wasn’t more correct than the first one only because it ‘happened’ to work.

That ‘default’ scheduler serves no purpose as far as I can see, I just readapted your code to avoid the problem with that buggy get_default (and btw you don’t need init() everywhere, you can just assign a struct with a literal initializer if there’s no real work to do in init).

1 Like

As @mg979 said, undefined behaviour is undefined.

But to give a more useful answer, if you did smuggle type into runtime it could have a size and alignment that would affect the stack layout changing data you read from your dangling pointer.

I can’t say exactly what happened, but that’s my guess, even though type allegedly has a size of 0 and alignment of 1 according to built ins @size/alignOf, that is certainly not accurate as type does have data.

1 Like

Similar to the others who have commented, I wouldn’t write it like this. I would define an interface, and use @fieldParentPtr() similar to std.Io.File.Reader/Writer works. a “blind” ptrCast like this is hard to prove correct. Case it point, you wrote pub fn start(self: DefaultScheduler) void which assuming you’re set on writing it like this, should be *const DefaultScheduler Because as written the compiler gets decide if it calls by ref, or by value.

I would make the type in init explicit init(Type: type, obj: *anyopaque)

but I’d rather write something like

const DefaultScheduler = struct {
    name: []const u8 = "DefaultScheduler",
    scheduler: Scheduler,

    pub fn start(sch: *Scheduler) void {
        const self: *DefaultScheduler = @fieldParentPtr("scheduler", sch);
        std.debug.print("Hidden start implementation: {s} \n", .{self.name});
    }

    pub fn stop(sch: *Scheduler) void {
        const self: *DefaultScheduler = @fieldParentPtr("scheduler", sch);
        std.debug.print("Hidden stop implementation: {s} \n", .{self.name});
    }
};

don’t forget start() and stop() in the Scheduler needs to be a ptr as well.

As to why it worked before, or in different versions? I’m assuming you got lucky and the compiler decided to emit the functions as *const DefaultScheduler instead of DefaultScheduler.

the method syntax may auto deref the self parameter, regardless of if PRO was applied it still works semantically.

The reason it worked before was the undefined behaviour happened to do the expected thing.

I would not encourage this style of interface unless you have size constraints on the interface object, or you want to better facilitate implementations interacting with interface state and extending its functionality. Those are the reasons reader/writer use this style.

Are there performance implications in using this kind of interface, generally speaking, compared to using other kinds of composition (like the one suggested by @Southporter)?

Maybe this happened because the data was still on the stack (even if not legitimately addressable) and wasn’t overwritten by some other data? If my reasoning is correct, the outcome in these (UB) cases depends on whether you push more stuff on the stack which would overwrite old data (but we don’t know in this case because OP didn’t post their main function)? That would explain why they got the ‘expected’ result in one occasion, but maybe they were manipulating differently the stack in the two instances?

LLVM has a harder time devirtualising it.
Non interface solutions tend to be better optimised.

But if you care you should be benchmarking in the first place, then you will know for a fact what is faster.

What @Southporter suggested, was a different style of interface, where you just keep a pointer to the implementation state as part of the interface object. This is what std.mem.Allocator does. I would recommend this over the intrusive approach you recommended, simply because it’s harder to lose your foot.

we dont know what their main/test was, but they do return references to stack memory in their original code

ptr is pointing to the obj parameter, and is even worse @constCasted. Which is fine if you know the implementation, but init does not.

The pointer to the implementation state is dangling.

Also, my talk about the smuggling of type into runtime, is just not what the problem is, even if that is a bug present in the code.

2 Likes

Hey, even though the nightly version is very unstable, I think you can try it on the 0.16 nightly version. I guess both versions would throw errors now; 0.15.2 compiles successfully but just wasn’t able to detect illegal behavior.

This is not how Zig expects things. This is a Java way of approaching it. It works in Java because Java :dizzy:automagically :dizzy: moves things to the heap when necessary.
Zig on the othe hand expects the developer to place it on the stack or the heap. This gives the user flexibility.

This is critical for your case. If your default scheduler needs any user passed inputs, then you want to allow them to create there own with these values and then get the scheduler interface from that.
You will want to provide the default implementation as a type that the user can create a value with and anyplace where you expect a scheduler, they can create it from the value.
This is how we expect things to work in Zig. This is how the standard library does it.

1 Like

We can look at godbolt and see the difference:
With type in Scheduler
Without type

If you look at example.main in the assembly window, you will see that with type, the compiler is able to remove a lot of the code and call functions directly.
The second seems to be what I was talking about. It calls the get_default function, gets a pointer to the stack as part of the value and crashes because that value is no longer valid.

After looking at it, it seems like including the type: type allows for comptime evaluation of the scheduler and moving some of the values into more globally accessible areas, which means that the first one behaves more as you would expect, while the other is illegal behavior for runtime values.

I’m still learning how to read assembly, so my description may be off. Others may be more knowledgeable and may be able to provide a better analysis.

Right, is that incompatible with what I said? How would the undefined behavior work otherwise?

I’m assuming the point is to create an abstract interface. Because otherwise Scheduler shouldn’t need a vtable at all. Given the constraints, what specific implementation would you suggest?

This is completely correct, and it may be worth expanding on what it means.

One may often create the appearance of types existing at runtime. For example, printing the name of a type:

std.debug.print("This is a {s}\n", .{@typeName(@TypeOf(thing))});

This will print the name of the type at runtime.

It is however, not a type, it’s more like, type residue. The name is all that’s left, by the time the program is compiled and runs. The actual program will be closer to this:

 // Note: function does not exist, it's an analogy
std.debug.writeAll("This is a ");
std.debug.writeAll("Thing");
std.debug.writeAll("\n");

If your code creates a situation where the extraction of ‘type residue’ requires runtime-known information in order to happen, that won’t compile.

It’s possible to create some very dynamic-appearing code through judicious use of type residue, so long as you grok what you’re doing. The limits of the technique take some exploration to find.

The reason I mention this is because the inclusion of a type field in the interface definition is exactly the kind of pattern which might look like the persistence of the type until runtime, but you’re going to keep bumping into the reality of it: it does not. Whether or not there’s value in doing things this way, or trying to, I really can’t say. My hunch is that it will fall down right precisely where you want it to work.

2 Likes