Vtable interfaces and the role of @ptrCast and @alignCast

What is the current best practice for runtime polymorphism based on a vtable?

I guess this is a pretty common question, so I tried to do my homework before posting here:

First of all, I read the excellent article on interfaces based on tagged unions (Easy Interfaces with Zig 0.10.0 - Zig NEWS). This looks great and I’ll use it as much as possible.

But let’s say I actually need a vtable.
There’s a series of blogposts, I think the most up-to-date is this one: Zig Interfaces for the Uninitiated, an update - Zig NEWS
I’ve taken the code snippets from the blogpost and put them together in zig_runtime_polymorphism/src/blogpost.zig at main · lhk/zig_runtime_polymorphism · GitHub, but that doesn’t compile (maybe because I’m on Zig 0.12.0-dev.926+3be8490d8).

So I decided to take a look at the standard library and see how you do it there.
I’ve tried to replicate the pattern of mem.Allocator and implemented the example from the blogpost with it (an Iterator interface and a Range implementation).

Is this how you would implement an interface, or am I missing something?
The interface definition:

const Iterator = @This();

ptr: *anyopaque,
vtable: *const VTable,

pub const VTable = struct {
    next: *const fn (ctx: *anyopaque) ?u32,
};

pub fn next(self: Iterator) ?u32 {
    return self.vtable.next(self.ptr);
}

A concrete implementation:

const Iterator = @import("interface_definition.zig");

// implementation of an interface, update to https://zig.news/kilianvounckx/zig-interfaces-for-the-uninitiated-an-update-4gf1
const Range = struct {
    const Self = @This();

    start: u32 = 0,
    end: u32,
    step: u32 = 1,

    pub fn next(ptr: *anyopaque) ?u32 {
        const self: *Self = @ptrCast(@alignCast(ptr));
        if (self.start >= self.end) return null;
        const result = self.start;
        self.start += self.step;
        return result;
    }

    pub fn iterator(self: *Self) Iterator {
        return .{ .ptr = self, .vtable = &.{ .next = next } };
    }
};

The full code, with tests can be found here: zig_runtime_polymorphism/src at main · lhk/zig_runtime_polymorphism · GitHub
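For reference, the two snippets above can be combined into one file that `zig test` can run. This is only a minimal sketch, not the code from the linked repository: the interface is written as a named struct instead of a file-level @This(), and the vtable lives in a named constant. Both are assumptions made to keep the example self-contained.

```zig
const std = @import("std");

// Interface: a type-erased pointer plus a pointer to a table of functions.
const Iterator = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        next: *const fn (ctx: *anyopaque) ?u32,
    };

    fn next(self: Iterator) ?u32 {
        return self.vtable.next(self.ptr);
    }
};

const Range = struct {
    start: u32 = 0,
    end: u32,
    step: u32 = 1,

    // One table per implementing type, stored as a constant (assumption:
    // a named constant instead of the `&.{ ... }` literal used above).
    const vtable = Iterator.VTable{ .next = next };

    fn next(ptr: *anyopaque) ?u32 {
        // Recover the concrete type from the type-erased pointer.
        const self: *Range = @ptrCast(@alignCast(ptr));
        if (self.start >= self.end) return null;
        const result = self.start;
        self.start += self.step;
        return result;
    }

    fn iterator(self: *Range) Iterator {
        return .{ .ptr = self, .vtable = &vtable };
    }
};

test "Range through the Iterator interface" {
    var range = Range{ .end = 5, .step = 2 };
    const it = range.iterator();
    try std.testing.expectEqual(@as(?u32, 0), it.next());
    try std.testing.expectEqual(@as(?u32, 2), it.next());
    try std.testing.expectEqual(@as(?u32, 4), it.next());
    try std.testing.expectEqual(@as(?u32, null), it.next());
}
```

The caller only ever sees an Iterator value, so any other type that fills in the same table can be passed in its place.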

Finally, I don’t understand what @ptrCast and @alignCast are doing.
Do they perform any runtime checks? If yes, isn’t this interface pattern inefficient, because it does the type checks every time next is called?
If there is indeed some form of type checking, could you point me to a documentation/resource where I can learn more about how it does that?

But my intuition is that @ptrCast and @alignCast just tell the compiler “you’re getting a pointer to an instance of Range. Now you know how to resolve the next function”.
In this case, why is there a separate @alignCast on top of the @ptrCast? Shouldn’t it be enough to tell the compiler “this is a pointer to Range”? How could such a pointer have an alignment which is different from the alignment of a Range struct?

Overall, my understanding of alignment is very rudimentary. Basically I just thought that this is something to be aware of when laying out a struct, in order not to waste memory. The idea to do an @alignCast is confusing to me, I always thought that alignment is just what it is, i.e. there’s a 1:1 mapping between data structure and alignment. Having the type of a pointer should determine the type of the data it points to, which then should also determine the alignment. Therefore, I’m confused to see both a @ptrCast and an @alignCast. I’d expect the alignCast to be redundant. If there’s a good writeup on this, I’d be very interested in it.

Is this how you would implement an interface, or am I missing something?

Looks right to me. There is one thing you can do to improve the ergonomics: Instead of writing the pointer casts everywhere, you can generate them at compile-time:

pub const VTable = struct {
    next: *const fn (ctx: *anyopaque) ?u32,
    pub fn init(Type: type) *const VTable {
        return &.{
            .next = &struct {
                fn fun(ctx: *anyopaque) ?u32 {
                    const self: *Type = @ptrCast(@alignCast(ctx));
                    return self.next();
                }
            }.fun,
        };
    }
};
...
// Now you don't need to cast in range anymore:
pub fn next(self: *Self) ?u32 {...}
pub fn iterator(self: *Self) Iterator {
    return .{ .ptr = self, .vtable = VTable.init(Self) };
}

why is there a separate @alignCast on top of the @ptrCast?

It reminds you that alignment is something you need to think about before doing pointer casts. It is irrelevant here, since you always get out the same pointer that you put in, but forgetting about alignment can cause problems in other cases, such as casting to @Vector types.

there’s a 1:1 mapping between data structure and alignment

There is a difference between struct alignment and pointer alignment: You can manually specify a pointer alignment like *align(1) Range. This is useful in some cases, but reading and writing to underaligned pointers will likely be slower.
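A small sketch of how the pointer type carries alignment, assuming the Range struct from the question. A *anyopaque promises no more than 1-byte alignment, which is why getting back to *Range needs @alignCast in addition to @ptrCast. The variable names are made up for illustration.

```zig
const std = @import("std");

const Range = struct {
    start: u32 = 0,
    end: u32,
    step: u32 = 1,
};

test "alignment is part of the pointer type" {
    var range = Range{ .end = 10 };

    // Erasing the type also drops the alignment guarantee of *Range.
    const erased: *anyopaque = &range;

    // @ptrCast changes the pointee type; @alignCast asserts that the
    // address really has the alignment *Range requires.
    const restored: *Range = @ptrCast(@alignCast(erased));
    try std.testing.expect(restored == &range);

    // Lowering the alignment is an implicit coercion...
    const loose: *align(1) Range = &range;
    try std.testing.expectEqual(@as(u32, 10), loose.end);

    // ...but raising it again has to be requested explicitly.
    const strict: *Range = @alignCast(loose);
    try std.testing.expect(strict == &range);
}
```

In the interface pattern the address always originates from a real *Range, so the assertion cannot fail there. It exists because the compiler cannot see that from the *anyopaque type alone.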

Do they perform any runtime checks?

@alignCast does perform runtime checks, to verify that the alignment matches.
However, keep in mind that these are only enabled in Debug and ReleaseSafe, so they won’t slow down ReleaseFast or ReleaseSmall builds. But these checks will help you catch bugs early when running in debug.

Note that you can also disable runtime checks inside a scope with @setRuntimeSafety(false). But I would recommend doing this only when you are absolutely sure that everything works correctly and have shown in a benchmark that these runtime checks are indeed a performance problem.
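As a rough illustration of that scoped opt-out, here is a hypothetical nextUnchecked variant of the Range function from the question (the name is made up). The builtin only affects the block it appears in.

```zig
const std = @import("std");

const Range = struct {
    start: u32 = 0,
    end: u32,
    step: u32 = 1,

    // Hypothetical variant of `next` with safety checks switched off for
    // this function body only, regardless of the build mode.
    fn nextUnchecked(ptr: *anyopaque) ?u32 {
        @setRuntimeSafety(false);
        // No alignment check is emitted for this cast.
        const self: *Range = @ptrCast(@alignCast(ptr));
        if (self.start >= self.end) return null;
        const result = self.start;
        // The addition is unchecked too: overflow here is undefined behavior.
        self.start += self.step;
        return result;
    }
};

test "nextUnchecked returns the same values for valid input" {
    var range = Range{ .end = 2 };
    try std.testing.expectEqual(@as(?u32, 0), Range.nextUnchecked(&range));
    try std.testing.expectEqual(@as(?u32, 1), Range.nextUnchecked(&range));
    try std.testing.expectEqual(@as(?u32, null), Range.nextUnchecked(&range));
}
```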

Thank you for the explanations.

It’s not clear to me how your code does the type checking at compile time. As far as I understand, the new function fun is executed every time next is called, doesn’t that mean runtime calls to the casts?

Also, I’m not clear on the scoping rules that apply here: doesn’t this create a dangling pointer to a stack variable? The function fun is defined in a struct literal, which should go out of scope when the call to init is done.

I believe this is an instance of what Zig calls static local variables.

Well first of all it seems I forgot to put the comptime keyword in the function declaration:

    pub fn init(comptime Type: type) *const VTable {
        return &.{
            .next = &struct {
                fn fun(ctx: *anyopaque) ?u32 {
                    const self: *Type = @ptrCast(@alignCast(ctx));
                    return self.next();
                }
            }.fun,
        };
    }

Now there is one thing you need to know about the compiler:
Every expression that only depends on compile-time values is executed at compile-time (with the exception of most function calls).
So if you have something like 1+2, then that will be evaluated to 3 at compile-time.
Similarly &fun, taking the address of a function (since all functions must be known at compile-time), is executed at compile-time.
So creating the VTable struct .{.next = &fun} can also be done at compile-time.
So &.{.next = &fun} is taking the address of a compile-time known value.
So it isn’t even on the stack. Instead the compiler will actually put it into the binary.

So the init function basically just returns a constant.
Now, in debug mode this still results in a function call. You can get rid of that by forcing the call to happen at compile-time or by marking the function as inline:
.vtable = comptime VTable.init(Self)
or pub inline fn init(...)
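Putting the pieces together, a self-contained sketch of the generated vtable might look like the following. Two things are assumptions of this sketch and not part of the reply: the table is stored in a named constant inside a helper struct (here called gen) instead of the `&.{ ... }` literal, and the interface is a named struct so that everything fits in one file.

```zig
const std = @import("std");

const Iterator = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        next: *const fn (ctx: *anyopaque) ?u32,

        // Builds one table per concrete type. Type must be comptime-known.
        fn init(comptime Type: type) *const VTable {
            const gen = struct {
                // Wrapper holding the casts, so implementations stay typed.
                fn fun(ctx: *anyopaque) ?u32 {
                    const self: *Type = @ptrCast(@alignCast(ctx));
                    return self.next();
                }
                // Container-level constant: stored in the binary, not on
                // the stack, so returning its address is fine.
                const table = VTable{ .next = fun };
            };
            return &gen.table;
        }
    };

    fn next(self: Iterator) ?u32 {
        return self.vtable.next(self.ptr);
    }
};

const Range = struct {
    start: u32 = 0,
    end: u32,
    step: u32 = 1,

    // No casts needed here any more.
    fn next(self: *Range) ?u32 {
        if (self.start >= self.end) return null;
        const result = self.start;
        self.start += self.step;
        return result;
    }

    fn iterator(self: *Range) Iterator {
        return .{ .ptr = self, .vtable = comptime Iterator.VTable.init(Range) };
    }
};

test "every Range shares one generated vtable" {
    var a = Range{ .end = 3 };
    var b = Range{ .start = 10, .end = 12 };
    const ia = a.iterator();
    const ib = b.iterator();
    try std.testing.expect(ia.vtable == ib.vtable);
    try std.testing.expectEqual(@as(?u32, 0), ia.next());
    try std.testing.expectEqual(@as(?u32, 10), ib.next());
}
```

The casts inside fun still run on every call to next. Only the construction of the table happens at compile time.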

@dude_the_builder thank you, this is good to be aware of. I guess I should read through the documentation carefully, at least to have these concepts on the radar.

@IntegratedQuantum thank you so much for all the detailed explanations. This is really incredibly helpful.

I still don’t understand how your code can do the casts at compile time. As far as I see, the reference to fun is resolved at compile time, and next ends up pointing at a constant. But the way I read the code, this constant still contains the @ptrCast and @alignCast. That code is not executed until next (=fun) is actually called. So aren’t there runtime checks here, too? Which can be avoided by compiling in release mode, which just removes them but still doesn’t execute them at compile time.

The change I proposed was only for ergonomics, so you don’t need to manually write these casts everywhere. The runtime checks are still there.

Should those be align casts or asserts? If the alignment is off, then you aren’t getting a pointer to the thing you think you are, and it seems like it would blow up (in a much more confusing way) on a dereference. When would there be a valid case where an align cast needs to adjust the pointer, and would it do so properly?

Sorry for bumping an old thread. Since in this case the function address is known at comptime and stored in a const, would it be possible for the Zig compiler to elide the call-via-pointer and make a call-by-function-body instead? Calling a function through a pointer blocks optimization opportunities and causes an extra pointer dereference. Calling by function body is better for performance.

That is absolutely possible. I would bet that the compiler doesn’t even differentiate behavior between comptime function pointers and “function bodies”.

Instead of wrapping every function in the vtable, you can just cast the function pointer, as long as the signatures match apart from the pointer type: e.g. *const fn (*T) ?u32 can be cast to *const fn (*anyopaque) ?u32.
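A sketch of that variant, under the assumption that @ptrCast accepts the function-pointer conversion in your Zig version and that both signatures differ only in the pointee type of the first parameter. Note that this skips the wrapper and with it the alignment check of @alignCast.

```zig
const std = @import("std");

const Iterator = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        next: *const fn (ctx: *anyopaque) ?u32,
    };

    fn next(self: Iterator) ?u32 {
        return self.vtable.next(self.ptr);
    }
};

const Range = struct {
    start: u32 = 0,
    end: u32,
    step: u32 = 1,

    // Typed method: no casts in the body.
    fn next(self: *Range) ?u32 {
        if (self.start >= self.end) return null;
        const result = self.start;
        self.start += self.step;
        return result;
    }

    // The typed function pointer is reinterpreted as the type-erased one.
    const vtable = Iterator.VTable{ .next = @ptrCast(&Range.next) };

    fn iterator(self: *Range) Iterator {
        return .{ .ptr = self, .vtable = &vtable };
    }
};

test "vtable entry created by casting the function pointer" {
    var range = Range{ .end = 2 };
    const it = range.iterator();
    try std.testing.expectEqual(@as(?u32, 0), it.next());
    try std.testing.expectEqual(@as(?u32, 1), it.next());
    try std.testing.expectEqual(@as(?u32, null), it.next());
}
```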