Function Call Devirtualization

I’ve been looking into the new Reader/Writer interfaces, which has led me to compare these two dynamic dispatch methods (both sketched below):

  1. A “fat pointer”: a context pointer plus a pointer to a single, static vtable (e.g. Allocator)
  2. A single pointer to a vtable stored in the interface struct, which is embedded in the implementation and recovered with @fieldParentPtr (e.g. the new Writer, or Allocator pre-0.9.0)
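
Concretely, the two shapes look something like this (a minimal sketch; FatInterface, EmbeddedInterface, and Impl are made-up names, not the actual std definitions):

```zig
// Approach 1: a "fat pointer" interface, two words wide.
// (Shaped like std.mem.Allocator, heavily simplified.)
const FatInterface = struct {
    ptr: *anyopaque, // type-erased pointer to the implementation
    vtable: *const VTable, // shared constant, one per implementation type

    const VTable = struct {
        doIt: *const fn (ptr: *anyopaque) void,
    };

    fn doIt(self: FatInterface) void {
        self.vtable.doIt(self.ptr);
    }
};

// Approach 2: the interface carries only a vtable pointer and is embedded
// in the implementation, which is recovered with @fieldParentPtr.
// (Shaped like the new std.Io.Writer, heavily simplified.)
const EmbeddedInterface = struct {
    vtable: *const VTable,

    const VTable = struct {
        doIt: *const fn (iface: *EmbeddedInterface) void,
    };

    fn doIt(self: *EmbeddedInterface) void {
        self.vtable.doIt(self);
    }
};

const Impl = struct {
    state: u32 = 0,
    interface: EmbeddedInterface = .{ .vtable = &vtable },

    const vtable: EmbeddedInterface.VTable = .{ .doIt = doIt };

    fn doIt(iface: *EmbeddedInterface) void {
        const impl: *Impl = @fieldParentPtr("interface", iface);
        impl.state += 1;
    }
};
```

In both cases the vtable itself is a shared constant object; the difference is where the pointer to it lives: next to the data pointer in approach 1, inside the implementation’s own memory in approach 2.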

This blog post, from when Allocator was changed from the latter style of interface to the former, explains the difference pretty well, as well as the performance reason that motivated the change.

According to this post, the difference in performance comes down to how optimizer-friendly each approach is with regard to devirtualization.

The post explains that approach 2 makes it impossible for the optimizer to devirtualize calls because the vtable lives in a runtime object and could be modified, so the compiler has to treat it like a black box. Approach 1 is supposedly not affected by this because:

when using fat pointers, the vtables are shared constant objects, and a new fat pointer is constructed on every call to gpa.allocator().
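
In std, that construction looks roughly like the following (a self-contained sketch; Gpa, the alloc signature, and the stub body are simplified stand-ins for the real std.mem.Allocator machinery):

```zig
const Allocator = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        alloc: *const fn (ctx: *anyopaque, len: usize) ?[*]u8,
    };
};

const Gpa = struct {
    state: usize = 0, // stand-in for the real allocator state

    // One shared, comptime-known constant vtable for the whole type.
    const vtable: Allocator.VTable = .{ .alloc = alloc };

    // A fresh two-word fat pointer is constructed on every call, but
    // .vtable always points at the same constant object.
    pub fn allocator(self: *Gpa) Allocator {
        return .{ .ptr = self, .vtable = &vtable };
    }

    fn alloc(ctx: *anyopaque, len: usize) ?[*]u8 {
        const self: *Gpa = @ptrCast(@alignCast(ctx));
        _ = self;
        _ = len;
        return null; // real implementation elided
    }
};
```

Here Gpa.vtable is a comptime constant; only .ptr varies per instance.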

This confuses me because the Allocator is also a runtime object, and even though the function pointers in the static vtable can’t change, the pointer to the vtable in the Allocator interface might! Why doesn’t this make devirtualization impossible? What’s the difference?


But this isn’t how Io.Reader/Io.Writer work. Instead, there we have a pointer to a concrete Reader/Writer instance and a few fields that are used in the hot path; once the buffer is full, they use their constant pointer to a constant vtable to deal with the full buffer, and after that they again work directly on concrete data. Devirtualization is much less of a concern because most of the operations aren’t virtual to begin with.

Here is an attempt at visualizing what is going on with *Io.Writer:

The buffer drain only happens if the buffer is too full to fit the data that is about to be written.
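
As a rough model of that hot path in code (hypothetical and heavily simplified; the real std.Io.Writer has more fields and a richer drain signature):

```zig
const Writer = struct {
    vtable: *const VTable, // constant pointer to a shared constant vtable
    buffer: []u8, // concrete fields, used directly in the hot path
    end: usize, // number of buffered bytes

    const VTable = struct {
        // Only consulted when the buffer can't take the incoming bytes.
        drain: *const fn (w: *Writer, data: []const u8) anyerror!void,
    };

    fn write(w: *Writer, data: []const u8) !void {
        if (data.len <= w.buffer.len - w.end) {
            // Hot path: a plain copy, no virtual call at all.
            @memcpy(w.buffer[w.end..][0..data.len], data);
            w.end += data.len;
        } else {
            // Cold path: flush through the vtable.
            try w.vtable.drain(w, data);
        }
    }
};
```

Since most writes take the memcpy branch, there is simply no virtual call to devirtualize on the common path.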


this proposal will dramatically improve Zig’s de-virtualisation


You’re right, I misunderstood. For some reason I thought the vtable was what lived inside the implementation, but the vtable is static with both Io.Writer and Allocator. The only difference is how the implementation pointer is accessed, either through @fieldParentPtr(interface) or a type-erased pointer stored in the interface.

The old, pre-0.9.0 Allocator actually seems to have stored the function pointers directly rather than a vtable pointer. I don’t see why this would affect optimization, but it’s interesting.
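
From memory, that old shape was roughly the following (simplified signatures, written with modern function-pointer syntax):

```zig
// Pre-0.9.0 style: the function pointers live directly in the interface
// struct, so every extra function widens the interface by one pointer.
const OldAllocator = struct {
    allocFn: *const fn (self: *OldAllocator, len: usize, alignment: u29) anyerror![]u8,
    resizeFn: *const fn (self: *OldAllocator, buf: []u8, new_len: usize) anyerror!usize,
};
```

Implementations embedded such a struct as a field and recovered themselves with @fieldParentPtr, so this is still approach 2, just without the extra vtable indirection.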

This comment on the issue that the aforementioned blog post links to explains why the @fieldParentPtr approach isn’t optimized properly, so it pretty much answers my original question:

In the current code, LLVM peels off the first iteration of the loop and runs it first. In that iteration, it devirtualizes fill but doesn’t inline it. fill writes to state in prng, dirtying it. Unfortunately, that also dirties the function pointer in rand, which is stored inside of prng. With the function pointer dirtied, all future iterations of the loop must do a full virtual call.
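
In code, the shape the comment describes is something like this (hypothetical names, modeled loosely on the old std.rand.Random, written in modern syntax):

```zig
const Random = struct {
    fillFn: *const fn (r: *Random, buf: []u8) void,
};

const Prng = struct {
    state: u64,
    random: Random = .{ .fillFn = fill },

    fn fill(r: *Random, buf: []u8) void {
        const prng: *Prng = @fieldParentPtr("random", r);
        // This store dirties `prng`; as far as LLVM can prove, it may
        // also have clobbered `random.fillFn`, since both live in the
        // same object.
        prng.state = prng.state *% 6364136223846793005 +% 1442695040888963407;
        for (buf) |*b| b.* = @truncate(prng.state);
    }
};

fn benchmark(r: *Random) void {
    var buf: [8]u8 = undefined;
    for (0..1000) |_| {
        // LLVM devirtualizes the first iteration, but once fill() has
        // written through `prng`, every later iteration must reload
        // fillFn and make a full virtual call.
        r.fillFn(r, &buf);
    }
}
```

With a fat pointer, by contrast, the vtable pointer lives in a local value that nothing written through the implementation can alias, which is what lets approach 1 stay devirtualized.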

This comment ends with (highlight mine):

This problem applies to all interfaces in Zig which use the @fieldParentPtr idiom. We should switch them to use this temporary setup instead.

So it seems like @fieldParentPtr is the better approach in theory, and isn’t inherently unoptimizable, but is worse in practice due to LLVM’s deficiencies. Maybe the proposal linked by @vulpesx would solve this, and @fieldParentPtr interfaces would become objectively better by being smaller and maybe even more cache-friendly, since the interface and implementation are stored in the same place.


It makes the interface struct larger if there is more than one function.
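
For example, on typical targets where a function pointer is one usize wide (hypothetical interfaces):

```zig
const std = @import("std");

// Function pointers stored directly: one pointer per function.
const Direct = struct {
    aFn: *const fn () void,
    bFn: *const fn () void,
    cFn: *const fn () void,
};

// A vtable pointer: one pointer total, however many functions the
// vtable gains.
const ViaVTable = struct {
    vtable: *const VTable,

    const VTable = struct {
        a: *const fn () void,
        b: *const fn () void,
        c: *const fn () void,
    };
};

test "interface sizes" {
    try std.testing.expectEqual(3 * @sizeOf(usize), @sizeOf(Direct));
    try std.testing.expectEqual(@sizeOf(usize), @sizeOf(ViaVTable));
}
```

The fat-pointer layout sits in between at two pointers, regardless of function count.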

I think it should; if it doesn’t, I would take that as it being unfinished. After all, LLVM is the backend for release builds (and will be for a long time), and those are expected to have this kind of optimisation.