Apologies in advance for a slightly cranky tone here; I haven't had my morning pu-erh yet.
In my opinion, this is not a good talk. It needlessly mystifies the topic without communicating the absolute minimum of clarity required in any virtual-vs-concrete call discussion. It's like a tribal shaman pointing at thunder and lightning and claiming them to be the wrath of the gods, instead of, you know, explaining this electricity thing.
This is the core fact about virtual functions:
You don’t compare indirect calls with direct calls. You compare a call with fully inlining the function at the call site.
The first benchmark they present is excellent for making this point, but the point isn't actually made in the presentation.
The code looks roughly like this (pardon me for my C++):
// Somewhere in your codebase
struct Base {
    int concrete() { return 7; }
    virtual int virt() { return 7; }
};

// Call-site
void benchmark(Base* b, int iteration_count) {
    Timer timer; // Made up class
    timer.start();
    for (int i = 0; i < iteration_count; i++) {
        b->concrete();
    }
    timer.print_elapsed();

    timer.start();
    for (int i = 0; i < iteration_count; i++) {
        b->virt();
    }
    timer.print_elapsed();
}
The question is, which loop is going to be faster, the one with virtual calls or the one with concrete ones?
The thing is, the answer here is crystal clear. You might get a different result, but that would reveal a flaw in your benchmark or in your program.
The concrete loop should be infinitely faster than the virtual one. The concrete loop should be compiled away, while, for the virtual one, there must be iteration_count calls present.
The thing about a concrete call is that it's concrete: there is one specific function in the program that is being called, the compiler knows about it, and it should be able to inline it and then constant-fold the entire thing down to nothing.
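To spell that out, here is a rough sketch of what the first loop effectively becomes after inlining (expressed in source terms; the actual transformation happens on the compiler's intermediate representation):

// Body of concrete() inlined: each iteration computes an unused constant.
for (int i = 0; i < iteration_count; i++) {
    (void)7; // no side effects, nobody reads the value
}
// Dead-code elimination then removes the loop entirely, so the "concrete"
// measurement brackets an empty stretch of code.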
In contrast, the thing about a virtual call is that it is unknowable. There might be any number of implementations of virt in the code base, and one could even be loaded at runtime from a dynamic library (.so). As the function could do anything (e.g., print to the screen or change the program's global state), it would be incorrect for the compiler to eliminate the call.
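For example (my own illustration, not one from the talk), nothing stops some other translation unit, or a plugin loaded via dlopen, from containing an override with observable behavior:

#include <cstdio>

// Could live in a separately compiled .so that the program loads at runtime.
struct Loud : Base {
    int virt() override {
        std::puts("virt() called"); // observable side effect
        return 42;
    }
};

If benchmark() is ever handed a Loud*, deleting the second loop would silently drop iteration_count lines of output, so the compiler has to keep every single call.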
Now, of course, you might get a different result in practice. For example, you might notice that the concrete call is not a no-op. This generally means that you've messed up something about your physical architecture, so that the compiler lost access to the body of the function. In Rust, for example, you might be hitting the effects described in Inline In Rust.
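The classic C++ way to end up there (again, my own sketch, not something from the talk) is to move the function body out of the header, so the call site only sees a declaration:

// base.h -- the call site now sees only declarations.
struct Base {
    int concrete();          // body lives in base.cpp
    virtual int virt();
};

// base.cpp -- a different translation unit.
int Base::concrete() { return 7; }
int Base::virt() { return 7; }

Unless you build with link-time optimization (e.g. -flto), the compiler cannot inline concrete() across that boundary, and the "fast" loop suddenly contains iteration_count real calls.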
You might also see that your virtual calls take no time. That means that the compiler was able to devirtualize the call. This is surprising: it means that the compiler somehow figured out that there is in fact only one function that can actually be called here, and inlined it, going against your intention, as a programmer, to declare the code as runtime polymorphic. This likely means that you should change your code to explicitly communicate to the compiler that no virtual call is possible there.
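In C++, one way to say that explicitly (a sketch, assuming you actually control the hierarchy) is final:

// No class can derive from Derived, so a call through a Derived* has exactly
// one possible target, and the compiler may devirtualize and inline it.
struct Derived final : Base {
    int virt() override { return 7; }
};

void benchmark_devirtualized(Derived* d, int iteration_count) {
    for (int i = 0; i < iteration_count; i++) {
        d->virt(); // statically resolvable to Derived::virt()
    }
}

Alternatively, call through the concrete type directly and keep the virtual boundary only where runtime polymorphism is genuinely needed.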
EDIT: to clarify, I am not trivializing the problem here. Where exactly you put the type-erasure/virtual-call boundary is a hard design problem! Similarly, lightning is a pretty complicated phenomenon. But if you start addressing the issue with the "wrath of Caches", I don't think you'll get anywhere.