The Limits of Devirtualization

That is a common misconception. People write a small function in Compiler Explorer, assign a function pointer to a variable, call it immediately, and never use it again. Then they look at the generated code, see that the compiler optimized the call away, and conclude that devirtualization is common.
That’s barely devirtualization; that’s just constant propagation of a conveniently local pointer.
Anything beyond this trivial example, and devirtualization will fail.

The solution for that is generic programming. If your code is bloated, instantiate the generic function with a type-erased wrapper that contains the vtable. Now you have only one function for all your types, exactly as you do today. You can also mix and match: if you need this function to be really fast for one particular type, instantiate it for that specific type; for the types where you prefer to debloat, instantiate it with the type-erased wrapper.
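To make the mix-and-match idea concrete, here is a minimal sketch; `AnyWriter` and `writeAll` are illustrative names modelled loosely on the standard library’s writer interfaces, not something taken from the post above:

```zig
// A type-erased writer: a context pointer plus a vtable with one entry.
const AnyWriter = struct {
    ctx: *anyopaque,
    writeFn: *const fn (ctx: *anyopaque, bytes: []const u8) anyerror!usize,

    fn write(self: AnyWriter, bytes: []const u8) anyerror!usize {
        return self.writeFn(self.ctx, bytes);
    }
};

// One generic function. Instantiate it with a concrete writer type where you
// want speed (the call can be inlined and specialized), or with AnyWriter
// where you want a single shared copy that dispatches through the vtable.
fn writeAll(writer: anytype, bytes: []const u8) !void {
    var index: usize = 0;
    while (index < bytes.len) {
        index += try writer.write(bytes[index..]);
    }
}
```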

C++ has a name for these: concepts. There were some proposals for this in Zig, all rejected. Some were quite elegant, but none addressed one of the main problems in generic programming, which is tooling. Any proposal needs to take into account how an LSP would be able to generate completions from what is written in the function signature. One of the proposals that gained traction was that you could use arbitrary bool-returning functions in an anytype declaration, and the compiler would reject the type if the function returned false. It has its merits, but this is the kind of thing an LSP would never be able to generate completions from.
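For context, the spirit of that proposal can already be approximated with today’s comptime; `isWriter` and the error message below are made up for illustration, and the tooling problem is the same: an LSP cannot look inside an arbitrary predicate to offer completions.

```zig
fn isWriter(comptime T: type) bool {
    // An arbitrary predicate: accept any type that declares a `write` method.
    return @hasDecl(T, "write");
}

fn print(writer: anytype, bytes: []const u8) !void {
    comptime {
        if (!isWriter(@TypeOf(writer))) {
            @compileError("expected a writer-like type, got " ++ @typeName(@TypeOf(writer)));
        }
    }
    _ = try writer.write(bytes);
}
```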

3 Likes

Maybe the tooling problem goes away if it’s feasible to ditch the common LSP protocol and define a different format that connects an editor directly to a compiler running in incremental mode.

That would be a ridiculously ambitious undertaking, but it does sound like the perceived need to adhere to the common LSP protocol (which we have no control over) is creating a tooling blocker for a more elegant compiler / optimisation design?

2 Likes

Why should you? I can’t come up with a reason it should stay if it has zero dependents.

Shouldn’t these two items be considered sort of opaque constants with a well-defined maximum? We can depend upon their actual value, but should we ever? If we shouldn’t, then it looks to me like this shouldn’t be a barrier to optimizations. In one compilation it might be 3, in another it might be 4; but if its maximum is 4 and you make your decisions on that datum, and it then becomes 3 while its alignment stays the same, why should you care?

If the compiler decides to rip apart my structs within some function and reassemble them differently, then I care just as much as I care about reordering.

I agree. I remember the post by Andrew where he says that he prefers namespaces not be used for categorization. Well, this is exactly what conflating multiple rulesets into the namespace of struct does. Not separating it into its own symbol just makes the difference implicit.

2 Likes

I tried an approach that turns a vtable interface into a union(enum) while retaining extensibility, but this requires users to declare in advance all the types the program will use, and all namespaces need to become generic namespaces.
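For readers unfamiliar with the pattern, here is a minimal sketch of the union(enum) side of that trade-off; `Shape`, `Circle`, and `Square` are made-up stand-ins, not the types from that experiment. Dispatch becomes a switch instead of an indirect call, but every participating type has to be listed up front:

```zig
const std = @import("std");

const Circle = struct {
    radius: f32,
    fn area(self: Circle) f32 {
        return std.math.pi * self.radius * self.radius;
    }
};

const Square = struct {
    side: f32,
    fn area(self: Square) f32 {
        return self.side * self.side;
    }
};

// Every type must be enumerated here; adding a new shape means editing this union.
const Shape = union(enum) {
    circle: Circle,
    square: Square,

    fn area(self: Shape) f32 {
        return switch (self) {
            inline else => |s| s.area(),
        };
    }
};
```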

1 Like

This is an impressive contraption. Creative use of comptime is magical stuff.

As a ‘tech demo’, it’s great. In a way, it reminds me of metalang99, which uses the C preprocessor to add various things to C: algebraic datatypes, slices, interfaces. Hirrolot is a bona fide genius, no doubt about it.

But I write Zig, not metalang99-flavored C. There’s a difference between built-on and built-in, and there are things a compiler can do for us that we cannot feasibly do for ourselves with comptime, even a hypothetical comptime that wasn’t nerfed and didn’t force you to hand-generate all the declarations for your union type and wrap it.

My main argument is that there isn’t much point: it would be hard to do, and it wouldn’t improve programs much.

Subsidiary to that is the more philosophical point that I don’t want to argue with the language about what my data structures ‘really are’. I’ve been known to do things like reserve space and measure memory use on that basis, and yeah, sure, I could make certain the fields get referenced somewhere and then make sure that somewhere stays reachable, whatever.

One of my data structures is padded and aligned; of course the padding isn’t referenced anywhere, it’s padding. How would I ensure that it stays that way? If I reference it in a test, users importing the queue aren’t going to compile the test, so there goes my padding. What if I reference it, but the compiler realizes the reference has no effect, elides it, and then strips my padding? Now my atomic isn’t on its own cache line, and I’m sad.
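For the sake of discussion, here is a minimal sketch of the kind of layout being described; the field names are illustrative, not from my actual queue. The padding fields are never read or written, and their only job is to keep the two atomics on separate cache lines, which is exactly what a dead-field ‘optimization’ would undo:

```zig
const std = @import("std");

const Counters = struct {
    head: std.atomic.Value(usize),
    // Never accessed; only here to push `tail` onto the next cache line.
    _pad0: [std.atomic.cache_line - @sizeOf(std.atomic.Value(usize))]u8,
    tail: std.atomic.Value(usize),
    // Never accessed; keeps whatever comes next out of `tail`'s cache line.
    _pad1: [std.atomic.cache_line - @sizeOf(std.atomic.Value(usize))]u8,
};
```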

If I thought it would actually lead to better object code in practice, I’d be all over it. But I don’t, so I’m not.

1 Like

Within the same compilation unit, yes. And within a compilation unit, where the compiler has enough information to know whether a field is accessed or not, the results of these functions can reflect the size and offsets of fields in the actual compiled result, even if fields are eliminated because they are never accessed.

Once we cross the boundary of a compilation unit, it seems incorrect to assume that the memory layout of a struct (specifically one that is neither extern nor packed) will keep its size and layout. Under the assumptions that seem reasonable to make about struct types, it also seems reasonable that fields which are never accessed are statically eliminated.

I don’t think it matters if @sizeOf(std.Io) is different between two programs, as long as it’s consistent within a single program.

2 Likes

Well, technically, via whole-program analysis one could figure out that a function pointer is never called, and from that, that no function which gets assigned to it is ever called either, and from that again, that those functions don’t actually need to be included.

But that would be an expensive optimisation. Sure, one would only need it for release builds, but it would be expensive nonetheless.

1 Like

align, probably extern struct, possibly packed struct. Otherwise your padding could be anywhere anyway.

Currently, the use cases I see in the standard library look like this.
This method roughly relies on a convention: the default alignment of a struct is determined by the maximum alignment of all its fields. If Zig can provide a convention that ‘unused fields that get optimized away still retain their effect on the struct’s alignment’, then this use case can still be addressed, although it is admittedly somewhat strange.
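A minimal sketch of the convention being relied on; `Example` is a made-up name, and the over-aligned buffer stands in for whatever field the standard library actually uses:

```zig
const std = @import("std");

const Example = struct {
    flag: u8,
    // Never read anywhere; it exists for the alignment it imposes.
    buffer: [48]u8 align(64),
};

comptime {
    // The struct's alignment follows its most-aligned field.
    std.debug.assert(@alignOf(Example) >= 64);
}
```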

Well, it does use align, but it’s not enough.

I’m talking about the very same pattern here, but look at where it’s used: if the padding were elided, the compiler would be free to put the queue slices right next to the CachePad fields. The fields would still be aligned, but the point of doing this is so that each one ends up as the only data on its cache line. Eliding the padding would subvert that purpose. I don’t care how many zeroes the pointer ends in, I care about isolating the data. If I cared about which end of the line it was on, I’d use extern struct here, but I don’t.

Of course, look, all of these things could be solved. We could add more qualifiers, something like volatile but for “no I mean it, don’t elide this field”. But what are we getting? How many not-a-mistake programs don’t use every field in a struct? This is very far-fetched!

Ok, so Io.VTable is about to change that. I don’t think adding a radical ‘optimization’ which no one wanted last week is a good response to that fact.

I keep getting responses about how it could be done, which is getting repetitive: yes, it could in fact be done. It is a thing which is possible. What I don’t see is a lot of people linking to their own projects and saying “see, right here? This struct has six fields and I don’t even use them all; the compiler should take care of that for me.”

1 Like