The Limits of Devirtualization

That is a common misconception. People generate a small function in Compiler Explorer, they assign a function pointer to a variable, immediately call it, then never use it again. Then they look at the generated code and see that the compiler optimized it, and think that devirtualization is common.
That’s barely devirtualization, that’s just a convenience constant.
Anything more than this trivial example, and devirtualization will fail.

The solution for that is generic programming. If your code is bloated, instantiate the generic function with a type erasure, containing the vtable. Now you only have 1 function for all your types, exactly the same as you have today. You can also mix and match. If you need this function to be really fast for one particular type, you can instantiate your function for that specific type, and if for other types you prefer to debloat, you instantiate the function with the type erasure.

C++ named these concepts. There were some proposals for it in Zig, all rejected. Some were quite elegant, but none addressed one the main problems in generic programming, which is tooling. Any proposal needs to take into consideration how a LSP would be able to generate completions based on what is written in the function signature. One of the proposals that gained traction was that you could use arbitrary functions that returned bool in an anytype declaration, and the compiler would reject the type if the function returned false. It has its merits, but this is the kind of thing that a LSP would never be able to generate completions from.

5 Likes

Maybe the tooling problem goes away if it’s feasible to ditch the common LSP protocol, and define a different format that effectively connects an editor directly to a running incremental mode compiler.

That would be a ridiculously ambitious undertaking.. but it does sound like the perceived need to adhere to the common LSP protocol (which we have no control over) is creating a tooling blocker for a more elegant compiler / optimisation design ?

2 Likes

why should you? i can’t come up with a reason it should stay if it has zero dependents

shouldn’t these two items be considered as sort of opaque constants with a well-defined maximum? we can depend upon their actual value, but should we ever? in case we should not – it looks to me that it should not be a barrier to optimizations. one compilation it might be 3, other it might be 4, but if its maximum is 4, and you make your decisions on this datum, then it becomes 3, but its alignment stays the same, why should you care?

if the compiler decides to rip apart my structs within some function and reassemble them differently, then i care just as much as i care about reordering.

I agree. I remember the post by andrew where he days that he prefers namespaces not be used for categorization. well, this is exactly what conflating multiple rulesets into the namespace of struct does. not separating it into a symbol just makes the difference implicit.

5 Likes

I tried an approach that turns a vtb interface into a union enum while retaining extensibility, but this requires users to declare all the types the program will use in advance, and all namespaces need to become generic namespaces.

3 Likes

This is an impressive contraption. Creative use of comptime is magical stuff.

As a ‘tech demo’, it’s great. In a way, it reminds me of metalang99, which uses the C preprocessor to add various things to C: algebraic datatypes, slices, interfaces. Hirrolot is a bona-fide genius, no doubt about it.

But I write Zig, not metalang99-flavored C. There’s a difference between built-on and built-in, and there are things a compiler can do for us that, even with comptime, even with a hypothetical comptime which wasn’t nerfed so that you have to hand-generate all the declarations for your union type and wrap it, we cannot feasibly do for ourselves.

My main argument is that there isn’t much point, it would be hard to do and it wouldn’t improve programs much.

Subsidiary to that is the more philosophical point, that I don’t want to argue with the language about what my data structures ‘really are’. I’ve been known to do things like reserve space and measure memory use on that basis, and yeah, sure, I could make certain they get referenced somewhere, then make sure that somewhere stays reachable, whatever.

One of my data structures is padded and aligned, of course the padding isn’t referenced anywhere, it’s padding. How would I ensure that it stays that way? If I reference it in a test, users importing the queue aren’t going to compile the test, there goes my padding. What if I reference it, but the compiler realizes that the reference has no effect, elides it, and then strips my padding? Now my atomic isn’t on its own cache line, and I’m sad.

If I thought it would actually lead to better object code in practice, I’d be all over it. But I don’t, so I’m not.

2 Likes

Within the same compilation unit, yes. And within a compilation unit, where the compiler has enough information to know if a field is accessed or not, the result of these functions can reflect the size and offset of fields within the actual compiled result, even if fields are eliminated due to never being accessed.

Once we cross the boundary of a compilation unit, it seems incorrect to assume the memory layout of a struct (specifically not extern and not packed) will maintain size and memory layout. Under the assumptions that seem reasonable to make of struct types, I also think it also seems reasonable that fields which are never accessed are statically eliminated.

I don’t think it matters if @sizeOf(std.Io) is different between two programs, as long as it’s consistent within a single program.

7 Likes

Well, technically via whole program analysis one could figure out that a function pointer is never called, and with that that no function which gets assigned to it isn’t either, and with that again that these functions don’t actually need to be included.

But that’s would be an expensive optimisation. Sure, one would only need that for release builds, but expensive nonetheless.

1 Like

align, probably extern struct, possibly packed struct. Otherwise your padding could be anywhere anyway.

4 Likes

Currently, the use cases I see in the standard library are like this.
This method roughly uses a convention: the default alignment of a struct is determined by the maximum alignment of all its elements. If Zig can provide a convention that ‘optimized unused fields still retain their effect on the struct’s alignment,’ then this use case can still be addressed, although it is indeed somewhat strange.

Well it does use align, but

It’s not enough.

I’m talking about the very same pattern here, but if you look at where it’s used, if the padding was elided, the compiler would be free to put the queue slices right next to the CachePad fields. The fields would be aligned, but the point of doing this is so that each ends up the only data in that cache line. This would subvert the purpose. I don’t care about how many zeroes the pointer ends in, I care about isolating the data. If I cared about which end of the line it was on, I’d use extern struct here, but I don’t.

Of course, look, all of these things could be solved. We could add more qualifiers, something like volatile but for “no I mean it, don’t elide this field”. But what are we getting? How many not-a-mistake programs don’t use every field in a struct? This is very far-fetched!

Ok, so Io.VTable is about to change that. I don’t think adding a radical ‘optimization’ which no one wanted last week is a good response to that fact.

I keep getting responses about how it could be done, which is getting repetitive: it could, in fact, be done. This is a thing, which is possible. I don’t see a lot of people linking to their own projects and saying “see, right here? This struct has six fields and I don’t even use them, the compiler should take care of that for me.”

4 Likes

Could the binary size be reduced by specializing your implementation of Io rather than the Io interface? For example, if you know your program isn’t touching the filesystem or networking, your instantiation of the implementation could look something like const threaded: std.Io.Threaded(.{.filesystem = false, .networking = false}) = .init;, telling the implementation to replace its vtable with failing stub functions for the usecases of Io you won’t be using.

3 Likes

It feels like this ‘radical optimization’ has exceeded my expectations. In my imagination, optimizations for unused fields shouldn’t allow one field in a struct to encroach on the padding of another field.
Since the maximum alignment of the fields in CachePad(T) is cache_line, the default alignment of CachePad(T) is therefore cache_line. The size of a struct cannot be less than its default alignment and must be a multiple of that alignment, so the size of CachePad(T) is at least cache_line.

Even if a structure undergoes radical optimization based on the TuQueQue(...) field, causing some fields to be absent due to non-usage, which could indeed alter the layout, the CachePad(T) type of each field still guarantees that its size is at least cache_line. Therefore, I do not believe such optimizations would cause isolation between fields to fail.

1 Like

I think you raise a lot of interesting questions and I wonder if changing the focus from “interfaces full of function pointers” to “each individual restricted function pointer” would reveal some useful way to address it.

1 Like

i don’t see anyone here proposing that alignment request be disregarded. and i don’t see how it possibly could be disregarded. if you have two fields with an alignment request, then it’s impossible for them to be placed any closer than the alignment value. otherwise, it’s clearly a bug. or – how can your “padding be elided” but the aligment request be satisfied?

I’d imagine this hypothetical optimization wouldn’t (and couldn’t sensibly) apply to extern structs. If a struct doesn’t have a defined representation, then I don’t think you should be relying on it being padded and aligned.

There are code patterns that favorize devirtualisation. Eg instead of storing an allocator struct next to the data, I store a std.heap.ArenaAllocator. That way when I do x.arena.allocator().free() it’s trivial to devirtualize for LLVM.

The problem is long call chains passing allocator or io around, because any failure to inline will prevent devirtualization.

3 Likes

Alignment, size, stride, padding: we’re deep in the weeds here. This is, fundamentally, why I don’t want a compiler messing with my declared fields.

But hey! It has been proposed! #21010, by Andrew no less. But not accepted. Good discussion there.

There’s also #12854, an error on unused struct fields, which was closed as, basically, impossible. As a lint, no problem, because linters are allowed false positives. Errors are not.

Conceptually, it’s cool. Zig is lazily-compiled, why not lazy structs. The number of problems it solves is not zero.

But it’s pretty close, and there are practical reasons not to which I’ve covered already. The real use cases can probably be addressed by moving them out of struct-land, and this makes more sense than, well, as @mlugg put it:

an absurd situation where @sizeOf logically depends on the entire program – including the code following itself – being analyzed.

2 Likes

sounds like we need a keyword like interface which allows field removal and field access analysis.

4 Likes

I’m biased, since I already considered Zig 0.7 as almost complete :slight_smile: Although I must admit the addition of packed struct(u32) has been a life-saver for my PinePhone OS project. I just reached 10,000 LoC (maybe ~5% is Global Assembly). But a significant part of the Zig code is MMIO, and those 100s of packed structs are swamped with unused fields. I really do hope that they never become errors as mentioned above. It is hard enough prototyping/refactoring with all the current hand-holding errors.

Zig is currently perfect for what I’m doing, but only because after WriterGate I removed nearly all dependencies on the std libs. This academic search for language purity worries me. When I read threads like this, with increasing binary size, additional API complexity, and consider all the useful things that have been tossed aside already in this quest for perfection, I start to wonder if I’ll ever update versions again.

In fact I’d go as far as to say that with a little better documentation, and a language reference, Zig 0.14.1 could have already been the ‘A Simple Language’ v1.0.

Maybe the answer is to v1.0 the core language, and then separate out all the std libs and allow them to sink or swim alone. I think applying such a dis-aggregation would also indicate the completeness of the core language. Retired libs would then compete with other implementations, both core and external, on a fair basis.

12 Likes

19 posts were split to a new topic: Discussion about Io and Zig