The Limits of Devirtualization

Interestingly, a minimal std.debug.print("Hello World!\n", .{}); program is approaching that size on Zig nightly on macOS, built with -O ReleaseSmall -dead_strip -fstrip -fno-unwind-tables.

Nightly build of hello.zig is 122K on macOS/aarch64 and 61K on Linux/aarch64.

0.15.2 build of hello.zig is just 8K on Linux/aarch64, using the same flags.
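For reference, the entire measured program is just this (a reproduction sketch of what I built, nothing more):

```zig
// hello.zig
const std = @import("std");

pub fn main() void {
    std.debug.print("Hello World!\n", .{});
}
```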

I wonder if it’s a regression, or a consequence of Io stuff being pulled in due to print

5 Likes

You mean the 106 function pointers?

Yeah. Probably that.

3 Likes

Great that you mentioned it. I know it’s tangential, but the whole “virtual functions galore” situation is getting out of hand. We always recommend pushing everything possible into comptime rather than runtime, but these interfaces are going in the opposite direction. Zig was right when it made Reader generic. Not only did they undo it, but they doubled down with Io.

And we’re just talking about size. Wait until you see the impact this is going to have on performance. Devirtualization is one of the optimizations I almost never see in generated code.
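To make the contrast concrete, here’s a toy sketch (made-up names, not the actual std API): the generic version is monomorphized per writer type, so the call can be inlined and unused paths pruned, while the vtable version dispatches through a runtime function pointer that the optimizer has to devirtualize before it can do either.

```zig
// Comptime-generic: the concrete writer type is known at each call site,
// so the call can be inlined and specialized per instantiation.
fn writeGreetingGeneric(writer: anytype) !void {
    try writer.writeByte('!');
}

// Vtable-style: dispatch goes through a runtime function pointer, so the
// target must exist in the binary and the optimizer has to devirtualize
// the call before it can inline or prune anything.
const AnyWriter = struct {
    ctx: *anyopaque,
    writeByteFn: *const fn (ctx: *anyopaque, byte: u8) anyerror!void,

    fn writeByte(self: AnyWriter, byte: u8) anyerror!void {
        return self.writeByteFn(self.ctx, byte);
    }
};

fn writeGreetingDynamic(writer: AnyWriter) anyerror!void {
    try writer.writeByte('!');
}
```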

3 Likes

Right, though restricted function types should help in this case, yeah?

4 Likes

I’m reserving judgement on Io, although I’m doing so with raised eyebrows.

The optimistic take is that there’s a bit of tunnel vision involved in solving a difficult problem here, and that provision for less maximalist use cases (does every input/output task need a CSPRNG built in? process launching? net bind? because with Io we get all of that whether we asked for it or not) will emerge in later releases, out of experience gained from 0.16.

But when I look at std.posix 0.15.2 and std.posix master: there’s no escape hatch for this thing. It’s all-in.

So… we’ll see how that turns out.

4 Likes

This alone can be a pretty big issue for more constrained environments. One of the big reasons I became interested in Zig was its ability to produce relatively small binaries, comparable with C. Seeing binary size jump nearly eightfold with all (most?) size optimizations turned on is alarming. I’m not saying the core team has ever made producing small binaries a highlight feature of the language; it’s just a very unfortunate byproduct of the current focus that we’re losing that ability to a non-negligible degree.

My understanding is that restricted function pointers will help with speed, but not size, yeah?

5 Likes

I don’t know; I would like to know as well. Perhaps it can help the optimizer generate less runtime dispatch code and do better folding thanks to devirtualization?

2 Likes

While I’m a big fan of the Io interface, I think it’s a mistake to make it the only option. Especially for std.debug stuff that’s compiled into every binary, depending on Io makes little sense to me; it’s not like you can realistically use evented Io in a SIGSEGV signal handler, for example.

3 Likes

I don’t see how it could.

The Io.VTable has 105 function pointers. They’ve all got to be there, and their existence requires that each and every function, and every function those depend on, be compiled and included in the binary.

It’s a “vtable”, so whether you ever call them doesn’t matter. That CSPRNG is just along for the ride.

3 Likes

Hopefully, yes. But it’s an unproven idea. Also, any optimization enabled by it would have to come from Zig’s own backend, since this concept doesn’t exist in LLVM. As far as I know, Zig’s backend doesn’t do any optimizations yet, so it would be a long time before they got to this specific optimization.
Also, we already have a really good solution for this problem: generic programming, which Zig does amazingly well. The generated code is unbeatable. The only problem is the ergonomics, like breaking tooling and signatures becoming harder to understand. But I think the best solution would be to make generic programming more amenable to tooling, not to revert to pushing everything towards runtime.

2 Likes

If you could devirtualize very well, you would know exactly which functions get called and which don’t. At this point, it’s the same as a direct function call, and dead code elimination would work like usual.

6 Likes

IIUC, the issue description does mention a technique to lower this to LLVM using metadata nodes, but you’re right that it’s unproven. Will be interesting to see the effect if it’s implemented. The day Zig gets an optimizer pipeline that people can populate with passes will be grand.

3 Likes

It’s not code, it’s data. It’s not that eliminating fields of a declared struct instance which never get accessed is impossible; sure, that’s a thing a compiler can do. But it’s weird stuff.

3 Likes

You mean because it would change the size of the Io type if you set those fields to void or similar? That would be a little weird, but maybe that’s the right intersection of vtables and comptime generics?

I know I’ve used that technique to eliminate unneeded fields and reduce binary size in one of my libraries. This was just with comptime configuration options, but maybe dead code elimination could do something similar in this case?
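For the curious, the pattern I mean is roughly this (a minimal sketch, not my actual library code):

```zig
const std = @import("std");

fn Counter(comptime enable_stats: bool) type {
    return struct {
        count: usize = 0,
        // With stats disabled the field collapses to `void`, so it
        // contributes nothing to the struct's size.
        stats: if (enable_stats) [64]u64 else void,
    };
}

test "disabled feature costs nothing" {
    try std.testing.expect(@sizeOf(Counter(false)) < @sizeOf(Counter(true)));
}
```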

1 Like

Zooming out here a bit: I’m excited for the new Io. The approach to solving control flow virality (aka ‘function coloring’) is very promising, and it’s ok with me if minimum binary size regresses for a release or two because the team is focusing on getting that right.

I think yanking stuff out of posix the instant that Io is capable of delivering the functionality is hasty, but it’s minor. As long as the core team listens to feedback once 0.16 is official, we can recover the use cases which this development harms, while enjoying everything it brings to the table.

I’m talking about something which, so far as I know, has never been officially suggested as a solution to the consequences of having a hundred-plus member dispatch table. @LucasSantos91 brought it up and we’re riffing on it.

Call it “vtable shaking”. It’s a pretty good metaphor: most languages compile everything and then try to eliminate what isn’t in use, “tree shaking”, but Zig is different: it uses lazy compilation, so nothing is compiled unless it’s used, no tree shaking necessary.

The issue at hand: creating an Io instance counts as “using” every single function belonging to that dispatch table[1]. When you say “put a struct instance in static memory containing a whole bunch of pointers to functions”, Zig just does that, and as a consequence, all those functions must be analyzed, compiled, and appear in the binary.
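In miniature, with a made-up two-entry table instead of the real Io.VTable, the mechanism looks like this (whether the optimizer later manages to prune the uncalled one is a separate question):

```zig
fn used(x: u32) u32 {
    return x + 1;
}

// Never called anywhere, but its address is stored in `table`, so it
// still has to be analyzed and compiled so the pointer can exist.
fn neverCalled(x: u32) u32 {
    return x * 2;
}

const VTable = struct {
    used: *const fn (u32) u32,
    never_called: *const fn (u32) u32,
};

const table: VTable = .{
    .used = &used,
    .never_called = &neverCalled,
};

pub fn main() void {
    // Referencing `table` at all is what counts as "use": both functions
    // are now reachable as far as semantic analysis is concerned.
    _ = table.used(1);
}
```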

Lazy compilation means it won’t happen until or unless the struct is reached, which is what happens when an Io is created. But that’s as fine-grained as it gets: Zig doesn’t continue to hold off until fields of the struct are actually used, and late-bind a “real type” containing only those fields. One could think of that idea as: every struct field, every union variant, starts with the default type void, and when it gets used, that is replaced with the user-specified type.

I don’t think Zig should do this! For one thing, the impact on the compiler would be fairly radical, and not in a good way. No composite type could be said to have a shape until semantic analysis is completed; that would introduce some pretty wild dependencies which would have to be carefully teased apart and made non-circular.

I also just don’t want to start having to think about data that way. If I tell the compiler that BlahStruct has a .wazoo field of u32, I don’t want it second-guessing that decision based on whether or not I’ve used it in reachable code. Too weird for me.

Another alternative: special case it. Put logic in the compiler which says “oh hey, this instance is Extra Special, it’s a struct consisting purely of restricted function pointers, we do as much devirtualization as possible and then shake the vtable to eliminate anything which wasn’t used, at the end”.

Much more plausible. But this is where I say: hey. c’mon. It’s time to stop pretending this thing is a struct. Let’s make it official and promote whatever-this-is into the type system.

It’s a mistake to think simplicity is achieved with as few concepts as possible. When we conflate several distinct concepts into one, none of the behavior goes away, it all has to be explained.

Example: are struct types containers? The answer is it depends because of tuples, which use the struct keyword but do not otherwise behave at all like structs. Separating the two concepts would simplify the language: both structs and tuples would be understood in terms of themselves, rather than tuples existing as a long list of exceptions to the general behavior of structs, which therefore ‘leaks’ onto structs, where it doesn’t belong. The answer to “are structs containers” is yes, the answer to “are tuples containers” is no, because the answer to “are tuples structs” is also no.
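Concretely, and in current syntax:

```zig
const std = @import("std");

// Declared with the `struct` keyword, but it's a tuple: the fields are
// positional and unnamed, and values can be indexed like an array.
const Pair = struct { u32, bool };

// An ordinary struct: named fields, no indexing.
const Point = struct { x: u32, y: u32 };

test "tuples and structs answer different questions" {
    const t: Pair = .{ 42, true };
    const p: Point = .{ .x = 1, .y = 2 };
    try std.testing.expect(t[0] == 42 and t[1]);
    try std.testing.expect(p.x + p.y == 3);
}
```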

I think (and remember, this is a brainstorm, it’s not on the horizon as a proposal) that extra-special vtable-shaking behavior even being considered is a sign that we’re reaching the limits of the “roll your own interface” policy with Io, and we should be taking a good hard look at existing practice, and thinking about how to let the type system help us out. That’s what it’s there for.


  1. It’s not a vtable! Having established my objection I will proceed with established terminology. ↩︎

11 Likes

If it’s an optimization, and it’s done properly, you wouldn’t need to think about it. You could imagine that the shape is exactly what you wrote, and the compiler would only make changes that don’t affect program output.
It’s an optimization that would be radically useful, for way more than just function pointers.

That’s the real blocker here. The complexity of this optimization would be exponential, resulting in impossibly long compilation times.

3 Likes

If not exponential, at least it violates a bedrock assumption of the current compiler, and not, I suspect, in a way which could be followed around and fixed in situ.

As an astral-plane idea for some other language, I agree there’s something interesting here. I don’t think Zig should work this way, but also, doing it would be huge; it’s a radical change.

I suspect that, with the exception of Io.VTable specifically, the number of unused fields picked up would be quite minimal, representing either incorrect comptime logic or (most often) a field which is just kind of hanging out, because it was in use but has been factored out of play. The latter case sounds like a job for a linter to me: the correct thing is to remove it from the source code, not elide it from the binary, and this is the kind of diagnostic which ZLS could be induced to perform fairly easily.

It would mean that code wouldn’t have to do ‘fancy comptime stuff’ to make a field void under some conditions: just don’t use it and it doesn’t exist. Tempting, but I think it’s better to have the documentation, and more than that, type safety: if you “leak” a Windows field into a Linux build, it’s not going to compile; with “field elision” it would just silently add it, and any problems would be runtime ones.

Neither case is worth flipping over the table and resetting the board.

2 Likes

I’ve seen discussions in the past[1] (I may even have been part of them) where it’s been asked why a vtable interface is necessary for things like allocators, and I think the same question is relevant for Io. These are things which aren’t normally switched at run-time, so why doesn’t a static duck-typing approach work?

I think you can classify the responses into a few different classes. The VTable approach…

  • …avoids needing to have function parameters of type type everywhere in code.
  • …makes the compiler’s job easier, as otherwise all types, including references, need to be generic.
  • …avoids the compiler instantiating multiple large specialised versions.
  • …has a low runtime overhead, if any, once optimisations have taken place.

However, if you’ve got to carry the overhead of compiling and including a whole sub-section of the standard library just to use a couple of functions, then maybe that’s not always the way you’d choose. Sometimes you just want to compile the few functions you use and nothing more. My ideal would be for the std-lib to allow both styles: vtables where they’re better, static ducks where they’re better.

I’d like a way to give a name to “any type with these decls”. [2] I think that is probably orthogonal to what type of implementation is behind it though. You can have straight functions or VTable trampolines.
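What I mean can be approximated today with a comptime check, something like this sketch (made-up names, not a proposal for std):

```zig
// A comptime "named constraint": fails compilation with a useful message
// if T doesn't expose every decl in `names`.
fn expectDecls(comptime T: type, comptime names: []const []const u8) void {
    inline for (names) |name| {
        if (!@hasDecl(T, name)) {
            @compileError(@typeName(T) ++ " is missing required decl '" ++ name ++ "'");
        }
    }
}

// The "name" for the constraint lives here...
const reader_decls = [_][]const u8{ "read", "close" };

// ...and any implementation, straight functions or vtable trampolines,
// can be checked against it at the boundary.
fn useAnyReader(reader: anytype) !void {
    comptime expectDecls(@TypeOf(reader), &reader_decls);
    defer reader.close();
    var buf: [64]u8 = undefined;
    _ = try reader.read(&buf);
}
```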

…or maybe you’re thinking the VTable itself needs a special type / annotation to allow it to be optimised down to whatever is used?


  1. ↩︎

  2. However, I also see it’s easy to get dragged down the full “type classes” road once you have that. That’s a really powerful paradigm, but not necessarily what Zig needs. ↩︎

3 Likes

I am a bit surprised there is no optimization for memory that is only set, never read, and has no specified layout.

Anyway, would such an optimization be possible to do before LLVM gets its hands on it?
It could be safely turned on for non-extern/packed/volatile data, or am I missing something?

This general problem domain is the Graveyard of Empires in Zig. For Go it was generics (well, until it got them); for Zig it’s… this. Call it the keyword interface question.

I think it’s better to wait until Io is official, and we all have some experience using it, at least some of us have implemented it (or tried to?), before trying to scale that mountain.

Theory and practice. In theory, structs and friends have no layout guarantees, so something like this could be done.

In practice it isn’t done. Structs may not have a specified layout, but they have a @sizeOf; we expect @offsetOf to be consistent per field for any two instances of the same type; the data survives a round-trip into and out of a byte slice; and so on. A struct type has one shape, not several; it’s the details of that shape which are left to the implementation.
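For instance, these all hold for an ordinary auto-layout struct (a quick illustration, not a layout guarantee):

```zig
const std = @import("std");

const T = struct { a: u32, b: bool, c: u64 };

test "one shape per type, even without a guaranteed layout" {
    // The compiler picks the layout, but it picks exactly one: every
    // instance of T has the same size and the same field offsets.
    const x = T{ .a = 1, .b = true, .c = 2 };

    // Round-tripping through bytes relies on that single, stable shape.
    const y = std.mem.bytesToValue(T, std.mem.asBytes(&x));
    try std.testing.expectEqual(x.a, y.a);
    try std.testing.expectEqual(x.c, y.c);

    // @sizeOf and @offsetOf are well defined even for auto layout.
    try std.testing.expect(@offsetOf(T, "c") < @sizeOf(T));
}
```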

That assumption is implicit throughout the compiler: two functions can be compiled separately on the premise that a struct T is a struct T, without checking to see whether there are any static Ts of which only a couple of fields are used, to give just one (important!) example.

Empirically I think opportunities to make good use of this optimization would be limited, as well as hard to detect. Again, ignoring specifically Io.VTable, which is a highly atypical structure.

It would just be a lot of work for what we’re likely to get. A struct would have to be (a) static, (b) fairly large, and (c) mostly unused for any of it to be worth it. Shaving a single 80-byte struct down to 40 bytes isn’t helping anyone.

The thing about the vtable is that, because it holds function pointers, it forces functions to be compiled which aren’t necessarily used, just so the pointers can exist. That kind of knock-on consequence is uncommon: we can imagine a static struct with a []const u8 field which points to a big ol’ embedded text, and then the program never even looks at that field, but again: how often? Optimizations should affect correct programs; something like what I’m describing is at least a design mistake: if you have a big embed and it isn’t getting used, get rid of it! Replace it with ""!

3 Likes