The Limits of Devirtualization

That’s the reason I suggested using several optional vtables, broken out by feature group. An optional vtable field wouldn’t cause the Io type to become generic, so you can still just pass an Io object around. You just need to access “the networking Io vtable” for networking IO operations, instead of having “the Io vtable” as a single monolithic struct with a huge number of function pointers.

Then, instead of being part of a generic type, we just specify what functionality to include in the init method. The vtables that aren’t needed would simply be set to null when instantiating the struct, which would hopefully let the compiler know that those functions can safely be excluded from the build altogether, even if the compiler still has to evaluate the types of said vtables regardless.

This is still a much less automatic solution than what has been proposed above, but unlike the most prominent suggestions so far, it would confine the changes to std and would not require extra compiler/language features.

I think this is the right way to start thinking about solutions, but premature.

I know that the quoted example was meant to be illustrative, not definitive, but it illustrates a structural shortcoming with the approach: networking should not be available if cryptography isn’t.

A flat table-of-optionals cannot express that kind of dependency. It’s possible (trivially: tooling can generate it) to partition the Io.VTable graph into a mostly-acyclic dependency tree, but there isn’t really a Zig data structure which can express that in a useful way.

Maybe there is an adequate division of the table which works? That would be luck; it’s not inherent to the problem domain. Also, in using something like this we’re really crossing our fingers on the devirtualization question.

One saving grace: since Io itself is a concrete type, the added awkwardness of using a collection of optional dispatch tables only has to be handled in one place. But this is going to leak into user code, by the same token: the author(s) must decide what happens when a function is called and the dependency isn’t there. The policies are: unreachable, “and then, nothing happens”, and error.IoLacksNetworkVTable. Only the last one seems responsible, but it puts a spurious error into play, since user code is generally going to either work if networking is present, or not work at all, and in neither case is handling a runtime error interesting.
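To make the policy choice concrete, here is a minimal sketch of two of the three options. The `Networking` group, the `connect` entry, and both wrapper names are hypothetical and only stand in for whatever std.Io would actually contain.

```zig
const std = @import("std");

// Hypothetical feature-group vtable; the real std.Io layout may differ.
const Networking = struct {
    connect: *const fn (ctx: ?*anyopaque, port: u16) anyerror!void,
};

const Io = struct {
    ctx: ?*anyopaque = null,
    network: ?*const Networking = null,

    // Policy "error": a missing group surfaces as a runtime error.
    pub fn connect(io: Io, port: u16) !void {
        const vt = io.network orelse return error.IoLacksNetworkVTable;
        return vt.connect(io.ctx, port);
    }

    // Policy "unreachable": a missing group is treated as a programmer bug.
    // `.?` panics in safe build modes and is undefined behavior otherwise.
    pub fn connectAssumePresent(io: Io, port: u16) !void {
        return io.network.?.connect(io.ctx, port);
    }
};
```

Either way the decision lives in one wrapper per operation, which is the "one place" mentioned above, but callers of `connect` now have an extra error in their error set.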

It does basically solve the bloat problem though. It would make the dispatch table itself larger, but the table is not that big; it’s the compiled functions which are at issue, and those would never be seen if null were provided.

Full marks for being proactive! I’m choosing not to be because I view it as premature. Also because it would be long. Also for personal reasons grounded in past experience.

At this stage, I think it’s fine to note that no one would proactively and deliberately choose to compile in a hundred-plus functions just so they can use three to six. Most languages barely notice things like that, but Zig is not most languages, and in fact does something radical (lazy compilation) to ensure that this kind of thing doesn’t happen.

As a way to initially meet some ambitious goals, no problem. I’m always the one saying make your program correct, then if you still need to, make it fast(er), and the same applies to binary size. This stuff can wait a release cycle or two. The solution space is large, and includes approaches which call for no changes to Io.VTable at all: for example, à la carte primitives that use cases needing absolute minimalism can build on.

I think this is a path worth exploring, but just using optionals like you show is not sufficient to solve the size problem. You need to have optional pointers:

fs: ?*const std.Io.VTable.Filesystem,
async: ?*const std.Io.VTable.Async,
network: ?*const std.Io.VTable.Networking,
cryptography: ?*const std.Io.VTable.Cryptography,

If you just have the structs as values, the fs field would be at least the same size as the VTable.Filesystem struct, even if it’s set to null.

Only when using pointers do you get the space savings that we want: each field is now a single pointer, so the four fields together take four pointers (32 bytes on x64, AFAIK).
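The size difference can be checked at compile time. This is a sketch with a made-up three-entry `Filesystem` group; the real groups would be larger, which only widens the gap.

```zig
const std = @import("std");

// Hypothetical group with three entries, for illustration only.
const Filesystem = struct {
    open: *const fn () void,
    read: *const fn () void,
    close: *const fn () void,
};

const ByValue = struct { fs: ?Filesystem };
const ByPointer = struct { fs: ?*const Filesystem };

comptime {
    // An optional pointer encodes null as address zero, so it costs one word.
    std.debug.assert(@sizeOf(?*const Filesystem) == @sizeOf(*const Filesystem));
    // An optional struct reserves room for the whole payload, null or not.
    std.debug.assert(@sizeOf(?Filesystem) >= @sizeOf(Filesystem));
    // So the by-value field can never be smaller than the by-pointer one here.
    std.debug.assert(@sizeOf(ByValue) >= @sizeOf(ByPointer));
}
```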

This of course comes with a runtime penalty because you now need to do two pointer indirections, and your base binary is slightly larger than before if you use all Io-features, though that last point is pretty much irrelevant. As soon as you don’t use any one of those VTables, you don’t have to pay for it. However, this means you need some way to control which VTables get populated.
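One possible way to control which vtables get populated is a comptime feature set passed to init. This is only a sketch: `Features`, `fs_impl`, `net_impl`, and the single-entry groups are all hypothetical. Because the `if` conditions are comptime-known, only the chosen branch is analyzed, so an implementation that is never selected should not be referenced from the binary.

```zig
const std = @import("std");

// Hypothetical single-entry groups, for illustration only.
const Filesystem = struct { open: *const fn () void };
const Networking = struct { connect: *const fn () void };

fn fsOpen() void {}
fn netConnect() void {}

const fs_impl: Filesystem = .{ .open = fsOpen };
const net_impl: Networking = .{ .connect = netConnect };

// Hypothetical switchboard: which groups this Io should carry.
const Features = struct { fs: bool = false, network: bool = false };

const Io = struct {
    fs: ?*const Filesystem,
    network: ?*const Networking,

    pub fn init(comptime features: Features) Io {
        return .{
            // Comptime-known conditions: the untaken branch is not analyzed.
            .fs = if (features.fs) &fs_impl else null,
            .network = if (features.network) &net_impl else null,
        };
    }
};
```

With `Io.init(.{ .fs = true })`, `net_impl` and `netConnect` are never touched, which is the effect the optional-pointer layout is hoping for.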

It’s also not optimal because you still have to pay in binary size for every function in any vtable you do include, even if you never call it.

The optimal solution would be one where the zig compiler can see that you’re not using a function and so it just discards any memory associated with it, but we already talked at length about the difficulties of doing that.

Very interesting point, that does make the solution quite a bit trickier.

It mostly does actually. 106 (107? ehhh anyway), 64 bit system, 106 pointers, 848 byte VTable. Four categories, four more words[1], 880 bytes.
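Spelled out, the arithmetic looks like this. The helper `tableBytes` is made up for the illustration, and it assumes each optional group costs exactly one extra word, as described above.

```zig
const std = @import("std");

/// Bytes for `entries` function pointers plus one extra word per optional
/// group, for a target whose pointers are `ptr_bytes` wide.
fn tableBytes(entries: usize, groups: usize, ptr_bytes: usize) usize {
    return (entries + groups) * ptr_bytes;
}

comptime {
    std.debug.assert(tableBytes(106, 0, 8) == 848); // monolithic table, 64-bit
    std.debug.assert(tableBytes(106, 4, 8) == 880); // four optional groups, 64-bit
    std.debug.assert(tableBytes(106, 4, 4) == 440); // same layout, 32-bit
}
```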

It’s the compiled functions which blow up the minimum-viable program to 100k or so, it’s not the VTable itself, which isn’t all that big.

That’s what I mean here:


  1. In principle, optionals can be packed; in status-quo Zig, that doesn’t happen.

Right, I don’t think now is the right time to go and actually optimize or tree shake this code, especially since it’s still in an experimental state where the specifics of what the Io object should even include are still in flux.

However, if it is indeed the case that either the IO vtable or the infrastructure around it will receive some sort of change to allow the omission of unused functionality, I think it’s better to at least have some idea about the most likely outward-facing changes said optimization might require, if any.

This gives Zig’s maintainers the opportunity to make at least some preparations that mitigate the extent of changes such an optimization would entail, hopefully avoiding a single monolithic 10k+ line PR that touches a full third of std or something.

I think what iFreilicht is referencing in terms of my idea not solving the size problem is the memory required to store the Io object, which is minuscule on desktop platforms but may actually be significant on lower-power embedded projects.

That’s exactly right! This was pointed out to me by @buzmeg here.

1KB is a lot of data to store in RAM by default in an embedded context, and that’s just the current state! Looking at the current status of #30150 (“Migrate all applicable I/O APIs to `std.Io`”, ziglang/zig on Codeberg), it seems to me that number might still double or triple before the feature is done.

An Arduino UNO has 2KB of RAM[1]. I don’t think we want stdlib’s runtime memory footprint to be so big that you can’t use it on a budget chip like that. The entire point of Io is that it makes libraries portable across a wide range of platforms.

This is also relevant when compiling to WASM for the web, as every additional KB reduces the amount of time taken until your code can actually run.


  1. From what I remember the ATmega328P is an 8-bit chip, so pointers on it should be much smaller, reducing the size of Io.VTable, but this doesn’t invalidate the argument.

Considering that even Unix Domain Sockets would count as networking, I quite frankly disagree with that sentiment.

And even if you use TCP/IP, you may not even need cryptography. You need cryptography if things go through the Internet (or untrusted networks in general).

Look for example at how a lot of services work these days: A reverse proxy which receives outside requests and deals with encryption, and then communicates with the actual services unencrypted since it’s not needed inside of a local network.

Yes, “always encrypt when going through the network” is something which works in a LOT of cases, but not all cases.

The v-table would probably be stored in ROM/Flash rather than RAM in that situation. It’s still a fair bit (more than I’d like) but not nearly as bad as taking up half of RAM.

Well, turn it around if you want: if networking isn’t available, what’s the cryptography for?

It’s not in RAM. On a 32-bit Arduino we’re talking about 440 bytes of Flash, and less on the 8-bit UNO, where pointers are 16 bits wide.

Of course, for embedded, every byte does matter.

But you’re quibbling about, literally, the last 1% after recovering 99% of the wasted space.

I’m 99% less interested in that.

While perhaps not as common, there are still several uses for cryptography beyond networking. In my own experience, I’ve used it for file/data encryption and offline network capture analysis.

I understand the point you’re trying to make, but I don’t think that statement quite hit the mark.

File encryption for example?

Trying to determine that (again, trivial) tree, by thinking really hard, is spectacularly missing the point.

Without having a strong opinion here (or even knowledge/experience to begin building it), I saw Jon Blow mention this kind of interface on a recent podcast and I thought it might be relevant & interesting.

The idea is, instead of having N functions in the vtable, you just have one “god function” which accepts an “Op” enum and args, and dispatches to the implementations. So the only pointer that needs to be stored is the “god function” one.

Roughly this is how I imagine it:

const Op = enum { stream, discard, ... };
const Result = union(Op) { ... };

fn io(comptime op: Op, args: anytype) !Result {
    switch (op) {
        .stream => {
            // inspects args, implements .stream() and wraps result
        },
        // ... one prong per remaining Op
    }
}

Of course this would be kind of hard to use, so for normal use you would still have convenience wrappers around it, but in those use cases where every byte matters, people could just bypass those and use the “god function” directly.
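A runnable variant of the same idea, as a sketch only: a function stored behind a pointer cannot take `comptime` or `anytype` parameters, so this version passes a tagged union instead. `Request`, `dispatch`, and `countingDispatch` are hypothetical names, and the two operations are the only ones modeled.

```zig
const std = @import("std");

// Hypothetical request type: one variant per operation.
const Request = union(enum) {
    stream: []const u8,
    discard: usize,
};

const Io = struct {
    ctx: ?*anyopaque,
    // The only function pointer the interface has to store.
    dispatch: *const fn (ctx: ?*anyopaque, req: Request) anyerror!usize,

    // Thin convenience wrapper: a one-liner that is easy to inline.
    pub fn discard(io: Io, n: usize) !usize {
        return io.dispatch(io.ctx, .{ .discard = n });
    }
};

// Toy implementation that just reports how many bytes each op covers.
fn countingDispatch(ctx: ?*anyopaque, req: Request) anyerror!usize {
    _ = ctx;
    return switch (req) {
        .stream => |bytes| bytes.len,
        .discard => |n| n,
    };
}
```

The trade-off is that every implementation behind `dispatch` is reachable through that one pointer, so this saves table space rather than compiled code.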

…well I realized that it kind of defeats the purpose if commonly used libraries use those convenience wrappers, so actually the “god function” would have to be the standard interface.

But there still might be some common ground.

Why? You mean just for the purpose of space saving? Because I would assume that the convenience wrappers are trivially inlined when optimizing for size.

This is kind of planned with the operate API. If it really becomes reality, most of the functions will belong there.

I meant that if we had the wrappers then it could introduce this problem where half of the libraries in the ecosystem use the wrappers.

But perhaps that’s not an issue, as long as the wrappers were thin (no extra vtables, etc.) and easy to inline. So then there would be actually no reason to not use the wrappers; the “god function” could remain hidden as just an implementation detail.

(Originally I thought about using the io() to back current interfaces like Reader, so that way it would defeat the purpose. I tend to think aloud :D)

Ah okay I see!

Yeah my understanding was that all wrappers would basically be one-liners like this:

pub fn writeInt(self: *Io, something: u8) void {
    self.godFunction(.writeInt, something);
}

Which can be inlined trivially.

Very interesting, thank you for sharing!

It reminds me of Win32 HWND!

I really love this message-based style.

Bonus: args doesn’t have to be anytype anymore. It can be op.Arg().
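A minimal sketch of what `op.Arg()` could look like. The argument types chosen for `.stream` and `.discard` are assumptions made up for the example, and the function returns a plain `usize` to keep it short.

```zig
const std = @import("std");

const Op = enum {
    stream,
    discard,

    // Maps each operation to its argument type (types assumed for the example).
    pub fn Arg(comptime op: Op) type {
        return switch (op) {
            .stream => []const u8,
            .discard => usize,
        };
    }
};

// `args` is now checked against the type that belongs to `op`.
fn io(comptime op: Op, args: op.Arg()) usize {
    return switch (op) {
        // `op` is comptime-known, so only the matching prong is analyzed.
        .stream => args.len,
        .discard => args,
    };
}
```

Passing the wrong argument type, for example `io(.discard, "hello")`, is then rejected at compile time instead of being inspected inside the function.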

(I’m sad comptime enum parameters are not used more extensively. I’ve been using it basically since I started on Zig, in cases where there are multiple functions with the same signature and the same level of operation).