Discussion about Io and Zig

I had to reflect on this, but, yes, I do agree with that. A function/module should do networking or file I/O but not try to abstract over both.

The baseline hardware characteristics are just vastly different. File I/O, by and large, is fairly reliable, low latency, high bandwidth, and randomly seekable. Network I/O is terribly unreliable, with widely varying latency and low bandwidth (although not always), and is generally not randomly seekable.

That variance leaks through into the programming abstraction. For example, the error profiles are vastly different. File I/O generally has a bounded set of failure conditions that are probably enumerable with reasonably bounded upper latency that you can generally handle in your code. Network I/O has an almost unbounded set of failure conditions (certainly not enumerable) only a small number of which you could possibly do anything about in your program (what could or even should your code do when DNS is dead or crypto isn’t available?) with latency whose upper bound is infinity.
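To make the difference in error profiles concrete, here is a minimal sketch. The error sets, the `Action` enum, and both handler functions are hypothetical names invented for illustration; they are not the actual std.Io error sets. The point is only that a small closed set can be handled exhaustively, while a network-style set tends to end in a catch-all branch.

```zig
const std = @import("std");

// Hypothetical error sets for illustration only (not the real std.Io errors).
const FileReadError = error{ NotFound, AccessDenied, InputOutput };
const NetReadError = error{ ConnectionReset, TimedOut, NameResolutionFailed, Unexpected };

// What the caller can plausibly do about a failure.
const Action = enum { report, retry, give_up };

fn onFileError(err: FileReadError) Action {
    // Small, closed set: an exhaustive switch without `else` is realistic.
    return switch (err) {
        error.NotFound, error.AccessDenied => .report,
        error.InputOutput => .give_up,
    };
}

fn onNetError(err: NetReadError) Action {
    return switch (err) {
        // The few cases a program can actually act on.
        error.ConnectionReset, error.TimedOut => .retry,
        // Everything else (dead DNS, ...): nothing sensible to do locally.
        else => .give_up,
    };
}

pub fn main() void {
    std.debug.print("file: {s}, net: {s}\n", .{
        @tagName(onFileError(error.NotFound)),
        @tagName(onNetError(error.NameResolutionFailed)),
    });
}
```

If a new member is added to `FileReadError`, the first switch stops compiling until it is handled, whereas the second silently absorbs it into `else`. That asymmetry is roughly what an abstraction over both kinds of I/O would have to hide.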

If you try to abstract over both, you wind up with a stunted set of primitives that only sorta work and always cut people with a rough edge. Everybody always quotes “Everything in Unix is just a file!” but even the Elder Unix Gods™ had to grapple with the impedance mismatch–look at the weird mishmashes of ioctl and fcntl in Unix system programming (everybody has forgotten that “baud” and “parity” used to drive people to violence). Another good example of this kind of programming issue is a general “String” type–every String type always has a slight impedance mismatch with the problem you are working on and eventually everybody has to roll their own. It’s better to simply accept that File and Network (and Device–things like USB devices and GPUs and …) need a rich programming API and those APIs are quite different from one another.

Although, to be fair, I think the discussion is probably orthogonal to the real problem. I suspect the real issue is how to make a good programming abstraction for “concurrency notification” and subsequent control transfers. The problem is that concurrency notification (and maybe concurrency in general) is at a very messy intersection of software/hardware, runtime/comptime, operating system/programming language, active run/passive wait, priority/fairness, interrupt/polling, etc.

Up to this point, only the managed memory languages seem to have … I wouldn’t say good but at least maybe “not terrible” … programming abstractions around concurrent operations. I’m looking forward to seeing if what Zig put on the table eventually proves itself as something novel and useful.

4 Likes

I did not participate in this discussion because I still don’t use Io in my network projects.

I hope that I can continue to use non-blocking sockets transparently on Linux, macOS, and Windows without using the Io abstractions.

1 Like

yes, my understanding is the std.posix layer of abstractions is gone, but the lower layer (obviously) isn’t going anywhere.

1 Like

These characteristics depend on the backing device of the file.

If it’s a network filesystem, well, you already named them.

But it could also be a tape drive. Tape drives are kind of the opposite of fast.

Or CDs. I wouldn’t really say that they are reliable, although I guess that depends on the climate you live in/store your CDs in.

I’m confused about why you’d think Io vtable would use any RAM at all. In an embedded application the Io would obviously be static and known at compile-time. You’re not going to be using more than one Io implementation there are you? At the very worst it’d use flash/NVM space. But as long as the Zig compiler is implemented correctly for 1.0 (I’m not sure about the state of it right now), it should easily be able to optimize away the vtable such that all calls to its functions become direct function calls rather than going through pointers.
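As a rough sketch of that situation: `MiniIo`, `Counter`, and `static_io` below are hypothetical names and a heavily simplified stand-in, not the real std.Io layout. There is one implementation, and its vtable is a `const` that is fully known at compile time, so the table itself can live in read-only data and the compiler has everything it needs to turn the indirect call into a direct one. Whether it actually does so is up to the optimizer.

```zig
const std = @import("std");

// Hypothetical, heavily simplified stand-in for an Io-style interface.
const MiniIo = struct {
    userdata: ?*anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        write: *const fn (userdata: ?*anyopaque, bytes: []const u8) usize,
    };

    // Thin wrapper: every call goes through the vtable pointer.
    fn write(self: MiniIo, bytes: []const u8) usize {
        return self.vtable.write(self.userdata, bytes);
    }
};

// The single implementation in this program: it only counts bytes.
const Counter = struct {
    var total: usize = 0;

    fn write(_: ?*anyopaque, bytes: []const u8) usize {
        total += bytes.len;
        return bytes.len;
    }

    // A const table: no RAM is needed for it, only read-only storage.
    const vtable: MiniIo.VTable = .{ .write = write };
};

// Static, compile-time-known instance, as one would expect on an embedded target.
const static_io: MiniIo = .{ .userdata = null, .vtable = &Counter.vtable };

pub fn main() void {
    _ = static_io.write("hello");
    std.debug.print("bytes written: {d}\n", .{Counter.total});
}
```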

Shower thought: I’ve spent a lot of time looking at disassembled 32-bit ARM code, and I’ve noticed that even for static function calls, function pointers end up taking a lot of space in flash. Every function will have an area with 32-bit constants, and you’ll often find a bunch of function pointers there. So if you have an application which uses Io a lot, perhaps passing a shared vtable would reduce flash size. It could even improve performance on some MCUs.

2 Likes

I had the same thought initially, but on an AVR flash is a separate address space. Right now, the VTable pointer is defined as *const VTable. This is a distinct type to *const addrspace(.flash) VTable. There’s been some discussion over on Zulip about how to handle this use case, potentially by adding an std_options.vtable_addrspace.

the VTable is going to live in rodata (or the target equivalent) always, right? regardless of the type of the pointer?

Right now, if it’s put in flash, then that pointer can’t point to it. This isn’t just a type-system issue, it’s a fundamental difference in how AVRs address memory.

Ah, yeah that’s a challenge.

The compiler needs to use different instructions for accessing program memory. So it needs to know not just the pointer value but its address space, in order to emit the correct instructions.

Making it an option is probably a good approach. Perhaps the compiler could be made smart enough to optimize it automatically (could be generalized to constants in general). But putting too much effort into optimizations for archaic architectures is probably not worth it. I know AVR is still widely used, but as long as Zig is at least as good as C for that architecture I’d say it’s good enough.

Considering how badly C deals with that situation, I would hope that long term (even if it’s post 1.0) Zig becomes substantially better at it.

And considering that the addrspace modifier exists, I would argue that it can.

I was intending to comment on #31073 - empty main fn size 122k on Mac OS 26.2 - ziglang/zig - Codeberg.org, but I eventually thought it would be too distracting to be appropriate for an issue tracker, so I will post it here. Sorry if it feels out of place.

The basic Io.Threaded vtable is instantiated by default here:

    /// The `Io` instance that `std.debug` uses for `std.debug.print`,
    /// capturing stack traces, loading debug info, finding the executable's
    /// own path, and environment variables that affect terminal mode
    /// detection. The default is to use statically initialized singleton that
    /// is independent from the application's `Io` instance in order to make
    /// debugging more straightforward. For example, while debugging an `Io`
    /// implementation based on coroutines, one likely wants `std.debug.print`
    /// to directly write to stderr without trying to interact with the code
    /// being debugged.
    pub const debug_io: Io = if (@hasDecl(root, "std_options_debug_io")) root.std_options_debug_io else debug_threaded_io.?.ioBasic();

As I was nerd-sniped into understanding the size of ReleaseSmall executables, I found that stack traces and debug prints still work on my machine (x86_64-linux-gnu) when all the ioBasic() vtable functions are set to undefined except these:

  • swapCancelProtection
  • lockStderr
  • unlockStderr
  • operate
  • dirOpenFile
  • fileLength
  • fileClose
  • fileReadPositional
  • fileEnableAnsiEscapeCodes

std.debug.print only requires the first four functions.

This has a nice side effect:

main.zig:

const std = @import("std");

pub fn main() void {
    std.debug.print("Hello, world\n", .{});
}

$ zig build-exe -O ReleaseSmall -fno-unwind-tables -fsingle-threaded --name main_before main.zig
$ zig build-exe -O ReleaseSmall -fno-unwind-tables -fsingle-threaded --zig-lib-dir ~/Builds/Zig/source/zig_lib_io_basic --name main_after main.zig
$ du -bh main_before main_after
58K	main_before
10K	main_after

Having a way to specify what should go into the ioBasic() vtable would be nice, even if I concede that it could easily break everything in numerous ways.
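The idea of a partially filled table can be sketched as follows. This is not the real ioBasic() vtable: the three field names are borrowed from the list above, but the signatures, the `lock_depth` counter, and `print_only` are assumptions made up for the example. Entries the program never calls are left `undefined`, so their implementations have no reason to be linked in.

```zig
const std = @import("std");

// Hypothetical three-entry table; the real one is much larger and its
// function signatures are different.
const VTable = struct {
    lockStderr: *const fn () void,
    unlockStderr: *const fn () void,
    fileClose: *const fn (fd: i32) void,
};

var lock_depth: u32 = 0;

fn lockStderr() void {
    lock_depth += 1;
}

fn unlockStderr() void {
    lock_depth -= 1;
}

// Only the entries this program actually uses are provided.
const print_only: VTable = .{
    .lockStderr = lockStderr,
    .unlockStderr = unlockStderr,
    // Never called here; calling through an undefined pointer is illegal behavior.
    .fileClose = undefined,
};

pub fn main() void {
    print_only.lockStderr();
    std.debug.print("lock depth while printing: {d}\n", .{lock_depth});
    print_only.unlockStderr();
}
```

The obvious downside is the one conceded above: nothing stops code elsewhere from calling an entry that was left `undefined`.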

This is not a proposal in any way. I just wanted to share what I found in case executable size becomes critical. I still think devirtualization will eventually work.

6 Likes

I was looking into it as well, but wanting a Linux-only “hello, world”. I went through start.zig looking for what part of it dragged in std.Io. It seems like you can disable it and the program still works, but sadly only in ReleaseFast/ReleaseSmall modes, since panicking requires the debug Io.

const std = @import("std");

pub const std_options: std.Options = .{
    .signal_stack_size = null,
    .enable_segfault_handler = false,
};
pub const std_options_debug_threaded_io = null;

pub fn main() void {
    const hello = "Hello, world\n";

    _ = std.os.linux.write(std.os.linux.STDOUT_FILENO, hello, hello.len);
}

$ zig build-exe -OReleaseSmall -fstrip -fsingle-threaded ./main.zig
$ du -bh ./main
1.8K    ./main

Here is the output when building in Debug mode:

$ zig build-exe ./main.zig
/home/andrewkraevskii/.cache/zig/p/N-V-__8AAC_uTRUrhIpzwcTOMDh5tBuMQQ3cDzGRmhAegCJd/lib/std/std.zig:205:122: error: unable to unwrap null
    pub const debug_io: Io = if (@hasDecl(root, "std_options_debug_io")) root.std_options_debug_io else debug_threaded_io.?.ioBasic();
                                                                                                        ~~~~~~~~~~~~~~~~~^~
referenced by:
    defaultPanic: /home/andrewkraevskii/.cache/zig/p/N-V-__8AAC_uTRUrhIpzwcTOMDh5tBuMQQ3cDzGRmhAegCJd/lib/std/debug.zig:538:29
    panicExtra__anon_2687: /home/andrewkraevskii/.cache/zig/p/N-V-__8AAC_uTRUrhIpzwcTOMDh5tBuMQQ3cDzGRmhAegCJd/lib/std/debug.zig:463:27
    2 reference(s) hidden; use '-freference-trace=4' to see all references

You can probably just conditionally give it an Io so it still works in Debug.

You could compile to -freestanding or -other

You may also be interested in #31095 - Allow overriding std.Io at a namespace level. - ziglang/zig - Codeberg.org, which could let you skip vtables altogether.

1 Like

@AndrewKraevskii

But I like panics and debug prints :)

@Cloudef

I am indeed interested. I was already waiting for it.


I want to clarify that I’m not currently experiencing a serious use case where a 1M executable is problematic. It’s just that sometimes I like to mess with the std lib (zig is very hackable).

std.Io is cool, and I also see it as a good and concrete motivation to seriously investigate restricted function pointers, which is the big deal IMO.


Just for fun:

$ cat main.zig
const std = @import("std");

const str = "Hello, world\n";

pub export fn _start() callconv(.naked) void {
    _ = @call(.always_inline, std.os.linux.syscall3, .{ std.os.linux.SYS.write, std.os.linux.STDOUT_FILENO, @intFromPtr(str), str.len });
    _ = @call(.always_inline, std.os.linux.syscall1, .{ std.os.linux.SYS.exit, 0 });
}
$ zig build-exe -O ReleaseSmall -fno-unwind-tables main.zig
$ du -bh main
792 main
$ ./main
Hello, world
$

792 bytes :)

Can we make a smaller Linux executable, using only the zig command, that prints Hello, world\n and exits cleanly, without using assembly?

I think it should be possible, since:

$ readelf -p .comment main

String dump of section '.comment':
  [     0]  Linker: LLD 21.1.0 (https://codeberg.org/ziglang/zig-bootstrap.git 8868e80219bcf65d6eb576e4fa69a483a7cc65cc)

$ strip --remove-section=.comment main
$ du -bh main
608	main
$ ./main
Hello, world
$

The strip command is clearly cheating though.

3 Likes