Wasm32-emscripten now requires Emscripten filesystem emulation (side effect of new IO?)

I just noticed a new-ish emcc link problem when building the sokol-zig samples (floooh/sokol-zig: Zig bindings for the sokol headers, https://github.com/floooh/sokol) for wasm32-emscripten (via zig build -Dtarget=wasm32-emscripten examples) using the latest Zig nightly.

This started to fail in the emcc linker step with a missing symbol required by the accept4 syscall:

error: undefined symbol: $SOCKFS (referenced by $getSocketFromFD, referenced by __syscall_accept4, referenced by root reference (e.g. compiled C/C++ code))
warning: To disable errors for undefined symbols use `-sERROR_ON_UNDEFINED_SYMBOLS=0`

TL;DR: the link now pulls in Emscripten's __syscall_accept4 (the syscall behind accept()), which in turn needs the SOCKFS socket filesystem, even though the program never calls any socket functions.

This is a linker error because I’m linking with the emcc option -sNO_FILESYSTEM=1, which essentially disables the Emscripten POSIX IO emulation (i.e. the problem is easy to fix by just not passing this option to the linker, but that increases the ‘binary size’).

It made me think though: why does a Zig program that doesn’t even use socket functionality all of a sudden pull in the socket accept() function, when Zig has a ‘lazy’ compilation model which only builds reachable code?

Is this a side effect of the new IO being a virtual method interface which essentially kills dead-code-elimination?

If that’s the reason, will this problem (and IMHO it’s a pretty big problem) even be fixable? It would kinda suck if code that’s never going to be called is linked into each and every Zig program.

If this is not fixable in the compiler, are there plans for a ‘minimal/embedded IO’ which doesn’t require files and sockets? (or maybe a ‘modular IO’ that can be configured via build options - e.g. similar to how I can pass -sNO_FILESYSTEM=1 to Emscripten which promises that I will not call any code which depends on POSIX IO).

PS: resulting size comparison (uncompressed, in bytes):

Zig 0.15.2 with Emscripten -sNO_FILESYSTEM=1 and release=small:

clear.js: 23620 bytes
clear.wasm: 33443 bytes

Current Zig nightly with Emscripten filesystem emulation enabled, release=small:

clear.js: 58078 bytes
clear.wasm: 84700 bytes

Even though it’s “just” a few dozen kbytes (about 86 KB more in total, roughly 2.5x the combined size), for small programs this overhead is quite substantial…

Indeed, even a standard hello world for x86_64-linux seems to pull in ChaCha and other goodies. I don’t think Zig can lazily reference struct fields; perhaps restricted function pointers could somehow figure out all the dead code? Not sure how the core team aims to tackle this, but for now the alternatives are to not use the Io interface, implement your own, or stay on 0.15.2.

I saw this PR mentioning lazy field analysis, could this help?

yes, i believe one of the motivations for the newly-added lazy field analysis in that PR is exactly for ease of use of std.Io as a namespace type without necessarily pulling in all the code from std.Io as an instantiable type.

of course, if you use a std.Io then we’re back to the problem again

I don’t think so, as at the source level Io is an interface and goes through an *anyopaque pointer. Removing the unused code would require information from some sort of optimization pass (one that can also figure out, at compile time, which type and storage location the pointer points to at that point in the code).
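
To make the point about type-erased interfaces concrete, here is a minimal sketch of why a vtable tends to defeat lazy analysis and dead-code elimination. Everything in it (MiniIo, Counting, the two-entry vtable) is hypothetical and made up for illustration; it is not the real std.Io, just the same general pattern of a userdata pointer plus a table of function pointers.

```zig
const std = @import("std");

// Hypothetical miniature of a type-erased interface. This is NOT std.Io.
const MiniIo = struct {
    userdata: ?*anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        write: *const fn (userdata: ?*anyopaque, bytes: []const u8) usize,
        accept: *const fn (userdata: ?*anyopaque) i32,
    };

    // The only operation this program ever calls through the interface.
    fn write(io: MiniIo, bytes: []const u8) usize {
        return io.vtable.write(io.userdata, bytes);
    }
};

// Hypothetical implementation that just counts the bytes it was given.
const Counting = struct {
    written: usize = 0,

    fn write(userdata: ?*anyopaque, bytes: []const u8) usize {
        const self: *Counting = @ptrCast(@alignCast(userdata.?));
        self.written += bytes.len;
        return bytes.len;
    }

    // Never called anywhere, but its address is stored in the vtable below,
    // so it is still referenced and has to be analyzed and emitted.
    fn accept(userdata: ?*anyopaque) i32 {
        _ = userdata;
        return -1;
    }

    const vtable: MiniIo.VTable = .{ .write = write, .accept = accept };

    fn io(self: *Counting) MiniIo {
        return .{ .userdata = self, .vtable = &vtable };
    }
};

pub fn main() void {
    var impl: Counting = .{};
    const io = impl.io();
    _ = io.write("hello");
    // impl.written is now 5, the length of "hello".
    std.debug.print("{d}\n", .{impl.written});
}
```

Because the whole vtable is initialized as one value, referencing it references every function in it, including accept. In a real implementation the body of such a function would in turn reference the underlying syscall wrapper, which is presumably how something like __syscall_accept4 can end up in the link. Dropping it would need either an optimizer that can prove which entries are never called through the pointer, or a way to avoid filling in the unused entries in the first place.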