I’ve been working on a CPU ray tracer in Zig and recently ran into trouble when trying to parallelize it for WASM in the browser. Most browsers support sharing WASM memory between Web Workers, and setting up Zig for this use case is simple enough (following this comment [0]; and also setting lib.shared_memory = true). But workers sharing the same WASM memory will end up stepping on each other’s toes as they all use the same stack. As far as I understand, tools like Emscripten and wasm-bindgen-rayon provide some support for managing per-thread stacks (I found this discussion [1] to be very helpful). Does Zig have anything analogous? I’ve gone digging through github issues and the stdlib source code, but haven’t quite found anything.
I have seen the recently added experimental WASI threading support [2], but AFAIK this applies to wasm32-wasi and not wasm32-freestanding.
I’m also aware that not sharing WASM memory between workers will work for my use case, but I’m hoping to learn where things stand for shared memory specifically.
Oh, and one other thing. It’s now possible to enable a multi-threaded build for WASM by passing the -fno-single-threaded flag to zig build-lib (see this commit [3]). Is there an equivalent option in build.zig?
While trying to get Emscripten working with the Zig standard library I made a toy thread example. Even if you aren’t interested in the Emscripten toolchain, the build.zig in that project shows a way to enable shared memory and disable single-threaded mode for wasm builds.
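For anyone who lands here without clicking through, the relevant part of that build.zig looks roughly like this (a sketch only: the field names match the Zig version I was on and shift between releases, and the artifact name and memory limit are just placeholders):

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const optimize = b.standardOptimizeOption(.{});

    const lib = b.addSharedLibrary(.{
        .name = "app", // hypothetical artifact name
        .root_source_file = .{ .path = "src/main.zig" },
        .target = .{
            .cpu_arch = .wasm32,
            .os_tag = .freestanding,
            // threads over shared memory need the atomics + bulk_memory features
            .cpu_features_add = std.Target.wasm.featureSet(&.{ .atomics, .bulk_memory }),
        },
        .optimize = optimize,
    });

    // build.zig counterparts of the CLI flags:
    // shared_memory for the shared linear memory,
    // single_threaded = false for -fno-single-threaded.
    lib.shared_memory = true;
    lib.single_threaded = false;

    // A shared memory has to be imported from JS and given a fixed maximum size.
    lib.import_memory = true;
    lib.max_memory = 64 * 1024 * 1024; // placeholder limit

    b.installArtifact(lib);
}
```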
Thanks for that, it’s a big help. Any idea how you can get this toolchain to respect the “export” keyword in Zig to actually export a function from WASM? Emscripten is very annoying in how much it wants to take over when all you want is a little pthread glue…
I don’t believe that it is possible to just take the pthread stuff from Emscripten and use it in a freestanding wasm project, it needs the Emscripten runtime to function.
If you just need to export a Zig function to be able to call it from JavaScript, you can use Emscripten’s ccall/cwrap functionality.
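Roughly like this (the function name and numbers are made up, and the exact flags depend on your Emscripten version):

```zig
// Zig side: an ordinary exported function (the name "render_tile" is hypothetical).
export fn render_tile(tile_index: i32) i32 {
    // ... do some work ...
    return tile_index;
}

// Emscripten strips exports it doesn't know about, so the symbol has to be
// kept alive at link time, along the lines of:
//
//   emcc ... -sEXPORTED_FUNCTIONS=_render_tile -sEXPORTED_RUNTIME_METHODS=ccall,cwrap
//
// On the JS side it can then be called through the runtime helpers:
//
//   const result = Module.ccall('render_tile', 'number', ['number'], [3]);
//   // or build a reusable wrapper:
//   const renderTile = Module.cwrap('render_tile', 'number', ['number']);
```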
Outside of Emscripten I don’t know of any standard ways to do thread parallelism in the browser with wasm, and Zig doesn’t have its own bespoke implementation.
Thanks for the reply! Do you know how to get your toy example working with release builds?
When I compile it with -Doptimize=ReleaseFast, I get errors related to emscripten’s scaffolding for stack traces. I’ve seen you mention this issue in this github comment, where you recommend passing -sUSE_OFFSET_CONVERTER to emcc for safe builds specifically (reflected in the build.zig for your example). But for ReleaseFast, my understanding is that Zig shouldn’t be emitting the stack trace inspection code in the first place, so I’m not sure how to go about fixing the issue.
(Using -sUSE_OFFSET_CONVERTER for ReleaseFast does solve the problem, but incurs a huge performance penalty in the project I’m currently trying to convert from freestanding to emscripten.)
Good find, I guess I never tested release builds. Using the raw std.c pthread interface for threading seems to work fine in ReleaseFast without -sUSE_OFFSET_CONVERTER, so it looks like somewhere in std.Thread a return address lookup is generated even in ReleaseFast. I’ll keep looking and see if I can make a PR to fix the issue for emscripten targets, but in the meantime you can just call std.c.pthread_create directly, something like the sketch below.
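A sketch only (the exact std.c signatures, in particular the return type of pthread_create, vary a bit between Zig versions):

```zig
const std = @import("std");
const c = std.c;

// Thread entry point with the C calling convention pthread expects.
fn worker(arg: ?*anyopaque) callconv(.C) ?*anyopaque {
    _ = arg;
    // ... per-thread rendering work ...
    return null;
}

pub fn spawnWorker() !c.pthread_t {
    var handle: c.pthread_t = undefined;
    // Bypasses std.Thread.spawn and therefore the allocator call that pulls
    // in the return-address lookup on Emscripten targets.
    switch (c.pthread_create(&handle, null, worker, null)) {
        .SUCCESS => return handle,
        else => return error.SpawnFailed,
    }
}
```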
Thanks for the help! For now, I’ve found that replacing emscripten_return_address in the generated JS with a no-op before it gets imported into the WASM module works as a temporary hack.
If anyone is curious what a migration to emscripten can look like for a small but nontrivial project, see this commit for my raytracer. Although it was a little painful at times as someone who had never used emscripten before, the switch is definitely worth it if you expect to use a lot of memory and need to share it between workers.
Got some time to look at this today and I believe it comes down to std.mem.Allocator calling @returnAddress in the member functions alloc, create, realloc, etc. The implementation of spawn defined in PosixThreadImpl uses std.heap.c_allocator and thus generates calls to @returnAddress. It seems that c_allocator just discards the return address parameter, so the call could likely be optimized out, but isn’t currently (maybe due to the vtable indirection).
The core issue seems to be that while emcc/LLVM provide __builtin_return_address for Emscripten targets, it resolves to the expensive JavaScript function _emscripten_return_address. Currently, any call through the Allocator interface ends up generating such a call out to JS.
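To make the shape of the problem concrete, here is a tiny self-contained mirror of that pattern (toy names, not the actual stdlib code):

```zig
// A toy mirror of the std.mem.Allocator pattern: the public wrapper captures
// @returnAddress() eagerly and pushes it through a function pointer, so the
// concrete implementation can't make the capture go away even when it
// discards the value (as c_allocator does).
const ToyAllocator = struct {
    allocFn: *const fn (len: usize, ret_addr: usize) ?[*]u8,

    pub fn alloc(self: ToyAllocator, len: usize) ?[*]u8 {
        // On Emscripten targets this @returnAddress() is what ends up
        // lowering to a call into the JS helper.
        return self.allocFn(len, @returnAddress());
    }
};

fn ignoreRetAddr(len: usize, ret_addr: usize) ?[*]u8 {
    _ = len;
    _ = ret_addr; // discarded, just like c_allocator does
    return null; // allocation elided; only the call shape matters here
}

pub fn main() void {
    const a = ToyAllocator{ .allocFn = &ignoreRetAddr };
    // The indirect call keeps the compiler from seeing that ret_addr is
    // unused, which matches the vtable-indirection suspicion above.
    _ = a.alloc(64);
}
```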