Questions on the bootstrapping process

Hi,

I’ve been traveling and used this time to binge-listen to various Zig-related talks and podcasts.

As an aside, before the actual question in this post:
The material the Zig community is producing is amazing.
So much so that it’s a major motivation for me to learn Zig.
I really enjoy consuming programming-related content.
And the material from Zig SHOWTIME and Software You Can Love is something special.
At some point I thought “OK, this is amazing. Now imagine if you were actually using Zig and somewhat invested in it yourself. This content would be even more fun.”
That’s actually a major motivation: learning Zig so that the talks and videos you guys produce are even more fun.
Well, and it feels good to dive back into a low-level language.

So, one podcast I listened to was this: Bootstrapping a Compiler via WASM with Loris Cro - Software Unscripted | Podcast on Spotify

I’d like to ask some follow-up questions regarding the bootstrapping process of Zig.
I listened to it while driving, so I may have missed some important details.
But as far as I understood:

Problem: We want to avoid compiling a whole chain of Zig compiler versions in order to arrive at the latest commit (for example, when starting from scratch or after losing the binary somehow).

Key insight: The person about to commit to the compiler has a working version locally. How can we share this version?
Importantly: it’s not possible to just share a binary. That’s not cross-platform.

Potential fix: Compile the local version to C.
As far as I understood from the talk, the major blocker here is Zig’s comptime.
Comptime is executed before outputting C code.
And among other things, it’s used to resolve platform-specific configuration options.
So the local Zig compiler would have to be compiled to a different C codebase for every combination of OS, architecture, etc.
That is not viable.

Fix: Compile the local version of the Zig compiler to wasm. Then, if you want to have it on some other platform, you only need a wasm VM.

At this point I think I started to get confused. The podcast explains that Zig contains a minimal wasm VM.
But they also say that the wasm code execution can be sped up by compiling the wasm to C and then compiling the C to native code.
So, is the wasm VM used at all?
And what about comptime? The podcast presents this as a major problem when generating C code. Why is this not a problem for wasm? Particularly since, as far as I understand, the wasm code isn’t even executed as wasm; it’s converted to C. How can this not run into the problem of comptime having already been run?


https://ziglang.org/news/goodbye-cpp/

The wasm is no longer compressed, though. See:


The link squeek posted has very good details, but I’ll try to answer your questions a little more directly.

The problem is not strictly comptime, but just the fact that the generated C code is target-specific. This might be because code does some kind of comptime thing like `if (builtin.os.tag == .windows) { do something }`, but it also comes up in a lot of other places. For instance, consider a piece of code which uses `usize`; the definition of that type varies depending on architecture, so the generated C code from the C backend will be tailored to a specific architecture. As you observe, this is undesirable.
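To make that concrete, here's a rough hand-written sketch (illustrative only, not actual output of the C backend) of how the same Zig function taking `usize` parameters would have to be emitted differently depending on the target's pointer width:

```c
#include <stdint.h>

/* Illustration only: a Zig function like `fn add(a: usize, b: usize) usize`
 * has to be lowered differently depending on how wide usize is. */

/* Emitted when targeting a 64-bit architecture (e.g. x86_64): */
uint64_t add_usize_64(uint64_t a, uint64_t b) { return a + b; }

/* Emitted from the very same Zig source when targeting a 32-bit
 * architecture (e.g. 32-bit ARM): */
uint32_t add_usize_32(uint32_t a, uint32_t b) { return a + b; }
```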

To motivate WASM, first, imagine a different solution: what if we instead checked a binary for some arbitrary target (say, x86_64-linux-musl) into the repo, and told people to build the compiler using a VM (something like QEMU)? We could even provide that VM in the repo, so you only need things in the Zig repo to build the compiler. (Of course, this specific approach would be infeasible in practice since it would make the repo massive, but in theory it would work.)

But let’s say that this VM isn’t using any kind of hardware virtualization or clever trick, so it’s actually really slow. What we could instead try is taking the machine code in that binary, and converting it to C. Provided we trust the binary to not be doing anything nasty (since we might lose some sandboxing), this works well, because a single machine code instruction (on our chosen target of `x86_64`) has an exact meaning which we can encode in cross-platform C. For instance, if the instruction is adding two unsigned 32-bit integers, we can convert this to the C code `(uint32_t)a + (uint32_t)b`, which will work correctly regardless of the native target. If our VM is very slow, doing this conversion and compiling the generated C to native code could be much faster, because we can rely on the system C compiler to optimize the code for us - it won’t be perfect, but it’s better than a VM, which is effectively just an interpreter.
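As a tiny illustrative sketch (mine, not output of any real translator), that add instruction could become a C helper along these lines, which behaves identically no matter which host compiles it:

```c
#include <stdint.h>

/* Illustration only: one "add two unsigned 32-bit values" instruction from
 * the guest binary, re-expressed as portable C. Unsigned 32-bit addition
 * wraps the same way on every host, so the translated code is no longer
 * tied to the architecture the binary was originally built for. */
static uint32_t translated_add_u32(uint32_t a, uint32_t b) {
    return a + b; /* wraps modulo 2^32, matching the original instruction */
}
```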

The nice thing WASM gives us is that it’s a really simple and small target. The program that translates the WASM binary to C, rather than being this monolithic and complex piece of software, can be implemented in a few thousand lines of cross-platform C.

So, in essence: the WASM code we generate is still target-specific (said target being wasm32-wasi), but we can very easily emulate the features of that target in cross-platform C, whereas doing the same for a more conventional architecture is infeasibly complicated.
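To make "emulate the features of that target" a bit more concrete, here is a small sketch (my own illustration, not code from the actual translator in the Zig repository) of how something like wasm's little-endian linear memory can be expressed in cross-platform C:

```c
#include <stdint.h>

/* Illustration only: wasm linear memory is just a flat array of bytes.
 * An i32.load can be emulated by assembling the little-endian bytes
 * explicitly, so the result is the same on any host, regardless of its
 * native endianness or pointer width. */
static uint8_t memory[1 << 16]; /* one 64 KiB wasm page */

static uint32_t i32_load(uint32_t addr) {
    return (uint32_t)memory[addr]
         | (uint32_t)memory[addr + 1] << 8
         | (uint32_t)memory[addr + 2] << 16
         | (uint32_t)memory[addr + 3] << 24;
}
```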


Hi and thank you for your kind words!

I think @mlugg gave a great reply, so I’ll just add an even more simplified (albeit a bit hand-wavy) version for your convenience: it would be totally possible to move the comptime logic in the Zig standard library around to make it work with a special “multiplatform-c” target; it would just require a lot of work. By targeting wasm32-wasi we create a simple executable that just so happens to push all that platform-specific logic to the edge of the system (i.e. the interpreter, as the wasm blob expects to be provided with functions that can open files, print to stdout, etc.).

So by targeting wasm32-wasi we get “for free” something comparable to the manual work I mentioned in the beginning.

The fact that the wasm is precompiled to C vs. being run in a VM is just an implementation detail that has to do with performance. From a functional perspective, when running the WASM in an interpreter, the interpreter needs to provide the aforementioned I/O functions, while in the case of C precompilation, it’s up to us to provide it with the same I/O functions, just with the caveat that those functions need to work on the target machine.

So how do we achieve this last part? By writing those functions by hand as good old-fashioned portable C code, as you can see here:

The reason this works in practice is that the wasm Zig binary is very simple and doesn’t even need all the APIs that WASI specifies, just the ones that the compiler needs to compile itself. Maintaining this file is pretty easy once it’s been written (at the time of writing, it hasn’t been touched in 7 months), which is immensely easier than a solution that messes around with the Zig standard library.
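As a rough illustration of what such a hand-written shim can look like (heavily simplified, and not the actual signature or implementation from the Zig repository, where the real WASI functions have to deal with iovecs in linear memory and WASI error codes), a "write to stdout/stderr" import could boil down to plain portable stdio:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, heavily simplified shim: the translated wasm code calls an
 * imported write function, and the host provides it as plain portable C on
 * top of stdio, which works anywhere a C compiler and libc exist. */
static uint32_t host_write(uint32_t fd, const uint8_t *buf, uint32_t len) {
    FILE *stream = (fd == 2) ? stderr : stdout;
    return (uint32_t)fwrite(buf, 1, len, stream);
}
```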

Hopefully that helps you get an intuition for what’s going on.


This, along with the original question and @mlugg’s answer, should be turned into a blog post somehow. It’s clear, concise, and really is an “aha” moment.