When 0.16.0 was released and I started to wrap my head around std.Io, one of the first things I thought about was how it could be leveraged for deterministic simulation testing.
I haven’t had time to really dive into it yet, so bear with me! I’m imagining something that would mock all IO operations, return random failures (based on a seed), and deterministically schedule async and concurrent work. Through various simulations, we’d hopefully uncover race conditions, weak response / error handling, panics as a result of data corruption, etc.
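To make the fault-injection half of the idea concrete, here’s a toy sketch (in Python for illustration; the mock IO type, method names, and failure rate are all invented): every operation consults a single seeded RNG, so a failing run can be replayed exactly from its seed.

```python
import random

class SimulatedIo:
    """Mock IO layer: every operation may fail, driven by one seed."""
    def __init__(self, seed, failure_rate=0.1):
        self.rng = random.Random(seed)   # all nondeterminism flows from the seed
        self.failure_rate = failure_rate
        self.files = {}                  # in-memory "filesystem"

    def write(self, path, data):
        if self.rng.random() < self.failure_rate:
            raise OSError("injected: disk write failed")
        self.files[path] = data

    def read(self, path):
        if self.rng.random() < self.failure_rate:
            raise OSError("injected: disk read failed")
        return self.files[path]

def run_sim(seed):
    """Drive some writes against the mock; record which ones failed."""
    io = SimulatedIo(seed)
    log = []
    for i in range(20):
        try:
            io.write(f"key{i}", i)
            log.append(("ok", i))
        except OSError:
            log.append(("fail", i))
    return log

# The same seed always replays the same failure sequence,
# so a crash found in CI can be reproduced locally.
assert run_sim(42) == run_sim(42)
```

The key property is that the seed is the *only* source of nondeterminism; everything else about the run is a pure function of it.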
What are some potentially tricky things I’m missing here? I don’t anticipate it being “easy” of course, but are there some really hard problems I’m unaware of in doing something like this? One thing that comes to mind is how to manage preemptive thread scheduling. If I ask for some concurrent work to be done, I can’t actually start an OS thread (to be deterministic, the entire execution has to be single-threaded), so I would have to stop and restart threads at (seed-based) random points, mimicking the way an actual thread behaves under the OS’s control. Are we getting into “stackless coroutine” territory here? Looking at std.Io.Evented, maybe we can take some inspiration from the backing Dispatch implementation? Is that specific to underlying OS primitives? It would be nice to be portable across at least Linux and macOS…
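To make the preemption idea concrete, here’s a toy sketch using Python generators as stand-ins for coroutines (everything here is invented for illustration): each `yield` is a preemption point, and a seeded RNG picks which task runs next, so the interleaving is random but fully reproducible.

```python
import random

def scheduler(tasks, seed):
    """Run generator-based tasks to completion, picking the next runnable
    task with a seeded RNG. Same seed => same interleaving."""
    rng = random.Random(seed)
    tasks = list(tasks)
    trace = []
    while tasks:
        t = rng.choice(tasks)          # "preempt" deterministically
        try:
            trace.append(next(t))      # run until the task's next yield
        except StopIteration:
            tasks.remove(t)
    return trace

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"            # every yield is a preemption point

# Reproducible interleaving of two "concurrent" workers.
t1 = scheduler([worker("a", 3), worker("b", 3)], seed=7)
t2 = scheduler([worker("a", 3), worker("b", 3)], seed=7)
assert t1 == t2
```

This only preempts at explicit yield points, which is exactly the limitation the “stackless coroutine” question is about: a real thread can be preempted between any two instructions.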
Potentially a massive topic, but very interested in getting your thoughts.
To be truly deterministic, you would have to not talk with the underlying OS at all.
For concurrency, your best option is indeed coroutines. There’s std.Io.fiber you can reuse, but it does not support all architectures. When and if Zig gets stackless coroutines, those can be used without having to worry about hardware support.
As for OS threads, it’s still possible to serialize their execution with locks/parking/unparking [1], but ideally you’d want to avoid any OS primitives here at all.
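A rough sketch of the lock/parking approach (Python for illustration; a real deterministic scheduler would hook preemption points much more finely): real OS threads exist, but a “baton” lets exactly one run at a time, and a seed-driven controller decides which thread to unpark next.

```python
import random
import threading

class Baton:
    """Allow exactly one named party to run; the controller decides who."""
    def __init__(self):
        self.cv = threading.Condition()
        self.holder = None

    def wait_turn(self, name):
        with self.cv:
            while self.holder != name:
                self.cv.wait()         # parked until scheduled

    def hand_to(self, name):
        with self.cv:
            self.holder = name
            self.cv.notify_all()

def run_serialized(seed, names, steps):
    """Run real OS threads, but serialize them under a seeded controller."""
    baton, trace = Baton(), []

    def body(name):
        for i in range(steps):
            baton.wait_turn(name)      # blocked until the controller picks us
            trace.append((name, i))    # the "work"; appends are serialized
            baton.hand_to("ctrl")      # yield back to the controller

    threads = [threading.Thread(target=body, args=(n,)) for n in names]
    for t in threads:
        t.start()
    rng = random.Random(seed)
    remaining = {n: steps for n in names}
    while any(remaining.values()):
        name = rng.choice([n for n, k in remaining.items() if k])
        baton.hand_to(name)            # unpark one thread...
        baton.wait_turn("ctrl")        # ...and park until it yields back
        remaining[name] -= 1
    for t in threads:
        t.join()
    return trace
```

The trace is deterministic per seed because only one thread is ever runnable, so the OS scheduler has no choices left to make, which is the point of the parking trick.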
I’m doing something similar, but the only IO I need to simulate is async, sleep, the clock, and a raw socket. There still needs to be something that “ticks” the simulator for me, perhaps advancing the clock 1µs at a time. I haven’t gotten very far, though; I still need to wrap my head around how to use the networking part of Io.
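One note on ticking: rather than advancing 1µs at a time, many simulators keep a priority queue of pending wakeups and jump the virtual clock straight to the next due event; sleep just registers a wakeup. A tiny sketch (all names invented):

```python
import heapq

class SimClock:
    """Virtual time: sleepers park in a heap, and advancing the clock
    jumps straight to the next due wakeup instead of ticking."""
    def __init__(self):
        self.now_us = 0
        self.pending = []              # (wake_time_us, seq, callback)
        self.seq = 0                   # tie-breaker for equal deadlines

    def sleep(self, duration_us, callback):
        heapq.heappush(self.pending,
                       (self.now_us + duration_us, self.seq, callback))
        self.seq += 1

    def run_next(self):
        """Advance to the next wakeup and fire it. Returns False when idle."""
        if not self.pending:
            return False
        wake, _, cb = heapq.heappop(self.pending)
        self.now_us = max(self.now_us, wake)
        cb()
        return True

clock = SimClock()
fired = []
clock.sleep(500, lambda: fired.append(("b", clock.now_us)))
clock.sleep(100, lambda: fired.append(("a", clock.now_us)))
while clock.run_next():
    pass
assert fired == [("a", 100), ("b", 500)]
```

Jumping the clock keeps simulated runs fast even when the program sleeps for long virtual durations, since idle time costs nothing.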
I am interested if anyone has resources (like textbooks) they can recommend about the design of deterministic simulators.
Another problem which, AFAIK, you can’t solve by making Io deterministic is atomics. You’d have to somehow control their ordering between threads, and also the values they return. There are of course ways to do that, but it can’t be done in Io alone.
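One way to see the problem: even with all IO mocked, two threads racing on an atomic can observe different orders on different runs, so the simulator has to intercept the atomic ops themselves. A toy sketch of such an interception layer (Python, all names invented): each op is queued instead of executed, and a seeded RNG commits them in a reproducible order.

```python
import random

class SimAtomic:
    """An 'atomic' cell whose operation order is chosen by the simulator,
    not by the hardware: ops are queued, then committed in seeded order."""
    def __init__(self, value, seed):
        self.value = value
        self.rng = random.Random(seed)
        self.queue = []                # pending (thread_name, op) pairs

    def submit(self, name, op):
        self.queue.append((name, op))  # op: value -> (new_value, result)

    def commit_all(self):
        """Apply queued ops in a seed-determined order; return per-op results."""
        results = []
        while self.queue:
            i = self.rng.randrange(len(self.queue))
            name, op = self.queue.pop(i)
            self.value, res = op(self.value)
            results.append((name, res))
        return results

def fetch_add(n):
    return lambda v: (v + n, v)        # returns the old value, like fetch_add

def run(seed):
    a = SimAtomic(0, seed)
    a.submit("t1", fetch_add(1))
    a.submit("t2", fetch_add(10))
    return a.commit_all(), a.value

assert run(0) == run(0)                # same seed, same interleaving
```

The final value is always 11 here, but which thread observed 0 versus which observed the other’s write depends on the seed, which is exactly the ordering a deterministic simulator needs to own.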
To expand on this: it interacts in an interesting way with polymorphism.
Right now, TigerBeetle implements message passing on top of TCP. In theory, we could mock TCP in a way that yields interesting message-passing semantics. But the problem is that, one day, we might want to switch to UDP, at which point we’d be making entirely different syscalls, and our simulation would no longer work.
By simulating the message passing layer directly, we get a much stronger property that TB works with any networking implementation, as long as that implementation is not completely broken. We can swap out our networking and be sure that that won’t bork our consensus.
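A sketch of what simulating at the message-passing layer (rather than at TCP or UDP) looks like: a mock network that can drop and reorder messages under a seed, independent of any transport. This is illustrative Python, not TigerBeetle’s actual simulator; all names are invented.

```python
import random

class SimNetwork:
    """Mock message-passing layer: transport-agnostic, seed-driven.
    Messages may be dropped or reordered, but never corrupted."""
    def __init__(self, seed, drop_rate=0.1):
        self.rng = random.Random(seed)
        self.drop_rate = drop_rate
        self.in_flight = []

    def send(self, src, dst, msg):
        if self.rng.random() < self.drop_rate:
            return                      # injected packet loss
        self.in_flight.append((src, dst, msg))

    def deliver_one(self):
        """Deliver a random in-flight message (reordering is implicit)."""
        if not self.in_flight:
            return None
        i = self.rng.randrange(len(self.in_flight))
        return self.in_flight.pop(i)

def run(seed):
    net = SimNetwork(seed)
    for i in range(10):
        net.send("replica0", "replica1", f"prepare:{i}")
    delivered = []
    while (m := net.deliver_one()) is not None:
        delivered.append(m)
    return delivered

assert run(5) == run(5)                 # reproducible delivery schedule
```

Because nothing here mentions sockets or syscalls, the same faults can be injected whether the real transport is TCP, UDP, or anything else, which is the stronger property described above.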
We do have a “unit-fuzzer” for our networking component in isolation, and that fuzzer mocks at the level of IO. But the whole-system one doesn’t.