Deterministic std.Io

When 0.16.0 was released and I started to wrap my head around std.Io, one of the first things I thought about was how it could be leveraged for deterministic simulation testing.

I haven’t had time to really dive into it yet, so bear with me! I’m imagining something that would mock all IO operations, return random failures (based on a seed), and deterministically schedule async and concurrent work. Through various simulations, we’d hopefully uncover race conditions, weak response / error handling, panics as a result of data corruption, etc.

What are some potentially tricky things I’m missing here? I don’t anticipate it being “easy” of course, but are there some really hard problems I’m unaware of in doing something like this? Something that comes to mind for me is how to manage preemptive thread scheduling. If I ask for some concurrent work to be done, I can’t actually start an OS thread (to be deterministic, the entire execution has to be single-threaded), so I would have to stop and restart threads at (seed-based) random times, in the same way an actual thread would behave, as far as the OS has control over it. Are we getting to “stackless coroutine” terriority here? Looking at std.Io.Evented, maybe we can take some inspiration from the backing Dispatch implementation? Is that specific to underlying OS primitives? Would be nice to be portable across at least Linux and macOS…

Potentially a massive topic, but very interested in getting your thoughts.

7 Likes

To be truly deterministic, you would have to not talk with the underlying OS at all.

For concurrent, your best option is coroutines indeed. There’s std.Io.fiber you can reuse, but it does not support all archs. When and if zig gets stackless coroutines then those can be used without having to worry about hardware support.

As for OS threads, it’s still possible to serialize their execution with locks/parking/unparking [1], but ideally you’d want to avoid any OS primitives here at all.

1: For example here is port of a game that is written using bunch of busy loops, but the platform I ported it to needs to not loop forever, so I used a thread so I could yield the game when it’s waiting for events thus “asyncifying” it https://github.com/sorvi-platform/sorvi-elma-classic/blob/master/src/elma-classic.zig

1 Like

I doing some similar stuff. But the only IO I need to simulate is async, sleep, clock, and a raw socket. There still needs to be something that “ticks” the simulator for me, perhaps advancing the clock 1us at a time. But I havent gotten very far, I still need to wrap my head around how to use the networking part of Io.

I am interested if anyone has resources (like textbooks) they can recommend about the design of deterministic simulators.

4 Likes

Another problem which afaik you can’t solve by making Io deterministic are atomics. You’ll have to somehow patch their ordering between threads and also the results you’re getting. There are of course ways to do that, but it can’t just only be done in Io.

1 Like

For managing threads, see:

One subtle issue is that IO might be the wrong boundary for DST.

If the goal is to DST an existing non-DST-aware application, you might as well go all the way to antithesis and mock at the vm level.

If the goal is to co-design an application and DST harness, you’ll want to DST at the level of applications domain-objects.

TigerBeetle has IO, but we don’t mock IO. We mock storage (fixed sized sector aligned reads/writes) and network (message passing).

This allows us to inject domain-relevant faults directly: we can misdirect a message, rather than an individual network write.

7 Likes

To expand on this, this interacts in an interesting way with polymorphism.

Right now, TigerBeetle implements message passing on top of TCP. In theory, we could mock the TCP in such a way that leads to an interesting message passing semantics. But the problem here is that, one day, we might want to switch to UDP, at which point we are doing entirely different syscalls, and our simulation no longer works.

By simulating the message passing layer directly, we get a much stronger property that TB works with any networking implementation, as long as that implementation is not completely broken. We can swap out our networking and be sure that that won’t bork our consensus.

We do have a “unit-fuzzer” for out networking component in isolation, and that fuzzer mocks at the level of IO. But the whole-system one doesn’t.

4 Likes

I think this is a fair point! The dream is that Antithesis can move towards some combination of open-sourcing the hypervisor and creating more offerings “down market”, to make this available to more people.

It does seem like application-level DST is (generally) easier to implement and has those nice properties of being “protocol aware”, which can increase the effectiveness in certain test scenarios.

How much of that do you think is available to TigerBeetle, mainly as a result of the architecture? Given that each node in the cluster is single-threaded, it’s my understanding you can basically run those in-memory replicas “to completion” at each state transition. More specifically, you don’t have to interrupt the thread of any particular replica’s execution. I might be misunderstanding though.

If we’re looking at architectures that do have multiple threads of execution per node, the application level DST seems to get a lot more difficult, especially if one of those threads just blocks on reads from the network repeatedly. Maybe that just becomes an “interface” problem though, and you can abstract that away and still simulate in a single-thread, without mocking the actual IO implementation.

EDIT: Sorry, I just now looked at the link to your post on Properly Testing Concurrent Data Structures. That seems like it has some answers to what I’m getting at with the above. Thanks!

Yikes. Yeah that could get really difficult!

I know I’m a little late but this is exactly what I have been building (Marionette), and I’ve got some notes that might be useful to the thread!

Short version: the “deterministic std.Io” idea works, at least for the storage/single-threaded part. I’ve been running an unmodified append-only DB (xitdb, which already takes an io: std.Io) on top of a deterministic Io backend over a simulated sector disk, with zero changes to its code, with crash/torn/lost-write fault injection and seed-reproducible replay. It found a recovery boundary (an 8-byte committed-size header that recovery assumes is written atomically, true on any 512/4096 disk, breaks only at sub-sector geometry, so not a real-hardware bug, but a precise atomicity assumption Marionette could pin down). The “run real code unmodified and reproduce from a seed” part is possible and actually useful. I also found a slew of bugs on a less mature database, which is obviously less interesting but good to know that it validates that this idea works.

The thing I didn’t expect and the most useful info I can offer is that a deterministic Io is necessary but not sufficient. You have to hunt uncertainty leaks (this is the same lesson madsim paid for, they had to intercept libc and patch dependency forks, because randomness and time escape through anything that bypasses runtime). The advantage in Zig is that std.Io is a way more comprehensive seam than anything Rust has, but it’s still not quite total. Anything a SUT or its deps reach for directly leaks nondeterminism past your Io and silently breaks replay. The cheap defense, which madsim/S2/FoundationDB all independently built is a meta-test that reruns the same seed and diffs the trace. If two same-seed runs diverge, something leaked.

@matklad’s point upthread is the one I’d underline hardest, because I backed into agreeing with it. Mocking at the Io layer is a great on-ramp, you get to run code that never heard of you. But the deep faults want the domain boundary, misdirecting a message is a more useful fault than corrupting a write, and it survives swapping the transport. So I’ve ended up at two tiers rather than one, Io-level for breadth/adoption, and a message-passing + sector-storage layer (à la TigerBeetle) for depth, and I’m specifically not throwing away the message layer in favor of mocking std.Io.net sockets.

@Cloudef got the concurrency answer, std.Io.fiber is reusable and the arch caveat barely matters for a CI/dev-machine simulator. The thing I’d add though is that the fiber primitive is the easy part and the deterministic scheduler on top is the work. And take Shuttle’s lesson over Loom’s, seeded randomized interleaving exploration, not exhaustive, because exhaustive is factorial and intractable, and randomized catches almost all non-adversarial bugs and parallelizes trivially (see the PCT paper, “A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs”).

If anyone wants to see my work, you can check out Marionette (linked above) and its docs :slight_smile:

15 Likes

If I may ask a somewhat unrelated question to the thread: how much are error-details passed through (say) the storage abstraction from the actual IO to the rest of the application?

More concretely: if you try to read a block, and get some specific error on some specific platform/fs, does the main application care about the exact nature of the error? Or is it simply abstracted to “we couldn’t read, thats all we need to know”. I know for tigerbeetle you do data repair so this is a bit different than most single-noded storage engines would be, who perhaps couldn’t do anything. For my (non-distributed) storage engine that I’m writing/tinkering with my instinct is often “we don’t care: fail immediately”.

I think it often boils down to: can you do anything about the error? if there’s really nothing you can do to correct/recover, then you don’t care what the error is. Only success or failure matter.

1 Like

Wow, this is great. Honestly, if I had the time to pursue this further, I’d start with what you have in Marionette. Nice work!

1 Like

Oh wow!!! This takes away SO MUCH work I was expecting to have do before even getting started building my database application!