Brainstorming about Io implementations for testing

I thought a little bit about how a race-detecting std.testing.io would work.

The idea is based around a single-threaded, task-switching implementation and the concept of a total ordering of all operations, which is logically modeled as a single integer. Each integer value represents one of the possible executions of the process with respect to the order of operations. Since the possibility space can be quite large, this integer potentially has many bits.

The key observation is that every async call has two possibilities: immediate execution of the task, or delayed execution. Therefore, each async adds one extra bit to this total ordering.
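
As a very rough sketch (the type, its fields, and the fixed-width u128 are stand-ins invented here, not the real std.Io API), the testing implementation could consume one bit of the ordering integer per async call:

```zig
const std = @import("std");

// Hypothetical: the testing Io holds a big integer of "ordering bits"
// and spends one bit per async call to pick between the two schedules.
const OrderingBits = struct {
    bits: u128, // stand-in for an arbitrarily large integer
    used: u7 = 0,

    /// Consume the next bit: `false` = run the task immediately,
    /// `true` = delay it until a later task switch.
    fn nextBit(self: *OrderingBits) bool {
        const bit = (self.bits >> self.used) & 1;
        self.used += 1;
        return bit == 1;
    }
};

test "each async consumes one ordering bit" {
    var ordering = OrderingBits{ .bits = 0b101 };
    try std.testing.expect(ordering.nextBit()); // first async: delayed
    try std.testing.expect(!ordering.nextBit()); // second async: immediate
    try std.testing.expect(ordering.nextBit()); // third async: delayed
}
```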

By contrast, asyncConcurrent only has one possibility: delayed execution. So these don’t affect the total ordering (yet).

The next ingredient is task switching/yielding. This happens when I/O is incomplete, or due to synchronization primitives such as Io.await or Io.Mutex. Here, the OS may introduce nondeterminism, but there is a queue of tasks, some subset of which are ready to be switched to. Each switch therefore adds log2(queue length) bits to the total ordering, corresponding to which task is removed from the queue.
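
Continuing that sketch with made-up helpers, a task switch could peel off ceil(log2(queue length)) bits to decide which ready task runs next:

```zig
const std = @import("std");

/// Hypothetical helper: consume `count` low bits from the big ordering integer.
fn takeBits(bits: *u128, count: u7) u128 {
    const mask = (@as(u128, 1) << count) - 1;
    const value = bits.* & mask;
    bits.* >>= count;
    return value;
}

/// Pick which ready task runs next: log2(queue length) bits, rounded up.
fn pickReadyTask(bits: *u128, ready_queue_len: usize) usize {
    const needed: u7 = @intCast(std.math.log2_int_ceil(usize, ready_queue_len));
    const raw = takeBits(bits, needed);
    // Non-power-of-two queue lengths leave a few unused encodings; wrap with %.
    return @intCast(raw % ready_queue_len);
}

test "a queue choice consumes log2(len) bits" {
    var bits: u128 = 0b10; // choose index 2 out of 3 ready tasks
    try std.testing.expectEqual(@as(usize, 2), pickReadyTask(&bits, 3));
}
```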

Regarding OS nondeterminism: the file system, networking, timers, etc. can be mocked so that they use the Io interface primitives rather than talking to the OS. This makes them participate in the total ordering deterministically.

Putting this together, we have an integer, capable of representing every possible execution path through the program, that we can pass to a testing Io implementation. It may be a large number of bits, in which case it will have to be chosen randomly (perhaps by a fuzzer) rather than by iterating through every possibility.

What does this give us? I’m not sure actually. I think it’s a key ingredient in … something.

Some kinds of bugs I’d like to catch:

  • resource leaks (trivial)
  • data races
  • race condition logic bugs (one task accessing memory before or while another task is still computing it)
  • deadlocks due to logic bugs of mutex or other synchronization primitive usage
18 Likes

Another thing worth testing is cancellation. Basically, what I’d love to see from something in this realm, fuzzing-wise, is something along the lines of “test every sad path”.
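
One hedged sketch of how that could plug into the same machinery (all names here are invented): treat ā€œdoes this fallible operation succeed or fail?ā€ as one more choice the testing Io draws from its input, so a fuzzer can steer execution down every error path:

```zig
const std = @import("std");

/// Hypothetical fault injector: one extra choice per fallible Io operation
/// decides whether it succeeds or fails, so the sad paths become reachable
/// by flipping bits of the input.
const FaultInjector = struct {
    choices: u128,

    fn shouldFail(self: *FaultInjector) bool {
        const fail = (self.choices & 1) == 1;
        self.choices >>= 1;
        return fail;
    }
};

// Invented stand-in for a mocked Io read that may be told to fail.
fn mockRead(injector: *FaultInjector, buf: []u8) !usize {
    if (injector.shouldFail()) return error.Unexpected;
    @memset(buf, 0);
    return buf.len;
}

test "a sad path is reachable by flipping one choice bit" {
    var injector = FaultInjector{ .choices = 1 }; // first operation fails
    var buf: [8]u8 = undefined;
    try std.testing.expectError(error.Unexpected, mockRead(&injector, &buf));
}
```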

1 Like

I like this! I’ve always thought of the variance in execution order as a sort of implicit ā€˜input’ to a program, so it would make sense for the test runner/fuzzer to see it as such.

I also have an idea for a ā€˜lazy’ implementation of your concept, which should still provide the functionality you want. It would work like this:

  • The Io is given a big int called choice_source.
  • Every time an Io primitive needs to make a ā€˜choice’ of n possibilities (like which task should be removed from a queue), it divides choice_source by n and uses the remainder as the choice.

To iterate through every possible execution, you simply run the test with increasing choice_source values (starting at zero), stopping once the leftover choice_source is nonzero at the end of a run, i.e. the input encoded more choices than the test consumed.

For fuzzing, choice_source could just be a variable-length input provided by the fuzzer.
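
A minimal sketch of that lazy scheme (the type and method names are invented for illustration): each choice becomes a digit of a mixed-radix number, and the leftover value after a run shows whether the input encoded more choices than the test consumed:

```zig
const std = @import("std");

/// Hypothetical lazy choice source: every n-way decision divides the big
/// integer and keeps the remainder as the choice.
const ChoiceSource = struct {
    value: u128, // stand-in for an arbitrary-precision integer

    fn choose(self: *ChoiceSource, n: u128) u128 {
        const choice = self.value % n;
        self.value /= n;
        return choice;
    }
};

test "choices are the digits of a mixed-radix number" {
    var source = ChoiceSource{ .value = 11 }; // 11 = 2 + 3 * 3
    try std.testing.expectEqual(@as(u128, 2), source.choose(3)); // 11 % 3
    try std.testing.expectEqual(@as(u128, 3), source.choose(4)); // 3 % 4
    // A nonzero leftover here would mean the input encoded more choices
    // than the run actually consumed.
    try std.testing.expectEqual(@as(u128, 0), source.value);
}
```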

1 Like

I assume timers will be part of the Io interface? I wonder what the testing timer implementation would look like, just increment by 1 us every time the timer is read?
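
A deterministic clock along those lines could be as simple as this sketch (names are hypothetical, not the real interface): every read advances an internal counter, so time participates in the ordering without touching the OS:

```zig
const std = @import("std");

/// Hypothetical deterministic clock: reading the time just advances a
/// counter, so timers are ordered by the Io implementation, not the OS.
const MockClock = struct {
    now_ns: u64 = 0,

    fn nanoTimestamp(self: *MockClock) u64 {
        self.now_ns += std.time.ns_per_us; // advance 1 us per read
        return self.now_ns;
    }
};

test "every read advances the mock clock by 1 us" {
    var clock = MockClock{};
    try std.testing.expectEqual(@as(u64, 1_000), clock.nanoTimestamp());
    try std.testing.expectEqual(@as(u64, 2_000), clock.nanoTimestamp());
}
```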

One thing I am wondering about: it might be useful to keep the tree structure of the n-way choice points that lead to further choice points, instead of flattening that information into a plain number of bits.

Maybe the testing Io implementation could capture this tree by internally keeping track of the currently executing node and recording which other choices branch off from it?

Basically, I think it may be beneficial to use different search strategies for working through the resulting bit patterns:

  • Naive enumerate all
  • Depth first
  • Breadth first
  • maybe some sort of weighted scoring?
  • maybe a combination of different approaches based on tree depth, whether a branch was explored already etc.
  • Monte Carlo tree search (Wikipedia)

While the tree approach would be more complex, it might help with creating search strategies that are more likely to explore interesting cases quickly.
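
For illustration, a choice-tree node might look roughly like this (everything here is hypothetical, and a real version would need dynamic storage for children and smarter bookkeeping):

```zig
const std = @import("std");

/// Hypothetical choice-tree node: records each n-way choice point, which
/// branch this run took, and which branches were already explored, so
/// different search strategies (DFS, BFS, weighted, MCTS) can pick what
/// to try next.
const ChoiceNode = struct {
    arity: usize, // number of options at this choice point
    taken: usize, // option taken on the current run
    explored_mask: u64, // bit i set once branch i has been fully explored
    parent: ?*ChoiceNode,

    fn firstUnexplored(self: ChoiceNode) ?usize {
        var i: usize = 0;
        while (i < self.arity) : (i += 1) {
            if ((self.explored_mask & (@as(u64, 1) << @intCast(i))) == 0) return i;
        }
        return null;
    }
};

test "depth-first search picks the first unexplored branch" {
    const node = ChoiceNode{ .arity = 3, .taken = 0, .explored_mask = 0b001, .parent = null };
    try std.testing.expectEqual(@as(?usize, 1), node.firstUnexplored());
}
```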

2 Likes

Two other things that come to mind:

  1. Deterministic “large” tests (dare I say integration or end-to-end tests, depending on people’s definitions) - It sounds like this could allow some much larger tests to be deterministic “for free”, which is super powerful. Nondeterminism in large tests can be a big problem at some companies.

  2. Maybe dreaming, but could this enable more accurate bug reproduction from debug or even user-facing builds? E.g., an app could include this special number describing the ordering of the user’s session in a bug report, so that a developer could plug it back in when trying to reproduce the problem.

1 Like

Pretty much +1, that’s exactly how it’s done, I’ve explored some of the related things in

The super idea is that you want to parametrize your program by entropy: []const u8, and this gives you all of the following essentially for free:

  • Full exhaustive coverage via exhaustigen (Zig implementation)
  • Property-based testing with minimization via FRNG (finite/fallible/fuzzing RNG, Rust implementation)
  • Coverage-guided fuzzing with structured inputs.
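
A minimal sketch of the ā€œparametrize by entropyā€ idea with invented names: every decision is drawn from a finite byte slice, so the same bytes replay the same execution, and running out of bytes falls back to a default choice, which keeps every input slice valid:

```zig
const std = @import("std");

/// Hypothetical entropy source: all nondeterminism is drawn from a finite
/// byte slice, which makes runs replayable and fuzzer-friendly.
const Entropy = struct {
    bytes: []const u8,
    index: usize = 0,

    /// Draw a value in [0, n); when the entropy is exhausted, return 0
    /// (the "boring" default choice) so every input slice stays valid.
    fn intLessThan(self: *Entropy, n: u8) u8 {
        if (self.index >= self.bytes.len) return 0;
        const byte = self.bytes[self.index];
        self.index += 1;
        return byte % n;
    }
};

test "the same entropy bytes replay the same decisions" {
    var entropy = Entropy{ .bytes = &[_]u8{ 7, 200 } };
    try std.testing.expectEqual(@as(u8, 2), entropy.intLessThan(5)); // 7 % 5
    try std.testing.expectEqual(@as(u8, 0), entropy.intLessThan(4)); // 200 % 4
    try std.testing.expectEqual(@as(u8, 0), entropy.intLessThan(3)); // exhausted
}
```
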
6 Likes