Simple test runner for Zig 0.16.0 wanted

This hardcoding of any err-level log into a test failure also somewhat contradicts what the documentation for std.log.err says (emphasis mine):

Log an error message using the default scope. This log level is intended to be used when something has gone wrong. This might be recoverable or might be followed by the program exiting.

You then can’t really test the recoverability, except by putting everything into a single function, which more often than not isn’t possible.

For the TigerBeetle team, the gripe is that printf-debugging is a pain in the neck. We get a “so, again, how do I printf-debug a hanging test?” message once every couple of months in our Slack. The behavior of capturing/buffering stderr makes debugging hangs or timing/performance-related issues painful.

I’ve since learned that a good solution here is to run the test binary directly: e.g., zig build test:build && ./zig-out/bin/test instead of zig build test, where xxx:build is our local idiom for an install counterpart of a Run step. But it took me a great deal of annoyance to:

  • figure out that coaxing the Run step to do what I want is futile, and that I should run the binary directly
  • actually put that knowledge into long-term memory, instead of re-deriving it from scratch every time.
  • find out a good pattern to get at the test binary (test[:(unit|integration)[:build]] naming scheme)
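To illustrate the xxx:build idiom, here is a minimal build.zig sketch. This is an assumption-laden illustration, not TigerBeetle's actual build script: the step names, the source path, and the exact addTest signature (which changes between Zig releases; this form matches ~0.15/0.16) are all mine.

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const tests = b.addTest(.{
        .root_module = b.createModule(.{
            .root_source_file = b.path("src/main.zig"),
            .target = target,
            .optimize = optimize,
        }),
    });

    // `zig build test`: run through the build system; the Run step
    // captures/buffers stderr, which is what makes hangs hard to debug.
    const run_tests = b.addRunArtifact(tests);
    b.step("test", "Run unit tests").dependOn(&run_tests.step);

    // `zig build test:build`: only install the test binary, so it can be
    // executed directly (./zig-out/bin/test) with stderr going straight
    // to the terminal.
    const install_tests = b.addInstallArtifact(tests, .{});
    b.step("test:build", "Install the test binary").dependOn(&install_tests.step);
}
```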

But it’s a minor issue, really; overall we are very happy with the built-in runner. I in particular really appreciate tests failing on non-empty stderr, because printf-debugging a test which dumps to stderr even when it passes is an even bigger pain!

EDIT: and, yeah, log.err being an unconditional failure is also unhelpful; we have the following to cancel out that behavior:
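(The original snippet isn’t reproduced in the thread. A minimal sketch of the idea, under the assumption that the default test runner fails any test during which log.err was called by checking std.testing.log_err_count after the test body; recoverableOperation is a hypothetical stand-in:)

```zig
const std = @import("std");

fn recoverableOperation() void {
    // Hypothetical: something that logs an error but recovers.
    std.log.err("transient failure, retrying", .{});
}

test "error logs do not fail this test" {
    recoverableOperation();
    // The default test runner counts err-level log calls in
    // std.testing.log_err_count and fails the test if it is non-zero.
    // Resetting it cancels out that behavior for this test.
    std.testing.log_err_count = 0;
}
```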


Thanks for writing this up @matklad and others in this thread.

Is there an issue open to track this? I get the feeling that everyone thinks it’s impossible to improve the test runner for some reason?

I get the feeling that everyone thinks it’s impossible to improve the test runner for some reason?

I think this is an instance of a pattern I frequently see on Ziggit, where people assume the current state of things, in all of its details and behaviors, is an intentional choice instead of “that’s just how it is right now.”

I think that comes from Zig being both young (there’s a lot of things in a “pretty good, could be better” state) and having made very specific design choices (for valid reasons) that can be surprising at first. So “surprising” often becomes “surprising, and it’s intended to be this way.”


It didn’t become clear until well into the thread that a lot of people seem not to like the hardcoded fail on .err logs. But now that that is apparent, it probably would be worth someone’s time to open an issue.

I opened an issue for this; don’t know what its fate will be.

It caused me some confusion that zig build test fails on any print to stderr, while zig test src/foo.zig fails only on log.err:

Output source     zig test src/foo.zig   zig build test
std.log.info      suppressed, exits 0    suppressed, exits 0
std.log.warn      prints, exits 0        prints, exits 1
std.debug.print   prints, exits 0        prints, exits 1
std.log.err       prints, exits 1        prints, exits 1

So there are separate issues involved. But the log.err handling would be simpler to change.
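A minimal src/foo.zig to try both commands against the table above (a sketch; note that std.testing.log_level defaults to .warn, which is why info-level output is suppressed):

```zig
// src/foo.zig -- run with `zig test src/foo.zig` and with `zig build test`
// to compare how each output source is handled.
const std = @import("std");

test "info is suppressed by default" {
    // Below std.testing.log_level (.warn by default), so never printed.
    std.log.info("invisible", .{});
}

test "warn prints, but only fails under zig build test" {
    std.log.warn("visible on stderr", .{});
}

test "debug.print always reaches stderr" {
    std.debug.print("raw stderr output\n", .{});
}

test "err prints and fails in both modes" {
    std.log.err("this one fails the test", .{});
}
```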


My main issue is having a tty flag supported: it’s very important to be able to parse test output, e.g. when building an editor plugin, so that output is consistent across devices (a terminal and a Neovim job).

Also, I noticed that 0.15 used to print test names; 0.16 doesn’t.

Yes, I also noticed that. As of 0.16 the default runner can easily generate zero output if everything passes, making you wonder if the tests actually ran. I take it it’s supposed to be implicit in the 0 exit status, but still…


This is somewhat related to the conversation and would probably warrant a separate enhancement issue on the tracker if accepted, but I felt I should mention it here before opening one. Note that this is just an idea; I have no clue whether it would be feasible in the compiler, as I am not a compiler dev, but I believe it should be.

Imagine the following:

fn foo() !void {
    try bar();
}

test foo {
    try foo();
}

fn bar() !void {
    return error.Bogus;
}

test bar {
    try bar();
}

We can see that bar() will always fail, so its decltest will always fail, and so will the decltest for foo. As far as I’m aware, builtin.test_functions is ordered by the compiler’s evaluation order, which in my experience means that test foo fails before test bar does. (NOTE: this is based on 0.15.2; I have limited 0.16 testing experience, but AFAIK it hasn’t changed.)

I propose that a test graph is generated, where each node is a test, with the goal of executing leaves first, then the parents of the leaves, then the parents of the parents, and so on, with potentially the ability to do the reverse: first evaluating higher-level test cases and working down.

This has the advantage that, for example, in an HTTP server, if your database query is failing, your database tests fail before your endpoint tests do, making it easier for the reader to determine whose fault it was.

This works most straightforwardly with decltests, as the compiler would be able to go “oh, foo calls the function bar, and we have a decltest for bar”, as opposed to a regular test block, where what is being tested is more implicit and the compiler has no clue what to do.

I wrote a custom test runner for this a while back, but it’s not that complicated, as it is for just one use case, so it looks something like:

    for (builtin.test_functions) |t| {
        const name = makeNameFriendly(t.name);
        if (isEndpoint(name)) { // std.mem.startsWith(u8, name, "Endpoint |");
            try endpoint_queue.test_functions.append(allocator, t);
            continue;
        }
        if (isModel(name)) { // std.mem.startsWith(u8, name, "Model |");
            try model_queue.test_functions.append(allocator, t);
            continue;
        }