I want to write a micro benchmark for a tiny utility at TigerBeetle, a unit-test of a benchmark. I think I know what I want, but I don’t see how I can achieve all that I want without hacks. Curious to hear if anyone has solved similar problems before.
- The purpose of the benchmark is ad-hoc sanity checking. E.g., if I touch the code with a refactor, I want to be able to include before/after in the commit message. Or, when upgrading the compiler, I want to sanity check that the performance didn’t regress.
- I specifically do not want to do continuous benchmarking, graphs, automated regression detection or the like. I just want one number when I ask for it.
- I do not want to touch my `build.zig`. If I have 10 microbenchmarks, I don’t want to create 10 separate Zig modules/binaries just so that I can run them.
- I want to run the benchmarks with a small size in debug mode every time I run the tests, to prevent benchmarks from bitrotting.
- Really, I just want to use `test "binary_search: benchmark" {` as my interface; it’s just perfect for that, the same way that fuzz tests re-use the same interface and just run the passed-in corpus.
- In “test” mode, I want benchmark tests to be silent.
- In “benchmark” mode, benchmarks should display results on stderr. It’s up to the benchmark to decide what results to display.
- I also want to be able to get a binary which runs a single benchmark only, such that I can plug the binary into `poop` if I am curious about some metrics beyond just time.
- But I still want the benchmark to make its own internal measurements, such that it can separate setup costs from the actual benchmarking loop costs.
I think the surface API for this thing could look like this:
```zig
const std = @import("std");
const bench = @import("../support/bench.zig");

test "benchmark: binary_search" {
    const gpa = std.testing.allocator;

    var b = bench.init();
    defer b.deinit();

    const element_count = switch (b.size()) {
        .smoke => 128,
        .default => 10_000_000,
        .explicit => |n| n,
    };

    const array = try generate_array(gpa, element_count);
    defer gpa.free(array);
    const searches = try generate_searches(gpa, array);
    defer gpa.free(searches);

    var hash: u32 = 0;
    {
        b.start();
        for (searches) |key| {
            hash += binary_search(array, key);
        }
        b.finish();
    }

    b.print("hash {}", .{hash});
    b.print("elapsed {}", .{b.elapsed});
}
```
You can then run this as `zig build test -- "benchmark: binary_search"`.
On the `build.zig` side, we spy on the test filter name and, if it includes “benchmark”, we inject options into the build saying “we are in benchmark mode”.
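For concreteness, here is a minimal sketch of that `build.zig` side, assuming a recent-ish `std.Build` API; the names `benchmark_mode`, `benchmark_size`, `bench_options`, and `src/main.zig` are illustrative, not anything TigerBeetle actually uses:

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // Everything after `--` on the CLI shows up in b.args; reuse it both as the
    // test filter and as the signal that we are in benchmark mode.
    const filters: []const []const u8 = b.args orelse &.{};
    var benchmark_mode = false;
    for (filters) |filter| {
        if (std.mem.indexOf(u8, filter, "benchmark") != null) benchmark_mode = true;
    }

    const bench_options = b.addOptions();
    bench_options.addOption(bool, "benchmark_mode", benchmark_mode);
    bench_options.addOption(?usize, "benchmark_size",
        b.option(usize, "benchmark_size", "Explicit size for benchmarks"));

    const tests = b.addTest(.{
        .root_source_file = b.path("src/main.zig"),
        .target = target,
        .optimize = optimize,
        .filters = filters,
    });
    tests.root_module.addOptions("bench_options", bench_options);

    const test_step = b.step("test", "Run tests (and benchmarks, when filtered)");
    test_step.dependOn(&b.addRunArtifact(tests).step);
}
```

With this shape, a plain `zig build test` leaves `b.args` unset, so the filter is empty, `benchmark_mode` stays false, and every benchmark runs silently at smoke size.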
On the `bench.zig` side, we look at those options to see whether we are in unit-test or benchmark mode. If we are benchmarking, then `bench.init` flips `std.testing.log_level` to `.info` (and `deinit` flips it back). `bench.print` is `log.info` in disguise. And `b.size()` is `.smoke` in test mode, `.default` in benchmark mode, and the explicit value if something like `-Dbenchmark_size=1_000_000` is passed on the CLI.
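And a corresponding sketch of `bench.zig`, under the same assumptions (`bench_options` is the hypothetical options module injected from the `build.zig` sketch above):

```zig
const std = @import("std");
const options = @import("bench_options"); // injected from build.zig (hypothetical name)

pub const Size = union(enum) { smoke, default, explicit: usize };

pub const Bench = struct {
    log_level_previous: std.log.Level,
    timer: std.time.Timer = undefined,
    elapsed: u64 = 0,

    pub fn size(_: *const Bench) Size {
        if (!options.benchmark_mode) return .smoke;
        if (options.benchmark_size) |n| return .{ .explicit = n };
        return .default;
    }

    // Only the code between start() and finish() counts towards `elapsed`,
    // so setup cost stays out of the measurement.
    pub fn start(b: *Bench) void {
        b.timer = std.time.Timer.start() catch @panic("timer unsupported");
    }

    pub fn finish(b: *Bench) void {
        b.elapsed = b.timer.read();
    }

    // log.info in disguise: silent in test mode, visible in benchmark mode.
    pub fn print(_: *const Bench, comptime fmt: []const u8, args: anytype) void {
        std.log.info(fmt, args);
    }

    pub fn deinit(b: *Bench) void {
        std.testing.log_level = b.log_level_previous;
    }
};

pub fn init() Bench {
    const previous = std.testing.log_level;
    // The default test runner only prints log messages at or above
    // std.testing.log_level, so flipping it to .info makes bench.print visible.
    if (options.benchmark_mode) std.testing.log_level = .info;
    return .{ .log_level_previous = previous };
}
```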
Is there a better way to do what I want here?