Not really Zig-related, but the build.zig is probably part of the solution: I develop a DB and want to track performance and test compliance while editing stuff. I'd love to see a comparison showing that staged is passing all tests and is currently -10% wall_time, +20% branch misses, etc. on benchmark A vs HEAD. And not only vs HEAD, but vs all the git commits, for that nice feeling of constant improvement of performance/test coverage. Currently I copy the last stable binary to old and then run GitHub - andrewrk/poop: Performance Optimizer Observation Platform against the currently built binary. But I'd like to run multiple benchmarks with shorter reports, and have it be more automatic.
Does anybody have something along these lines already built, that I could take a look into?
Yeah, I also don’t have good workflows here which don’t require extra attention.
For ad-hoc local benchmarking (I am trying to make X data structure faster), one tip I have is:
always have a baseline (code from main without changes)
always name the baseline binary literally baseline (so, compile manually and then cp)
For making sure "refactors" don't tank performance, I wasn't able to get a good "CI gate" workflow, but "retroactive" works well enough: you track the performance graph and, if you see that this week you've regressed something, you then go back and retroactively fix it. Three ideas to make this workflow work:
Although what you are drawing is a bunch of time series, the time can be virtual. You don't have to run the benchmark every time code gets into the master branch. You can retroactively measure past commits whenever you want. So, when building infra for that, I'd focus on "running the same benchmark across a range of commits" rather than on "running current HEAD". Similarly, don't worry too much about where you store the data; you can regenerate it on demand if needed, given some time.
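A sketch of "running the same benchmark across a range of commits", in Python for brevity. The checkout and build-and-bench hooks are injected (both names are hypothetical), so the iteration logic stays separate from git and from whatever your actual benchmark harness is:

```python
import subprocess
from typing import Callable


def list_revs(rev_range: str) -> list[str]:
    """Oldest-first commits in a range like 'v0.1..HEAD' (needs a git checkout)."""
    out = subprocess.check_output(
        ["git", "rev-list", "--reverse", rev_range], text=True)
    return out.split()


def bench_commits(revs: list[str],
                  checkout: Callable[[str], None],
                  build_and_bench: Callable[[str], dict]) -> list[dict]:
    """Visit each commit and record its metrics as one batch per commit."""
    batches = []
    for rev in revs:
        checkout(rev)
        batches.append({"commit": rev, "metrics": build_and_bench(rev)})
    return batches
```

In a real run, checkout would be something like `lambda rev: subprocess.check_call(["git", "checkout", "-q", rev])` and build_and_bench would rebuild the project and parse your benchmark's output into a dict.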
A good store for metrics is a JSON file in a git repository. Create a separate repo for storing metrics, and just use that. Here is a good schema for the JSON:
const JSONFile = []MetricBatch;

const MetricBatch = struct {
    commit_timestamp: u64, // deterministic ordering
    metrics: []const Metric,
    attributes: struct {
        commit: []const u8, // what we were benchmarking
        machine: []const u8, // where we were benchmarking
        wall_clock_timestamp: u64, // when we were benchmarking
    },
};

const Metric = struct {
    name: []const u8,
    unit: []const u8, // ms, MiB, etc.
    value: u64,
};
In general, I am super interested in articles that describe the "software engineering" aspect of benchmarking. Measuring once and optimizing feels easy enough. What is hard is making benchmarks a part of the process without having a dedicated performance team.
How does the infrastructure for your benchmarks look?
All on a dedicated machine
A dedicated machine with fixed resource limits
All on a VM from a cloud provider
etc.
I think the internet has discussed at length how unreliable benchmarking in CI is, so I'm wondering what set of tradeoffs you settled on at TigerBeetle.
Just publicly available GitHub runners, lol. We've only just transitioned to the stage where we started to care about minor performance regressions, so we plan to switch to a dedicated Hetzner machine for benchmarking. But so far GitHub runners have worked OK for spotting major changes, given that we are looking at trends over multiple runs rather than at any specific single execution.