There might be a better way to phrase that; feel free to rephrase. I do think the distinction I was drawing out there is fundamental to the difference between a benchmark and a profile, though.
They do! And there are a lot of individual benchmarks as well. I also mentioned mean, median, and best time; note that it says “single metrics”, plural, not single metric, singular. I considered the word “scalar”, but figured it was overly technical for this kind of introduction.
Each of the metrics in that link is a single number. When you run a benchmark, you get one best time, one median, one mean, one envelope (say, 3x difference between best and worst), and so on. There might be hundreds of metrics, but it’s one measure per metric that gets reported. Each benchmark gets run a bunch of times, but the purpose of that is to reduce random sources of variance. Many columns, one row.
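To make the shape concrete, here’s a rough sketch of what a single benchmark’s reported result looks like as data (the names and numbers are invented for illustration, not taken from any particular tool):

    # A hypothetical benchmark's reported result: many columns, one row.
    # Every value is a single scalar summarizing all of the runs.
    bench_result = {
        "name": "parse_large_file",   # which benchmark this is
        "runs": 500,                  # how many times it was executed
        "best_ns": 1_210_000,         # single best observed time
        "median_ns": 1_350_000,
        "mean_ns": 1_400_000,
        "envelope": 3.1,              # worst / best ratio
    }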
A profile, on the other hand, collects samples, as many as it can, so rather than one number, it’s on the order of one number per line, and per situation that line is executed in (so taking the rest of the call stack into account). What you end up with is data which can’t be evaluated on its own; you need to graph it to use it. Each benchmark value is a meaningful number referring to a single benchmark, even if you’re tracking a lot of numbers. One or several columns, many rows, different possible joins on the data.
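And, for contrast, a made-up sketch of what a profile’s raw data looks like (again, not any specific profiler’s format): one row per sample, keyed by source location and call stack, which only becomes useful once you aggregate and graph it:

    # Hypothetical profiler output: a few columns, many rows.
    # Each row is one sample of where execution was at that instant,
    # including the call stack, so the same line shows up separately
    # under different callers.
    profile_samples = [
        {"stack": ("main", "load", "parse_line"),   "file": "parser.py", "line": 88},
        {"stack": ("main", "load", "parse_line"),   "file": "parser.py", "line": 91},
        {"stack": ("main", "report", "parse_line"), "file": "parser.py", "line": 88},
        # ...thousands more samples...
    ]

    # A single sample means little on its own; you group by line or by
    # stack and then graph the aggregate, e.g. as a flame graph.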
Perhaps something like “for each aspect being measured by a benchmark, the benchmark will generally deliver a single measure of that aspect, after being run enough times to average out variance due to factors (cache warmups, branch prediction, various interactions with the host system) which aren’t inherent to the code being measured”.