I think you need to start using more tooling, for example sampling and tracing profilers.
Maybe try Tracy.
To make progress you need more detailed measurements; better benchmarking tools or profilers can help with that.
If the speed test measurement itself doesn’t have a problem, there could still be a number of things causing different throughputs (ideally you would measure more than just throughput):
- program being scheduled and switched between different CPUs mid-execution, causing varying additional latency added by the OS (should be visible with profiling tools)
  fix: pin the program on a single core, or use tools that distinguish between wall-clock time and actual runtime
- running the program on CPUs with varying clock speeds
  fix: set the CPU clock speed to a fixed value during program execution
- address space layout randomization, causing different things in the program to be slow
  fix: have enough independent runs so that the variance between different runs gets washed out through determining the mean (ideally the profiler would re-randomize the layout during the run of the program)
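To make a couple of these fixes concrete, here is a rough sketch in Python (not taken from your project, just an illustration). It tries to pin the process to one core, records both wall-clock and CPU time for each run, and summarizes many runs instead of trusting a single one. The names `pin_to_one_core`, `measure`, `summarize`, and `example_workload` are made up for this example, and the pinning call only exists on some platforms (for example Linux), so on Windows it is simply skipped.

```python
import os
import statistics
import time


def pin_to_one_core():
    """Try to pin this process to a single core. Returns True on success."""
    # os.sched_setaffinity / os.sched_getaffinity are not available on every
    # platform (for example not on Windows), so check before calling them.
    if hasattr(os, "sched_setaffinity") and hasattr(os, "sched_getaffinity"):
        try:
            core = min(os.sched_getaffinity(0))  # pick a core we may run on
            os.sched_setaffinity(0, {core})
            return True
        except OSError:
            return False
    return False


def measure(workload, runs=30):
    """Run `workload` `runs` times; return per-run wall and CPU times in seconds."""
    wall, cpu = [], []
    for _ in range(runs):
        w0, c0 = time.perf_counter(), time.process_time()
        workload()
        w1, c1 = time.perf_counter(), time.process_time()
        wall.append(w1 - w0)  # elapsed real time, includes time spent descheduled
        cpu.append(c1 - c0)   # CPU time actually used by this process
    return wall, cpu


def summarize(samples):
    """Basic statistics over a list of at least two samples."""
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
        "min": min(samples),
        "max": max(samples),
    }


def example_workload():
    # Hypothetical stand-in for the code under test.
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    print("pinned to one core:", pin_to_one_core())
    wall, cpu = measure(example_workload)
    print("wall:", summarize(wall))
    print("cpu: ", summarize(cpu))
```

If wall-clock time varies a lot while CPU time stays stable, the variance is probably coming from scheduling rather than from the program itself. Note that all runs here share one process, so the address space layout stays the same; to average over different layouts you would have to launch the benchmark as a fresh process for each run.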
For benchmarks you also want a tool that can record system calls, cache misses, branch misses, etc., so that you have an easier time spotting possible causes.
If you find good tools to use under Windows, you can also add them to the tools section of this doc, or add a comment.
Some people in random forum posts I found have mentioned Windows Performance Analyzer (documented on Microsoft Learn), so that is another thing you could try. I am not on Windows, so I haven’t used it.