After migrating my (only) Zig project from 0.14 to 0.15, I find that the program runs slower. For one test case, in Debug mode, the execution time is now about 0.3s (up from 0.2s on 0.14), and in ReleaseFast mode it is about 0.032s (up from 0.028s on 0.14).
The debug mode time likely comes from the fact that Zig switched from the LLVM backend to its self-hosted x86 backend for Debug builds, which produces slightly less optimized code.
The release time difference appears small enough that it could be a measurement error. Unless you can narrow this down to a specific IO operation, it is a bold claim that the IO interface got slower.
I’ll also add the obligatory note that benchmarking is hard. At the very least it’d be helpful to use a tool like poop or hyperfine to determine if the measured performance difference is meaningful, but there are countless other ways that benchmarking can go wrong.
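To make the point about measurement noise concrete, here is a rough Python sketch of what tools like hyperfine do at their core: run each command many times after a warmup, then compare mean and standard deviation. The function names and the "difference larger than the sum of both standard deviations" rule are my own assumptions for illustration, not a rigorous statistical test and not how hyperfine or poop actually decide.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=20, warmup=3):
    """Run cmd (a list of arguments) repeatedly; return wall-clock samples in seconds."""
    quiet = {"stdout": subprocess.DEVNULL, "stderr": subprocess.DEVNULL}
    # Warmup runs populate the page cache so the first sample is not an outlier.
    for _ in range(warmup):
        subprocess.run(cmd, check=True, **quiet)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, **quiet)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of durations."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, stdev


def looks_meaningful(a, b):
    """Crude heuristic (an assumption, not a real significance test):
    the means differ by more than the sum of both standard deviations."""
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    return abs(mean_a - mean_b) > (sd_a + sd_b)
```

You would call time_command once per binary, with whatever arguments the program normally takes, and pass both sample lists to looks_meaningful. A single run under the time command gives one sample per binary, which says nothing about the spread.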
Are you on x86? This might be a side effect of the new non-LLVM backend which is used in debug mode.
That doesn’t explain the slowdown in release mode though, but 4 milliseconds for IO operations might also be noise.
The other thing that comes to mind is buffering, e.g. accidentally doing unbuffered IO in a place where the old IO system implicitly buffered and the new IO system doesn’t.
It looks like the poop tool doesn’t build with any minor toolchain versions.
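To show why accidentally unbuffered IO matters, here is a small Python sketch (deliberately not Zig, and not using either version of Zig's IO API). CountingSink is a made-up stand-in for a file descriptor that counts how often a write reaches it, which is roughly the number of write syscalls a real file would see.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical stand-in for a raw file: counts how often write() is reached."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += bytes(b)
        return len(b)


def write_lines(out, n):
    """Emit n short lines, one write() call per line."""
    for i in range(n):
        out.write(b"line %d\n" % i)


# Unbuffered: every small write goes straight to the sink.
unbuffered = CountingSink()
write_lines(unbuffered, 1000)

# Buffered: small writes are collected and passed on in larger chunks.
raw = CountingSink()
buffered = io.BufferedWriter(raw, buffer_size=4096)
write_lines(buffered, 1000)
buffered.flush()

print("unbuffered sink writes:", unbuffered.calls)
print("buffered sink writes:  ", raw.calls)
```

Both sinks end up with identical bytes, but the unbuffered one is hit once per line while the buffered one is hit only a handful of times. If something like this happened in the migrated code, it should show up as a much higher write count in strace -c.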
My machine is Linux/x86.
I made the test data set larger and still found that the 0.15 version is about 10% slower than the 0.14 version (ReleaseFast, 0.22s vs 0.20s). I used the time command to measure program execution time.
The code differences between the 0.14 and 0.15 versions are all about IO. There are three parts related to IO:
read files from disk.
write files to disk.
write buffered files (0.14’s FixedBufferStream vs 0.15’s fixed writer).
I benchmarked the three parts in isolation in some small Zig programs and found that 0.15’s fixed writer is a little faster than 0.14’s FixedBufferStream, but I found no obvious performance differences for the file reading and writing parts. So I think the cause is unrelated to IO.
I have no idea where to look for the cause now.
BTW: I actually don’t care much about the performance difference. The program is already much faster than I expected. It can handle about 1000 .tmd files (Markdown-like) in 0.2 seconds!
It would be helpful if you could provide the code you’re testing. Running the two versions with strace -c may also provide some clues about possible differences.
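If comparing two strace -c summaries by eye gets tedious, something like the following Python sketch could do it. It assumes the usual strace -c table layout (percent time, seconds, usecs/call, calls, optional errors, syscall name) and uses the real -c, -f and -o options; the function names are hypothetical.

```python
import os
import subprocess
import tempfile


def run_strace_summary(cmd):
    """Run cmd (a list of arguments) under `strace -c -f`; return the summary text."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "summary.txt")
        subprocess.run(
            ["strace", "-c", "-f", "-o", path, *cmd],
            stdout=subprocess.DEVNULL,
            check=True,
        )
        with open(path) as f:
            return f.read()


def parse_strace_summary(text):
    """Map syscall name -> call count, skipping header, rule and total lines."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[-1] == "total":
            continue
        try:
            float(fields[0])        # "% time" column; fails on header and rule lines
            calls = int(fields[3])  # "calls" column
        except ValueError:
            continue
        counts[fields[-1]] = calls
    return counts


def diff_counts(old, new):
    """Return {syscall: (old_count, new_count)} for syscalls whose counts differ."""
    names = sorted(set(old) | set(new))
    return {
        n: (old.get(n, 0), new.get(n, 0))
        for n in names
        if old.get(n, 0) != new.get(n, 0)
    }
```

Feeding the output of run_strace_summary for each binary through parse_strace_summary and then diff_counts lists only the syscalls that changed between the two builds, which makes an accidental extra few thousand read or write calls easy to spot.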
Thanks. The repo has just been updated. I built and ran it and got the following results.
I’m not an expert on the numbers. So just FYI. My machine is Linux/amd64.
CPU is old (Intel(R) Core™ i5-4200M CPU @ 2.50GHz).
Weird, I’m not sure how to square the minor difference in instruction count with the large difference in CPU cycles. It’s also strange that the strace of tmd-15-fast looks strictly better than that of tmd-14-fast: each syscall’s count is less than or equal to its counterpart (e.g. mmap count vs mmap count), and there are more than 1,000 fewer syscalls overall.
Might be time to bust out a profiler if you want to investigate further.
They are built from the develop (for tmd-15-fast) and zig-0.14 (for tmd-14-fast) branches with the zig build -Doptimize=ReleaseFast command. The test data is in the pages folder here: tmd/documentation at develop · tapirmd/tmd · GitHub. I just copied those .tmd files several hundred times to bring the program execution time up to about 0.2s.