Hi folks, no criticism here, just trying to understand stuff: have I done something wrong when porting to 0.16, or are these numbers the expected baseline?
So, I have a collection of small apps that I use day to day that I’ve written in Zig, and I’m in the process of porting them to 0.16. One such tool is true, which mimics the coreutils true purely so that I have a binary I can point to, and for the fun of writing Zig as well.
The same source (see below, nothing very interesting) runs in ~100μs when compiled with 0.15.2, whereas with 0.16 it takes ~300μs. I’ve noticed this ~200μs increase consistently across other binaries, and picked true to dig into because of its simplicity. It turns out strace shows a lot more going on now in terms of syscalls:
# zig 0.15.2
strace -fc ~/.local/bin/true
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0,00    0,000000           0         1           rt_sigaction
  0,00    0,000000           0         1           execve
  0,00    0,000000           0         1           arch_prctl
  0,00    0,000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100,00    0,000000           0         4           total
# zig 0.16
strace -fc ./zig-out/bin/true
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0,00    0,000000           0         1           read
  0,00    0,000000           0         2           close
  0,00    0,000000           0         2           fstat
  0,00    0,000000           0         8           mmap
  0,00    0,000000           0         3           mprotect
  0,00    0,000000           0         1           munmap
  0,00    0,000000           0         1           brk
  0,00    0,000000           0         2           pread64
  0,00    0,000000           0         1         1 access
  0,00    0,000000           0         1           execve
  0,00    0,000000           0         1           sigaltstack
  0,00    0,000000           0         1           arch_prctl
  0,00    0,000000           0         1           set_tid_address
  0,00    0,000000           0         2           openat
  0,00    0,000000           0         1           set_robust_list
  0,00    0,000000           0         2           prlimit64
  0,00    0,000000           0         1           getrandom
  0,00    0,000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100,00    0,000000           0        32         1 total
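To make the difference between the two summaries easier to see, here is a rough sketch that lists the syscalls appearing only in the second one. It assumes each strace summary was saved to a file with strace's `-o` option; `new_syscalls`, `old.txt` and `new.txt` are names made up for illustration.

```shell
# new_syscalls OLD NEW: print syscall names found in NEW but not in OLD.
# Both arguments are files holding `strace -c` summaries (hypothetical names).
new_syscalls() {
    awk '
        # Skip the header, the dashed separators, and the "total" row.
        NF < 5 || $1 == "%" || $1 ~ /^-/ || $NF == "total" { next }
        # First file: remember every syscall name (always the last field).
        FNR == NR { seen[$NF] = 1; next }
        # Second file: print names that were not in the first.
        !($NF in seen) { print $NF }
    ' "$1" "$2" | sort
}

# Example usage (paths are assumptions):
#   strace -fc -o old.txt ~/.local/bin/true
#   strace -fc -o new.txt ./zig-out/bin/true
#   new_syscalls old.txt new.txt
```

On the two summaries above this would print 15 names: everything in the 0.16 table except execve, arch_prctl and prlimit64. rt_sigaction goes the other way and only shows up in the 0.15.2 run.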
Both were compiled with --release=fast, as usual.
The code:
const std = @import("std");

pub fn main() !void {
    std.process.exit(0);
}
Note that this doesn’t use std.process.Init. One could argue that eagerly injecting the std.process.Init struct would justify the “overhead”, but that isn’t the case here.
Again, don’t take this as criticism: ~200μs is hardly a delay, let alone a problem, at least for my use case. But my curiosity got the best of me and I got nerd-sniped into investigating.
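For anyone who wants to reproduce the timing comparison, a rough sketch follows. It is not necessarily how the numbers above were measured: it averages wall-clock time over many runs, so it includes the shell's own fork/exec cost and is only useful for comparing two binaries against each other. `avg_us` is a made-up helper name, and `date +%s%N` assumes GNU date.

```shell
# avg_us CMD [RUNS]: average wall-clock time per invocation, in microseconds.
# Hypothetical helper; relies on GNU date for nanosecond timestamps.
avg_us() {
    bin=$1
    runs=${2:-1000}
    i=0
    start=$(date +%s%N)               # nanoseconds since the epoch
    while [ "$i" -lt "$runs" ]; do
        "$bin" >/dev/null 2>&1        # run the command, discard any output
        i=$((i + 1))
    done
    end=$(date +%s%N)
    # Total elapsed nanoseconds / number of runs, converted to microseconds.
    echo $(( (end - start) / runs / 1000 ))
}

# Example usage (paths are assumptions):
#   avg_us ~/.local/bin/true 2000
#   avg_us ./zig-out/bin/true 2000
```

The absolute values will be higher than the per-process figures quoted above because of the loop overhead, but the gap between the two binaries should still be visible.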
Thanks in advance!