I’ve used Zig’s tokenizer as a starting point for other tokenizers a few times. Based on what I’ve learned about pointer provenance (see Martin’s talk), I’m wondering whether changing this:
```zig
pub const Tokenizer = struct {
    buffer: [:0]const u8,
    index: usize,

    pub fn next(self: *Tokenizer) Token { ... }
};
```
into something like this:
```zig
pub fn next(buffer: [:0]const u8, index: *usize) Token {
    // ...
}
```
would make things more “optimizer friendly”? Since `buffer` never changes but `index` does, putting them into the same struct/provenance might make it harder to optimize? Is this a useful way to think about things? Should we all be designing our types to avoid combining constant and mutating data?
I created a benchmark here: PerfTest · GitHub
It compares the implementation from `std` with a copy that removes the `buffer` field from `Tokenizer` and adds it as an extra parameter to `next`.
As expected, the performance between the two is almost identical. However, on macOS and Windows the performance between runs seemed pretty consistent, consistent enough to say there might be a possible 1% improvement:
| Machine | Std Impl | Modified Impl |
|---|---|---|
| M1 Air | 0.44 - 0.46 s | 0.43 - 0.45 s |
| Win10 Desktop | 0.444 - 0.448 ms | 0.440 - 0.441 ms |
On Linux the variance was higher for some reason, high enough that a 1% improvement might be undetectable. I haven’t tried looking at the assembly yet.
To share an additional datapoint, I grabbed the perftest and ran it on an M3 (multiple runs show the same result):
```
⚡ hyperfine "./perftest std"
Benchmark 1: ./perftest std
  Time (mean ± σ):      42.5 ms ±  0.3 ms    [User: 42.2 ms, System: 0.2 ms]
  Range (min … max):    41.9 ms … 43.6 ms    67 runs

⚡ hyperfine "./perftest custom"
Benchmark 1: ./perftest custom
  Time (mean ± σ):      43.3 ms ±  0.4 ms    [User: 43.1 ms, System: 0.2 ms]
  Range (min … max):    42.7 ms … 45.0 ms    65 runs
```
This seems to slightly favor `std`.
Using a large input file like Sema.zig shows no meaningful difference.
Compiled with zig trunk.
Maybe the variance comes down to system-dependent differences in process data layout between the two implementations?
In Martin’s talk, all of the examples where memory region escapes/provenance resulted in pessimization involved function calls that were opaque to the compiler (i.e. calling a virtual function or crossing a translation unit boundary). The Zig tokenizer is compiled in one unit, and as far as I can see it doesn’t even make function calls except to look up in a static string map, so at a glance this doesn’t look like a case where provenance issues would crop up.
Use `hyperfine "./perftest std" "./perftest custom"` to get a comparison printed out as well.
Thanks for sharing your data. I forgot I could use hyperfine on macOS, and when I do so I’m actually seeing similar results as you on my M1: `std` seems to consistently perform slightly better. Not what I was expecting, but very interesting nonetheless. It’s also strange that I’m seeing the opposite on Windows. Unfortunately this little tangent only seems to have raised more questions than answers.
If you haven’t seen it, this talk is something to keep in mind for this sort of stuff:
Yeah, that’s the one I had in mind; the username bit is so funny.
Great video…should be “required reading (watching)” for every programmer 
As a companion to this video, I give The Slow Winter by the inimitable James Mickens my highest recommendation.
Emery Berger explains the facts of life in this demon-haunted world. Mickens tells us where the demons came from.
Compilers can analyze each field in a struct as if it were a separate argument, so it shouldn’t matter. This is called field-sensitive pointer analysis. Obviously, this is more costly in terms of compilation time than tracking the whole struct, which is why GCC exposes an option to tune how intensely it should do this: `max-fields-for-field-sensitive`.