Zig's Tokenizer and Pointer Provenance?

I’ve used Zig’s tokenizer as a starting point for other tokenizers a few times. Based on what I’ve learned about pointer provenance (see Martin’s talk), I wonder whether changing this:

pub const Tokenizer = struct {
    buffer: [:0]const u8,
    index: usize,

    pub fn next(self: *Tokenizer) Token { ... }
};

into something like this:

pub fn next(buffer: [:0]const u8, index: *usize) Token {
    // ...
}

would make things more “optimizer friendly”. Since buffer never changes but index does, putting both behind the same struct pointer (and thus the same provenance) might make the code harder to optimize. Is this a useful way to think about things? Should we all be designing our types to avoid combining constant and mutating data?
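
For illustration, here is a minimal toy sketch of a third option (the Token type and the whitespace-splitting logic are made up for demonstration, not taken from std): keep the struct API, but copy the fields into locals inside next so the optimizer can keep them in registers regardless of what it can prove about aliasing, and write index back once on return.

const std = @import("std");

const Token = struct {
    tag: enum { word, eof },
    start: usize,
    end: usize,
};

const Tokenizer = struct {
    buffer: [:0]const u8,
    index: usize,

    pub fn next(self: *Tokenizer) Token {
        const buffer = self.buffer; // slice copied once, never reloaded through self
        var index = self.index; // mutated as a plain local
        defer self.index = index; // single write-back on every return path

        while (buffer[index] == ' ') index += 1;
        const start = index;
        while (buffer[index] != ' ' and buffer[index] != 0) index += 1;
        return .{
            .tag = if (index == start) .eof else .word,
            .start = start,
            .end = index,
        };
    }
};

pub fn main() void {
    var tokenizer: Tokenizer = .{ .buffer = "two words", .index = 0 };
    while (true) {
        const token = tokenizer.next();
        if (token.tag == .eof) break;
        std.debug.print("{s}\n", .{tokenizer.buffer[token.start..token.end]});
    }
}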

I created a benchmark here: PerfTest · GitHub

It compares the implementation from std with a copy that removes the buffer field from Tokenizer and adds it as an extra parameter to next.

As expected, the performance of the two is almost identical. However, on macOS and Windows the timings were consistent enough between runs to suggest a possible 1% improvement for the modified version:

Machine          Std Impl           Modified Impl
M1 Air           0.44 - 0.46 s      0.43 - 0.45 s
Win10 Desktop    0.444 - 0.448 s    0.440 - 0.441 s

On Linux the variance was higher for some reason, high enough that a 1% improvement might be undetectable. I haven’t tried looking at the assembly yet.

To share an additional data point, I grabbed the perftest and ran it on an M3 (multiple runs show the same result):

⚡hyperfine "./perftest std"
Benchmark 1: ./perftest std
  Time (mean ± σ):      42.5 ms ±   0.3 ms    [User: 42.2 ms, System: 0.2 ms]
  Range (min … max):    41.9 ms …  43.6 ms    67 runs

⚡hyperfine "./perftest custom"
Benchmark 1: ./perftest custom
  Time (mean ± σ):      43.3 ms ±   0.4 ms    [User: 43.1 ms, System: 0.2 ms]
  Range (min … max):    42.7 ms …  45.0 ms    65 runs

That seems to slightly favor std.

Using a large input file like Sema.zig shows no meaningful difference.

Compiled with zig trunk.

Maybe the variance comes down to system-dependent changes in process data layout between the two implementations?

In Martin’s talk, all of the examples where a memory region escaping / provenance resulted in pessimization involved function calls that were opaque to the compiler (i.e. calling through a vtable or crossing a translation unit boundary). The Zig tokenizer is compiled in one unit, and as far as I can see it doesn’t even make function calls except to look up keywords in a static string map, so at a glance this doesn’t look like a case where provenance issues would crop up.
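
To make that concrete, here is a hypothetical sketch (the names are mine) of the kind of escape that does pessimize. Once a pointer is handed to a call the compiler can’t see into, anything reachable through it has to be assumed modified; building this with zig build-obj, with opaqueSink defined in a separate object file, would show the reloads:

const Counter = struct {
    limit: usize,
    index: usize,
};

// Assumed to be defined in some other object file, so its body is
// opaque at this call site.
extern fn opaqueSink(ptr: *usize) void;

export fn count(c: *Counter) usize {
    var steps: usize = 0;
    while (c.index < c.limit) {
        // &c.index escapes into an opaque call; the compiler must assume
        // the callee can write through it, so c.index (and, absent better
        // provenance information, c.limit) gets reloaded each iteration.
        opaqueSink(&c.index);
        c.index += 1;
        steps += 1;
    }
    return steps;
}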

Use hyperfine "./perftest std" "./perftest custom" to get a comparison printed out as well.

Thanks for sharing your data. I forgot I could use hyperfine on macOS, and when I do I’m actually seeing similar results to yours on my M1: std seems to consistently perform slightly better. Not what I was expecting, but very interesting nonetheless. It’s also strange that I’m seeing the opposite on Windows. Unfortunately this little tangent only seems to have raised more questions than answers.

If you haven’t seen it, this talk is something to keep in mind for this sort of stuff: Emery Berger’s “Performance Matters”.

Yeah, that’s the one I had in mind; the username bit is so funny.

Great video… should be “required reading (watching)” for every programmer 🙂

As a companion to this video, I give The Slow Winter by the inimitable James Mickens my highest recommendation.

Emery Berger explains the facts of life in this demon-haunted world. Mickens tells us where the demons came from.

Compilers can analyze each field in a struct as if it were a separate argument, so it shouldn’t matter. This is called field-sensitive pointer analysis. Obviously, this is more costly in compilation time than tracking the whole struct, which is why GCC exposes an option to tune how aggressively it does this: max-fields-for-field-sensitive.
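
For example (the file name here is hypothetical, and I believe the default threshold depends on the optimization level), the limit can be raised on the command line:

gcc -O2 --param max-fields-for-field-sensitive=100 tokenizer.c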