Sketch of type-safer buffered IO

Zig’s Reader has this function:

/// Reads 1 byte from the stream or returns `error.EndOfStream`.
pub fn readByte(self: Self) anyerror!u8 {
    var result: [1]u8 = undefined;
    const amt_read = try self.read(result[0..]);
    if (amt_read < 1) return error.EndOfStream;
    return result[0];
}

Higher-level operations like “read a line” bottom out in calling this function in a loop (very approximate code):

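fn readLine(reader: AnyReader, buffer: []u8) ![]u8 {
    for (buffer, 0..) |*byte, index| {
        // One (virtual) readByte call per byte of input.
        byte.* = try reader.readByte();
        if (byte.* == '\n') return buffer[0..index];
    }
    // The buffer filled up before a newline was seen.
    return error.StreamTooLong;
}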

This code has three performance bugs:

  1. In the worst case, it does one syscall per byte of input
  2. It does one virtual call per byte of input
  3. It doesn’t use SIMD and is not vectorizable — there’s simply no slice of memory we can run SIMD over here
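
For contrast with the third point, here is a minimal sketch of what the search looks like once the bytes already sit in a contiguous slice. The name `firstLine` is made up for illustration; `std.mem.indexOfScalar` is the only standard-library call used, and whether it is actually vectorized depends on the Zig version and target.

```zig
const std = @import("std");

/// Hypothetical helper: returns the first line in `window` without its
/// trailing '\n', or null if the window holds no complete line yet.
fn firstLine(window: []const u8) ?[]const u8 {
    // A single scan over contiguous memory: no per-byte calls into a reader,
    // and the search routine is free to look at many bytes per step.
    const end = std.mem.indexOfScalar(u8, window, '\n') orelse return null;
    return window[0..end];
}
```

With a byte-at-a-time reader there is no `window` to hand to such a function, which is the root of the problem.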

Now, the first (and only the first) issue can be fixed by wrapping a reader into a buffered reader, but that still leaves a couple of performance rakes lying dangerously around:

fn uses_reader(reader: AnyReader) !void

This signature gives rise to at least three distinct possibilities:

  1. The function isn’t using readByte-derived APIs, in which case it is fine to pass something like std.fs.File in directly.
  2. The function does call something like readLine internally, so the caller must supply a buffered reader.
  3. Out of caution, the function internally wraps a reader into a buffered reader, so the user must not pass a buffered reader, as that would lead to unnecessary double buffering.

If the caller’s and the callee’s expectations don’t match, there’s a perf bug! It’s also not hard to imagine a situation where a library gets refactored from case 2 to case 3 to fix perf issues for one user, creating new perf issues for other users who did buffer already.

I think the right solution here is to move the byte-oriented API to a buffered reader, such that, if you want to call readLine, your function signature tells the caller that they need to supply a buffered reader. That’s basically how Rust’s Read, BufRead and BufReader are set up.
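
To make the idea concrete, here is a rough illustration of such a split. It is not the API proposed in this post and not anything from std: `FixedSource`, `BufferedReader`, `fill` and `countLines` are all hypothetical names, and the source is an in-memory stand-in for a file. The unbuffered type only offers bulk `read`, while `readByte` lives on the buffered type.

```zig
const std = @import("std");

/// Hypothetical unbuffered source: bulk reads only, deliberately no readByte.
const FixedSource = struct {
    data: []const u8,
    pos: usize = 0,

    /// Copies up to dest.len bytes; stands in for one syscall.
    fn read(self: *FixedSource, dest: []u8) usize {
        const n = @min(dest.len, self.data.len - self.pos);
        var i: usize = 0;
        while (i < n) : (i += 1) dest[i] = self.data[self.pos + i];
        self.pos += n;
        return n;
    }
};

/// Hypothetical buffered layer: the only type with byte-oriented calls.
const BufferedReader = struct {
    source: *FixedSource,
    buf: [4096]u8 = undefined,
    start: usize = 0,
    end: usize = 0,

    /// Refills the buffer with one bulk read when it is empty and
    /// returns the unread bytes (an empty slice at end of stream).
    fn fill(self: *BufferedReader) []const u8 {
        if (self.start == self.end) {
            self.start = 0;
            self.end = self.source.read(self.buf[0..]);
        }
        return self.buf[self.start..self.end];
    }

    fn readByte(self: *BufferedReader) error{EndOfStream}!u8 {
        const window = self.fill();
        if (window.len == 0) return error.EndOfStream;
        self.start += 1;
        return window[0];
    }
};

/// The parameter type tells the caller that buffering is required.
fn countLines(reader: *BufferedReader) usize {
    var lines: usize = 0;
    while (true) {
        const window = reader.fill();
        if (window.len == 0) return lines;
        // Works on the whole buffered window at once, not byte by byte.
        lines += std.mem.count(u8, window, "\n");
        reader.start = reader.end; // consume the whole window
    }
}
```

Passing a `*FixedSource` to `countLines` is a compile error under this arrangement, so the "did the caller buffer?" question is answered by the type checker rather than by documentation.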

I don’t think I am quite ready to submit a PR to the Zig repo with this (relatively large-scale) change, but I couldn’t help but sketch the API this morning! Here’s the result:

This is the dynamically-dispatched part of the API. For the generic part, I think it basically boils down to rewriting the existing fn BufferedReader along the lines of fn GenericReader(
