When to use read() and readAll()?

timfayz · March 17, 2024, 9:24am

The Reader interface defines the following standard methods: read() and readAll(). Here is how they are defined:

/// Returns the number of bytes read. It may be less than buffer.len.
/// If the number of bytes read is 0, it means end of stream.
/// End of stream is not an error condition.
pub fn read(self: Self, buffer: []u8) anyerror!usize {
    return self.readFn(self.context, buffer);
}

/// Returns the number of bytes read. If the number read is smaller than `buffer.len`, it
/// means the stream reached the end. Reaching the end of a stream is not an error
/// condition.
pub fn readAll(self: Self, buffer: []u8) anyerror!usize {
    return readAtLeast(self, buffer, buffer.len);
}

/// Returns the number of bytes read, calling the underlying read
/// function the minimal number of times until the buffer has at least
/// `len` bytes filled. If the number read is less than `len` it means
/// the stream reached the end. Reaching the end of the stream is not
/// an error condition.
pub fn readAtLeast(self: Self, buffer: []u8, len: usize) anyerror!usize {
    assert(len <= buffer.len);
    var index: usize = 0;
    while (index < len) {
        const amt = try self.read(buffer[index..]);
        if (amt == 0) break;
        index += amt;
    }
    return index;
}

I’ve read the above several times. It seems read() is the lowest-level primitive: it performs a single read and puts whatever it was able to get in one pass into the destination buffer. readAll(), in turn, tries to fill the destination buffer as much as it can by calling read() repeatedly. That’s why the only way to know whether readAll() reached the end of the stream is to check whether it read fewer bytes than the destination buffer holds. With read(), we can’t rely on the count being smaller than the buffer, because read() may return very small chunks (thus, 0 is the only reliable indicator of end of stream).

Yet, I might be wrong regarding the above, and it would be super nice if anyone could give examples where each of the methods shines in practice. For example, when I read a file into a buffer of 1024 bytes, should I use read() or readAll()? (Sorry if the question is too dumb.)

cancername · March 17, 2024, 11:50am

Use readAll to avoid short reads. In this case, readAll should be used if you want to actually read as much as you can and process it in one go. On the other hand, if it’s in a loop like this:

while (true) {
    var buf: [1 << 10]u8 = undefined;
    const len = try std.io.getStdIn().reader().read(&buf);
    if (len == 0) break; // end of stream
    // do stuff to buf[0..len]
}

It’s fine to use read, because you don’t really care about how big the pieces you’re processing are.

dimdin · March 17, 2024, 11:54am

Yes, for a file reader, read() boils down to a single read system call.
read blocks until there is something to read; it may fill the buffer only partially with whatever is available.

readAll calls read one or more times until the buffer is full or the stream ends.

To fill a buffer you should use readAll().
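
To make the readAll() pattern concrete, here is a minimal sketch. It assumes the std.io API as it looked around the time of this thread (early 2024), and it uses an in-memory stream as a stand-in for a file so that it runs without any input; the 8-byte buffer and the sample text are arbitrary choices for illustration.

```zig
const std = @import("std");

pub fn main() !void {
    // Assumption for this sketch: an in-memory stream stands in for a file.
    var stream = std.io.fixedBufferStream("hello, world");
    const reader = stream.reader();

    var buf: [8]u8 = undefined;
    while (true) {
        // readAll keeps calling read until buf is full or the stream ends.
        const n = try reader.readAll(&buf);
        std.debug.print("chunk: {s}\n", .{buf[0..n]});
        // A count smaller than buf.len from readAll means end of stream.
        if (n < buf.len) break;
    }
}
```

With read() in place of readAll(), the loop would instead have to stop on a count of 0, since a short count alone does not signal the end of the stream.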

dude_the_builder · March 17, 2024, 12:11pm

Note that if you want to read an entire file in one go, std.fs.Dir has readFile and readFileAlloc methods to do just that easily.

If you want to repeatedly read from a reader more efficiently (by reducing syscalls using buffering) you can use std.io.bufferedReader like this:

var file = try std.fs.cwd().openFile("foo.txt", .{});
defer file.close();
var buf_reader = std.io.bufferedReader(file.reader());
const reader = buf_reader.reader();

// do stuff with reader

timfayz · March 18, 2024, 11:32am

Thanks all for valuable answers!

Just because I haven’t reached that part yet, I’m not sure what the use case of that might be, but I assume if I want to read a file in small chunks “as I go,” then the buffered reader will improve the read speed by reading into its internal buffer first and then dispatching it to me without touching the read() syscall.

Got it.

It’s interesting to note that I decided to check the slightly modified code you provided, and it didn’t behave the way I expected:

var buf: [4]u8 = undefined; // len 4 to accommodate "exit"
while (true) {
    const len = try std.io.getStdIn().reader().read(&buf);
    if (std.mem.eql(u8, buf[0..len], "exit")) break;
    std.log.debug("{s}", .{buf[0..len]});
}

When I type 12345678\enter, I get:

[] 12345678
debug: 1234
debug: 5678
debug:      <- a newline here

[]          <- the typing cursor

I could understand how it is possible that the loop streamed what I typed in chunks. Then I typed exitfoo\enter and got:

dev/tmp $ zig run file.zig
[] exitfoo
~                           <- program exited
/dev/tmp $ foo              <- term typed foo\n afterwards
zsh: command not found: foo
~
/dev/tmp $ []

It seems the terminal gathers the input I type and then dispatches it… dispatches how? (Here I’m not sure how to explain it more technically.)
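
For context on the behavior above: in its default (canonical) mode a terminal hands input to the program one whole line at a time, once Enter is pressed. A 4-byte read() then takes only "exit", and the remaining "foo\n" stays queued until something else reads it, which here was the shell. One possible way around this, sketched under the assumption of the same early-2024 std.io API, is to read whole lines instead of fixed-size chunks; the 256-byte buffer is an arbitrary limit chosen for the example.

```zig
const std = @import("std");

pub fn main() !void {
    const stdin = std.io.getStdIn().reader();
    var buf: [256]u8 = undefined;
    // readUntilDelimiterOrEof returns null at end of stream and
    // error.StreamTooLong if a line does not fit into buf.
    while (try stdin.readUntilDelimiterOrEof(&buf, '\n')) |line| {
        // line excludes the '\n', so "exitfoo" no longer matches "exit".
        if (std.mem.eql(u8, line, "exit")) break;
        std.log.debug("{s}", .{line});
    }
}
```

Because each iteration consumes a full line including its newline, nothing is left behind for the shell after the program exits.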