I can never figure out where you are/aren’t supposed to use a buffer with new IO. It’s important because if you underbuffer, you get too many syscalls. But if you overbuffer, you get too many memcpys.
Take this code, for instance, which hashes a file:
const file = try dir.openFile(cx.io, entry.name, .{});
defer file.close(cx.io);
var buf1: [1024]u8 = undefined;
var reader = file.reader(cx.io, &buf1);
var buf2: [1024]u8 = undefined;
var hashing: std.Io.Writer.Hashing(std.crypto.hash.sha2.Sha256) = .init(&buf2);
_ = try reader.interface.streamRemaining(&hashing.writer);
const hash = hashing.hasher.finalResult();
It uses 2 buffers, but I’m guessing the optimal code uses 1 buffer. But should that buffer go on the reader or the writer? And why? Can someone tell me the rule of thumb for this or help me develop my intuition?
I’m also interested to see what others say about this as I’m developing my thinking too.
I think in general you want to size the first buffer (buf1) on the read side based simply on the tradeoff between the number of syscalls and memory usage, so that part is usually pretty straightforward. But once the data is in userspace memory, you should use as much context as you can to decide what the next step does with it.
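To put rough numbers on the syscall side of that tradeoff, here is a back-of-the-envelope sketch. The helper name `estimatedReads` is made up for illustration, and it assumes every read call fills the buffer completely. Real counts can be higher (short reads) or lower (the implementation may stream straight into the writer's buffer), so treat it as an estimate, not a measurement.

```zig
const std = @import("std");

/// Hypothetical helper: rough number of read calls needed to drain
/// `file_size` bytes through a buffer of `buf_size` bytes, assuming each
/// call fills the buffer completely. `buf_size` must be non-zero.
fn estimatedReads(file_size: u64, buf_size: u64) u64 {
    return std.math.divCeil(u64, file_size, buf_size) catch unreachable;
}

test "bigger read buffers mean fewer refills" {
    const one_gib: u64 = 1 << 30;
    // 1 KiB buffer: about a million refills for a 1 GiB file.
    try std.testing.expectEqual(@as(u64, 1_048_576), estimatedReads(one_gib, 1024));
    // 64 KiB buffer: 64x fewer, at the cost of 63 KiB more memory.
    try std.testing.expectEqual(@as(u64, 16_384), estimatedReads(one_gib, 64 * 1024));
}
```

Past some size the syscall savings flatten out, which is why measuring on the real workload still matters.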
Perhaps in this case std.Io.Writer.Hashing is actually not what you want - this interface makes it easy to generically get data from a number of different, maybe non-contiguous, maybe application-dependent data sources and get a hash at the end of it. Your context is much more specific than this - and it’s this knowledge that makes the reduced buffer optimisation possible - you know your data will be read in one contiguous chunk and hashed.
This means you can just read directly off the reader with something like:
var hasher: std.crypto.hash.sha2.Sha256 = .init(.{});
while (true) {
    const data = reader.interface.peekGreedy(1) catch |err| switch (err) {
        error.EndOfStream => break,
        else => |e| return e,
    };
    hasher.update(data);
    reader.interface.toss(data.len);
}
const hash = hasher.finalResult();
Slightly more verbose, but I guess that’s the tradeoff for context-dependent performance enhancements! Interested if more experienced folk have more to say on this though.
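(Sidenote: as written, your code doesn't actually produce a valid hash! You need to call hashing.writer.flush() after streaming and before finalResult(), otherwise whatever is still sitting in buf2 never reaches the hasher. But I suspect this is a shortened snippet.)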
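A minimal sketch of the flush point, for anyone who wants to check it without touching the filesystem. It assumes an in-memory `std.Io.Reader.fixed` as a stand-in for the file reader, and `sha256OfReader` is a made-up helper name, not something from std. The test compares against a one-shot `Sha256.hash` of the same bytes.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

/// Hypothetical helper: stream everything from `reader` through a Hashing
/// writer backed by `buf`, then flush before reading the digest.
fn sha256OfReader(reader: *std.Io.Reader, buf: []u8) ![Sha256.digest_length]u8 {
    var hashing: std.Io.Writer.Hashing(Sha256) = .init(buf);
    _ = try reader.streamRemaining(&hashing.writer);
    // Without this, bytes still sitting in `buf` never reach the hasher.
    try hashing.writer.flush();
    return hashing.hasher.finalResult();
}

test "flushed Hashing writer matches a one-shot hash" {
    const input = "hello, buffered world";
    // Assumption: in-memory reader standing in for file.reader(...).
    var reader: std.Io.Reader = .fixed(input);
    var buf: [64]u8 = undefined;
    const got = try sha256OfReader(&reader, &buf);

    var expected: [Sha256.digest_length]u8 = undefined;
    Sha256.hash(input, &expected, .{});
    try std.testing.expectEqualSlices(u8, &expected, &got);
}
```

With a 64-byte buffer and a 21-byte input, all of the data is still buffered when streaming finishes, so this is exactly the case where a missing flush would matter.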
Reader.Hashed may work as it is, without a writer at all, or you could optimize as described in the linked doc. (I haven’t used this, I’m just going by the doc.)
Unfortunately, there is no universal rule. Often the only solution is trial and error: experimenting and measuring. Timing, stability, and similar factors are the most critical metrics. I have been struggling for weeks with data transmission over two USB connections in parallel, where real-time performance, stability, and various other parameters are important. And depending on the specific device running my program, I encounter new surprises. Sometimes it's extremely frustrating.
Conclusion: while there are theoretical approaches regarding the number and size of buffers, the actual practical values are what ultimately matter.