Hi, I am writing a small image parser and I finally had a chance to experiment with the new std.Io.Reader interface in Zig 0.15.2.
The C programmer in me originally wanted to just take a []const u8 for the input data, essentially forcing the caller to read the entire file before parsing. But it occurred to me that a std.Io.Reader can achieve the same effect while also supporting chunked reads semantically.
I tried to reason about why I might not want to use std.Io.Reader, since it triggered some cognitive dissonance in the part of me that tends to avoid unnecessary layers of abstraction. However, I could not come up with good enough reasons not to use the interface for my parser.
For one, the caller can always construct the reader with std.Io.Reader.fixed over the entire file contents if they want to limit file syscalls, and the buffer will therefore always be “above the vtable,” so any costs associated with dynamic dispatch can also be avoided. Brilliant.
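To make that concrete, here is a minimal sketch of the fixed-reader setup, assuming the Zig 0.15 std.Io.Reader API (the string literal stands in for real file contents the caller has already read):

```zig
const std = @import("std");

pub fn main() !void {
    // Stand-in for file contents the caller read up front.
    const file_contents: []const u8 = "P6 2 2 255";

    // Wrap the in-memory data in a fixed Reader: the parser still takes
    // *std.Io.Reader, but every read is served from this buffer, so no
    // further syscalls (or buffer refills through the vtable) occur.
    var reader: std.Io.Reader = .fixed(file_contents);

    const magic = try reader.takeByte();
    std.debug.assert(magic == 'P');
}
```

The parser itself never needs to know whether it was handed a fixed buffer or a file-backed reader.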
The only potential downside I noted was the fact that std.Io.Reader does not support all the memory operations I wanted. For example, there is no equivalent to std.mem.indexOfAny or std.mem.indexOfNone, which I would have liked for parsing on multiple different delimiter characters (i.e., something like std.Io.Reader.takeDelimiter accepting a slice instead of a single character). I therefore had to work around the lack of this support with some less efficient std.Io.Reader.peekByte-style parsing.
Is there anything I am missing in terms of trade-offs or potential reasons for preferring a plain []const u8?
Speaking from my experience using a Reader interface instead of a slice: it can add some complexity around allocation and around handling data that is larger than the buffer. But that was in the context of trying to implement the scanner in a forward-only, non-allocating manner, requiring the caller to allocate/dupe values where necessary. Something akin to a SAX parser, if that helps clarify.
With a slice this was a non-issue: as long as the slice was not freed, I could freely subdivide it into token values without any concern. But when you take a Reader, you can make no such assumptions; once the reader position changes, you pretty much have to assume any previously buffered data is lost forever.
Yeah, I can see how it can get complicated as the use case diverges from simple line parsing. I still like how the interface makes the costs apparent to the caller.
In my image parser’s case, I am taking an Allocator anyway and constructing pixel data in a separate container, which is the most semantically obvious outcome. Whereas if I see a function taking a slice, it comes with the nuance that the slice might be used as a “view,” and I should then be more careful about the lifetime of the buffer. In fact, I could see there being value in providing an API that can take either a Reader or a slice, with slightly different semantics.
Tokenizing/lexing also comes to mind as another good use of a plain slice for that reason: it often makes sense to just maintain a sub-slice into the token data, as Zig’s own parser does.
You can call reader.buffered() to get the whole buffered slice, which you can then use the std.mem.indexOf* functions on.
Don't forget to reader.toss(n) to tell the reader that you consumed n bytes.
You may want to reader.fillMore() first, to attempt to fill the buffer further before doing that.
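Putting those three together, here is a rough sketch of a multi-delimiter take. takeUntilAny is my own name (not part of std), and I'm assuming the 0.15 std.Io.Reader API for buffered/toss/fillMore:

```zig
const std = @import("std");

/// Sketch: return the bytes up to (excluding) the first occurrence of any
/// delimiter, consuming the token and the delimiter itself.
fn takeUntilAny(reader: *std.Io.Reader, delimiters: []const u8) ![]const u8 {
    while (true) {
        const buf = reader.buffered();
        if (std.mem.indexOfAny(u8, buf, delimiters)) |i| {
            const token = buf[0..i];
            // `token` points into the reader's internal buffer, so it is
            // only valid until the buffer is refilled or rebased.
            reader.toss(i + 1); // consume the token plus the delimiter
            return token;
        }
        // No delimiter buffered yet; try to buffer more bytes
        // (errors with EndOfStream once the source is exhausted).
        try reader.fillMore();
    }
}

pub fn main() !void {
    var reader: std.Io.Reader = .fixed("a=1;b=2,c=3");
    const first = try takeUntilAny(&reader, ";,"); // "a=1"
    const second = try takeUntilAny(&reader, ";,"); // "b=2"
    std.debug.assert(std.mem.eql(u8, first, "a=1"));
    std.debug.assert(std.mem.eql(u8, second, "b=2"));
}
```

With a fixed reader everything is already buffered, so fillMore never actually runs here; with a file-backed reader it would pull in more data between scans.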
I started out using []const u8 for everything in my networking project. Rest assured that migrating from a []const u8 API to a *std.Io.Reader API is very easy, as long as your parser does not require backtracking.
If you want to help prevent yourself from accidentally backtracking, start with the Reader API. Keep in mind that you can also peek a slice (as long as the buffer is large enough) and use the std.mem functions on it.
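For example, a small sketch of the peek approach (the peek length and error handling are my own choices here, assuming the 0.15 API):

```zig
const std = @import("std");

pub fn main() !void {
    var reader: std.Io.Reader = .fixed("width=640;height=480");

    // Peek a window without consuming anything; this errors if the
    // requested length exceeds what the buffer can hold.
    const window = try reader.peek(9);

    // Any std.mem function works on the peeked slice.
    const eq = std.mem.indexOfScalar(u8, window, '=') orelse
        return error.BadFormat;
    std.debug.assert(std.mem.eql(u8, window[0..eq], "width"));

    // Only once you are sure, commit by advancing the reader.
    reader.toss(eq + 1);
}
```

Since peek never advances the position, you decide exactly how much to commit with toss afterwards, which naturally discourages backtracking.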