I think I am misunderstanding the way buffering works and I would really appreciate some help clearing it up.
I have the following file, which contains the letters of the alphabet:
$ cat file
ABCDEFGHIJKLMNOPQRSTUVWXYZ
I expected the following program to run fine. It doesn’t. It will only work if I increase the size of the buffer. This is a simplified example, in my actual code I’d end up needing a buffer the size of the file even though I’m only ever reading a few bytes at a time.
I am either misunderstanding how buffering is supposed to work (I thought this would be fine, as I am never trying to read, or even toss, more bytes at a time than the size of the buffer), or my understanding is correct but my code is wrong.
I have already read the post on IO in 0.16 here and I don’t see anything that contradicts what I’m trying to do.
toss only removes data from the buffer, and it asserts that the buffer already holds at least as many bytes as you toss.
takeByte will only fill the buffer if there is no data.
What happens is:
takeByte: the buffer is empty, so it is filled. Since it is small, it is probably full now.
toss 2: two bytes are dropped from the buffer, which now has only 1 byte left.
takeByte: the buffer has a byte, so it takes it. The buffer is now empty.
toss 2: the buffer is empty, so it trips the assert.
Instead of toss, use discard or discardShort. Those consume from the buffer and, if the buffer runs out, from the underlying source.
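To make that concrete, here is a minimal sketch of the loop with discardAll in place of toss. It assumes the std.Io.Reader API as of Zig 0.15/0.16. The helper name everyThird is hypothetical, and it uses a fixed in-memory reader only so the example runs without a file. With a file reader and a 4-byte buffer the calls would be the same, and discardAll would pull from the file whenever the buffer runs dry.

```zig
const std = @import("std");

/// Hypothetical helper: copies every third byte of `r` into `out`,
/// skipping the two bytes in between. Returns the filled part of `out`.
fn everyThird(r: *std.Io.Reader, out: []u8) []u8 {
    var n: usize = 0;
    while (n < out.len) {
        // takeByte refills the buffer from the source when it is empty.
        const byte = r.takeByte() catch break;
        out[n] = byte;
        n += 1;
        // discardAll consumes buffered bytes first and then asks the
        // source for the rest. toss(2) would instead assert that two
        // bytes are already buffered.
        r.discardAll(2) catch break;
    }
    return out[0..n];
}

pub fn main() void {
    // Assumption: a fixed reader stands in for the file from the question.
    var r: std.Io.Reader = .fixed("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    var out: [16]u8 = undefined;
    std.debug.print("{s}\n", .{everyThird(&r, &out)});
}
```

The loop stops at the first error from either call, which for this input is the end of the stream.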
If anybody can comment on the soundness of using this, this way, I’d be grateful. The documentation on this,
Generally, Zig users are encouraged to explicitly initialize all fields of a struct explicitly rather than using this function. However, it is recognized that there are sometimes use cases for initializing all fields to a “zero” value. For example, when interfacing with a C API where this practice is more common and relied upon. If you are performing code review and see this function used, examine closely - it may be a code smell. Zero initializes the type…..
… suggests that this is, at least, not the best way to do this. It’s not idiomatic, anyway. At the very least, I’d declare your buffer type a [4]u8 in the straightforward way.
var buf: [10240]u8 = undefined;
Also, since this is a read buffer, I think you shouldn’t have to worry about initializing the data. Just make sure that you’re only accessing indices that have actually been filled by your read operations.
@jmctagger Ah, I originally had undefined and do for most of my buffers. I don’t know why I used std.mem.zeroes for this one. It might be something I did in a wild attempt to solve my problem before I understood where it was coming from, and then I copied the same thing into my minimal example without thinking about it.
Thanks for pointing it out. I don’t think what you quoted about structs applies in this case though. Would something else be preferable if I really did want an array initialised with zeroes?
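For what it's worth, here is a small sketch of the zero-initialising options I'm aware of for a plain array. It assumes a recent Zig (@splat on arrays needs 0.14 or later), and the constant names are made up for illustration. I'm not claiming any one of them is the officially preferred form.

```zig
const std = @import("std");

// Builtin splat: every element gets the same value.
const splatted: [4]u8 = @splat(0);
// Array repetition: a one-element array repeated four times.
const repeated = [_]u8{0} ** 4;
// The std.mem.zeroes call from the original example, for comparison.
const zeroed = std.mem.zeroes([4]u8);

pub fn main() void {
    std.debug.print("{any} {any} {any}\n", .{ splatted, repeated, zeroed });
}
```

All three produce the same four zero bytes. For a read buffer, undefined is still the usual choice.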
It’s a shame you can only know that by reading the library source. This is an example where the documentation is poor and misleading. If you look at the main Io.Reader page you see:
pub fn toss(r: *Reader, n: usize) void
Skips the next n bytes from the stream, advancing the seek position. This is typically and safely used after peek.
discardShort(), discardAll(), and toss() all say they do the same thing on that page – “Skips the next n bytes from the stream, advancing the seek position”. The only difference is the errors, or lack of them.
If you then click through to the toss() page you see it continues:
Asserts that the number of bytes buffered is at least as many as n.
The “tossed” memory remains alive until a “peek” operation occurs.
That’s pretty important information that’s been clipped off. The function assumes it won’t hit the end of the buffer.
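Here is a rough sketch of the "used after peek" pattern the doc comment hints at, assuming the std.Io.Reader API as of Zig 0.15/0.16. The helper skipPrefix is hypothetical, and the fixed in-memory reader is only there to keep the example self-contained. The point is that peek(n) guarantees n bytes are buffered, so the toss(n) that follows cannot trip the assert.

```zig
const std = @import("std");

/// Hypothetical helper: if the stream starts with `prefix`, consume it
/// and return true. Otherwise leave the stream untouched.
fn skipPrefix(r: *std.Io.Reader, prefix: []const u8) bool {
    // peek fills the buffer until prefix.len bytes are available,
    // without advancing the seek position.
    const head = r.peek(prefix.len) catch return false;
    if (!std.mem.eql(u8, head, prefix)) return false;
    // Safe: peek just guaranteed that prefix.len bytes are buffered.
    r.toss(prefix.len);
    return true;
}

pub fn main() !void {
    var r: std.Io.Reader = .fixed("ABCDEF");
    const skipped = skipPrefix(&r, "AB");
    const next = try r.takeByte();
    std.debug.print("{any} {c}\n", .{ skipped, next });
}
```

Note that peek(n) needs a buffer of at least n bytes, so the buffer size still has to cover the largest chunk you look at in one go.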
This is something that I have very much struggled with when it comes to the 0.16.0 documentation. There is a lot here and no really digestible way of learning how to use it. And reading the source, while doable, is pretty advanced Zig.
You’ll get used to it pretty easily. My programming level is medium at best. But going through documentation and source code is useful, relatively easy, and can even help you understand standard paradigms.
The only thing I struggle with is following calls through vtables.
In general I find the content of the documentation to be excellent when present, and the source code is pretty clear when needed (I think it helps a lot that the source for std is concise). But I definitely was caught out by the way the preview was truncated. It didn’t occur to me that there might have been more to the description.
If I’m going to say something on the topic of documentation, the delay to open the docs can be really annoying. It seems to be downloading a tar file? I think it would make more sense to just serve the static site.
It downloads the source code, then parses and renders it (reusing the Zig parser compiled to wasm). That is actually smaller than shipping individually rendered pages or saving it in some sort of data format that can be read in. It also avoids duplicating code in a separate JavaScript implementation.
The only downside is that the initial download is a bigger chunk (you can avoid it by running zig std locally instead). It might be cool to have some way to load only a small part of the data (basically only what is needed for the root page) directly in the autodoc page, so that the initial page could be shown right away while the rest of the data is loading.
Another idea that might be interesting is to investigate whether anybody has created something like a random-access / seekable tar alternative that lets you access all the data in a way that is optimized for streaming it in, while still retaining some of the compression benefits.
Maybe it would also be enough to put the most important data at the beginning of the tar file and have a streaming implementation that doesn’t wait for the download to complete, but I haven’t investigated the details.
https://github.com/sorvi-platform/sra-archive
Though I only compress the header. This format is used for my platform for compressing the executable + assets. I don’t do whole archive compression because I believe most assets already have their own compression (I do compress the ELF files though).
That said, this format is probably not streaming friendly (without something like range requests at least) as the index is at the end of the file (like in ZIP as well).