On the std.Io.Reader interface

I am currently translating some old exercises from a university course from C to Zig, because they cover quite a few edge cases (the course was actually pretty darn good, especially in retrospect). In case you ask why I’m doing that: late-night procrastination, mostly.

In one exercise you need to read from stdin line by line and do something with each line. And if a line is above a certain length, skip that line, log it, and continue.

That sounds simple, and with fgets it actually is.

With std.Io.Reader as it turns out, not so much.

Discarding is quite easy: call discardDelimiterInclusive and you are done (provided you didn’t already take the delimiter of that line).
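As a rough sketch of that skip-and-log flow (assuming 0.15.x semantics, and assuming discardDelimiterInclusive returns the number of bytes discarded; `handle` is a hypothetical stand-in for the per-line work):

```zig
const std = @import("std");

// Hypothetical stand-in for whatever the exercise does with a line.
fn handle(line: []const u8) void {
    std.debug.print("line: {s}", .{line});
}

pub fn main() !void {
    var buf: [1024]u8 = undefined;
    var file_reader: std.fs.File.Reader = .init(.stdin(), &buf);
    const in = &file_reader.interface;

    while (true) {
        const line = in.takeDelimiterInclusive('\n') catch |err| switch (err) {
            error.EndOfStream => break,
            error.StreamTooLong => {
                // The line doesn't fit in the reader's buffer: log it,
                // drop the rest of it (delimiter included), and move on.
                std.log.warn("line too long, skipping", .{});
                _ = in.discardDelimiterInclusive('\n') catch break;
                continue;
            },
            else => |e| return e,
        };
        handle(line);
    }
}
```

With a 1024-byte buffer, any line longer than the buffer trips error.StreamTooLong, which is where the discard comes in.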

But for getting there, you have two choices: peekDelimiter* and takeDelimiter*. The peek/take distinction doesn’t matter much; the real problem comes from choosing between the inclusive and the exclusive variant.

Let’s go over the inclusive one first. It is fine for the common case. But if the last line does not end with a line feed, you suddenly get error.EndOfStream without ever receiving the last line. So now I have to figure out from the reader’s buffer whether that’s because I am actually at the end, or because the last line doesn’t end with a line feed.
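A minimal sketch of that failure mode (assuming 0.15.x semantics), using a fixed reader over input whose last line has no trailing line feed:

```zig
const std = @import("std");

pub fn main() !void {
    // The last line, "b", has no trailing line feed.
    var r: std.Io.Reader = .fixed("a\nb");

    while (r.takeDelimiterInclusive('\n')) |line| {
        // The returned slice includes the '\n', so no extra newline here.
        std.debug.print("got: {s}", .{line});
    } else |err| {
        // err is error.EndOfStream, even though "b" was never returned;
        // it is still sitting unread in the reader's buffer.
        std.debug.print("ended with {s}\n", .{@errorName(err)});
    }
}
```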

So, what about the exclusive one? It is also fine for the common case and does not have the problem of the inclusive variant. But it introduces a new problem: if a line is empty, you get error.EndOfStream instead of a slice of length 0. So now I have to figure out from the reader’s buffer whether that’s because I am at the end of the stream, or because the current line is empty.

Both cases are really annoying to deal with. Doable (in my case a lot of the error handling ended up in the condition of the while loop instead of the while’s else clause), but in my opinion unnecessarily complicated.

I can maybe understand why one might want the behaviour of the inclusive variant (even if I think that people will silently lose data by not noticing it). But the exclusive one? Why not just return a slice of length 0, or a different error if a slice of length 0 isn’t wanted? That way you could separate the two cases a lot more easily.

So, what do the others think about this?


error.EndOfStream is only returned a) if it is the actual end of the stream and b) if there was no unread data before the end.

Correct me if I’m wrong, but it sounds like you want to differentiate between "a\nb\nc" and "a\nb\nc\n". With takeDelimiterExclusive you have to remove the delimiter from the stream yourself, usually with toss, but there is nothing stopping you from using takeByte instead, which will either give you the delimiter or an error, which could be EndOfStream. You can use that to detect whether there is an empty next line and handle it.
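A sketch of that suggestion (assuming 0.15.2 semantics, where an empty line yields an empty slice; `handle` is a hypothetical stand-in for the per-line work):

```zig
const std = @import("std");

fn handle(line: []const u8) void {
    std.debug.print("line: {s}\n", .{line});
}

pub fn main() !void {
    // "a", an empty line, then "b" with no trailing newline.
    var r: std.Io.Reader = .fixed("a\n\nb");

    while (true) {
        const line = r.takeDelimiterExclusive('\n') catch |err| switch (err) {
            error.EndOfStream => break, // truly out of data
            else => |e| return e,
        };
        handle(line);
        // Take the delimiter ourselves instead of toss(1): if takeByte
        // fails with EndOfStream, the last line simply had no trailing
        // newline; otherwise the byte we took was the '\n' itself.
        _ = r.takeByte() catch break;
    }
}
```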

That is more reliable than looking at previous data in the buffer, as the reader might have already discarded it.

Because you need to be able to differentiate between the end of the stream and the delimiter being the next thing in the stream.
If you want both, it would have to return a zero-length slice the first time it hits the end of the stream, and then the interface would need to track whether the previous call returned a slice or an error.

There is also takeDelimiter, which is exclusive but also removes the delimiter from the stream, so you don’t have to. It also returns null instead of error.EndOfStream, which is nice for while loops. But it prevents you from reliably detecting what you want, so I would advise against using it.

Does this meet your need?

/// Returns a slice of the next bytes of buffered data from the stream until
/// `delimiter` is found, advancing the seek position past the delimiter.
///
/// Returned slice excludes the delimiter. End-of-stream is treated equivalent
/// to a delimiter, unless it would result in a length 0 return value, in which
/// case `null` is returned instead.
///
/// If the delimiter is not found within a number of bytes matching the
/// capacity of this `Reader`, `error.StreamTooLong` is returned. In
/// such case, the stream state is unmodified as if this function was never
/// called.
///
/// Invalidates previously returned values from `peek`.
///
/// See also:
/// * `takeDelimiterInclusive`
/// * `takeDelimiterExclusive`
pub fn takeDelimiter(r: *Reader, delimiter: u8) error{ ReadFailed, StreamTooLong }!?[]u8 {
    const inclusive = r.peekDelimiterInclusive(delimiter) catch |err| switch (err) {
        error.EndOfStream => {
            const remaining = r.buffer[r.seek..r.end];
            if (remaining.len == 0) return null;
            r.toss(remaining.len);
            return remaining;
        },
        else => |e| return e,
    };
    r.toss(inclusive.len);
    return inclusive[0 .. inclusive.len - 1];
}

I also found the behavior of takeDelimiterExclusive to be unintuitive on first use. I too am used to C/C++ APIs where the delimiter often gets swallowed (see std::getline).

But, after further use, I think I like the state of the API. As demonstrated by other posters, takeDelimiter is more familiar since it seeks past the delimiter and handles the case of remaining bytes at EOF without a delimiter.

My only nitpick is that it feels asymmetrical that it swallows error.EndOfStream, but from an efficiency perspective it makes sense.

…alright that’s even funnier, but

So Cookie monster! and… whatever this quote is: Discourse is trying to fetch a preview of the link, Codeberg’s berg of code presumably doesn’t like some behavior associated with that and they’re being German[1] about it.

It’s unclear what to do about this. Ideally Discourse and Codeberg duke it out, this seems like a social bug to me rather than a software one.

In the meantime it’s probably better to post Codeberg links like this <https://c-is-for-cookie.berg> or like this [good enough for me!](https://cookiecookiecookie.cookie).


  1. All love to my CCC brothers but you know I’m right ↩︎

Codeberg uses Anubis (which uses proof of work on the client) to disincentivize excessive crawlers, but that also catches Discourse’s relatively harmless bot that fetches previews/descriptions. I don’t know all the details; here is a lengthy discussion that goes into some aspects of the topic: #319 - Anubis - using proof-of-work to stop excessive crawling - forgejo/discussions - Codeberg.org. There is also some mention of the problem of fetching previews, just not a solution for it.


I figured it would be something like that, thanks for looking into it. Sounds like it won’t be a problem forever.

I think I mostly wanted people to know where Cookie Monster! was coming from: neither the Zig repo nor Ziggit are the agents of that particular bit of trolling.


I want to treat "a\nb\nc" and "a\nb\nc\n" the same.

The problem I describe with the exclusive API is when you get "...a\n\nb...".

You have one empty line there, but the exclusive one gives you error.EndOfStream for that too.

So essentially you get this with two newlines:

_ = try r.takeDelimiterExclusive('\n');
r.toss(1);
assert(error.EndOfStream == r.takeDelimiterExclusive('\n'));

That’s why I also said “or a different error if a slice of length 0 isn’t wanted”.

So if you are at the end of the stream with no data: error.EndOfStream. If you are at a delimiter with no data: error.NoData (or some other name).

Currently if you get error.EndOfStream, you need to toss a byte (assuming it’s your delimiter) and try again to figure out if you are really at the end of the stream or just got an empty line.

Looking at that, if I pipe in the input with:

python -c 'print("50", end="")' | zig run cookie.zig

I immediately get error.EndOfStream instead of what I essentially want in my program: first the answer to the guess (“Guess higher.” or “Guess lower.”) and only THEN error.EndOfStream (if I didn’t get lucky and the answer is 50).

So takeDelimiter like here works the same way as takeDelimiterInclusive.

No, it should give you an empty slice there; I confirmed this to be true on 0.15.2.

What reader implementation are you using? It might be falsely reporting end of stream.

I was testing with std.Io.Reader.fixed and std.fs.File.Reader, both have the expected behaviour.

std.fs.File.Reader

To be fair, a big part of the complexity came from me trying to use the error-union form of while loops, coupled with wanting the buffer of the reader to be substantially bigger than the max length of a line (to minimise system calls), and wanting to handle all the error.* values from the readers in one switch. Maybe that’s where I got confused.

But I just tested again with this code:

const std = @import("std");

pub fn main() !void {
    var lineBuffer: [1024]u8 = undefined;
    var stdinReader: std.fs.File.Reader = .init(.stdin(), &lineBuffer);
    const in = &stdinReader.interface;

    while (in.takeDelimiterExclusive('\n')) |line| {
        std.debug.print("line: {s}\n", .{line});
    } else |e| if (e != error.EndOfStream) return e;
}

and this: python -c 'print("a\n\nb", end="")' | zig run test1.zig

And I don’t get an error, but this:

line: a
line: 
line: b

But I am nonetheless confused, since I am not taking the newline off each line (I am using an exclusive function), and yet it doesn’t loop forever.

Ah, the issue is that you are using 0.15.1; this is a bug that was fixed in 0.15.2.


I also just noticed that on ziglang.org, the documentation for this function is identical for versions 0.15.1 and 0.15.2, unlike when you use zig std.
