Should read return !?usize?

Zemogus · April 26, 2025, 2:57pm

So I was happily coding along when copilot hallucinated this code:

var buf: [8192]u8 = undefined;
while (try de.reader().read(&buf)) |n| {
    _ = try stdout.write(buf[0..n]);
}

This obviously doesn’t compile because read() returns !usize. But it would work if read() returned null instead of 0 at end of stream. Personally I feel like returning “null bytes read” doesn’t really make sense, but the resulting code does look more concise than the alternative that works today:

var buf: [8192]u8 = undefined;
while (true) {
    const n = try de.reader().read(&buf);
    if (n == 0) break;
    _ = try stdout.write(buf[0..n]);
}

What do you guys think?

g41797 · April 26, 2025, 4:48pm

according to read(2) — Linux manual page:

On success, the number of bytes read is returned (zero indicates end of file),

This means zero is a valid return value for a read operation.

A nullable has a different meaning.

dasimmet · April 26, 2025, 4:54pm

I’d say since read returns an integer length it should be allowed to return 0. It’s supposed to be close to the C ABI version, and 0 meaning end of stream is its convention.
But maybe a separate function that returns !?[]u8 would be a convenient wrapper that could be a “method” of the reader (for the common case of slicing buf):

fn readSlice(reader: anytype, comptime T: type, buf: []T) !?[]T {
    const nread = try reader.read(buf);
    if (nread == 0) return null;
    return buf[0..nread];
}

separate function version:

var buf: [8192]u8 = undefined;
while (try readSlice(de.reader(), u8, &buf)) |slice| {
    _ = try stdout.write(slice);
}

as a “method” of reader the type argument could be omitted, too:

var buf: [8192]u8 = undefined;
while (try de.reader().readSlice(&buf)) |slice| {
    _ = try stdout.write(slice);
}
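
For anyone who wants to try the wrapper without a real decompressor, here is a minimal self-contained sketch. It assumes the Zig 0.14-era std.io API (std.io.fixedBufferStream, and a reader whose read(buf) returns an error union of usize). The u8-only readSlice and the test data are illustrative, not something that exists in the standard library.

```zig
const std = @import("std");

/// Hypothetical helper (not part of std): reads into `buf` and returns the
/// filled part of it, or null once the reader reports end of stream (0 bytes).
fn readSlice(reader: anytype, buf: []u8) !?[]u8 {
    const n = try reader.read(buf);
    if (n == 0) return null;
    return buf[0..n];
}

test "readSlice yields chunks until end of stream" {
    // An in-memory stream stands in for de.reader() from the posts above.
    var stream = std.io.fixedBufferStream("hello world");
    var buf: [4]u8 = undefined;
    var total: usize = 0;
    while (try readSlice(stream.reader(), &buf)) |slice| {
        total += slice.len;
    }
    try std.testing.expectEqual(@as(usize, 11), total);
}
```

Running this with zig test should exercise the loop: the optional capture ends it on null, so no explicit break is needed.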

LucasSantos91 · April 26, 2025, 4:54pm

var buf: [8192]u8 = undefined;
while (de.reader().read(&buf)) |n| {
    _ = try stdout.write(buf[0..n]);
} else |err| return err;

I think this would also work.

In any case, I agree with the ?usize. Just because Linux uses 0 as end of stream doesn’t mean Zig has to do the same. Consider an asynchronous reader. If there’s some data ready to read, it reads it. If there isn’t, we would like it to return immediately, in which case it should return 0. But the current interface doesn’t allow this, because 0 already means end of stream. Such a reader would be forced to block and wait for at least one byte to be ready before returning.

mnemnion · April 26, 2025, 5:03pm

I’ve found it useful to distinguish between 0, meaning “polling succeeded and nothing was read”, and null, meaning “polling did not find anything ready to read from”.

Not sure what anyone should conclude from that, but it seemed relevant.

g41797 · April 26, 2025, 6:13pm

I don’t understand the difference. Neither case is an error:

        /// Returns the number of bytes read. It may be less than buffer.len.
        /// If the number of bytes read is 0, it means end of stream.
        /// End of stream is not an error condition.
        pub fn read(self: *Self, buffer: []u8) Error!usize {

For a low-level language it’s better to stay close to the C/OS API.

mnemnion · April 26, 2025, 6:20pm

The difference lies in the difference between reading (which blocks) and polling (which always returns).

When polling, it’s useful and important to distinguish between “this has finished reading” and “this did not happen to read”.
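
A rough sketch of what I mean, not the real std API: PollSource and its poll function are made up for illustration, and the `ready` flag stands in for whatever readiness check an actual non-blocking source would do. null means “not ready on this poll”, 0 means “end of stream”.

```zig
const std = @import("std");

/// Hypothetical non-blocking source, invented for this example only.
const PollSource = struct {
    data: []const u8,
    pos: usize = 0,
    ready: bool = false,

    /// Never blocks. Returns:
    ///   null  -> nothing was ready on this poll, try again later
    ///   0     -> end of stream, nothing left to read
    ///   n > 0 -> n bytes were copied into buf
    fn poll(self: *PollSource, buf: []u8) ?usize {
        if (!self.ready) return null;
        const n = @min(buf.len, self.data.len - self.pos);
        @memcpy(buf[0..n], self.data[self.pos..][0..n]);
        self.pos += n;
        return n;
    }
};

test "poll distinguishes not-ready from end of stream" {
    var src = PollSource{ .data = "abc" };
    var buf: [8]u8 = undefined;
    try std.testing.expectEqual(@as(?usize, null), src.poll(&buf));
    src.ready = true;
    try std.testing.expectEqual(@as(?usize, 3), src.poll(&buf));
    try std.testing.expectEqual(@as(?usize, 0), src.poll(&buf));
}
```

If end of stream were already signalled by null, the “not ready” case would need some other representation, such as a tagged union.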

g41797 · April 26, 2025, 6:28pm

Still, the question is “are we going to provide an interface similar to the C/OS one, or provide a new one?”

LucasSantos91 · April 26, 2025, 6:51pm

It already is a new interface. The OS doesn’t have the notion of Zig errors. I see no value in mimicking the OS.

mnemnion · April 26, 2025, 7:28pm

Andrew is also refactoring I/O with the aim of producing a ‘colorblind’ interface which works the same way with blocking as with polling. I’m interested in what he comes up with, and it’s a safe bet that the part of the current Reader interface we’re discussing here will work differently.

I use null to represent ‘not ready on poll’, but that decision was downstream of readers returning 0 when there’s nothing left to read. If they returned null I’d have done something else.

I don’t think 0 is much of a ‘magic number’ here; (bytes_read == 0) is a fairly coherent expression. I grant the point that control flow would be nicer with a null, however.