Pass a Reader to a function as argument

desijuan · December 29, 2023, 2:58pm

Hi!

I’m trying to implement a function that works in a similar way to the getline function in C. It has the following signature:

pub fn getLine(list: *std.ArrayList(u8), reader: anytype) !?[]const u8

Right now the type of the reader is anytype, and this bothers me. I would like the type of reader to be more restrictive, something like:

reader: std.io.Reader, std.io.AnyReader or std.io.GenericReader,

or something like that. Basically something that has a

fn read(self: Self, buffer: []u8) anyerror!usize

function.

I think there has to be a better way than passing it as reader: anytype and either checking that it implements that function or simply not checking anything at all.

I don’t know, using anytype seems like dropping types. I can use it everywhere and expect that things don’t crash at runtime. It doesn’t feel right.

For better understanding I copy the code that I have below.

const std = @import("std");
const print = std.debug.print;

fn getLine(list: *std.ArrayList(u8), reader: anytype) !?[]const u8 {
    var buffer: [1]u8 = undefined;
    list.clearRetainingCapacity();
    while (try reader.read(&buffer) != 0) {
        try list.append(buffer[0]);
        if (buffer[0] == '\n') break;
    }
    return if (list.items.len == 0) null else list.items;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    var list = std.ArrayList(u8).init(allocator);
    defer list.deinit();

    const file_path = "input.txt";

    const file = try std.fs.cwd().openFile(file_path, .{});
    defer file.close();

    var bufferedReader = std.io.bufferedReader(file.reader());
    const reader = bufferedReader.reader();

    while (try getLine(&list, reader)) |line| {
        print("{s}", .{line});
    }
}

mscott9437 · December 29, 2023, 3:19pm

getline() in C accepts a file handle as an argument, as opposed to a reader. So my first thought would be to set up your bufferedReader logic inside your Zig getLine() function, and pass the result of openFile() into that function instead of the reader.

Edit: I think you also might be able to do something like my_reader.getLine(Self: my_reader, etc…)

Edit2: Actually, by declaring your reader in the function it will reset after each call to the function, so you will only get the first line. To get around this you might use the return value of read() to track your index and read from there to the '\n'. Alternatively, you can initialize the reader inside a struct and reuse it after that, but the reader will still be held as an anytype. This would be a different approach than using the standard streamUntilDelimiter, which requires an ArrayList.writer by default, so you should probably avoid it unless you have some specific reason, i.e. you don’t want to use an ArrayList or allocators and just want to rely on reader.read() and buffers.

ianprime0509 · December 29, 2023, 4:25pm

Currently, using anytype for reader is the best solution (and the same pattern for reader and writer parameters is used very commonly). There is currently no way to express the constraints you want in the function signature (for a related discussion, see ziglang/zig issue #17198, “replace anytype”).

However, just to clear up a potential misconception:

anytype is a compile-time construct, not a runtime construct. If you pass something as a reader to your function which does not have a suitable read function (as used in the body of getLine), your program will fail to compile, rather than failing at runtime. This is because the compiler will analyze getLine separately for each distinct type you pass in for reader, just as it would work if you wrote it as

fn getLine(list: *std.ArrayList(u8), comptime T: type, reader: T) !?[]const u8

The benefit of using anytype instead of an explicit T type parameter here is that the user can call getLine(list, reader) rather than getLine(list, @TypeOf(reader), reader).
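To make the compile-time nature of anytype concrete, here is a minimal sketch, assuming the Zig 0.12/0.13-era std.io API. The function name countBytes and the explicit @hasDecl check are illustrative assumptions, not something the standard library requires; without the check, the compiler would still reject an unsuitable type at the reader.read call, only with a less direct message.

```zig
const std = @import("std");

/// Hypothetical helper: counts the bytes left in `reader`.
/// The comptime block rejects, during compilation, any argument whose
/// type does not declare something named `read`.
fn countBytes(reader: anytype) !usize {
    const T = @TypeOf(reader);
    comptime {
        if (!@hasDecl(T, "read")) {
            @compileError(@typeName(T) ++ " does not declare read()");
        }
    }

    var buffer: [64]u8 = undefined;
    var total: usize = 0;
    while (true) {
        const n = try reader.read(&buffer);
        if (n == 0) break; // read() returning 0 signals end of stream
        total += n;
    }
    return total;
}

test "a reader type passes the compile-time check" {
    var stream = std.io.fixedBufferStream("hello\nworld\n");
    try std.testing.expectEqual(@as(usize, 12), try countBytes(stream.reader()));

    // Passing a type without read() stops the build instead of failing at runtime:
    // const NotAReader = struct {};
    // _ = try countBytes(NotAReader{});
}
```

Note that @hasDecl only accepts struct, enum, union, or opaque types, so this particular check assumes the reader is passed by value rather than through a pointer.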

Also, unrelated to your immediate question, similar functionality to getLine already exists: std.io.Reader.streamUntilDelimiter. To use this, you would need to pass list.writer() as the writer parameter.
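As a rough illustration of that call, the sketch below reads two lines from an in-memory stream into an ArrayList. It assumes the Zig 0.12/0.13-era standard library (std.io.fixedBufferStream, the allocator-holding std.ArrayList, and streamUntilDelimiter with an optional maximum size); the input text is made up for the example.

```zig
const std = @import("std");

test "streamUntilDelimiter writes one line into an ArrayList" {
    var stream = std.io.fixedBufferStream("first\nsecond\n");
    const reader = stream.reader();

    var list = std.ArrayList(u8).init(std.testing.allocator);
    defer list.deinit();

    // Bytes are copied into the list until '\n'; the delimiter itself is dropped.
    try reader.streamUntilDelimiter(list.writer(), '\n', null);
    try std.testing.expectEqualStrings("first", list.items);

    // The list is not cleared automatically, so reset it before the next line.
    list.clearRetainingCapacity();
    try reader.streamUntilDelimiter(list.writer(), '\n', null);
    try std.testing.expectEqualStrings("second", list.items);

    // With nothing left to read, the call fails with error.EndOfStream.
    try std.testing.expectError(
        error.EndOfStream,
        reader.streamUntilDelimiter(list.writer(), '\n', null),
    );
}
```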

gnarz · December 29, 2023, 6:17pm

Hi! In current zig (as opposed to 0.11), you could actually use an std.io.AnyReader here. The std.io.Reader is actually a std.io.GenericReader, which has a method called any(), which returns an AnyReader. I use this successfully in a project I am currently working on, with both file readers and also a self-built string reader. Works fine.

desijuan · December 29, 2023, 7:44pm

@ianprime0509, thank you for your explanation!

I’m new to Zig and didn’t know that anytype is a compile-time construct. Today I learned something new :).

About that method std.io.Reader.streamUntilDelimiter, I had seen it, but to me it seems awkward that it doesn’t return anything, and instead returns an error.EndOfStream (from std.io.Reader.readByte) when it reaches the end of the stream.

It seems awkward for two reasons: reaching the end of the stream is not an error and, more importantly, I wanted to use it in a simpler way, without having to catch the error and switch on it to see if I reached EOF. That is why I was trying to write my own getLine function.

desijuan · December 29, 2023, 7:48pm

It worked! Nice! Thank you very much @gnarz !

I ended up using your solution the following way:

const std = @import("std");
const print = std.debug.print;

pub fn getLine(list: *std.ArrayList(u8), reader: *const std.io.AnyReader) !?[]const u8 {
    var buffer: [1]u8 = undefined;
    list.clearRetainingCapacity();
    while (try reader.read(&buffer) != 0) {
        try list.append(buffer[0]);
        if (buffer[0] == '\n') break;
    }
    return if (list.items.len == 0) null else list.items;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    var list = std.ArrayList(u8).init(allocator);
    defer list.deinit();

    const file_path = "input.txt";

    const file = try std.fs.cwd().openFile(file_path, .{});
    defer file.close();

    var bufferedReader = std.io.bufferedReader(file.reader());
    const reader = bufferedReader.reader().any();

    while (try getLine(&list, &reader)) |line| {
        print("{s}", .{line});
    }
}

Note: I don’t know if passing a pointer reader: *const std.io.AnyReader instead of directly reader: std.io.AnyReader to the getline function makes a difference, because I think the compiler does it automatically.

Note 2: I’m using an array of length 1 for the buffer, this is something that bothers me too. Perhaps I can use a single item pointer and coerce it to a slice? I will ask this in a separate thread, because this one is already solved.
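On the single-item pointer idea in Note 2, a minimal sketch follows. It relies on the language rule that a single-item pointer *T coerces to *[1]T, which then coerces to a slice. The helper name readOneByte is hypothetical, and the reader is assumed to be a Zig 0.12/0.13-era std.io reader.

```zig
const std = @import("std");

/// Hypothetical helper: reads one byte, or returns null at end of stream.
fn readOneByte(reader: anytype) !?u8 {
    var byte: u8 = undefined;
    // *u8 coerces to *[1]u8, which in turn coerces to the []u8 that read() expects.
    const one: *[1]u8 = &byte;
    const n = try reader.read(one);
    return if (n == 0) null else byte;
}

test "read into a single byte through a pointer-to-array" {
    var stream = std.io.fixedBufferStream("hi");
    const reader = stream.reader();
    try std.testing.expectEqual(@as(?u8, 'h'), try readOneByte(reader));
    try std.testing.expectEqual(@as(?u8, 'i'), try readOneByte(reader));
    try std.testing.expectEqual(@as(?u8, null), try readOneByte(reader));
}
```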

mscott9437 · December 29, 2023, 9:41pm

Just a quick note for you that might help. Notice that when you call read(), it fills your buffer with the data. So when you call ArrayList.append() immediately after that, it makes an extra copy of that byte as it goes from the buffer into the ArrayList. In an ideal situation, you would likely want only one copy of that data to save memory, so you would either work directly with the buffer you already have when calling read(), or just use the streamUntilDelimiter() method exclusively. Also, you can look at the implementation of AnyReader.readByte, which returns the u8 value directly.

/// Reads 1 byte from the stream or returns `error.EndOfStream`.
pub fn readByte(self: Self) anyerror!u8 {
    var result: [1]u8 = undefined;
    const amt_read = try self.read(result[0..]);
    if (amt_read < 1) return error.EndOfStream;
    return result[0];
}

Here they are using a [1]u8 buffer, similar to how you are doing it with a call to read().

For readers and writers you can pass by value directly.
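A small sketch of passing the type-erased reader by value, assuming the Zig 0.12/0.13-era std.io.AnyReader and the any() method discussed in this thread. The function name countNewlines is hypothetical.

```zig
const std = @import("std");

/// Hypothetical helper that takes the type-erased reader by value.
/// An AnyReader is only a context pointer plus a function pointer,
/// so copying it is cheap.
fn countNewlines(reader: std.io.AnyReader) !usize {
    var count: usize = 0;
    while (true) {
        const byte = reader.readByte() catch |err| switch (err) {
            error.EndOfStream => return count,
            else => |e| return e,
        };
        if (byte == '\n') count += 1;
    }
}

test "AnyReader passed by value" {
    var stream = std.io.fixedBufferStream("a\nb\nc");
    // Keep the generic reader in a named local so that the context
    // pointer stored inside the AnyReader stays valid.
    const generic = stream.reader();
    try std.testing.expectEqual(@as(usize, 2), try countNewlines(generic.any()));
}
```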

desijuan · December 29, 2023, 11:31pm

Ok, thank you for your comment, but I don’t know if I understood correctly. Doesn’t this readByte method do the same as what I’m doing? I see two differences, but the rest seems the same to me (perhaps I’m missing something). The differences I see are:

1. The readByte method instantiates a variable result each time it is called, and then returns the value read. The method streamUntilDelimiter then copies it to the writer by calling writeByte. The copy is being done anyway. Isn’t this the same as what I am doing?
2. If it reaches the end of the stream (EOF in my case, since I’m reading from a file), the readByte method (and also streamUntilDelimiter) returns an error.EndOfStream. I find this very unsatisfactory.
   - On the one hand, reaching the end of the stream is not an error. Zig has optionals, so one can return an optional usize that is null when the end of the stream has been reached.
   - On the other hand, this behaviour of streamUntilDelimiter is not very ergonomic. Should I catch and switch on the error to see if I have reached the end of the file? Something like this?

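while (true) {
  // streamUntilDelimiter appends to the list, so reset it for each line
  list.clearRetainingCapacity();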
  reader.streamUntilDelimiter(list.writer(), '\n', null) catch |err| switch (err) {
    error.EndOfStream => break,
    else => return err,
  };
  // Do something with the line, for example print it
  print("{s}", .{list.items});
}

Sorry but I find this very ugly. I would like to use the getLine in a much simpler way, and make use of the optionals in the while loop.

while (try getLine(&list, &reader)) |line| {
  // Do something with the line, for example print it
  print("{s}", .{line});
}

This behaviour of streamUntilDelimiter, not returning anything and relying on error.EndOfStream to signal the end of the stream, is the reason I thought of writing my own getLine function in the first place.

I found your observation about the byte being read and copied twice interesting. I will see if I can improve it.
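One way to get the optional-returning interface while still delegating the byte copying to the standard library is to wrap streamUntilDelimiter and translate error.EndOfStream into null. This is only a sketch under the assumption of the Zig 0.12/0.13-era API; unlike the getLine shown earlier in the thread, this variant drops the trailing '\n' because streamUntilDelimiter does not write the delimiter.

```zig
const std = @import("std");

/// Sketch: returns the next line without its trailing '\n', or null once
/// the stream is exhausted. The slice is only valid until the next call,
/// because `list` is reused.
fn getLine(list: *std.ArrayList(u8), reader: anytype) !?[]const u8 {
    list.clearRetainingCapacity();
    reader.streamUntilDelimiter(list.writer(), '\n', null) catch |err| switch (err) {
        // A final line with no trailing newline still counts as a line.
        error.EndOfStream => if (list.items.len == 0) return null,
        else => |e| return e,
    };
    return list.items;
}

test "getLine yields lines, then null" {
    var stream = std.io.fixedBufferStream("one\n\nlast");
    const reader = stream.reader();
    var list = std.ArrayList(u8).init(std.testing.allocator);
    defer list.deinit();

    try std.testing.expectEqualStrings("one", (try getLine(&list, reader)).?);
    try std.testing.expectEqualStrings("", (try getLine(&list, reader)).?);
    try std.testing.expectEqualStrings("last", (try getLine(&list, reader)).?);
    try std.testing.expect((try getLine(&list, reader)) == null);
}
```

With a wrapper like this, the calling loop keeps the `while (try getLine(&list, reader)) |line|` shape, and every error other than EndOfStream still propagates to the caller.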

mscott9437 · December 30, 2023, 1:56am

Looks like you are on the right track. And yeah I can see where streamUntilDelimiter calls readByte and writeByte. As far as the EndOfStream not being an error, I think from the perspective of Zig it would still be handled like an error. If you use it as the condition for the while loop it will clean up your code a little bit:

while (reader.streamUntilDelimiter(bytes.writer(), '\n', 1000) != error.EndOfStream) { 
    print("{s}", .{ bytes.items });
}

I can see where you might prefer something like getLine, but I still think the ArrayList is overkill here. You would be going through a lot of trouble to re-implement streamUntilDelimiter just to avoid having the EndOfStream error. Now something with more abstraction might be interesting, like if you could just return the slice directly from the read() without a call to append(). Or maybe just pass an allocator instead of the ArrayList to see how that would work. But now I think we’re definitely into a new topic here.

Anyway I just wanted to give you some observations based on my experience. Keep up the good work!

AndrewCodeDev · December 30, 2023, 2:10am

The any method may become a private function at some point; see ziglang/zig issue #17458, “Complicated ownership using AnyReader”.

desijuan · December 30, 2023, 2:49am

Oh! I didn’t realize that I could use it like that in the while loop. Thank you very much!

I think in that case I don’t need my getLine function anymore.

Anyway, it was fun to do some research about Zig. The ultimate goal was to learn and I’m really enjoying it!

gnarz · December 30, 2023, 10:07am

Yes, I noticed the problem with the ownership. But it is basically the same issue as with using the allocator() methods of GeneralPurposeAllocator and friends, so I didn’t think much of it. It is just an interface to another object, so you need to keep the other object around. Also, I do find the interface, with needing to obtain an AnyReader from a Reader obtained from a File, somewhat convoluted… but such are the ways of a std lib in progress of a language in progress.
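The ownership point can be sketched as follows. This assumes the Zig 0.12/0.13-era implementation, in which the AnyReader returned by any() stores a pointer into the generic reader it was created from. The Source type and its fields are hypothetical names for illustration only.

```zig
const std = @import("std");

const Fbs = std.io.FixedBufferStream([]const u8);

/// Hypothetical wrapper that owns everything the AnyReader points into.
const Source = struct {
    stream: Fbs,
    generic: Fbs.Reader = undefined,

    fn init(text: []const u8) Source {
        return .{ .stream = std.io.fixedBufferStream(text) };
    }

    /// Call this only once the Source sits at its final address: the
    /// returned AnyReader refers to `self.generic`, which in turn refers
    /// to `self.stream`. Moving or dropping the Source invalidates it.
    fn anyReader(self: *Source) std.io.AnyReader {
        self.generic = self.stream.reader();
        return self.generic.any();
    }
};

test "the AnyReader stays valid while its Source is alive" {
    var source = Source.init("abc");
    const reader = source.anyReader();
    try std.testing.expectEqual(@as(u8, 'a'), try reader.readByte());
    try std.testing.expectEqual(@as(u8, 'b'), try reader.readByte());
}
```

This mirrors the allocator case: the interface value is a handle, and the object behind it has to outlive every copy of that handle.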

dude_the_builder · December 30, 2023, 12:02pm

A couple of notes:

On the claim that reaching the end of the stream is not an error: I’m not sure you can categorically assert this. The method name is streamUntilDelimiter, so I would presume the objective is to stream until the specified delimiter is found. Any other outcome would be unexpected and thus an error to be handled differently.

On the while loop that compares against error.EndOfStream: I don’t want to rain on your parade, but this will ignore any other error returned from streamUntilDelimiter, entering the loop body in a possible error condition. Switching on the error may seem tedious or verbose, but it’s the only way to handle all possible errors.

mscott9437 · December 30, 2023, 12:52pm

Yeah, so when I originally came up with that pattern, the only error I could think to trigger was EndOfStream, because Ctrl+C will trigger an EndOfStream error on Windows, whereas on Linux it will just exit. So that was the best I could do. But now that I look at it again, I realize we have a StreamTooLong error, so I would amend that to match that error instead, since you have to pass a max size to streamUntilDelimiter already.

   while (reader.streamUntilDelimiter(bytes.writer(), '\n', 10) != error.StreamTooLong) {
      try writer.print("> {s}\n", .{ bytes.items });
   }

I still think this is good when you want to keep things simple, but yes you should take some caution because you will not have ways to handle the other errors down the road. If you want to use it inside your while loop to catch all the errors then I would try this one

while (true) {

   reader.streamUntilDelimiter(input.writer(), '\n', 1000) catch |err| switch (err) {

      error.EndOfStream => {
         try writer.print("\n", .{ });
         break;
      },

      error.StreamTooLong => {
         try writer.print("\n", .{ });
         break;
      },

      else => |e| return e

   };

}

This is one thing I want to point out which is really nice about Zig compared to some other modern languages, because ultimately it lets you decide just how safe you need to be with your code.

Also, I’m not sure which other errors streamUntilDelimiter can return. It’s still not clear to me how to determine that.

Edit: forgot that you can also pass null as the max size, so you might adjust it to check for a different error besides StreamTooLong in that case

ianprime0509 · December 30, 2023, 2:54pm

As a small suggestion for simplification, while loops can be used with error unions:

while (reader.streamUntilDelimiter(input.writer(), '\n', 1000)) {
    // Do stuff with the input you read
} else |err| switch (err) {
    error.EndOfStream, error.StreamTooLong => try writer.writeByte('\n'),
    else => |e| return e,
}

When using a while loop with an error union type, there must be an else branch capturing an error, and the loop will execute until there’s an error, and then the else will execute.
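A complete, runnable variant of this while/else pattern might look like the sketch below, assuming the Zig 0.12/0.13-era std.io and ArrayList APIs. The function name countLines is hypothetical. The buffer is cleared inside the loop body because streamUntilDelimiter appends to the writer instead of replacing its contents.

```zig
const std = @import("std");

/// Hypothetical helper: counts lines, treating a final line without a
/// trailing '\n' as a line too.
fn countLines(allocator: std.mem.Allocator, reader: anytype) !usize {
    var line = std.ArrayList(u8).init(allocator);
    defer line.deinit();

    var count: usize = 0;
    // The loop body runs for every successful call; the first error
    // ends the loop and is handed to the else branch.
    while (reader.streamUntilDelimiter(line.writer(), '\n', null)) {
        count += 1;
        line.clearRetainingCapacity();
    } else |err| switch (err) {
        // Bytes read before EOF are already in `line` at this point.
        error.EndOfStream => if (line.items.len > 0) {
            count += 1;
        },
        else => |e| return e,
    }
    return count;
}

test "countLines handles a missing final newline" {
    var with_newline = std.io.fixedBufferStream("a\nb\n");
    try std.testing.expectEqual(
        @as(usize, 2),
        try countLines(std.testing.allocator, with_newline.reader()),
    );

    var without_newline = std.io.fixedBufferStream("a\nb\nc");
    try std.testing.expectEqual(
        @as(usize, 3),
        try countLines(std.testing.allocator, without_newline.reader()),
    );
}
```

Because the else branch switches on the captured error, anything other than EndOfStream is returned to the caller instead of being silently treated as the end of input.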