Is it possible to use tokenizeSequence directly from a reader?

What would be the best way to read lines directly from a file? I was attempting to tokenize a sequence directly via the reader, but I'm not sure that's possible.

    var buffer: [2048]u8 = undefined;
    var file = try fs.cwd().openFile("02.input", .{ .mode = .read_only });
    defer file.close();

    var reader: fs.File.Reader = file.reader(&buffer);
    const stream = &reader.interface;
    var row_it = std.mem.tokenizeSequence(u8, <how_to_read_buffer>, "\n");

    while (row_it.next()) |row| {

Recent threads intersect your question (though they do not reference tokenizeSequence); you might find them helpful since your sequence delimiter is \n, the same as used in these examples…

For 0.15:

For 0.16:

std.Io.Reader.takeDelimiterExclusive: Zig Documentation

In your example:

    var buffer: [2048]u8 = undefined;
    var file = try fs.cwd().openFile("02.input", .{ .mode = .read_only });
    defer file.close();

    var reader: fs.File.Reader = file.reader(&buffer);
    const stream = &reader.interface;

    // takeDelimiterExclusive returns an error union (not an optional), so end the loop with `else |err|`
    while (stream.takeDelimiterExclusive('\n')) |row| {

Don’t forget to toss(1) the delimiter! (std.Io.Reader: fix delimiter bugs · ziglang/zig@bf58b4e · GitHub)


I had another implementation using takeDelimiterExclusive which in my view had two issues.

  1. The necessary call to .toss(1) didn’t feel very ergonomic, which made me think there must be a better option.
  2. If the file didn’t end with a newline, the program would just crash.

Personally, I’m not a big fan of the “EndOfStream” error either, or of the fact that it must be explicitly handled. Reaching the end seems like expected behaviour when reading a file, not an “Error”.

These two reasons were what made me look for other solutions. The code I’m writing is purely for academic reasons, so I was trying to find the solution that felt the most elegant.

I actually think that it’s pretty convenient for EOF to be an error. You almost always want to return a different error on error.ReadFailed or error.WriteFailed anyway (like the error cached in File.Reader, or error.OutOfMemory if you’re using a Writer.Allocating), so you’d have switched on the error even if EndOfStream wasn’t part of it. It’s certainly more explicit than returning null for EOF.

I agree with @Justus2308 that the EndOfStream error is handy, but some of the read functions do, in a way, treat EOF as another special delimiter in addition to the one you provide. For instance, in master (0.16):

   var reader: std.Io.File.Reader = file.reader(io, &buf);
   while(reader.interface.takeDelimiter('\n')) |line| {
      if (line) |l| { std.debug.print("line: {s}\n", .{l}); }
      else { break; }
   } else |err| return err;

takeDelimiter() returns a slice of your buffer where the new data has landed, and only returns an error if a real error (e.g. ReadFailed) has occurred. I think most would say this handles your “final line without a \n” case “elegantly”. But note that it can return null at the end of the stream, so you have to unwrap the optional to use it, and explicitly break when you get that null, because you will get it.

With a similar setup (notably, a large enough buf), you could, of course, take (or takeArray) and then tokenizeSequence(). In this case, again, if you’re working directly with a buffer rather than setting up a read-to-write stream with an allocator, your LOC will remain small.

Doesn’t the while loop end if “line” is null?

At least with my testing, this is the exact ergonomic function I want (can handle errors if I want later down the road).


    var reader: fs.File.Reader = file.reader(&buffer);
    while (try reader.interface.takeDelimiter('\n')) |line| {
        std.debug.print("line: {s}\n", .{line});
    }

I think I very much like that EOF is treated as an “extra” delimiter.

takeDelimiter returns an error union wrapping an optional wrapping a slice. while only unwraps once, so you have two options. Either you unwrap the error union with try (or with catch inside the while condition, which rarely helps readability) and let while unwrap the optional, like you did. Or you let while unwrap the error union (catching the error with an else case to switch on it or whatever, which is a very nice pattern IMO), then unwrap the optional inside your loop body and break from the loop manually if it’s null.
I think it’s much nicer to just have everything as an error so that you only have to unwrap once.

Indeed, as @Justus2308 mentions, takeDelimiter is a case of a !?T return: it can return an error, a null, or a value. If the language allowed while {} else {} else |err| {} then you could simply ‘do nothing’ in the first else {}, and let it break the while loop. This could be kind of cool, I guess. But no, we just have else |err| {} in this case, so the captured value is still an optional, and possibly null.
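To make the two unwrapping styles concrete, here is a minimal sketch that does not touch std.Io at all. Source and its next() are hypothetical stand-ins I made up for illustration; they only mimic the !?T shape of takeDelimiter and are not part of the standard library.

```zig
const std = @import("std");

// Hypothetical stand-in for a reader (not part of std): yields items,
// then null once exhausted, mimicking the `!?T` shape of takeDelimiter.
const Source = struct {
    items: []const []const u8,
    index: usize = 0,

    fn next(self: *Source) error{ReadFailed}!?[]const u8 {
        if (self.index >= self.items.len) return null;
        defer self.index += 1;
        return self.items[self.index];
    }
};

// Style 1: `try` unwraps the error union, `while` unwraps the optional.
fn countWithTry(src: *Source) !usize {
    var n: usize = 0;
    while (try src.next()) |_| n += 1;
    return n;
}

// Style 2: `while` unwraps the error union, the body unwraps the optional
// and has to break manually on null.
fn countWithElse(src: *Source) !usize {
    var n: usize = 0;
    while (src.next()) |maybe_item| {
        if (maybe_item == null) break;
        n += 1;
    } else |err| return err;
    return n;
}

test "both styles visit every item" {
    const items = [_][]const u8{ "a", "b", "c" };
    var s1 = Source{ .items = &items };
    var s2 = Source{ .items = &items };
    try std.testing.expectEqual(@as(usize, 3), try countWithTry(&s1));
    try std.testing.expectEqual(@as(usize, 3), try countWithElse(&s2));
}
```

Saved to a file, this should run with zig test. The second style costs one extra null check per iteration, but it gives you a place to switch on the error without leaving the loop construct.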

If you come to agree with @Justus2308 and me, and like the error style, then the equivalent would look like:

   var reader: std.Io.File.Reader = file.reader(io, &buf);
   while(reader.interface.takeDelimiterInclusive('\n')) |line| {
      std.debug.print("line: {s}\n", .{line});
   } else |err| switch (err) {
      error.EndOfStream => {}, // while loop done
      else => return err,
   }

It’s “sort of” the same LOC.

But, for completeness, I should say, of course, that your slim’n’trim “try” variant (while (try reader.interface.takeDelimiter('\n')) |line|) works exactly as you wrote it because, in that case, line is indeed unwrapped already, so you don’t have to unwrap it yourself. As you say, for your testing case, that’s probably the most readable and undecorated variation.

But, to be complete and offer a tokenizeSequence() version, and to highlight the one-liner readFile() (note, this is 0.16 again, but a 0.15 variant exists):

   const io = std.testing.io;
   var buf: [1024]u8 = undefined; // must be big enough for entire file
   const contents = try std.Io.Dir.readFile(std.Io.Dir.cwd(), io, "fn", &buf);
   var tok = std.mem.tokenizeSequence(u8, contents, "\n");
   while (tok.next()) |line| {
      std.debug.print("line: {s}\n", .{line});
   }
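
One thing worth knowing before choosing the tokenizer: it skips empty tokens, so blank lines disappear, while a missing final newline is handled without any special casing. A small sketch on an in-memory string (the sample data is my own assumption, chosen to show both effects), compared with std.mem.splitScalar:

```zig
const std = @import("std");

test "tokenizeSequence skips empty lines and needs no trailing newline" {
    // Assumed sample data: an empty line in the middle, no final '\n'.
    const contents = "a\n\nb";

    // The tokenizer treats consecutive delimiters as one and never
    // yields an empty slice.
    var tok = std.mem.tokenizeSequence(u8, contents, "\n");
    try std.testing.expectEqualStrings("a", tok.next().?);
    try std.testing.expectEqualStrings("b", tok.next().?);
    try std.testing.expect(tok.next() == null);

    // splitScalar keeps the empty line as an empty slice.
    var split = std.mem.splitScalar(u8, contents, '\n');
    try std.testing.expectEqualStrings("a", split.next().?);
    try std.testing.expectEqualStrings("", split.next().?);
    try std.testing.expectEqualStrings("b", split.next().?);
    try std.testing.expect(split.next() == null);
}
```

If blank lines carry meaning in your input, the split variant is probably the safer choice. For a single-byte delimiter like '\n', std.mem.tokenizeScalar is an alternative to tokenizeSequence.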