Back to Basics: File Reading with the New IO in Zig

Small note, but GeneralPurposeAllocator was renamed to DebugAllocator in 0.14.0.

I’m not sure reader.toss(1) is the right API for discarding the delimiter byte. Have you tried using takeDelimiterExclusive?

Ha. Code copied and pasted from a couple of months ago is already outdated. Thanks for the info! I’ve updated the blog to use DebugAllocator.

Careful: see "std.Io.Reader: fix delimiter bugs" by mlugg (Pull Request #25169, ziglang/zig on GitHub).

Good question. The current implementation of takeDelimiterExclusive only searches for the delimiter in the buffered data, up to the buffer’s capacity. It won’t look beyond that. See peekDelimiterInclusive, which it calls to do the read; the while loop there only ranges over the buffer’s capacity.

To make it work, the buffer has to be big enough to hold the data up to and including the delimiter. That is too strong an assumption for a general-purpose read-until-delimiter.

I’ve gone through all variations of the delimiter reading API and settled on streamDelimiter() with a growing writer buffer.
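
For readers who want to see the shape of that approach, here is a minimal sketch, assuming the Zig 0.15.x std.Io API. The helper name countLines and the sample text are made up for illustration, and an in-memory reader from std.Io.Reader.fixed stands in for the file reader so the snippet runs on its own.

```zig
const std = @import("std");

/// Hypothetical helper (not from the blog post): prints each line and
/// returns how many lines were seen.
fn countLines(alloc: std.mem.Allocator, reader: *std.Io.Reader) !usize {
    // Growing buffer that receives one line at a time, so the line length
    // is not limited by the reader's own buffer capacity.
    var line = std.Io.Writer.Allocating.init(alloc);
    defer line.deinit();

    var count: usize = 0;
    while (true) {
        _ = reader.streamDelimiter(&line.writer, '\n') catch |err| switch (err) {
            error.EndOfStream => break, // input ended before another '\n'
            else => |e| return e,
        };
        reader.toss(1); // the delimiter itself is left unread; skip it
        std.debug.print("line = '{s}'\n", .{line.written()});
        count += 1;
        line.clearRetainingCapacity();
    }

    // Bytes after the last '\n', if any, are still sitting in `line`.
    if (line.written().len > 0) {
        std.debug.print("line = '{s}'\n", .{line.written()});
        count += 1;
    }
    return count;
}

pub fn main() !void {
    var gpa: std.heap.DebugAllocator(.{}) = .init;
    defer _ = gpa.deinit();

    // In-memory stand-in for `&file_reader.interface`.
    var reader: std.Io.Reader = .fixed("short\na much longer second line\nlast");
    const n = try countLines(gpa.allocator(), &reader);
    std.debug.print("{d} lines\n", .{n});
}
```

With a real file, the same helper would presumably be called with &file_reader.interface in place of the fixed reader.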

In the provided snippet:

var reader = &file_reader.interface;

The reader variable can be made const, since it’s a pointer and not a value. Sadly, the compiler can’t detect it.

I remember seeing a compile error before, "expected type ‘*Io.Reader’, found ‘*const Io.Reader’", at reader.streamDelimiter(…), and changed it to var. Now that I’ve changed it back to const as you suggest, it works. I guess it was a ‘const file_reader’ causing it, and it showed up as a compile error on ‘reader’. Thanks for the info.
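
A small sketch of the distinction, using made-up stand-in types (Wrapper and Interface are hypothetical, not the std types): the pointer binding can be const as long as the thing it points into is declared var.

```zig
const std = @import("std");

// Hypothetical stand-in for std.Io.Reader.
const Interface = struct {
    pos: usize = 0,

    fn advance(self: *Interface, n: usize) void {
        self.pos += n;
    }
};

// Hypothetical stand-in for std.fs.File.Reader and its `interface` field.
const Wrapper = struct {
    interface: Interface = .{},
};

pub fn main() void {
    // Must be `var`: the method call below mutates its `interface` field.
    var wrapper: Wrapper = .{};

    // The binding is `const`, but its type is `*Interface`, a pointer to
    // mutable data, so calling a method that takes `*Interface` is fine.
    const iface = &wrapper.interface;
    iface.advance(3);

    // If `wrapper` were declared `const`, `&wrapper.interface` would have
    // type `*const Interface` and `iface.advance(3)` would not compile.
    std.debug.print("pos = {d}\n", .{wrapper.interface.pos}); // prints "pos = 3"
}
```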

Thanks for writing and sharing this! It helped me a lot.

If I want to read the entire contents of a file (instead of line-by-line), is this the best approach:

  1. Create an ArrayList(u8) in which to store the contents.
  2. Call appendRemaining on the reader, passing a suitable allocator and the array list.

?

Specifically:

// --snip--
const reader = &file_reader.interface;
var text_buf = std.ArrayList(u8).empty;
// Read the entire file into `text_buf`.
try reader.appendRemaining(alloc, &text_buf, .unlimited);

// Get a slice to the text content. Alternatively, call `toOwnedSlice()`.
const text = text_buf.items;

Thanks for the comment. For reading the entire file into memory, appendRemaining works when loading into an ArrayList. There are several other ways. The following are some examples.

This one is a one-liner, the simplest.

This one lets you allocate a buffer first.

This one is the most efficient, with the least amount of buffer copying.

This one is for when the file size is unknown, reading as much as possible until EOF.
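
As a rough illustration of the read-everything case (a sketch only, not the post author's snippets), the following assumes the Zig 0.15.x std.Io.Reader.allocRemaining call and uses an in-memory reader from std.Io.Reader.fixed in place of a file so it runs standalone.

```zig
const std = @import("std");

pub fn main() !void {
    var gpa: std.heap.DebugAllocator(.{}) = .init;
    defer _ = gpa.deinit();
    const alloc = gpa.allocator();

    // In-memory stand-in for `&file_reader.interface`; the sample text is
    // made up for illustration.
    var reader: std.Io.Reader = .fixed("first line\nsecond line\n");

    // Read until end of stream into a freshly allocated slice.
    // The caller owns the returned memory.
    const text = try reader.allocRemaining(alloc, .unlimited);
    defer alloc.free(text);

    std.debug.print("read {d} bytes\n", .{text.len});
}
```

Passing a bounded limit instead of .unlimited is one way to guard against unexpectedly large inputs.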

Cheers.

Funny that this should resurface today.

Earlier, stratts advocated for not assigning the interface to a variable at all.

Indeed, in this case, in your post:

    var file_reader: std.fs.File.Reader = file.reader(&read_buf);

    // Pointer to the std.Io.Reader interface to use the generic IO functions.
    const reader = &file_reader.interface;

You create a reader just as in the discussion (and oops) of that thread. I notice that file_reader is used for nothing other than its interface, and that the interface (reader) is used only twice. Obviously, in a real app, it would potentially be used much more, and file_reader might be too, but would it make sense to adopt stratts’s proposal, and just write

    _ = file_reader.interface.streamDelimiter(&line...

instead, and possibly rename variables for ergonomics?

stratts and you have a good point.

There should be no buffer copying in half of your solutions.
The reader first copies any data it already holds in its own buffer to the output buffer/writer, then reads the rest directly into the output buffer/writer. In your examples, you don’t do any operation that puts data in the internal buffer, except in the example where you only fill the buffer.

The exceptions are Writer.Allocating, which might copy data when growing its buffer, and Dir.readFileAlloc, which appends to an array list.

Trace through the call stack of std.fs.Dir.readFileAlloc.

Thank you again! That’s much clearer than what I was doing.

There seem to be quite a few questions regarding file I/O. I’ve put together two more blog posts diving into the topic.

Most languages have a one-liner library function to read a text file into lines. The above is a version for Zig. It supports simple usage like the following.

var f = try FileLines.read(alloc, std.fs.cwd(), "test.txt");
for (f.lines()) |line| {
    std.debug.print("line = '{s}'\n", .{line});
}
f.deinit();

Thank you for this. Any idea what the best way to do partial reads/writes is? For example, I could stop reading after N lines, then read the rest later by setting the position with file.seekTo(). But on write, how can you write only a part of the file, assuming you didn’t read it all and edited only a part of it?

I think streamDelimiter should be helpful.

std.fs.File.Reader.seekTo() should work for partial reads that jump to different parts of the file. It adjusts the associated std.Io.Reader interface for the seek operation.

std.fs.File.Writer.seekTo() should also work. It adjusts the associated std.Io.Writer interface for the seek operation. You just seek to a position in the file and write the bytes of the partial data.

Also, File.Reader and File.Writer are opened in .positional mode by default.
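
Here is a hedged sketch of the reading side only, assuming the Zig 0.15.x std.fs.File.Reader API. The helper readAt, the scratch file name seek_demo.txt, and its contents are made up for illustration.

```zig
const std = @import("std");

/// Hypothetical helper: seek to `offset` and return the next `n` bytes.
/// The returned slice points into the reader's buffer, so it is only valid
/// until the next read or seek, and `n` must fit in that buffer.
fn readAt(file_reader: *std.fs.File.Reader, offset: u64, n: usize) ![]u8 {
    try file_reader.seekTo(offset);
    return file_reader.interface.take(n);
}

pub fn main() !void {
    const path = "seek_demo.txt"; // hypothetical scratch file

    // Create a small file to read back from.
    {
        const f = try std.fs.cwd().createFile(path, .{});
        defer f.close();
        try f.writeAll("0123456789ABCDEF");
    }
    defer std.fs.cwd().deleteFile(path) catch {};

    const file = try std.fs.cwd().openFile(path, .{});
    defer file.close();

    var buf: [64]u8 = undefined;
    var file_reader = file.reader(&buf);

    // Jump forward, read a few bytes, then jump back and read again.
    const tail = try readAt(&file_reader, 10, 3);
    std.debug.print("at 10: {s}\n", .{tail});

    const head = try readAt(&file_reader, 2, 4);
    std.debug.print("at 2: {s}\n", .{head});
}
```

The writing side would follow the same pattern with a File.Writer: seek, write the partial data through its interface, then flush.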

Thanks, I’ll try with this one.