File I/O basics (0.16)

(Note, this is a variation on this article, invited into docs with the release of 0.16.)

As always, reply/comment all you want, and I’ll endeavor to make changes accordingly.

Motivation, and migration

Zig 0.16 brought big changes to I/O. std.fs references are deprecated everywhere, and understanding basic use of std.Io is in order. Zig’s new I/O model brings I/O into alignment with the Allocator model: all of Zig’s “file system, networking, timers, synchronization, and pretty much everything that can block [is moved] into a new std.Io interface. All code that performs I/O will need access to an Io instance, similar to how all code that allocates memory needs access to an Allocator instance.” ref.

What used to be std.fs.cwd().openFile(path, .{}) becomes std.Io.Dir.cwd().openFile(io, path, .{}), and likewise throughout the interface: the io arg is in every function that does I/O. This allows you to choose your I/O model (synchronous, asynchronous - coroutines, threads, etc.) with just a swap of the Io you instantiate and provide to all of your I/O calls.

Most of this article is an attempt to provide basic examples and discussion around several typical file (or file-like) I/O use cases. If you’re looking for a std.Io overview, including async and concurrency, look here.

Getting an io object

First things first: there are many ways to get an io object, but two are common. One is within a test:

test "within a test" {
   const io = std.testing.io;
   // ...
}

and, alternately, within a main. Here we’ll assume use of juicy main:

pub fn main(init: std.process.Init) !void {
   const io = init.io;
   // ...
}

I’ll take the “within a test” approach throughout the remainder of this journey…

Reading a whole file

First, let’s simply read a whole file:

   var buf: [10240]u8 = undefined; // must be big enough for entire file
   const io = std.testing.io;
   const contents = try std.Io.Dir.readFile(std.Io.Dir.cwd(), io, "test-filename", &buf);
   var tok = std.mem.tokenizeSequence(u8, contents, "\n");
   while (tok.next()) |line| {
      std.debug.print("line: {s}\n", .{line});
   }

This variant uses readFile() to read the entire (text) file, or as much as can fit into buf. It returns contents, but it’s important to realize that contents is just a slice of buf - no memory is magically materialized. Still, you want to use contents, not buf directly, because contents is a proper slice whose .len corresponds to the data actually read. Note that no error is returned if buf is not big enough for the file; rather, only buf.len bytes are read, filling buf, and the remainder of the file remains unread. Thus, in real code, you should either be confident that you know the length of the file, or check the length of the result: contents.len < buf.len means the whole file was read, while contents.len == buf.len is ambiguous (the file may have fit exactly, or it may have been truncated).

The above requires a preallocated buffer (and, implicitly, the length of the file). A variant, readFileAlloc(), exists as well. For this, familiarity with Zig’s Allocators is helpful.

   // ... alternatively ...
   const allocator = std.testing.allocator;
   const contents = try std.Io.Dir.readFileAlloc(std.Io.Dir.cwd(), io, "test-filename", allocator, .unlimited);
   defer allocator.free(contents); // or free elsewise; caller owns readFileAlloc()'s return buffer!
   // ...

Be careful to read the caveats with this implementation, in the doc comments, and consider readFileAllocOptions(), as well. Also note that readFileAlloc*() is implemented to handle “files” for which file size isn’t known (e.g., network streams, terminal input, etc.) (see…).

Errors

The above example just propagates all errors, with try. This is irresponsible - errors should most often be unwrapped and treated thoughtfully, especially since error.Canceled needs to be propagated for (async task) cancelation to work properly. In examples below, we’ll see some more realistic error handling.

Reading line-by-line

The above file was treated like a ‘\n’-line-delimited text file, and we’ll continue assuming “text files” for a while. Rather than reading the whole file, what if we just wanted to read line-by-line (and thus reduce our memory requirement for each read operation)…

   const io = std.testing.io;
   if (std.Io.Dir.cwd().openFile(io, "test-filename", .{
      .mode = .read_only, // optional args, used here for clarity
      .lock = .exclusive,
   })) |file| { // or catch, rather than if
      defer file.close(io);
      var buf: [1024]u8 = undefined; // must be big enough for longest line
      var reader: std.Io.File.Reader = file.reader(io, &buf);
      while (try reader.interface.takeDelimiter('\n')) |line| { // not advisable to auto-propagate; see below...
         std.debug.print("line: {s}\n", .{line});
      }
   } else |err| switch (err) {
      error.FileNotFound, error.AccessDenied => {
         std.debug.print("unable to open file: {}\n", .{err});
         // loop back to try another or something
      },
      else => |e| return e, // don't continue; rather, bomb out
   }

This example uses takeDelimiter() and includes some (but not enough) error handling. First, note that takeDelimiter() is a function of Io.Reader, but our reader, here, is actually an Io.File.Reader. Io.XX.Reader objects carry an interface, which must be used to access functions like takeDelimiter. One common mistake, though, involves assigning that interface improperly:

     const reader = &file_reader.interface; // right way
   //const reader =  file_reader.interface; // BAD!! BAD!!!  DON'T DO!

(The reference is necessary because the interface needs its connection to the parent reader (File.Reader, in this case), and a copy would isolate it from its parent.) To avoid this, one good pattern is to just always use reader.interface.foo() - that is, always type out the whole thing. Sometimes this is too verbose, given the context, so, if you need to create a const, make sure it’s a const reference, and not a copy. In the line-by-line reading example, above, you see the verbose reader.interface.takeDelimiter() pattern.

Error handling: if the openFile() fails, the code switches to handle FileNotFound and AccessDenied errors as redeemable - perhaps the user is given a chance to choose another file(name). But if any other errors are returned, the final else just propagates the error (which may rise all the way out of main and terminate the program). However, this example neglects takeDelimiter() errors - we’ll handle those better below. Also note: the earlier example, with std.Io.Dir.readFile(), never explicitly called openFile(), so it did not need the defer file.close(io) that is essential here in this example.

This example also uses a stack buffer, buf - this time just 1024 bytes large; this suggests that we know that the lines of the file are less than 1024 bytes each, or else a whole line would not fit into buf when takeDelimiter() tried to read the line, and takeDelimiter() would return StreamTooLong. Note that tossBuffered() does NOT need to be called, even if a line is 1023 bytes long, because each subsequent call to takeDelimiter() will assume responsibility for that (line will be invalidated at that point, since line is just a slice reference into buf, and buf must be available for the next read). The use of an allocator, rather than the stack, is demonstrated a couple of places elsewhere in this article.

Better error handling… and a touch of async I/O

The above example relied on try to auto-propagate on errors returned from takeDelimiter(). More often, for code that has the concrete reader (or writer), such as our File.Reader, it is more appropriate to return reader.err.?. This is essential, for instance, for error.Canceled, which must be propagated for cancelation (in async/concurrent contexts) to work properly:

      // ... alternatively ...
      while(reader.interface.takeDelimiter('\n')) |result| if (result) |line| {
         std.debug.print("line: {s}\n", .{line});
      } else break else |err| switch (err) {
         error.ReadFailed => {
            std.debug.print("read failed, discontinuing!\n", .{});
            return reader.err.?; // return the specific error; especially essential for error.Canceled
         },
         else => return err, // StreamTooLong could be handled explicitly, but `else` propagation is not illegal
      }
      // ...

In this case, the “compound-optional” result of takeDelimiter() could be:

  • actual data, when unwrapped, or
  • null (since result is an optional), or
  • an error

If takeDelimiter() succeeds with data, result is unwrapped to line, and you can process this line that was read from the file. If result is null, the else break bit breaks out of the while loop. And, finally, if result is an error, it’s switched upon. That’s the work that this line does:

      } else break else |err| switch (err) {

A common pattern involves the top-level code, which creates the reader or writer, carefully checking reader.err or writer.err, like this, when handling error.ReadFailed or error.WriteFailed. Lower code, which just takes the reader.interface or writer.interface, can merely try *read(), or the like, and let that ReadFailed or WriteFailed propagate up. The creator of the concrete reader or writer knows what it is (e.g., a File.Reader), and knows if it might need to propagate .err (which might be error.Canceled, e.g.). So, for instance:

   foo(&file_writer.interface) catch |err| switch (err) {
      error.WriteFailed => return file_writer.err.?,
   };

… but, within the implementation of foo(), try writer.writeAll(...) might simply propagate any WriteFailed (or other error) if that code is in no position to do more than propagate.

See also “More on async”, below.

More approaches to reading…

Yet another approach is to use takeDelimiterInclusive() or takeDelimiterExclusive():

      // ... alternatively ...
      const rif = &reader.interface; // careful to take the address &! (const: the pointer itself is never reassigned)
      while(rif.takeDelimiterInclusive('\n')) |line| {
         std.debug.print("line: {s}", .{line});
      } else |err| switch (err) {
         error.ReadFailed => return reader.err.?,
         error.EndOfStream => { // process tail...
            const line = try rif.take(rif.end - rif.seek); // eek! better catch ReadFailed again!
            std.debug.print("final line: {s}\n", .{line});
         },
         error.StreamTooLong => return err, // or just else => return err,
      }
      // ...

Here, we don’t have an optional result, so don’t have the if-check for a null result, but assign to line directly. Instead, if the end of the stream is reached before a delimiter is reached, EndOfStream is returned. In this case, there may be “tail” data if the file’s last byte was not a \n - then, the last line of the file would be missed if not for that tail handler.

Using the “stream” pattern

Streaming is a typical approach, and often relies on allocating memory along the way:

   if (std.Io.Dir.cwd().openFile(io, "test-filename", .{})) |file| {
      defer file.close(io);

      var gpa = std.heap.DebugAllocator(.{}){};  // or just use std.testing.allocator directly
      defer _ = gpa.deinit();
      const alloc = gpa.allocator();

      var line = std.Io.Writer.Allocating.init(alloc);
      defer line.deinit();
      var buf: [64]u8 = undefined; // somewhat arbitrary buffer size
      var reader: std.Io.File.Reader = file.reader(io, &buf);
      while(reader.interface.streamDelimiter(&line.writer, '\n')) |written_count| {
         _ = written_count;
         _ = reader.interface.toss(1); // move past the delimiter
         std.debug.print("line: {s}\n", .{line.written()});
         line.clearRetainingCapacity(); // reset the buffer
      } else |err| switch (err) {
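         error.ReadFailed => return reader.err.?, // the specific read error is in reader.err
         error.WriteFailed => return error.OutOfMemory, // the Allocating writer fails only when allocation fails
         error.EndOfStream => {
            if (line.written().len > 0) {
               std.debug.print("tail: {s}\n", .{line.written()});
            }
         },
         // no `else` prong: all three streamDelimiter() errors are handled above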
      }
   } else |err| switch (err) {
      error.FileNotFound, error.AccessDenied => {
         std.debug.print("unable to open file: {}\n", .{err});
         // loop back to try another or something
      },
      else => return err, // don't continue; rather, bomb out
   }

This example employs a DebugAllocator; lines in the file could be any arbitrary length as the Writer.Allocating will allocate more space as needed when streaming from the file.

This code block is the most involved, of course, but offers the advantage of handling completely unknown file sizes with completely unknown line sizes. Note the many essential defer lines to clean up (after openFile, DebugAllocator, and std.Io.Writer.Allocating.init()).

Byte-wise

The above examples read chunks of data from a file according to some delimiter (‘\n’ or end-of-file). Reading in more granular chunks is straightforward:

   // ...
   const byte = try reader.interface.takeByte();
   std.debug.print("Byte: {}\n", .{byte});
   const int = try reader.interface.takeInt(u32, .little);
   std.debug.print("u32 int: {}\n", .{int});
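
To make the endianness argument concrete, here is a minimal sketch that swaps the file for an in-memory reader made with Reader.fixed(). That swap is an assumption made only so the example runs without a file on disk, and the byte values are invented. Because the reader here is a plain std.Io.Reader rather than a File.Reader, the calls are made on it directly, with no .interface.

```zig
const std = @import("std");

test "takeByte and takeInt on an in-memory reader" {
    // Invented sample data: one tag byte, then a u32 stored little-endian.
    const bytes = [_]u8{ 0x2a, 0x01, 0x02, 0x03, 0x04 };
    var reader: std.Io.Reader = .fixed(&bytes);

    // takeByte() consumes exactly one byte.
    const byte = try reader.takeByte();
    try std.testing.expectEqual(@as(u8, 0x2a), byte);

    // takeInt() consumes @sizeOf(u32) bytes; with .little the first byte
    // read is the least significant one.
    const int = try reader.takeInt(u32, .little);
    try std.testing.expectEqual(@as(u32, 0x04030201), int);

    // Nothing is left, so another take reports EndOfStream.
    try std.testing.expectError(error.EndOfStream, reader.takeByte());
}
```

Reading the same four bytes with .big instead would produce 0x01020304.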

More reading options

readSliceAll, readSliceShort, readSliceEndian, takeStruct, takeStructPointer, as well as peek*(), discard*(), toss(), *Alloc() variants, and others are all worth investigation. With peek* and take* functions, in particular, you can simplify your code, reading non-uniform data, by taking advantage of the reader’s buffering.
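
As a rough illustration of reading non-uniform data with peek* and take*, here is a sketch that parses a made-up record format from an in-memory reader. The Record type, the takeRecord() helper, and the record layout (a 1-byte tag, a u16 little-endian length, then the payload) are all hypothetical, invented for this example; only the peekByte(), takeByte(), takeInt(), and take() calls come from std.Io.Reader.

```zig
const std = @import("std");

/// Hypothetical record, invented for illustration.
const Record = struct {
    tag: u8,
    payload: []const u8,
};

/// Hypothetical helper: reads one record (tag, u16 LE length, payload).
/// It only sees the Io.Reader interface, so it just propagates errors.
fn takeRecord(r: *std.Io.Reader) std.Io.Reader.Error!Record {
    const tag = try r.takeByte();
    const len = try r.takeInt(u16, .little);
    // take() returns a slice into the reader's buffer; it is only valid
    // until the next read, so copy it if it needs to live longer.
    const payload = try r.take(len);
    return .{ .tag = tag, .payload = payload };
}

test "peek, then take a non-uniform record" {
    const bytes = [_]u8{ 'T', 5, 0, 'h', 'e', 'l', 'l', 'o' };
    var reader: std.Io.Reader = .fixed(&bytes);

    // peekByte() inspects the next byte without consuming it, which is
    // handy for deciding how to parse what follows.
    try std.testing.expectEqual(@as(u8, 'T'), try reader.peekByte());

    const rec = try takeRecord(&reader);
    try std.testing.expectEqual(@as(u8, 'T'), rec.tag);
    try std.testing.expectEqualStrings("hello", rec.payload);
}
```

With a File.Reader, the same helper would be called as takeRecord(&reader.interface), and the reader's buffer would need to be at least as large as the longest payload.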

Writing

A simple buffer-centric writer (for testing, e.g.) can be constructed with fixed() (fixed() exists for Reader, as well, by the way):

   var buf: [1024]u8 = undefined;
   var writer: std.Io.Writer = .fixed(&buf);

Note that the code that creates the writer is responsible for calling flush() (below). It is also responsible for NOT propagating error.WriteFailed as-is, but instead unwrapping it into the specific underlying error; in an async/concurrent context that error may be error.Canceled, which usually should be propagated in order for cancelation to work correctly. I’ll keep the following code in “simple” form for illustrative purposes, though…

Writing follows in expected ways (note the use of writer, below, not writer.interface, because writer, created above, is an instance of Io.Writer, not a specific implementation):

   try writer.writeByte(byte);
   try writer.writeInt(u32, int, .little);

Note that several functions, such as write(), return a usize indicating “bytes transferred”. The entire payload may not have been transferred yet, so don’t rely on the return value matching the length of the buffer you passed in. Other functions, such as writeAll(), do not return bytes transferred; rather, they call drain() repeatedly until all bytes are transferred. See the *drain() functions for more.

Importantly, after a block of writing functions, be sure to flush():

   try writer.flush(); // useless for our .fixed() writer, but...

Note that flush() is useless (no-op) for our fixed() buffer-destined writer, but a more typical, e.g., File.Writer, would require flush() at the finish.
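
Putting the writing pieces together, here is a small sketch using the same fixed() writer. The writeGreeting() helper and the strings are hypothetical, invented for illustration. It plays the role of “lower” code that only sees the Io.Writer interface and simply propagates WriteFailed, while the test plays the role of the code that created the writer and therefore flushes it.

```zig
const std = @import("std");

/// Hypothetical helper: lower-level code that takes only the interface
/// and lets WriteFailed propagate to whoever created the writer.
fn writeGreeting(w: *std.Io.Writer, name: []const u8) std.Io.Writer.Error!void {
    try w.writeAll("hello, "); // writes every byte or fails
    try w.print("{s}!\n", .{name}); // formatted output
}

test "writeAll, print, and flush with a fixed writer" {
    var buf: [32]u8 = undefined;
    var writer: std.Io.Writer = .fixed(&buf);

    try writeGreeting(&writer, "zig");
    try writer.flush(); // no-op here, but required for e.g. a File.Writer

    // buffered() is the portion of buf that has been written so far.
    try std.testing.expectEqualStrings("hello, zig!\n", writer.buffered());
}

test "a fixed writer fails once its buffer is full" {
    var small: [4]u8 = undefined;
    var writer: std.Io.Writer = .fixed(&small);

    // "hello, " alone does not fit in 4 bytes, so writeAll() fails.
    try std.testing.expectError(error.WriteFailed, writeGreeting(&writer, "zig"));
}
```

For a fixed() writer there is no further error detail behind WriteFailed; with a File.Writer, the creator would look at the writer’s .err, as described earlier.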

See also print() and print*() variants, sendFile() and send*() variants, and splat*() variants.

More on async

Especially significant, with cancelable I/O, are async variants and their proper error handling. Note, from the 0.16 release notes: “Future, Group, and Batch APIs all support requesting cancelation. When cancelation is requested, the request may or may not be acknowledged. Acknowledged cancelation requests cause I/O operations to return error.Canceled…” That resource goes on to coach your decision between propagating, io.recancel()ing, or declaring unreachable via io.swapCancelProtection(). An async version of the above openFile() step might do something like this:

   var open_task = io.async(Io.Dir.openFile, .{ .cwd(), io, "test-filename", .{} });
   defer if (open_task.cancel(io)) |file| file.close(io) else |_| {};
   // ... continue with the read operations

You can read elsewhere about how cancelation is equivalent to awaiting, in the sense that the above code does not continue until the file is open. But deferring a cancelation after creating an Io task is preferred to await() in order to ensure resource clean-up. (Note, that is, that the file may have been successfully opened just before the task was canceled, and so requires an immediate close()!)

Sources


This might encourage people to create, or use a global instance, where they shouldn’t. It should be mentioned that an application’s main function should be deciding the Io (especially for libraries), and that it itself can let std choose via juicy main.

It is also odd that you use an if instead of catch in that example. But that is not important

writeAll should be mentioned


One thing that annoys me is that the very nice peek* and take* API is never mentioned; it takes advantage of the reader’s buffering, which you can use to simplify your code. You don’t need to go into too much detail about them, just acknowledge their existence and maybe add an example reading non-uniform data.
Or that could be in a separate post, idk.


Super, thanks. I can do all of that. I thought about peek* and take* (and recently took very nice advantage of take* in some code), but couldn’t decide if it was “making something long… longer” or not. :) If it doesn’t already feel too long, I’m happy to add a little treatment.

Definitely should mention juicy main and the io available there. The earliest “notes” that led to this were made pre-juicy, and never evolved.

(I think it’s normal “form” to edit the OP in place, here in docs, so will do that. Though probably tomorrow.)

One important pattern:

foo(&file_writer.interface) catch |err| switch (err) {
    error.WriteFailed => return file_writer.err.?,
};

Specifically, the code that creates the Io.Writer (i.e. the one also responsible for flushing) should not propagate error.WriteFailed, instead unwrap it into the specific error that occurred. Importantly, now with std.Io, one of those unwrapped error codes could be error.Canceled which you’ll want to propagate in order for cancelation to work properly. That was already true with other error codes like error.OutOfMemory, but now the stakes are raised somewhat.

Same deal with Io.Reader and error.ReadFailed.


Ok, treated the io-getting bit, including reference to juicy-main io, but didn’t yet get to writeAll() or peek* or take* (though I did include takeByte() and takeInt() examples, in the original, and referred to takeStruct() and some others, in brief passing… but I think I know what you’re after, @vulpesx, and I’ll get it in soon).

Additionally, I think I addressed error.Canceled and some related bits well. Please critique, and I can address… I haven’t spent enough time in async to feel confident, and, I confess, I haven’t yet run my code changes… I just used the Force. (But I will, and will correct as needed, when I get back to my compiler; I just have a typewriter here right now.)

(You should be able to ‘diff’ the OP for the edit.)


Actually, I didn’t do this quite right… I’ll fix … reflections can be held a bit.

So, your message was clear, but I had some blinders … I used a lunch-break and jumped too quickly. I studied code later, and think I covered this properly now, but am happy to make changes. Indeed, missing XX.Reader/Writer.err, especially in light of error.Canceled, was a major omission.

All covered now, I think. Feel free to disagree, or even nit-pick. I did think about more detail on a couple of your suggestions, but felt things were already getting long. Hopefully in a good way. But perhaps more detail is for another treatment.

I would like to suggest changing this to File.stat() EDIT[1]: File.length() instead. Using a path-based stat syscall is good for fetching information on files without opening them.

If you intend to work with the files further, you should first be opening them. The reasoning is simple: in an ever-changing file system, it may happen that the file gets either deleted, or worse, swapped for a different one in between the stat call and the open call. If that happens, you’re opening a different file than you’ve stated. If you open the file first, the OS guarantees that all calls on the returned file descriptor refer to that same file – even if it gets deleted[2], even if it gets swapped out after opening.


  1. Thanks for pointing it out, @miagi!

  2. Linux actually tends to keep the file in the background until the descriptor is closed


A pattern that I like to use is try file.length(io). For demonstration purposes, I refactored this code to unwrap ReadFailed and to use an if block for opening the file:

if (std.Io.Dir.cwd().openFile(io, path, .{
    .mode = .read_only,
    .lock = .exclusive,
})) |file| {
    defer file.close(io);

    const buf = try gpa.alloc(u8, try file.length(io));
    var reader = file.reader(io, &.{}); // unbuffered, so the reader's buffer does not alias buf; readSliceAll() fills buf directly
    // Read all content of a file into buffer
    reader.interface.readSliceAll(buf) catch |err| switch (err) {
        error.ReadFailed => return reader.err.?,
        else => return err,
    };
} else |err| {
    return err;
}

EDIT:
Here is a documentation of that function from LSP:

fn length(file: File, io: Io) LengthError!u64

(fn (File, Io) error{...}!u64)


Retrieve the ending byte index of the file.

Sometimes cheaper than `stat` if only the length is needed.


Ah, I didn’t know there was a separate File.length() method, I was coming off of the syscalls that I knew. Good info to have!


Excellent, thanks guys, I’ll fix. I’m not sure file.length() existed when I wrote that (quite a while back, when 0.16 was still pretty fresh) - I fixed several other things upon review before posting this recently, but missed that one!


Actually, I see that there’s a little more to it. It was a mistake, I think, to put that reference to std.Io.File.Reader up at the top, there, in the “Reading a whole file” section, where the simple std.Io.Dir.readFile is showcased. I’m inclined to think that the Dir.readFile() itself probably wants an allocating variant so that it can atomically size a buffer and read the whole thing. All of my later examples do indeed use .openFile(), but some read line-by-line, so wouldn’t really benefit from the file size, so gathering that intel for those examples is moot… I’ll have to give it a little more thought, and, importantly, I’ll consider a PR for a version of the walloping Dir.readFile(), for an atomic(ish) allocating buffer approach to work well (since you don’t actually have the file object, to call .length() on, in that case).


(And yes, this is a legitimate alternative… but if I want to expose the slimmer simple Dir.readFile() offering, combined with a variable destination buffer, then….)


Well, actually, it has one; I went to PR a std.Io.Dir.readFileAlloc(), to discover it already exists. It’s not as symmetric with readFile() as I’d expect, and I want to understand that better, so I’ll pursue that in another thread. In the meantime, it suffices, in principle, for that bit of this article, so that’s the change I made for now.


Huge shout out to IP for compiling this, and I personally would greatly appreciate continued work on this living document


I’m glad it’s helpful. I benefit from so many (as the attributions indicate), and appreciate human clarity in documentation. Everybody says zig is fun, well, I think it should be fun (and is fun) to document. The code is always best, but sometimes it takes longer to grok than examples.

I don’t think this will “grow” much, though, even if it’s “living”. The super std.Io overview doc that also followed on the heels of the 0.16 release might spawn child documents for the many domains of interest barely touched, and this one might have a few spinoffs, but not likely much more meat in its own belly.


Should this guide also cover how to use stdin, stdout, stderr? or would we expect to cover those elsewhere?


Yes! … maybe. :) I’m happy to give that a shot, but lean more toward including a link to a guide that focuses more narrowly on stdin/out/err, perhaps with a command-line-tool audience in mind, in particular. Otoh, perhaps there’s not so much to it that it would make more than a paragraph, and I should just squeeze it in. It just feels like the kind of thing that wants a “mini-reference” of its own… a little go-to.

?


Once you dig into it, std handles have a giant bag of expectations, platform eccentricities, and more, so deserve a treatment all their own.
