File I/O basics (0.16)

(Note, this is a variation on this article, invited into docs with the release of 0.16.)

As always, reply/comment all you want, and I’ll endeavor to make changes accordingly.

Motivation

Zig 0.16 brought big changes to I/O. References to std.fs are deprecated throughout the standard library, and a basic understanding of std.Io is in order. Zig’s new I/O model brings I/O into alignment with the Allocator model - all of Zig’s “file system, networking, timers, synchronization, and pretty much everything that can block [is moved] into a new std.Io interface. All code that performs I/O will need access to an Io instance, similar to how all code that allocates memory needs access to an Allocator instance.” ref.

What used to be std.fs.cwd().openFile(path, .{}) becomes std.Io.Dir.cwd().openFile(io, path, .{}), and likewise throughout the interface: the io arg is in every function that does I/O. This allows you to choose your I/O model (synchronous, asynchronous - coroutines, threads, etc.) with just a swap of the Io you instantiate and provide to all of your I/O calls.
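
As a hedged sketch of that swap (the exact std.Io.Threaded construction shown here is an assumption, patterned after the comments in the examples below - verify against your std version):

```zig
const std = @import("std");

pub fn main() !void {
    // assumption: Threaded is constructed via .init_single_threaded()
    // and exposes the generic Io interface via an io() method
    var threaded = std.Io.Threaded.init_single_threaded();
    defer threaded.deinit();
    const io = threaded.io();
    _ = io; // pass this io to every function that performs I/O
}
```

Swapping in a different Io implementation later means changing only these few lines; everything downstream just receives an Io.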

Most of this article is an attempt to provide basic examples and discussion around several typical file (or file-like) I/O use cases…

Reading a whole file

First, let’s simply read a whole file:

   const std = @import("std");
   //...
   var buf: [10240]u8 = undefined; // must be big enough for entire file
   const io = std.testing.io; // consider, e.g., std.Io.Threaded (via .init_single_threaded()) instead
   const contents = try std.Io.Dir.cwd().readFile(io, "test-filename", &buf);
   var tok = std.mem.tokenizeScalar(u8, contents, '\n');
   while (tok.next()) |line| {
      std.debug.print("line: {s}\n", .{line});
   }

This reads the entire file, or as much as fits into buf. It returns contents, but it’s important to realize that contents is just a slice over buf - no memory is magically materialized! Still, you want to use contents, not buf, directly, because contents is a proper slice whose .len corresponds to the data actually read. Note that no error is returned if buf is not big enough for the file; rather, only buf.len bytes are read, filling buf, and the remainder of the file remains unread. Thus, in real code, you should either be confident that you know the length of the file, or check the length of the result (contents.len < buf.len) and handle it accordingly. You can, of course, Dir.statPath() the file, then use the resulting stat.size value to allocate a buffer of sufficient size; the above is just a simple stack-based example.

If you wanted to allocate such a “buffer of sufficient size”, familiarity with Zig’s Allocators is essential. Briefly, though:

   // ... alternatively ...
   const buf = try allocator.alloc(u8, size); // size might be the result of std.Io.Dir.statPath()
   defer allocator.free(buf);
   var reader: std.Io.File.Reader = file.reader(io, buf); // note NO &buf here: buf is a slice, already a pointer-and-length
   // ...
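
Putting those pieces together, a sketch of the statPath()-then-allocate approach might look like the following (hedged: statPath()’s exact signature is an assumption here, patterned after the other Dir functions that take the io instance and a path):

```zig
// hedged sketch; assumes statPath(io, path) returns a stat carrying .size
const stat = try std.Io.Dir.cwd().statPath(io, "test-filename");
const size: usize = @intCast(stat.size);
const buf = try allocator.alloc(u8, size);
defer allocator.free(buf);
const contents = try std.Io.Dir.cwd().readFile(io, "test-filename", buf);
std.debug.print("read {d} bytes\n", .{contents.len});
```

(Note the file could change size between the stat and the read; robust code should still check contents.len.)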

Finally, note that the above example just propagates all errors, with try; in examples below, we’ll see some more realistic error handling.

Reading line-by-line

The above file was treated like a ‘\n’-line-delimited text file, and we’ll continue assuming “text files” for a while. Rather than reading the whole file, what if we just wanted to read line-by-line (and thus reduce our memory requirement for each read operation)…

   const io = std.testing.io; // consider, e.g., std.Io.Threaded (via .init_single_threaded()) instead
   if (std.Io.Dir.cwd().openFile(io, "test-filename", .{})) |file| {
      defer file.close(io);
      var buf: [1024]u8 = undefined; // must be big enough for longest line
      var reader: std.Io.File.Reader = file.reader(io, &buf);
      while (try reader.interface.takeDelimiter('\n')) |line| {
         std.debug.print("line: {s}\n", .{line});
      }
   } else |err| switch (err) {
      error.FileNotFound, error.AccessDenied => {
         std.debug.print("unable to open file: {}\n", .{err});
         // loop back to try another or something
      },
      else => |e| return e, // don't continue; rather, bomb out
   }

This example uses takeDelimiter() and includes more robust error handling. First, note that takeDelimiter() is a function of Io.Reader, but our reader, here, is actually an Io.File.Reader. Concrete readers like Io.File.Reader carry an interface field (the generic Io.Reader), which must be used to access functions like takeDelimiter(). One common mistake, though, involves assigning that interface improperly:

     const reader = &file_reader.interface; // right way
   //const reader =  file_reader.interface; // BAD!! BAD!!!  DON'T DO!

(The reference is necessary because the interface needs its connection to the parent reader (File.Reader, in this case); a copy would isolate it from its parent.) To avoid this, one good pattern is to just always use reader.interface.foo() - that is, always type out the whole thing. Sometimes this is too verbose, given the context, so, if you need to create a const, make sure it’s a const reference, and not a copy. In the line-by-line reading example, above, you see the verbose reader.interface.takeDelimiter() pattern.

Error handling: first, if the openFile() fails, the code switches to handle FileNotFound and AccessDenied errors as redeemable - perhaps the user is given a chance to choose another file(name). But if any other error is returned, the final else just propagates it (which may rise all the way to main and end the program with an error trace). Also note: the earlier example, with std.Io.Dir.readFile(), never explicitly needed to openFile(), so did not need the defer file.close(io) that is essential here.

This example also uses a stack buffer, buf - this time just 1024 bytes; this assumes that each line of the file is shorter than 1024 bytes, or else a whole line would not fit into buf when takeDelimiter() tried to read it, and takeDelimiter() would return StreamTooLong. Note that tossBuffered() does NOT need to be called, even if a line is 1023 bytes long, because each subsequent call to takeDelimiter() will assume responsibility for that (line will be invalidated at that point, since line is just a slice reference into buf, and buf must be available for the next read). The use of an allocator, rather than the stack, is demonstrated above in the first example.

This example also relies on try to auto-propagate on errors returned from takeDelimiter(). The following code would do the same, but provides an opportunity to handle errors explicitly:

      // ... alternatively ...
      while (reader.interface.takeDelimiter('\n')) |result| {
         if (result) |line| {
            std.debug.print("line: {s}\n", .{line});
         } else break;
      } else |err| {
         return err; // or handle elsewise
      }
      // ...

In this case, the “compound-optional” result of takeDelimiter() means that an error can be handled with else, but if takeDelimiter() succeeds, result’s type is still an optional that needs to be unwrapped (since it could be null). So, errors are handled in an else and result must be if-checked to confirm that it’s not null, before finally processing the contents of result (unwrapped to line). This code behaves exactly as the above code since the error handling here merely passes the error up with a return err, just as try would do in try takeDelimiter().

The advantage of this more verbose approach is, of course, being able to take advantage of error and null results.

Yet another approach is to use takeDelimiterInclusive() or takeDelimiterExclusive():

      // ... alternatively ...
      const rif = &reader.interface; // careful to take the address &!
      while(rif.takeDelimiterInclusive('\n')) |line| {
         std.debug.print("line: {s}", .{line});
      } else |err| switch (err) {
         error.EndOfStream => {
            const line = try rif.take(rif.end - rif.seek);
            std.debug.print("final line: {s}\n", .{line});
         },
         error.StreamTooLong, error.ReadFailed => |e| return e, // or just else => |e| return e,
      }
      // ...

Here, we don’t have an optional result, so there’s no if-check for null; we assign to line directly. Instead, if the end of the stream is reached before a delimiter, EndOfStream is returned. In this case, there may be “tail” data if the file’s last byte was not a \n - the last line of the file would be missed if not for that tail handler. It can be a little cumbersome to nest error handling within an error-handling block; for this example, try was used for simplicity.

Using the “stream” pattern

Streaming is a typical approach, and often relies on allocating memory along the way:

   if (std.Io.Dir.cwd().openFile(io, "test-filename", .{})) |file| {
      defer file.close(io);

      var gpa = std.heap.DebugAllocator(.{}){};
      defer _ = gpa.deinit();
      const alloc = gpa.allocator();

      var line = std.Io.Writer.Allocating.init(alloc);
      defer line.deinit();
      var buf: [64]u8 = undefined; // somewhat arbitrary buffer size
      var reader: std.Io.File.Reader = file.reader(io, &buf);
      while (reader.interface.streamDelimiter(&line.writer, '\n')) |_| {
         reader.interface.toss(1); // move past the delimiter
         std.debug.print("line: {s}\n", .{line.written()});
         line.clearRetainingCapacity(); // reset the buffer
      } else |err| switch (err) {
         error.EndOfStream => {
            if (line.written().len > 0) {
               std.debug.print("tail: {s}\n", .{line.written()});
            }
         },
         error.WriteFailed, error.ReadFailed => |e| return e, // or just else => |e| return e,
      }
   } else |err| switch (err) {
      error.FileNotFound, error.AccessDenied => {
         std.debug.print("unable to open file: {}\n", .{err});
         // loop back to try another or something
      },
      else => |e| return e, // don't continue; rather, bomb out
   }

This example employs a DebugAllocator; lines in the file could be any arbitrary length as the Writer.Allocating will allocate more space as needed when streaming from the file.

This code block is the most involved, of course, but offers the advantage of handling completely unknown file sizes with completely unknown line sizes. Note the many essential defer lines to clean up (after openFile(), the DebugAllocator, and std.Io.Writer.Allocating.init()).

Byte-wise

The above examples read chunks of data from a file according to some delimiter (‘\n’ or end-of-file). Reading in more granular chunks is straightforward:

   // ...
   const byte = try reader.interface.takeByte();
   std.debug.print("Byte: {}\n", .{byte});
   const int = try reader.interface.takeInt(u32, .little);
   std.debug.print("u32 int: {}\n", .{int});

More reading options

readSliceAll, readSliceShort, readSliceEndian, takeStruct, takeStructPointer, as well as the peek*(), discard*(), toss(), and *alloc() variants, and others are all worth investigation.
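
In particular, peek*() lets you look ahead without consuming, which is handy for non-uniform data: decide how to parse based on the next byte, then take*() what follows. A hedged sketch, reusing the reader from the examples above (the functions named here are described in this article, but verify exact behaviors against your std version):

```zig
// peek at the next byte without consuming it
const marker = try reader.interface.peekByte();
if (marker == '#') {
    reader.interface.toss(1); // consume the marker we peeked
    const comment = try reader.interface.takeDelimiterExclusive('\n');
    std.debug.print("comment: {s}\n", .{comment});
} else {
    // e.g., a length-prefixed binary record
    const len = try reader.interface.takeInt(u16, .little);
    const payload = try reader.interface.take(len);
    std.debug.print("payload: {d} bytes\n", .{payload.len});
}
```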

Writing

A simple buffer-centric writer (for testing, e.g.) can be constructed with fixed() (fixed() exists for Reader, as well, by the way):

   var buf: [1024]u8 = undefined;
   var writer: std.Io.Writer = .fixed(&buf);

Writing follows in expected ways (note the use of writer, below, not writer.interface, because writer, created above, is an instance of Io.Writer, not a specific implementation):

   try writer.writeByte(byte);
   try writer.writeInt(u32, int, .little);
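
writeAll() and print() are also worth knowing: writeAll() writes an entire slice (looping internally until every byte is accepted), and print() does std.fmt-style formatting. Continuing with the writer from above:

```zig
try writer.writeAll("hello, ");
try writer.print("{s} ({d})\n", .{ "world", 42 });
```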

Note that several functions, such as write(), return a usize indicating “bytes transferred”; the entire payload may not have been transferred yet, so don’t rely on the return value matching the number of bytes in the buffer you passed. See the *drain() functions for more, and, importantly, after a block of writing, be sure to flush():

   try writer.flush();

Note that the above is useless for our fixed() buffer-destined writer, but a more typical writer - e.g., a file writer - would require writer.interface.flush() to ensure all bytes are drained.
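
As a hedged sketch of that more typical case (assuming a File.Writer is constructed like the File.Reader shown earlier - file.writer(io, &buf) is an assumption here, and file is an already-opened std.Io.File):

```zig
var wbuf: [1024]u8 = undefined;
// assumption: File.Writer construction mirrors File.Reader's
var file_writer: std.Io.File.Writer = file.writer(io, &wbuf);
try file_writer.interface.writeAll("some bytes\n");
try file_writer.interface.flush(); // drain anything still buffered to the file
```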

See also print() and its variants, sendFile() and the send*() variants, and the splat*() variants.

Sources


This might encourage people to create or use a global instance where they shouldn’t. It should be mentioned that an application’s main function should decide the Io (especially important for libraries), and that main can itself let std choose via juicy main.

It is also odd that you use an if instead of catch in that example. But that is not important

writeAll should be mentioned


One thing that annoys me is that the very nice peek* and take* API is never mentioned - it takes advantage of the reader’s buffering, and you can use it to simplify your code. You don’t need to go into too much detail about them, just acknowledge their existence and maybe give an example reading non-uniform data.
Or that could be in a separate post, idk.


Super, thanks. I can do all of that. I thought about peek* and take* (and recently took very nice advantage of take* in some code), but couldn’t decide if it was “making something long… longer” or not. :) If it doesn’t already feel too long, I’m happy to add a little treatment.

Definitely should mention juicy main and the io available there. The earliest “notes” that led to this were mad pre-juicy, and never evolved.

(I think it’s normal “form” to edit the OP in place, here in docs, so will do that. Though probably tomorrow.)

One important pattern:

foo(&file_writer.interface) catch |err| switch (err) {
    error.WriteFailed => return file_writer.err.?,
};

Specifically, the code that creates the Io.Writer (i.e. the one also responsible for flushing) should not propagate error.WriteFailed, instead unwrap it into the specific error that occurred. Importantly, now with std.Io, one of those unwrapped error codes could be error.Canceled which you’ll want to propagate in order for cancelation to work properly. That was already true with other error codes like error.OutOfMemory, but now the stakes are raised somewhat.

Same deal with Io.Reader and error.ReadFailed.
