Mixing stdin / stdout, named files and file operations

I will need to operate on several files; in some cases, I may want one of them to be stdin or stdout. So I will need to either open a named file or grab an already existing stream.

In some cases, I will want to seek the stream to the beginning or the end. Most of the time, I will want to read chunks from a file and write those chunks to another, a la cp, hopefully buffering the data.

Can anybody here help me put together an example of how this would be done with modern (post 0.15) zig? I promise to summarize everything in one final working program.

In C, I would have a routine do_work(FILE* inp, FILE* out), and I would call it with stdin / stdout / the result of fopen("blah", mode). Even some guidance as to which are the correct zig types for such a thing would be helpful.

For an even more concrete example, how would one write a routine copy_file(src_name, tgt_name)?

Thanks, and apologies for such a basic question. I find the whole new Io a bit impenetrable.

First, you can’t seek ‘stream’ files like stdio. You will need to check beforehand whether you can seek and branch to a different implementation, or somehow handle the error from seeking an unseekable file. I don’t know how you’d want to do that, because I don’t know what you need to do.

The new readers/writers can detect that they are streaming from one file to another and automatically use a single file-copy syscall. This isn’t available on all platforms, so it falls back to normal reads/writes.

Of course, it only works if you don’t do anything with the data in between.

You can explicitly call sendFile on the writer to force the optimisation, but this requires passing a *std.fs.File.Reader instead of the interface to your functions. Since you need to seek, you already have that.

You shouldn’t do this manually, as it would harm cross-platform functionality: not all OSes support this optimisation, so you’d need to fall back to normal reads/writes yourself, which the normal API already does for you.
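To make the seeking point concrete, here is a rough sketch of one way to check beforehand and then seek through a File.Reader. The helpers isRegularFile and seekToEnd are made-up names, using "is a regular file" as a stand-in for "is seekable" is my assumption, and the file name is just the one used elsewhere in this thread. I believe seekTo and getSize exist on std.fs.File.Reader in 0.15, but double-check against your std version.

```zig
const std = @import("std");

/// Hypothetical helper: treats "regular file" as a proxy for "seekable".
/// Pipes and terminals report a different kind.
fn isRegularFile(file: std.fs.File) !bool {
    const st = try file.stat();
    return st.kind == .file;
}

/// Hypothetical helper: moves the reader to the end of the file and
/// returns that offset (which equals the file size).
fn seekToEnd(r: *std.fs.File.Reader) !u64 {
    const size = try r.getSize();
    try r.seekTo(size);
    return size;
}

pub fn main() !void {
    // Assumed file name, borrowed from the example in this thread.
    var file = try std.fs.cwd().openFile("16KiB-file", .{});
    defer file.close();

    const seekable = try isRegularFile(file);
    if (!seekable) return error.NotSeekable;

    var buf: [4096]u8 = undefined;
    var reader = file.reader(&buf);

    // Jump to the end, then back to the beginning.
    _ = try seekToEnd(&reader);
    try reader.seekTo(0);
}
```

For stdin or a pipe, isRegularFile should return false, so you can branch to a non-seeking code path instead of finding out halfway through.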

To answer your question directly, if you want to handle everything yourself:

fn doWork(in: std.fs.File, out: std.fs.File) !void

or if you want the caller to set up the File.Reader/File.Writer:

fn doWork(in: *std.fs.File.Reader, out: *std.fs.File.Writer) !void

or if you don’t need to do anything File-specific:

fn doWork(in: *std.Io.Reader, out: *std.Io.Writer) !void
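As a sketch of that last variant, closest to the C do_work(FILE*, FILE*) idea: the function below only sees the generic interfaces, and the caller decides whether they are backed by stdin/stdout or by named files. The name copyStream and the command-line convention (optional source and target names, falling back to stdin/stdout) are assumptions made for illustration, not anything from std.

```zig
const std = @import("std");

/// Copies `src` to `tgt` in blocks of at most 4 KiB and returns the number
/// of bytes copied. Works with any reader/writer pair, file-backed or not.
fn copyStream(src: *std.Io.Reader, tgt: *std.Io.Writer) !u64 {
    var chunk: [4096]u8 = undefined;
    var total: u64 = 0;
    while (true) {
        // Fills `chunk` completely unless the stream ends first.
        const n = try src.readSliceShort(&chunk);
        try tgt.writeAll(chunk[0..n]);
        total += n;
        if (n < chunk.len) break; // a short read means end of stream
    }
    try tgt.flush();
    return total;
}

pub fn main() !void {
    const alloc = std.heap.page_allocator;
    const args = try std.process.argsAlloc(alloc);
    defer std.process.argsFree(alloc, args);

    // Hypothetical CLI: `copy [src [tgt]]`; missing names mean stdin/stdout.
    const use_stdin = args.len < 2;
    const use_stdout = args.len < 3;

    const src_file = if (use_stdin)
        std.fs.File.stdin()
    else
        try std.fs.cwd().openFile(args[1], .{});
    defer if (!use_stdin) src_file.close();

    const tgt_file = if (use_stdout)
        std.fs.File.stdout()
    else
        try std.fs.cwd().createFile(args[2], .{});
    defer if (!use_stdout) tgt_file.close();

    var in_buf: [4096]u8 = undefined;
    var out_buf: [4096]u8 = undefined;
    var reader = src_file.reader(&in_buf);
    var writer = tgt_file.writer(&out_buf);

    _ = try copyStream(&reader.interface, &writer.interface);
}
```

The explicit loop is only there to mirror the C version. If you don't need to touch the data, src.streamRemaining(tgt) followed by a flush should do the same job and leaves room for the file-to-file optimisation mentioned above.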

See also std.fs.Dir.copyFile.

Yeah, sorry, I just threw into my message everything I may need to re-learn about std.Io. I am aware you cannot seek on stdin.

Let’s just refocus on implementing an equivalent of a C copy_file(FILE* src, FILE* tgt) routine, where src and tgt might come from named filesystem files opened with fopen(), or they could be stdin / stdout. The routine should iterate, reading a fixed-size memory block (say, 4 KB) from src and writing it to tgt.

A separate question would be, how do you seek in a stream to the beginning / end (assuming the stream is related to a filesystem file).

Thanks again.

That’s handled for you by the interface.

const std = @import("std");

pub fn main() !void {
    var src = try std.fs.cwd().openFile("16KiB-file", .{});
    defer src.close();

    var dest = try std.fs.cwd().createFile("copied-file", .{});
    defer dest.close();

    // The reader's buffer can be zero-length since `sendFile`
    // will read into the writer's buffer if necessary and we know
    // we are using a writer with a non-zero-length buffer
    var reader = src.reader(&.{});

    var write_buf: [4096]u8 = undefined;
    var writer = dest.writer(&write_buf);

    const bytes_written = try writer.interface.sendFileAll(&reader, .unlimited);
    _ = bytes_written;
    try writer.interface.flush();
}

Running that with strace gives:

openat(AT_FDCWD, "16KiB-file", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 3
openat(AT_FDCWD, "copied-file", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096
pwritev(4, [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], 1, 0) = 4096
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4096) = 4096
pwritev(4, [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], 1, 4096) = 4096
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 8192) = 4096
pwritev(4, [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], 1, 8192) = 4096
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 12288) = 4096
pwritev(4, [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], 1, 12288) = 4096
pread64(3, "", 4096, 16384)             = 0
close(4)                                = 0
close(3)                                = 0

However, in this case it should be possible to use fd-to-fd syscalls instead of falling back to pread/pwrite, but for me that seems to (probably unintentionally) require linking libc:

$ zig build-exe test.zig -lc
$ strace ./test
...
openat(AT_FDCWD, "16KiB-file", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 3
openat(AT_FDCWD, "copied-file", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
copy_file_range(3, NULL, 4, [0], 18446744073709551615, 0) = 16384
copy_file_range(3, NULL, 4, [16384], 18446744073709535231, 0) = 0
close(4)                                = 0
close(3)                                = 0

For some reason, std.fs.File.Writer doesn’t use the copy_file_range syscall when libc isn’t linked, even though there is a Zig-native implementation.

I would say this is a bug, since there isn’t any comment explaining why it’s done that way.

If you set the writer to streaming mode, it will use the sendfile syscall, which should result in even fewer syscalls than copy_file_range. You can do that by using writerStreaming instead of writer.
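A sketch of what that change might look like, wrapped into the copy_file(src_name, tgt_name) shape asked about at the top. The helper name copyFileStreaming and the dir parameter are assumptions for illustration; the only intended difference from the earlier program is writerStreaming in place of writer. Whether sendfile is actually used will depend on the platform, so check with strace.

```zig
const std = @import("std");

/// Hypothetical helper: copies `src_name` to `tgt_name` inside `dir`,
/// with the destination writer in streaming mode.
fn copyFileStreaming(dir: std.fs.Dir, src_name: []const u8, tgt_name: []const u8) !usize {
    var src = try dir.openFile(src_name, .{});
    defer src.close();

    var dest = try dir.createFile(tgt_name, .{});
    defer dest.close();

    // Zero-length reader buffer, as in the earlier program: the data goes
    // through the writer's buffer whenever a plain read fallback is needed.
    var reader = src.reader(&.{});

    var write_buf: [4096]u8 = undefined;
    var writer = dest.writerStreaming(&write_buf);

    const bytes_written = try writer.interface.sendFileAll(&reader, .unlimited);
    try writer.interface.flush();
    return bytes_written;
}

pub fn main() !void {
    // File names borrowed from the example in this thread.
    _ = try copyFileStreaming(std.fs.cwd(), "16KiB-file", "copied-file");
}
```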
