When to flush

const std = @import("std");
const fmt = std.fmt;

var stdout_buffer = [_]u8{0} ** 1024;
var stdout_writer = std.fs.File.stdout().writer(&stdout_buffer);
var stdout = &stdout_writer.interface;

pub fn main() !void {
    var name_buffer = [_]u8{0} ** 256;

    for (0..60) |i| {
        const file_name = try fmt.bufPrint(&name_buffer, "output-{d:02}.ppm", .{i});

        var file_buffer = [_]u8{0} ** 1024;
        var fp = try std.fs.cwd().createFile(file_name, .{});
        defer fp.close();
        var file_writer = fp.writer(&file_buffer);
        var file = &file_writer.interface;

        const h, const w = .{ 9 * 60, 16 * 60 };

        try file.print("P6\n", .{});
        try file.print("{d} {d}\n", .{ w, h });
        try file.print("255\n", .{});

        for (0..h) |x| {
            for (0..w) |y| {
                if (((x + i) / 60 + (y) / 60) % 2 == 0) {
                    try file.writeByte(0x00);
                    try file.writeByte(0xff);
                    try file.writeByte(0x00);
                } else {
                    try file.writeByte(0xff);
                    try file.writeByte(0xff);
                    try file.writeByte(0xff);
                }
            }
        }
        try file.flush();
    }

    try stdout.flush();
}

I am new to Zig and learning it. I found this video from Tsoding about PPM and decided to write the program in Zig. It’s a simple program, but it runs a bit slower than the C version. I think the file.flush() call is slowing the program down.
The C version can be found here.
I would be very grateful for some input on this program.

That flush seems fine, since you’re flushing each file once.
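
To illustrate why one flush per file is fine, here is a minimal sketch using the same writer API as the code above. The file name `flush-demo.txt` and the line contents are made up for the example. All the output (790 bytes) fits in the 1024-byte buffer, so nothing should need to reach the OS until the single `flush()` at the end; calling `flush()` inside the loop instead would force a write on every iteration and defeat the buffer.

```zig
const std = @import("std");

pub fn main() !void {
    // Hypothetical output file, used only for this demonstration.
    const fp = try std.fs.cwd().createFile("flush-demo.txt", .{});
    defer fp.close();

    var buffer: [1024]u8 = undefined;
    var file_writer = fp.writer(&buffer);
    const out = &file_writer.interface;

    for (0..100) |i| {
        // Buffered: these bytes accumulate in `buffer`.
        try out.print("line {d}\n", .{i});
    }

    // One flush after the last write pushes whatever is still buffered.
    try out.flush();
}
```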

However, I can’t reproduce your results on Windows (I deleted the stdout part of your code since the stdout handle can’t be gotten at comptime on Windows).

Built via:

> zig build-exe checker.zig -OReleaseFast -femit-bin=checker-zig.exe
> zig build-exe checker.c -OReleaseFast -lc -femit-bin=checker-c.exe

Benchmarked with poop (you’d get more information from this since you’re not on Windows):

Benchmark 1 (4 runs): checker-c.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.39s  ± 20.8ms    1.38s  … 1.42s           0 ( 0%)        0%
  peak_rss           3.46MB ±  145KB    3.38MB … 3.67MB          0 ( 0%)        0%
Benchmark 2 (8 runs): checker-zig.exe
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           665ms ± 44.3ms     621ms …  755ms          0 ( 0%)        ⚡- 52.3% ±  3.8%
  peak_rss           3.02MB ± 84.4KB    2.99MB … 3.23MB          3 (38%)        ⚡- 12.7% ±  4.2%

1. How are you compiling each program? What optimization mode? My guess is that you’re (perhaps unknowingly) using Debug mode with the self-hosted backend when compiling the Zig version, which won’t compare favorably to much just yet.
2. Since you’re not on Windows, try running each with strace -c to get a sense of the difference in syscalls.

My guess would be buffer size.
Each file receives 1,555,200 bytes of pixel data (960 × 540 pixels × 3 bytes), so with a 1024-byte buffer you should end up with about 1519 write syscalls per file.
If you quadruple the buffer size to 4096 bytes, you should get only about 380 write syscalls per file.
So what’s the default buffer size in C? Depends on the libc implementation.
On Linux, Zig defaults to the target x86_64-linux-gnu (at least for zig cc), which links against glibc.
So what does glibc use by default? 4096 bytes.
musl (which you can select with the target x86_64-linux-musl) defaults to 1024 bytes, so it would be interesting to see whether the C version performs like your Zig version if you just switch to that target.
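
The per-file estimate can be checked with a few lines of Zig. This is only a rough sketch: `estimatedWrites` is a hypothetical helper, it ignores the 15-byte PPM header, and it assumes every write drains exactly one full buffer, which the real writer may not do.

```zig
const std = @import("std");

/// Rough upper bound on write syscalls, assuming each one drains
/// exactly one full buffer (ceiling division).
fn estimatedWrites(total_bytes: usize, buffer_size: usize) usize {
    return (total_bytes + buffer_size - 1) / buffer_size;
}

pub fn main() void {
    const w: usize = 16 * 60;
    const h: usize = 9 * 60;
    const pixel_bytes = w * h * 3; // three bytes (RGB) per pixel

    std.debug.print("{d} bytes per file\n", .{pixel_bytes});
    std.debug.print("1024-byte buffer: about {d} writes\n", .{estimatedWrites(pixel_bytes, 1024)});
    std.debug.print("4096-byte buffer: about {d} writes\n", .{estimatedWrites(pixel_bytes, 4096)});
}
```

Multiplying by the 60 files gives the order of magnitude to expect in the `calls` column of strace -c.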


Yes, making the buffer size 4096 seems to do the trick.

- 4096 buffer
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.88    0.076165           3     22801         1 pwritev
  0.77    0.000591           9        60           openat
  0.20    0.000151           2        60           writev
  0.16    0.000122           2        60           close
  0.00    0.000000           0         1           rt_sigaction
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    0.077029           3     22985         1 total

- 1024 buffer
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 91.86    0.265692           2     91081         1 pwritev
  4.05    0.011719         195        60           close
  3.98    0.011520         192        60           openat
  0.11    0.000309           5        60           writev
  0.00    0.000000           0         1           rt_sigaction
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    0.289240           3     91265         1 total

- C version
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.32    0.049250           2     22860           write
  1.27    0.000636          10        62           openat
  0.21    0.000104           1        63           fstat
  0.21    0.000103           1        62           close
  0.00    0.000000           0         1           read
  0.00    0.000000           0         8           mmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         2           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
  0.00    0.000000           0         1           getrandom
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    0.050093           2     23073         1 total
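
For reference, a sketch of what the 4096-byte variant might look like for a single frame. It is not the exact code that produced the numbers above: `writeFrame`, `file_buffer_size`, and the file name `frame-4096.ppm` are names invented here, and the pixel is written as one 3-byte slice instead of three `writeByte` calls. The only change that matters for the syscall count is the buffer size.

```zig
const std = @import("std");

// Assumed value: the buffer size reported to work well in this thread.
const file_buffer_size = 4096;

const green = [3]u8{ 0x00, 0xff, 0x00 };
const white = [3]u8{ 0xff, 0xff, 0xff };

/// Writes one 960x540 checkerboard frame as a binary PPM (P6).
/// `shift` moves the pattern, like the frame index `i` in the original.
fn writeFrame(file_name: []const u8, shift: usize) !void {
    const fp = try std.fs.cwd().createFile(file_name, .{});
    defer fp.close();

    var file_buffer: [file_buffer_size]u8 = undefined;
    var file_writer = fp.writer(&file_buffer);
    const file = &file_writer.interface;

    const h: usize = 9 * 60;
    const w: usize = 16 * 60;

    try file.print("P6\n{d} {d}\n255\n", .{ w, h });

    for (0..h) |row| {
        for (0..w) |col| {
            const is_green = ((row + shift) / 60 + col / 60) % 2 == 0;
            const pixel = if (is_green) &green else &white;
            try file.writeAll(pixel);
        }
    }

    // Flush once per file, after the last pixel.
    try file.flush();
}

pub fn main() !void {
    try writeFrame("frame-4096.ppm", 0);
}
```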

