As an exercise to learn the new I/O interfaces in Zig 0.15, I wrote a base64 `Writer`. It was harder than expected (partly because I detoured into fuzz testing), but it was fun and I learned a lot. This is a summary of my implementation notes, along with observations and questions.
Repo: GitHub - benburkert/base64.zig: base64 encoding Writer experiment
Background
I chose a base64 `Writer` because I need an allocation-free encoder usable from a `jsonStringify` function, and base64 is straightforward and familiar.
The main requirement is that it only emit full base64 chunks (3 input bytes to 4 output characters) to avoid adding padding midstream. A single input stream should produce a single contiguous base64 sequence.
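To make that concrete, here is a quick illustration using `std.base64` directly (not code from the repo): padding emitted mid-stream breaks the contiguous sequence.

```zig
const std = @import("std");

pub fn main() !void {
    const enc = std.base64.standard.Encoder;
    var a: [8]u8 = undefined;
    var b: [8]u8 = undefined;
    var whole: [8]u8 = undefined;

    // Encoding "foobar" in two arbitrary pieces pads each piece separately,
    // so '=' shows up in the middle of the output:
    std.debug.print("{s}{s}\n", .{ enc.encode(&a, "foob"), enc.encode(&b, "ar") });
    // Zm9vYg==YXI=

    // Encoding it as one stream (what the writer should emulate) stays contiguous:
    std.debug.print("{s}\n", .{enc.encode(&whole, "foobar")});
    // Zm9vYmFy
}
```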
Design & Implementation
I wanted to reuse `std.base64` as much as possible. `std.base64.Base64Encoder.encodeWriter` had what I needed, except it consumes a single byte slice (`source`) rather than a byte stream. So my writer implementation focused on chunking input to that function, not on base64 encoding. I added an explicit "final flush" function, `end`, to output trailing padding.
- Is a writer allowed to retain data in `buffer` after `flush`? If not, a separate internal buffer may be required. `defer w.end()` would be a nice pattern, but it can error, so it isn't practical.
Next, I jumped straight into the implementation of the `drain` function. I'm used to writers being a single (and straightforward) `write` function that takes a single byte buffer or stream source. In those situations, a base64 writer implementation mostly involves chunking out writes 3 bytes at a time until the final write or end of the stream. Those implementations are simple enough to reason through an efficient version on the first try, so that's what I did with this new writer.
- I assumed that once I had `drain` implemented, the other `VTable` functions would be for optimizing, not correctness, but I was wrong.
After spending 2+ hours on multiple `drain` implementations (all of them incorrect), I decided this was a dead-end approach. The combination of buffering built into the interface and the `drain` function accepting a splat-able byte vector (its signature is sketched below) really complicates the bookkeeping an implementation has to do. It was too much to keep in my head while trying to juggle both correctness and performance.
- I also feel like my intuition from other languages didn't carry over here, since the interface is such a departure from previous writers I've encountered.
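For reference, this is the hook I kept getting wrong, as I understand the 0.15 interface (paraphrased from memory; the doc comments on `std.Io.Writer.VTable` in the std source are the authoritative contract): bytes already sitting in `buffer[0..end]` logically come first, then each slice of `data` in order, with the last slice repeated `splat` times, and the return value counts only the bytes consumed from `data`.

```zig
const std = @import("std");

// Shape of the vtable hook a custom writer has to provide (stub body only):
fn drain(w: *std.Io.Writer, data: []const []const u8, splat: usize) std.Io.Writer.Error!usize {
    _ = w;
    _ = data;
    _ = splat;
    return error.WriteFailed; // placeholder; a real implementation does the bookkeeping here
}
```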
New Approach
At this point it was clear to me that this new I/O interface required a more rigorous approach even for a simple encoder like this one. So I shifted to a more deliberate strategy: start with a naive but correct implementation, then build scaffolding to evolve it while mechanically checking correctness and performance.
The scaffolding began with a fuzz-test corpus that I could grow into an exhaustive suite. The same corpus would support profiling to quantify improvements. First step: get a naive writer implementation working.
- It's a lot of up-front work for a simple encoder, but I'm also hoping the scaffolding could be generic enough for any new writer implementation.
Naive Implementation
The naive implementation is a `drain`, `flush`, and `end` function that only works with a 3-byte buffer. `drain` encodes one full chunk if available; otherwise it buffers a single input byte. `flush` calls `drain` if a full chunk is buffered; otherwise it's a no-op. `end` encodes any remaining buffered bytes and adds padding. (A condensed sketch follows the notes below.)
- It's interesting that a minimal `drain` implementation only needs to perform one task per invocation: a) move bytes from `buffer` to `sink`, or b) move bytes from `data` to `buffer`.
- By consuming at most one byte from the `data` vector, the `splat` argument can be ignored entirely.
- My hunch is that I got a better feel for the new I/O interface implementing the naive version than I would have if the first approach had worked.
- The fixed-size buffer requirement avoids extra bookkeeping (and aliasing) when rebalancing the buffer.
- Would a `seek` field (like in `std.Io.Reader.VTable`) help offload buffer rebalancing from writers?
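Here is a condensed sketch of that naive version. It is simplified and untested as written: names like `Base64Writer`, `init`, and `sink` are for illustration rather than the repo's actual API, it assumes the 0.15 `std.Io.Writer` vtable shape (`drain` plus an overridden `flush`), and it uses `Encoder.encode` into a 4-byte stack buffer instead of `encodeWriter` to keep the sketch self-contained.

```zig
const std = @import("std");
const Writer = std.Io.Writer;

pub const Base64Writer = struct {
    writer: Writer,
    sink: *Writer,

    const encoder = std.base64.standard.Encoder;

    /// `chunk` is the writer's buffer: exactly one 3-byte input group.
    pub fn init(sink: *Writer, chunk: *[3]u8) Base64Writer {
        return .{
            .sink = sink,
            .writer = .{
                .buffer = chunk,
                .vtable = &.{ .drain = drain, .flush = flush },
            },
        };
    }

    /// One task per call: either encode the full chunk sitting in `buffer`,
    /// or pull a single byte from `data` into `buffer`. Because at most one
    /// byte of `data` is ever consumed, `splat` can be ignored.
    fn drain(w: *Writer, data: []const []const u8, splat: usize) Writer.Error!usize {
        _ = splat;
        const self: *Base64Writer = @fieldParentPtr("writer", w);
        if (w.end == w.buffer.len) {
            var out: [4]u8 = undefined;
            try self.sink.writeAll(encoder.encode(&out, w.buffer[0..w.end]));
            w.end = 0;
            return 0; // consumed nothing from `data`, but made progress on `buffer`
        }
        for (data) |slice| {
            if (slice.len > 0) {
                w.buffer[w.end] = slice[0];
                w.end += 1;
                return 1;
            }
        }
        return 0;
    }

    /// Only a full chunk may be emitted mid-stream; a partial chunk stays
    /// buffered so no '=' padding appears before `end` is called.
    fn flush(w: *Writer) Writer.Error!void {
        if (w.end == w.buffer.len) _ = try drain(w, &.{""}, 1);
    }

    /// The "final flush": encode whatever is left (0-3 bytes), padding and all.
    pub fn end(self: *Base64Writer) Writer.Error!void {
        const w = &self.writer;
        if (w.end == 0) return;
        var out: [4]u8 = undefined;
        try self.sink.writeAll(encoder.encode(&out, w.buffer[0..w.end]));
        w.end = 0;
    }
};
```

Compared with my earlier attempts, almost all of the bookkeeping disappears because each `drain` call does exactly one thing.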
For correctness, I wrote unit tests that exercise the `std.Io.Writer` functions which in turn call the `drain` and `flush` functions directly. Reading through the `std.Io.Writer` implementation, I identified those as `flush`, `writeAll`, `writeByte`, `writeVecAll`, and `writeSplatAll`. Getting initial coverage on these different "operations" gave me enough confidence in the implementation to move on to fuzz testing.
- For a writer that provides more than just `drain` & `flush`, there will most likely be more function coverage required beyond those five.
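As an example of the shape these tests take (reusing the `Base64Writer` sketch above; the repo's real tests cover more operations and edge cases than this):

```zig
test "split writes still produce one contiguous base64 sequence" {
    var out: [16]u8 = undefined;
    var sink = std.Io.Writer.fixed(&out);

    var chunk: [3]u8 = undefined;
    var b64 = Base64Writer.init(&sink, &chunk);

    try b64.writer.writeAll("foob");
    try b64.writer.flush(); // partial chunk stays buffered, nothing padded yet
    try b64.writer.writeByte('a');
    try b64.writer.writeAll("r");
    try b64.end(); // "foobar" is a whole number of chunks, so no '=' here

    try std.testing.expectEqualStrings("Zm9vYmFy", out[0..sink.end]);
}
```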
Fuzz Testing
Building on the idea of unit testing the different writer "operations", my goal with fuzzing here is to generate a large number of random operations on a writer, then check that the final output matches the expected output. The fuzz test would exercise all the variations I was unable to hold in my head earlier. Once I had the initial set of test cases, the fuzzer could expand them and shake out all the bugs I had added in the implementation.
- I used AFL (zig-afl-kit) because the built-in fuzzer didn't work on my Mac laptop.
I started with a compact encoding of the test case because I learned from my previous experience with fuzzing that compact representations are the best for mutating. For example, `"Zm9v:min:a:Zm9v"` is a test case that creates a writer with a minimally-sized buffer (`min`), calls `writeAll` (`a`) with `"foo"` (`Zm9v`), and expects the output `"foo"` (the first `Zm9v`).
- This custom ASCII encoding may still be too verbose. I wasn't very happy with the cases generated by the fuzz tester; I suspect there's still too much entropy even in the compact form.
- I included the buffer length in the test case because fuzzing & profiling could be a good way to determine the optimal buffer size. (And possibly provide some numbers to support or refute Andrew's statement about always using the smallest valid buffer.)
I then added zon encoding for the test cases, and used an LLM to generate an initial set of 30 zon tests to use as inputs to the fuzzer. I also wrote a command to convert a zon-encoded test case to the compact encoding for the fuzzer.
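For illustration, a case in zon form might look roughly like this; the field names are my guess at a schema for this post, not the repo's actual layout (this would be the zon counterpart of the compact `"Zm9v:min:a:Zm9v"` example above):

```zig
.{
    .buffer = .min,
    .ops = .{
        .{ .writeAll = "foo" },
    },
    .expected = "Zm9v",
}
```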
- Here are all the zon encoded test cases, and here are the compressed versions.
- It's slightly annoying that the zon parser doesn't have a function that takes a `std.Io.Writer` yet.
- I really like that zon deserialization errors turn into compile errors when you `@import` a `.zon`.
A roughly 8-hour run produced ~1,800 corpus files over ~850 cycles. Most were invalid cases; some legitimate crashes appeared and mostly traced to integer overflows in `std.Io.Writer` with absurdly large splats. Hangs were usually huge allocations. There's more triage to do here.
- I doubt fuzzing will really pay off until the implementation is optimized.
Profiling
I added two profiling commands that run all test cases 10x and exit. The first is a baseline that writes directly to the sink. The second writes through the base64 writer.
```
Benchmark 1 (159 runs): zig-out/bin/fuzz-bench-nop
  measurement      mean ± σ           min … max          outliers   delta
  wall_time      50.8ms ± 13.5ms   21.8ms … 169ms       25 (16%)   0%
  peak_rss       2.97MB ± 10.8KB   2.97MB … 3.06MB       5 ( 3%)   0%
  cpu_cycles      5.85M ± 2.06M     2.41M … 27.9M       22 (14%)   0%
  instructions    3.87M ± 1.26M     1.64M … 17.2M       34 (21%)   0%
  cache_misses     122K ± 35.7K     43.0K … 472K        24 (15%)   0%
  branch_misses   8.13K ± 3.25K     2.30K … 42.4K       17 (11%)   0%
Benchmark 2 (90 runs): zig-out/bin/fuzz-bench-b64
  measurement      mean ± σ           min … max          outliers   delta
  wall_time       348ms ± 34.9ms    302ms … 514ms       10 (11%)   💩+585.1% ± 12.0%
  peak_rss       3.31MB ± 3.45KB   3.31MB … 3.34MB       1 ( 1%)   💩+ 11.6% ±  0.1%
  cpu_cycles      44.6M ± 5.53M     37.6M … 70.5M       10 (11%)   💩+662.9% ± 16.4%
  instructions    27.9M ± 4.13M     24.2M … 48.1M        9 (10%)   💩+620.1% ± 17.9%
  cache_misses     919K ± 78.7K      821K … 1.36M        7 ( 8%)   💩+651.8% ± 11.7%
  branch_misses   39.4K ± 13.7K     31.0K … 106K        12 (13%)   💩+384.0% ± 27.4%
```
- The naive base64 writer is over five times slower than the baseline, which is better than I expected, but there's certainly room for improvement.
- I used this fork of poop that works on macOS.
Conclusion
After building out the scaffolding, I haven't yet optimized the writer itself. I hope to make a follow-up post once I do. Or maybe I can nerd-snipe someone into doing that work; the tooling is in place now!