Implementing a zero-copy block cache reader with current `rebase` semantics

pzittlau · June 17, 2026, 9:16am

Hi everyone,

I’m currently developing an append-only, structured stream database/storage. It is able to support multiple streams, where each of the streams has one append-only writer and functionally unlimited concurrent seekable readers.

Internally I have a BufferManager that keeps a freely accessible pool of 1MiB slots. The writer appends directly to one of these slots, and when it is full, it is compressed, flushed to disk, and the writer rotates to a new one. Any data a writer has written is regarded as final and cannot be changed. Concurrent readers are strictly coordinated in a pipeline behind the writer, meaning they can read both flushed historical blocks and the active uncompressed block of the writer of their stream.

To achieve maximum throughput and avoid memory copies, my goal is to map the Io.Reader.buffer directly to the pinned 1MiB slots in the cache pool. This allows users to avoid redundant copies.

I’ve already written the writer. The only roadblock is the reader due to the semantics of rebase and RebaseError.

The problem with `rebase` for fixed chunks

The interface expects that if a caller wants a contiguous view (peek and friends), the implementation can expand the available contiguous view via rebase:

    /// Ensures `capacity` data can be buffered without rebasing.
    ///
    /// Asserts `capacity` is within buffer capacity, or that the stream ends
    /// within `capacity` bytes.
    ///
    /// Only called when `capacity` cannot be satisfied by unused capacity of
    /// `buffer`.
    ///
    /// The default implementation moves buffered data to the start of
    /// `buffer`, setting `seek` to zero, and cannot fail.
    rebase: *const fn (r: *Reader, capacity: usize) RebaseError!void = defaultRebase,

In particular, the second sentence is problematic at a block boundary. The data in the cache is semantically immutable, so I can’t shift it inside the slot, and the only errors rebase is allowed to return are error{ReadFailed, EndOfStream}. If I return EndOfStream, I falsely tell the caller the stream is finished; if I return ReadFailed, I crash the pipeline with a generic read failure.

The main problem is that StreamTooLong isn’t in the set, as this comment in peekDelimiterInclusive also acknowledges:

    // It might or might not be end of stream. There is no more buffer space
    // left to disambiguate. If `StreamTooLong` was added to `RebaseError` then
    // this logic could be replaced by removing the exit condition from the
    // above while loop. That error code would represent when `buffer` capacity
    // is too small for an operation, replacing the current use of asserts.

Rebasing just doesn’t really make sense in this case, which is okay, and maybe I’m trying too hard to press this into this interface. Still, I would like to be able to provide a better error than ReadFailed.

Possible Solutions

I see a few ways to solve this for my use case:

1. Add a buffer for all intermediate reads. Obviously bad because of unnecessary double copies and additional unnecessary virtual function calls to fill the buffer.
2. Add a buffer for reads crossing chunk boundaries. This avoids double copying for most things but increases the implementation complexity by a lot and also requires a buffer of 1MiB^[1] for each Reader because of the semantics of the capacity parameter and its interplay with the buffer size for rebase.
3. Just throw a @compileError to disallow everything that uses rebase. This would mean that, as far as I can see, I can’t use any of the peek functions even though they are fine to call in the vast majority of cases. Fortunately, it would still allow me to use nearly everything that copies things out of the buffer, like readStruct, readSlice, stream, and most delimiter functions, but this goes somewhat against the zero-copy goal.
4. Wait for the std to add something like StreamTooLong to rebase^[2], then use the peek functions, with the copying ones as a fallback.
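
To make option 2 concrete, here is a minimal sketch of the idea outside of `std.Io.Reader`: a peek that hands out a view into the immutable block when the range fits, and copies into a scratch buffer only when it straddles two blocks. `ChunkedView`, `block_len`, and `scratch` are hypothetical names for illustration, the block size is shrunk to 8 bytes, and the sketch assumes a peek never spans more than two blocks. It does not address the `capacity` semantics of `rebase`.

```zig
const std = @import("std");

// Stand-in for the 1 MiB cache slot size; tiny so the example is readable.
const block_len = 8;

/// Hypothetical view over immutable, fixed-size cache blocks.
const ChunkedView = struct {
    blocks: []const [block_len]u8,
    /// Logical position in the stream.
    pos: usize = 0,
    /// Only touched when a peek straddles two blocks.
    scratch: []u8,

    /// Returns `n` contiguous bytes at `pos` without consuming them.
    fn peek(v: *ChunkedView, n: usize) error{ EndOfStream, StreamTooLong }![]const u8 {
        if (n == 0) return v.scratch[0..0];
        if (v.pos + n > v.blocks.len * block_len) return error.EndOfStream;
        const block = v.pos / block_len;
        const off = v.pos % block_len;
        // Fast path: the whole range lives in one block, so return a view.
        if (off + n <= block_len) return v.blocks[block][off..][0..n];
        // Slow path: copy the tail of this block and the head of the next.
        if (n > v.scratch.len or n > block_len) return error.StreamTooLong;
        const tail = block_len - off;
        @memcpy(v.scratch[0..tail], v.blocks[block][off..]);
        @memcpy(v.scratch[tail..n], v.blocks[block + 1][0 .. n - tail]);
        return v.scratch[0..n];
    }
};
```

In this model the scratch buffer only has to be as large as the largest peek the caller makes, and the cache blocks themselves are never written to.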

I think the best thing would obviously be 4.

Has anybody run into similar problems and have I missed something that I could do or how I could resolve this? Or is the Io.Reader just the wrong thing for what I’m trying to achieve?

Thank you

^[1] cache block size
^[2] or just change it in my local copy

vulpesx · June 17, 2026, 9:40am

Perhaps it makes sense for std to support the concept of StreamTooLong for rebase. I would open an issue for it, if there isn’t one already.

Regardless, ReadFailed indicates an implementation-specific error, and you can store additional information in your implementation state for the caller to access; see std.Io.File.Reader for an example.

Try doing that and seeing how often it is encountered, it may not be worth worrying about.

It may also be worth providing multiple implementations with differing behaviour for different use cases.
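
A minimal sketch of the ReadFailed pattern, assuming a hypothetical `BlockReader` that embeds the interface as a field named `interface`. The `err` field, the `CrossesBlockBoundary` error, and the `describe` helper are made up for illustration, and only the rebase path is shown, without the rest of the vtable:

```zig
const std = @import("std");
const Reader = std.Io.Reader;

/// Hypothetical reader over fixed cache blocks; only the error path is shown.
const BlockReader = struct {
    interface: Reader,
    /// Why the last `error.ReadFailed` was returned, if it came from here.
    err: ?Error = null,

    const Error = error{CrossesBlockBoundary};

    /// Would be installed as the `rebase` vtable entry.
    fn rebase(r: *Reader, capacity: usize) Reader.RebaseError!void {
        _ = capacity;
        const self: *BlockReader = @alignCast(@fieldParentPtr("interface", r));
        // The block is immutable, so buffered bytes cannot be shifted.
        // Record the specific cause, then report the generic failure.
        self.err = error.CrossesBlockBoundary;
        return error.ReadFailed;
    }
};

/// Caller side: recover the specific cause behind a generic failure.
fn describe(br: *const BlockReader, e: anyerror) []const u8 {
    if (e == error.ReadFailed) {
        if (br.err) |cause| return @errorName(cause);
    }
    return @errorName(e);
}
```

A caller that hits ReadFailed on a peek can then check `err` and fall back to a copying read instead of aborting.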

pzittlau · June 17, 2026, 9:45am

I’ve searched a bit but didn’t find any. But also didn’t just want to open one willy nilly if I was just thinking about it in a wrong way.

Yeah that’s true and I’ll likely do this for now, though I find it a bit weak? in this case.

squeek502 · June 17, 2026, 9:54am

I’m currently focusing on these types of issues with the Reader/Writer interface.

The relevant issues here sound like:

`flate.Decompress`: direct_vtable cannot implement `discard`, `rebase`, and `readVec` · Issue #25035 · ziglang/zig · GitHub
- I don’t have a solution in mind for this yet, and it seems like the more extreme version of the problem you’re running into
Reader does not fully support the use case of keeping some amount of data buffered · Issue #25103 · ziglang/zig · GitHub
- This is related in the sense that some of the listed solutions are similar to the ones you’re suggesting, but the use case is different, and therefore the fix I have in mind might not end up being relevant here
- My current thinking on this particular issue is that I was wrong about the possible solutions–it should actually be viable to enforce that an implementation must always be able to rebase up to buffer.len. For the use case in the issue, the implementation can read new data during rebase if required to do so (note: this is not currently done for any existing Reader implementations in the standard library)

Just to be sure, you’re talking about using Readers with zero-length buffers?

EDIT: I think I misunderstood. I might need a small pseudo-code example to get on the same page.

Sze · June 17, 2026, 9:54am

If you are willing to use page mapping techniques this might be another possibility:

2.b map your read-only chunks so that they map to physical pages, and then use virtual page mapping to make the pages of two chunks appear contiguous.

pzittlau · June 17, 2026, 10:23am

Now that I think about it, I had a similar case for rebase in the Io.Writer. There I just used failingRebase. It isn’t as much of a problem there, because rebase isn’t used as much, at least by me, and because there isn’t another error there that fits the semantics I’m trying to achieve.

I don’t think the issues capture what I’m trying to do, though the second one is close and maybe even a general case of it. This is just how I read it, but I imagine there needs to be some way for the concrete interface implementation to tell the caller that it needs some amount to be kept in the buffer to keep its semantics intact. For the decompression this would be the window; for me this would be one entire chunk.

I think StreamTooLong, or an integer as suggested in option 1 of the issue, might be the right call. The main thing that needs to be relaxed is this all-or-nothingness.

No. I’m instantiating it with something like this:

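pub fn init(source: *Stream) Reader {
    const block = source.buffer_manager.reserveBlock();
    const buf = source.buffer_manager.getBlockData(block);
    return .{
        .stream = source,
        .cached_slot = block,
        .logical_pos = 0,
        .interface = .{
            .vtable = &.{
                // The parameter is named `source` so that `stream` here
                // refers to the vtable function, not the *Stream.
                .stream = stream,
                .discard = discard,
                .readVec = readVec,
                .rebase = rebase,
            },
            .buffer = buf,
            // `seek` and `end` have no defaults; this assumes nothing is
            // buffered yet at init.
            .seek = 0,
            .end = 0,
        },
    };
}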

Here buf points directly into the slots of the buffer manager used for caching.

That’s a good idea but it will likely not work because the chunks can be read while they’re written to. And chunks of different streams are written out of order, so the cache is also unordered. And ordering the cache by stream is nigh impossible.

There are a few other options in the same vein that I’ve already discarded.
One would be to write each stream to a separate file and concatenate them afterwards. But this is also bad because it increases disk and file descriptor usage by a lot. I expect there to be >1k different streams regularly, and sometimes an order of magnitude more.
Another would be to take advantage of sparse files. But those are very filesystem dependent and also suffer from there being a lot of streams with vastly different sizes.
Another would be to forgo the shared cache and have stream-local buffers. But this is again bad because of the number of streams and the likely need to then tune the buffer sizes of each of the streams. In my case I can just use clock eviction and be done with it.

squeek502 · June 17, 2026, 10:44pm

I think I’m still failing to see the overall picture. What do you want to happen when the reader gets to the end of the block? Would the behavior of Reader.fixed not work for your use case?

To put it another way, I’m unsure if you want your reader to act as if blocks were stitched together (e.g. r.takeStruct can read part from one block and part from another), or if you want the reader to act as if it’s only aware of one block (e.g. r.takeStruct hits EndOfStream if the full struct is not contained in the block).

pzittlau · June 18, 2026, 6:58am

Exactly. The Reader should be able to treat this, for the most part, as a contiguous, arbitrary, and seekable stream of bytes. The only reason this would likely need to get relaxed is peeking across chunk boundaries. It should still be able to take across boundaries.

Implementing the stream, readVec and discard vtable functions across boundaries is trivial. Only rebase gives me headaches.

This will be used in a preprocessing step where I convert a profiling event log to be analyzed. You could then for instance imagine, that I want to do string duplication and store a string table there; or the raw events but sharded by thread; or even some arbitrary data structure like a flame graph.

I can’t uphold this abstraction everywhere because it simply isn’t possible to store everything contiguously. I therefore need to chunk it, and peeking across chunk boundaries just doesn’t work.

Now that I think about it again on a new day, I could later also have a buffer pool that gets shared by the readers for the cross chunk reading/peeking instead of having one buffer per stream like I said above.

I guess the problem still is this all-or-nothing approach to rebase. Either you’re able to support buffering the requested capacity, or you’re not and have to return ReadFailed^[1] and let the user check the err field in the Reader. I guess there are more cases that we, or at least I, currently can’t think of where having a best effort rebasing would also work.

Maybe something like this, similar to how it’s done for the other things like write and writeAll?

pub fn rebase(r: *Reader, capacity: usize) Error!usize {
   ...
}

pub fn rebaseAll(r: *Reader, capacity: usize) Error!void {
    var index: usize = 0;
    while (index < capacity) index += try r.rebase(capacity - index);
}

This would support the current approach, which is now just rebaseAll and a more general one of partial rebases.

^[1] or EndOfStream, but this is beside the point

squeek502 · June 18, 2026, 7:44am

An approach that would work with the current Reader API, but would lose the “zero-copy” aspect, is effectively option 2 in the OP (or maybe the “buffer pool shared by the readers” thing you mentioned): a Reader implementation that takes a buffer (you determine the minimum size, it would likely need to be as big as the largest value you intend to peek), and then your Reader would treat the blocks as a source rather than the block data itself being the buffer (in other words, you’d implement a stream function that writes to w or fills buffer by copying bytes in from the appropriate block(s), and then you can just use defaultRebase).
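
The copying step at the core of such a source-style reader can be sketched as follows. This is a standalone illustration, not a `stream` vtable implementation: `readFromBlocks` and `block_len` are hypothetical names, the blocks are shrunk to 8 bytes, and how the copied bytes end up in `w` or `buffer` is left out.

```zig
const std = @import("std");

// Stand-in for the 1 MiB cache slot size.
const block_len = 8;

/// Copies up to `dest.len` bytes starting at logical position `pos.*`,
/// walking across block boundaries as needed, and advances `pos.*`.
/// Returns the number of bytes copied, which is 0 at the end of the data.
fn readFromBlocks(blocks: []const [block_len]u8, pos: *usize, dest: []u8) usize {
    const total = blocks.len * block_len;
    var copied: usize = 0;
    while (copied < dest.len and pos.* < total) {
        const off = pos.* % block_len;
        // Copy at most up to the end of the current block per iteration.
        const len = @min(dest.len - copied, block_len - off);
        @memcpy(dest[copied..][0..len], blocks[pos.* / block_len][off..][0..len]);
        copied += len;
        pos.* += len;
    }
    return copied;
}
```

Because the destination is an ordinary mutable buffer owned by the reader, shifting its contents during a rebase no longer touches the immutable cache blocks.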

I’m not actually sure how changing this would help your use case. In the scenario where you’re using immutable block data as the Reader buffers, if peek(4) is called and there’s only 2 bytes left in the current block/buffer, what would being able to return less than 4 from rebase do to resolve the problem?

Something to keep in mind: my understanding is that the (not adequately documented) intention of rebase is to ensure that capacity bytes can be put into buffer contiguously (and this is the behavior take/peek/fill relies on), so e.g. swapping out the buffer to a different block’s data would break that contract.

pzittlau · June 18, 2026, 8:31am

~~It would solve it for most cases. The problem right now is that the peek functions, except for peekDelimiterInclusive and peekByte, all call fill unconditionally, and with that rebase.~~
Shit… I missed this if in fill:

pub fn fill(r: *Reader, n: usize) Error!void {
    if (r.seek + n <= r.end) {
        @branchHint(.likely);
        return;
    }
    return fillUnbuffered(r, n);
}

So yeah… I will just throw an error and be done with it or just take the double copying.
