Help with new deflate functionality

(I also posted this to the Discord but if anyone here specifically has an example I’d appreciate the help!)

Does anyone have an example of how to use the new deflate functionality after https://github.com/ziglang/zig/pull/25301 was merged?

The stdlib docs are sparse right now, and it seems that flush can only be done once (https://ziglang.org/documentation/master/std/#std.compress.flate.Compress).

My compression loop with the old stdlib functionality was:

  • Write data until the target (compressed data) buffer is full, tracked via an estimate based on bytes written
  • Flush, re-check, repeat write/flush cycle until buffer is truly full
  • Write out compressed data from target buffer
  • Reset target buffer, and repeat
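
For reference, that loop looked roughly like this against the 0.14.1 API (compressor / flush / finish). The fixed target buffer and the "full enough" estimate are sketched stand-ins for the real bookkeeping, and the exact signatures are from memory, so treat this as approximate:

```zig
const std = @import("std");

// Sketch of the old write/flush loop against the 0.14.1 flate API.
// `pixels` stands in for one batch of input data.
fn compressChunk(target: *std.io.FixedBufferStream([]u8), pixels: []const u8) !void {
    var comp = try std.compress.flate.compressor(target.writer(), .{});
    try comp.writer().writeAll(pixels); // 1. write until the estimate says "full"
    try comp.flush(); // 2. flush, re-check, repeat; flush was repeatable in 0.14.1
    // 3. emit target.getWritten() as an IDAT chunk
    // 4. target.reset() and repeat; call comp.finish() once at the very end
}
```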

Now that the flush step can only be done once, I’m curious how this is supposed to work instead.

Currently, flush writes the footer, which is why it can only be called once. Based on Andrew’s comments on that PR, this may be moved to a separate end function before the release is cut.

I’m not sure why you’re trying to explicitly write the compressed output to a buffer…isn’t that what the writer you supply to Compress.init is already doing?

The output buffer is the buffer I’m referring to (so that’s the one being passed to init).

I did some more digging, and it seems the main issue right now is that Compress lacks the functionality that flush had in 0.14.1: https://codeberg.org/ziglang/zig/src/commit/d03a147ea0a590ca711b3db07106effc559b0fc6/lib/std/compress/flate/deflate.zig#L323-L337

I tried to work around that by calling drain or rebase directly, but those are locked down via asserts encoding expectations about when Io.Writer generally calls them.

I think this generally means that unless you wrap your stream in its own writer, your only real recourse is to deal in complete streams (at least for now; I guess we’ll see what things look like when 0.16.0 comes out).

PS: The background on this is PNG encoding; my current logic processes IDAT chunks at 16K intervals of compressed data. PNG chunks need checksum data of their own (outside of any in the stream), so the idea was to buffer the bitstream and chunk it at that boundary. This likely means I’ll have to put a complete stream into a single IDAT chunk and adjust the interval to account for the fact that we can’t track compressed size anymore: we can’t know it ahead of time, and by the time the stream is fully written out it’s too late, since the footer will already have been written.

I’m actually realizing that implementing an IDAT-chunking writer on the compressed data might be trivial, so I might go that route after all. If that works, then I think this is pretty much a non-issue, although hopefully the end-of-stream handling still gets separated from flush.

Alright, yeah, a trivial writer with a pretty simple drain implementation does the job:

fn drain(w: *Io.Writer, data: []const []const u8, splat: usize) Io.Writer.Error!usize {
    // splat is deliberately ignored here; see the discussion below.
    _ = splat;
    const stream_writer: *@This() = @fieldParentPtr("writer", w);
    // Emit everything currently buffered as one complete IDAT chunk.
    writePNGIDATSingle(stream_writer.file, w.buffer[0..w.end]) catch return error.WriteFailed;
    // Refill the now-empty buffer with as much of the first slice as fits;
    // the return value only counts bytes consumed from `data`.
    const len: usize = @min(data[0].len, buffer_size);
    @memcpy(w.buffer[0..len], data[0][0..len]);
    w.end = len;
    return len;
}
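
For context, writePNGIDATSingle above just wraps the bytes in a single PNG chunk: a 4-byte big-endian data length, the "IDAT" chunk type, the data itself, then a CRC-32 computed over the type + data bytes, per the PNG spec. A rough sketch (the std.fs.File plumbing is an assumption about my surrounding code):

```zig
const std = @import("std");

// Writes `data` as one complete IDAT chunk: length, type, data, CRC-32.
fn writePNGIDATSingle(file: std.fs.File, data: []const u8) !void {
    var header: [8]u8 = undefined;
    std.mem.writeInt(u32, header[0..4], @intCast(data.len), .big);
    @memcpy(header[4..8], "IDAT");
    // The CRC covers the chunk type and the data, not the length field.
    var crc = std.hash.Crc32.init();
    crc.update(header[4..8]);
    crc.update(data);
    var footer: [4]u8 = undefined;
    std.mem.writeInt(u32, &footer, crc.final(), .big);
    try file.writeAll(&header);
    try file.writeAll(data);
    try file.writeAll(&footer);
}
```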

I can still preserve my 16K chunk size and everything looks good.

I guess one gotcha here is that you should expect to have to retry single write calls, because the compressor only does a partial drain when the window buffer gets close to full (only processing the bytes required to completely fill it, as per Compressor.drain). This just meant I had to adjust my logic from expecting a write call to consume the exact byte count to using writeAll.
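
In practice that adjustment is just this (assuming Compress exposes its Io.Writer as a writer field, which I’m taking from memory of the master API):

```zig
// A plain write may be short when the compressor only partially drains:
//     const n = try compress.writer.write(scanline);
// writeAll instead loops internally until every byte is consumed:
try compress.writer.writeAll(scanline);
```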

PS: Can I get away with pretty much ignoring splat here, as shown? The interface contract specifically refers to bytes consumed from data, without mentioning whether splat should be reflected in that count, so I’m not too sure how a downstream caller would have to act if splat was ignored and data[data.len - 1] was only consumed once.

I’m coming to the conclusion that this is okay; from what I can see, the main application of splat is in the splatting functionality in Io.Writer. Those functions seem to deal with partial results, for example Io.Writer.writeSplatAll.

splat only matters if you are writing the last element of data; if you stop before that, then you can ignore it for that call. You still have to deal with it eventually.

Io.Writer.writeSplatAll does not do the splat for you; it simply loops until all data is written, including the splat. If you always ignore splat, then it will loop forever.


Hmm, looking at the code, it does deal with partial results of the last splat. So if splat was “ignored” in the sense that you were still processing the data, just only once (or even partially), everything would still be fine. I think there was some confusion about what I meant: I didn’t mean completely ignoring data[0] when data.len was 1, for example, just that data[0] would only be processed at most once, with the return value from drain reflecting that accordingly. That part seems fine, from my reading.

One thing that should probably be done in this naive implementation though, and is pretty simple, is to return early when data.len == 1 and splat == 0, since in that case there is nothing to consume from data. That’s simple enough to do and seems correct.
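
Concretely, that guard could sit at the top of the drain shown earlier; a sketch, reusing stream_writer and writePNGIDATSingle from that snippet:

```zig
// Hypothetical prologue for the drain above: when the only slice in
// `data` is to be splatted zero times, there is nothing to consume,
// so just flush what is already buffered and report zero bytes.
if (data.len == 1 and splat == 0) {
    writePNGIDATSingle(stream_writer.file, w.buffer[0..w.end]) catch return error.WriteFailed;
    w.end = 0;
    return 0;
}
```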