Anyone know how std.zip works?

Blenzo · September 19, 2024, 2:02am

Hi, I’ve been wanting to give Zig a try for a while. I need a CLI tool to read a range of cells from a .ods spreadsheet and can’t find anything already available. I usually make CLI tools with Rust, but I thought this would be a good little project to get my feet wet with Zig. .ods files are just zip archives, and I saw that zip support was recently added to the standard library. The problem I’m having is the lack of documentation: I can’t figure out how to read a file from a zip archive using the standard library. This is what I have…

const std = @import("std");
const print = std.debug.print;

pub fn main() !void {
    const file = try std.fs.cwd().openFile("test.ods", .{});
    defer file.close();

    var zip_iter = std.zip.Iterator().init(file);

    const entry = zip_iter.next();

    print("Found file: {s}", .{entry.name});
}

I know it’s not right. std.zip.Iterator needs comptime SeekableStream: type. I’m not sure what that means, and I can’t figure it out from the documentation. Also, I don’t see an entry.name: []u8, but I do see entry.filename_len: u32?

furpu · September 19, 2024, 3:02am

I’ve never worked with std.zip, but I believe this section of the language documentation (Documentation - The Zig Programming Language) should help you understand the comptime SeekableStream: type thing.
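For readers unfamiliar with the pattern, here is a minimal sketch of what a comptime T: type parameter means in general. Counter is a hypothetical type invented for illustration and is not part of std. It is a function that takes a type at compile time and returns a new struct type, which is the same shape as std.zip.Iterator(SeekableStream).

```zig
const std = @import("std");

/// Hypothetical generic type: a function that receives a type at compile
/// time and returns a struct type specialized for it.
fn Counter(comptime T: type) type {
    return struct {
        const Self = @This();

        value: T,

        pub fn init(start: T) Self {
            return .{ .value = start };
        }

        /// Returns the current value, then advances by one.
        pub fn next(self: *Self) T {
            const current = self.value;
            self.value += 1;
            return current;
        }
    };
}

test "instantiate a generic type" {
    // Spell the type argument out explicitly...
    var a = Counter(u32).init(5);
    try std.testing.expectEqual(@as(u32, 5), a.next());
    try std.testing.expectEqual(@as(u32, 6), a.next());

    // ...or derive it from a value you already have with @TypeOf.
    const start: u8 = 1;
    var b = Counter(@TypeOf(start)).init(start);
    try std.testing.expectEqual(@as(u8, 1), b.next());
}
```

Deriving the argument with @TypeOf is handy when the concrete type has a long name, as is the case for the value returned by file.seekableStream().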

ktz_alias · September 19, 2024, 3:39am

You may find the unit tests in zip.zig useful.

github.com/ziglang/zig/blob/master/lib/std/zip.zig#L624

fn testZipError(expected_error: anyerror, file: File, options: ExtractOptions) !void {
    var zip_buf: [4096]u8 = undefined;
    var store: [1]FileStore = undefined;
    var fbs = try testutil.makeZipWithStore(&zip_buf, &[_]File{file}, .{}, &store);
    var tmp = testing.tmpDir(.{ .no_follow = true });
    defer tmp.cleanup();
    try testing.expectError(expected_error, extract(tmp.dir, fbs.seekableStream(), options));
}

test "zip one file" {
    try testZip(.{}, &[_]File{
        .{ .name = "onefile.txt", .content = "Just a single file\n", .compression = .store },
    }, .{});
}
test "zip multiple files" {
    try testZip(.{ .allow_backslashes = true }, &[_]File{
        .{ .name = "foo", .content = "a foo file\n", .compression = .store },
        .{ .name = "subdir/bar", .content = "bar is this right?\nanother newline\n", .compression = .store },
        .{ .name = "subdir\\whoa", .content = "you can do backslashes", .compression = .store },
        .{ .name = "subdir/another/baz", .content = "bazzy mc bazzerson", .compression = .store },

These tests depend on utility routines for the zip module.

github.com/ziglang/zig/blob/master/lib/std/zip/test.zig

const std = @import("std");
const testing = std.testing;
const zip = @import("../zip.zig");
const maxInt = std.math.maxInt;

pub const File = struct {
    name: []const u8,
    content: []const u8,
    compression: zip.CompressionMethod,
};

pub fn expectFiles(
    test_files: []const File,
    dir: std.fs.Dir,
    opt: struct {
        strip_prefix: ?[]const u8 = null,
    },
) !void {
    for (test_files) |test_file| {
        var normalized_sub_path_buf: [std.fs.max_path_bytes]u8 = undefined;

This file has been truncated.

And welcome to ziggit.

Blenzo · September 20, 2024, 3:29pm

Thanks for the replies. I got a little further with both of your recommendations, but I’m still on the struggle bus. I’ll admit I’m not an experienced programmer, it’s just a hobby, but I really want to figure this out.

It seems like the std.zip library is incomplete at this point, as it lacks a lot of basic functionality. I figured out how to find the file I need in the archive, but I’m trying to figure out how to actually decompress it into memory so I can parse it. I think I’ve mostly figured it out, but I don’t know what kind of writer I need in order to write to memory instead of a file.writer. I allocated a []u8 of entry.uncompressed_size bytes with the page allocator to store the data, but I don’t know how to get the data from std.zip.decompress into it, if that makes sense.

const std = @import("std");
const print = std.debug.print;

pub fn main() !void {
    const cwd = std.fs.cwd();
    var file = try cwd.openFile("test.ods", .{});
    defer file.close();

    const skbl_strm = file.seekableStream();

    var iter = try std.zip.Iterator(@TypeOf(skbl_strm)).init(skbl_strm);

    var filename_buf: [std.fs.max_path_bytes]u8 = undefined;
    while (try iter.next()) |entry| {
        const filename_len = entry.filename_len;
        const filename = filename_buf[0..filename_len];

        try skbl_strm.seekTo(entry.header_zip_offset + @sizeOf(std.zip.CentralDirectoryFileHeader));
        _ = try skbl_strm.context.reader().readAll(filename);

        print("Target file found!: {s}\n", .{filename});

        if (std.mem.eql(u8, filename, "content.xml")) {
            print("Target file found!: {s}\n", .{filename});

            const local_data_header_offset: u64 = local_data_header_offset: {
                const local_header = blk: {
                    try skbl_strm.seekTo(entry.file_offset);
                    break :blk try skbl_strm.context.reader().readStructEndian(std.zip.LocalFileHeader, .little);
                };
                break :local_data_header_offset @as(u64, local_header.filename_len) +
                    @as(u64, local_header.extra_len);
            };

            const local_data_file_offset: u64 =
                @as(u64, entry.file_offset) +
                @as(u64, @sizeOf(std.zip.LocalFileHeader)) +
                local_data_header_offset;

            const pg_alloc = std.heap.page_allocator;

            var decomp_data = try pg_alloc.alloc(u8, entry.uncompressed_size);
            const writer = std.io.Writer(decomp_data);

            try skbl_strm.seekTo(local_data_file_offset);

            var lmt_redr = std.io.limitedReader(
                skbl_strm.context.reader(),
                entry.uncompressed_size,
            );

            _ = try std.zip.decompress(
                entry.compression_method,
                entry.uncompressed_size,
                lmt_redr.reader(),
                writer,
            );
        }
    }
}

mnemnion · September 20, 2024, 5:09pm

One should always be wary of concluding that a library is incomplete when learning a new language. The conclusion might be correct, but it blocks the process of trying to figure out how to do what you want to do, and it often isn’t correct.

Readers and Writers are defined in std.io. std.zip takes any kind of Writer, so with a bit more exploring, I’m confident you can figure out the type that you’ll need.
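As one possible starting point for that exploration, std.io.fixedBufferStream wraps a byte slice and hands out a writer that fills it. The sketch below assumes the Zig 0.13-era std.io API. writeGreeting is a hypothetical stand-in for any function that, like std.zip.decompress, accepts writer: anytype.

```zig
const std = @import("std");

/// Hypothetical function that accepts any writer, the same way
/// std.zip.decompress does.
fn writeGreeting(writer: anytype) !void {
    try writer.writeAll("hello, ");
    try writer.print("{s}!", .{"zip"});
}

test "write into memory instead of a file" {
    var storage: [32]u8 = undefined;
    var stream = std.io.fixedBufferStream(&storage);

    // The same call would work with file.writer() or any other writer.
    try writeGreeting(stream.writer());

    // getWritten() returns the part of storage that has been filled so far.
    try std.testing.expectEqualStrings("hello, zip!", stream.getWritten());
}
```

A fixed buffer stream never allocates. Writing more than the slice can hold fails with error.NoSpaceLeft, so it fits cases where the final size is known up front, such as entry.uncompressed_size.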

LucasSantos91 · September 20, 2024, 6:00pm

You need two streams. One reads the metadata from the zip file; the other does the actual reading.
This is modified from my own codebase, untested:

fn readZipFile(
    allocator: std.mem.Allocator,
    file: std.fs.File,
) !void {
    const seekable = file.seekableStream();
    var zipIterator = try std.zip
        .Iterator(@TypeOf(seekable))
        .init(seekable);
    // zipIterator created a copy of seekable.
    // We can use seekable for ourselves.

    while (true) {
        const maybeNext = try zipIterator.next();
        if (maybeNext) |entry| {
            // Zip allows 0-sized entries, I think they're folders.
            if (entry.uncompressed_size == 0) continue;
            // Note: this skips only the file name. If the local header also
            // carries an extra field, its length has to be added as well.
            const totalOffset = entry.file_offset +
                entry.filename_len +
                @sizeOf(std.zip.LocalFileHeader);
            try seekable.seekTo(@intCast(totalOffset));
            // file now points to the beginning of the
            // compressed data stream.

            // File readers are unbuffered. You probably want some buffering.
            var baseReader = std.io.bufferedReader(file.reader());
            var decompressor = std.compress
                .flate
                .decompressor(baseReader.reader());

            // decompressor.reader() is now a reader that spits out decompressed data.
            // You don't need to dump it all into memory, you can just read it in pieces,
            // but here is how you decompress the entire file.
            const buffer = try allocator.alloc(u8, entry.uncompressed_size);
            defer allocator.free(buffer);

            try decompressor.reader().readNoEof(buffer);
            // buffer now holds the entire decompressed data.

        } else break; // No more files in the archive.
    }
}

wrapitup · September 20, 2024, 6:56pm

The std.zip module seems to be designed to extract a zip file to a directory. Entry.extract accepts a std.fs.Dir, and you can’t just pass in a std.io.AnyWriter, so getting it to write to an in-memory buffer is tricky. When I tried to build a working example from what you’ve provided, the decompress function kept returning error.ZipDeflateTruncated.

Instead I tried directly using std.compress.flate.decompressor like in @LucasSantos91’s code. That may work if the underlying files are indeed compressed with DEFLATE. That might help you continue your journey.

-             const writer = std.io.Writer(decomp_data);

-            _ = try std.zip.decompress(
-                entry.compression_method,
-                entry.uncompressed_size,
-                lmt_redr.reader(),
-                writer,
-            );
+            var decompressor = std.compress.flate.decompressor(lmt_redr.reader());
+            try decompressor.reader().readNoEof(decomp_data);
+
+            std.debug.print("{s}\n", .{decomp_data});

But overall, kudos on figuring out that what you wanted was mostly the Entry.extract method.
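On the ZipDeflateTruncated error above: a likely cause, judging from the br.end != br.start check inside std.zip.decompress, is that the limited reader in the earlier attempt was capped at entry.uncompressed_size rather than entry.compressed_size, which would let the buffered reader pull in bytes past the end of the entry. The sketch below assumes the Zig 0.13-era std API (std.zip.decompress, std.io.limitedReader, std.io.fixedBufferStream, std.compress.flate.compress) and has not been checked against other versions. decompressEntryInto is a hypothetical helper, and the test fakes an archive in memory instead of opening a real .ods file.

```zig
const std = @import("std");

/// Hypothetical helper: decompress one zip entry into `out`.
/// `reader` must already be positioned at the first byte of the entry's data,
/// and `out.len` must equal the entry's uncompressed size.
fn decompressEntryInto(
    method: std.zip.CompressionMethod,
    reader: anytype,
    compressed_size: u64,
    out: []u8,
) !u32 {
    // Cap the reader at the *compressed* size so nothing that follows the
    // entry in the archive can end up in the decompressor's read buffer.
    var limited = std.io.limitedReader(reader, compressed_size);
    var sink = std.io.fixedBufferStream(out);
    // std.zip.decompress returns the CRC-32 of the decompressed bytes.
    return std.zip.decompress(method, out.len, limited.reader(), sink.writer());
}

test "decompress an entry into a caller-owned buffer" {
    const original = "hello from content.xml";

    // Stand-in for an archive: raw deflate data followed by unrelated bytes.
    var archive_buf: [256]u8 = undefined;
    var archive = std.io.fixedBufferStream(&archive_buf);
    var plain = std.io.fixedBufferStream(original);
    try std.compress.flate.compress(plain.reader(), archive.writer(), .{});
    const compressed_size = archive.pos;
    try archive.writer().writeAll("bytes that belong to the next header");

    var source = std.io.fixedBufferStream(archive.getWritten());
    var out: [original.len]u8 = undefined;
    const crc = try decompressEntryInto(.deflate, source.reader(), compressed_size, &out);

    try std.testing.expectEqualStrings(original, &out);
    try std.testing.expectEqual(std.hash.Crc32.hash(original), crc);
}
```

With a real archive, the reader would be skbl_strm.context.reader() after seeking to the data offset, and method and compressed_size would come from the entry. The returned CRC could then be compared against the value stored in the archive.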

Blenzo · September 20, 2024, 8:28pm

Ya, that’s what I meant when I said it seemed incomplete. I figured there would be an entry.decompress geared towards what I was attempting. Having to calculate offsets seems a little convoluted, but I get that it’s a new library, so I’m not complaining. It looks like you can use std.zip.decompress for a single file, because Entry.extract calls it the same way I was trying to, except that it passes in file.writer(). It says it can take any writer, but when I look at the documentation for std.io there really isn’t any explanation of what the different writers are for or how to use them, and I’m not familiar with the concept.

That error seems to mean there is still data in the buffer? Maybe my offsets are off? I’m not sure why br.start and br.end are expected to be equal.

.deflate => {
    var br = std.io.bufferedReader(reader);
    var decompressor = std.compress.flate.decompressor(br.reader());
    while (try decompressor.next()) |chunk| {
        try writer.writeAll(chunk);
        hash.update(chunk);
        total_uncompressed += @intCast(chunk.len);
        if (total_uncompressed > uncompressed_size)
            return error.ZipUncompressSizeTooSmall;
    }
    if (br.end != br.start)
        return error.ZipDeflateTruncated;

Now that I’m looking at it, decompress just calls std.compress.flate.decompressor. This seems to work, but now I’m confused about why decomp_data can be const. Isn’t it mutated by try decompressor.reader().readNoEof(decomp_data);?

const pg_alloc = std.heap.page_allocator;
const decomp_data = try pg_alloc.alloc(u8, entry.uncompressed_size);
defer pg_alloc.free(decomp_data);

try skbl_strm.seekTo(local_data_file_offset);

var lmt_redr = std.io.limitedReader(
    skbl_strm.context.reader(),
    entry.uncompressed_size,
);

var decompressor = std.compress.flate.decompressor(lmt_redr.reader());
try decompressor.reader().readNoEof(decomp_data);

LucasSantos91 · September 20, 2024, 8:43pm

decomp_data is a slice, which is just a fat pointer (a pointer plus a length). The const applies to the slice value itself; the data it points to is mutable.
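A small sketch of that distinction, with made-up variable names: the binding is const, but the element type is plain u8, so writes through the slice are allowed.

```zig
const std = @import("std");

test "a const slice can point at mutable bytes" {
    var storage = [_]u8{ 'a', 'b', 'c' };

    // `view` itself cannot be reassigned, but its elements are []u8,
    // so they can be written through it.
    const view: []u8 = &storage;
    view[0] = 'z';
    try std.testing.expectEqualStrings("zbc", &storage);

    // To forbid writes through the slice, make the element type const.
    const read_only: []const u8 = view;
    // read_only[0] = 'q'; // compile error: cannot assign to constant
    try std.testing.expectEqual(@as(u8, 'z'), read_only[0]);
}
```

This is why readNoEof can take its buffer parameter as []u8 and fill it, regardless of whether the caller declared the slice with const or var.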

Blenzo · September 20, 2024, 8:58pm

Okay, that makes sense.