I have a loop over the files in a tar file, skip some files, and write the rest to a target tar file:
…I managed to get to the point where the data should be written to the target file. But how do I get a reader which gives me the bytes of the current iterator item in the tar file? The old std.tar.Iterator.File had a method which directly gives me a reader object:
…the new std.tar.Iterator.File just gives me a file size:
…how do I go from such an iterator item to a reader over the bytes for that item in the tar file? Especially when I don’t have an offset to “seek” to the correct starting position?
…the reader that’s passed into the std.tar.Iterator keeps track of the current seek position, and apparently that’s updated by the iterator… and this together with the file size allows me to find the bytes in the raw tar data… but only by accessing both the original reader and the iterator item… (instead of just the iterator item like before)… and also only when I have direct random access to the tar file content.
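To make that in-memory case concrete, here is a minimal sketch, assuming the std.Io API as it is used elsewhere in this thread. `currentItemBytes` is a hypothetical helper, not part of std. It assumes the reader is a `std.Io.Reader.fixed` over the whole archive, so its `seek` field is an index into those bytes, and that nothing has read from it since `next()` returned. The archive is built in memory only to keep the example self-contained.

```zig
const std = @import("std");

/// Hypothetical helper (not part of std): returns the bytes of the item most
/// recently returned by `iter.next()`. Assumes `reader` is a fixed reader over
/// `tar_bytes` and that nothing has read from it since `next()` returned.
fn currentItemBytes(tar_bytes: []const u8, reader: *const std.Io.Reader, size: u64) []const u8 {
    const start = reader.seek; // assumed to sit just past the item's header
    const len: usize = @intCast(size);
    return tar_bytes[start..][0..len];
}

test "find item bytes via seek position and size" {
    // Build a one-file archive in memory so the example is self-contained.
    var tar_buf: [4096]u8 = undefined;
    var tar_out: std.Io.Writer = .fixed(&tar_buf);
    var tar_writer: std.tar.Writer = .{ .underlying_writer = &tar_out };
    try tar_writer.writeFileBytes("a.txt", "hello", .{});
    try tar_writer.finishPedantically();
    const tar_bytes = tar_out.buffered();

    // Iterate over the in-memory archive with a fixed reader.
    var tar_in: std.Io.Reader = .fixed(tar_bytes);
    var file_name_buffer: [1024]u8 = undefined;
    var link_name_buffer: [1024]u8 = undefined;
    var iter: std.tar.Iterator = .init(&tar_in, .{
        .file_name_buffer = &file_name_buffer,
        .link_name_buffer = &link_name_buffer,
    });

    const item = (try iter.next()).?;
    try std.testing.expectEqualStrings("hello", currentItemBytes(tar_bytes, &tar_in, item.size));
}
```

This only works because the whole archive is addressable as a slice; with a streamed reader there is nothing to slice into.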
But my situation is different because the data is coming directly from another tar file which is streamed in via a reader, not from bytes in memory…
So the question still stands… how do I get a reader for the current tar.Iterator item…
Tbh, this feels more like puzzle solving than programming…
PS: …it’s tempting to just pass in the original reader that was passed into the iterator, but of course as expected this completely messes up the iterator state…
Ah ok, I guess std.tar.Iterator.streamRemaining() is key:
…I guess that lets me read bytes out of the iterator’s reader without messing up the iterator state…
Thanks! I’ll try that.
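For reference, a minimal sketch of how I understand the streamRemaining() pattern, assuming the std.Io API as used in this thread. `streamItemsWithPrefix` is a hypothetical helper, not part of std. It relies on the assumption that `next()` discards the content of any item that was not streamed, so skipping a file is just a `continue`. The archive is built in memory only so the example is self-contained.

```zig
const std = @import("std");

/// Hypothetical helper (not part of std): streams the content of every item
/// whose name starts with `prefix` into `out`. Items that are not streamed are
/// assumed to be skipped by the next call to `iter.next()`.
fn streamItemsWithPrefix(iter: *std.tar.Iterator, prefix: []const u8, out: *std.Io.Writer) !void {
    while (try iter.next()) |item| {
        if (item.kind != .file) continue;
        if (!std.mem.startsWith(u8, item.name, prefix)) continue;
        try iter.streamRemaining(item, out);
    }
}

test "skip one item, stream the other into a writer" {
    // Build a two-file archive in memory.
    var tar_buf: [4096]u8 = undefined;
    var tar_out: std.Io.Writer = .fixed(&tar_buf);
    var tar_writer: std.tar.Writer = .{ .underlying_writer = &tar_out };
    try tar_writer.writeFileBytes("skip_a.txt", "hello", .{});
    try tar_writer.writeFileBytes("keep_b.txt", "world!", .{});
    try tar_writer.finishPedantically();

    // Read it back through a fixed reader.
    var tar_in: std.Io.Reader = .fixed(tar_out.buffered());
    var file_name_buffer: [1024]u8 = undefined;
    var link_name_buffer: [1024]u8 = undefined;
    var iter: std.tar.Iterator = .init(&tar_in, .{
        .file_name_buffer = &file_name_buffer,
        .link_name_buffer = &link_name_buffer,
    });

    // Collect the content of the kept file in a small fixed buffer.
    var out_buf: [64]u8 = undefined;
    var out: std.Io.Writer = .fixed(&out_buf);
    try streamItemsWithPrefix(&iter, "keep_", &out);
    try std.testing.expectEqualStrings("world!", out.buffered());
}
```

The helper takes any `*std.Io.Writer`, so the same loop should work with a file writer's interface in place of the fixed buffer.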
I have the nagging feeling that Readers and Writers should be more freely ‘pipe-able’ / ‘pluggable’… e.g. here I need to call a special method on the iterator to stream the content of the current ‘iterator item’ into a writer instead of ‘hey give me a reader for the current item which I can then directly plug into a writer’.
I think the basic idea that I can ask some object to give me a reader or writer object which I can then plug somewhere else, or connect the reader directly to the writer (e.g. some sort of piping), is very intuitive.
Of course I haven’t thought that through, I guess there are reasons.
I can use std.tar.Iterator.streamRemaining() to write the data of the current file in the tar file into a std.Io.Writer.
But how do I connect that to a std.tar.Writer? This doesn’t seem to offer a std.Io.Writer interface, and instead has methods to write new file items to the output tar file from various sources.
E.g. how do I connect std.tar.Iterator.streamRemaining(), which expects a std.Io.Writer, to std.tar.Writer.writeFileStream(), which expects a std.Io.Reader? Is there a universal WriterReader which I can plug between the two? Is that even the correct approach (e.g. having some sort of intermediate reader/writer between the two methods)?
I will still see if I can cobble something together… worst case I guess would be reading the whole tar file into memory so that I have random access on the input data, which should help me to create an ‘ad hoc fixed-buffer reader’ on some portion of the input data.
…or probably better: just read the current file item into an intermediate buffer…
I got further by going through an intermediate writer/reader pair:
var tar_writer: std.tar.Writer = .{ .underlying_writer = &file_writer.interface };
var file_name_buffer: [1024]u8 = undefined;
var link_name_buffer: [1024]u8 = undefined;
var iter: std.tar.Iterator = .init(&file_reader.interface, .{
    .file_name_buffer = &file_name_buffer,
    .link_name_buffer = &link_name_buffer,
});
while (try iter.next()) |tar_item| {
    switch (tar_item.kind) {
        .file => {
            if (std.mem.startsWith(u8, tar_item.name, prefix)) {
                // FIXME: currently it's not possible to directly plug iter.streamRemaining()
                // into a std.tar.Writer, so let's go through an intermediate buffer
                var imm_writer: std.Io.Writer.Allocating = .init(arena);
                defer imm_writer.deinit();
                // stream the current tar item into the intermediate writer
                try iter.streamRemaining(tar_item, &imm_writer.writer);
                // get an intermediate reader on the intermediate writer's buffer
                var imm_reader = std.Io.Reader.fixed(imm_writer.getWritten());
                // ... and write the file data into the tar-writer
                try tar_writer.writeFileStream(tar_item.name, tar_item.size, &imm_reader, .{ .mode = tar_item.mode });
            }
        },
        // skip everything that is not a regular file
        else => continue,
    }
}
// write the end-of-archive marker, then flush the underlying file writer
try tar_writer.finishPedantically();
try tar_writer.underlying_writer.flush();
…this gives me a tar file which unpacks just fine with tar -xf sources.tar on the macOS command line. Opening the webpage now shows an error about an unexpected EndOfStream, though; I’ll leave it at that for now and try to debug later:
PS: huh, weird… the generated tar is actually totally fine, and when refreshing the doc web page (served by a local node.js http-server) it also works; only the first load is broken.