Embed Folder In Zig

fuji-184 · October 20, 2024, 9:49pm

Hello. I’m new to Zig. I have been searching for a way to embed a folder so that it becomes one binary together with the rest of my Zig code. To make it clearer, I mean something like “embed” in Go or “rust-embed” in Rust, but I haven’t found the equivalent in Zig. Is there a way to do it?

Sze · October 20, 2024, 10:29pm

Hi @fuji-184 welcome to Ziggit!

Zig doesn’t have a built-in utility for embedding a whole folder of files. It does, however, let you iterate over the files in a directory, for example from your build.zig, and create an anonymous import for each file. Each anonymous import can then be used with @embedFile to access the file’s binary data from within your source code.

Here is a simple game where I have done something similar: zig15game/build.zig at 5d0a33d753528df890f0ab5dd56cd04ff6d0e1c2 · SimonLSchlee/zig15game · GitHub. I haven’t actually iterated over the folder there, but that could be added; this guide shows how to iterate over a directory: Filesystem | zig.guide

You would have to create names for the anonymous imports (possibly based on the name of the file) because that name is needed as a key for @embedFile.
Language Reference - @embedFile

The @embedFile("nameofmodule") value is a slice of bytes that corresponds to the content of the file you used to create the module.
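A minimal build.zig sketch of that idea, assuming Zig 0.13 and a hypothetical assets folder next to build.zig. The folder name, the executable name, and the choice to use each file name as the import name are illustrative assumptions, not taken from the linked repository:

```zig
const std = @import("std");

pub fn build(b: *std.Build) !void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "app", // hypothetical executable name
        .root_source_file = b.path("src/main.zig"),
        .target = target,
        .optimize = optimize,
    });

    // Hypothetical folder, relative to the directory containing build.zig.
    const assets_dir = "assets";

    var dir = try b.build_root.handle.openDir(assets_dir, .{ .iterate = true });
    defer dir.close();

    // Register one anonymous import per regular file (subfolders are skipped).
    var it = dir.iterate();
    while (try it.next()) |entry| {
        if (entry.kind != .file) continue;

        // entry.name is only valid until the next call to it.next(),
        // so copy it into the build allocator before keeping it around.
        const import_name = b.dupe(entry.name);
        const sub_path = b.pathJoin(&.{ assets_dir, import_name });

        exe.root_module.addAnonymousImport(import_name, .{
            .root_source_file = b.path(sub_path),
        });
    }

    b.installArtifact(exe);
}
```

With this in place, a file such as assets/logo.png (a hypothetical name) would be reachable from the source code as @embedFile("logo.png"). The names still have to be known at compile time in the source, so this fits best when the set of files is stable.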

fuji-184 · October 20, 2024, 11:10pm

Is there a way to handle the paths and names dynamically, sir? What I’m trying to do is embed the build folder from SvelteKit (a JavaScript frontend framework) into a Zig binary. The names of the generated files change every time I build the project, and there are many of them because I have many routes, so it would be a pain to update all the file names in the build file every time I want to run the project.

Sze · October 20, 2024, 11:31pm

Maybe you could run Svelte from your build.zig as a build step and then create a second build step that depends on the Svelte build step, inspects the generated files and turns them into a single bigger binary data blob that is structured in a way that contains enough structure/meta info (so that your program is able to navigate the binary blob) and then embed that as a single file.

But I am not completely sure about the details, I think others will also be able to give some advice.

Are the names completely randomly generated, or do they stay the same as long as the routes don’t change? (Just asking because I am wondering whether it would be possible to integrate Svelte in some way with Zig’s caching system. It would be cool if the cache could be used with it, but I don’t know how Svelte works in detail.)

yataro · October 21, 2024, 2:50am

Why would you need this? I assume from your input that you need to preserve the directory structure, but for what reason? If you only need to embed a directory to unpack it to disk later, it would be easier to use tar/zip and embed the archive.
But if you really need to get the directory structure in memory, you have to write some extra code as Sze explained.

Sze · October 21, 2024, 11:59am

@fuji-184 just as another option, you could make it so that your debug build just uses the build path from your svelte project directly and only your release build actually does this packing. (For example by switching between 2 implementations of “datafetchers/accessors” based on the release mode)

I think it is likely that this packing isn’t optimal in terms of development speed so it could make sense to skip it during development and only pack it for the final deploy.
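A rough sketch of that switch, assuming Zig 0.13. The function name loadAsset, the frontend/build/ path, and the 16 MiB size limit are hypothetical; the release branch also assumes the asset name is resolvable by @embedFile (for example as an anonymous import):

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Returns the contents of an asset. The caller owns the returned memory
/// in both build modes and frees it with the same allocator.
pub fn loadAsset(allocator: std.mem.Allocator, comptime name: []const u8) ![]const u8 {
    if (builtin.mode == .Debug) {
        // Debug builds read straight from the frontend's output folder,
        // so a frontend rebuild is picked up without recompiling.
        // "frontend/build/" is a hypothetical location.
        const path = "frontend/build/" ++ name;
        return try std.fs.cwd().readFileAlloc(allocator, path, 16 * 1024 * 1024);
    } else {
        // Release builds serve the bytes baked into the executable.
        // The copy keeps ownership identical to the Debug branch.
        return try allocator.dupe(u8, @embedFile(name));
    }
}
```

Because builtin.mode is known at compile time, only the selected branch is analyzed, so Debug builds do not need the embedded files to exist. The copy in the release branch is a simplification; returning the embedded slice directly would avoid it at the cost of different ownership rules per mode.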

geemili · October 21, 2024, 6:55pm

You can embed a tar file and then use the standard library to read it.

First create the tar file:

~/tmp〉tar cf files.tar *.zig

Then in the program, use @embedFile and std.tar.iterator:

const std = @import("std");

const FILES_TAR = @embedFile("./files.tar");

pub fn main() !void {
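    // Scratch buffers the iterator uses to hold each entry's file name and link name.
    var file_name_buffer: [std.fs.MAX_NAME_BYTES]u8 = undefined;
    var link_name_buffer: [std.fs.MAX_PATH_BYTES]u8 = undefined;
    // Wrap the embedded bytes in a stream so they can be consumed through a reader.
    var files_tar_byte_stream = std.io.fixedBufferStream(FILES_TAR);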

    var tar_iter = std.tar.iterator(files_tar_byte_stream.reader(), .{
        .file_name_buffer = &file_name_buffer,
        .link_name_buffer = &link_name_buffer,
    });

    while (try tar_iter.next()) |file_entry| {
        var line_count: usize = 0;

        // Read the entry's contents in 4 KiB chunks and count the newlines.
        var file_read_buffer: [4096]u8 = undefined;
        while (true) {
            const bytes_read = try file_entry.reader().readAll(&file_read_buffer);
            if (bytes_read == 0) break; // end of this entry
            line_count += std.mem.count(u8, file_read_buffer[0..bytes_read], "\n");
        }

        std.log.info("embedded file = \"{}\", size = {} bytes, {} newlines", .{
            std.zig.fmtEscapes(file_entry.name),
            std.fmt.fmtIntSizeDec(file_entry.size),
            line_count,
        });
    }
}

~/tmp〉zig run embedded-tar.zig
info: embedded file = "invalid-call.zig", size = 10.24kB bytes, 61 newlines
info: embedded file = "next-number.zig", size = 0.965kB bytes, 33 newlines
info: embedded file = "nothing-after.zig", size = 374B bytes, 14 newlines
info: embedded file = "pass-buffered-writer.zig", size = 347B bytes, 14 newlines

Edit: Should mention, to create the tar file in the build.zig, you can use std.tar.pipeToFileSystem. You’ll want to wrap this up in a custom build step, but that is outside the scope of this post for now.

Sze · October 21, 2024, 9:34pm

The part I don’t like about tar is that it can separate files into multiple chunks. If the whole data of the files is embedded in the executable, I would want it embedded in such a way that I can use the data directly, without first having to put the pieces back together.

For that the basic question is where and how that data is used within the program, if it is just uploaded to the gpu (or streamed somewhere else as a response) then iterating over a tar might be fine because it needs to be copied anyway, but if it is used on the cpu I would want it there accessible in its final form, instead of having to make another copy.

cryptocode · October 22, 2024, 9:42am

Not sure if it suits your needs exactly, but I had similar requirements and wrote Stitch, which lets you attach blobs of data to executables, with any name (such as paths or generated names). It’s both a library and a command line tool.

dermetfan · October 23, 2024, 3:15pm

I wrote embed-dir for this use case but that was 4 years ago so it probably does not compile anymore. Nowadays I think I’d do it differently. Anyways, it could still serve as inspiration.

tobyjaffey · December 6, 2024, 11:47am

Here’s my solution, working with zig 0.13.0

00JCIV00 · December 6, 2024, 2:12pm

Not to derail the conversation, but does tar.pipeToFileSystem() actually create an archive? I thought it simply dumped the contents of an archive to the provided path.

I don’t think Zig’s std lib currently supports archiving (or zipping) files, though I’d love to be wrong about that.

tobyjaffey · December 6, 2024, 2:37pm

It sounds like it should work…

https://ziglang.org/documentation/master/std/#std.tar

pub fn writer(underlying_writer: anytype) Writer(@TypeOf(underlying_writer))

    Creates tar Writer which will write tar content to the underlying_writer. Use setRoot to nest all following entries under single root. If file don't fit into posix header (name+prefix: 100+155 bytes) gnu extented header will be used for long names. Options enables setting file premission mode and mtime. Default is to use current time for mtime and 0o664 for file mode.

00JCIV00 · December 6, 2024, 3:38pm

Maybe I’ve been reading that wrong. I always understood “write tar contents to…” to mean it’s taking the contents from an archive and writing it somewhere else.

permutationlock · December 6, 2024, 5:03pm

Skimming the source code it seems that you are correct.

bagggage · December 7, 2024, 1:42pm

I know for sure that Zig supports creating .tar archives and, apparently, .zip archives as well. You can create a .tar archive using Zig’s standard library. I discovered this while browsing through the source code of the documentation generator (look at this), and I have successfully been using the archiving functionality from std.tar in my own project.

00JCIV00 · December 7, 2024, 4:08pm

Much appreciated! I’ll give that a shot!