Embed Folder In Zig

Hello. I’m new to Zig. I have been searching for a way to embed a folder so that it ends up in a single binary together with the rest of the code. For context, I’m looking for something like “embed” in Go or “rust-embed” in Rust, but I haven’t found an equivalent in Zig. Is there a way to do it?

Hi @fuji-184 welcome to Ziggit!

Zig doesn’t have a built-in utility for embedding a whole folder of files. It does, however, let you iterate over the files in a directory (for example in your build.zig) and create an anonymous import for each of those files. Each anonymous import can then be passed to @embedFile to access that file’s bytes from your source code.

Here is a simple game where I have done something similar: zig15game/build.zig at commit 5d0a33d753528df890f0ab5dd56cd04ff6d0e1c2 (SimonLSchlee/zig15game on GitHub). There I haven’t actually iterated over the folder, but that could be added; this guide shows how to iterate over a directory: Filesystem | zig.guide

You would have to create names for the anonymous imports (possibly based on the file names), because that name is the key you pass to @embedFile (see Language Reference - @embedFile).

The value of @embedFile("nameofmodule") is a pointer to a null-terminated byte array (*const [N:0]u8) holding the contents of the file you used to create the module; it coerces to a []const u8 slice.
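A minimal build.zig sketch of the directory-iteration idea, written against roughly the Zig 0.13 build API (b.path, root_source_file; these names shift between releases). The assets/ folder, the executable name and src/main.zig are hypothetical. Each regular file becomes an anonymous import named after the file, so a file assets/logo.png could be read in src/main.zig with @embedFile("logo.png").

```zig
const std = @import("std");

pub fn build(b: *std.Build) !void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "app",
        .root_source_file = b.path("src/main.zig"),
        .target = target,
        .optimize = optimize,
    });
    b.installArtifact(exe);

    // Open the (hypothetical) "assets" directory next to build.zig for iteration.
    var dir = try b.build_root.handle.openDir("assets", .{ .iterate = true });
    defer dir.close();

    var it = dir.iterate();
    while (try it.next()) |entry| {
        // Only top-level regular files; subdirectories are skipped in this sketch.
        if (entry.kind != .file) continue;
        // Register each file as an anonymous import named after the file,
        // which is the key src/main.zig passes to @embedFile.
        exe.root_module.addAnonymousImport(b.dupe(entry.name), .{
            .root_source_file = b.path(b.fmt("assets/{s}", .{entry.name})),
        });
    }
}
```

Because @embedFile needs a compile-time-known name, the source code still has to spell out each file name, so this works best when the names are stable. To recurse into subdirectories, std.fs.Dir.walk could replace the flat iterator.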

Is there a way to handle the paths and names dynamically, sir? What I’m trying to do is embed the build folder from SvelteKit (a JavaScript frontend framework) into the Zig binary. The names of the generated files change every time I build the project, and there are many of them because I have many routes, so it would be a pain to update all the file names in the build file every time I want to run the project.

Maybe you could run Svelte from your build.zig as a build step, then add a second step that depends on it, inspects the generated files, and packs them into a single binary blob containing enough metadata (for example file names and sizes) for your program to navigate it, and then embed that blob as a single file.
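A minimal sketch of one possible blob format, assuming a simple length-prefixed layout; the names Entry, pack, find and the file name assets.blob are illustrative, not an existing library. pack is what a build-time packing step (for example a small Zig program run via b.addRunArtifact that walks the Svelte output directory) could use to produce the blob; at runtime, find scans the embedded bytes by name, so the program never needs the generated file names at compile time.

```zig
const std = @import("std");

/// One file to pack: its relative path and its contents.
const Entry = struct { name: []const u8, data: []const u8 };

// Blob layout (all integers are little-endian u32):
//   entry_count
//   entry_count times: name_len, data_len, name bytes, data bytes

/// Number of bytes `pack` needs for `entries`.
fn packedSize(entries: []const Entry) usize {
    var size: usize = 4;
    for (entries) |e| size += 8 + e.name.len + e.data.len;
    return size;
}

fn writeU32(out: []u8, pos: *usize, value: u32) void {
    std.mem.writeInt(u32, out[pos.*..][0..4], value, .little);
    pos.* += 4;
}

fn readU32(blob: []const u8, pos: *usize) u32 {
    const value = std.mem.readInt(u32, blob[pos.*..][0..4], .little);
    pos.* += 4;
    return value;
}

/// Serializes `entries` into `out`, which must hold at least packedSize(entries) bytes.
fn pack(entries: []const Entry, out: []u8) []u8 {
    std.debug.assert(out.len >= packedSize(entries));
    var pos: usize = 0;
    writeU32(out, &pos, @intCast(entries.len));
    for (entries) |e| {
        writeU32(out, &pos, @intCast(e.name.len));
        writeU32(out, &pos, @intCast(e.data.len));
        @memcpy(out[pos..][0..e.name.len], e.name);
        pos += e.name.len;
        @memcpy(out[pos..][0..e.data.len], e.data);
        pos += e.data.len;
    }
    return out[0..pos];
}

/// Linear lookup by name. The returned slice points into `blob`, so nothing is copied.
fn find(blob: []const u8, name: []const u8) ?[]const u8 {
    var pos: usize = 0;
    const count = readU32(blob, &pos);
    var i: u32 = 0;
    while (i < count) : (i += 1) {
        const name_len = readU32(blob, &pos);
        const data_len = readU32(blob, &pos);
        const entry_name = blob[pos..][0..name_len];
        pos += name_len;
        const data = blob[pos..][0..data_len];
        pos += data_len;
        if (std.mem.eql(u8, entry_name, name)) return data;
    }
    return null;
}

pub fn main() void {
    const entries = [_]Entry{
        .{ .name = "index.html", .data = "<h1>hello</h1>" },
        .{ .name = "_app/immutable/start.js", .data = "console.log(1);" },
    };
    var buf: [256]u8 = undefined;
    // In the real program the blob would come from @embedFile("assets.blob") instead.
    const blob = pack(&entries, &buf);
    if (find(blob, "index.html")) |html| std.debug.print("{s}\n", .{html});
}
```

A linear scan is fine for a few hundred files; sorting the entries by name at pack time would allow a binary search instead.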

But I am not completely sure about the details, I think others will also be able to give some advice.

Are the names completely random, or do they stay the same as long as the routes don’t change? (Just asking, because I am wondering whether it would be possible to integrate Svelte with Zig’s caching system in some way; it would be cool if the cache could be used with it, but I don’t know how Svelte works in detail.)

Why would you need this? I assume from your input that you need to preserve the directory structure, but for what reason? If you only need to embed a directory to unpack it to disk later, it would be easier to use tar/zip and embed the archive.
But if you really need to get the directory structure in memory, you have to write some extra code as Sze explained.

@fuji-184 just as another option, you could have your debug build read files directly from your Svelte project’s build directory and only do this packing in your release build (for example, by switching between two implementations of the “datafetchers/accessors” based on the release mode).
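A small sketch of switching on the optimize mode at compile time via builtin.mode. The names AssetSource, sourceFor, diskPath and the frontend/build directory are hypothetical, and the actual file reading and blob lookup are left as comments. A build option created with b.addOptions would be an alternative if you want to flip this independently of the optimize mode.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Where asset bytes come from.
const AssetSource = enum { disk, embedded };

/// Debug builds read the Svelte output from disk; release builds use the packed data.
fn sourceFor(mode: std.builtin.OptimizeMode) AssetSource {
    return switch (mode) {
        .Debug => .disk,
        .ReleaseSafe, .ReleaseFast, .ReleaseSmall => .embedded,
    };
}

/// Picked at compile time from this build's optimize mode.
const asset_source = sourceFor(builtin.mode);

/// Hypothetical location of the SvelteKit build output, relative to the working directory.
const svelte_build_dir = "frontend/build";

/// Path of an asset inside the Svelte build directory (used in `disk` mode).
fn diskPath(buf: []u8, name: []const u8) ![]u8 {
    return std.fmt.bufPrint(buf, "{s}/{s}", .{ svelte_build_dir, name });
}

pub fn main() !void {
    switch (asset_source) {
        .disk => {
            var buf: [256]u8 = undefined;
            const path = try diskPath(&buf, "index.html");
            // A real accessor would open and read `path` here.
            std.debug.print("debug build: read {s} from disk\n", .{path});
        },
        // A real accessor would look the name up in the embedded data here.
        .embedded => std.debug.print("release build: serve index.html from the embedded data\n", .{}),
    }
}
```

Since asset_source is known at compile time, only the matching branch ends up in the binary.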

I think it is likely that this packing isn’t optimal in terms of development speed so it could make sense to skip it during development and only pack it for the final deploy.

You can embed a tar file and then use the standard library to read it.

First create the tar file:

~/tmp〉tar cf files.tar *.zig

Then in the program, use @embedFile and std.tar.iterator:

const std = @import("std");

// The archive's bytes are compiled into the executable.
const FILES_TAR = @embedFile("./files.tar");

pub fn main() !void {
    // Scratch space the tar iterator uses for entry names and symlink targets.
    var file_name_buffer: [std.fs.MAX_NAME_BYTES]u8 = undefined;
    var link_name_buffer: [std.fs.MAX_PATH_BYTES]u8 = undefined;
    // Wrap the embedded bytes in a stream so they can be read like a file.
    var files_tar_byte_stream = std.io.fixedBufferStream(FILES_TAR);

    var tar_iter = std.tar.iterator(files_tar_byte_stream.reader(), .{
        .file_name_buffer = &file_name_buffer,
        .link_name_buffer = &link_name_buffer,
    });

    // Visit each archive entry in order.
    while (try tar_iter.next()) |file_entry| {
        var line_count: usize = 0;

        // Read the entry's contents in chunks and count the newlines.
        var file_read_buffer: [4096]u8 = undefined;
        while (true) {
            const bytes_read = try file_entry.reader().readAll(&file_read_buffer);
            if (bytes_read == 0) break;
            line_count += std.mem.count(u8, file_read_buffer[0..bytes_read], "\n");
        }

        // fmtIntSizeDec prints a human-readable size such as "10.24kB".
        std.log.info("embedded file = \"{}\", size = {} bytes, {} newlines", .{
            std.zig.fmtEscapes(file_entry.name),
            std.fmt.fmtIntSizeDec(file_entry.size),
            line_count,
        });
    }
}
~/tmp〉zig run embedded-tar.zig
info: embedded file = "invalid-call.zig", size = 10.24kB bytes, 61 newlines
info: embedded file = "next-number.zig", size = 0.965kB bytes, 33 newlines
info: embedded file = "nothing-after.zig", size = 374B bytes, 14 newlines
info: embedded file = "pass-buffered-writer.zig", size = 347B bytes, 14 newlines

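Edit: I should mention the build.zig side. Note that std.tar.pipeToFileSystem goes the other way: it extracts an archive onto the file system rather than creating one. To produce the tar file during the build you can run the tar command as a build step, or write the archive yourself in a custom build step, but the details are outside the scope of this post for now.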
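One way to produce the tar from build.zig is to run the system tar command as a build step and hand its output to the executable as an anonymous import. This is a sketch against roughly the Zig 0.13 build API; it assumes tar and npm are on the PATH, a SvelteKit project in a hypothetical frontend/ directory whose output lands in frontend/build, and src/main.zig as the root file.

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "server",
        .root_source_file = b.path("src/main.zig"),
        .target = target,
        .optimize = optimize,
    });

    // Build the SvelteKit frontend first (assumes an npm project in ./frontend).
    const svelte = b.addSystemCommand(&.{ "npm", "run", "build" });
    svelte.setCwd(b.path("frontend"));

    // Then pack its output: tar cf <cache path>/files.tar -C frontend/build .
    const tar = b.addSystemCommand(&.{ "tar", "cf" });
    const files_tar = tar.addOutputFileArg("files.tar");
    tar.addArg("-C");
    tar.addDirectoryArg(b.path("frontend/build"));
    tar.addArg(".");
    tar.step.dependOn(&svelte.step);

    // The frontend sources are not declared as inputs, so always re-run both steps.
    svelte.has_side_effects = true;
    tar.has_side_effects = true;

    // Lets src/main.zig use @embedFile("files.tar").
    exe.root_module.addAnonymousImport("files.tar", .{ .root_source_file = files_tar });
    b.installArtifact(exe);
}
```

Forcing the steps to re-run is the bluntest option; it trades build speed for never embedding a stale frontend.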


The part I don’t like about tar is that it can split files into multiple chunks. If the whole data of the files is embedded in the executable, I would want it embedded in a way that lets me use the data directly, without first having to reassemble the pieces.

For that the basic question is where and how that data is used within the program, if it is just uploaded to the gpu (or streamed somewhere else as a response) then iterating over a tar might be fine because it needs to be copied anyway, but if it is used on the cpu I would want it there accessible in its final form, instead of having to make another copy.

Not sure if it suits your needs exactly, but I had similar requirements and wrote Stitch, which lets you attach blobs of data to executables under any name (such as paths or generated names). It’s both a library and a command-line tool.


I wrote embed-dir for this use case but that was 4 years ago so it probably does not compile anymore. Nowadays I think I’d do it differently. Anyways, it could still serve as inspiration.