Zig build: How to handle an unknown number of files in a folder

Hello everyone,

I’m currently building a game using Zig, and I would like to use the Zig build system as an asset pipeline for this game. Such a pipeline would, for example, take all the files with a given extension in a folder (in my case Aseprite files), run a command on each of them (to export them to one or more .png + .json files), put the results in another folder, and run another command on that folder to pack all the .png files into a nice sprite sheet that gets installed alongside the game.

I want to use the Zig build system to harness its built-in caching and --watch feature for live asset rebuild/reloading in the engine (I got it to work for my shaders already), but I don’t know if the build system can handle “all the files in a folder” without writing the path of every file manually in build.zig (which would be kinda painful for my artist friend). I tried poking around the build.zig API but didn’t find anything resembling what I want for that system.

The only Zig project I know of that handles a “folder of files” might be Zine, but I don’t know whether it supports the same feature set that I want.

Alternatively, would it be possible for a Zig program to use the same API as the Zig build system to know which files have changed and where to cache them, so I could build an asset pipeline myself and add that program as a build step?

Thanks !


I think this should work:
In your build.zig, iterate over your resources folder and, for each file, call addOutputFileArg on its run step to get a lazy path. Use this lazy path as the root_source_file of createModule.
Create a writer from a std.ArrayList. For each file in the resources folder, append a line, so that when the iteration ends, the ArrayList contains something like this:

pub const sprite1 = @embedFile("sprite1");
pub const sprite2 = @embedFile("sprite2");
...

Pass this data to a WriteFile step, to get a lazy path. Make this step depend on all the “run command” steps.
Create a module passing the lazy path from WriteFile as the root_source_file and list all the other modules as imports.
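Putting those steps together, a build.zig might look roughly like the sketch below. This is untested; the "assets" folder name, the aseprite command-line flags, and the use of addAnonymousImport to make an exported file visible to @embedFile are all assumptions on my part, so adjust to your actual setup:

```zig
const std = @import("std");

pub fn build(b: *std.Build) !void {
    // Accumulates the generated "resources.zig" source text.
    var source = std.ArrayList(u8).init(b.allocator);
    const w = source.writer();

    // The generated module; per-sprite imports are attached below.
    const resources_mod = b.createModule(.{});

    var dir = try b.build_root.handle.openDir("assets", .{ .iterate = true });
    defer dir.close();
    var it = dir.iterate();
    while (try it.next()) |entry| {
        if (entry.kind != .file) continue;
        if (!std.mem.endsWith(u8, entry.name, ".aseprite")) continue;
        const stem = std.fs.path.stem(entry.name);

        // One export command per file; the output is a lazy path.
        const run = b.addSystemCommand(&.{ "aseprite", "-b" });
        run.addFileArg(b.path(b.fmt("assets/{s}", .{entry.name})));
        run.addArg("--save-as");
        const png = run.addOutputFileArg(b.fmt("{s}.png", .{stem}));

        // Make the exported file importable, then reference it by name
        // in the generated source.
        resources_mod.addAnonymousImport(stem, .{ .root_source_file = png });
        try w.print("pub const {s} = @embedFile(\"{s}\");\n", .{ stem, stem });
    }

    // Write the generated source and use it as the module root.
    const wf = b.addWriteFiles();
    resources_mod.root_source_file = wf.add("resources.zig", source.items);

    const exe = b.addExecutable(.{
        .name = "game",
        .root_source_file = b.path("src/main.zig"),
        .target = b.standardTargetOptions(.{}),
        .optimize = b.standardOptimizeOption(.{}),
    });
    exe.root_module.addImport("resources", resources_mod);
    b.installArtifact(exe);
}
```

Because the output files flow through addOutputFileArg, the WriteFile step and the module pick up the dependency edges automatically.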

Now in your consumer module, you can access them like this:

const resources = @import("resources");
const sprite1 = resources.sprite1;

Use b.build_root.handle to get a std.fs.Dir for the root of your project; opening your resources directory through this handle lets you use the iterate() function to get a directory iterator.
b.addSystemCommand is your friend

use addFileArg for inputs too, otherwise Zig won’t rebuild when the inputs change.

note that both of those functions are on the step returned by b.addSystemCommand

use b.addOptions() to get a *std.Build.Step.Options; this lets you declaratively make a module, much nicer and more useful than a WriteFile
it works like this: options.addOption(T, "option_name", value)
then add it to your root module with addOptions("import_name", options)
and you will be able to @import("import_name").option_name to your heart’s content :3
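A minimal sketch of the Options approach; the option names and values here are made up for illustration:

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const exe = b.addExecutable(.{
        .name = "game",
        .root_source_file = b.path("src/main.zig"),
        .target = b.standardTargetOptions(.{}),
        .optimize = b.standardOptimizeOption(.{}),
    });

    // Declaratively build a small config module.
    const options = b.addOptions();
    options.addOption(usize, "sprite_count", 12);
    options.addOption([]const u8, "sheet_path", "sprites/atlas.png");
    exe.root_module.addOptions("build_config", options);

    b.installArtifact(exe);
}
```

Then, in the game code, `const cfg = @import("build_config");` gives you `cfg.sprite_count` and `cfg.sheet_path` as compile-time constants.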

Thanks for the tips !

One issue I see is that by listing the files in the setup phase of build.zig, newly added files won’t be picked up while the build is in --watch mode, which is one of the use cases I would like to support.

I’ll experiment with that and come back with the results

I would not recommend iterating directories in your build.zig script, in part because as you noticed it won’t play well with --watch but also in part because it’s considered poor form to access or query the host system using std.process or std.fs APIs from your declarative build.zig.

What I would personally try is to write a tiny Zig CLI program in a separate file that takes paths to input files/directories as well as an output directory as command-line arguments, iterates through the file system for input files, and uses std.process.Child to pass them on to the command you need to invoke, such that the resulting files are written to the specified output directory.

The Running the Project’s Tools section of the build system docs has an example of compiling and invoking a program in this way. If you run your compiled utility program using b.addRunArtifact(), pass input paths using run_step.addFileArg() and run_step.addDirectoryArg() and obtain a handle to the output directory using run_step.addOutputDirectoryArg(), the build system should cache everything and detect changes.

Note the use of should, because unfortunately it doesn’t look like std.Build.Step.Run currently properly caches and watches directory input arguments, so what I suggested might not work after all. It might work if you manually also call run_step.step.addDirectoryWatchInput() for each input directory argument, but I have not actually tried this in practice.
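The wiring in build.zig could look something like the fragment below. The tool path "tools/pack.zig" and the argument order are assumptions, and (per the caveat above) the directory-watching behavior is untried:

```zig
// Compile the helper tool for the host machine, not the game's target.
const pack_exe = b.addExecutable(.{
    .name = "pack-assets",
    .root_source_file = b.path("tools/pack.zig"),
    .target = b.graph.host,
});

const run = b.addRunArtifact(pack_exe);
run.addDirectoryArg(b.path("assets")); // input directory
const out_dir = run.addOutputDirectoryArg("packed"); // cached output dir

// Per the caveat above, directory inputs may not be watched properly;
// this extra call might help, but I have not tried it:
// _ = try run.step.addDirectoryWatchInput(b.path("assets"));

// Install the packed output next to the game binary.
b.installDirectory(.{
    .source_dir = out_dir,
    .install_dir = .bin,
    .install_subdir = "assets",
});
```

Because the output directory is a lazy path produced by the run step, anything that consumes it automatically depends on the tool having run.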


Oh I see. However, doesn’t running a sub-program that only takes a directory via run_step.addDirectoryArg() mean that the command will be invoked every time the directory changes, and thus rebuild all my assets instead of only the ones that have actually changed?

Yup, as long as you make sure to use the file arg input/output functions you should be fine. I’m not sure why they’re complaining about std.process; I explicitly said to use the build system’s system command utilities.
The only reason I suggested std.fs is that the build system doesn’t provide anything as fine-grained; as you said, it would be annoying to rebuild every asset when only one changed.

Unfortunately, this is where you have to make a choice between explicitly specifying each file individually, which is fast and only rebuilds what is needed but may result in added files not being detected and removed files causing errors, or watching an entire directory tree, which should properly handle added/removed files but will rebuild everything even if just a single file in that directory tree changes (which will also quickly bloat the cache).

I don’t know if the lack of support for globbing in the build system is an intentional design decision or something that is planned to be addressed in the future, but it should be noted that several prominent modern build systems with a heavy emphasis on performance, for example Meson, don’t support globbing because scanning a directory tree for changes is slow and unreliable.

I don’t know what your workflow is, but if adding or removing assets is a thing that happens infrequently, I would personally probably bite the bullet and list each file individually. It’s the fastest and most cache-friendly option. On 0.14.0-dev, with support for @importing ZON files, you could even have this “asset manifest” in a different file from the build.zig, if having them be separate makes things cleaner and easier to work with.
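A rough sketch of what such a hand-maintained manifest could look like, assuming 0.14.0-dev’s ZON imports work from build.zig; the file name assets.zon and its layout are invented for illustration:

```zig
// assets.zon — a hand-maintained manifest (hypothetical file):
//     .{
//         .sprites = .{ "hero.aseprite", "tiles.aseprite" },
//     }

const std = @import("std");

// Import the manifest at comptime instead of scanning the file system.
const manifest = @import("assets.zon");

pub fn build(b: *std.Build) void {
    inline for (manifest.sprites) |name| {
        // Create one export/run step per listed file, as discussed above.
        _ = name;
    }
    _ = b;
}
```

Adding a new asset then means adding one line to assets.zon, which is much friendlier to a non-programmer than editing build.zig itself.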

I used std.process as an example because I’ve seen people do things like running git rev-parse inline for version numbers. The main reason I discourage using APIs that access the host is that there are plans to run build.zig scripts in a WASM sandbox as well as separate the “configure” and “build” phases into two distinct processes by serializing the build graph, so it’s probably unwise to rely on methods that will almost certainly stop working in the future. Your build.zig should ideally be as “pure” as possible.

The two things are not necessarily mutually exclusive. I think it would make sense to have the build system let you add entries to the list of watched directories.

That said it’s true that this is currently not supported AFAIK.

What if we create a Zig program that uses the existing cache functionality to create a manifest for one specific directory (I think the manifest is basically an index that also contains a hash of all the hashes)?

Wouldn’t that file then only change if any of the contents of that directory change?

And then a dependent build step could read that manifest file and create an embedding of all the files.

Just a thought, maybe somebody can try it out, or point out why it may not work.


Maybe std.Build.Step.writeManifestAndWatch or std.Build.Cache.Manifest.addListOfFiles could be used? I currently haven’t looked into the implementation enough to fully understand how it works or is meant to be used…

But when I look at Manifest in the docs, it seems to me as if manifests could be used to solve the multiple-file synchronization problem, similar to how they already seem to be used to manage packages? But my understanding is lacking, and it would be good to have the perspective of somebody who has worked on the build system internals.