Fetching dependencies without build.zig.zon?

Is there some build.zig API which allows me to fetch a dependency without pre-declaring it in build.zig.zon? I want to avoid .zon because:

  • my thing isn’t actually a reusable Zig package with downstream dependencies, so I don’t need it
  • I’d rather not have an extra file in the root of my repository (though, if there’s a way to just inline build.zig.zon into my build.zig, that would work for me).
  • and, in general, I’d love to have the flexibility of just programmatically downloading hashed stuff.

Have you tried just using the http client from the standard library in the build.zig?

The short answer is most likely no: a build.zig.zon manifest is required for build system dependencies.

The longer answer is, it depends on what your definition of a dependency is in this context. If it’s a build system dependency that you expect to be able to resolve via b.dependency("foo", .{}) and subsequently obtain modules/artifacts from, then you definitely need a build.zig.zon.

The exception would be if it’s a “dumb tarball” of files and not a Zig package with a build.zig, in which case you could in theory create a small Zig program that takes an archive URL and an output directory as command-line args and fetches/extracts that archive to the specified output directory. You could then invoke that program with b.addRunArtifact in combination with std.Build.Step.Run.addOutputDirectoryArg to obtain LazyPath handles to the directory and/or its files. But all of this would almost certainly be both worse performing and more work for you than just using the package manager as intended.
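
A rough sketch of that wiring, assuming a hypothetical `tools/fetch_archive.zig` program (the tool name, URL, and the `exe` compile step are placeholders, not anything from this thread):

```zig
// Sketch only: assumes tools/fetch_archive.zig accepts <url> <output-dir>
// and downloads/extracts the archive into the output directory.
const fetch_tool = b.addExecutable(.{
    .name = "fetch-archive",
    .root_source_file = b.path("tools/fetch_archive.zig"),
    .target = b.graph.host,
});
const run = b.addRunArtifact(fetch_tool);
run.addArg("https://example.com/some-archive.tar.gz"); // placeholder URL
// addOutputDirectoryArg passes a cache-backed output directory path to
// the program and hands us a LazyPath to that directory:
const extracted = run.addOutputDirectoryArg("extracted");
// The LazyPath can then feed other steps, e.g. an assumed compile step:
exe.addIncludePath(extracted);
```

Because the output directory comes from `addOutputDirectoryArg`, the run step participates in the build cache rather than re-fetching on every build.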

Would you mind expanding upon why you feel an aversion toward having a build.zig.zon in the first place?

Don’t run imperative code that accesses the file system or the network directly from your build.zig. In the future, the build system will be sandboxed, and your build function will be prohibited from accessing system resources, so this is not future proof, and even now it’s a poor idea since the code will not take advantage of the build system cache and might be inadvertently invoked by innocuous things like zig build --help or tooling like ZLS. If you need to e.g. fetch something from the Internet, you should use a run step as described above. The only thing your build function should do is declaratively construct a (serializable) graph of build steps to be processed by a build runner.

Yeah you could totally do this pretty easily by adding some “Run” steps that run zig fetch. This is similar to a custom build step I added years ago called “GitRepoStep” (zig-build-repos/GitRepoStep.zig at master · marler8997/zig-build-repos · GitHub). I no longer use it since Zig added build.zig.zon but I could imagine some niche use cases where maybe it’s worth it to add this sort of thing especially if you’re not making an actual Zig package meant to be used by others.

Aha, zig fetch indeed looks like what I need. What would be the best way to integrate that with build.zig? I can do

    const stdout = b.addSystemCommand(
        &.{ b.graph.zig_exe, "fetch", url },
    ).captureStdOut();

which would give me a hash of the download.

What would be the right way to get ~/.cache/zig/p/ path I should be joining that hash to?

One of the main reasons dependencies are required to be in build.zig.zon is so that there is a canonical, simple, declarative place to list all the dependencies of a project. This is useful for all kinds of tooling, including third-party stuff like package indexes which want to know which packages are depended on the most.

The use case you are describing goes against that premise, so that’s why it’s an uphill battle. Personally I don’t find those bulleted points against having the declarative file compelling.

Here’s a sample implementation:

// REDACTED: this version would fetch the package from the
//           network every time, see my reply for a better version

One thing to note is that this does solve a current problem with the build system around lazy dependencies. This example implements a true lazy dependency; however, I have a prototype working that will fix lazy dependencies for all build.zig files as well (Associate lazy dependencies with steps and only fetch them if those steps are built · Issue #21525 · ziglang/zig · GitHub). The core team may not get to it before the next release, though, so it may be a while.

P.S. this implementation is also bad because I think it will fetch the entire dependency from the internet EVERY TIME!!! very bad. You can fix this by adding a “hash” field to the step and using that to check if it already exists beforehand.

I couldn’t help myself, here’s the fixed version that properly checks if the package has already been fetched and doesn’t try to download it from the network every time:

const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const stb = ZigFetch.create(b, .{
        .url = "https://github.com/nothings/stb/archive/31707d14fdb75da66b3eed52a2236a70af0d0960.tar.gz",
        .hash = "1220aefdc5ff6261afb86675b54f987d9e86c575049b2050ee8f23d49c954ff4970a",
    });
    const exe = b.addExecutable(.{
        .name = "example",
        .root_source_file = b.path("src/main.zig"),
        .target = target,
        .optimize = optimize,
    });
    exe.addIncludePath(stb.getLazyPath());
    exe.linkLibC();
    b.installArtifact(exe);
}

const ZigFetchOptions = struct {
    url: []const u8,
    hash: []const u8,
};
const ZigFetch = struct {
    step: std.Build.Step,
    url: []const u8,
    hash: []const u8,

    already_fetched: bool,
    pkg_path_dont_use_me_directly: []const u8,
    lazy_fetch_stdout: std.Build.LazyPath,
    generated_directory: std.Build.GeneratedFile,
    pub fn create(b: *std.Build, opt: ZigFetchOptions) *ZigFetch {
        const run = b.addSystemCommand(&.{ b.graph.zig_exe, "fetch", opt.url });
        const fetch = b.allocator.create(ZigFetch) catch @panic("OOM");
        const pkg_path = b.pathJoin(&.{
            b.graph.global_cache_root.path.?,
            "p",
            opt.hash,
        });
        const already_fetched = if (std.fs.cwd().access(pkg_path, .{}))
            true
        else |err| switch (err) {
            error.FileNotFound => false,
            else => |e| std.debug.panic("access '{s}' failed with {s}", .{pkg_path, @errorName(e)}),
        };
        fetch.* = .{
            .step = std.Build.Step.init(.{
                .id = .custom,
                .name = b.fmt("zig fetch {s}", .{opt.url}),
                .owner = b,
                .makeFn = make,
            }),
            .url = b.allocator.dupe(u8, opt.url) catch @panic("OOM"),
            .hash = b.allocator.dupe(u8, opt.hash) catch @panic("OOM"),
            .pkg_path_dont_use_me_directly = pkg_path,
            .already_fetched = already_fetched,
            .lazy_fetch_stdout = run.captureStdOut(),
            .generated_directory = .{
                .step = &fetch.step,
            },
        };
        if (!already_fetched) {
            fetch.step.dependOn(&run.step);
        }
        return fetch;
    }
    pub fn getLazyPath(self: *const ZigFetch) std.Build.LazyPath {
        return .{ .generated = .{ .file = &self.generated_directory } };
    }
    pub fn path(self: *ZigFetch, sub_path: []const u8) std.Build.LazyPath {
        return self.getLazyPath().path(self.step.owner, sub_path);
    }
    fn make(step: *std.Build.Step, prog_node: std.Progress.Node) !void {
        _ = prog_node;
        const b = step.owner;
        const fetch: *ZigFetch = @fieldParentPtr("step", step);
        if (!fetch.already_fetched) {
            const sha = blk: {
                var file = try std.fs.openFileAbsolute(fetch.lazy_fetch_stdout.getPath(b), .{});
                defer file.close();
                break :blk try file.readToEndAlloc(b.allocator, 999);
            };
            const sha_stripped = std.mem.trimRight(u8, sha, "\r\n");
            if (!std.mem.eql(u8, sha_stripped, fetch.hash)) return step.fail(
                "hash mismatch: declared {s} but the fetched package has {s}",
                .{ fetch.hash, sha_stripped },
            );
        }
        fetch.generated_directory.path = fetch.pkg_path_dont_use_me_directly;
    }
};

Yeah, totally! This is for personal consumption only, I’d recommend against this for anyone else. Dependencies which you can parse rather than compute are a huge deal and are a requirement for having a robust ecosystem of interdependent stuff.

For reference, let me describe the specific use case where I think I need what I need:

In TigerBeetle, we currently are shelling out to gh release download in a couple of places:

I’d love to avoid shelling out to gh there, as it is really not necessary.

The first one we use to download pre-compiled copies of llvm-objcopy. This use-case I think would be nicely covered by using build.zig.zon, except that:

  • we really don’t like adding more files to the root of our repository. jorandirkgreef wants us to remove .gitignore even, which I myself am slowly coming around to :rofl:
  • the proper fix here is teaching zig objcopy to do what we use llvm-objcopy for, the whole current thing in general is a stop-gap nonsense.

The second one is more interesting — here, we download the previous release of TigerBeetle from GitHub. We need that for testing our upgrade path. This use-case I think isn’t covered by build.zig.zon:

  • there’s a matrix of things (version × os × arch × debug), and generating it programmatically is easier
  • in fact, because there’s a new version and a new release every week, you can’t really even encode all the hashes at all.

I understand that the proper path for me is to write my own step that combines Zig’s http client with zip, chmod, and (optionally) sha, but I am just wondering if there’s some sort of zig curl around which I could just re-use :upside_down_face: I am totally cool with the answer being “no, just write the custom thing yourself, if you need it”!

That would be zig fetch:

andy@bark ~> zig fetch --help
Usage: zig fetch [options] <url>
Usage: zig fetch [options] <path>

    Copy a package into the global cache and print its hash.

Options:
  -h, --help                    Print this help and exit
  --global-cache-dir [path]     Override path to global Zig cache directory
  --debug-hash                  Print verbose hash information to stdout
  --save                        Add the fetched package to build.zig.zon
  --save=[name]                 Add the fetched package to build.zig.zon as name
  --save-exact                  Add the fetched package to build.zig.zon, storing the URL verbatim
  --save-exact=[name]           Add the fetched package to build.zig.zon as name, storing the URL verbatim

I’m open to expanding its capabilities, provided that the best practices use case is not sabotaged.

I’m in the process of finalising a small case study to share with the community about what we’ve done integrating Zig Build into a moderately complex non-Zig project. I ended up implementing a separate fetch-assets tool, which maintains a manifest with hashes. There’s also a code generator to generate zig build rules, but that’s not relevant to this topic.

The main build.zig is here: rcloud/build.zig at feat/pomi/14-zig-build-take-two · pomi601/rcloud · GitHub and you can see the libraries it depends on to pull in and run the fetch-assets tool from a dependency by following the links in the build.zig.zon.

I’m not sure zig fetch would work for our use case, but I’d need to think about it more. One issue I ran into is that zig fetch assumes that a .tar.gz file is a source tarball and it unpacks it. This is not desirable in our non-Zig use case, because we need to maintain access to the original .tar.gz file.

Oh, that is too good. Here, I made a worse version of it:

// Use 'zig fetch' to download and unpack the specified URL, optionally verifying the checksum.
fn fetch(b: *std.Build, options: struct {
    url: []const u8,
    file_name: []const u8,
    hash: ?[]const u8,
}) std.Build.LazyPath {
    const copy_from_cache = b.addRunArtifact(b.addExecutable(.{
        .name = "copy-from-cache",
        .root_source_file = b.addWriteFiles().add("main.zig",
            \\const std = @import("std");
            \\const assert = std.debug.assert;
            \\pub fn main() !void {
            \\    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
            \\    const allocator = arena.allocator();
            \\    const args = try std.process.argsAlloc(allocator);
            \\    assert(args.len == 5 or args.len == 6);
            \\
            \\    // Trim the trailing newline (handles both LF and CRLF).
            \\    const hash_and_newline = try std.fs.cwd().readFileAlloc(allocator, args[2], 128);
            \\    const hash = std.mem.trimRight(u8, hash_and_newline, "\r\n");
            \\    if (args.len == 6 and !std.mem.eql(u8, args[5], hash)) {
            \\        std.debug.panic(
            \\            \\bad hash
            \\            \\specified:  {s}
            \\            \\downloaded: {s}
            \\            \\
            \\        , .{args[5], hash, });
            \\    }
            \\    const source_path = try std.fs.path.join(allocator, &.{args[1], hash, args[3]});
            \\    try std.fs.cwd().copyFile(
            \\        source_path,
            \\        std.fs.cwd(),
            \\        args[4],
            \\        .{},
            \\    );
            \\}
        ),
        .target = b.graph.host,
    }));
    copy_from_cache.addArg(
        b.graph.global_cache_root.join(b.allocator, &.{"p"}) catch @panic("OOM"),
    );
    copy_from_cache.addFileArg(
        b.addSystemCommand(&.{ b.graph.zig_exe, "fetch", options.url }).captureStdOut(),
    );
    copy_from_cache.addArg(options.file_name);
    const result = copy_from_cache.addOutputFileArg(options.file_name);
    if (options.hash) |hash| {
        copy_from_cache.addArg(hash);
    }
    return result;
}
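
For reference, a hypothetical call site for the `fetch` helper above (the URL, file name, and follow-up command are placeholders for illustration):

```zig
// Sketch only: pull a single file out of a "dumb tarball" and use it
// as an input to another step.
const objcopy_path = fetch(b, .{
    .url = "https://example.com/llvm-objcopy.tar.gz", // placeholder URL
    .file_name = "llvm-objcopy",
    .hash = null, // pass a multihash string here to verify the download
});
const run_tool = b.addSystemCommand(&.{"some-command"}); // placeholder
run_tool.addFileArg(objcopy_path);
```

Passing `null` for `hash` skips verification, matching the optional fifth argument in the embedded copy-from-cache program.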

(More seriously, I was always annoyed by just how much boilerplate one needs to write for a custom step, and it looks like an inline run step is actually more concise here!)

With an .unpack = false, this would be helpful. I would still need to be able to programmatically add dependencies to the .zon file, since the dependencies are derived from analysis of a set of source files, but I think parsing and emitting zon is already in std somewhere. I’ll have to have a look at this again.

Madman. You just implemented the equivalent of JavaScript’s eval for Zig :slight_smile:

That’s an improvement over a custom step because I fully intend to break custom steps by making the “configure” phase and “make” phase be in separate processes. The benefit of doing it that way instead of a custom step is that since it runs in a separate process, it can have lowered privileges, it is guaranteed to integrate properly with the build system’s caching, file system watching, error reporting, concurrency, and it will have sandboxed failures and be independently killed if it hogs too many resources.

It seems an easy win would be to allow specifying the hash on the CLI, for caching and integrity verification:

Feels like most users of the zig fetch CLI would actually want that?

Most users would not use this because the main use case is fetching a dependency for the first time and you do not know the hash. If you already knew the hash you could just put it and the URL directly into your build.zig.zon, no need to use zig fetch.

If I understand correctly you’re using this tool in some kind of automated infrastructure, whereas the original motivation was developer tooling, manually executed when adding a new dependency to a project.

Ahhhh, right, I totally missed that! So zig fetch is both a build.zig.zon editor, à la cargo add or npm install --save, but it can also be used as a low-level download tool, à la curl.

Yeah, then it’s not clear to me whether what I want is in scope for zig fetch at all.

Well, I’m not sure I would go that far. I think your use case is interesting, too. I mean, like you said we have this networking implementation handy, might as well expose its capabilities for tooling that wants to take advantage of it, right?