Dependency proxy

I recently encountered a situation where deleting a tag in a dependency repository broke my build.

To avoid this problem, I wrote a small wrapper that stores copies of dependencies, but this required changing the URLs in build.zig.zon (even though the hashes stayed the same). For an open-source project, it would also require making the cache public.

In this situation, it would be useful to have a dependency proxying protocol like GOPROXY.

Internally, it could work like this:

  • In build.zig.zon, you specify the original URL, for example, https://curl.se/download/curl-8.17.0.tar.xz
  • If the environment variable ZIGPROXY=https://zm.org is specified, the archive will be downloaded through it, for example, using the URL https://zm.org?upstream=https://curl.se/download/curl-8.17.0.tar.xz
  • Since zig build will check the archive hash, you can use untrusted mirrors.
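
A sketch of how this could look end to end (the hash value is illustrative; ZIGPROXY is hypothetical):

```zig
// build.zig.zon (sketch): the manifest keeps the original upstream URL.
.dependencies = .{
    .curl = .{
        .url = "https://curl.se/download/curl-8.17.0.tar.xz",
        // The hash is what identifies the package; any mirror that serves
        // matching bytes is acceptable. (Illustrative value.)
        .hash = "N-V-__8AAHZtBwG9QppZpuNfPc2a1fXJcbcTdrduloHsdV2f",
    },
},
```

With ZIGPROXY=https://zm.org set, the fetcher would request https://zm.org?upstream=https://curl.se/download/curl-8.17.0.tar.xz and verify the result against the hash as usual, so the proxy itself never needs to be trusted.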

This could also be an option for existing community mirrors, although it would increase their storage requirements.

Another option is to make it very easy to immutably vendor dependencies.

I really like the build.zig.zon design, where a dependency is identified by its hash, not by its URL or name. This enables transparent caching at the project level (vendoring), at the machine level (~/.zig-cache), and at the ecosystem level (a hypothetical zig-proxy).

> If the environment variable ZIGPROXY=https://zm.org is specified, the archive will be downloaded through it, for example, using the URL https://zm.org?upstream=https://curl.se/download/curl-8.17.0.tar.xz

I think this URL should use the package hash rather than the upstream URL:

> This field is the source of truth; packages do not come from a `url`; they come from a `hash`. `url` is just one of many possible mirrors for how to obtain a package matching this `hash`.

I think it's already possible today to write a tool that populates ~/.zig-cache? You won't get automatic integration with zig build, but you can tell users something like:

> Run `zig build fetch-from-cache` if the project fails to build
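
Such a tool could be a thin wrapper around `zig fetch`, which already downloads an archive into the global package cache and prints its hash. A minimal sketch, with a hypothetical mirror URL:

```sh
# Warm the package cache by hand so that a later `zig build` finds the
# dependency. zig stores the archive under its computed hash, so it
# doesn't matter which mirror the bytes came from.
zig fetch https://mirror.example.com/curl-8.17.0.tar.xz
```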

I agree that the hash is the source of truth, but to use it as a global ID, the mirror must maintain a registry mapping hashes back to URLs.

Also, if the mirror wants to proactively download dependencies (for example, by tracking new versions with nvchecker), it needs human-readable names as well.
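
Something like this hypothetical registry on the mirror side (name and format invented for illustration):

```zig
// mirror-registry.zon (sketch): maps package hashes to known upstreams so
// the mirror can resolve hash-only requests and proactively re-fetch.
.{
    .@"N-V-__8AAHZtBwG9QppZpuNfPc2a1fXJcbcTdrduloHsdV2f" = .{
        .name = "curl",
        .upstream = "https://github.com/curl/curl/archive/8fc23088db24d60390c93f123c11fc18accd7c8c.tar.gz",
    },
}
```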

Actually, I made a tool that modifies build.zig.zon without storing the changes in the repository, but that seems suboptimal. I hadn't considered manually patching .zig-cache; would that be portable across Zig versions?

2 Likes

zig can just provide the URL to the proxy in addition to the hash; if the package isn't yet cached under that hash, the proxy can fetch it from the URL.

There is never a situation where zig won’t have a URL to give the proxy.

Pre-downloading dependencies is not relevant to zig; it should not affect how zig interacts with the proxy.

1 Like

Yes, but in a sense the URL is optional. If there's nothing in the cache, the proxy can simply return a 404. For example, this could be useful if the mirror distributes private packages to companies; such packages may not formally have a URL at all.

Something like this:

GET https://pkg.earth/zig/pkg/N-V-__8AAHZtBwG9QppZpuNfPc2a1fXJcbcTdrduloHsdV2f?url=https://github.com/curl/curl/archive/8fc23088db24d60390c93f123c11fc18accd7c8c.tar.gz&source=zig-build

1 Like

The proxy can behave however it wants.

My point was: there is no reason for zig not to provide the URL to the proxy, so that the proxy can use it to fetch the package if it needs to.

I would imagine such companies would already have their own repositories or archives available on a private network; the URL can simply point to the package on that network.

1 Like

For clarity, note that if you use a git source, deleting a git tag will not break your build, since `zig fetch` always first resolves it to a commit hash.

I’m guessing in your case you were depending on a generated tarball that went away once you deleted the tag.

There are plans for being able to specify mirrors; that said, a proxy is a slightly different story that hasn't been discussed yet, AFAIK.

My recommendation in the meantime is to use git instead of non-statically-hosted tarballs.
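
For reference, a commit-pinned git dependency in build.zig.zon looks roughly like this (hash elided; the commit is only an example):

```zig
.curl = .{
    // A git+https URL pinned to a commit: deleting a tag upstream will
    // not invalidate this, unlike a forge-generated tarball URL.
    .url = "git+https://github.com/curl/curl#8fc23088db24d60390c93f123c11fc18accd7c8c",
    .hash = "...", // filled in by `zig fetch --save`
},
```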

2 Likes

Thanks for the tip. Using the commit hash would indeed work.

But that doesn't solve the problem entirely. Some dependencies are available on GH, others only on a Forgejo instance, and C libraries are often only available on the project's own website. A failure of any one of them breaks the build. The author can also delete the repository entirely (I've actually encountered this, albeit in a C++ build).

In the long term, I’d like to have reproducible builds that don’t depend on external infrastructure.

2 Likes

Tbh I see the current decentralization (e.g. no central package registry and hosting) as one of the main strengths of the Zig package manager, especially because a ‘package’ can be completely free-form (doesn’t have to be a Zig project, instead just a bunch of files without a build.zig.zon and which are hosted anywhere).

Projects which require more control can still host their dependencies on their own infrastructure (this may require an ‘override feature’ though, so that a top-level project can override the URLs of sub-dependencies - but IIRC such a thing was planned already).

PS: it would also be nice if a specific package ‘snapshot’ can continue to be entirely defined by its URL without any additional info outside the URL, so that the whole system works with the regular internet infrastructure (CDNs for caching etc) without having to implement a special ‘package manager protocol’ which requires some sort of custom backend for hosting packages.

1 Like

It would never have occurred to me to suggest a centralized package registry :) Keeping copies of all my dependencies is exactly what I want.

As I mentioned above, there’s currently no convenient way to duplicate everything on my infrastructure.

You have to download all the packages to an internal CDN and change the URLs in build.zig.zon, but that doesn't work for open-source projects (you'd end up serving every build your users make) and doesn't work for transitive dependencies (you're not going to fork them all just to change the URLs).

The idea is to preserve the original URLs in build.zig.zon so that anyone who wants to can route any downloads through a caching proxy.

This isn't my idea; it's standard practice in Go (which also uses URLs as package IDs), Rust, npm, and most other ecosystems.

But unlike npm/Maven, a Zig dependency is just an archive, which means a plain Varnish instance is a valid proxy for Zig. Just cache all the archives and you're good to go.

1 Like

It's not really about control. If your dependencies are decentralized, then the probability that the project builds at any given moment decreases exponentially with the number of dependencies. Any git forge or static hosting is unavailable at least some of the time, and you need all your deps to be available at the same time to build. My prediction here is that any decentralized system grows either vendoring or mirroring/proxying (see, e.g., the Go proxy or the Nix cache). Though, I am intrigued by support torrenting for fetching and serving packages (ziglang/zig#23236, https://github.com/ziglang/zig/issues/23236); it might be a particularly nice way to implement mirroring.
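
To put rough numbers on it: if each host is independently reachable with probability p, a cold build needing n hosts succeeds with probability p^n. Even at p = 0.999, with n = 50 dependencies that is 0.999^50 ≈ 0.95, i.e. roughly one cold build in twenty hits an unavailable host.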

1 Like

I’ll add that Go has gone through this process. Versions prior to 1.13 downloaded packages directly from the source (for example, GH), which often broke builds.

Later, Google deployed https://proxy.golang.org/ as a centralized entrypoint, and since version 1.13, all dependency installations go through this service by default.
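
For comparison, Go's default proxy configuration looks like this (this part is real, not hypothetical):

```sh
# Go's default since 1.13: try the central proxy first, then fall back
# to fetching directly from the origin.
export GOPROXY=https://proxy.golang.org,direct
```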

I think the lesson is that matklad is right, and the system should become distributed as early as possible to remain reliable and independent.

A good example is package managers in Linux (particularly apt, dnf, pacman). There are thousands of mirrors; your VPS provider likely offers one.

P.S. If a mirror can be set globally, CI can transparently speed up dependency downloads by maintaining a local mirror.
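
For example, with the hypothetical ZIGPROXY variable from earlier in the thread:

```sh
# Sketch: route all package downloads through a mirror on the CI network.
export ZIGPROXY=https://zig-mirror.internal
zig build test
```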

Yeah ‘control’ is poorly worded but I mean the same thing.

IMHO a centralized package host is a much juicier attack target than a system where packages are hosted all over the internet (and yeah, this torrent approach might be just the right solution).

FWIW, apart from breaking language/stdlib changes, my Github CI builds mainly fail because of some problem within the Github infrastructure, not because some external host isn’t available.

1 Like

I don't want to gloat, but GitHub is partially down :) https://www.githubstatus.com/incidents/1jw8ltnr1qrj