Checked in modules?

With node.js, it’s handy to be able to check node_modules into source control. This ensures that they cannot be externally modified/404’d and makes my repository stand-alone so I can archive it with confidence.

I can achieve this by walking my build.zig.zon dependencies, downloading each one, and changing its url to a directory path. But is there already a tool or mechanism for doing this? Should there be?

Perhaps “zig ossify/shrinkwrap/fossilise”, etc.

Never tried this, so might be wrong, but:

  • The correct terminology here would be vendoring.
  • The key idea is the zig fetch command, which takes a directory as an argument and puts its contents into the cache of downloaded packages, identified by the hash of the package’s content. Content-hashing is key: it means Zig doesn’t really care where the package is coming from, as long as the hash matches.
  • You can write a zig build vendor-save command which iterates over the dependencies in your build.zig.zon file and saves them to a ./dependencies folder inside your repository.
  • You can then write a zig build vendor-load command, which iterates over the contents of the ./dependencies directory and calls zig fetch on each entry.

With this setup, you can manually call zig build vendor-save after adding a new dependency, and zig build vendor-load before building the source code.
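For concreteness, here is a minimal, untested sketch of what the vendor-load half could look like in build.zig. It assumes the checked-in copies live directly under ./dependencies, uses b.graph.zig_exe to locate the compiler running the build, and only stubs out vendor-save as a comment:

    const std = @import("std");

    pub fn build(b: *std.Build) void {
        // vendor-load: run `zig fetch` on every entry of ./dependencies so the
        // global package cache is populated from the checked-in copies.
        const vendor_load = b.step("vendor-load", "Populate the package cache from ./dependencies");

        const maybe_deps: ?std.fs.Dir =
            std.fs.cwd().openDir("dependencies", .{ .iterate = true }) catch null;
        if (maybe_deps) |deps| {
            var it = deps.iterate();
            // (an error during iteration simply ends the loop in this sketch)
            while (it.next() catch null) |entry| {
                const fetch = b.addSystemCommand(&.{ b.graph.zig_exe, "fetch" });
                fetch.addArg(b.pathJoin(&.{ "dependencies", entry.name }));
                vendor_load.dependOn(&fetch.step);
            }
        }

        // vendor-save would be the mirror image: resolve each dependency listed in
        // build.zig.zon out of the global cache and copy it into ./dependencies,
        // e.g. via a custom step; that part is omitted here.
        _ = b.step("vendor-save", "Copy fetched dependencies into ./dependencies (not implemented in this sketch)");
    }

Since this runs in the configure phase, the fetch commands are declared on every invocation but only executed when you actually run zig build vendor-load.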

I don’t know how to automate this though. Ah! I guess you can have some code in the configure phase of the build (so inside fn build, not in fn make of some step) that just checks that build.zig.zon, ./dependencies, and the global package cache are all in sync. It should be relatively fast, I think.
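A very rough sketch of such a check, written as a helper you could call from fn build. Big assumptions here: entries in ./dependencies are named after their package hashes (which the vendor-save convention above could guarantee), and the global cache keeps packages under p/<hash>, which is the current layout but not a stable guarantee. Checking build.zig.zon itself would additionally require parsing that file and is omitted:

    const std = @import("std");

    /// Warn about any checked-in dependency that is missing from the global
    /// package cache. Call this near the top of `pub fn build`.
    fn checkVendorSync(b: *std.Build) void {
        const maybe_deps: ?std.fs.Dir =
            std.fs.cwd().openDir("dependencies", .{ .iterate = true }) catch null;
        const deps = maybe_deps orelse return; // no ./dependencies yet: nothing to check
        var it = deps.iterate();
        while (it.next() catch null) |entry| {
            // Assumes entry.name is the package hash and the cache layout is p/<hash>.
            const cached = b.fmt("p/{s}", .{entry.name});
            b.graph.global_cache_root.handle.access(cached, .{}) catch {
                std.log.warn(
                    "vendored dependency '{s}' is not in the package cache; run `zig build vendor-load`",
                    .{entry.name},
                );
            };
        }
    }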

10 Likes

A similar concept to this is supply chain security.

At a company, you may want to protect yourself from a dependency changing underneath you (perhaps an attacker replacing the current version with a malicious one). This is covered by the hash, but your builds will still stop if the tarballs become unavailable.

It’s common to use a tool like Artifactory to store your tarballs on company servers for faster / more reliable fetching.

I’m also not sure how this can be automated. Since there is no concept of a central repository right now, one cannot just pass --repository-url=repository.mycompany.com in the build.

One idea might be to add an alternative protocol to the package manager, instead of https, that uses content addressing (urls that don’t refer to a location but contain hashes that represent the content). The company could then host its own node/server for that protocol which returns the content for those hash addresses.

I imagine something like nix/guix/ipfs or various peer-to-peer systems could be used as the technology. The hash makes sure the content is valid; something then downloads the content, over some other protocol, from a url that is basically just a hash.
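For illustration: today’s build.zig.zon already keeps location and identity separate, so a content-addressed scheme would only need to replace the url. The cas:// form and the hash value below are made up, not anything Zig supports:

    .dependencies = .{
        .some_dep = .{
            // Where to get it (location)...
            .url = "https://example.com/some_dep-1.2.3.tar.gz",
            // ...and what it must be (identity); the build fails if the two disagree.
            .hash = "1220aaaa_placeholder_not_a_real_hash",
            // A hypothetical content-addressed form would collapse the two,
            // letting any node/mirror that can serve the content answer:
            // .url = "cas://1220aaaa_placeholder_not_a_real_hash",
        },
    },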

1 Like

There is an intentional bit of friction for this use case because the purpose of packages is distribution of labor, and “vendoring” dependencies is antithetical to participating in that process. It interferes with the premise that the goal of a dependency is to be using the same code as other people, collaborating on bug fixes and enhancements by sending them upstream, and making a continuous effort to use an unpatched version.

The two stated goals here are:

  • Ensure they cannot be externally modified
  • Ensure they can be fetched

The first goal is already solved with content hashes. If an external modification occurs, it would be equivalent to a fetch failure, so really we are left with only the second point, which is planned to be addressed with mirrors.

An implicit point being made here is that using source control means that dependencies and source have the same failure mode: either all of it worked, or all of it failed, and that is a desirable property. Similarly, a tarball can be created that contains the source as well as all dependencies.

Before I dig into that, I want to split that into two parts and then cast shade on one of them. The two parts:

  • Storing dependencies in the same location as the source
  • Ability to ship a tarball that contains dependencies as well as source

For the second one, I think this use case is generally invalid because the purpose of a tarball source distribution is entirely for package managers, for example Debian or Homebrew. However system package managers don’t want vendored dependencies, in fact that is typically disqualifying and those vendored deps have to be patched out in favor of real dependencies that are managed via the system package manager.

So if you’ve agreed with me so far, then I’ve whittled the use case down to merely the ability to mirror the dependencies in source control. That means this use case can be solved with some more creative ideas than copy-pasting into a “dependencies” folder in the source tree.

For example, if your source control system is git, dependencies could be copied onto a special, separate branch, and a mirror could be added in build.zig.zon to reference this data. Git even has some fancy data store things where this data could be addressed in a more sophisticated manner without a user-visible branch, but that’s a bit outside my current knowledge area.

Some related Zig issues to track:

4 Likes

I typed all that out but I also want to add that I think it would make sense to add support for a “packages” directory declared in build.zig.zon which basically uses the directory as mirrors. And then sure, why not, some tool to auto-populate this directory. Whether or not this can avoid the data being redundantly stored in the global cache is a separate issue. There might be some neat trick that can be done there with hard links.
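Purely to illustrate the idea, such a declaration in build.zig.zon might look like this; the field name is invented and nothing like it exists today:

    .{
        // Hypothetical field (invented name): a directory treated as a set of
        // mirrors, consulted by hash before any url is fetched.
        .packages = "packages",

        // Everything else stays exactly as it is today.
        .dependencies = .{
            // ...
        },
    }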

Related:

4 Likes

TBH, vendoring in the sense of having a downstream-patched copy of a dependency, and vendoring in the sense of providing a redundant source for the same content hashes, feel like completely different use-cases to me, which perhaps shouldn’t be lumped together. One is very bad; the other, I argue, is essential.

If you have many dependencies, then the probability that all of them are available at any given point in time can be quite low, as this is an AND property and you need to multiply the individual probabilities (for example, 50 dependencies that are each available 99.9% of the time are all available only about 95% of the time).

So some solution is worthwhile here. Rust & Node solve it by taking care of hosting the source of truth for dependencies; Go & Nix solve it via a centralized caching layer. Either is pretty costly! I’d imagine Zig wants something more decentralized, and local vendoring feels like a decent solution, provided there is enough checksum verification along the way to make the use-case of patching dependencies sufficiently inconvenient.

(of course, there’s also the option of not having dependencies 🙃)

If I am not mistaken, we are in full agreement; perhaps my intro paragraph was misleading. The friction here, I realized after typing out my thoughts, has specifically to do with using path for what is canonically maintained by a third party, rather than treating the vendored copy as a mirror.

1 Like

Which one is very bad? In my experience, neither is, and at least one is essential while the other becomes essential over time. Specifically for a cross-build-capable solution at the systems level itself, in my experience and opinion, these are the bare necessities for successful, continuous, and historical (cross-/systems) building.

From a professional embedded / cross-building / systems-providing background, I would want to be able to do both with my tool of choice. I hope Zig is open to that argument.

There is an additional use case, which is the need to supply a source code package to a customer for building on a machine with no public internet access, or with access that is severely broken, like the current HTTP proxy bug in Zig (which breaks Zig’s ability to fetch dependencies).

Because I couldn’t figure out an official way to do this, we hacked our own solution: a JSON manifest file, build steps which create fat distributions, and a build option to point to a provided assets directory.
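For what it’s worth, the build-option part of a workaround like that might look roughly like the sketch below; the option name and the manifest handling are invented, and this is not the actual build.zig of the project mentioned next:

    const std = @import("std");

    pub fn build(b: *std.Build) void {
        // When set, dependency archives are taken from a local directory shipped
        // alongside the source instead of being fetched over the network.
        const assets_dir = b.option(
            []const u8,
            "assets-dir",
            "Directory containing pre-fetched dependency archives",
        );

        if (assets_dir) |dir| {
            // Here the JSON manifest would be read to map dependency names to
            // files under `dir`, e.g. by invoking `zig fetch` on each archive.
            _ = dir;
        }

        // ... the rest of the build graph ...
    }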

Somewhat described in the README here:

1 Like