build.zig.zon: check upstream checksum

I’m fiddling today with the build system and something bugs me.

(For context: I’m trying to compile a Qt hello world via zig and fetch the Qt libs via the build system. Might or might not be a bad idea, but it’s not directly relevant here.)

Problem statement:

I have http://random_mirror_im_not_fully_confident_with.com/tarball.tar.xz
I also have an md5sum for tarball.tar.xz from the official website, so I want to check it.
I can do this by hand, so far so good.
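For reference, a minimal sketch of that manual check. A stand-in file replaces the real tarball so the snippet is self-contained; in a real run, `expected` would be the md5 pasted from the official website, not computed locally:

```shell
# Stand-in for the archive downloaded from the mirror:
printf 'stand-in tarball contents' > tarball.tar.xz

# In a real run, paste the md5 published on the official website here.
# (Computed locally only to keep this sketch self-contained.)
expected=$(md5sum tarball.tar.xz | awk '{print $1}')

# Checksum of what was actually downloaded:
actual=$(md5sum tarball.tar.xz | awk '{print $1}')

if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
else
    echo "checksum MISMATCH" >&2
    exit 1
fi
```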

If I use the url directly as .url in the build.zig.zon, zig will nicely and helpfully download the source, extract it and provide a .hash = ... for me.
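Concretely, the dependency entry ends up looking something like this (the dependency name is made up for illustration, and the hash placeholder stands for whatever zig prints after the first fetch):

```zig
// build.zig.zon (sketch; .qt_libs and the hash value are placeholders)
.dependencies = .{
    .qt_libs = .{
        .url = "http://random_mirror_im_not_fully_confident_with.com/tarball.tar.xz",
        .hash = "...", // paste the value zig reports here
    },
},
```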
(btw: I love the scheme in 0.16 that gives you both an easy-to-reach inflated folder and a canonical re-compressed version in the global cache, but I digress)

Now, what bugs me is: how do I make sure the stuff zig downloaded is what’s expected? I have zig’s provided hash that “protects” whatever is in zig-pkg, but it’s inflated, and I cannot run md5sum on a folder.
I was thinking I could download manually, run the checksum, manually inflate, and point .path = ... at it in the zon file, but in that case the .hash = ... is not used, as far as I know. So while that works for me locally, I could not switch to upstream fetching with the same confidence.

It feels like zig is encouraging me to blindly fetch this stuff, and it feels wrong that zig is not pushing me toward good practice.

Am I missing something obvious? Does this seem like a legitimate concern? How would you address it?

Digression - where I just want to babble on software dependencies

The stuff above is pretty low stakes in my little experiment, but I’m genuinely surprised to feel zig pushing me toward not doing the right thing. I’ve been following zig closely for some years now; while the “this is hard” feeling has been common, the “this is wrong” feeling is new (or at least crazy rare).

Supply-chain attacks become more of a concern by the day, and it seems to me the pip/npm-style package manager model will become unsustainable.
But boy, the convenience! Manual dependency management is a hard sell in comparison (until you have a catastrophic ecosystem collapse, of course, because trust is gained slowly but breaks fast…)

Anyways, I find zig’s approach so interesting (as usual, tbh). Strategic friction, where the bad idea is just slightly harder to do: offer convenience in usage, but still force manual authoring of dependencies. It’s a fine line to walk, and the “decentralized but fingerprinted” road zig has taken so far has delighted me.
But the example above made me realize that just blindly running zig fetch is a big temptation, and… not sure honestly, I just had to put that out somewhere I guess.

Note that I’m not saying zig fetch --save is a bad idea. It’s a brilliant QoL feature. But it makes me wonder if something is in a local optimum here…

Anyways, love zig :high_voltage:


I think you can zig fetch /path/on/my/machine, and that will print the Zig checksum of the package’s content. So: download manually, cross-check the checksum against the website, then run zig fetch ~/downloads/package.tar.gz to learn the checksum, then put the result into your build.zig.zon.


I think this is actually a good concern, and the community could maybe adapt to it by writing down the expected hash in the release notes for their packages. Maybe the zig build system could also implement some sort of signature verification in the future? (imo overkill)

I agree with your digression though: zig is not yet on the level of how distro packagers do their thing, but it is miles ahead of typical software development setups. zig fetch / zig build --fetch does allow you to do offline builds, and once we get proper sandboxing and trusted mirrors we’ll be closer to how distro maintainers package software.


I guess the hash is mostly useful as a cache key, and less for security - which I guess would need a central and trusted registry(?).

In the end you’re responsible for the external packages you’re pulling in, not the package author.

E.g. the main problem of npm isn’t that supply chain attacks are possible, but that typical JS projects pull in thousands of dependencies.

Ah!

So zig fetch --help specifies it can point to a

  - A tarball file (with or without compression) containing package source

which appears to be a category that encompasses “folder”.
But reaching for zig fetch for a local resource didn’t occur to me at all.

In hindsight, it makes sense. Thank you very much for the tip.

I did the exercise, and I end up with the exact same hash.
Download by hand → checksum → extract → zig fetch /local/path → update the zon file with the remote .url.
Slightly cumbersome, but acceptable for the developer I guess, since one person has to do it once and then others can trust the source via the hash.

What I find a bit confusing, however, is that after zig fetch --save local/path/folder, the .zon file looks like

.dependencies = .{
    .dep_name = .{
        .url = "local/path/folder",
        .hash = "...",
    },
},

zig build is happy with this situation as long as the canonical tarball is present in the cache. But if you remove it, you are left with error: invalid URI: InvalidFormat.

Maybe zig fetch --save local/path/folder should generate the hash along with something like .url = "replace me with a correct url for this package"… A minor inconsistency for the time being, I suppose.

Mmmh. Yeah, “protect” is ambiguous here, I suppose.

It was my understanding that the zig-pkg hash is a checksum of the package contents as well as a registry cache key.

So my build.zig.zon asks the user to fetch www.upstream.com/noice.zip, but it is delivered with a checksum that will prevent compilation in the event of upstream.com being compromised.

In the end you’re responsible for the external packages you’re pulling in, not the package author.

Agreed. So, as the package author my job is to make sure I distribute the hash that indeed checksums the intended content, hence my original question.

Of course, there are various threat models. My understanding is that as long as I get a legit build.zig.zon, I can be confident the dependencies I pull will be as intended by the package author. Kind of a tree of trust: you have no guarantees on the entry point, but if you trust that first zon file, you know you will at least end up with whatever the author of each recursive dependency intended.

Whereas pip install noice will pull in noice’s dependencies, probably the latest version of each (because version pinning is opt-in in the build system - and even version pinning doesn’t tell you much about the content), and you can end up with something totally different from what was present at a given remote url at the time the author looked at it. On any of the recursive dependencies!

Maybe this is an incomplete or incorrect understanding… Please correct me if this is inaccurate.

E.g. the main problem of npm isn’t that supply chain attacks are possible, but that typical JS projects pull in thousands of dependencies.

Well, it sure is a problem. I would, however, see it more as a multiplying risk factor than the main problem. As far as supply-chain attacks go, you have some level of risk on each dependency, each bringing some attack surface.
Interestingly enough, writing the above made me think that pulling dependencies recursively is maybe already some kind of footgun. Maybe each package author should provide a flat list of pinned, checksummed dependencies that you would need to copy-paste manually, so there is strong friction against the dependency tree going too deep. Dunno, a tricky tradeoff with convenience, and also going slowly but surely off topic. :wink: But thanks for the intellectual stimulation.


Re package management, I kinda like how Deno solves publishing to JSR. You just run deno publish; this opens the browser and requires an (infrequent) login to the JSR package registry, and a page with an ‘approve’ button appears. Pressing the button communicates back to Deno that publishing has finished.

It requires infrastructure and associated cost though, and it is as centralized as it gets - which I think doesn’t quite fit Zig’s philosophy (not sure the ‘package security question’ can be solved without at least a central registry that stores package URL + version + hash - but it wouldn’t need to do the actual hosting).