Clever stuff!
The patch is straightforward enough. Given the screen shot, it functions correctly, as well. With the current tools available, this approach is probably optimal.
I remain curious what sort of minimal and general actions can be added to the build system to enable this kind of dependency flow. The essence of a build system is “detect changes to files and take action accordingly”, after all. So opening the file in a user function and comparing its contents is a tiny inner system that duplicates something the build system already knows how to do.
I’m musing here about how to a) generalize the idea and b) make it bulletproof by expressing it in the build graph.
So what are the hazards here? One is that this technique relies on a staging area, for a couple of reasons: you need the canonical version of the header to check against, and (generally, not necessarily here) we want to both fail on change and not mutate the program by clobbering the old version with the new one. This version doesn’t mutate the header file before it fails, but since CI throws away all the work, it would be acceptable if it did. Outside CI, though, that isn’t always acceptable.
In fact we’ve identified a general hazard with build.zig. The build system doesn’t just produce out-of-band artifacts, it’s also capable of changing the project itself. This feature is essential, but now we have a distinct class of problem: we want to be able to use the build system to make an up-to-date version of the project, and test it, but in general that means installing mutated files where they belong, and if that fails, the project is left in an inconsistent state.
Imagine that, instead of failure being defined by any change to the header, it’s defined by an artifact built with that header failing: we need that header to provide an API, and it doesn’t. Detecting that involves installing it.
Often this can be repaired with git reset --hard HEAD, but a) that’s using revision control to clean up and b) no one likes to type that string, for obvious reasons! I, for one, frequently run builds without having committed the changes I’m about to test.
The documentation is aware that this is a dangerous situation. Here’s the quote:
Be careful with this functionality; it should not be used during the normal build process, but as a utility run by a developer with intention to update source files, which will then be committed to version control. If it is done during the normal build process, it will cause caching and concurrency bugs.
So I’m seeing a couple of avenues for improvement. Keep in mind that I’ve read the current documentation, and spent some time with Build.zig, but the build system remains under-documented, and I’m not 100% sure what it can and can’t do right now.
The first level is providing a way to create an action dependent on a tracked file changing during the build process. The build system natively uses a staging area, so that would allow us to let the build system figure out whether a new artifact is the same as an old one, and give users an opportunity to insert an intermediate step. This is the one where I’m not confident that such an affordance doesn’t already exist.
For the motivating problem, that would eliminate all (I think) of the custom code. An onChange step could be made a dependency of the install step, and if the build is CI then it fails.
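To make the shape of that check concrete, here is a rough sketch in Python rather than Zig, since I don't want to guess at build-system APIs that may not exist. The names on_change, TrackedFileChanged and is_ci are all hypothetical; the only point is that the comparison is a digest check between the tracked file and the staged one, and that the tracked file is never touched.

```python
import hashlib
from pathlib import Path


class TrackedFileChanged(Exception):
    """Raised when a staged file differs from the tracked one during a CI build."""


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def on_change(tracked: Path, staged: Path, is_ci: bool) -> bool:
    """Hypothetical onChange hook (not a real build-system API).

    Reports whether installing `staged` over `tracked` would change it.
    In CI a change is treated as a failure; otherwise the caller decides
    what to do. The tracked file is only read, never written.
    """
    changed = not tracked.exists() or file_digest(tracked) != file_digest(staged)
    if changed and is_ci:
        raise TrackedFileChanged(f"{tracked} is out of date with respect to {staged}")
    return changed
```

Outside CI the return value would let the install step proceed and overwrite the tracked file; inside CI the exception stands in for a failed step.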
This works whenever the old and new files can be statically compared to determine if an invariant is violated. But that’s not always or even usually the case.
So the second thing to consider is adding a transaction concept to the build. As in databases generally, in the absence of an explicit transaction, every change is considered to be one. In this context, that means that an install step just installs the new version in the way it does now.
But if a step defines a transaction, then the old version of everything is saved to the cache before new versions are installed, and the build script has to either commit or rollback. Given that the build system creates a dependency graph, it should be possible to make it a load error to create a dependency graph where any transaction is not either committed or rolled back: load error meaning that build.zig will fail before taking any action if the graph doesn’t have the correct shape. That’s close enough to a compile error, and given comptime it probably can be a compile error.
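Here is a minimal sketch of what that load-time check could look like, again in Python and with entirely made-up names (Step, GraphLoadError, validate). It assumes a simplified rule: every step that begins a transaction must be a transitive dependency of at least one commit or rollback step for the same transaction. A real design would probably want stricter rules, such as requiring a closing step on both the success and failure paths.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class GraphLoadError(Exception):
    """Raised before any step runs if a transaction is never closed."""


@dataclass(eq=False)  # identity-based equality and hashing, so steps can live in sets
class Step:
    name: str
    kind: str = "plain"            # "plain", "begin", "commit" or "rollback"
    txn: Optional[str] = None      # transaction label, for non-plain steps
    deps: List["Step"] = field(default_factory=list)


def reachable(step: Step) -> set:
    """Return the step plus everything it transitively depends on."""
    seen = set()
    stack = [step]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(current.deps)
    return seen


def validate(root: Step) -> None:
    """Reject graphs where an opened transaction has no commit or rollback."""
    steps = reachable(root)
    for begin in (s for s in steps if s.kind == "begin"):
        closers = [
            s for s in steps
            if s.kind in ("commit", "rollback") and s.txn == begin.txn
        ]
        if not any(begin in reachable(closer) for closer in closers):
            raise GraphLoadError(f"transaction {begin.txn!r} is never closed")
```

The check only walks the graph and runs nothing, which is what makes it a load error rather than a runtime failure.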
That would cover all the bases. The build script can make arbitrary changes to the state of the project (as it can now) and also guarantee that if those changes don’t pass muster, they won’t be visible in the final state of the project at exit. The only failure mode left is that file systems aren’t actually a database, so the rollback can itself fail under rare circumstances, but that can be loudly and prominently displayed to the user if and when it happens. I’m not sure how far it’s appropriate to take this, in terms of chasing the long tail of reliability, but I want to note that for rollback failure, the cache would contain everything it needs to try the rollback again. I’d hazard that when we’re talking about actual filesystem / syscall failures like this, it’s ok to clean up out-of-band, as long as the problem is cleanly reported.
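As a sketch of the save-then-install-then-commit-or-rollback behaviour, here is a small Python version. InstallTransaction and its methods are hypothetical names, and a plain directory stands in for the build cache. The one property I wanted to show is that a backup is only deleted after its restore has succeeded, so a rollback that fails partway can simply be called again.

```python
import shutil
from pathlib import Path


class InstallTransaction:
    """Hypothetical install transaction; `cache_dir` stands in for the build cache."""

    def __init__(self, cache_dir: Path):
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.saved = []  # (target, backup path or None if target did not exist)

    def install(self, new_file: Path, target: Path) -> None:
        """Save the old version of `target` (if any), then install `new_file`."""
        backup = None
        if target.exists():
            backup = self.cache_dir / f"{len(self.saved)}-{target.name}"
            shutil.copy2(target, backup)
        self.saved.append((target, backup))
        shutil.copy2(new_file, target)

    def commit(self) -> None:
        """Keep the new files and discard the saved old versions."""
        for _, backup in self.saved:
            if backup is not None:
                backup.unlink()
        self.saved.clear()

    def rollback(self) -> None:
        """Restore old versions, most recent install first.

        An entry is dropped only after its restore succeeded, so if a
        filesystem error interrupts this, calling rollback() again retries
        from the point of failure.
        """
        while self.saved:
            target, backup = self.saved[-1]
            if backup is None:
                target.unlink(missing_ok=True)  # file was new: remove it
            else:
                shutil.copy2(backup, target)
                backup.unlink()
            self.saved.pop()
```

A build would call install for each mutated file, run whatever checks define success, and then call commit or rollback accordingly.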
Something to consider! This would generalize quite nicely: consider downloading nightlies of some dependency, and conditionally installing them in order to test the project against the new version, such that only passing versions actually get installed.