CI Infrastructure

I’m not too familiar with the CI infrastructure, but runner times are high. There are unnecessary workloads, and they add up in expenses.

  • Some files have no interaction with the build process. These could be split out into separate tarballs.
  • Other tool and reference files don’t require the entire bootstrap process in CI. These can be rebuilt independently and pushed into separate tarballs.
  • Changes to source files are often just benign comment edits or refactors. Not every such change should or needs to skip the bootstrap, but if we can run a quick check on these changes, a handful of runner jobs could be shaved off. I believe this could be smoother with some form of the #14656 proposal.
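The “quick check” in the last point could start as simply as asking whether a diff changes anything besides comments. Here is a minimal Python sketch under stated assumptions: the function names are hypothetical, it only covers comment edits (not refactors), and it is a line-based heuristic rather than a real tokenizer. It handles `//`, `///` and `//!` comments, `"..."` and `'...'` literals, and `\\` multiline string lines, and treats every other difference as significant. A sturdier version would likely be built on Zig’s own `std.zig.Tokenizer`.

```python
def strip_zig_comments(source):
    """Return Zig source lines with // comments removed and whitespace trimmed.

    Heuristic sketch, not a real tokenizer: "//" inside "..." string literals
    and '...' character literals is kept, and multiline string literal lines
    (starting with \\) are kept as-is apart from leading indentation.
    """
    lines = []
    for line in source.splitlines():
        body = line.lstrip()
        if body.startswith("\\\\"):
            # Multiline string literal line: everything after \\ is content.
            lines.append(body)
            continue
        kept = []
        quote = None  # '"' or "'" while inside a literal, otherwise None
        i = 0
        while i < len(body):
            ch = body[i]
            if quote is not None:
                kept.append(ch)
                if ch == "\\" and i + 1 < len(body):
                    kept.append(body[i + 1])  # copy the escaped character verbatim
                    i += 2
                    continue
                if ch == quote:
                    quote = None
            elif ch in "\"'":
                quote = ch
                kept.append(ch)
            elif body.startswith("//", i):
                break  # the rest of the line is a comment
            else:
                kept.append(ch)
            i += 1
        lines.append("".join(kept).strip())
    return lines


def is_comment_only_change(old_source, new_source):
    """True if the two versions differ only in comments and blank lines."""
    def significant(src):
        return [ln for ln in strip_zig_comments(src) if ln]
    return significant(old_source) == significant(new_source)
```

A CI job could run this over each changed `.zig` file and only consider skipping the bootstrap when every file passes. The check errs on the conservative side: anything it cannot prove to be a comment edit counts as a real change.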

Those are only a few obvious steps. Less straightforward is considering specialized images. Things like Unikraft push the envelope and can speed up runners by huge margins, but that’s admittedly still rough around the edges. Overall, the more initiative is taken on this, the easier it will be to budget for infrastructure down the line (see financial report).


Our CI right now works fine, to be honest. When we have a big merge train it can get a little behind, but that’s no more than a slight annoyance – anyone severely affected by it is probably kind of misusing the CI, since you should always run tests locally before opening a PR.

The only reason it’s behind ATM is that our aarch64-macos runners went offline for a day or so, because the team member who owns the physical machines is traveling.

Hey Matthew, I appreciate your insight, but here’s my perspective:

  • I think we need to open up budgeting opportunities long term, especially as PRs, tests, and tier-1 targets in CI pile up. Ideally, CI should be efficient and affordable for third parties to reproduce.

  • Contributors can’t run cross-platform tests locally, nor can their reports be guaranteed. Test jobs check a PR against all cases, but they shouldn’t churn through the whole bootstrap on benign changes when they could conditionally opt for smaller, selective jobs where appropriate.

  • For developer productivity, CI times matter alongside compilation speeds. Given LLVM support, you can’t count on Zig’s own compiler backends to erode those bottlenecks.
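The “smaller selective jobs” idea, together with splitting independent files into their own tarballs, boils down to mapping the paths a PR touches to the jobs they actually affect. Here is a minimal Python sketch, assuming a hypothetical directory layout and hypothetical job names (`docs`, `rebuild-tools`, `full-bootstrap`). Any path that no rule matches falls back to the full bootstrap, so the filter can only save work and never skips work that is needed.

```python
from fnmatch import fnmatchcase

# Hypothetical path-to-job rules; real directory and job names would differ.
# Note: fnmatch's "*" also matches "/", so "doc/*" covers nested files too.
RULES = [
    ("doc/*", "docs"),
    ("tools/*", "rebuild-tools"),
    ("src/*", "full-bootstrap"),
    ("lib/*", "full-bootstrap"),
]


def select_jobs(changed_paths):
    """Return the sorted list of CI jobs needed for the given changed paths."""
    jobs = set()
    for path in changed_paths:
        for pattern, job in RULES:
            if fnmatchcase(path, pattern):
                jobs.add(job)
                break
        else:
            # No rule matched: be conservative and run the full bootstrap.
            jobs.add("full-bootstrap")
    return sorted(jobs)
```

For example, a docs-only PR would select just `["docs"]`, while a PR touching an unlisted file such as a top-level README would still get the full bootstrap. Pairing this with a comment-only check on source files would let the selector drop `full-bootstrap` in more cases, at the cost of trusting that check.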

As Zig nears production use, and perhaps once security vulnerabilities have to be dealt with, forethought in this regard can be valuable. I believe it’s important and beneficial to address.