Deduplicating identical build steps

Help me find a nice pattern for the following CI setup. For the sake of an example, suppose I am
building a binary with unit tests and integration tests.

So, I might have something like

fn build_binary(b: *Build, target: ResolvedTarget, optimize: OptimizeMode) *Step.Compile {
    const root_module = b.addModule("binary", .{
        .root_source_file = b.path("./src/main.zig"),
        ...
    });
    ...
}

fn build_unit_tests(b: *Build, target: ResolvedTarget, optimize: OptimizeMode) *Step.Compile { ... }

fn build_integration_tests(b: *Build, target: ResolvedTarget, optimize: OptimizeMode) *Step.Compile {
    const binary = build_binary(b, target, optimize);

    const options = b.addOptions();
    options.addOptionPath("binary", binary.getEmittedBin());

    const root_module = b.addModule("test", .{
        .root_source_file = b.path("./src/test_integration.zig"),
        ...
    });
    ...
}

Note that the integration tests depend on the binary build.

And then I wire things up in the main build function:

const target = b.standardTargetOptions(.{});
const optimize = b.standardOptimizeOption(.{});

b.step("compile", "").dependOn(&build_binary(b, target, optimize).step);

b.step("test:unit", "").dependOn(&b.addRunArtifact(
    build_unit_tests(b, target, optimize),
).step);

b.step("test:integration", "").dependOn(&b.addRunArtifact(
    build_integration_tests(b, target, optimize),
).step);

Note that the code above creates two identical Step.Compile instances that build the binary. I
think this is OK: if I run zig build compile and then zig build test:integration, the binary
will be built only once. Although the steps are distinct objects in the build.zig process memory, they are
equal as values, have the same dependency information, and reuse each other's cache entries.

Now, suppose I also want a CI step that compiles my binary for a set of supported
targets and runs the tests in Debug and ReleaseSafe:

const ci = b.step("ci", "");
for (supported_targets) |target| {
    ci.dependOn(&build_binary(b, target, .Debug).step);
}
for ([_]OptimizeMode{ .Debug, .ReleaseSafe }) |optimize| {
    ci.dependOn(&b.addRunArtifact(build_unit_tests(b, b.graph.host, optimize)).step);
    ci.dependOn(&b.addRunArtifact(build_integration_tests(b, b.graph.host, optimize)).step);
}

Now this becomes more problematic: a single invocation of zig build ci will attempt to build the
binary twice, once in the target loop for cross-compilation and once in the optimize loop for the
integration tests. Again, this probably won’t lead to compiling the code twice, but I think at least
the “is this fresh” checks will be executed twice? And if I have dependencies with
“has_side_effects = true” (e.g., embedding the current time or git commit as build metadata), I think
those will also be executed twice, and could actually force extra compilation?

And, with two loops, it’s not entirely trivial to re-use the build_binary step programmatically,
as I need to match the right target and optimization level.

It looks like I need some sort of step interning here, where I cache the steps created so far:

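fn build_binary(b: *Build, target: ResolvedTarget, optimize: OptimizeMode) *Step.Compile {
    // Function-local static: one cache shared by every call in this build.zig process.
    const Cache = struct {
        var global: std.StringArrayHashMapUnmanaged(*Step.Compile) = .{};
    };

    // std.Target contains an untagged union (Os.VersionRange), so ResolvedTarget can't be
    // auto-hashed. Key on the target triple plus the optimize mode instead; this assumes
    // targets that share a triple are interchangeable.
    const triple = target.query.zigTriple(b.allocator) catch @panic("OOM");
    const key = b.fmt("{s}-{s}", .{ triple, @tagName(optimize) });

    const gop = Cache.global.getOrPut(b.allocator, key) catch @panic("OOM");
    if (gop.found_existing) return gop.value_ptr.*;

    const binary = ...; // create the Step.Compile as before
    gop.value_ptr.* = binary;
    return binary;
}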

but this feels somewhat complicated, especially if I need to do this for many different kinds of
steps and want to manage the cache in a less ad hoc manner than via a local static variable
(admittedly, local statics feel OK for build.zig).
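A less ad hoc variant of the cache above is a small generic helper that owns the map and calls a creation function only on a miss, so each kind of step gets one instance instead of its own hand-written static. This is a hypothetical sketch (Interner and get are illustrative names, not part of std.Build), and it assumes the key type can be auto-hashed, e.g. an integer, an enum, or a struct of those. That rules out ResolvedTarget itself, so a caller might key on an index into supported_targets instead.

```zig
const std = @import("std");

/// Hypothetical helper: memoizes `create(ctx, key)` so that each distinct key
/// is created once and later lookups return the stored value.
pub fn Interner(
    comptime Key: type,
    comptime Value: type,
    comptime Ctx: type,
    comptime create: fn (Ctx, Key) Value,
) type {
    return struct {
        const Self = @This();

        // Key must be hashable by std's auto hashing (no slices, no untagged unions).
        map: std.AutoArrayHashMapUnmanaged(Key, Value) = .{},

        /// Returns the value for `key`, calling `create` only on the first request.
        pub fn get(self: *Self, gpa: std.mem.Allocator, ctx: Ctx, key: Key) !Value {
            const gop = try self.map.getOrPut(gpa, key);
            if (!gop.found_existing) {
                // `create` must not call `get` on this same interner: a nested
                // insert could resize the map and invalidate `gop.value_ptr`.
                gop.value_ptr.* = create(ctx, key);
            }
            return gop.value_ptr.*;
        }

        pub fn deinit(self: *Self, gpa: std.mem.Allocator) void {
            self.map.deinit(gpa);
        }
    };
}
```

In build.zig, the interners for binaries, unit tests, and so on could live in one struct that is passed to the helper functions, with the *Build (and the supported_targets slice) as the context; that keeps the cache lifetime explicit rather than tied to function-local statics.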

Is this the best pattern to “deduplicate” the build graph?

I believe you can do the following:

  • make building the binary its own named step.
  • use dependencyFromBuildZig with @This() to get the build file as its own dependency, passing in the target and optimize values.
  • use that dependency’s artifact() method to get the compile step out of the self-dependency.

I think this would work for you. (I normally test before answering, but I’m away from my computer and I’m being lazy.)


Wow, I didn’t realize that dependencyFromBuildZig exists. This is neat, thanks!


Why don’t you just pass the *Step.Compile you already created, instead of making new ones?

fn build_unit_tests(b: *Build, binary: *Step.Compile) *Step.Compile { ... }

fn build_integration_tests(b: *Build, binary: *Step.Compile) *Step.Compile {
    const options = b.addOptions();
    options.addOptionPath("binary", binary.getEmittedBin());

    const root_module = b.addModule("test", .{
        .root_source_file = b.path("./src/test_integration.zig"),
        .target = binary.root_module.resolved_target,
        .optimize = binary.root_module.optimize,
        ...
    });
    ...
}

fn build_ci_all_targets(b: *Build, binaries: []const *Step.Compile) *Step {
    const ci = b.step("ci", "");
    for (binaries) |binary| {
        ci.dependOn(&binary.step);
    }
    return ci;
}

In the CI step, instead of

for (supported_targets) |target| {
    ci.dependOn(&build_binary(b, target, .Debug).step);
}
for ([_]OptimizeMode{ .Debug, .ReleaseSafe }) |optimize| {
    ci.dependOn(&b.addRunArtifact(build_unit_tests(b, b.graph.host, optimize)).step);
    ci.dependOn(&b.addRunArtifact(build_integration_tests(b, b.graph.host, optimize)).step);
}

I can do

for (supported_targets) |target| {
    ci.dependOn(&build_binary(b, target, .Debug).step);
}
for ([_]OptimizeMode{ .Debug, .ReleaseSafe }) |optimize| {
    const binary = build_binary(b, b.graph.host, optimize);
    ci.dependOn(&b.addRunArtifact(build_unit_tests(b, b.graph.host, optimize)).step);
    ci.dependOn(&b.addRunArtifact(build_integration_tests(b, b.graph.host, optimize, binary)).step);
}

but this just makes the duplication explicit. To remove the duplication, I would also need to somehow match the build_binary calls in the first loop with those in the second loop, and that mapping is non-trivial because the loops iterate over independent axes.
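One way to make the two axes meet is to first compute the set of distinct binary configurations the CI step needs, then create each binary once per entry. The sketch below is a hypothetical, build-system-free illustration: Config and ciConfigs are made-up names, targets are identified by their index into supported_targets, and it assumes the host is one of those targets (otherwise the two loops share nothing).

```zig
const std = @import("std");
const OptimizeMode = std.builtin.OptimizeMode;

/// Hypothetical key: an index into `supported_targets` plus an optimize mode.
/// Unlike ResolvedTarget, both fields can be auto-hashed.
pub const Config = struct { target: usize, optimize: OptimizeMode };

/// Collects the distinct binary configurations that the CI step needs.
/// Assumes the host is one of the supported targets, at index `host_index`.
pub fn ciConfigs(
    gpa: std.mem.Allocator,
    target_count: usize,
    host_index: usize,
    test_modes: []const OptimizeMode,
) !std.AutoArrayHashMapUnmanaged(Config, void) {
    var set: std.AutoArrayHashMapUnmanaged(Config, void) = .{};
    errdefer set.deinit(gpa);

    // First axis: cross-compile every supported target in Debug.
    for (0..target_count) |i| {
        try set.put(gpa, .{ .target = i, .optimize = .Debug }, {});
    }
    // Second axis: the host binary in every mode the integration tests use.
    // `put` on an existing key keeps a single entry, so (host, Debug) is stored once.
    for (test_modes) |mode| {
        try set.put(gpa, .{ .target = host_index, .optimize = mode }, {});
    }
    return set;
}
```

build.zig could then call build_binary once per entry of set.keys(), store the results in a parallel slice of *Step.Compile, and have both loops look up their binary with set.getIndex(config).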