Initial implementation of `zig build --watch` just landed in master branch

andrewrk · July 13, 2024, 2:16am

introduce file system watching features to the zig build system

ziglang:master ← ziglang:watch

opened 10:32PM - 10 Jul 24 UTC

## Feature Explanation

```
--watch            Continuously rebuild when source files are modified
--debounce <ms>    Delay before rebuilding after changed file detected
```

Uses the build system's perfect knowledge of all file system inputs to the pipeline to keep the build runner alive after completion, watching the minimal number of directories in order to trigger re-running only the dirty steps from the graph.

Default debounce time is 50ms but this is configurable. It helps prevent wasted rebuilds when source files are changed in rapid succession, for example when saving with vim it does not do an atomic rename into place but actually deletes the destination file before writing it again, causing a brief period of invalid state, which would cause a build failure without debouncing (it would be followed by a successful build, but it's annoying to experience the temporary build failure regardless).

The purpose of this feature is to reduce latency between editing and debugging in the development cycle. In large projects, the cache system must call `fstat` on a large number of files even when it is a cache hit. File system watching allows more efficient detection of stale pipeline steps.

Mainly this is motivated by incremental compilation landing soon, so that we can keep the compiler running and responding to source code changes as fast as possible. In this case, also keeping the rest of the build pipeline up-to-date is table stakes.

This also paves the road towards #68. A `Run` step combined with `--watch` connects file system updates directly to new code inside an already-running executable.

It takes steps closer to more advanced use cases as well:

* IDE plugin running `zig build` as a child process, speaking a compiler protocol (#615) to learn about type information, perform refactors, request rebuilds, receive errors, etc. The protocol will multiplex between an arbitrary number of `Compile` steps, each with a running instance of the compiler.
* An application, compiled in debug mode, speaking the build runner protocol, able to poll on a file descriptor and learn when a hot swap is available, requesting it when it is convenient, or learning about new installed asset updates occurring.

[Run step asciinema demo](https://asciinema.org/a/WBcdJqzWCHPfG1zHFeK1iayQn) This demo only shows 1/2 terminals used, but in the other window I'm editing assembly files with vim and saving them.

[Compile step asciinema demo](https://asciinema.org/a/KAMDdrkh1fmUNOy3myVwGmOO3) - getting quick compile error feedback. Incremental compilation is not done yet, this is full compiler rebuilds, but skipping codegen.

[unit test workflow asciinema demo](https://asciinema.org/a/AApIUJ5nHZmiLjE41oUDueOr5)

## Follow-Up Work

* #20598
* #20599
* #20600
* #20601
* #20602
* #20603
* #20604
* #615
* #20605
* #20606
* improve Run step efficiency by making cache system integrate better with Cache.Path, making it avoid filesystem watches when a step dependency edge already tracks the dependency
* emit the file_system_inputs message early so that even when the compiler crashes, we still know the set of files that trigger recompilation.
* audit all file system watch directories and eliminate cases where zig-cache/o/ directories are watched because those are supposed to be handled instead by step dependencies. Except, incremental cache mode will mutate those artifacts.
  - should we use a different zig-cache subdirectory for incremental artifacts?
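To make the debounce behaviour described above concrete, here is a minimal sketch of the idea in Zig. The `Debouncer` type and its method names are hypothetical and are not taken from the build runner; timestamps are passed in explicitly so the logic can be checked without a real clock.

```zig
const std = @import("std");

/// Hypothetical debouncer; not the build runner's actual implementation.
const Debouncer = struct {
    delay_ms: i64,
    /// Timestamp of the most recent change event, or null if nothing is pending.
    last_event_ms: ?i64 = null,

    /// Record a file change observed at `now_ms`; this restarts the quiet period.
    fn noteChange(self: *Debouncer, now_ms: i64) void {
        self.last_event_ms = now_ms;
    }

    /// Returns true (and clears the pending state) once `delay_ms` has
    /// elapsed since the last recorded change.
    fn shouldRebuild(self: *Debouncer, now_ms: i64) bool {
        const last = self.last_event_ms orelse return false;
        if (now_ms - last < self.delay_ms) return false;
        self.last_event_ms = null;
        return true;
    }
};

test "delete-then-write save triggers a single rebuild" {
    var d = Debouncer{ .delay_ms = 50 };
    d.noteChange(0); // destination file deleted
    d.noteChange(10); // file written again
    try std.testing.expect(!d.shouldRebuild(40)); // still inside the quiet period
    try std.testing.expect(d.shouldRebuild(60)); // 50 ms after the last event
    try std.testing.expect(!d.shouldRebuild(100)); // nothing pending any more
}
```

With a 50 ms delay, the two events of a delete-then-write save collapse into one rebuild that starts only after the file is back in a valid state.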

For those adventurous Zig users out there living on master branch, and who happen to use Linux, give the new --watch flag a try, and see if you like it.

Windows and macOS support is close; it just needs some OS-specific glue code to make it work. I’m fairly confident the abstractions that I made will hold up nicely when it comes to those file system APIs since it’s based on watching directories only. Contributions welcome!

The way I see it, the end goal here is having incremental compilation + keeping the compiler alive for a long time for maximum compilation speed, and in that case keeping the rest of the build pipeline up-to-date is table stakes.

priddis · July 13, 2024, 2:25am

Awesome, I’ve been looking forward to this feature!

JPL · July 13, 2024, 6:12am

0.14.0-dev.244+0d79aa017

/home/soleil/.zig/zig build --watch -Doptimize=Debug --build-file /home/soleil/Zterm/src-zig/buildGencurs.zig

unrecognized argument: '--watch'
  access the help menu with 'zig build -h'
error: the following build command failed with exit code 1:
/home/soleil/Zterm/src-zig/.zig-cache/o/d761d71e938304ca5caaa9256c8033e5/build /home/soleil/.zig/zig /home/soleil/Zterm/src-zig /home/soleil/Zterm/src-zig/.zig-cache /home/soleil/.cache/zig --seed 0x68f5394f -Zf4225a4579ced9ad --watch -Doptimize=Debug

maybe I didn’t understand how to implement the --watch option

andrewrk · July 13, 2024, 6:32am

The downloads page needs about 12 hours to catch up to master branch.

matklad · July 13, 2024, 9:53am

My recollection is that doing a bulletproof abstraction is quite tricky here, as it’s easy to lose updates. If you read the file first, and then register a watch, then any updates in between the two events are lost. It might be useful to have an atomic “readAndWatch” primitive, or, after registering a watch, do a first freshness check even before you get a first update event.

I can’t say that I fully understand the current implementation, and maybe it does handle it correctly, but the loop in build_runner looks suspicious to me:

    var w = try Watch.init();
    rebuild: while (true) {
        runStepNames();
        // If file is updated here, will it be picked up?
        try w.update(gpa, run.step_stack.keys());
        try w.wait(gpa);
        markFailedStepsDirty(gpa, run.step_stack.keys());
    }

If I am reading the code correctly, update here takes a set of file paths. To avoid the races, it seems it should accept paths together with hashes of the content actually observed by runStepNames.

IntegratedQuantum · July 13, 2024, 10:04am

Couldn’t you just register the watch first? Then you might get an update before reading, but that doesn’t seem too bad.
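The two orderings discussed above can be compared with a toy model. This is only a sketch: `FakeFile` is a hypothetical in-memory stand-in, not a real file system or fanotify API, and it queues an event only for writes that happen while a watch is registered.

```zig
const std = @import("std");

/// Toy model of one watched file: `version` is bumped on every write, and
/// writes are only queued as events while a watch is registered.
const FakeFile = struct {
    version: u32 = 0,
    watched: bool = false,
    pending_events: u32 = 0,

    fn write(self: *FakeFile) void {
        self.version += 1;
        if (self.watched) self.pending_events += 1;
    }

    fn read(self: *const FakeFile) u32 {
        return self.version;
    }

    fn watch(self: *FakeFile) void {
        self.watched = true;
    }
};

test "read then watch can lose an update" {
    var f = FakeFile{};
    const seen = f.read();
    f.write(); // lands between the read and the watch registration
    f.watch();
    try std.testing.expect(f.pending_events == 0); // no event was queued...
    try std.testing.expect(seen != f.read()); // ...yet our copy is stale
}

test "watch then read sees the update, at worst redundantly" {
    var f = FakeFile{};
    f.watch();
    f.write();
    const seen = f.read();
    try std.testing.expect(seen == f.read()); // our copy is current
    try std.testing.expect(f.pending_events == 1); // redundant event, harmless
}
```

In the second ordering the worst case is one unnecessary rebuild, which is the trade-off suggested above.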

Luke · July 13, 2024, 11:22am

Oh that’s great: open a communication channel between the build system and application · Issue #20604 · ziglang/zig · GitHub

andrewrk · July 13, 2024, 8:01pm

Not on the first build - same as how it won’t be picked up with a regular zig build if you edit source files while the build is running. However, once the watch is established, it will pick up changes that happen during rebuilds because the change notifications will queue up inside the fanotify buffer.

jamii · July 21, 2024, 6:25pm

I haven’t looked at any of the code yet, but on the off chance you’re not already aware of the many inotify gotchas, it might be worth reading “correct or inotify: pick one” on wingolog.

andrewrk · July 21, 2024, 6:26pm

It’s using fanotify instead. I’m actually pretty pleased with the API.

dimdin · July 21, 2024, 8:59pm

Using the latest master, on debian stable, I am getting panic: reached unreachable code.

❯ uname -srm
Linux 6.1.0-22-amd64 x86_64
❯ zig-dev version
0.14.0-dev.367+a57479afc
❯ zig-dev build test --watch
Build Summary: 3/3 steps succeeded
test success
└─ run test-recover success 478us MaxRSS:1M
thread 487331 panic: reached unreachable code
/home/din/zig/master/lib/std/posix.zig:7318:19: 0x112fb98 in name_to_handle_atZ (build)
        .INVAL => unreachable, // bad flags, or handle_bytes too big
                  ^
/home/din/zig/master/lib/std/posix.zig:7305:30: 0x10f6bf7 in name_to_handle_at (build)
    return name_to_handle_atZ(dirfd, &pathname_c, handle, mount_id, flags);
                             ^
/home/din/zig/master/lib/std/Build/Watch.zig:101:40: 0x10f677e in getDirHandle (build)
            try posix.name_to_handle_at(path.root_dir.handle.fd, adjusted_path, stack_ptr, &mount_id, std.os.linux.AT.HANDLE_FID);
                                       ^
/home/din/zig/master/lib/std/Build/Watch.zig:161:67: 0x10fcc3c in update (build)
                            const dir_handle = try Os.getDirHandle(gpa, path);
                                                                  ^
/home/din/zig/master/lib/std/Build/Watch.zig:323:35: 0x10fda98 in update (build)
        .linux => return Os.update(w, gpa, steps),
                                  ^
/home/din/zig/master/lib/compiler/build_runner.zig:405:21: 0x110363f in main (build)
        try w.update(gpa, run.step_stack.keys());
                    ^
/home/din/zig/master/lib/std/start.zig:532:37: 0x10de985 in posixCallMainAndExit (build)
            const result = root.main() catch |err| {
                                    ^
/home/din/zig/master/lib/std/start.zig:277:5: 0x10de4a1 in _start (build)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0xa in ??? (???)
Unwind information for `???:0xa` was not available, trace may be incomplete

error: the following build command crashed:
/home/din/src/zig-recover/.zig-cache/o/a1ff0bdedd6e79a7f0ca2eeccf2c252c/build /home/din/zig/master/zig /home/din/zig/master/lib /home/din/src/zig-recover /home/din/src/zig-recover/.zig-cache /home/din/.cache/zig --seed 0xa2fd6fbd -Z7be4afe0b650cc74 test --watch

andrewrk · July 21, 2024, 9:04pm

Yeah looks like Linux didn’t future-proof their flags argument in name_to_handle_at, so that will need to be handled as an error instead of unreachable.

I don’t think anyone filed that bug on the issue tracker yet.
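As a rough illustration of what “handled as an error instead of unreachable” could look like, here is a self-contained sketch. The `Errno` enum, the `HandleError` set, and `checkResult` are hypothetical stand-ins, not the actual `std.posix` code or the eventual fix. One plausible explanation for the crash, not confirmed in this thread, is that the 6.1 kernel does not recognize the `AT.HANDLE_FID` flag and answers with `EINVAL`.

```zig
const std = @import("std");

/// Stand-in for the handful of errno values relevant here; real code would
/// use the errno type from the standard library.
const Errno = enum { SUCCESS, INVAL, NOTDIR, OPNOTSUPP };

/// Hypothetical error set for this sketch.
const HandleError = error{ UnsupportedFlags, NotDir, OperationNotSupported };

/// Hypothetical helper mapping a syscall result to a Zig error.
fn checkResult(errno: Errno) HandleError!void {
    switch (errno) {
        .SUCCESS => return,
        // A kernel that does not know a flag rejects it with EINVAL, so this
        // is a runtime condition to report, not a programmer error.
        .INVAL => return error.UnsupportedFlags,
        .NOTDIR => return error.NotDir,
        .OPNOTSUPP => return error.OperationNotSupported,
    }
}

test "EINVAL surfaces as an error the caller can handle" {
    try checkResult(.SUCCESS);
    try std.testing.expectError(error.UnsupportedFlags, checkResult(.INVAL));
}
```

A caller could then fall back to another strategy or print a clear message instead of hitting a panic.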

dimdin · July 21, 2024, 9:12pm

#20720

purefns · July 21, 2024, 9:27pm

Looks great on latest master!

❯ uname -rms
Linux 6.6.37 x86_64

zig-build-watch

matklad · July 22, 2024, 10:11am

This is off topic, but, in general, watching a directory recursively feels like an unsolved problem at this point. Luckily, it seems that Zig doesn’t have to do this.

The fundamental problem is that you can’t treat “walk the directory recursively” and “watch the directory” as separate operations. That is inherently racy.

Though, I am skeptical that a hypothetical non-racy API could provide guarantees that wingolog wants. They seem to want to see the total order of modification events, but it’s not clear if that exists at all, especially for cases like networked file systems.

It seems some form of eventual consistency is more achievable:

* the input to the API is a predicate, which describes which subtree of the file system is watched (it’s a predicate over all existing and future paths)
* the output of the API is the stream of events, which guarantees that at least one event is delivered per path after that path becomes quiescent (importantly, this includes the initial event “this path exists” when registering a walkwatch).
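A small sketch of the registration half of such an API, under heavy assumptions: `Predicate`, `underSrc`, and `initialEvents` are hypothetical names, the list of existing paths is supplied by the caller rather than read from disk, and the ongoing event stream is left out entirely. It only shows how a path predicate plus an initial “this path exists” event removes the need for a separate directory walk.

```zig
const std = @import("std");

/// Hypothetical predicate type: decides whether a path, existing or future,
/// belongs to the watched subtree.
const Predicate = *const fn (path: []const u8) bool;

fn underSrc(path: []const u8) bool {
    return std.mem.startsWith(u8, path, "src/");
}

/// On registration, report every already-existing path that matches, so the
/// consumer gets an initial "this path exists" event without a separate walk.
/// Returns the number of events written to `out`.
fn initialEvents(pred: Predicate, existing: []const []const u8, out: [][]const u8) usize {
    var n: usize = 0;
    for (existing) |path| {
        if (!pred(path)) continue;
        if (n == out.len) break; // caller's buffer is full
        out[n] = path;
        n += 1;
    }
    return n;
}

test "registration reports matching paths that already exist" {
    const existing = [_][]const u8{ "src/main.zig", "README.md", "src/util/io.zig" };
    var out: [4][]const u8 = undefined;
    const n = initialEvents(underSrc, &existing, &out);
    try std.testing.expect(n == 2);
    try std.testing.expectEqualStrings("src/main.zig", out[0]);
    try std.testing.expectEqualStrings("src/util/io.zig", out[1]);
}
```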

This API requires some stateful bookkeeping. And that stateful bookkeeping, I think, explains why watchman is a service rather than a library: it essentially polyfills missing stateful OS APIs.

Now, as Zig doesn’t do recursive watching, I think that it dodges most of the complexity. Though, I am not sure how I feel about the races it does have. For the purpose of the --watch flag during development, this is completely irrelevant. The thing is, no one is doing watching correctly, and that works absolutely fine at small scale; you only get to “gosh, I really need something like watchman” once you have a giant monorepo, a VCS, and a production incremental build system that all want to process changes incrementally.

andrewrk · July 22, 2024, 8:54pm

It does in some cases, for example if you add a directory to a WriteFile step:

https://github.com/ziglang/zig/blob/eac7fd4da5992299a1f2fb59c5aa237c0c6c6761/lib/std/Build/Step/WriteFile.zig#L236

        var it = try src_dir.walk(gpa);
        defer it.deinit();
        while (try it.next()) |entry| {
            if (!dir.options.pathIncluded(entry.path)) continue;

            switch (entry.kind) {
                .directory => {
                    if (need_derived_inputs) {
                        const entry_path = try src_dir_path.join(arena, entry.path);
                        try step.addDirectoryWatchInputFromPath(entry_path);
                    }
                },
                .file => {
                    const entry_path = try src_dir_path.join(arena, entry.path);
                    _ = try man.addFilePath(entry_path, null);
                },
                else => continue,
            }
        }
    }

You are right that if somebody adds a directory while the pipeline is being run, it may fail to acquire a watch mark.

I feel comfortable with this limitation. Even without file watching, there is already the existing limitation that editing a source file during the build pipeline may cause a problem, for example, if the file is read twice and expected to be the same both times.

To me the main concern here is whether this limitation interferes with the use case of using the build runner as a child process in an IDE, where users are expected to be editing files at any time, including potentially while the build is running.

matklad · July 23, 2024, 9:41am

I think this would be fine, but not ideal. Here, “this” is the possibility of lost updates if some events happen in quick succession (e.g., you add an @import and then modify the imported file while the build is running, or some such).

Some things that happen in this situation:

* Some editors write files frequently: there might be a very low auto-save delay.
* git switch changes a bunch of files all at once (in general, switching branches is a good quick test for change-tracking and incremental).
* Symlinks are in general hard to handle correctly if you want to watch stuff.
* Some changes are in-memory only. So there needs to be some code somewhere which merges the file-system view and the in-memory view, and it also needs to handle the case where the source of truth for the document changes.

That’s some fiddly code, and a lot of bugs can hide between server-side file watching, client-side file watching, and reconciling the two, and the bugs are annoying to reproduce. Which is the main thing that makes me queasy about the possibility of lost updates: if you know that the low-level build always gives correct results (eventually), then any residual observed bugs must be on the editor’s side.

At the same time, if there are bugs, they are benign. It’s hard to get stuck with an inconsistent state; the user can almost always unbreak things by saving the file. That makes the bugs mere annoyances, but also makes them harder to fix (users work around them rather than report issues).

This might have implications for incremental compilation. I guess the right term here is “repeatable read”? Depending on the implementation, it could be the case that a compilation reads the same file from disk twice, assumes that the result is going to be the same, and something panics if that’s not the case. To avoid crashes, there either needs to be some sort of snapshot semantics, where the reads within a single compilation are repeatable (e.g., by only reading each file once), or otherwise the code should be prepared to not crash (it’s OK to return an error here and recompile from scratch).

Ah, I guess another interesting case here is cancellation (not sure whether there should be cancellation). If a build is cancelled midway but caches some results, it must still watch the files it touched.
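The snapshot idea mentioned above (reading each file only once per compilation) could look roughly like the following. This is a sketch under stated assumptions: `Snapshot`, `loadFromFakeDisk`, and `disk_contents` are hypothetical, and a plain function stands in for reading from disk so the example stays self-contained.

```zig
const std = @import("std");

/// Hypothetical per-compilation snapshot: each path is loaded at most once,
/// so repeated reads within one compilation always agree.
const Snapshot = struct {
    cache: std.StringHashMap([]const u8),
    loads: usize = 0,

    fn init(gpa: std.mem.Allocator) Snapshot {
        return .{ .cache = std.StringHashMap([]const u8).init(gpa) };
    }

    fn deinit(self: *Snapshot) void {
        self.cache.deinit();
    }

    /// `load` stands in for reading the file from disk.
    fn read(self: *Snapshot, path: []const u8, load: *const fn ([]const u8) []const u8) ![]const u8 {
        const gop = try self.cache.getOrPut(path);
        if (!gop.found_existing) {
            gop.value_ptr.* = load(path);
            self.loads += 1;
        }
        return gop.value_ptr.*;
    }
};

/// Fake "disk" contents that the test edits mid-compilation.
var disk_contents: []const u8 = "const x = 1;";

fn loadFromFakeDisk(path: []const u8) []const u8 {
    _ = path;
    return disk_contents;
}

test "reads are repeatable within one snapshot" {
    var snap = Snapshot.init(std.testing.allocator);
    defer snap.deinit();
    const first = try snap.read("a.zig", loadFromFakeDisk);
    disk_contents = "const x = 2;"; // file edited while the compilation runs
    const second = try snap.read("a.zig", loadFromFakeDisk);
    try std.testing.expectEqualStrings(first, second);
    try std.testing.expect(snap.loads == 1);
}
```

The edit made mid-compilation is not lost in this scheme as long as the watcher still reports it; it simply belongs to the next snapshot.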

plaukiu · July 25, 2024, 6:30pm

i’ve been consistently calling zig build via entr for some time now and i have concluded that this is the way to go for me. having continuous support from the compiler, together with a couple of scripts to accelerate grepping and std navigation has proved me faster and more comfortable than the blotchy and unexpected nature of lsp ever did. hooray for one dependency getting goodbyed.