There’s fn wait(ChildProcess) !Term, which blocks until the child has exited. Is there an easy way to get something like
fn wait_now(ChildProcess) !?Term
which doesn’t block, and returns a null if the process hasn’t exited yet?
In case this is an XY problem: I have a bunch of fuzzers, which I want to run in a loop for some time. On a multicore CPU, I also want to run one fuzzer process per CPU. So I’d love to write code à la:
var fuzzers: [num_cpus]?ChildProcess = .{null} ** num_cpus;
for (0..60) |_| {
    for (&fuzzers) |*fuzzer| {
        if (fuzzer.* == null) fuzzer.* = spawn_fuzzer_process();
    }
    std.time.sleep(1 * std.time.ns_per_s);
    for (&fuzzers) |*fuzzer| {
        if (try fuzzer.*.?.wait_now()) |exit_code| {
            if (exit_code.is_error()) report_error();
            fuzzer.* = null;
        }
    }
}
I haven’t tried this, and there may be better alternatives, but I would imagine this could work:
Have 2n+1 processes: 1 main process that fills a queue, and n child processes that pull from the queue and each start a “grandchild” process to execute the command. A child blocks until its grandchild completes, then pulls the next item from the queue, or exits if the queue is empty.
That way, as long as there is work, new processes get started. Once the work queue is empty/closed, we (in the worst case) wait for the last “process starter” to complete before they all get joined, but this is fine because the starters have done their work already.
The only thing I imagine that might be problematic about that, might be that you have a bunch more processes, but I would have to do some experiments to see whether that is actually a problem. I guess you also have more moving pieces and if your starters could crash in some way then you would have a similar problem again, but if the starter processes don’t fail in 99.9999% of cases then you at least aren’t blocking in the normal case.
Instead of polling, each starter could run in a while loop:
while (queue.popFront()) |item| { // popFront() returns null if the queue was closed by the write end?
    var fuzzer = spawn_fuzzer_process();
    const exit_code = try fuzzer.wait(); // blocking
    if (exit_code.is_error()) results_queue.pushBack(.{ .id = item.id, .code = exit_code });
}
// work queue empty -> exit
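A rough C sketch of the same starter idea, written in C rather than Zig to stay close to the underlying fork/waitpid calls. It assumes a plain pipe can stand in for the work queue (job ids are written as ints, and closing the write end plays the role of closing the queue) and that a second pipe carries failures back. The names starter_loop, run_jobs and struct result are made up for this sketch, error handling is minimal, and it relies on the job ids fitting into the pipe buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct result { int id; int status; }; /* hypothetical failure record */

/* One "starter": pull job ids until the queue is closed, run the command
 * for each job as a grandchild, and block until that grandchild exits.
 * Returns the number of jobs run, or -1 on error. */
int starter_loop(int queue_fd, int results_fd, char *const argv[])
{
    int id, jobs = 0;
    /* read() returns 0 once every write end is closed and the pipe is drained.
     * Assumption: equally sized small reads/writes on a pipe are not split. */
    while (read(queue_fd, &id, sizeof id) == (ssize_t)sizeof id) {
        int status;
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0) {
            execvp(argv[0], argv);
            _exit(127); /* exec failed */
        }
        if (waitpid(pid, &status, 0) == -1) /* blocking wait */
            return -1;
        jobs++;
        if (!(WIFEXITED(status) && WEXITSTATUS(status) == 0)) {
            struct result r = { id, status };
            if (write(results_fd, &r, sizeof r) != (ssize_t)sizeof r)
                return -1;
        }
    }
    return jobs;
}

/* Main process: start the starters, fill the queue, close it, collect
 * failures. Returns the number of failed jobs, or -1 on setup error.
 * Note: the final wait() loop reaps every child of the calling process. */
int run_jobs(int njobs, int nstarters, char *const argv[])
{
    int q[2], res[2], started = 0, failures = 0;
    struct result r;

    if (pipe(q) == -1)
        return -1;
    if (pipe(res) == -1) {
        close(q[0]); close(q[1]);
        return -1;
    }
    for (int i = 0; i < nstarters; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            close(q[1]);   /* otherwise the starter never sees end-of-queue */
            close(res[0]);
            _exit(starter_loop(q[0], res[1], argv) < 0 ? 1 : 0);
        }
        if (pid > 0)
            started++;
    }
    close(res[1]);
    if (started > 0) {
        for (int id = 0; id < njobs; id++)
            if (write(q[1], &id, sizeof id) != (ssize_t)sizeof id)
                break;
    }
    close(q[1]); /* closing the queue tells the starters to finish */
    close(q[0]);
    if (started == 0) {
        close(res[0]);
        return -1;
    }
    while (read(res[0], &r, sizeof r) == (ssize_t)sizeof r)
        failures++;
    close(res[0]);
    while (wait(NULL) > 0) { /* join the starters */ }
    return failures;
}
```

Calling run_jobs(60, 4, argv) would run the command 60 times with at most 4 copies alive at once. The main process never polls: it only blocks on the results pipe until the last starter has exited.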
The parts I don’t currently know are:
whether such a queue could be implemented easily and efficiently, for example via io_uring
whether we can reuse something the Zig compiler uses, or want something with more or other features
whether there are details I am unaware of, because I haven’t done any multi-threaded or multi-process programming in Zig yet
Until I have done some multi-process programming in Zig, I may have some misconceptions based on what other languages have hidden behind their abstractions.
If you mean “how to check if a child exited without going to sleep” then yes, just do it in a non-blocking manner. I meant the machinery with child processes in general.
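For the “run N fuzzers for a while” loop from the question, a minimal C sketch of the non-blocking approach might look like the following. It is built on waitpid() with WNOHANG; spawn_child, wait_now and run_pool are hypothetical names, the command to run is passed in as argv, and anything other than a clean exit with status 0 is counted as a failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Start argv[0] as a child process; returns its pid, or -1 if fork failed. */
pid_t spawn_child(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }
    return pid;
}

/* Non-blocking wait: 1 = exited (status filled in), 0 = still running,
 * -1 = waitpid error. */
int wait_now(pid_t pid, int *status)
{
    pid_t r = waitpid(pid, status, WNOHANG);
    if (r == -1)
        return -1;
    return r == pid ? 1 : 0;
}

/* Keep up to nslots children alive for `rounds` polling rounds, sleeping
 * sleep_ms between spawning and reaping. Returns the number of children
 * that did not exit cleanly with status 0, or -1 if allocation failed. */
int run_pool(char *const argv[], int nslots, int rounds, long sleep_ms)
{
    pid_t *slots = calloc((size_t)nslots, sizeof *slots); /* <= 0: empty */
    int failures = 0;

    if (!slots)
        return -1;
    for (int r = 0; r < rounds; r++) {
        struct timespec ts = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };

        for (int i = 0; i < nslots; i++)
            if (slots[i] <= 0)
                slots[i] = spawn_child(argv);
        nanosleep(&ts, NULL);
        for (int i = 0; i < nslots; i++) {
            int status, w;
            if (slots[i] <= 0)
                continue;
            w = wait_now(slots[i], &status);
            if (w == 0)
                continue; /* still running, check again next round */
            if (w < 0 || !(WIFEXITED(status) && WEXITSTATUS(status) == 0))
                failures++;
            slots[i] = 0; /* free the slot for a respawn */
        }
    }
    /* Time is up: stop and reap whatever is still running. */
    for (int i = 0; i < nslots; i++) {
        if (slots[i] > 0) {
            kill(slots[i], SIGKILL);
            waitpid(slots[i], NULL, 0);
        }
    }
    free(slots);
    return failures;
}
```

With nslots set to the CPU count, rounds to 60 and sleep_ms to 1000, this mirrors the loop in the question: one slot per CPU, polled once a second.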
And also a somewhat more elaborate version, ~10 years old too:
here it is
int do_wait_workers(struct monitor *m)
{
    int err;
    int k, s;
    int n;
    /* account for signal merging */
    while (1) {
        n = 0;
        for (k = 0; k < m->nworkers; k++) {
            struct worker *w = &m->workers[k];
            if (!w->pid)
                continue;
            err = wait_worker(w);
            if (err) /* this one has not exited */
                continue;
            if (WIFSTOPPED(w->status)) {
                log_msg("WARN: '%s' stopped... being traced?\n", w->name);
                continue;
            }
            if ((!(WIFEXITED(w->status))) && (!(WIFSIGNALED(w->status)))) {
                log_msg("WARN: '%s': WTF? (status = %d)\n", w->name, w->status);
                continue;
            }
            /* worker terminated */
            n++;
            w->pid = 0; /* mark as not running */
            s = WEXITSTATUS(w->status);
            log_msg("INFO: '%s' terminated with status %d (0x%.8X)\n", w->name, s, w->status);
            edsm_put_event(monsm[k], EC_WORKER_EXITED);
            if (WIFSIGNALED(w->status)) {
                /* most likely worker crashed during initialization */
                int sig = WTERMSIG(w->status);
                log_msg(" by signal %d (most likely crashed)\n", sig);
                continue;
                // w->restart = 1; (?)
            }
            if (s & EXIT_WORK) {
                log_msg(" during its working stage\n");
            } else {
                log_msg(" during its init stage\n");
            }
            if (s & EXIT_RESPAWN) {
                w->restart = 1;
                log_msg(" and WANTS to be restarted\n");
            } else {
                log_msg(" and DO NOT WANT to be restarted\n");
            }
        }
        if (!n)
            break;
    }
    return 0;
}
The wait_worker() function is quite simple, it’s just:
int wait_worker(struct worker *w)
{
    pid_t pid;
    int status;
    pid = waitpid(w->pid, &status, WNOHANG);
    if (-1 == pid) {
        log_msg("OOPS: waitpid(%d): %s (worker '%s')\n", w->pid, strerror(errno), w->name);
        return 1;
    }
    if (!pid)
        return 1;
    w->status = status;
    return 0;
}
If I remember right, that mon (which is an event-driven state machine, btw) was designed not only to watch whether a worker process has exited, but also to monitor workers’ behavior in this manner:
worker consumed no CPU in a period - kill it
worker performed no I/O operations in a period - kill it
It also took care of restarting them.
Something like that, I do not remember exact details.
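A sketch of how the “no CPU in a period” check might look on Linux. This is not the original mon code: it assumes /proc is available and reads the utime and stime fields (fields 14 and 15, in clock ticks) from /proc/<pid>/stat. The function names parse_cpu_ticks, read_cpu_ticks and worker_is_idle are invented for the example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Extract utime + stime (clock ticks) from one /proc/<pid>/stat line.
 * The command name in field 2 may itself contain spaces or parentheses,
 * so parsing starts after the LAST ')'. Returns -1 on a parse error. */
long parse_cpu_ticks(const char *stat_line)
{
    unsigned long utime, stime;
    const char *p = strrchr(stat_line, ')');

    if (!p)
        return -1;
    /* Skip fields 3..13 (state through cmajflt), then read fields 14, 15. */
    if (sscanf(p + 1,
               " %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %lu %lu",
               &utime, &stime) != 2)
        return -1;
    return (long)(utime + stime);
}

/* Read the CPU ticks consumed so far by a live process; -1 on failure. */
long read_cpu_ticks(pid_t pid)
{
    char path[64], line[1024];
    char *got;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    got = fgets(line, sizeof line, f);
    fclose(f);
    return got ? parse_cpu_ticks(line) : -1;
}

/* Call once per monitoring period. Returns 1 if the worker used no CPU
 * since the previous call, 0 if it did, -1 if it could not be sampled.
 * *last_ticks holds the previous sample and is updated on success. */
int worker_is_idle(pid_t pid, long *last_ticks)
{
    long now = read_cpu_ticks(pid);
    int idle;

    if (now < 0)
        return -1;
    idle = (now == *last_ticks);
    *last_ticks = now;
    return idle;
}
```

A monitor could call worker_is_idle() from its periodic timer and send the worker a signal with kill() when it returns 1, leaving the restart decision to the exit handling.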
After a year or so I realized that inventing a system inside a system is not the right way to go, so I separated all those workers (they were not a bunch of copies of the same program in my case, each had its own specific job) into separate services (in systemd terms) and let systemd manage them.
Sure, look for edsm on my gh page, same nick as at this forum.
There is a version in D where I’ve made some abstraction,
so it works on both Linux and FreeBSD. Timers in particular are interesting,
because they are not file descriptors on FreeBSD.
That’s great, but it would be interesting to see a (single-threaded) application that
handles multiple clients, sending data using various protocols
interacts with multiple instances of some DBMS
interacts with multiple instances of some KVS (Redis or so)
not to mention such “little things” as signals and file system events.
BTW, about the file system: there is the IN_Q_OVERFLOW event, which screwed my brain out a couple of times, and eventually I came to the conclusion that for very frequent events it is better to use a polling (opendir/readdir/closedir) approach, so now I use inotify only for things that I am sure happen rarely.
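The polling fallback can be quite small. Here is a sketch, assuming it is acceptable to rescan the whole directory on every pass; scan_dir and print_entry are made-up names, and what “handling” an entry means is left to the callback.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Callback invoked for every entry found in the directory. */
typedef void (*entry_fn)(const char *dir, const char *name);

/* One polling pass: list `path` and hand every entry except "." and ".."
 * to `handle` (which may be NULL). Returns the number of entries seen,
 * or -1 if the directory could not be opened. */
int scan_dir(const char *path, entry_fn handle)
{
    struct dirent *e;
    int n = 0;
    DIR *d = opendir(path);

    if (!d)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
            continue;
        if (handle)
            handle(path, e->d_name);
        n++;
    }
    closedir(d);
    return n;
}

/* Example callback: just print the path of each entry. */
void print_entry(const char *dir, const char *name)
{
    printf("%s/%s\n", dir, name);
}
```

A timer can call scan_dir(dir, print_entry) at a fixed interval. The same pass is also a reasonable way to resynchronize after inotify reports IN_Q_OVERFLOW, since that event means the kernel dropped events.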
Take a step forward, add a state machine layer on top of the event loop and you’ll get:
automatically structured code (a function for each particular state/event combination)
100+ level of concurrency without fibers/goroutines compiler magic
true OOP (entities interacting with messages - this is exactly what Alan Kay meant, not that “encapsulation/inheritance/polymorphism” triad set on the edge)
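To illustrate the “a function for each state/event combination” point, here is a minimal table-driven sketch in C. It is not the edsm API: the states, events and the name sm_put_event are invented for the example, which models a worker that is either stopped or running.

```c
enum state { ST_STOPPED, ST_RUNNING, N_STATES };
enum event { EV_START, EV_EXITED, N_EVENTS };

struct machine {
    enum state state;
    int exits; /* how many times the worker has exited */
};

/* A handler performs the action for one (state, event) pair and
 * returns the next state. */
typedef enum state (*handler_fn)(struct machine *m);

static enum state stopped_on_start(struct machine *m)
{
    (void)m; /* a real handler would spawn the worker here */
    return ST_RUNNING;
}

static enum state running_on_exited(struct machine *m)
{
    m->exits++;
    return ST_STOPPED;
}

/* The whole behavior is visible in one table; combinations left empty
 * are null pointers, meaning "event not expected in this state". */
static const handler_fn transitions[N_STATES][N_EVENTS] = {
    [ST_STOPPED][EV_START]  = stopped_on_start,
    [ST_RUNNING][EV_EXITED] = running_on_exited,
};

/* Deliver one event to a machine. Returns 0 if it was handled,
 * -1 if the current state has no handler for it. */
int sm_put_event(struct machine *m, enum event ev)
{
    handler_fn h = transitions[m->state][ev];

    if (!h)
        return -1;
    m->state = h(m);
    return 0;
}
```

The event loop then only translates readiness notifications (timer fired, child exited, socket readable) into events and calls sm_put_event() for the machine that owns the source. Many such machines can share a single thread because no handler blocks.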
For a simple scenario like the one in your example (launch a child, then wait indefinitely for a broken pipe) it might work, but your while loop is a CPU hog (at least), and it is not a solution for a server with a “serve one or more clients in each child” model: such a waiting mechanism would significantly complicate the logic, since you would have to poll periodically with this _ = stdin.write.
I don’t have a server problem, I have an “I need to run 10 fuzzers” problem. For a server, of course I’d avoid spawning subprocesses in the first place, and, if I did need subprocesses, I’d shovel a pidfd into io_uring or something. But this question is a different genre: I am looking for the simplest solution to solve the problem in the small.