Let’s say I want to apply a function to every item of a slice. The function is pure; there are no data dependencies.
In my use case the function needs a buffer to operate, so I also pass one in:
for (items) |*item| {
    foo(item, buffer);
}
My first approach was to use a thread pool with round-robin scheduling. (The items are a few KB in size, so false sharing is not a problem.)
I used std.Thread.Pool from std:
var wg = std.Thread.WaitGroup{};

const Job = struct {
    start: usize,
    end: usize,
    stride: usize,

    // `T` is the element type of the slice.
    fn run(job: Job, comptime func: anytype, items: []T, common_args: anytype) void {
        var i: usize = job.start;
        while (i < job.end) : (i += job.stride) {
            @call(.auto, func, .{&items[i]} ++ common_args);
        }
    }
};

for (0..n_jobs) |i| {
    const job: Job = .{
        .start = i,
        .end = items.len,
        .stride = n_jobs,
    };
    pool.spawnWg(&wg, Job.run, .{ job, foo, items, .{buffers[i]} });
}
pool.waitAndWork(&wg);
However, this form of static scheduling doesn’t use the CPU cores effectively when the load is unbalanced: a worker that gets all the cheap items finishes early and then sits idle.
The solution seems to be a dynamic scheduler, but I’m not experienced in building one.
Are there implementations in Zig? Is there something in std that I’ve missed?
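For reference, the simplest dynamic scheme I can think of is to replace the fixed stride with a shared atomic counter that workers pull the next index from, so fast workers just take more items. This is only an untested sketch (the names `dynamicRun`, `next`, and the `T` element type are mine, not from any library), and it still reuses `pool`, `wg`, `foo`, and `buffers` from the code above:

```zig
// Sketch of dynamic scheduling via an atomic work counter.
// Each worker repeatedly claims the next unprocessed index;
// load balancing happens automatically.
fn dynamicRun(
    next: *std.atomic.Value(usize),
    comptime func: anytype,
    items: []T,
    common_args: anytype,
) void {
    while (true) {
        const i = next.fetchAdd(1, .monotonic);
        if (i >= items.len) return;
        @call(.auto, func, .{&items[i]} ++ common_args);
    }
}

// Spawning looks the same as before, but all workers share `next`:
// var next = std.atomic.Value(usize).init(0);
// for (0..n_jobs) |j| {
//     pool.spawnWg(&wg, dynamicRun, .{ &next, foo, items, .{buffers[j]} });
// }
// pool.waitAndWork(&wg);
```

One fetchAdd per item has some contention cost; if `foo` is very cheap per item, claiming a chunk of indices at a time (`fetchAdd(chunk, .monotonic)`) would amortize it, but I don’t know what chunk size is reasonable.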