How to execute subsets of async/concurrent tasks consecutively

Hi,

I’m a little bit stuck with getting my code to work the way I want it.

Since I’m really new to Zig, and documentation (especially on the master branch) is sometimes missing and often outdated, I’m hoping for some help from the experts here :slightly_smiling_face:

What I’m trying to accomplish is the following:

I want to download (GET) file content in chunks and afterwards put it back together as a complete file. Additionally, I’m trying to define a maximum count of parallel downloads. Whenever a single chunk download finishes, the next chunk should start downloading, until there are no chunks left and the whole body has been downloaded and put together into a new local file. I know that’s a very common task, but it’s only the frame for a more specialized workflow in the final code. For now, though, I’m stuck with the basic task itself.

For a different program in Rust I accomplished this by using a Semaphore together with a Vec of Tokio tasks. Each task there gets a permit as long as the semaphore’s maximum permit count isn’t reached. If there is no free permit, it idles until an earlier task finishes. That works fine, but the whole tokio/Semaphore interface in Rust is very abstract, and there is a lot happening in the background that I don’t see, let alone understand when I look into the code.

Thus, I thought using a std.Io.Semaphore together with some async/concurrent functions could make it work in Zig too. The following code is a simplified version of what I want to accomplish. It’s not really downloading anything, but slicing a string into parts and returning each part after a sleep of 2 seconds, which should mimic a plausible download time for bigger chunks well enough. The code runs fine and the file is put together correctly. But all tasks are executed in parallel, meaning the whole code only takes 2 seconds, while this example should take about 12 seconds because of the semaphore’s max permit count of 3.

const std = @import("std");
const Io = std.Io;
pub fn main(init: std.process.Init) !void {
    const allocator: std.mem.Allocator = init.gpa;

    var threaded: Io.Threaded = .init(allocator, .{ .environ = init.minimal.environ });
    defer threaded.deinit();
    const io = threaded.io();

    // Some dumb blind text
    const content =
        \\Far far away, behind the word mountains, far from the countries Vokalia and
        \\Consonantia, there live the blind texts. Separated they live in Bookmarksgrove
        \\right at the coast of the Semantics, a large language ocean. A small river named
        \\Duden flows by their place and supplies it with the necessary regelialia. It is
        \\a paradisematic country, in which roasted parts of sentences fly into your
        \\mouth. Even the all-powerful Pointing has no control about the blind texts it is
    ;

    const ContentRange = struct {
        start: usize,
        end: usize,
    };

    const part_size = 50;

    // Calculate parts count
    var part_count = (content.len / part_size) + 1;
    var last_part_size = content.len % part_size;
    if (last_part_size == 0) {
        last_part_size = part_size;
        part_count -= 1;
    }

    // Create array of part ranges
    var ranges = std.ArrayList(ContentRange).empty;
    defer ranges.clearAndFree(allocator);

    // Add ranges to array
    for (0..part_count) |part| {
        const start = part * part_size;
        // INFO: Remember the "end" index of an HTTP request range is inclusive,
        // while the "end" index of a byte/string slice is exclusive
        const end = if (part == part_count - 1)
            start + last_part_size - 1
        else
            (part + 1) * part_size - 1;
        try ranges.append(allocator, .{ .start = start, .end = end });
    }

    for (ranges.items) |r| {
        std.log.info("Range {d}-{d}: {s}", .{ r.start, r.end, content[r.start .. r.end + 1] }); // add one to end index since byte slices are exclusive

    }

    const start_time = std.Io.Timestamp.now(io, .awake);

    var local_file = try std.Io.Dir.createFile(.cwd(), io, "local_file", .{});

    const max_concurrency_count = 3;
    var semaphore: Io.Semaphore = .{ .permits = max_concurrency_count };

    var futures: std.ArrayList(Io.Future(anyerror!void)) = .empty;
    defer futures.clearAndFree(allocator);

    // TODO: Maybe a different implementation with Io.Group, Io.Queue or Io.Select
    // var futures: std.Io.Queue(Io.Future(!void)) = .init(&.{});

    for (ranges.items, 0..) |r, i| {
        std.log.debug("Start putting range {d} to futures", .{i});
        const chunk = content[r.start .. r.end + 1];
        try futures.append(allocator, io.async(downloadAndWrite, .{ io, &semaphore, &local_file, chunk, r.start }));
        semaphore.post(io);
        std.log.debug("Finished putting range {d} to futures, semaphore increased", .{i});
        // try futures.putOne(io, io.async(downloadAndWrite, .{ io, &semaphore, &local_file, chunk, r.start }));
    }

    for (futures.items) |*func| {
        try func.await(io);
        const now = Io.Timestamp.untilNow(start_time, io, .awake);
        std.log.debug("Completed part {d}ms after start", .{now.toMilliseconds()});
    }

    var buf: [1024]u8 = undefined;
    var stdout = std.Io.File.Writer.init(.stdout(), io, &buf);
    const stdout_writer = &stdout.interface;

    try stdout_writer.print("Finished\n", .{});
}

// Fake a download, waiting 2s before returning the chunk.
// Then write the chunk to the newly created file at the correct position.
fn downloadAndWrite(io: Io, sema: *Io.Semaphore, file: *std.Io.File, chunk: []const u8, start: u64) anyerror!void {
    try sema.wait(io);
    try io.sleep(std.Io.Duration.fromMilliseconds(2000), .awake);
    try file.writePositionalAll(io, chunk, start);
}

Output (where I would expect more 2-second steps):

debug: Completed part 2000ms after start
debug: Completed part 2000ms after start
debug: Completed part 2000ms after start
debug: Completed part 2000ms after start
debug: Completed part 2000ms after start
debug: Completed part 2000ms after start
debug: Completed part 2001ms after start
debug: Completed part 2001ms after start
debug: Completed part 2001ms after start
debug: Completed part 2001ms after start

I already thought that using std.Io.Group, std.Io.Select, or std.Io.Queue for this job might be better, but documentation on those structs is sparse. Most material online uses something like std.Thread.Pool, which seems deprecated on master.

Thus, I’m thankful for any idea/hint, whatever! And sorry for the long text, and if I’m missing something obvious. Zig is still new to me, as is this low-level async/concurrency stuff (I know “async is not concurrency” :wink: )

Thank you

Your problem is here: the semaphore.post(io) should be in downloadAndWrite, not in the for loop.
What is happening is that you create the futures and then immediately increment the semaphore, adding more permits. By the time the futures actually run, the permit count is N + 3, so all of them are able to get a permit and run.

I pulled your code and moved the post into downloadAndWrite, and it then ran in batches of 3:

// Fake a download, waiting 2s before returning the chunk.
// Then write the chunk to the newly created file at the correct position.
fn downloadAndWrite(io: Io, sema: *Io.Semaphore, file: *std.Io.File, chunk: []const u8, start: u64) anyerror!void {
    try sema.wait(io);
    // free up the permit on exit.
    defer sema.post(io);
    try io.sleep(std.Io.Duration.fromMilliseconds(2000), .awake);
    try file.writePositionalAll(io, chunk, start);
}

Thanks for the fast and detailed reply. Now that you explained it that totally makes sense. As I don’t have the opportunity right now, I’ll try it tomorrow and report!

However, just fueled by interest and the will to learn: are there more idiomatic/efficient ways to achieve this using some of the tools mentioned above (Group, Select, Queue)? Since I couldn’t find many real usage examples, if any, I’d be happy for some insights/ideas.

Idiomatic will come with time. I am not proficient with the new Io stuff (my code is mainly stuck on 0.15.2), so someone more knowledgeable will be able to provide more information on the efficiency/idiomatic part.
I can explain the different parts and how they interact, with a few ideas on how to accomplish your situation.

Group

A group is a collection of futures that can be canceled or awaited together. So in your example, instead of sticking the futures in an ArrayList, you could instead Group.async them and then later just run Group.await to wait for all of them to finish.
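A hedged sketch of what that could look like in your example. The Group API is still unstable on master, so the exact names used here (`.init`, `group.async`, `group.await`, `group.cancel`) are assumptions pieced together from the description above, not verified signatures:

```zig
// Hypothetical sketch; Io.Group's exact master-branch API may differ.
var group: Io.Group = .init;
// Groups can be canceled together: clean up if we return early with an error.
defer group.cancel(io);

for (ranges.items) |r| {
    const chunk = content[r.start .. r.end + 1];
    // Spawn directly into the group instead of collecting futures by hand.
    group.async(io, downloadAndWrite, .{ io, &semaphore, &local_file, chunk, r.start });
}

// A single call waits for every spawned task to finish.
group.await(io);
```

The semaphore inside downloadAndWrite (with the post moved there, as shown earlier) would still be what limits concurrency to 3; the group only replaces the ArrayList-of-futures bookkeeping.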

Select

Select is similar to a group but with one exception. Instead of waiting for all of the futures to finish, it takes the result from the first one that finishes and cancels the rest. It will return that result to the caller of Select.await.

Queue

A queue is like a Go channel or a Rust mpsc/mpmc channel. You can plug work into the queue and have a worker or workers on the other end pulling data off and running it. By itself this will not limit the number of workers or how many things are done in parallel. It’s just a container for the work to be done.

Brainstorming

Now to your overall question. Using a semaphore and spawning all of the tasks into a group would work. This will allow only a certain number of tasks to run at once with the convenience of a single Group.await at the end to make sure they are all done.

Another option would be to push all the work into a queue. You then spawn X workers who pull off the queue, do the processing, and then come back to the queue. They will block once the queue is empty. This limits the number of things in parallel.

Performance in either case will largely depend on how each task is implemented. If you have multiple calls to async inside a task, then you may have idle time while the IO happens but the Semaphore is locked, or nothing is able to pull off the queue because all workers are blocked.

I think a more optimal solution would be to set up your Io implementation how you want, and spawn all tasks into a group. Don’t worry about how much is running in parallel as that will be handled by the Io implementation. Then when any of them are waiting for a future to finish, the implementation can go work on the next one. At the end, await the group.

Conclusion

A lot of this is talking in the abstract. The actual solution will depend on your problem and the actual constraints you want to place on the system. For example, you said you want a maximum count of parallel downloads.

In order to pick the best solution, one would need to know why you need a max count, what that count is, etc. Those constraints will inform how you compose the different Io tools to accomplish it.
