Using your example and adapting it to HTTP (dusty), would I have to associate each route with a certain number of tasks (256KB/task) beforehand? Something like:
    for (0..num_tasks1) |_| {
        try group.concurrent(io, root_handler, .{io});
    }
    for (0..num_tasks2) |_| {
        try group.concurrent(io, register_handler, .{io});
    }
Should I use a different group (std.Io.Group) per route?
Then when I listen (try dusty_server.listen(address);), would zio/dusty automagically do the load balancing?
More noob questions:
Is it possible to add coroutines dynamically while dusty is listening?
Is dynamic addition/removal of routes planned for dusty in the future? For ephemeral routes.
Dusty automatically spawns tasks for each connection, you don’t need to do that yourself. See this minimal example:
The default runtime config, as shown in the example, will use one OS thread total, but one coroutine per active connection. If you have slow handlers and have 100k active requests, it will run 100k coroutines.
For dynamic routes, I’ve not considered it, but I don’t see it as super important. You can always achieve this yourself by using wildcard matches in the router and then handling the sub-routing yourself.
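A minimal sketch of the "handle the sub-routing yourself" idea, assuming you register one wildcard route with the router and forward the unmatched remainder of the path to a table you control. `SubRouter`, `Handler` and `previewHandler` are hypothetical names for illustration and not part of dusty's API; the handler signature is deliberately simplified to return a status code.

```zig
const std = @import("std");

// Hypothetical handler type: receives the path remainder, returns an HTTP status.
const Handler = *const fn (rest: []const u8) u16;

// Runtime-mutable route table that sits behind a single wildcard route.
const SubRouter = struct {
    routes: std.StringHashMapUnmanaged(Handler) = .empty,

    fn deinit(self: *SubRouter, gpa: std.mem.Allocator) void {
        self.routes.deinit(gpa);
    }

    // Keys are not copied; the caller keeps `path` alive while the route exists.
    fn add(self: *SubRouter, gpa: std.mem.Allocator, path: []const u8, handler: Handler) !void {
        try self.routes.put(gpa, path, handler);
    }

    // Returns true if a route was actually removed.
    fn remove(self: *SubRouter, path: []const u8) bool {
        return self.routes.remove(path);
    }

    // Call this from the wildcard handler with the part the wildcard matched.
    fn dispatch(self: *const SubRouter, rest: []const u8) u16 {
        const handler = self.routes.get(rest) orelse return 404;
        return handler(rest);
    }
};

fn previewHandler(rest: []const u8) u16 {
    _ = rest;
    return 200;
}

test "ephemeral route can be added and removed at runtime" {
    const gpa = std.testing.allocator;
    var sub: SubRouter = .{};
    defer sub.deinit(gpa);

    try std.testing.expectEqual(@as(u16, 404), sub.dispatch("preview/42"));
    try sub.add(gpa, "preview/42", &previewHandler);
    try std.testing.expectEqual(@as(u16, 200), sub.dispatch("preview/42"));
    try std.testing.expect(sub.remove("preview/42"));
    try std.testing.expectEqual(@as(u16, 404), sub.dispatch("preview/42"));
}
```

Note that this table is not synchronized. If handlers run on more than one thread, adding or removing routes would need a lock around the map.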
Perhaps a silly question, but since this thread has gained some talk about memory usage/active requests, I was curious: is there a way to see how the dusty application is doing at runtime? E.g., to know if you’re getting saturated by requests? It doesn’t have to respond to that of course, just be able to indicate it in some fashion. That way, people could know to spin up more nodes behind a load balancer. If this isn’t a foolish question, my only thoughts were that the internal logic could log something, or that it could expose a socket over which a status query can be sent.
Not at the moment, both dusty and zio need to expose some metrics.
For dusty, it’s mostly informative, how many requests are we processing, what are the status codes, duration histogram, etc.
For zio, it’s more interesting: it can measure things like task latency, i.e. the time between when a task is scheduled and when it actually runs. If that is too high, you might need to run more threads, for example. There are many useful metrics that could be exported from zio.
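To make the task-latency idea concrete, here is a rough sketch of how such a metric could be collected: record the time a task was scheduled, subtract it from the time it starts running, and count the result in power-of-two buckets. This is not zio's API; `LatencyHistogram` and its methods are hypothetical, and timestamps are assumed to be monotonic nanoseconds supplied by the caller.

```zig
const std = @import("std");

// Hypothetical histogram of schedule-to-run latency.
// Bucket 0 counts latencies under 1 us; bucket i (i >= 1) counts [2^(i-1), 2^i) us.
const LatencyHistogram = struct {
    buckets: [32]u64 = [_]u64{0} ** 32,

    fn bucketIndex(delta_us: u64) usize {
        // Number of significant bits in delta_us, clamped to the last bucket.
        const bits: usize = 64 - @as(usize, @clz(delta_us));
        return @min(bits, 31);
    }

    fn record(self: *LatencyHistogram, scheduled_ns: u64, started_ns: u64) void {
        // Saturating subtraction guards against a clock reading that went backwards.
        const delta_us = (started_ns -| scheduled_ns) / std.time.ns_per_us;
        self.buckets[bucketIndex(delta_us)] += 1;
    }

    // Number of samples that landed in `first_bucket` or any slower bucket.
    fn countAtOrAbove(self: *const LatencyHistogram, first_bucket: usize) u64 {
        var total: u64 = 0;
        for (self.buckets[first_bucket..]) |n| total += n;
        return total;
    }
};

test "latencies land in the expected buckets" {
    var hist: LatencyHistogram = .{};
    hist.record(0, 500); // under 1 us
    hist.record(1_000, 2_000); // 1 us
    hist.record(0, 5_000_000); // 5000 us
    try std.testing.expectEqual(@as(u64, 1), hist.buckets[0]);
    try std.testing.expectEqual(@as(u64, 1), hist.buckets[1]);
    try std.testing.expectEqual(@as(u64, 1), hist.buckets[13]);
}
```

A monitoring hook could then watch something like `countAtOrAbove(11)` (samples of roughly 1 ms or more) and treat a growing count as a hint that more threads are needed.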
This seemingly works on my machine with both zio and Io.Threaded, which feels unsurprising.
However, the devil is in the details:
If I give the NONBLOCK flag to inotify_init1, the readStreaming returns some WOULDBLOCK type error. Does this mean that if the Io.Threaded threadpool has just one thread, the blocking call would freeze everything else? Well no, because Io.concurrent would return the ConcurrencyUnavailable error earlier. But can the readStreaming actually yield? Will zio always work with a “foreign” fd? Is Io.concurrent even the right mechanism for this, or should I spawn a thread?
If I try to use a file reader (either streaming or positional), call takeStruct and discardShort, this only works with zio, not std, and only with NONBLOCK on the inotify_init1. In other cases (blocking zio or any std) the takeStruct just doesn’t return.
It works because on Io.Threaded, it uses just blocking read, and zio uses io_uring, which is also fine with the blocking fd.
It’s not universally correct, but unfortunately it can never be, because an Io implementation might always need to have a file open in some specific way, or might not even use syscalls at all in the case of a mock implementation.
If you wanted it to be more correct, I’d open the fd in non-blocking mode, and then use Io.Batch. It will actually use poll + read on Io.Threaded, and it’s also the better option if you use the epoll backend on zio for some reason.
Correct, once you have succeeded using io.concurrent on Io.Threaded, you have your own thread and can block it in whatever way you wish.
One thing to note, Io.Threaded will never yield, it uses blocking syscalls, or poll loops, it always blocks the thread.
On the other hand, zio and Io.Evented (in the future) will always yield during readStreaming, but with a blocking fd, you could potentially freeze some implementations.
Zio on io_uring will always work with any fd. On the epoll/kqueue backend, it’s more complicated, but you probably don’t need to care about this.
Given that this is a pollable fd we are talking about, I’d say io.concurrent and readStreaming is fine, but if you wanted to be absolutely sure it’s going to work with any implementation, you would need to spawn a thread, use std.posix.system.read in a loop, and then maybe use Io.Queue to post events.
This is surprising to me and I’ll need to test it, I’d have expected it to work even through the streaming reader.
I tried this out with our Datastar 0.16 web framework, and yeah, just works. The whole framework does some gnarly things with SSE + pubsub + timers + file watchers + queues. It gives Io a decent workout across the board.
The neat thing is that there are zero changes to any of the library code to go from threaded to coroutines. Just a build time option to trigger 1 line of code in the user’s app main() function, and everything works.
(Works = Linux, Mac, FreeBSD across all release modes)
It’s slightly faster than threaded on the test benchmarks, and significantly better latency / tail latency numbers. (Assuming you set executors to .auto and use multiple workers).
So have promoted that experiment branch to master, and made it a first class recommendation for building Datastar apps with the framework.
Hi, I wanted to give it a try with the Io implementation, but I have an issue with the way the runtime is initialized. Is it necessary that the initialization allocates the Runtime struct? I need to hold the struct myself so I can mock out the Io implementation in tests. The struct being on the heap makes this impossible for me.
To better illustrate what I mean I’ll show you start of my mocked io function:
I can do this with the std Io implementations, but not with zio. It is not possible if I do not hold the Runtime struct in my TestCtx. I also assume I cannot just copy the Runtime, because surely init would not allocate it on the heap if I could.
Yes, the runtime needs a known address from the start; it can’t be copied. I could have changed it to a different pattern that requires passing an undefined Runtime ptr to Runtime.init, but that would make it more complex to use and easier to mess up.
However, I’m not sure why you need to copy zio.Runtime for your mock. Why not just have:
    const TestCtx = struct {
        parent_io: std.Io,
    };
And then use *TestCtx as the userdata for your mock implementation.
Because in what I am testing I call io.netLookup, where io is provided by the runtime. I then copy the provided io and replace just individual functions with my own implementation. I cannot pass different userdata without implementing the full io interface in my TestCtx, which I would prefer not to do, even if it was just delegating to another implementation.
I’ll probably change the naming later, use create/destroy for the default case, and allow init/deinit, to be more consistent with the mainstream conventions, but for now I want to keep it backwards-compatible.
So I have tried it and didn’t get the results I was hoping for. For context, I have a little Zig resolver shared library and wanted to get rid of additional threads. I tried a test hitting the local system resolver. I only use .netLookup and Io.Select in my program. The zio implementation with max_thread=1 was at least 2x as slow as single-threaded Io.Threaded. I had to raise it to around max_thread=12 to match the single-threaded speed. There might be some issues with how I test it or with my program, but I guess I will stick with Io.Threaded for now because it works fine for me. I will try again later when I have a different workload to test it on.
The issue with my test is probably that I am hitting the system resolver with the same domain over and over, which means it is cached and there is not much waiting for IO. The results would probably be different in a more realistic scenario though.
That makes total sense. When using getaddrinfo, it would be hard to beat Io.Threaded, which can just directly call it.
On macOS and Windows, I use the native async resolvers, so those should work better, but probably still slower than just calling GAI.
Btw, writing a custom DNS resolver for Linux is on my short-term TODO list, but I’m taking the time with the design, because I want it really well done in terms of caching.
Do you mean a whole new program? I am wondering if just doing in zio’s netLookup what Io.Threaded does would improve zio’s lookup dramatically on Linux. I don’t quite understand why netLookup is an io vtable function and not a function that just takes any io. It seems to me like that logic should be reusable… I think I will try doing that at some point next week.
No, internal implementation of netLookup. Completely transparent to the user. Just implemented without depending on the OS/libc resolver. On Linux/io_uring it can be done completely async. And with good caching, it possibly means 0 syscalls for repeated lookups.
I’ve released version 0.13 with the async DNS resolver mentioned in the posts above. It’s pretty fast, beating any other solution I’ve tried. And since DNS resolution was the last thing that required the auxiliary thread pool on io_uring, I was also motivated to support -fsingle-threaded. Both the DNS resolver and single-threaded mode are only recommended to use on Linux with the io_uring backend. On other systems, using the thread pool is better.
EDIT: actually, Windows with IOCP backend is fine as single-threaded as well, file ops are async, DNS is async, so it should be ok
Wrote a post about timeouts in zio. Explains zio.AutoCancel, which is the most general timeout mechanism in the library. This approach is something that can’t be expressed in std.Io directly, so the custom API still matters.
I’ve released another version with what I’d call “last mile” changes. The key changes are:
Support for sendfile-like operations for sending files over network sockets. These are currently emulated, but they still do better than the naive loop, because they run read and write concurrently. Platform-specific variants on Linux, Windows and FreeBSD will be done later. Those are the only OSes that support async sendfile. This is unfortunately not wired via std.Io because std.Io.net.Stream.Writer doesn’t support it; it has an unimplemented stub that does not call the vtable. I don’t want to overload the guys with my PRs, so once some of them get merged, I’ll open a PR to fix this in std.Io.
Support for debug_io, which you can expose as std_options_debug_io and it will make std.debug.print and std.log calls async.
Support for resolve_beneath. This is a security feature, so unlike stdlib, if you use the flag and the system doesn’t support it, it will fail. This behavior is controllable with the resolve_beneath_mode build option.
Support for file locking. I’ve been avoiding this, because there is no good async way of doing it, but I’ve settled on non-blocking OS calls with a sleep loop.
After some fixes, I’ve re-enabled task migration, so for example if you unlock a mutex, the task waiting on the mutex will get scheduled on the same thread, avoiding a cross-thread wake-up, which is like 100x slower. This is controllable at runtime using allow_task_migration, which defaults to true.
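For readers wondering what "non-blocking OS calls with a sleep loop" looks like for the file locking item above, here is a minimal sketch of the pattern. It is not zio's implementation: `lockWithSleepLoop` and `FakeLock` are hypothetical, the fixed delay and attempt limit are assumptions, and the real lock attempt and sleep are abstracted behind `ctx` so that the sketch does not depend on any particular OS or Io API. In a coroutine runtime, the sleep would be the point where the task yields.

```zig
const std = @import("std");

// Retries a non-blocking lock attempt, sleeping between tries.
// `ctx` is anything with `tryLock() bool` and `sleep(ns: u64) void` (both hypothetical).
fn lockWithSleepLoop(ctx: anytype, delay_ns: u64, max_attempts: usize) error{LockTimeout}!void {
    var attempt: usize = 0;
    while (attempt < max_attempts) : (attempt += 1) {
        if (ctx.tryLock()) return;
        // No point sleeping after the final failed attempt.
        if (attempt + 1 == max_attempts) break;
        ctx.sleep(delay_ns);
    }
    return error.LockTimeout;
}

// Test double: the lock is held by someone else for the first `busy_for` attempts.
const FakeLock = struct {
    busy_for: usize,
    slept_ns: u64 = 0,

    fn tryLock(self: *FakeLock) bool {
        if (self.busy_for == 0) return true;
        self.busy_for -= 1;
        return false;
    }

    fn sleep(self: *FakeLock, ns: u64) void {
        self.slept_ns += ns;
    }
};

test "lock is acquired after a few sleeps" {
    var lock: FakeLock = .{ .busy_for = 3 };
    try lockWithSleepLoop(&lock, std.time.ns_per_ms, 10);
    // Three failed attempts, so three sleeps of 1 ms each.
    try std.testing.expectEqual(@as(u64, 3 * std.time.ns_per_ms), lock.slept_ns);
}
```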