Timeouts in std.Io

I’ve been integrating zio into more real-world projects; by now I have an HTTP client and server, a NATS client, and PostgreSQL. Whenever I do, I think about the migration path to std.Io, and there is one recurring problem: timeouts.

Pretty much any production server needs timeouts on everything; blocking forever is simply not an option. In my view, that’s one of the factors that differentiates a toy from a system somebody can actually support in a production environment.

And that’s where the interface currently falls short. I’m looking for options to make it usable for server applications, and I’m happy to help push the change forward, as it’s really important to me.

The easy targets are timedWait on Condition and Event. The futex operations in the vtable already support timeouts, so it’s just a matter of exposing them.
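For reference, the threaded side of the standard library already has a futex wait with a deadline: std.Thread.Futex.timedWait returns error.Timeout instead of blocking forever. A hypothetical timedWait on Condition/Event could presumably bottom out on this (or on the vtable’s futex hooks); the snippet below only demonstrates the existing primitive, not any std.Io API.

```zig
const std = @import("std");

// Nobody ever wakes this futex, so the wait must time out after ~10ms.
test "futex wait with timeout" {
    var state = std.atomic.Value(u32).init(0);
    const result = std.Thread.Futex.timedWait(&state, 0, 10 * std.time.ns_per_ms);
    try std.testing.expectError(error.Timeout, result);
}
```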

Network timeouts are harder because of the difference between blocking syscalls and event loops. In blocking mode, you typically set timeouts on the socket itself and the OS respects them, so it’s a per-socket setting. In the evented world, you typically set a timeout for each operation, since they are implemented as timers; with io_uring the timer can be linked directly to the operation, but it’s still per operation. I’d really like to add a timeout to e.g. netRead, but that might be hard to implement in the threaded version without a separate syscall per operation.
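To make the per-socket model concrete, this is roughly what a blocking/threaded implementation can lean on today: set SO_RCVTIMEO once on the socket and every subsequent blocking read respects it. A minimal sketch using std.posix (assuming Linux; the timeval field names are tv_sec/tv_usec on older Zig versions):

```zig
const std = @import("std");
const posix = std.posix;

// Per-socket read timeout, blocking style: after this call, every blocking
// recv/read on the socket errors out once the timeout elapses.
fn setReadTimeout(fd: posix.socket_t, ms: u32) !void {
    const tv = posix.timeval{
        .sec = @intCast(ms / 1000),
        .usec = @intCast((ms % 1000) * 1000),
    };
    try posix.setsockopt(fd, posix.SOL.SOCKET, posix.SO.RCVTIMEO, std.mem.asBytes(&tv));
}
```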

For example, netConnect has a timeout option, but it’s not actually implemented in the threaded version.

The alternative would be to change the interface to use “fat” handles, containing not just the file descriptor but also timeout info. In the blocking case, the timeout could be set on the socket after connecting; in the evented case, it would be applied per operation.
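Purely as an illustration of the shape I mean (none of these names exist in std.Io today), such a fat handle could look like the sketch below; the blocking backend would apply the timeouts via setsockopt after connecting, while the evented backend would attach them to each submitted operation.

```zig
const std = @import("std");

// Hypothetical "fat" socket handle; these names are made up for illustration
// and are not part of std.Io.
const Socket = struct {
    fd: std.posix.socket_t,
    read_timeout_ns: ?u64 = null,
    write_timeout_ns: ?u64 = null,
};
```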

Or maybe even just add netSetReadTimeout and expect the evented implementation to remember it somewhere internally.

Anybody with ideas about this?


I also find it more convincing to use socket wrappers (what you called fat handles).

In addition to timeout options etc., it’s worth using optionals,

something like socket: ?std.posix.socket_t = null instead of a raw fd.

Another case where timeouts are a bit of a struggle is dealing with child processes. I have a program that spawns a child bash process to run a command, then reads and processes the child’s stdout until either the process ends, 10k bytes of stdout have been processed, or 3 seconds have passed.

I may not be doing this optimally with what is currently available, but it’s the best approach I could figure out. Basically I:

  • Spawn the process
  • Set up an atomic bool to track if the process has been killed
  • Spawn a concurrent task
    • The concurrent task sleeps for 3 seconds, updates the bool to true, and then kills the process
  • Process the output from the child until EOF (either the child has closed its stdout (likely exited), or the child has been killed by the concurrent task) or 10k bytes
  • If 10k bytes have been reached, cancel the killer task and kill the process immediately
  • If the process has not been killed (as per the previous atomic bool), wait for it.

The problem is that you can kill a process that has already ended via a wait, but you cannot wait for a process that has been killed via a kill, or the program panics on a null pointer access.
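For what it’s worth, here is a self-contained sketch of that pattern using std.process.Child and a plain std.Thread as the 3-second killer (instead of std.Io concurrency), with the atomic flag ensuring the child is reaped exactly once; the argv is a placeholder and exact field names or error sets may differ across Zig versions.

```zig
const std = @import("std");

fn runWithLimits(allocator: std.mem.Allocator) !void {
    // Placeholder command; the real program runs an arbitrary bash command.
    var child = std.process.Child.init(&.{ "bash", "-c", "sleep 10" }, allocator);
    child.stdout_behavior = .Pipe;
    try child.spawn();

    var killed = std.atomic.Value(bool).init(false);

    // Killer thread: after 3 seconds, claim the flag and kill the child.
    const killer = try std.Thread.spawn(.{}, struct {
        fn run(c: *std.process.Child, flag: *std.atomic.Value(bool)) void {
            std.Thread.sleep(3 * std.time.ns_per_s);
            if (!flag.swap(true, .acq_rel)) {
                _ = c.kill() catch return;
            }
        }
    }.run, .{ &child, &killed });

    // Read at most 10k bytes of stdout; EOF means the child exited or was killed.
    var buf: [10 * 1024]u8 = undefined;
    var total: usize = 0;
    const out = child.stdout.?;
    while (total < buf.len) {
        const n = try out.read(buf[total..]);
        if (n == 0) break;
        total += n;
    }

    // Reap exactly once: only kill/wait here if the killer has not fired.
    if (!killed.swap(true, .acq_rel)) {
        if (total == buf.len) {
            _ = try child.kill(); // output limit hit, stop the child now
        } else {
            _ = try child.wait(); // EOF, child exited on its own
        }
    }
    // Note: join blocks until the 3 seconds elapse even if the child finished early.
    killer.join();
}
```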

Maybe this really is the best way for the standard library to handle this, but it was quite awkward to land on this solution.

I’m sorry, but how is this relevant? Where is the timeout handling in that code?

Just look at the code: timeouts (and all the other events) are handled by the EDSM engine.

Here (everything is handled here).

At the Future API layer (async, concurrent, await, cancel), you can give anything a timeout by doing a select with a timer. If the timer runs out, cancel the other stuff. This requires concurrency, which may impose limits on portability of the code.

At the lower-level operation API layer (see the changes that just landed in 2b19134c86223236b3fffc1360577a31d0251604), all operations can be given a timeout when using Batch.awaitConcurrent. All the networking and file system functionality is expected to be migrated to become part of the Operation/Batch API layer. Using this API layer requires operation-level concurrency, which is more portable than future-level concurrency, but still less portable than not using a timeout. Note that Batch.awaitAsync is always safe to use and guarantees no additional failure modes are imposed.


Will std.Io.net.Stream.Reader use this API? I’m mainly worried that if it depends on awaitConcurrent, the reader might avoid it.

I don’t see a reason to change net.Stream.Reader. Do you?

Edit: I think I understand the confusion here:

All the networking and file system functionality is expected to be migrated to become part of the Operation/Batch API layer.

What I mean by this is specifically, functions will be deleted from Io.VTable and instead become an Operation, like this.

net.Stream.Reader is not part of the VTable, it is a higher level API.

Yeah, I know this. :)

The question is whether the reader and writer will have a way to configure timeouts and have those respected across reads/writes.

As for writers, I don’t think a write timeout is actually needed.