How to make good use of new std.Io in a storage engine?

I’m working (since 0.14 … ouch) on a storage engine/key-value store and have looked into whether I could use Io for it. Two main problems arise in the commit protocol[1]:

  1. Some commit protocols (mine included) assume that one transaction only ever executes on one worker. This makes them much easier to implement, and somewhat counterintuitively faster, because you don’t have to keep a complex dependency graph. As I understand it, this isn’t possible with the current work-stealing approach.
  2. Most commit protocols also expect that all transactions previously executed on the same worker are already committed. This again makes it easy to track that those dependencies are committed (for me, basically just a Lamport clock). This isn’t something you can pass along in an argument.

If I could make wishes, I would love (1) a configuration option to disable work-stealing; this might also be useful for other endeavours where you want precise control. And (2) some way to get a notion of a worker. For the threaded implementation this is just a worker id. For fiber-based ones it could be either the fiber id, if fibers are never allowed to switch workers[2], or both the fiber id and the underlying worker id[3]. I haven’t really thought about how this would be handled with stackless coroutines, but intuitively just the worker id could be enough. And (3), if I’m really being impudent, some way to control on which worker something executes.

The first two are basically a must for me; the third is more wishful thinking.


  1. Which decides whether a transaction can commit based on the dependencies it has and the already committed transactions. ↩︎

  2. Which would be quite bad, I think, because it would basically introduce unnecessary allocations for resource management. ↩︎

  3. Assuming fibers can switch workers when they are finished/unused. At least for me that isn’t work-stealing, because the fiber isn’t doing any work at that moment. ↩︎

2 Likes
  1. the work-stealing is a detail of a particular implementation; it would be possible for an implementation to let you disable it. But for the most part your code should avoid relying on a particular implementation.

  2. while the interface does intend to be as usable as possible for compute tasks, it won’t be able to cater to every use case. You will still be able to use more traditional approaches like OS threads if you need them.

What exactly is a worker in this context, and what makes it so that you need to know which worker did what? What unnecessary allocations do you think work-stealing would introduce to your code?

If a worker fundamentally only consists of state, then it need not be tied to a thread/fibre/etc. You could have a shared collection of worker states, and each task assigned to a particular one.
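A minimal sketch of that idea, with entirely hypothetical names: worker state lives in a shared table keyed by a stable id, independent of which OS thread or fiber happens to run the task.

```zig
const WorkerSlot = struct {
    lamport: u64 = 0,
    // per-worker WAL buffer, pending dependencies, etc. would live here
};

// fixed-size table of worker states, shared by all tasks
var slots = [_]WorkerSlot{.{}} ** 8;

// Each task is pinned to a slot by id. As long as only one task uses a
// given id at a time, the slot itself needs no synchronization, no matter
// which thread the scheduler happens to run the task on.
fn runOperation(worker_id: usize) void {
    const slot = &slots[worker_id]; // state follows the id, not the thread
    slot.lamport += 1;
}
```

The point of the sketch is just that "worker" can be a data-structure identity rather than a thread identity.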

Perhaps you are thinking too fine-grained with your application of the interface.

For the most part, uses of thread pools can be directly replaced by concurrent.

1 Like

Yes, I want to avoid relying on a particular implementation. But the problem is that one can’t “query” whether an implementation is work-stealing or not. I suspect that under most circumstances a work-stealing implementation is actually the better, more performant approach, and that’s also why they’ve chosen it. But there are special cases where one just can’t allow work-stealing, and a commit protocol is, at least for me, a pretty obvious example.

True. Though then one can’t really have the nicety of letting the user choose which concurrency model they get.

This is kind of a long one, so I’ll try to simplify a bit. I will also go off on some tangents that aren’t strictly needed, but this topic is quite deep. In general I’m speaking about a transactional (ACID) system, where one transaction can have multiple operations inside it: think get, put, delete. It also has a WAL and so on.

Now the user does some stuff and eventually wants to commit their transaction. Of course (depending on the isolation level) this can either succeed or fail, for instance if somebody changed a value that the transaction read. Or it needs to wait until something else commits (a dependency). The commit protocol does this checking. There are a bunch of ways to do it: 2PL, timestamp-based, or keeping a DAG. All methods have their advantages and disadvantages. In the naive case you have to keep the entire history of operations in memory and then test whether you can commit or not. This is of course slow and memory-intensive, but it allows work-stealing and also lets a worker interleave the execution of different transactions.
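To make the naive case concrete, here is a rough sketch (all names hypothetical, not my actual code): a transaction may commit only if nothing it read was overwritten by a commit after it started.

```zig
const std = @import("std");

// a key together with the timestamp of the commit that last wrote it
const KeyVersion = struct { key: []const u8, ts: u64 };

const Txn = struct {
    start_ts: u64,
    read_set: []const []const u8, // keys this transaction read
};

// Naive validation: scan the full committed-write history for conflicts.
// This is the slow, memory-hungry variant that tolerates work-stealing.
fn validate(txn: Txn, committed_writes: []const KeyVersion) bool {
    for (txn.read_set) |key| {
        for (committed_writes) |w| {
            // a key we read was overwritten after we started: conflict
            if (w.ts > txn.start_ts and std.mem.eql(u8, w.key, key))
                return false;
        }
    }
    return true;
}
```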

A nice simplification is to disallow a transaction from being executed on multiple workers, so all operations of that transaction will be executed by one worker. The key thing here is that you don’t have to synchronize access to the transaction’s own state or its logging path.

Each worker can have its own private WAL file and log buffer. When a transaction commits, it only needs to flush the buffer of the worker it belongs to. If I allowed work-stealing, the logs for a single atomic transaction could be split across at least two physical files. To make that durable, you effectively need a 2PC protocol between OS threads. That overhead completely kills the advantage of things like modern NVMe SSDs, because you’re back to heavy inter-thread coordination.
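Roughly what I mean, as a hypothetical sketch (error handling and partial writes omitted): since the transaction lives on one worker, commit is one write plus one fsync on that worker’s own WAL, with no cross-thread coordination.

```zig
const std = @import("std");

const Worker = struct {
    wal_fd: std.posix.fd_t, // this worker's private WAL file
    log_buf: []u8, // filled by the transaction's operations
    log_len: usize = 0,

    // Commit path: flush only this worker's buffer. A sketch would also
    // need to handle short writes; std.posix.write may write fewer bytes.
    fn commit(self: *Worker) !void {
        _ = try std.posix.write(self.wal_fd, self.log_buf[0..self.log_len]);
        try std.posix.fsync(self.wal_fd); // single durability point
        self.log_len = 0;
    }
};
```

With work-stealing, the same transaction’s records could sit in two workers’ buffers, and both files would have to be made durable atomically.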

I use a logical clock (basically a Lamport clock) to track dependencies. If I read a page that was last written by Worker X, my transaction now depends on Worker X flushing its log. If my transaction is on Worker X, this dependency is trivial and “free.” If transactions are moving between workers, tracking who needs to flush what becomes a many-to-many problem that is significantly harder to solve without global bottlenecks.
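The dependency check itself can be sketched like this (hypothetical names, not my real code): each worker publishes the highest log sequence number it has durably flushed, and a committing transaction just compares its per-worker needs against those counters.

```zig
const std = @import("std");

const WorkerState = struct {
    // highest log sequence number this worker has durably flushed
    flushed_lsn: std.atomic.Value(u64) = std.atomic.Value(u64).init(0),
};

// deps[w] is the LSN that worker w must have flushed before we may commit.
// On the transaction's own worker the entry is trivially satisfied at
// commit time, which is what makes the one-transaction-one-worker rule cheap.
fn depsFlushed(workers: []const WorkerState, deps: []const u64) bool {
    for (workers, deps) |*w, need| {
        if (w.flushed_lsn.load(.acquire) < need) return false;
    }
    return true;
}
```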

In my code I just have a pool of threads, which I call workers. These workers internally have queues of fibers (ready, waiting_io, waiting_lock, finished, …). They pop fibers from the ready queue and execute them until they are finished, need to do IO, or wait for a lock or something similar (a blocking operation), and then push them onto the respective queue. Each fiber executes one operation of one transaction and is then regarded as finished (and can be removed). The thing is that finished fibers are (at least conceptually) in a global cache that every worker can pull from. So if one worker needs more fibers it can easily get them without allocating, and when it doesn’t need them anymore it can just return them to the global cache.
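The global cache part could look something like this (again just a hypothetical sketch with a fixed-capacity free list, not my actual implementation):

```zig
const std = @import("std");

const Fiber = struct { stack: []u8 };

// Finished fibers are returned to a shared free list so any worker can
// reuse them without allocating a new stack.
const FiberCache = struct {
    mutex: std.Thread.Mutex = .{},
    free: []*Fiber, // backing storage, sized up front
    len: usize = 0,

    fn acquire(self: *FiberCache) ?*Fiber {
        self.mutex.lock();
        defer self.mutex.unlock();
        if (self.len == 0) return null; // caller must allocate a new fiber
        self.len -= 1;
        return self.free[self.len];
    }

    fn release(self: *FiberCache, f: *Fiber) void {
        self.mutex.lock();
        defer self.mutex.unlock();
        self.free[self.len] = f; // assumes capacity was sized up front
        self.len += 1;
    }
};
```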

I hope this is somewhat clear; it’s hard to convey just by describing it. I will eventually (hopefully soon™) make this code public, and maybe then it will become clearer.

This was more about the fact that, if fibers (both active and finished ones) are never allowed to switch workers, a worker would more likely need to allocate new fibers instead of just pulling them from the global cache.

1 Like

I don’t know, because I still need to get more familiar with the new Io in practice. But from what you describe, I am wondering whether you would be better off handling your transactions at a bigger granularity. Maybe you could use the batch API to chunk together multiple operations and then have multiple batches that are coordinated somehow (instead of having many fine-grained futures internally).

Overall I am somewhat skeptical that your very particular use case really should be wrangled into using fine-grained Io directly. It seems like you have much more domain knowledge about what can be done efficiently, and trying to use Io to implement it adds a whole lot of general flexibility into the mix that hinders you from actually applying that knowledge and doing the concrete, known thing you already know how to organize more effectively.

I think in the end the user of a storage engine just wants to send it some data and know whether it was saved or not, so using Io to communicate with the engine makes sense. But allowing the user to swap out the internal mechanisms may be counterproductive if the alternative is an implementation that becomes much more straightforward and optimized by making useful assumptions that a general Io implementation isn’t allowed to make.

So I would explore whether you can separate the external interface used to communicate with the user from the internal machinery used to coordinate the work being done. But that is just my impression from what you wrote; I don’t have the full details, and I haven’t really worked in that area yet.

3 Likes

Unfortunately this isn’t really possible, because you don’t know in advance which operations each transaction needs: the usage code can simply contain an if and do something else.

That’s also why I currently have my own implementation for a storage backend, fibers, and so on. I would basically just need the two things (no work-stealing, and some notion of a stable executor id). But it’s still early days for Io, and I’m somewhat optimistic that this will be discussed in the future anyway, so I’ll just keep quiet.

I could also do hacky things like using rsp as a key to get some fiber-local state, but that is just bad.

Yes, that’s the key thing. On the one hand I know what I want to do and roughly what performance I will get if I do it myself. On the other hand it would of course be nice to play well with the language and the general ecosystem.

That’s also something I’m thinking about, and it should be possible, I think. So the user hopefully doesn’t really need to care and can just use it, and everything works.


We are getting further away from the topic of this thread. Could someone maybe pull this out?

1 Like