KVig - an ACID compliant key value store (and some I/O observations)

Not usable yet, but it can technically store and retrieve data, even across file failures. Safety is not there yet because of the lack of testing – don’t use it; it’s not stable and it’s not safe. It uses its own cache instead of mmap and a classic write-ahead log. No allocation happens after initialization. It also uses its own I/O abstraction until std.Io stabilizes.

(Undocumented) Code is here: xash/kvig: key-value store in zig - Codeberg.org

I started this project not only because I wanted to learn more about DBs, but also to try out the new I/O approach. It’s a nice fit: a (small) kv-store should be easy to embed in any project with any I/O usage, yet it also has some interesting needs if it wants to be as efficient as possible. Because the store is not the main program itself, which could just throw io_uring at every problem, it has to rely on a good I/O abstraction. So, a few observations from building an I/O abstraction that is very close to std.Io, but not quite:

  • O_DIRECT is needed as a hint for skipping the kernel cache; otherwise performance tanks a lot. If the system does not support it, it can be skipped safely.
  • fsync is needed as a guarantee for successful writes, with fdatasync as a further, noticeable optimization: for ACID we don’t need all metadata updates (only when the file size changes). sync should be separated from write calls; we only want the speed penalty of syncing when it is truly needed.
  • multiple writes: the current std API seems to support writes from multiple buffers into one consecutive block of the file. For WAL merging the target ranges are scattered all over the file, so multiple writes are necessary, with batching being a performance must. One could pass offsets: []usize as a write parameter …

… or, a somewhat more radical proposal: have all I/O operations return a Future. This allows batching in the backends, e.g. uring can submit multiple writes and only yield fibers on await. I did this by changing the I/O vtable functions to something like fileSync: *const fn(*anyopaque, file: Io.File) Future(Io.SyncError!void). Blocking implementations can just fill in the result of the future directly. Combined with some helper functions (or a distinction between File.open and Io.openFile), it can look like this:

// how I used it
const future = file.sync(io);
const result = try future.await(io);
// as a one liner:
const result = try file.sync(io).once(io);

// or an idea to merge the `once` call into the file namespace, e.g.:
const result = try file.sync(io); // just a wrapper for the one liner ^^
// vs
const future = io.fileSync(file);

There is a non-detectable footgun lurking though: you can, but shouldn’t, call .once on a future multiple times. once cannot mark the future as consumed, because the future is *const when constructed inline (this would be no problem with the namespace-split idea, though). If there was already a discussion about why this approach wasn’t chosen, I’d be happy for some links. I doubt the current approach with io.async(writePage, …) can be as fast, but I’ll try it out once std.Io.IoUring works again; changing the I/O calls in KVig is quick.
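To make the scattered-writes idea from the bullet list concrete: a hypothetical vtable entry could take one buffer/offset pair per target range. All names below are my own invention, not std.Io’s – just a sketch of the shape:

```zig
// Hypothetical sketch only: none of these names exist in std.Io.
// One buffer/offset pair per target range, so a backend like io_uring
// can turn the whole slice into a single submission batch, while a
// blocking backend simply loops over the slice with pwrite.
pub const PositionedWrite = struct {
    buffer: []const u8, // data for this range
    offset: u64,        // absolute file offset of the range
};

// vtable entry: many scattered ranges, one call
fileWriteScattered: *const fn (
    userdata: ?*anyopaque,
    file: Io.File,
    writes: []const PositionedWrite,
) Future(Io.File.WriteError!void),
```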

All in all, if these three things are implemented (and I’d be happy to help out if they are accepted), performance hits the level of lmdb – which is to say, it’s very fast – while still being abstract enough for an I/O testing implementation. I can’t wait to implement one – another perfect fit for a DB, to emulate all possible errors and thread races – but for that I’d like the I/O API to be settled.

Cheers!


Codeberg is down, so I can’t check the code, but I wonder: what do you do in await/once?

There are benefits to this kind of future-based code, but there are also downsides. Everything then gets more complex. Code needs to be aware of asynchronicity. While using std.Io, libraries can be written in a way that is completely oblivious to the fact that they could run in an asynchronous context.

My Io does nearly the same as std.Io, but the API of the implementation (Threaded, Evented, …) does not return the result of an operation; it returns a Future of that operation. If you look at std.Io.VTable, it has something like fileStat: *const fn (?*anyopaque, File) File.StatError!File.Stat – I would suggest changing it to return Future(File.StatError!File.Stat).

The library that does not care about that async/Future stuff could still use a wrapper. With my suggested namespace split, a blocking future call, i.e. const stat = File.stat(io), would just be a wrapper around const future = io.vtable.fileStat(file); return future.await(io); (that’s basically what my once does). If your library cares about potential async-ness, it could use e.g. const future = io.fileStat(file);. Two ways of doing I/O, but for the I/O implementation it’s the same, and you can still have the nice API of const stat = File.stat(io).
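Spelled out, the split could look roughly like this – a sketch under the assumption of a Future-returning vtable as proposed above; the names are illustrative, not std.Io’s:

```zig
// Sketch: the blocking convenience wrapper lives in the File namespace
// and hides the Future entirely. Assumes the vtable entry returns
// Future(File.StatError!File.Stat).
pub fn stat(file: File, io: Io) File.StatError!File.Stat {
    var future = io.vtable.fileStat(io.userdata, file);
    return future.await(io); // construct and await in one step
}

// An async-aware caller goes through the Io namespace instead:
// var future = io.fileStat(file);
// ... do other work, maybe collect several futures ...
// const s = try future.await(io);
```

The I/O implementation only ever sees the Future-returning form; whether a caller awaits immediately or batches is decided at the call site.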

When issuing multiple I/O operations, I really don’t want the overhead of all the fibers/context switching across dozens of async calls when I only want to hand the kernel a list of write commands.

await/once source; the same as std.Io – except that once doesn’t update the future, which introduces problems when using a Future after calling once, but allows the .once(io) call directly after future construction.

pub fn Future(Result: type) type {
    return struct {
        anyfuture: ?*AnyFuture,
        result: Result,

        /// Awaits the pending operation, stores its result, and marks
        /// the future as completed by clearing `anyfuture`.
        pub fn await(f: *@This(), io: Io) Result {
            const anyfuture = f.anyfuture orelse return f.result;
            io.vtable.await(io.userdata, anyfuture, std.mem.asBytes(&f.result));
            f.anyfuture = null;
            return f.result;
        }

        /// Like `await`, but takes the future by value so it can be called
        /// on a temporary. The trade-off: it cannot clear `anyfuture`, so
        /// a second call on the same future awaits an already-completed
        /// AnyFuture.
        pub fn once(f: @This(), io: Io) Result {
            const anyfuture = f.anyfuture orelse return f.result;
            var result: Result = undefined;
            io.vtable.await(io.userdata, anyfuture, std.mem.asBytes(&result));
            return result;
        }
    };
}
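The footgun from the first post, sketched as a usage fragment (file/io are assumed handles of the proposed API; the second .once call is the bug):

```zig
// `file.sync(io)` is assumed to return Future(Io.SyncError!void)
// as proposed above.
const a = try file.sync(io).once(io); // fine: await-and-return on a temporary

var f = file.sync(io);
const b = try f.once(io); // fine
const c = try f.once(io); // bug: `once` never cleared `f.anyfuture`, so this
                          // awaits an already-completed AnyFuture again
_ = .{ a, b, c };
```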

Interesting. I wrote an alternative implementation of the std.Io interface, but I also have a custom interface for the runtime. I was considering having e.g. both read and asyncRead methods in the custom API: one that would seem blocking, and one that would return a future-like object, mainly for use in select. But all of this only works if you have an event loop in the background, because e.g. with kqueue you need to actually run the recv syscall yourself after getting the readiness status, and things like fstat need to actually run in a different thread. Can’t wait for codeberg to get back up, so that I can check how you do this.