Sorry if this isn’t the right category, I think it might fit better in brainstorming but I can’t post there.
I’m working on an allocator API for Mojo, and I wanted to get thoughts and opinions from (1) other systems programmers and (2) from the community around what I think may be one of the only widely used allocator APIs. Most of my work is in places where the lack of async (or other facility to manage coroutines) causes some fairly substantial problems, so I haven’t been able to use Zig as much as I would like. Instead, most of my time is spent with C++ (mostly when I need to talk to GPUs) or Rust (where I feel the pain of the lack of a stable allocator API fairly deeply), with some C SIMD kernels sprinkled in because WG21 still hasn’t gotten around to adding restrict to ISO C++. So, I need to lean on all of you for how Zig has managed custom allocators since I tend to write what I need and I don’t have people re-using my code most of the time.
Background on Mojo
Mojo takes most of its inspiration from Rust, C++ and Swift, with the end goal for its type system looking like a mixture of Zig and a theorem prover like Lean 4. The general consensus is that we’re going to have reflection API that looks like Zig’s comptime, and aside from struct manipulation many of the capabilities of comptime already exist in Mojo (ex: you can make a List of types, iterate over time, and make a HashTable of the second type to the fifth type). However, Mojo is leaning on algebraic types to integrate better with how most programmers expect types to work, and because they are more flushed out in type theory which gives us a solid theoretical foundation to be able to eventually specify things like “Any function which does not perform blocking syscalls and does not take any locks” as a function argument and have the compiler be able to enforce that. Mojo also has linear types, which can be used to implement pointers that you must eventually free for the program to compile.
If you go look at Mojo, you’ll see a lot of AI branding, because “We can do AI better” is probably the best/only way to get VCs to pay for a new systems programming language. It is a general purpose language that’s designed to deal with heterogeneous hardware, and has the very attractive quality of try to make the compiler as “dumb” as possible. When I say, making the compiler dumb, I mean that the autovectorization pass in LLVM, which has many, many, many issues, is disabled, and the compiler does a more or less direct mapping from SIMD[DType.float32, 16].__add__ to vaddps ymm, ymm, ymm, and if you put that in a loop the compiler won’t get in your way and try to use the potentially slower (like Skylake X) AVX512 version. This is a very nice property for high performance code, since we can be generic over SIMD width and when you say “I want the AVX2 version”, you get the AVX2 version even on an AVX512 capable CPU, with a single implementation of the kernel.
Mojo is also in an odd place where it can do GPU compute and SIMD very well, but we’re mostly calling out to Python or directly to libc for IO.
Things I’m already planning to address
VTables only when you need them
Algebraic types mean that Mojo can easily devirtualize everything that’s known at compile time. This means that if you just want to call out to libc malloc/realloc/free then the allocator struct is a zero-sized type. If you need a bit of state, you can store just that state and not the vtable. If you need runtime flexibility, then trait objects give you a vtable.
Breaking Up Capabilities
One pattern I frequently find myself implementing for IO buffer recycling is to have a MPMC queue of free buffers. Being able to use a consumer handle on a queue like that as an allocator would be great, but the consumer handles can’t deallocate. So, this means that Allocator, Reallocator and Deallocator are separate traits. This does produce a bit of a mess when implementing an allocator since most allocators should do alloc + free just fine, but I’m leaning towards the venerable tradition of “writing reusable library code is harder so that application code is easier to write”.
Parameterizing Allocators by Type
This is one I’m still on the fence on, but I think the best way to handle things like arena/slab allocators which have particular type requirements is to make it so that Allocator[T] means you can allocate a T, and then that is implemented for all of the types the allocator can provide. For example, a slab allocator might implement Allocator[T] for all T which are 64 bytes, which can be done via generics (extension[T] Allocator[T] for SlabAllocator[64] requires sizeof[T]() == 64: ...)
Integrating Ownership
Mojo has a Rust-like ownership system, although it’s designed to be less annoying. This means that the type system can express “Allocator which can only allocate once” as a zero-overhead abstraction. This can be used to turn a buffer into an “allocator” for things which only allocate once with no overhead over having a version which just takes a buffer. With some type system tricks, this can be expanded to any number of allocations.
Algebraic Effects
Some allocators are infallible (like a single buffer that only handles one allocation of a comptime known size), others are not (malloc). In the case where the allocator is infallible, error handling is unwelcome overhead. There are also some cases in distributed systems where it is actually useful to have an async allocator (Distributed shared memory via RDMA or CXL). Later on, this will likely get expanded to handle whether the allocator can block, whether it takes locks, and other concerns for various areas of systems programming alongside the language-level effect system.
Things I’m looking for
I know that there are all sorts of weird things out there, so let me know what kinds of weird things you’ve had to do to allocate memory that Zig’s normal allocator API can’t handle. Anything with a program counter is in bounds (especially weird AI/ML accelerators), and FPGAs are “nice to have” since Mojo can technically use CIRCT, but the meaning of “allocate memory” is a bit hazy when using HLS.
What do you think Zig’s allocator API is missing?
Any thoughts on my current plans?
Thanks for any feedback you can provide!