Zuballoc: A hard real-time sub-allocator

Justus2308 · April 15, 2025, 3:51pm

Hey guys, I’ve implemented a hard real-time allocator based on the OffsetAllocator by sebbbi in pure Zig. It is suitable for sub-allocating any resource and uses the floating-point value distribution for binning.
The ‘preferred’ way to use it is to store the index of the allocation metadata alongside the pointer returned by an alloc function but there is also a working Allocator interface which stores metadata next to the actual allocation.
Let me know what you think!

kj4tmp · April 16, 2025, 1:35am

Welcome to ziggit!

Super cool!

Could you elaborate on the meaning of “hard real-time”?

What makes this implementation especially suited for real time use cases? Because it offers a deterministic latency for allocation?

If the backing allocator is still making system calls, does it defeat the purpose of your allocator?

Justus2308 · April 16, 2025, 8:57am

Hey, thank you!

I think the exact definition of hard real-time is a little fuzzy, but what I mean by it is that this allocator will finish its operations in about the same amount of time every time, without any major outliers.

For example I wouldn’t consider DLmalloc (an allocator used by Android (?)) hard real-time, because although its speed of operation is very consistent most of the time, it postpones coalescing all the free nodes until it actually runs into space problems and then coalesces them all at once which leads to very good average but very bad worst-case performance.

This implementation offers true, non-amortized O(1) time complexity for alloc and free operations. It does this by not using any loops, but instead bit sets and bitwise operations (mainly count-trailing-zeroes) to find nodes that hold enough memory to satisfy a request.

More precisely, the allocator will calculate the bin for a given size by converting it to an 8 bit float and interpreting the float as an int first. This results in a nice distribution across bins with 17 exact bins for small allocations (much less aggressive/wasteful than e.g. power-of-two binning). The bin sizes can be found here.

Then it tries to pop a node from the free list at its bin size. If there is none, it will instead query the bitsets for the next lowest set bit, which indicates that there are free nodes available in that bin. They will be oversized, but they will fit the allocation.

If there is any excess memory (because of an oversized bin or alignment) it will be put into a new node and put back into the free list at the appropriate bin to combat internal fragmentation.

The metadata for each node keeps track of its direct in-memory neighbors and will try to coalesce itself with them when it gets freed.

All of this works without any loops which results in guaranteed O(1) time complexity.

The gpa arg in the init function is maybe a little misleading, it is only used to allocate space for some bookkeeping, the allocations themselves don’t do any syscalls and only split up the memory supplied to the init function via buffer.

I hope this helps, thanks for your interest!