Why is allocator an interface?

If I understand correctly, it is possible to pass an allocator as anytype, effectively making its type comptime-known. Having an Allocator interface instead allows us to change the allocator type at runtime.

What is the point of swapping the allocator type at runtime? Can anyone give me some use cases?

5 Likes

I think it’s less about the ability to swap the allocator at runtime and more about the other implications of passing around the allocator type at compile time:

If the allocator type needs to be comptime-known, all structs that contain allocators (such as ArrayList and most other data structures, as well as all structs that contain these) need to be generic over the allocator type.

This would be less convenient to use and would increase compile times (because there are more instances of all these generic functions).

7 Likes

I’m wondering about the convenience and overhead here, as most data structures already have comptime parameters for the contained types? Would it cost much to also pass the allocator type at comptime?

Well, in terms of convenience, the main problem is not the data structures themselves, but the places you use them in. Let’s say you have a data structure that contains a list:

const MyStruct = struct {
    list: ArrayList(u32),
    pub fn init(alloc: Allocator) MyStruct {...}
    ...
};

Then either you need to make the entire struct generic, which also propagates the problem one layer up:

pub fn MyStruct(comptime AllocatorType: type) type {
    return struct {
        list: ArrayList(AllocatorType, u32),
        pub fn init(alloc: AllocatorType) @This() {...}
        ...
    };
}

Or you need to hardcode the allocator type:

const MyStruct = struct {
    list: ArrayList(GeneralPurposeAllocator(.{}), u32),
    pub fn init(alloc: GeneralPurposeAllocator(.{})) MyStruct {...}
    ...
};

This is the convenient choice, but it makes it harder to change the allocator.
And that includes even changing the allocator’s configuration. Imagine you found a memory leak, but the stack trace was too short and you want to temporarily increase the number of stack frames captured by the GeneralPurposeAllocator. Then you need to go into 100 different files and change the allocator type to GeneralPurposeAllocator(.{.stack_trace_frames = 15}).

As for the cost, it’s difficult to estimate the actual impact. If you mostly use one allocator for everything, it will be negligible.
But if you use many different allocators and arenas (note that the arena type would also be generic over its child allocator type), then the amount of generated code can easily explode.
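To make that blow-up concrete, here’s a small sketch (the Arena below is hypothetical, not std.heap.ArenaAllocator): every distinct child allocator type produces a distinct arena type, and wrapping arenas in arenas multiplies the instantiations further.

```zig
const std = @import("std");

// Hypothetical arena that is generic over its child allocator type.
// Every distinct Child produces a distinct Arena type, so each
// combination of allocator and wrapper gets its own generated code.
fn Arena(comptime Child: type) type {
    return struct {
        child: *Child,
        // bookkeeping elided
    };
}

test "every child type yields a distinct arena type" {
    const A = Arena(std.heap.FixedBufferAllocator);
    const B = Arena(std.heap.GeneralPurposeAllocator(.{}));
    const C = Arena(A); // arenas nested in arenas multiply further
    try std.testing.expect(A != B);
    try std.testing.expect(C != A);
}
```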

6 Likes

Right, thanks for the clarification !

2 Likes

If this becomes a problem, you can always switch to a type erasure (like the current std.mem.Allocator) and bring the code size down. Or you can limit the number of allocator types you have: define a set of, say, 3 allocators that are allowed in your codebase, and pick whichever one fits your use case best, even if it’s not the perfect one for the job.
I agree with @tgirod: in terms of generated machine code, anytype is strictly superior to what we have today. In most cases you know exactly which allocators you are going to use in your codebase, and leveraging that information at compile time generates better code. In the worst-case scenario, you can use std.mem.Allocator and you get exactly the same code that we have today.
The only real problem with anytype is that you lose ergonomics. It’s harder to read, and tooling gets confused.
I talked about this before here and here.
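A minimal, compilable illustration of that worst-case claim (makeZeroed is a made-up example function): a function written against anytype accepts the type-erased std.mem.Allocator unchanged, because the erased interface exposes the same alloc/free method shape that a concrete allocator would.

```zig
const std = @import("std");

// A function written against anytype: pass a concrete allocator type and
// the compiler specializes; pass std.mem.Allocator and you get the same
// type-erased dispatch the std interface gives you today.
fn makeZeroed(allocator: anytype, n: usize) ![]u8 {
    const buf = try allocator.alloc(u8, n);
    @memset(buf, 0);
    return buf;
}

test "the erased interface satisfies the anytype bound" {
    const buf = try makeZeroed(std.testing.allocator, 8);
    defer std.testing.allocator.free(buf);
    try std.testing.expectEqual(@as(u8, 0), buf[7]);
}
```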

1 Like

The most obvious use case is debugging. Depending on runtime conditions, we might or might not want to swap in an allocator with verbose reporting.

1 Like

This isn’t always the case: large code sizes increase cache misses.

This works with comptime parametrization just fine: you can use an allocator that internally checks a runtime flag.
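For instance (a made-up sketch, not a std API): the wrapper’s type is fixed at comptime, while the verbosity is an ordinary runtime field, so no allocator type ever needs to change at runtime.

```zig
const std = @import("std");

// Sketch: verbosity is runtime state inside a comptime-known wrapper type.
const ReportingAllocator = struct {
    child: std.mem.Allocator,
    verbose: bool, // runtime flag, e.g. set from a CLI option

    fn alloc(self: *ReportingAllocator, comptime T: type, n: usize) ![]T {
        if (self.verbose) std.debug.print("alloc {d} x {s}\n", .{ n, @typeName(T) });
        return self.child.alloc(T, n);
    }

    fn free(self: *ReportingAllocator, memory: anytype) void {
        if (self.verbose) std.debug.print("free {d} items\n", .{memory.len});
        self.child.free(memory);
    }
};

test "verbosity toggles at runtime without changing the type" {
    var reporting = ReportingAllocator{ .child = std.testing.allocator, .verbose = false };
    const ints = try reporting.alloc(u32, 4);
    defer reporting.free(ints);
    reporting.verbose = true; // flipped at runtime; same type throughout
    try std.testing.expect(@TypeOf(reporting) == ReportingAllocator);
}
```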

More generally, comptime parametrization is strictly more powerful than runtime parametrization, because one can always pass a VTable as a comptime parameter. Eg, if you have something like this:

fn uses_allocator(allocator: anytype) !void {
    const ints = try allocator.alloc(u32, 1024);
    defer allocator.free(ints);
}

which allows the allocator to be comptime, you could also define

const DynAllocator = struct {
    context: *anyopaque,
    vtable: struct { alloc: *const fn (...), free: *const fn (...) },
};

and then pass that everywhere:

const my_allocator: DynAllocator = ...;
uses_allocator(my_allocator);

That’s the pattern the relevant Rust APIs are built around: say, a Write trait can be used both as a bound for comptime parametrization and as a runtime vtable-based dyn Write value, and it holds that dyn Write: Write (on the left, there’s a specific object with an opaque context pointer and a vtable; on the right, there’s a constraint on types).
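Here is a compilable version of that idea with a tiny writer interface instead of an allocator (all names below are invented for illustration): the fat pointer is constructed at comptime from any concrete type, and since DynWriter itself has a write method, it also satisfies the comptime-parametrized bound, mirroring dyn Write: Write.

```zig
const std = @import("std");

// Fat pointer: opaque context + function pointer, built at comptime
// from any concrete type that has a `write` method.
const DynWriter = struct {
    context: *anyopaque,
    writeFn: *const fn (context: *anyopaque, bytes: []const u8) void,

    fn write(self: DynWriter, bytes: []const u8) void {
        self.writeFn(self.context, bytes);
    }

    fn from(pointer: anytype) DynWriter {
        const P = @TypeOf(pointer); // e.g. *CountingWriter
        const impl = struct {
            fn write(context: *anyopaque, bytes: []const u8) void {
                const self: P = @ptrCast(@alignCast(context));
                self.write(bytes);
            }
        };
        return .{ .context = pointer, .writeFn = impl.write };
    }
};

const CountingWriter = struct {
    written: usize = 0,
    fn write(self: *CountingWriter, bytes: []const u8) void {
        self.written += bytes.len;
    }
};

// Comptime-parametrized consumer; accepts both the concrete writer
// and the type-erased DynWriter ("dyn Write: Write").
fn greet(writer: anytype) void {
    writer.write("hello");
}

test "the dyn value satisfies the comptime bound" {
    var cw = CountingWriter{};
    greet(&cw); // comptime-specialized call
    greet(DynWriter.from(&cw)); // runtime vtable call
    try std.testing.expectEqual(@as(usize, 10), cw.written);
}
```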

4 Likes

I do think this points at a missing primitive in Zig, but it isn’t an easy problem. I’ve read all the interface-related issues I could find, and agree with the reasons they were closed. Mostly they suggest recreating what C++ has, or providing a more ergonomic version of patterns which Zig is more than capable of providing. A few propose something which just isn’t possible, or if it is, they didn’t come up with a way to do it.

What we want is something which can be used dynamically, but which can and will compile statically when that isn’t actually necessary. @AndrewCodeDev did some empirical exploration in one of the posts you link to, and showed that under some circumstances the compiler can optimize through the use of an interface, but it’s safe to say that this is delicate, and a lot of common Allocator patterns, like storing the pointer in a struct, are going to make that infeasible.

It would take a really good proposal to get this, and I have a draft… which is not that really good proposal, so I’m going to sit on it until it is. If that ever happens.

There are other problems: you can’t have a struct which is compatible with other structs if you use an anytype Allocator as a field on that struct, with a similar problem for wrapped allocators like StackFallbackAllocator. In this context anytype is viral: it starts to propagate. It’s a great tool, anytype, but I wouldn’t want to replace every function parameter in my programs which takes a struct with an Allocator pointer with anytype.

I guess this is the same problem at another level: light use of anytype is fine for a function parameter, but it gets gnarly when you need to re-specialize structures, because it spreads. And the resulting monomorphization isn’t free; it’s probably fine for a small, disciplined codebase, but the more comptime-defined fields a struct offers, the more possible permutations of every concrete member function there are, and that can get pretty bloated if you’re not careful.

I think the std model for Allocator is the better of the alternatives, for now, since it’s possible to specialize on one allocator type if measured performance needs dictate doing so. My major issue with it is that there’s no way to extend it, which makes fast-path-zero in particular a big overhaul to make possible, see #20683 for a discussion about that possibility.

1 Like

I think you’re referring to the fact that if we define:

fn S(comptime Allocator: type) type {
    return struct {
        allocator: Allocator,
    };
}

const S1 = S(Allocator1);
const S2 = S(Allocator2);

Then S1 and S2 are incompatible; that is, you can’t have a homogeneous container that contains both S1 and S2. That’s a fair assessment, but it should be noted that this is not a limitation of the language, but a limitation of computers, and we do have tools to solve this, each with its own trade-offs.
For instance, if S1 and S2 have different sizes, how much space should the container reserve for its elements? You can take the bigger of the two sizes, or allocate the elements somewhere else and store pointers.
Also, how would you know whether a given element belongs to S1 or S2, so that you can call functions on it? If you know you want your Ss stored in a homogeneous container, you can just define your Ss to use the same allocator, like the type erasure std.mem.Allocator. Or you can have your Ss use a tagged-union allocator.
What I’m trying to say is that this problem is not accidental complexity, caused by the language, but intrinsic complexity. The current design of std uses a type erasure, and therefore it forces one very specific solution onto everyone, rather than letting each user decide which of the above solutions works best. If you think the type erasure would generate the best code for your use case, you could just do S(std.mem.Allocator). But if, in your project, a tagged-union allocator would work better, you could go with that.
In the vast majority of cases, users would have a single main allocator type used throughout most of the codebase, and only certain niche functions would use specialized allocators. By passing the specific type of the allocator at compile time, there would definitely be performance gains. And, of course, if your code suffers from code bloat to the point of hurting performance, you can reduce the number of allocator types in your code, for example by using the type erasure, recovering the performance.
In terms of generated machine code, I guarantee that the anytype route will always offer the best performance. If a concrete type generates better machine code in a specific place in the codebase, you can just use that as the anytype parameter in that specific place. If a concrete type needs to be used in the entire codebase, to bring the code size down, then you can do that too. anytype allows any solution, at any granularity.
The only downside of anytype is ergonomics, in its various forms. With anytype, readability sucks, error messages suck, compilation times suck, and your structs will need to be parameterized, which causes all those ergonomics losses to propagate.
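As a sketch of the tagged-union option mentioned above (the names here are invented): dispatch happens over a closed set of allocators via a switch, rather than through an open vtable, so every S can share one concrete allocator type while still choosing a strategy at runtime.

```zig
const std = @import("std");

// A closed set of allocator choices; the switch is over a known enum
// rather than an indirect call through an open vtable.
const UnionAllocator = union(enum) {
    page: void,
    fixed: *std.heap.FixedBufferAllocator,

    fn alloc(self: UnionAllocator, comptime T: type, n: usize) ![]T {
        return switch (self) {
            .page => std.heap.page_allocator.alloc(T, n),
            .fixed => |f| f.allocator().alloc(T, n),
        };
    }

    fn free(self: UnionAllocator, memory: anytype) void {
        switch (self) {
            .page => std.heap.page_allocator.free(memory),
            .fixed => |f| f.allocator().free(memory),
        }
    }
};

test "one concrete type, two allocation strategies" {
    var buffer: [256]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);

    const a = UnionAllocator{ .fixed = &fba };
    const ints = try a.alloc(u32, 4);
    defer a.free(ints);
    try std.testing.expect(ints.len == 4);
}
```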

2 Likes

I agree, this isn’t a flaw in Zig, it’s just a direct consequence of how anytype works. Mainly this was an illustration of the advantage of the fat pointer Allocator we have now: flexibility and genericity, both for composition and for container types. The downside is, as you say, performance.

Sure, either choice, we can implement the other in userspace. We have fat pointer Allocators, we can get a perf boost by using anytype to pass a concrete implementation instead. If we had comptime-generic Allocators, we could get dynamics by making a fat pointer and vtable out of it, as @matklad illustrated.

This is all gesturing at the missing primitive. Somehow, we want to provide the compiler with enough information that it can make these choices for us, the same way we trust it to inline functions when that’s correct.

It isn’t about having to make the choice, to be clear; if it were, we could just git gud and forget about it. It’s that the better choice depends on the dynamics of the program, the actual surrounding usage, and what we ideally want isn’t one or the other, it’s both, depending. Again, much like inlining: I force the issue sometimes, when it seems important, but mostly we let the compiler take care of it. Reference semantics for constant parameters is another one; aliasing issues aside, it’s a nice mechanism.

C++ promised that the notorious ‘sufficiently smart compiler’ would handle this for virtual functions, and it didn’t turn out that way. I don’t want to get on a tangent about C++, but I will say that I don’t think that was a law of nature, it was a consequence of insufficient design.

But this missing primitive isn’t a flaw in Zig either, I don’t know a language which has a fully satisfactory solution to this. I do think it’s in the class of problems which admit to a solution, however.

Hard agree. I don’t know that there’s any difference of opinion here, but if there is, it’s emphasis: I think the problems you refer to here are bad enough (especially struct propagation) that the Allocator solution we have is optimal for the language we have. Code with a measurable need for higher-performance allocation can get it, with an anytype that embeds the no-longer-virtual table directly into a struct which provides the necessary functions to fulfill the rest of the interface. It’s basically a cut-and-paste job, well worth it for code which needs it.

I also think that the ergonomics of comptime and anytype can be improved considerably, and this is an easier problem than solving the Interface Question satisfactorily. But it’s a bit off topic for this thread.

1 Like

I wonder whether profile-guided optimization, or other similar optimizations, could allow the compiler to switch between more and less static versions, basically recovering near-static performance where the profile says that only one or a few different allocators are ever used within certain code paths.

If something like that could bring the different approaches a lot closer in terms of performance, I would find that preferable. While I am willing to use explicit concrete allocator types directly, it would be good if that was more of an exception, only needed in the most extreme cases, where ideally for the other 90% of cases the compiler could find a good tradeoff automatically.

Personally, I haven’t done the work to find out how big the performance difference actually is; it might make sense to have a bunch of benchmark programs to measure it.

Currently I can’t tell whether it is worth investing the time (I guess it mostly depends on what else you could accomplish with that time); basically, it would be good to have more automatic testing of different variants, without needing to spend a whole bunch of time running manual experiments.

2 Likes