Choosing an Allocator

Hi, first of all, I would like to say that I’ve read the section Choosing an Allocator in the language documentation. I also found the following post, which answers most of my questions:

But a few questions remain.

Let me cite the language documentation first and throw in some questions:

  1. Are you linking libc? In this case, std.heap.c_allocator is likely the right choice, at least for your main allocator.

Why is std.heap.c_allocator a better choice than GPA (assuming libc is linked)?

Side question: Why is the GeneralPurposeAllocator (GPA) shown as DebugAllocator when I click on GeneralPurposeAllocator on this page?

  1. Need to use the same allocator in multiple threads? Use one of your choice wrapped around std.heap.ThreadSafeAllocator.

When would (or should) I use the same allocator in multiple threads? Is there a problem with creating a separate GPA for each thread?

  1. Is your program a command line application which runs from start to end without any fundamental cyclical pattern (such as a video game main loop, or a web server request handler), such that it would make sense to free everything at once at the end? In this case, it is recommended to follow this pattern:
const std = @import("std");

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    const allocator = arena.allocator();

    const ptr = try allocator.create(i32);
    std.debug.print("ptr={*}\n", .{ptr});
}

When using this kind of allocator, there is no need to free anything manually. Everything gets freed at once with the call to arena.deinit().

Yeah okay, but… what about the backing allocator? In that example, std.heap.page_allocator is used, which allocates at least a whole page even if only a tiny bit of memory is requested. Does an ArenaAllocator initialized with the page_allocator then also make one syscall per allocation, or is there some additional management (other than freeing everything at the end) added on top?

Also, why would I need to free everything at the end anyway if I’m writing a command line application and the process is terminated by the OS in the end?

  1. […]

Same question regarding choice for a backing allocator applies here.

  1. Are you writing a test, and you want to make sure error.OutOfMemory is handled correctly? In this case, use std.testing.FailingAllocator.
  2. Are you writing a test? In this case, use std.testing.allocator.

:thinking: What does it mean that “OutOfMemory is handled correctly”? That my test should handle it correctly? Or that the testing framework will handle it correctly? When to use which of those two allocators in tests?

  1. Finally, if none of the above apply, you need a general purpose allocator. Zig’s general purpose allocator is available as a function that takes a comptime struct of configuration options and returns a type. Generally, you will set up one std.heap.GeneralPurposeAllocator in your main function, and then pass it or sub-allocators around to various parts of your application.

Okay, I get that. However, what are the cases where I would create more than one GPA? What if I have a server that creates a thread for every request? Should I rather create a single allocator (e.g. a GPA), wrap it in a ThreadSafeAllocator, and then use that allocator in every thread? Maybe with an ArenaAllocator around it, created in each thread? Or should/could each thread create its own GPA? What are the pros/cons?

Yet another question: Are there cases when I should use std.heap.page_allocator?

Thanks a lot in advance for helping me understand allocation better. :folded_hands: Maybe other newbies have similar questions when trying to pick the right allocator.

2 Likes

Because it is faster.

Since the GPA was slow but had useful debugging features, it has been renamed to DebugAllocator, emphasizing that that's its purpose.

You could do that, but then the threads won't share the same memory pool. This could potentially increase your program's memory consumption, but it would be faster since there would be no contention. It's the usual speed vs. space tradeoff.
Also, your GPA or whatever allocator you're using could be drawing from a single parent allocator instead of directly from the OS. If that parent allocator is not thread-safe, you'd still need to synchronize access to it in some way.
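To make the wrapping concrete, here is a minimal sketch of sharing one parent allocator across threads via std.heap.ThreadSafeAllocator (assuming the 0.14-era std API; note that the GPA/DebugAllocator is already thread-safe by default, so the wrapper mainly matters for parents that aren't):

```zig
const std = @import("std");

fn worker(allocator: std.mem.Allocator) void {
    // Each thread allocates from the shared pool; the wrapper's mutex
    // serializes access to the (possibly non-thread-safe) parent.
    const buf = allocator.alloc(u8, 64) catch return;
    defer allocator.free(buf);
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    // Wrap the parent so several threads can share one memory pool.
    var tsa = std.heap.ThreadSafeAllocator{ .child_allocator = gpa.allocator() };
    const allocator = tsa.allocator();

    var threads: [4]std.Thread = undefined;
    for (&threads) |*t| t.* = try std.Thread.spawn(.{}, worker, .{allocator});
    for (threads) |t| t.join();
}
```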

The arena does extra management. When it needs to call the backing allocator, it requests extra memory, and distributes it as necessary. Also, using the page allocator doesn’t necessarily mean you’re going to always do a syscall. You can use resize, which is very cheap for the page allocator, and avoids the syscall.
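That batching is easy to observe in a sketch: many small allocations from an arena backed by the page allocator mostly do not touch the OS, because the arena hands out slices of larger chunks it already acquired (assuming the current ArenaAllocator behavior described above):

```zig
const std = @import("std");

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const allocator = arena.allocator();

    // 1000 tiny allocations; the arena distributes pieces of the few
    // larger chunks it requested from page_allocator, so most of these
    // never result in a syscall.
    for (0..1000) |_| {
        const p = try allocator.create(u64);
        p.* = 0;
    }
}
```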

Some targets, like freestanding, don’t do automatic cleanup. But I agree that if you’re targeting an OS, this is usually the best option.

It’s referring to a piece of code that will use an allocator and, if allocation fails, handles it in some way. For instance, if it runs out of memory, it cleans up some cached stuff. Your test can populate the cache, then call the piece of code with the failing allocator. If the cache gets properly cleaned up, the test passes.
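As a concrete illustration, here is a minimal sketch (assuming the 0.14-era std.testing.FailingAllocator API, where init takes a config struct; the fail_index choice is arbitrary):

```zig
const std = @import("std");

test "code under test surfaces OutOfMemory" {
    // Make the very first allocation fail.
    var failing = std.testing.FailingAllocator.init(std.testing.allocator, .{ .fail_index = 0 });
    const allocator = failing.allocator();

    // Whatever function you are testing should propagate (or recover from)
    // error.OutOfMemory instead of crashing or leaking.
    try std.testing.expectError(error.OutOfMemory, allocator.alloc(u8, 16));
}
```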

These questions are way too broad. It’s all about tradeoffs, and everything depends on the architecture of your program.

8 Likes

I would like to note that this guide is also a bit outdated. Recently the GPA was renamed to DebugAllocator and a new faster general purpose allocator std.heap.smp_allocator was introduced which, according to investigate SmpAllocator performance with respect to other popular allocators · Issue #12484 · ziglang/zig · GitHub, seems to be faster than the libc allocator, so that advice is likely obsolete.

Every time you want to free data in another thread than it is allocated. For example if you have a thread pool that consumes heap-allocated task data that needs to be freed after the task is finished.

Also note that the DebugAllocator is thread-safe by default, so you don’t even have to think about this most of the time.

3 Likes

So is it still slow? Is it intentionally slow in exchange for additional safety/debugging features? That would make it the opposite of “general purpose”, I guess?

Is the name GeneralPurposeAllocator deprecated then?

I think that means “yes” to both my questions.

But smp_allocator doesn’t use any “backing” allocator it seems.

Maybe just use smp_allocator everywhere if I’m fine with some memory overhead? Is that allocator thread safe, or do I have to care myself for thread safety?

I assume that smp_allocator is yet work in progress?

Is there any way to get this “extra management” (on top of a selectable backing allocator) without arena/leak tracking?

Ah, that clarifies the example. Maybe that’s worth noting in the guide.

So the failing allocator is like a dummy allocator that always fails?

Oh, I was missing that. I guess that also makes it slower (when the thread safety isn’t needed).

It is thread-safe and as far as I’m aware it is fully working (well there is `GeneralPurposeAllocator` reports `error.OutOfMemory` despite having more than enough free memory. · Issue #18775 · ziglang/zig · GitHub, but that applies to the DebugAllocator too and requires quite extreme conditions)

I would generally suggest still using the DebugAllocator in safe release modes to make use of its leak checking and other debug features. Especially if you are not used to manual memory management, it can be really helpful.
But it still depends on your application; in the case of a short-running command-line application you might not even care about leaks and stuff.

Of course, but you can disable it in the config DebugAllocator(.{.thread_safe = false}).

1 Like

It started out with one idea, but it ended up morphing into something else. Yes, that name is deprecated. It is now intended only for debugging.

I don’t understand the question. The extra management done by the arena is just about requesting more memory, to reduce calls to the parent allocator, and about keeping track of its allocations so that it can free them. If you remove the second part, all that is left is requesting more memory than your current needs, for later allocations. You could do that manually, I guess. Also, the ArrayList requests more memory than it currently needs, so maybe that’s what you want.
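The ArrayList point can be sketched briefly (using the managed ArrayList API of 0.14; newer versions favor the unmanaged variant): capacity grows geometrically, so appending N items makes far fewer than N calls to the parent allocator.

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    var list = std.ArrayList(u8).init(gpa.allocator());
    defer list.deinit();

    // Reserve space up front; the appends below then hit no allocator at all.
    try list.ensureTotalCapacity(1024);
    for (0..1024) |i| list.appendAssumeCapacity(@intCast(i % 256));
}
```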

Yes.

Yes, but you shouldn’t worry much about performance on debug builds.

What confuses me is that I would expect debugging facilities to be enabled by -O Debug and not by the (manual) choice of an allocator.

Why doesn’t there exist a real “general purpose” allocator that does additional checks only when making a debug build? Instead, I’m even supposed to manually write:

std.debug.assert(debug_allocator.deinit() == .ok);

Moreover, -O ReleaseFast would drastically alter assert’s behavior, but the GPA (now called DebugAllocator) still incurs its debugging overhead.

Perhaps I still misunderstand what -O Debug implies (or is supposed to imply). Is the standard library deliberately not making any distinctions depending on the requested optimization level?

What I mean is that DebugAllocator and ArenaAllocator share the properties of:

  • supporting a backing allocator
  • saving memory when requesting small chunks of memory and the backing allocator only supports page-wise allocation

I wonder if there is a way to get these features without the additional debugging or free-the-whole-arena feature, respectively.

The DebugAllocator does this. In the config you can see that the default value is

    /// Whether to enable safety checks.
    safety: bool = std.debug.runtime_safety,

And furthermore leak traces are only enabled in debug mode:

const default_stack_trace_frames: usize = switch (builtin.mode) {
    .Debug => default_sys_stack_trace_frames,
    else => 0,
};

pub const Config = struct {
    /// Number of stack frames to capture.
    stack_trace_frames: usize = default_stack_trace_frames,

The leak check you are referring to is relatively cheap, most likely this information is determined by just checking if there are still any active allocations. It shouldn’t have any relevant overhead.

What are you trying to optimize here? There is no way that the overhead of the free-the-whole-arena feature, which just requires a linked list of the O(log(memorySize)) parent allocations, is ever going to be a bottleneck in a real-world application.

Even the overhead of the debug allocator is not that relevant in my experience, and every time it was relevant it was just hinting me at another problem where I didn’t optimize my application well enough.

2 Likes

Ah okay, but then I don’t understand why it’s not a “general purpose” allocator anymore, if it does the additional debugging checks only in -O Debug mode.

The information I have now seems somewhat contradictory (at least to my understanding). I.e., does DebugAllocator only perform debugging checks in -O Debug mode, and if yes, why isn’t it a “general purpose” allocator anymore? Is it slow for other reasons?

I’m not looking at a specific problem as of yet, I just try to understand the variety of allocators better.

Yes, exactly. The allocator itself is just using an inefficient algorithm. Not sure if that’s to make the debugging easier or just because they didn’t know how to make it better when they wrote it many years ago.

1 Like

From this issue:

  • introduce an API for default allocator selection

[…]
I will open a new issue, or perhaps simply a PR that introduces std.heap.DefaultAllocator that provides reasonable conditional compilation logic for choosing an allocator in an application’s main function.

Considering…

and that it is no longer named GeneralPurposeAllocator but DebugAllocator, I wonder what to do in practice right now. There seems to be no non-deprecated way to pick a generic/normal/everyday allocator, as the API is a work in progress at this point. :sweat_smile:

Maybe I will just stick to using the GPA under its old name GeneralPurposeAllocator and refactor once a better API is available. Or I will do something like this:

const std = @import("std");

// TODO: Use default allocator interface of standard library once available
const DefaultAllocator = if (@import("builtin").mode == .Debug) struct {
    debug_allocator: DebugAllocator,
    const DebugAllocator = std.heap.DebugAllocator(.{});
    const init = @This(){ .debug_allocator = DebugAllocator.init };
    pub fn allocator(self: *@This()) std.mem.Allocator {
        return self.debug_allocator.allocator();
    }
    pub fn deinit(self: *@This()) void {
        std.debug.assert(self.debug_allocator.deinit() == .ok);
    }
} else struct {
    const init = @This(){};
    pub fn allocator(_: *@This()) std.mem.Allocator {
        return std.heap.smp_allocator;
    }
    pub fn deinit(_: *@This()) void {}
};

pub fn main() !void {
    var def_alloc = DefaultAllocator.init;
    defer def_alloc.deinit();
    const alloc = def_alloc.allocator();
    const some_bytes: []u8 = try alloc.alloc(u8, 3);
    some_bytes[0] = 123;
    alloc.free(some_bytes); // comment this out under debug mode
}

That’s very verbose though, so maybe I’ll just wait for a fixed API.

The 0.14 release notes recommend

const std = @import("std");
const builtin = @import("builtin");
const native_os = builtin.os.tag;

var debug_allocator: std.heap.DebugAllocator(.{}) = .init;

pub fn main() !void {
    const gpa, const is_debug = gpa: {
        if (native_os == .wasi) break :gpa .{ std.heap.wasm_allocator, false };
        break :gpa switch (builtin.mode) {
            .Debug, .ReleaseSafe => .{ debug_allocator.allocator(), true },
            .ReleaseFast, .ReleaseSmall => .{ std.heap.smp_allocator, false },
        };
    };
    defer if (is_debug) {
        _ = debug_allocator.deinit();
    };
}

to conditionally choose between DebugAllocator and smp_allocator.

2 Likes

Shouldn’t this be:

        std.debug.assert(debug_allocator.deinit() == .ok);

?

Only if you want to crash on an unreachable if it’s not .ok.
It will still print issues, as that’s an implementation detail of the allocator’s deinit.

Also, it still needs the if (is_debug); otherwise the debug_allocator would not be optimised out in release builds, as it’s being used.

1 Like