ArenaAllocator with StringHashMap, panic or segfault

Please help me see what I’m doing wrong in this test case. Directly using an ArenaAllocator with a StringHashMap seems fine (the “good” case below), but when I attempt to wrap things in an App struct, I observe a panic or seg fault. I feel like I may be keeping a copy of a bad allocator state object or something, but I’m not sure how or why.

I think I’m supposed to pass allocators around by value, not pointer, because that’s what I’ve seen in the standard library. And it’s just a vtable, anyway. But the allocator state should be unique. In both implementations below, It seems like I have a unique ArenaAllocator, so I’m not sure what I’m doing wrong.

(I realise the pointer I’m putting in the hash map points to uninitialised memory, but it’s never accessed. Just to simplify the example.)

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    // when good = true, runs fine. when good = false, seg fault or panic.
    const good = true;

    if (good) {
        var arena = std.heap.ArenaAllocator.init(gpa.allocator());
        defer arena.deinit();

        var map = std.StringHashMap(*Foo).init(arena.allocator());

        const alloc = arena.allocator();
        const p = try alloc.create(Foo);

        try map.put("foo", p);
    } else {
        const simple = true;
        if (simple) {
            // attempt to give an allocator to an app object, which
            // will then create its own arena allocator using the gpa,
            // and do its own logic
            var app = App.init(gpa.allocator());
            try app.foo();
        } else {
            // also fails: pass pointer to my gpa, in case we're
            // copying values we shouldn't be
            var app = App.init2(&gpa);
            try app.foo();
        }
    }
}

const App = struct {
    arena: std.heap.ArenaAllocator,
    map: std.StringHashMap(*Foo),

    pub fn init(alloc: std.mem.Allocator) App {
        var arena = std.heap.ArenaAllocator.init(alloc);

        return .{
            .arena = arena,
            .map = std.StringHashMap(*Foo).init(arena.allocator()),
        };
    }

    pub fn init2(gpa: *std.heap.GeneralPurposeAllocator(.{})) App {
        var arena = std.heap.ArenaAllocator.init(gpa.allocator());

        return .{
            .arena = arena,
            .map = std.StringHashMap(*Foo).init(arena.allocator()),
        };
    }

    pub fn foo(self: *App) !void {
        const alloc = self.arena.allocator();

        const p = try alloc.create(Foo);

        // next line causes panic or seg fault when good = false
        try self.map.put("foo", p);
    }
};

const Foo = struct {
    int: i64,
};

It’s because std.mem.Allocator stores a pointer to it’s implementation, so the allocator you store in your hashmap points to the arena declared in App.init, which goes out of scope and leaves you with a dangling pointer.
One solution is to use std.HashMapUnmanaged and pass the arena’s allocator when required.

2 Likes

Ah, I think I see, but let me check with you:

I am extending the lifetime of the arena by returning it as a field in the App struct. So I should be able to use the arena again back in main(). No problems there.

But StringHashMap captures the current memory location of the arena state when it’s initialised with the arena’s allocator. So when I move the arena state to different bits of memory, StringHashMap is still pointing to something on the stack that went out of scope. Oops.

I’d also seen that Unmanaged worked fine in the scenario from which I took the example, but I was struggling to see why. Thank you for the help!

Yeah, that’s pretty much right. Specifically it’s the call to arena.allocator() that takes a pointer to the arena living in the functions stack frame. When you return, the pointer bint rh allocator stored by the map becomes invalid. The actual arena data gets copied when you return from the function, so it changes address. Watch out for copies of vtable interface implementations when you still have interface types pointing to them.

Now that I understand this better, it seems another alternative to using HashMapUnmanaged is to allocate space for the arena allocator’s state using the allocator provided to App.init, and then release it in deinit. It’s a pattern I hadn’t seen before, so I’ll include it here in case it helps other learners:

const App = struct {
    arena: *std.heap.ArenaAllocator,
    map: std.StringHashMap(*Foo),

    pub fn init(alloc: std.mem.Allocator) App {
        // allocate storage for arena allocator state
        var arena_state = alloc.create(std.heap.ArenaAllocator) catch unreachable;
        arena_state.* = std.heap.ArenaAllocator.init(alloc);

        return .{
            .arena = arena_state,
            .map = std.StringHashMap(*Foo).init(arena_state.allocator()),
        };
    }

    pub fn deinit(self: App, alloc: std.mem.Allocator) void {
        self.map.deinit();
        self.arena.deinit();
        alloc.destroy(self.arena);
    }
};

(edited to add a missing deinit).

You would also need to deinit the map element:

pub fn deinit(self: App, alloc: std.mem.Allocator) void {
    self.map.deinit();
    self.arena.deinit();
    alloc.destroy(self.arena);
}

Right-o, overlooked. Thank you!

std.json does something similar:

pub fn parseFromValue(
    comptime T: type,
    allocator: Allocator,
    source: Value,
    options: ParseOptions,
) ParseFromValueError!Parsed(T) {
    var parsed = Parsed(T){
        .arena = try allocator.create(ArenaAllocator),
        .value = undefined,
    };
    errdefer allocator.destroy(parsed.arena);
    parsed.arena.* = ArenaAllocator.init(allocator);
    errdefer parsed.arena.deinit();

    parsed.value = try parseFromValueLeaky(T, parsed.arena.allocator(), source, options);

    return parsed;
}

is an error discarding misuse resulting in:

undefined behavior in ReleaseFast or ReleaseSmall modes

This always panics:

var arena_state = alloc.create(std.heap.ArenaAllocator) catch @panic("OOM");
2 Likes

Got it, thanks! Seems like catch unreachable should be disallowed - it’s like declaring a variable and not using it.

I’ll take any opportunity I can to link this talk:

(it should be cued up to where catch unreachable is mentioned)

3 Likes

There can be cases where you know that an error is never going to happen, but the compiler doesn’t know that, in those cases it is a valid use. Here is an example:

// has error signature because that signature is needed for a function pointer somewhere
fn neverErrorGetVal() error{OutOfMemory}!u32 { 
    return 5;
}
fn use() void {
    const val = neverErrorGetVal() catch unreachable; // here the programmer knows that the catch is actually impossible
    ...
}

The example is a bit contrived but similar examples actually exist in real code.

2 Likes

FYI, you have catch undefined instead of catch unreachable in your example

1 Like

the map was init with the allocator from the arena, doesn’t that mean all its internal allocation will happen in the arena, so is there any need to deinit the map individually?

Possibly - but I think it’s more correct to respect the interface provided by Map, which expects a call to deinit(). It does a bit more than simply release its backing storage.