Idiomatic Scratch Arena

Inspired by this excellent post

and the piece of glue code I am working on right now, which requires a fair amount of ad-hoc temporary string allocation & manipulation.

The top quote for me is this:

typedef struct {
    char *beg;
    char *end;
} arena;

strlist *getlines(str path, arena *perm, arena scratch)
  • The scratch arena is passed by copy — i.e. a copy of the “header” not the memory region itself. Allocating only changes the local copy, and so cannot survive the return. The semantics are obvious to callers, so they’re less likely to get mixed up.

This is really neat! There’s not a single defer, or any other code besides the scratch argument to every function, and yet every function gets its own runtime-sized space to allocate into, which is automatically cleaned up on exit. Basically alloca done right.

What would be an idiomatic way to express such per-stack-frame arena pattern in Zig? Are there example projects using something like this?

Note that this is not just using std.heap.ArenaAllocator, the trick is that each function gets its own arena.


Not precisely the same thing, but passing such an arena struct as a parameter and instantiating a local copy of it lets you modify the pointers in the local copy while keeping the original untouched.

Lame example:

$ cat src/main.zig 
const std = @import("std");

const ScratchArena = struct {
    mem: [*]u8,
    capacity: usize,

    pub fn alloc(self: *@This(), size: usize) *anyopaque {
        if (size > self.capacity)
            @panic("out of capacity");

        const ptr = self.mem;
        self.mem += size;
        self.capacity -= size;
        return @ptrCast(ptr);
    }
};

fn demo(iteration: usize, arena: ScratchArena) void {
    if (iteration == 0) return;

    std.debug.print("-> demo[{d: >2}]: arena=.{{ .mem = {*}, .capacity = {d} }}\n", .{ iteration, arena.mem, arena.capacity });

    var local: ScratchArena = arena;
    const mem: [*]u8 = @ptrCast(local.alloc(1024));
    _ = mem;
    demo(iteration - 1, local);

    std.debug.print("<- demo[{d: >2}]: arena=.{{ .mem = {*}, .capacity = {d} }}\n", .{ iteration, arena.mem, arena.capacity });
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer if (gpa.deinit() != .ok) std.log.err("memory leak detected!\n", .{});
    const allocator = gpa.allocator();

    const buffer = try allocator.alloc(u8, std.math.pow(usize, 2, 20));
    defer allocator.free(buffer);

    var arena: ScratchArena = .{
        .mem = buffer.ptr,
        .capacity = buffer.len,
    };

    demo(5, arena);
}

$ zig build run
-> demo[ 5]: arena=.{ .mem = u8@7f4b64dd7000, .capacity = 1048576 }
-> demo[ 4]: arena=.{ .mem = u8@7f4b64dd7400, .capacity = 1047552 }
-> demo[ 3]: arena=.{ .mem = u8@7f4b64dd7800, .capacity = 1046528 }
-> demo[ 2]: arena=.{ .mem = u8@7f4b64dd7c00, .capacity = 1045504 }
-> demo[ 1]: arena=.{ .mem = u8@7f4b64dd8000, .capacity = 1044480 }
<- demo[ 1]: arena=.{ .mem = u8@7f4b64dd8000, .capacity = 1044480 }
<- demo[ 2]: arena=.{ .mem = u8@7f4b64dd7c00, .capacity = 1045504 }
<- demo[ 3]: arena=.{ .mem = u8@7f4b64dd7800, .capacity = 1046528 }
<- demo[ 4]: arena=.{ .mem = u8@7f4b64dd7400, .capacity = 1047552 }
<- demo[ 5]: arena=.{ .mem = u8@7f4b64dd7000, .capacity = 1048576 }

I guess the downside is that we now have two copies of the header struct living on the stack?

1 Like

I also don’t like that there’s now both arena and local in scope, and that they can be mixed up (you can’t use arena for allocation, as it is const, but you can still erroneously pass it to subroutines). But yeah, that’s the closest I’ve got to this pattern as well.

Note that the above is equivalent to just []u8. That is the biggest clue to how to implement this pattern in Zig. Ultimately, an arena is just a collection of objects that will be deallocated at once. Scratch memory is just a bunch of bytes that we can use however we want. The user of scratch memory should know that the memory will be reused later, so no long-living objects should be placed there. Here’s one way to do it:

fn fun(scratch: []u8) void {
  // Scratch is just a bunch of bytes, use however you want.
  // If you want it aligned, you can either take the parameter already aligned,
  // or you can align it yourself.
  const aligned = std.mem.alignInSlice(scratch, 16) orelse return;
  // If you want an arena, just cast the slice.
  var arena: []Object = undefined;
  arena.ptr = @ptrCast(aligned.ptr);
  arena.len = @divFloor(aligned.len, @sizeOf(Object));

  // Now arena is just a bunch of uninitialized Objects. You can initialize however many
  // you want to use. When the function returns, those objects will still be there, but the
  // memory will be reused in later function calls.

  // If you want to use the scratch memory to allocate more than one type of object,
  // you can manually chop the slice into pieces and create multiple arenas or, the
  // easiest way, just give the slice to a FixedBufferAllocator.
  var fba = std.heap.FixedBufferAllocator.init(scratch);
  const allocator = fba.allocator();
  const objects = allocator.alloc(Object, 3) catch return;
  const otherObjects = allocator.alloc(OtherObject, 3) catch return;
  _ = objects;
  _ = otherObjects;
}

fn userOfFun() void {
  // The scratch memory can come from stack, heap, you name it.
  var scratch: [256]u8 = undefined;
  fun(&scratch);
}

Note that I didn’t deallocate any of those objects. Just let the memory be reused.
In the article you linked, the author suggests that objects allocated in an arena never need any individual cleanup.

That is wrong. If the objects need deinitialization logic, you still need to run the destructor for each object. In the example above, if Object and OtherObject needed to be deinit’ed, I would still need to walk the slices and deinit each one. Deallocation and deinitialization are orthogonal.

There is a term for this in the C++ standard - “Trivially Destructible”.

In other words, how would one “destruct” an integer? Setting the bytes to zero just gives you an integer of value zero. Basically, it’s semantically meaningless to destroy an integer; you just forget about it and let something else interpret the memory in some other way.

For what the article is talking about, I think we’re talking about trivially destructible types - that’s where the arena really shines because you can just free your memory pool all at once.

You probably know all this @LucasSantos91, but I wanted to add this for the sake of discussion for anyone else reading this stuff for the first time.

1 Like

“This pattern” I am personally interested in here is not as much arena/FBA per se, as the trick for passing sub-arenas to subroutines. I really like how “pass arena by value” just happens to have the right semantics in C with zero additional code.

I think the closest comparison to what the author was doing in the blog post is the following:

const std = @import("std");

pub fn main() !void {
    var buf: [1024]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buf);

    std.debug.print("Before getLines: {}\n", .{fba.end_index});
    try getLines(fba);
    std.debug.print("After getLines: {}\n", .{fba.end_index});
}

fn getLines(fba: std.heap.FixedBufferAllocator) !void {
    var scratch = fba; // must be explicit

    const num_ptr = try scratch.allocator().create(usize);
    num_ptr.* = 42;

    std.debug.print("(getLines) number: {}\n", .{num_ptr.*});
}

Our arena allocator is too sophisticated to allow making random copies of it without risking breakage, and the same would apply to a C version of it. What the article calls an arena is, for us, a FixedBufferAllocator.

Compared to the C version, the only difference is that we need to make an explicit copy of the incoming FixedBufferAllocator, but that’s it.

I do think that the C version has its own special elegance to it, but mutable function arguments also have their fair share of related footguns, and personally, I prefer making this use case less elegant in exchange for having fewer footguns.


It seems strange to use a fixed buffer for an arena because if you knew the upper bound you could have just put it on the stack. The only reason to not use the stack would be if your upper bound is runtime-known, in which case you can’t use a fixed buffer, you need a Zig-style arena allocator that can grow on the heap if necessary.


I think their approach is to make one big arena that is expected to be big enough for the entire runtime of the program, and which is also too big for the stack, like a 200 MB arena.

Although it’s not unusual for those expectations not to be met, causing breakage and UB. I’ve seen code in a few C projects that relies on a fixed buffer sized to something like 100 or 1024 bytes, which works right up until it doesn’t.

I guess it’s a C-ism that we can leave behind us.

1 Like

I can see at least two cases where this style of FBA makes sense:

  • I don’t have a strict limit on every particular data structure, so I can’t statically allocate max_size of everything, like we do in TigerBeetle, but I still want to upper-bound the max heap size as defense in depth, so that, even if I have a runaway code path which could consume all the RAM, my program cleanly aborts instead of thrashing the entire computer.
  • I use an OS with virtual memory so I can reserve like 64GiB of virtual address space from the get-go, and get a nice, contiguous arena to allocate into.

I believe 1) was the origin for this code — it’s from a pkg-config implementation, which needs to guard against zip-bomb style inputs.

1 Like

This can also be achieved by having a memory-limit concept in your bottom allocator. GPA supports this feature IIRC.

That would be more complex software though. By the same token, you could also “just use GC”, and not worry about memory management at all.

I think using an FBA can be quite elegant in some contexts. A CLI utility which expects to operate in the low tens of megabytes, does a single mmap / VirtualAlloc at startup, and prints a clear error message when it runs out of memory, hinting at a potentially erroneous, pathological input, would be a perfect use case here.

I wouldn’t call this a C-ism to be dismissed, to me it seems more like a useful pattern to have in the toolbox. C-ism would be to recklessly malloc&free left and right :rofl:

I think it’s a relevant point considering that in the context of this discussion we are opting into an allocator interface, and from that point onwards, “defense in depth” wrt overallocation can be achieved regardless of how the allocator functions. The fact that it’s an FBA doesn’t bring any advantage from that perspective (i.e. from the perspective of achieving “defense in depth”), not even in terms of simplicity, since all you need is a counter to prevent going over a certain quota.

If you want to go with full simplicity, then you can just pass around a byte slice that you consume as you go down the call stack and re-slice for your subcalls.

Redis is full of best effort printing / parsing behavior like the one linked above. I’ve also seen it in other projects and IMO it’s guided more by the convenience of just plopping there a small buffer, than a choice focused on optimal behavior of the program. That’s the C-ism I was referring to.

Ah, sorry, I misunderstood that, thanks for clarifying!

You forgot the case where the allocation is too large for the stack (think threads, which often have small stacks).