Complexity of ArenaAllocator

In my current project I have an AST-Tree based on a text document.
My current setup involves creating the AST and then recursively walking the Tree to clean up memory usage.

I thought of using an ArenaAllocator instead that holds all allocations relating to the document and it can be freed.

I wanted to ask if allocation or deallocation is slower with an ArenaAllocator. Or does it have any other implications for memory segmentation?

If nothing major speaks against it it sounds like the better way of dealing with this.

if the arena doesn’t have memory available, it has to ask the sub allocator for more. So initial allocations will be slightly, but probably immeasurably, slower. But if the arena does have memory available it will be quite fast.

De-allocation will be rather fast as well, but it has the caveat that it can only free allocations in reverse order. If you attempt to free memory out of that order it will not actually free it. If that is a common occurrence than you should consider if the wasted memory is worth it.

The main benefit of an arena is you can free all the memory at once, even retaining the memory to be reused.
That will be significantly faster than traversing a tree and freeing the nodes individually.
It also can simplify code, if you can assume/ensure an arena is used, as then you can ignore freeing most things.

6 Likes

That is really interesting. That de-allocation can only happen in reverse order is good to know. I think using the ArenaAllocator is a good way to manage my problem as I don’t really think that I need any allocations that don’t survive the lifetime of the document.

1 Like

The real savings for an arena happen if your code reuses it. Arenas can be reset: this is very fast, and the arena retains all the memory it kept (or as much as you tell it to) and recycles that first.

It might still be the best pattern even if your code only frees once, just for the convenience of it, and to not have to walk an entire tree structure giving back memory one parcel at a time. But the design of the ArenaAllocator in std is oriented around amortizing the initial cost of allocation over several rounds of allocation and freeing.

6 Likes

An even better way is to use an std.ArrayList to store your nodes in an array, and use indices instead of pointers. If you can get away with it, you can also use u32 indices to save 4 bytes per index stored (this will limit the documents size you can parse to 4gb, but it’s often a reasonable limitation).

1 Like

Quite frankly, because of this I wish you could iterate over a slice/array of numbers in reverse order with for instead of needing to go through it with while.

Note that ArenaAllocator has situations where freeing does not de-allocate even if you do it in reverse order.

The common situation is allocations of varying alignment - if ArenaAllocator has to bump its internal pointer for alignment, it doesn’t know how to de-allocate back across that.

A rarer situation is if ArenaAllocator has multiple buckets of memory from the underlying allocator - it won’t de-allocate back across a bucket boundary. This usually doesn’t matter because when you call reset() it tries to get a single large bucket for next time.

1 Like

I simply do this, and keep the for:

for (0..array.len) |n| {
  const r = array.len - n - 1;
  const elem = array[r];
  // ...
}

But yeah, I wish for was a bit more flexible than this.

1 Like

I see. Since my use case is pretty oriented around having the allocation all done and be freed at the same time when it is done that is not a problem for me. But still interesting.

The Individual Tokens are tagged unions and are not them self allocated on the heap (except for “Word”).
I just pass that to the Parser and that iterate over the Token list. The resulting data structure is a tree. Not sure if I misunderstood you, if so please give me a heads up. As I am currently rewriting the parser I can’t show you the code of that. But this is the code of the Lexer till now: Workers and was hiding no one knew of. (Did not implement arena allocation yet)

I’m saying that the AST can be stored in an array instead of using an arena allocator.

Here’s how I did it for a JSON alternative data serialization language: GitHub - kristoff-it/ziggy: A data serialization language for expressing clear API messages, config files, etc.