Allocators / Memory Management

ga-taylorj · December 27, 2023, 9:24pm

Hi,

I’m trying to understand how I should be handling zig memory. My experience comes predominantly from garbage collected languages.

Looking at the ziglings, documentation etc. it seems if you’re creating library code you don’t specify allocators in your code, but rather, expose a parameter into your API which accepts an allocator. This enables users to choose what allocation strategy they want to use.

While I understand this choice as a way to ensure no assumptions are being made by the language, it seems strange to me that a data structure wouldn’t be in charge of figuring out the best memory allocation strategy for itself. However, that’s probably a larger topic to get into once I understand more.

When it comes to functions in my own code, say I have a function which needs to return a list of items. That list is not something I know the size of at compile time and as such I’m creating it on the heap. Since I want to return the list to the caller, should the caller be responsible for providing an allocator to my function? Is it more idiomatic zig to provide in / out parameters or to have return values?

Option 1

fn getList(allocator: std.mem.Allocator) anyerror!std.ArrayList([]u8) {
  var myList = std.ArrayList([]u8).init(allocator);
  // Add stuff to the list
  return myList;
}

Option 2 (I presume this works?)

fn getList(list: std.ArrayList([]u8)) void {
  // Add to the list
}

Option 3
Likely I’ve completely got things wrong and option 3 is the best option, if so, what should it look like?

Follow on question
If a caller is responsible for providing the allocators, and you don’t know how a caller is going to use your function / library, should you accept multiple allocators for each thing you want to return in order to keep things as flexible as possible?

Say I have a struct with two functions, each function gets called but one function maybe gets called very frequently and the memory needs freeing more frequently. Should I avoid a single allocator so it gives the caller the option to provide an allocator for each case and handle freeing them independently?

dude_the_builder · December 27, 2023, 9:58pm

In the case of ArrayList the ideal option would be your second one, except that you want the function parameter to be a pointer instead:

fn getList(list: *std.ArrayList([]u8)) !void {
  // Add to the list, e.g. `try list.append(item);`. `!void` because appending can fail.
}

This gives the caller the flexibility to use the allocator they wish when initializing the list, and even to pre-populate the list before passing a pointer to it to the function.
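A minimal sketch of that pattern, assuming the managed std.ArrayList API used in this thread (Zig 0.11 through 0.14; later releases reworked ArrayList, so treat the exact calls as version-dependent). The name fillList and the string items are hypothetical, and the element type is []const u8 rather than []u8 so that string literals can be appended directly:

```zig
const std = @import("std");

// Hypothetical helper: appends to a list owned by the caller.
// The return type is `!void` because `append` can fail with error.OutOfMemory.
fn fillList(list: *std.ArrayList([]const u8)) !void {
    try list.append("beta");
    try list.append("gamma");
}

pub fn main() !void {
    // The caller picks the allocator; page_allocator is used here only for brevity.
    const allocator = std.heap.page_allocator;

    // The caller creates the list and may add items before the call.
    var list = std.ArrayList([]const u8).init(allocator);
    defer list.deinit();
    try list.append("alpha");

    try fillList(&list);

    for (list.items) |item| std.debug.print("{s}\n", .{item});
}
```

The function never sees an allocator directly. It reuses whichever one the caller gave the list, and the caller remains responsible for calling deinit.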

Another idiom is to pass in an allocator, create the list within the function but instead of returning the list itself, you return the slice of items and discard the list.

// Caller must free returned slice with `allocator`.
fn getSlice(allocator: std.mem.Allocator) anyerror![][]u8 {
  var myList = std.ArrayList([]u8).init(allocator);
  errdefer myList.deinit();
  // Add stuff to the list
  return try myList.toOwnedSlice();
}

Regarding accepting multiple allocators, it all depends, but it’s totally acceptable if it provides maximum flexibility to the caller in situations such as your example struct. One option within a function that needs to make many temporary allocations that don’t outlive the function itself is to use an inner arena allocator with the passed-in allocator as its backing allocator. You then use the arena allocator within the function for all the short-lived allocations, and with a single defer arena.deinit() you free everything at once on function exit, leaving nothing of the temporary work behind in the passed-in allocator. Depending on the scenario, this may even provide better performance.
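A rough sketch of the inner-arena idea, assuming Zig 0.11 or later. The function upperConcat is hypothetical: it upper-cases each word into temporary buffers held by the arena, then builds the result with the caller's allocator so that it outlives the function.

```zig
const std = @import("std");

// Hypothetical function: returns all words upper-cased and concatenated.
// Caller must free the returned slice with `allocator`.
fn upperConcat(allocator: std.mem.Allocator, words: []const []const u8) ![]u8 {
    // Inner arena backed by the passed-in allocator.
    var arena = std.heap.ArenaAllocator.init(allocator);
    defer arena.deinit(); // frees every temporary allocation at once
    const tmp = arena.allocator();

    // Short-lived allocations: one upper-cased copy per word.
    var total: usize = 0;
    const upper = try tmp.alloc([]u8, words.len);
    for (words, 0..) |w, i| {
        upper[i] = try std.ascii.allocUpperString(tmp, w);
        total += w.len;
    }

    // Long-lived allocation: made with the caller's allocator, not the arena.
    const out = try allocator.alloc(u8, total);
    var pos: usize = 0;
    for (upper) |u| {
        @memcpy(out[pos..][0..u.len], u);
        pos += u.len;
    }
    return out;
}
```

None of the temporaries needs an individual free, and an early return through try still releases them because the deferred arena.deinit() runs on every exit path.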

ga-taylorj · December 28, 2023, 1:06am

Ah yeah, makes sense, thanks. I presume calling something like toOwnedSlice shallow-copies the list, so the size of the list I’m likely to return is a factor in whether I do that or the first option.

Makes sense about having an arena allocator or other such allocator within the function to do some of the work internally. I presume you’d always have it based on an allocator that’s passed in though to avoid hiding behaviour from the user. i.e. you’d never inside a function, even if it’s just for internal use, create an arena allocator based on the page allocator.

dude_the_builder · December 28, 2023, 1:57am

toOwnedSlice basically “disconnects” the slice (pointer to allocated heap memory) from the list and re-initializes the list to empty. Then the slice is a cheap copy given it’s just a pointer and a length.
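A small sketch of that hand-off, again assuming the managed std.ArrayList API from this thread (Zig 0.11 through 0.14). One caveat worth hedging: the standard library first tries to shrink the buffer in place, and if the allocator cannot do that it may allocate a right-sized buffer and copy, so the call is usually, but not always, free of copying.

```zig
const std = @import("std");

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    var list = std.ArrayList(u32).init(allocator);
    defer list.deinit(); // harmless after toOwnedSlice: the list is empty again

    try list.appendSlice(&.{ 1, 2, 3 });

    // Ownership of the items moves from the list to `owned`.
    const owned = try list.toOwnedSlice();
    defer allocator.free(owned); // whoever holds the slice must free it

    // prints: slice len = 3, list len = 0
    std.debug.print("slice len = {d}, list len = {d}\n", .{ owned.len, list.items.len });
}
```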

You’re definitely on the right track to mastering Zig!

Tosti · December 28, 2023, 9:26pm

On a side note

Please avoid using anyerror unless it is required. When possible, let the compiler infer the error set by specifying !T as a return type. !T and anyerror!T are different things: the former represents a specific minimal error union and the latter is an error union that may contain any error value, i.e., this type doesn’t have any compile-time restrictions on the value of the error. As a consequence, switching on anyerror!T can’t be exhaustive.

fn f() !void { // Same as error{Error}!void
    return error.Error;
}
fn g() anyerror!void {
    return error.Error;
}
pub fn main() void {
    f() catch |err| switch (err) { // OK
        error.Error => {},
    };
    g() catch |err| switch (err) { // error: else prong required when switching on type 'anyerror'
        error.Error => {},
    };
}

anyerror!T is required only for dealing with function pointers when there is no restriction on what error may be returned (in this case the compiler can’t statically determine the error set).

Sorry if it’s too far off-topic. I don’t know where I should put this comment.

ga-taylorj · December 28, 2023, 9:30pm

Ah thanks, that’s really helpful. I did wonder if I should be specifying error types or not. In some languages we specify return types so that others can understand the API; in others we specify them to ensure future updates to the function stick to the spec. I’m not entirely sure where Zig falls on this, but I’m happy to stick with inferred types where possible.

Is there an equivalent way of having the compiler infer the return type or is this something that’s always required?

Tosti · December 28, 2023, 9:57pm

AFAIK, you always have to specify a function’s return type; it can’t be inferred. So there is no such thing as

auto f() { return 42; } // C++14, D syntax

The reason behind this is readability and tooling simplification.

But there is an exception. If a return type is an error union, then the error-set part of this type can be inferred by using the !T syntax. You may still type it explicitly, but there is a risk that the error set you’ve typed is broader than necessary.

fn f() error{ E1, E2 }!void {
    return error.E1;
}

That’s technically possible, but I don’t know any practical applications of making error sets broader than necessary. Maybe the compiler will give an error if it spots such places in the future, but currently it doesn’t.

It’s a little bit strange that on the one hand, Zig forces you to be explicit, but on the other hand, it allows you to infer error sets. I’ve heard stories that Java folks struggled with the requirement to explicitly list all thrown (checked) exceptions as part of the function signature. It was so tedious that in the end programmers decided to write throws Exception just to shut the compiler up. So it was a misfeature. Maybe Zig decided to infer error sets to avoid Java’s throws Exception fate.

Hope this answers your question.

AndrewCodeDev · December 28, 2023, 10:47pm

True - anyerror can also be used on function type signatures for generic dispatching. This is sometimes helpful if you have many different functions that can be trafficked through a declared type that could return errors unique to each function that it interfaces for. It has its uses, but also consequences.
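To illustrate the function-pointer case, here is a small sketch assuming Zig 0.11 or later. Handler, half, decrement and runAll are hypothetical names. Each handler declares its own narrow error set, while the shared pointer type uses anyerror so that all of them fit through it:

```zig
const std = @import("std");

// Hypothetical shared type: any handler, whatever errors it may return.
const Handler = *const fn (input: u32) anyerror!u32;

fn half(input: u32) error{Odd}!u32 {
    if (input % 2 != 0) return error.Odd;
    return input / 2;
}

fn decrement(input: u32) error{Underflow}!u32 {
    if (input == 0) return error.Underflow;
    return input - 1;
}

// The compiler cannot know which handlers arrive at runtime,
// so the error set here has to stay open as well.
fn runAll(handlers: []const Handler, start: u32) anyerror!u32 {
    var value = start;
    for (handlers) |h| value = try h(value);
    return value;
}

pub fn main() !void {
    const handlers = [_]Handler{ half, decrement };
    const result = try runAll(&handlers, 8);
    std.debug.print("{d}\n", .{result}); // 8 / 2 = 4, then 4 - 1 = 3
}
```

The consequence mentioned above shows up at the call site: a switch on the error from runAll needs an else prong, because the compiler no longer knows the full set of possible errors.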

chung-leong · December 29, 2023, 3:40am

Variable-size structures do not necessarily imply the use of heap memory. In many situations, when data doesn’t need to persist beyond the current function scope, it’s more performant to use a StackFallbackAllocator. Basically, you set aside a certain amount of stack space based on a rough estimate of how much memory would be required generally. If it’s not exceeded, then you don’t incur the overhead of using the heap. If some jack-ass decides his name has 50000 characters, the fallback allocator ensures that your code wouldn’t fail.
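A minimal sketch of that approach using std.heap.stackFallback, assuming Zig 0.11 or later. The helper greetingLen and the 256-byte budget are hypothetical choices for illustration:

```zig
const std = @import("std");

// Hypothetical helper: builds a greeting in scratch memory and returns its length.
fn greetingLen(fallback: std.mem.Allocator, name: []const u8) !usize {
    // 256 bytes live on the stack; requests that do not fit go to `fallback`.
    var sfa = std.heap.stackFallback(256, fallback);
    const allocator = sfa.get();

    const greeting = try std.fmt.allocPrint(allocator, "Hello, {s}!", .{name});
    defer allocator.free(greeting);

    return greeting.len;
}

pub fn main() !void {
    // A short name fits in the stack buffer, so no heap allocation is needed.
    const n = try greetingLen(std.heap.page_allocator, "Zig");
    std.debug.print("{d}\n", .{n}); // "Hello, Zig!" is 11 bytes
}
```

The stack buffer lives inside the function's frame, so this only suits data that does not need to outlive the call, as the post says.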