Wildcat allocations and Allocator Scope

dee0xeed · November 23, 2023, 6:54pm

I guess the questions I am going to ask have already been asked many times.
If so, please point me to the relevant discussions.

We all know that the proper way of dealing with heap allocations in Zig
is to create an allocator in main() and then pass it to every function/subsystem
that needs the heap to do its work.
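
For reference, a minimal sketch of that convention. The helper name sumOfSquares is hypothetical, and an ArenaAllocator backed by std.heap.page_allocator is used here only because it is short to set up; any allocator would do.

```zig
const std = @import("std");

// Hypothetical helper: it needs heap memory, so it asks the caller for an allocator.
fn sumOfSquares(allocator: std.mem.Allocator, n: usize) !usize {
    const squares = try allocator.alloc(usize, n);
    defer allocator.free(squares);

    for (squares, 0..) |*sq, i| sq.* = i * i;

    var total: usize = 0;
    for (squares) |sq| total += sq;
    return total;
}

pub fn main() !void {
    // The allocator is created once, at the top of the program,
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    // and handed down explicitly to whatever needs it.
    std.debug.print("result = {}\n", .{try sumOfSquares(arena.allocator(), 4)});
}
```

The signature of sumOfSquares tells the reader that it may allocate, which is the whole point of the convention.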

But I can easily break the rules:

const std = @import("std");

pub fn main() !void {
    std.debug.print("result = {}\n", .{try very_bad_func()});
}

fn very_bad_func() !i32 {
    const S = struct {
        a: i32,
        b: i32,
    };
    var GPA = std.heap.GeneralPurposeAllocator(.{}){};
    const gpa = GPA.allocator();
    const s = try gpa.alloc(S, 1000); // never freed, and GPA.deinit() is never called
    s[0].a = 7;
    return s[0].b;
}

This means that this Zig approach to heap allocations is nothing more than a convention, isn’t it?
Are there any (even hypothetical) ways to prevent writing things as in the example?
For example, is it possible to forbid creating allocators everywhere (except main) at compiler level?

IntegratedQuantum · November 23, 2023, 7:10pm

It’s just a convention. And honestly there is nothing wrong with for example having a global allocator somewhere, instead of passing it through every function call.

Are there any (even hypothetical) ways to prevent writing things as in the example?
For example, is it possible to forbid creating allocators everywhere (except main) at compiler level?

Most heap-based allocators rely on std.heap.page_allocator. So if you removed that from the standard library, all attempts to make a new allocator somewhere else would fail. But that would only make it harder to make functions like this.
Like in the end var x: [1000000000]u8 is basically heap allocation.
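
To illustrate that last point, here is a rough sketch of how a large static buffer can stand in for the heap without the caller ever passing an allocator. The names scratch and sneakyLen are made up for this example, and the buffer size is arbitrary.

```zig
const std = @import("std");

// A large static buffer: the caller never created an allocator, yet this is
// memory the function below can carve up at will.
var scratch: [1 << 20]u8 = undefined;

// Hypothetical function: nothing in its signature hints that it allocates.
fn sneakyLen(text: []const u8) !usize {
    var fba = std.heap.FixedBufferAllocator.init(&scratch);
    const copy = try fba.allocator().dupe(u8, text);
    return copy.len;
}
```

No heap allocator is involved here at all, so removing page_allocator would not stop this kind of code.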

May I ask why you want to do this?

dee0xeed · November 23, 2023, 7:20pm

I do not. I really like creating allocators in main and then passing them everywhere.
My point was that this can only be enforced by the conscientiousness of the programmer.

dee0xeed · November 23, 2023, 8:40pm

Of course. We can make functions/subsystems which on the sly create allocators, use them and (if they are good) eventually do appropriate cleanup. But such a “style” would break all the “no hidden allocations” philosophy…

IntegratedQuantum · November 23, 2023, 9:03pm

Yeah, and I think you definitely shouldn’t do this for a library.

But at the same time it makes sense to trade that philosophy for convenience.
Take for example ArrayList.append: No allocator is passed, yet it allocates memory.
It obscures allocations for convenience.

It’s the same for global allocators. They obscure allocations within the project for convenience.

dee0xeed · November 24, 2023, 6:21am

That is my question: is it possible for this (very good) convention to become a strict rule (i.e. enforced by the compiler) someday? I think it would be very nice.

It’s not the case I’m talking about.
ArrayList receives an allocator via its constructor, init.
And hence we know that it needs heap.
We kinda do not know where/when it will allocate, but we definitely know it will.
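
A small sketch of the pattern being described. IntStack is a hypothetical container written for this example, not the std.ArrayList implementation. It takes the allocator once in init and allocates later, inside push.

```zig
const std = @import("std");

// Hypothetical container: the allocator is visible at init time, while the
// actual allocations happen later, inside push.
const IntStack = struct {
    allocator: std.mem.Allocator,
    items: []i32, // the full allocated capacity
    len: usize, // how many elements are in use

    fn init(allocator: std.mem.Allocator) IntStack {
        return .{ .allocator = allocator, .items = &[_]i32{}, .len = 0 };
    }

    fn deinit(self: *IntStack) void {
        self.allocator.free(self.items);
    }

    fn push(self: *IntStack, value: i32) !void {
        if (self.len == self.items.len) {
            // Grow using the allocator stored at init time.
            const new_cap: usize = if (self.items.len == 0) 4 else self.items.len * 2;
            self.items = try self.allocator.realloc(self.items, new_cap);
        }
        self.items[self.len] = value;
        self.len += 1;
    }
};
```

The caller cannot tell from push alone when memory is requested, but init has already announced that it will be.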

IntegratedQuantum · November 24, 2023, 9:36am

It’s still an important case here. If it’s ok to store a copy of an allocator somewhere, then what prevents a bad library from storing it globally and reusing it for all of its methods? Isn’t that almost as bad as a library creating a new heap allocator?

Cases like this make it hard to even formulate a strict rule.

dee0xeed · November 24, 2023, 10:06am

I’m afraid, I do not understand what you mean… what does that mean, “store it globally”?..

IntegratedQuantum · November 24, 2023, 10:18am

const std = @import("std");
const Allocator = std.mem.Allocator;

var globalAllocator: Allocator = undefined;
pub fn initBadLibrary(allocator: Allocator) !*LibraryMainStruct {
    globalAllocator = allocator; // Sneakily storing the allocator in a global variable.
    return try allocator.create(LibraryMainStruct); // Actually allocating things as a diversion
}

pub fn badUtilityFunction(...) ... {
    ... globalAllocator.alloc(...); // Now heap allocating behind your back.
}

dee0xeed · November 24, 2023, 10:51am

Ah, ok, got it.

Well, I cannot see any difference between storing an (explicitly passed) allocator in some library-global variable and storing it inside LibraryMainStruct.

IntegratedQuantum · November 24, 2023, 11:09am

So

pub fn initLibrary(allocator: Allocator) !*LibraryMainStruct {
    globalAllocator = allocator;
    return try allocator.create(LibraryMainStruct);
}

is acceptable, but

pub fn initLibrary(allocator: Allocator) !*LibraryMainStruct {
    globalGPA = std.heap.GeneralPurposeAllocator(.{}){};
    globalAllocator = globalGPA.allocator();
    return try allocator.create(LibraryMainStruct);
}

isn’t?

dee0xeed · November 24, 2023, 11:29am

Yes, exactly.
In the first snippet we are using an allocator from the caller.
In the second one (which is a bit of a joke, I guess) we for some reason create a new one.

IntegratedQuantum · November 24, 2023, 11:44am

I think there are valid reasons for a library to create its own allocator like this. The library knows best what allocator to use and creating a GPA might be better than using e.g. an arena allocator that was passed in.

dee0xeed · November 24, 2023, 12:33pm

Ok, I got it. So, it’s extremely unlikely that in the future Zig will (somehow) enforce “no hidden heap allocations” rule/convention, right? Or?..
I can imagine some tool, that would analyse a library for potential hidden allocations.
grep GeneralPurposeAllocator or so…

IntegratedQuantum · November 24, 2023, 12:42pm

Yeah, I don’t think that zig will do this.

While external analysis tools would be possible, I think they could at most use heuristics.
Like for example grep GeneralPurposeAllocator would find a false-positive when taking a GPA as a function parameter. And you could always work around this using
@field(std.heap, "General"++"PurposeAllocator");
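
A sketch of the same trick in a compilable form. It uses page_allocator instead of the GPA only because that name is less likely to differ between Zig versions; hiddenAllocator is a made-up name.

```zig
const std = @import("std");

// Hypothetical function: the declaration name is assembled at compile time,
// so a plain text search for the full identifier finds nothing in this file.
fn hiddenAllocator() std.mem.Allocator {
    return @field(std.heap, "page" ++ "_allocator");
}

pub fn main() !void {
    const allocator = hiddenAllocator();
    const bytes = try allocator.alloc(u8, 16);
    defer allocator.free(bytes);
    std.debug.print("got {} bytes\n", .{bytes.len});
}
```

A tool would have to evaluate comptime expressions, not just match text, to catch this.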

dee0xeed · November 24, 2023, 1:07pm

I will mark the post with this statement as “solution”, meaning this is the answer to my question.

Sze · November 24, 2023, 8:26pm

I just want to add that creating and using an allocator locally within a function is a perfectly valid use case.

For example, I have a bunch of functions that just format some text and create null-terminated strings, which are then passed on to raylib draw calls that display them on the screen. These strings are short-lived: they only need to exist long enough for raylib to convert them to glyphs and batch those up to be rendered later.

These functions aren’t even allowed to fail with an error; they just catch allocation failure with catch "<out of memory>", so if my FixedBufferAllocator is too small it will render that special string instead. Why? Because it is easier for me to see where it fails if I can see that string, instead of my program crashing and giving me a source location that is used with a bunch of different data. When the program continues running, I can actually inspect for which kinds of things the bug exists or some limit is hit.
And all these things being printed are debug output tools for my convenience. At the same time, I don’t have to worry about my custom tools, built for getting insight at run time, crashing the program I am trying to interact with. If I have a bug in my tool it is easy to see and fix, and if the fixed buffer needs to be bigger because I want to display more stuff, I can increase the size. (I also reset the FixedBufferAllocator every time after I have used the temporary strings, so that the next one can have the maximum size.)
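
A rough sketch of this pattern, under some simplifying assumptions: formatLabel is a hypothetical name, the buffer size is arbitrary, the string is not null-terminated, and the reset happens before each use rather than after.

```zig
const std = @import("std");

// Hypothetical debug helper: it cannot fail. If the fixed buffer is too small,
// the placeholder string is returned instead of an error.
fn formatLabel(fba: *std.heap.FixedBufferAllocator, name: []const u8) []const u8 {
    // Start from an empty buffer so each label can use the full size.
    // The previously returned label is invalidated by this call.
    fba.reset();
    return std.fmt.allocPrint(fba.allocator(), "name: {s}", .{name}) catch "<out of memory>";
}

pub fn main() void {
    var buffer: [32]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);

    // Fits in the buffer.
    std.debug.print("{s}\n", .{formatLabel(&fba, "zig")});
    // Too long for 32 bytes, so the placeholder is shown instead of crashing.
    std.debug.print("{s}\n", .{formatLabel(&fba, "a name that is far too long for the buffer")});
}
```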

I would argue that writing tests is the way to make sure the allocators are used properly, one tool you can use is checkAllAllocationFailures and I guess you could extend that to libraries by saying those should have tests as well.
Having general tools that help identify problems with libraries (beyond just looking at their source code) could be helpful, but I think general memory profiling applications could already be a big help in for example taking notice of a library that internally uses its own allocator reserving lots of memory.
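
A minimal sketch of what such a test could look like. The function joinTwice is made up for this example; checkAllAllocationFailures calls it repeatedly, making a different allocation fail each time, and reports leaks or unexpected errors.

```zig
const std = @import("std");

// Hypothetical function under test. The first parameter must be the allocator
// and the return type must be !void for checkAllAllocationFailures.
fn joinTwice(allocator: std.mem.Allocator, text: []const u8) !void {
    const first = try allocator.dupe(u8, text);
    defer allocator.free(first);

    const joined = try std.mem.concat(allocator, u8, &[_][]const u8{ first, text });
    defer allocator.free(joined);

    try std.testing.expectEqual(text.len * 2, joined.len);
}

test "joinTwice cleans up when any allocation fails" {
    try std.testing.checkAllAllocationFailures(
        std.testing.allocator,
        joinTwice,
        .{@as([]const u8, "zig")},
    );
}
```

This only exercises allocations that go through the allocator passed in, so it would not notice a function that quietly creates its own.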

I think the only way you could end up with a language that prevents you from using allocators in certain ways is to enforce some system of restrictions on how allocators can be used. It seems like that would quickly result in situations where legitimate uses aren’t allowed anymore, just because the system isn’t able to prove that they are OK.

Thinking a bit more about it, I think something that could potentially make sense for Zig, still feel ziggy and be helpful, without preventing valid programs, would be some kind of language feature to express that some subset of the program is isolated from the rest of the program and can’t escape that box. I guess you could say sandboxing as a language feature. But I don’t know how easy or difficult it would be to implement that. So far it seems that wasm is planned to be the solution for sandboxing needs, but it could be interesting to have a sandboxing feature that can be used without needing to know the underlying method by which it is achieved. And with sandboxing you often want some kind of limits on CPU and memory. From my standpoint, that would be the more appropriate way to get safe memory limits, rather than micromanaging how you can use allocators.
(But I am also not sure whether we really need sandboxing as a language feature; I haven’t thought that much about it. Maybe it is actually better if you are forced to think about what your sandboxing needs are and in which different ways those could be implemented: do you need arbitrary programs, or just a subset of some simple language that can be proven to be safe and thus be compiled to a safe program? Do you want to use wasm, write an interpreter for some custom language, compile some subset to a plugin/JIT, use docker, etc.? The more I think about it, the more I feel it would be great to have different kinds of libraries that implement sandboxing for Zig programs in different ways, so that you can choose which one you want; even better would be if they had similar APIs. In the end I am just left with the question: what is your actual goal, and is whatever you are doing worth it to achieve that? I think the answer to that is heavily dependent on the needs of the project.)

That said, if you want to go the managing-allocators route, you could probably still put a lot of tracing code in the root allocator, or maybe even hack the compiler to add some kind of monitoring for all fixed-size buffers above some arbitrary size (to find buffers that may be used with FixedBufferAllocators or equivalent things). But I think all of that would probably be code that adds project-specific monitoring that is more heuristic-based, like @IntegratedQuantum said.

dee0xeed · November 30, 2023, 10:28am

Sure, nothing can stop me from doing it like this if I really want/need this. I started this topic because it is sometimes (always?) stated that underhanded allocations are sort of impossible, for instance:

No hidden allocation: Nothing allocates on the heap, without you knowing it and letting it happen. Zig utilizes the Allocator type to achieve this. Any function that allocates on heap receives an Allocator as parameter. Anything that doesn’t do so won’t allocate on heap, guaranteed.

It’s just not true: there are no guarantees at all, and that may misinform people.

avestura · January 10, 2024, 5:11pm

@dee0xeed I’ve edited the blog post, hope it clears things up.

dee0xeed · January 10, 2024, 6:10pm

Ok, thanks for your attention.