Return required deinit / destroy functions from corresponding init / create functions

I have just been getting my feet wet with Zig, so apologies if this is something that has been hashed out and rejected already. One criticism I have seen of Zig is that a common pattern is to have some init or create function that requires some corresponding deinit or destroy function call. The only way that way that library writers tend to communicate that the latter should be called to library consumers is through doc comments, or through the expectation that this convention is followed, and for the consumer to expect some function to exist and to look for it.

This seems like a reasonable criticism to me, since forgetting to deinitialize things that require it is a very common problem in general. I realized the Zig language already supports a feature that mitigates this, it is just not being used in the standard library (and thus the convention is to not use this in community libraries). If functions that require a corresponding cleanup function call simply return that function, callers would be required to either explicitly ignore that return value to get the current behavior, or capture the return value and use it (probably a deferred application).

For example, I propose changing the function signature of std.mem.Allocator.alloc from

alloc(self: Allocator, comptime T: type, n: usize) Error![]T

to something like

alloc(self: Allocator, comptime T: type, n: usize) Error!struct { []T, fn () void }

and instead of calling to free and passing the return value of alloc, you would be encouraged to capture both values of the tuple, and defer a call to the latter.

const foos, const freeFoos = try allocator.alloc(Foo, 5);
defer freeFoos();
// use foos

In this example, freeFoos would be a closure over foos that calls allocator.free on it. The current behavior could still be achieved by just ignoring the second value of the tuple.

const foos, _ = try allocator.alloc(Foo, 5);
defer allocator.free(foos);
// use foos

Sorry if the syntax isn’t exactly correct, I’m on mobile and haven’t actually tested this :stuck_out_tongue:

Also FWIW, one place I have seen this pattern before is with cancelable contexts in Golang. See the collapsed example under the WithCancel documentation here: context package - context - Go Packages. Go is another language that requires return values to be handled (at least when one or more of them are handled, like the ctx value in the example), and requires the cancel value either be used (applied) or ignored (assigned to _, just like in Zig). It seems like this would be a reasonable pattern to use all over the standard library, and thus encourage community libraries to do the same. Thoughts?

1 Like

In practice it is not as big of a deal as you may think. Most of the time the worst thing that can happen is that you leak some resources which may even be desirable in certain situation.

Furthermore this problem is already basically solved with the DebugAllocator (formerly GeneralPurposeAllocator) which does leak checking with stack traces in debug mode.
All you have to do is test the function and you know that there will be no major leaks.

This is problematic in many ways:

  • Zig has no closures (and this definitely has been discussed many times), so you’d have to implement this behavior yourself for every function that uses this.
  • This solution is rather inefficient. Encouraging people to store extra runtime data (a function pointer and an element pointer) for every allocation and calling the deinit function through a function pointer (→which cannot be inlined most of the time) is a bad idea.
  • This doesn’t even cover all of the common use cases. E.g. what if you realloc the memory? Now your deinit closure is freeing the wrong thing. Oops
  • It goes against the zig zen: “Only one obvious way to do things.” Now you’ve got two ways to free memory. Which one do you choose when? These are not questions that I should not have to worry about.
  • It only solves the problem for local allocations (things that you’d free with defer), which is the one place where you won’t forget to free it as soon as you got used to this pattern. The more complex are not helped by this pattern.

This solution may make sense in Go, where having to manually deinit things is a rare occurence, but in Zig where it’s the norm to deinit things it would just get in the way more than it would help in my opinion.

9 Likes

Thanks for the reply. I’ll address each of the points:

Unintentional resource leaks seem like a pretty bad thing to me. This pattern would permit resource leaking (because, as you say, there can be reasons to want to do it), but would require you explicitly express that is what you are doing (throwing out the deallocation function with _).

This follows the first bullet of zig zen more closely than the status quo: “Communicate intent precisely.”

You’ve solved memory leaks for well tested functions; what about other resources? File descriptors? Network sockets? This pattern would apply to any resource allocation that requires a future deallocation.

This seems fairly trivial, no? In the case of alloc, it would be something like this:

pub fn alloc(self: std.mem.Allocator, comptime T: type, n: usize) std.mem.Allocator.Error!struct { mem: []T, free: fn () void } {
    const mem = self.allocAdvancedWithRetAddr(T, null, n, @returnAddress());
    return struct { mem, struct {
        pub fn free() void {
            self.free(.mem);
        }
    }.free };
}

Though ideally this would be implemented inside allocAdvancedWithRetAddr, and then the free would bubble up from there. Also, most things wouldn’t have to follow this closure pattern at all - anything that follows the init()/deinit() pattern where the latter accepts no arguments would be able to just return .deinit directly.

I am definitely not an expert on the effeciency implications of this. I figured most of the time this could be inlined to the same output as the current pattern. I’d like to better understand why it can’t be if you’re able to share more insight.

Will need to think more about this case.

Removing free from the allocator interface would solve this. Now you’re back to one way to free.

I don’t think this is true. Anything that allocates resources would follow this pattern. If you allocate resources that need to be free’d elsewhere, you would return a function that frees that resource. This includes calling the functions that were passed to you to free resources that you allocated.

I repeat @IntegratedQuantum’s statement: Zig doesn’t have closures

Also the syntax isn’t correct at all, it is just pseudo code, you can’t just capture variables with runtime control flow in comptime function body types.

I suggest you write some more Zig code, before making up new idioms for Zig, I think that way you would discover that this isn’t well suited to how Zig works.

Zig discourages the higher order functional programming style where lambdas and closures are passed around everywhere, that is one of the reasons why closures aren’t a feature of the language and there isn’t short syntax for lambda expressions.

4 Likes

They can’t cause security vulnerabilities, they are usually easy to find and easy fix. And the worst thing they may do is exhaust your resources causing a crash after a while.
So what exactly is bad about them in your opinion?

The same sort of leak detection could (and in my opinion should) be applied there too.

Well sure, but adding the same 5 lines of code to many standard library functions seems to obfuscating the actual function a lot. And being able to actually read the standard library is a pretty strong feature of Zig.

In any case where you don’t clean it in the same function the compiler will be unable to prove that this function pointer always points to the same function.

Another example would be an ArrayList of items. With your approach you have to have a separate list with all the free function closures for all items. This is not something the compiler can just optimize.

Ah, but then you forbid any manual optimizations, and make me always pay the extra runtime cost.

Well this only works if all functions (including the ones from the user) follow your convention. That seems like a pretty bold assumption, given that it isn’t enforced by the compiler.

1 Like

I’m against any language feature that encourages programmers to imagine the destruction process as the construction process in reverse. That’s a brain-dead sort of approach. In the real world we don’t destroy things that way. To reclaim space used by a hotel we just stick some dynamite into the building and set it off. We wouldn’t painstakingly uninstall every toilet and unscrew every light bulb.

9 Likes

Agreed! I recently read this article about the topic and found it quite helpful to understand it better: Untangling Lifetimes: The Arena Allocator - by Ryan Fleury

6 Likes

If you’re doing deallocation on function exit, the question should be “why is this stuff on the heap in the first place?”. The variable in question is clearly local to the function so it should really be on the stack.

I think there are plenty of reasons to allocate a variable on the heap and then deallocate on function exit. Namely, for variables that are too large for the stack (very large arrays/matrices) or for vars which don’t have a compile-time known size.

But I agree with your general points.

3 Likes

Placing variable-length data structures in the heap is an old habit from C. In Zig, we have allocators that work off stack memory, with optional fallback to using the heap. The ergonomics are just kind of poor at the moment. It’s area of the standard library that can use some strengthening out.

Why? I suppose because it doesn’t fit well in a language with explicit memory management (no GC). Any other reason, perhaps performance related?

Here’s a quote from a GitHub issue comment Andrew made a couple years ago:

In Zig, using functions as lambdas is generally discouraged. It interferes with shadowing of locals, and introduces more function pointer chasing into the Function Call Graph of the compiler. Avoiding function pointers in the FCG is good for all Ahead Of Time compiled programming languages, but it is particularly important to zig for async functions and for computing stack upper bound usage for avoiding stack overflow. In particular, on embedded devices, it can be valuable to have no function pointer chasing whatsoever, allowing the stack upper bound to be statically computed by the compiler. Since one of the main goals of Zig is code reusability, it is important to encourage zig programmers to generally avoid virtual function calls. Not having anonymous function body expressions is one way to sprinkle a little bit of friction in an important place.

Finally, I personally despise the functional programming style that uses lambdas everywhere. I find it very difficult to read and maintain code that makes heavy use of inversion of control flow. By not accepting this proposal, Zig will continue to encourage programmers to stick to an imperative programming style, using for loops and iterators.

5 Likes