I have been thinking this over in my sleep (seriously), and for now I have come to a (personal) conclusion. Maybe I’m right, maybe I’m wrong, and it’s certainly not up to me to decide the future of the standard library or what idiomatic Zig looks like. But I would nonetheless like to share my own insight on this matter.
TL;DR: Always use `deinit(self: *T)` (possibly with `noalias`). And the standard library, including the allocator interface(!), should do the same for invalidating functions. (Don’t bash me too hard, please.)
Edit: I’m not so sure anymore, but maybe you would still like to read this post to the end, to see its genesis and the counter-arguments at the end.
Now this is likely not going to happen (citing Andrew from the same comment back in May 2022: “I don’t want to make this change to the allocator interface.”). But for the sake of understanding the underlying problem better, I would like to justify my viewpoint in the following.
(Important side note / Disclaimer: If the standard library doesn’t change, then idiomatic Zig might not look like how I describe it here. Consider it a mere thought experiment then, instead of seeing it as practical advice on how to write idiomatic Zig.)
I think we have a few conflicting paradigms here.
First of all, let me cite some of the Zig Zen:
- Communicate intent precisely.
- Only one obvious way to do things.
- Runtime crashes are better than bugs.
- Compile errors are better than runtime crashes.
- Reduce the amount one must remember.
- Focus on code rather than style.
Now somewhat rephrasing the last point and combining it with what I perceived in this thread, I would dare to say:
- Don’t make things too complicated.
- Zig is a practical language: it’s gotta work in the end.
Now, taking the last three points, the answer to “Which arguments for deinit?” would be:
It depends. For small structs and slices: use pass-by-value. For big structs, or values that will be mutated during deinitialization: use pass-by-reference. Furthermore, it may depend on semantics: if I have an (otherwise) constant handle that just gets “invalidated”, then make it a `const` and use pass-by-value. If it’s some sort of mutating structure, then make it a `var` and use pass-by-reference, even if this distinction technically doesn’t matter after deinitialization. Just follow your instincts.
And this is what Zig currently does and what, I assume, is idiomatic Zig as of now.
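To illustrate that current convention, here is a minimal sketch (the `FileHandle` and `Buffer` types here are made up for illustration): a small, handle-like struct whose `deinit` takes `self` by value, next to an allocating struct whose `deinit` takes a mutable pointer.

```zig
const std = @import("std");

// A small, handle-like struct: `deinit` takes `self` by value,
// so the caller can keep the handle `const`.
const FileHandle = struct {
    fd: i32,

    pub fn deinit(self: FileHandle) void {
        // Release the underlying resource here; the handle itself
        // is not mutated.
        _ = self;
    }
};

// A bigger, owning structure: `deinit` takes `*Buffer` because it
// frees owned memory (and could poison its fields afterwards).
const Buffer = struct {
    data: []u8,
    allocator: std.mem.Allocator,

    pub fn init(allocator: std.mem.Allocator, size: usize) !Buffer {
        return .{
            .data = try allocator.alloc(u8, size),
            .allocator = allocator,
        };
    }

    pub fn deinit(self: *Buffer) void {
        self.allocator.free(self.data);
    }
};
```

Note the asymmetry this creates for callers: a `const` `FileHandle` can call `deinit` directly, while `Buffer.deinit` requires the variable to be addressable as mutable (typically a `var`).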
But then someone (not me, I only just learned about it) came up with the idea of
`self.* = undefined;`
which demands declaring our struct as a variable and allowing the deinit function to take a mutable reference. And this aligns perfectly with all of the following paradigms:
Communicate intent precisely.
Whether this is a no-op or not, the intent will be clear to the reader of the source. Even the function’s interface makes the intent clear in some way: don’t expect the argument to be as it was after you call the function.
Admittedly, if `self.* = undefined;` were a no-op, then technically it doesn’t mutate anything; but from both a semantic and a practical point of view, it does: the value is effectively `undefined`.
Only one obvious way to do things.
Life would be so easy if we always used mutable pointers when things get invalidated. No more thinking.
Runtime crashes are better than bugs.
Assuming #211 gets implemented some day (maybe even by a different compiler for the same language 10, 20, or 30 years from now), this would help us catch bugs.
Compile errors are better than runtime crashes.
Assuming even smarter compilers, we might get a compile-time warning if we access a variable that has been passed to a function that invalidated its value.
Reduce the amount one must remember.
Again, life would be easy if we could follow the simple rule of always using `deinit(self: *T)`.
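To make the rule concrete, here is a minimal sketch (the `List` type is hypothetical) of what an invalidating function would look like under this convention:

```zig
const std = @import("std");

// Hypothetical example: every invalidating function takes a mutable
// pointer and ends with `self.* = undefined;`.
const List = struct {
    items: []u32,
    allocator: std.mem.Allocator,

    pub fn init(allocator: std.mem.Allocator, n: usize) !List {
        return .{
            .items = try allocator.alloc(u32, n),
            .allocator = allocator,
        };
    }

    pub fn deinit(self: *List) void {
        self.allocator.free(self.items);
        // Poison the whole struct. In safe build modes (and more so if
        // issue #211 is ever implemented), this helps catch
        // use-after-deinit bugs.
        self.* = undefined;
    }
};
```

After `list.deinit()`, every field of `list` is `undefined`; in Debug builds the bytes are typically filled with `0xaa`, so accidental reuse is more likely to be noticed.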
Did I forget something? Maybe. As an Advocatus Diaboli, I would like to take a look at a few more counter-arguments to my idea:
- It seems semantically wrong to demand mutability in a `deinit` function when I don’t actually need to mutate the structure in order to invalidate or release it. Well, that is true, but it seems to be more of a semantic issue. Besides, if we consider the invalidation process as a sort of mutation, we could argue that the value does undergo a mutation (which is even expressed in the language by setting `self.* = undefined;` at the end).
- It is inconsistent with the current allocator interface and, as such, would demand deep changes in Zig. Yes, but I’d (personally) like to see that happen sooner rather than later (i.e. before Zig’s standard library is stable, as afterwards this could be a real pain).
I hope sharing my thoughts on these matters won’t result in getting bashed (I’ve seen that a lot on other forums). I understand that this is (likely) a highly controversial topic (maybe even a frustrating one, depending on where you’re coming from), but I also hope that there is enough openness here to at least discuss these issues from a technical and semantic (or even philosophical) point of view. And I don’t expect Zig to change in the way I would like; I know that different people have different ideas about the best way to go. So thank you all for your attention and/or reasoned participation in this discussion.
And my apologies if this post was (once again) a bit long. However, I think it’s worth saying that (I believe) the quick answers to this question may be misleading, and that this topic really deserves some deep thought and consideration.
P.S.: If I missed (or misunderstood) something, please let me know. In particular, I’d be interested to know whether there are good reasons against a different allocator interface, i.e. one that sets a slice or pointer to `undefined` after releasing memory.
P.P.S.: Maybe one reason against a different allocator interface could be that the invalidation is incomplete anyway: I may create a copy of a slice or pointer, or obtain a subslice (NOTE: without copying the data it refers to), then deallocate, thus invalidating my original slice/pointer; the (`const`) copies of that slice/pointer are then effectively invalidated as well, even though they are `const` and not set to `undefined`. So the whole `self.* = undefined;` mechanism is a half-baked thing anyway?
Demonstration:
```zig
const std = @import("std");

pub fn odd_free(alloc: std.mem.Allocator, ptr_or_slice: anytype) void {
    alloc.free(ptr_or_slice.*);
    // The following marks the slice, not the memory pointed to by the
    // slice, as undefined:
    ptr_or_slice.* = undefined;
}

pub fn main() !void {
    var da = std.heap.DebugAllocator(.{}).init;
    defer _ = da.deinit();
    const alloc = da.allocator();

    var my_slice: []u8 = try alloc.dupe(u8, "Hello World!");
    const my_subslice: []u8 = my_slice[0..5];
    std.debug.print("{s}\n", .{my_subslice});

    // Note the `&` here:
    odd_free(alloc, &my_slice);
    // Now `my_slice` is a `var` and set to `undefined`,
    // but `my_subslice` is not. That's odd, indeed.
}
```
Reading the arguments in #6322 and #9814 gives some more insight into this issue as well. Perhaps it really depends on the nature of the struct (or datum), and whether it’s considered to be a `const` handle or a `var` structure. Some structs could be seen as “handles” that are usually copied.
It’s still difficult for me to grasp the difference here. But I feel like it’s (at least in part) the semantics that matter (even though switching from pass-by-value to pass-by-reference, or vice versa, may have an impact on performance). And that brings me back to believing that things aren’t actually as bad, and that the allocator interface is fine (even if some things could be made a little more consistent in the standard library; but that’s not really an urgent matter now).
Anyway, some (future) guidance on this topic would be helpful, as it’s really difficult for me (as a newbie) to pick the right approach in each case. Perhaps thinking less helps, but I’m not sure. I feel like other people struggle with this issue as well.
And my apologies if I was too quick with my critique. I get the feeling that the people working on this are going the right way and have good intuition. Keep up the good work.