Which arguments for deinit (or other invalidating functions)?

Thanks for the clarification. What I tried to say is that a caller of a function that takes an argument by value (x: T) can rely on the value not being changed (disregarding what happens under the hood).

For functions that invalidate an argument, this guarantee is somewhat meaningless: It doesn’t matter if the argument gets changed or not, as it’s invalid anyway.

Now if I understand right, the optimization of internally using a reference (Parameter Reference Optimization, “PRO”) can be done in fewer cases than originally believed (see #5973).


Now I brought up a lot of hypothetical questions. But what do I do in practice with current-day Zig?

So what should I actually do when making my own deinit function?

  • Pick self: T or self: *T depending on whether the function’s implementation needs to modify the state of the struct before the struct is rendered unusable?
  • Pick self: T or self: *T depending on the size of the struct?
  • Pick self: T or self: *T based on semantic considerations?
  • Always pick self: T?
  • Always pick self: *T?
  • Always pick noalias self: *T?
  • Decide by personal preference and don’t care so much?
  • Wait till there is a consensus regarding what std will do and follow that practice, and, until then, do as I please?

Maybe that question was answered above:

Did you mean, “If you can deinitialize something which is constant, then pass by reference value […]”?

I assume you did. But then this still leaves me with some questions, as I can always deinitialize something that is constant, even if I need to mutate it, can’t I? This is because the contract of a deinitialization function is that I won’t use the value anymore. So I can just write something like var this = self; and then do my work.
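
To sketch what I mean (a hypothetical Buffer type, not a recommendation):

const std = @import("std");

const Buffer = struct {
    allocator: std.mem.Allocator,
    items: []u8,

    // Pass-by-value deinit: if the implementation needs mutable
    // state, it can copy `self` into a local `var` first.
    pub fn deinit(self: Buffer) void {
        var this = self;
        this.allocator.free(this.items);
        this.items = undefined; // mutates only the local copy, of course
    }
};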

Maybe what you mean is that if I don’t need to mutate the structure or a copy of it(!), then I should pass by value or, if the struct is big, pass by immutable reference?

So in other words: The function’s signature should depend on my implementation’s needs and the size of the struct then?

But you also say:

Should I want to set it to undefined? What’s best practice here? If I always want to do it (and is it justified to want it?), should I always use self: *T then? There seemed to be a consensus that this isn’t always the best way, so maybe you can understand my confusion.

Also, if there are reasons to use self: *T, are there any reasons against using noalias self: *T (other than that it is longer and doesn’t feel idiomatic)?

My final question: Is there currently any indication or plan of how std will handle this in future?


Addendum: Re-reading the issue comment I cited here…

…I wonder if I misunderstood, and perhaps he meant the exact opposite: indicating openness to move from self: T to self: *T (and not the other way around). The grammar is a bit ambiguous (at least to me as a non-native speaker). :man_shrugging: @andrewrk, maybe you can clarify.

Type systems exist to enforce certain invariants. Programs always have more invariants than a type system will express, and languages differ in what invariants become part of the type system. Rhetorically, I’m going to treat program invariants and type invariants as disjoint, although it’s more accurate to treat the latter as a subset of the former.

You’re approaching ‘validity’ as though it were, or should be, a type invariant. In Zig, it isn’t. Not accessing an instance after deinitializing it is just a good idea; it’s often essential for program correctness, but by no means always. I would question code which used an allocator field on a de-initialized instance; it seems simpler to treat “don’t read from anything after deinit” as a program invariant. Make a copy of the allocator first instead. That kind of thing.
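
For example (a minimal sketch with hypothetical names), instead of reading the allocator field off a de-initialized instance, copy it out beforehand:

const std = @import("std");

const Parser = struct {
    allocator: std.mem.Allocator,
    scratch: []u8,

    pub fn deinit(self: *Parser) void {
        self.allocator.free(self.scratch);
        self.* = undefined;
    }
};

fn reset(parser: *Parser) !void {
    // Copy the allocator first: reading `parser.allocator` after
    // deinit would break the "don't read after deinit" invariant.
    const gpa = parser.allocator;
    parser.deinit();
    parser.* = .{ .allocator = gpa, .scratch = try gpa.alloc(u8, 64) };
}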

But more seriously, and I’m afraid that until you understand this, you will make much less sense than you think you are making: you are ignoring an important type-level invariant in Zig.

A var and a const do not have the same type in ways which matter. Observe:

var u_var: usize = 0;
const u_ptr1 = &u_var; // Type: *usize
const u_const: usize = 0;
const u_ptr2 = &u_const; // Type: *const usize

It’s quite uncommon for a composite type to be valid both as a const and as a var, for this reason. Zig’s type system rigorously enforces this, because it doesn’t allow you to leave a var unmutated. The uncertainty you’re basing your rather long posts around simply does not exist.

This single fact answers everything you’ve been overthinking in this thread. I urge you to use Zig more, and think about it less until you understand it better.


I understand this as: “validity” is more a matter of convention. This is, at least, what I read in Thread.detach’s docs, for example:

Once called, this consumes the Thread object […]

This is not reflected in the type system. That is okay, and I’m willing to adapt to that style of programming. I feel like Zig is like this in a lot of places, and actually it’s one of the reasons why it feels good to me: not overcomplicating things. (Yeah, my posts might indicate otherwise.)

I hope I got this right.

I didn’t want to say it was good style or an aim to exploit these possibilities. I just tried to analyze the theoretical possibilities/boundaries, in order to get a better understanding.

Now I’m a bit lost, indeed. Until now, I thought a type was something like u8, u16. Maybe it’s a matter of terminology. If I understand right, then const u8 would be a different type than u8? Maybe you could see it that way, but I guess that depends on the definition of “type”?

I tried:

const T = const u8;

But I get:

type.zig:1:11: error: expected expression, found 'const'
const T = const u8;
          ^~~~~

But I see how const u8 could be seen as a type in a certain context, e.g. what you can obtain when dereferencing a *const u8 pointer. Until now, I didn’t use the terminology “type” for that, but maybe that’s a more academic understanding of “type”, which can be helpful in this context.

I hope I didn’t get that fully wrong.

What do you mean by “composite type”? A non-primitive type, like a struct? And what does “valid” mean here? I lack context (or knowledge) to follow. Can’t I always coerce a u8 into const u8? I mean: I can always do read access on a non-const u8 (i.e. use it in places where it can’t mutate), as long as there is at least one write access. Example:

const std = @import("std");

pub fn main() void {
    var x: u8 = 1;
    const y: u8 = 2;
    // Now I can use `x` in a place where no write access is allowed:
    const z = x + y;
    _ = z;
    // As long as there is some write-access elsewhere:
    x = y;
}

So it’s clear what to do? Maybe you mean that it depends on semantics, i.e. pick deinit(self: T) if the struct T is, from a semantic point of view, a constant, and pick deinit(self: *T) if the struct T is, from a semantic point of view, something that usually mutates?

And this is why a thread handle or file handle gets methods like Thread.detach(self: Thread) or File.close(self: File), respectively?

So maybe I was seeing things in a too technical fashion then?

Does this then mean:

I feel like it’s not really what you tried to say, is it? :unamused_face:

Now why do I feel reminded of martial arts training? :thinking:

Maybe I need to program more practical examples to get a good “feeling” for when to use what, but I guess I’m too analytical to stop trying to understand certain patterns. And I suck at martial arts.

In my defense, I think I’m not the only person who’s puzzled about this, and given the various issues on GitHub that circle around these questions, I think my uncertainty isn’t that far-fetched.

Thanks so far for your hints/guidance.

FWIW, this is my current answer for this topic (haven’t read the entire thread, though). If you personally think self.* = undefined; is worth doing in your deinit functions, do it. Otherwise, don’t.

(personally, I typically don’t, but that might change after the debug safety feature “runtime undefined value detection” (ziglang/zig#211) is implemented. After that, self.* = undefined; will have more of a noticeable upside in everyday use, but deinit will still just be a convention, so it’ll still be a personal judgement call)
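
For concreteness, the pattern in question looks like this (hypothetical struct):

const std = @import("std");

const Thing = struct {
    allocator: std.mem.Allocator,
    data: []u8,

    pub fn deinit(self: *Thing) void {
        self.allocator.free(self.data);
        // In Debug mode, this fills the instance with 0xaa bytes,
        // which can make use-after-deinit bugs easier to spot.
        self.* = undefined;
    }
};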


Hm, I don’t think there is much to add to the deinit discussion.

However,

reads to me like the restrict keyword would give the compiler some promise in a multi-threaded environment or similar. But from everything I have seen so far, the restrict and noalias keywords are only about aliasing of several pointers. So, if you only have a single pointer in a function, as in deinit(self: *T), you are not promising the compiler anything new by adding noalias (except maybe if you are accessing some container-level pointer in deinit() :scream:).

Thanks for sharing; it makes me feel like some matters are still in flux and subject to change, people are trying out their own styles, and a specific/unique coding style that I should adopt hasn’t evolved yet.

It’s a bit contradictory, however, to:

But perhaps some things just take time.

Maybe I have to admit that I don’t actually consider doing it because it’s “worth it” but because I would like to learn good coding style (in Zig) and understand the language and paradigms more.

For me, Zig (and my ability to use it) is in a state where I play and experiment with it. I don’t intend to use it productively as of yet. I expect a lot of std to break in the coming months (especially with async/await and the new I/O system coming up, which is great).

In the future, I also would like to distinguish between the language and its implementation. I know Zig isn’t there yet as there is no official language definition that is independent of its (reference) implementation, right? (Side note: Rust just recently got a normative(-ish) reference in April 2025, 10 years after their 1.0 release.)

So for me it doesn’t matter so much whether “Zig does respect self.* = undefined; assignments”, but more:

  • whether Zig will possibly move toward making it matter in Debug mode in the future
  • or whether it is idiomatic to use it (even if it’s a no-op).

I personally (from my current understanding) would like to see it become idiomatic, but I’m not the one to decide that. Also, if it were idiomatic, it might have a lot of unforeseen consequences: Should we write allocator.free(&slice) instead of allocator.free(slice) in the future? I doubt this would happen, and it might go against the spirit of the language to keep things simple.

As a demonstration (note: not idiomatic!):

const std = @import("std");

pub fn odd_free(alloc: std.mem.Allocator, ptr_or_slice: anytype) void {
    alloc.free(ptr_or_slice.*);
    // The following marks the slice, not the memory pointed-to by the      
    // slice, as undefined:
    ptr_or_slice.* = undefined;
}

pub fn main() !void {
    var da = std.heap.DebugAllocator(.{}).init;
    defer _ = da.deinit();
    const alloc = da.allocator();
    var my_slice: []u8 = try alloc.dupe(u8, "Hello World!");
    // Note the `&` here:
    odd_free(alloc, &my_slice);             
}

So maybe all this self.* = undefined; is very much against the spirit of Zig? I don’t know! But I’d hope to see an answer to that question some day, which makes me stop wondering in each singular case.


As far as I understand, it doesn’t just apply to multi-argument calls. Maybe that’s what you mean by container-level pointers?

Let me give an example:

const std = @import("std");

const A = struct {
    i: i32,
};

var global_a = A{ .i = 5 };

pub fn foo() void {
    global_a.i += 1;
}

pub fn bar(noalias a: *const A) void {
    const i = a.i;
    foo();
    std.debug.assert(i == a.i);
}

pub fn main() void {
    bar(&global_a); // this is illegal, considering `foo` uses it!
}

Here, bar expects that the data pointed to by the argument a will not be accessed through any other means (here, by the foo function, which has side effects). Thus, even though a is a const pointer, the memory changes, and that is illegal.

So how does this relate to deinit and other invalidating functions?

When I call deinit(x), then I usually promise (not necessarily enforced by the type system, but by convention) that I won’t use x anymore. Let’s try to formalize this a bit:

“Calling deinit(x) means that at the moment where I call it, no other function will use x from that point on (except this pass to deinit).”

This means that if I create a pointer from it, by calling deinit(&x) (which is syntactically the same in method-style notation: x.deinit()), then there is already the guarantee that &x has no aliases (i.e. no other pointer refers to x), because if there were, then I would break the convention/contract that is specified in the documentation of deinit.

Thus, it should be safe to write fn deinit(noalias self: *T) instead of fn deinit(self: *T). And as shown above, it can be meaningful even when only a single argument is passed to a function.

At least that’s my current understanding so far.
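
In code, the rule I have in mind would look like this (a sketch with a hypothetical type):

const Foo = struct {
    data: u32,

    pub fn deinit(noalias self: *Foo) void {
        // By the deinit convention, no other live reference to
        // `self.*` is used while this function runs, so the
        // noalias promise should always hold.
        self.* = undefined;
    }
};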

Yes, your example is exactly what I meant. The question is, how often does (and should) something like this happen with deinit()?

No, you must only guarantee that x and anything it may point to is not accessed through any other reference within that function. I.e. that something like the access to global_a in foo() in your example is not happening. You are free to call foo() and use other pointers to global_a before and after that call to bar().

Specifying x to be noalias does not have any implications about what should happen outside of the call to deinit() (this would be different if the function had several parameters with pointers). Other pointers referring to x can exist during the call to deinit(). The programmer writing deinit() would only have to make sure they do not use them within deinit() by accident. The programmer using deinit() can access them before and after that call, no matter if the parameter x has noalias or not.
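
A small example of what that means in practice (hypothetical type):

const T = struct {
    x: u32,
    pub fn deinit(noalias self: *T) void {
        self.* = undefined;
    }
};

pub fn main() void {
    var t = T{ .x = 1 };
    const p = &t; // a second pointer to t
    _ = p.x; // fine: accessed before the call
    t.deinit(); // noalias only restricts accesses *during* this call
    // `p` may still exist here; by the deinit convention we just
    // shouldn't read through it anymore.
}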


Never. And that is my point (that I tried to make in the second post in this thread): It should always be safe to specify noalias self: *T instead of self: *T for a function that “assumes ownership of self and renders it unusable”.

Let’s rephrase it: I must guarantee that x is not accessed through any other reference while the function is running.

So it’s illegal if bar calls foo and foo then modifies, through global_a, the data that the local a points to.

Yes, there I would agree.

That depends on what you mean by “outside” and “inside”. Nobody should access x through other pointers while deinit is running (whether through some call that deinit makes, which then uses a global variable, or by another thread).

And as I said above: It’s reasonable to assume that this doesn’t happen, so it should be okay to pass noalias self: *T to a deinit function.

But this doesn’t necessarily have anything to do with deinit getting more than one pointer (at least not the way I understood it so far).

I would disagree insofar as specifying noalias is a constraint for the caller, not for the callee! The caller must ensure that no aliases are used while the function is running. That, of course, may depend on whether the function does things like accessing global state, whether there are threads running, etc. But in the end, it’s the caller’s responsibility to ensure that there are no aliases that could be used. (Again, that’s just my understanding here.)

Yes, here I agree.

This is where I disagree. The concept of aliasing has nothing to do with threads.
It is ok to add noalias, but you do not gain anything either in this case.

Well, I’m not sure, but the way I see it is:

Omitting restrict (or noalias) doesn’t give you thread safety; you still need synchronization.

But the means by which a value behind a restrict (or noalias) pointer is modified doesn’t matter: It’s illegal to modify the value in any way other than going through the pointer (while the function is running).


Of course, if the function doesn’t call any other function with side effects and doesn’t access global state (and doesn’t perform any synchronization), then you’re right, and noalias effectively does nothing, I guess, if the function has only one argument. So maybe it’s not necessary to specify it for simple functions, as the compiler could deduce it on its own? (But this is where I really don’t know how smart compilers are.)


A (toy) example where another thread might use an alias:

const std = @import("std");

var mutex: std.Thread.Mutex = .{};

const T = struct {
    counter: i32,
    pub fn deinit(self: *T) void {
        mutex.lock();
        const i = self.counter;
        mutex.unlock();
        _ = std.Thread.yield(); // another thread could modify self.counter
        mutex.lock();
        std.debug.assert(i == self.counter); // may fail if other thread uses alias
        mutex.unlock();
    }
};

Of course, you’d never do something like this in a real-world deinit function. But that’s not the reason why I think deinit could receive self: *T with noalias. The reason for noalias is that there should be no other (used) pointer to the deinitialized value by the time deinit is called.

Yes, if you do not access global state and only have a single pointer as a parameter, you cannot have aliases to that pointer. I maintain that thread synchronization is an orthogonal problem. I have never seen a function with a single restrict-qualified pointer as its only parameter.

We definitely should not make the user of every function that has restrict/noalias in its signature responsible for checking whether the function internally might alias the given parameters through some global state. Instead, the writer of the function should make this “impossible”. The user should only be responsible for what they are passing in.


Yeah sure, but that’s not the point here. The question was:

  • Can I use deinit(noalias self: *T)? I would answer that question with: Yes. That is because whatever deinit does, there should be no alias (neither passed through other arguments nor by a global state that could be accessed somehow).
  • Does it have any advantage to do it (edit: when compared to the variant without noalias)? I don’t know enough about compiler optimization to answer that question.

In particular, I also don’t know if there are cases where noalias self: *T is less performant than self: T (pass-by-value), but probably there are.


But all that has been derailing the discussion a bit (no offense, that is fine, at least to me). The whole point of this thread was: “Which arguments for deinit”. And my current answer is:

  • We don’t know.
  • It depends.
  • Maybe it changes when #211 is implemented.
  • Do what you feel best with.
  • Don’t think too much.

(No offense to anyone for sharing their honest opinions/feelings on that matter. I do appreciate! Yet I hoped there was an easier/clear answer to that question.)


Regarding

  • We don’t know.

I would like to cite @andrewrk here, when responding to the proposal to make File.close take self: *File:

I don’t have a conclusion or any guidance at this time. I’m focused on other parts of the project and am postponing these API consistency considerations. I think we have other priorities currently and the resolution to this problem will reveal itself once we focus on it.

This is from May 2022, and I think it will still take some time yet until these things are settled.


I have been rethinking this in my sleep (seriously), and I have come to a (personal) conclusion, at least for now. Maybe I’m right, maybe I’m wrong, and certainly it’s not up to me to decide the future of the standard library and what idiomatic Zig looks like. But nonetheless I would like to share my own insight on this matter.

TL;DR: Always use deinit(self: *T) (possibly with noalias). And the standard library, including the allocator interface(!), should do the same for invalidating functions. (Don’t bash me too hard, please.)

Edit: I’m not so sure anymore, but maybe you still like to read this post till the end, to see the genesis and counter-argument in the end.

Now this is likely not going to happen (citing Andrew from the same comment back in May 2022: “I don’t want to make this change to the allocator interface.”). But for the sake of letting us understand the underlying problem better, I would like to justify my viewpoint in the following.

(Important side note / Disclaimer: If the standard library doesn’t change, then idiomatic Zig might not look like how I describe it here. Consider it a mere thought experiment then, instead of seeing it as practical advice on how to write idiomatic Zig.)

I think we have a few conflicting paradigms here.

First of all, let me cite some of the Zig Zen:

  • Communicate intent precisely.
  • Only one obvious way to do things.
  • Runtime crashes are better than bugs.
  • Compile errors are better than runtime crashes.
  • Reduce the amount one must remember.
  • Focus on code rather than style.

Now somewhat rephrasing the last point and combining it with what I perceived in this thread, I would dare to say:

  • Don’t make things too complicated.
  • Zig is a practical language: it’s gotta work in the end.

Now taking the last three points, the answer to “Which arguments for deinit” would be:

It depends. For small structs and slices: use pass-by-value. For big structs, or for values that will be mutated during deinitialization: use pass-by-reference. Furthermore, it may depend on semantics: If I have an (otherwise) constant handle that just gets “invalidated”, then make it a const and use pass-by-value. If it’s some sort of mutating structure, then make it a var and use pass-by-reference, even if this distinction technically doesn’t matter after deinitialization. Just follow your instincts.

And this is what Zig currently does, and what I assume is idiomatic Zig as of now.
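
To illustrate the two current idioms side by side (hypothetical types):

const std = @import("std");

// A small, handle-like value: stays const and is passed by value.
const Handle = struct {
    fd: std.posix.fd_t,
    pub fn close(self: Handle) void {
        std.posix.close(self.fd);
    }
};

// A mutating container: passed by mutable reference.
const List = struct {
    allocator: std.mem.Allocator,
    items: []u8,
    pub fn deinit(self: *List) void {
        self.allocator.free(self.items);
        self.* = undefined;
    }
};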

But then someone (not me, I just learned about it) came up with the idea of

self.* = undefined;

which demands making our struct a var and allowing the deinit function to take a mutable reference. And this perfectly aligns with all the following paradigms:


Communicate intent precisely.

Whether this is a no-op or not, to the reader of the source, the intent will be clear. Even the function’s interface makes the intent clear in some way: Don’t expect the argument to be as it was, after you call the function.

Admittedly, if self.* = undefined; were a no-op, then technically nothing mutates, but from both a semantic and a practical point of view, it does: the value is effectively undefined.

Only one obvious way to do things.

Life would be so easy if we always used mutable pointers when things get invalidated. No more thinking.

Runtime crashes are better than bugs.

Assuming #211 gets implemented some day (maybe even by a different compiler for the same language in 10, 20, 30 years from now), then this helps us catch bugs.

Compile errors are better than runtime crashes.

Assuming even smarter compilers, we might get a compile-time warning if we access a variable that has been passed to a function that invalidated the value.

Reduce the amount one must remember.

Again, life would be easy if we could follow this simple rule of always using deinit(self: *T).


Did I forget something? Maybe. As an Advocatus Diaboli, I would like to take a look at a few more counter-arguments to my idea:

  • It seems semantically wrong to obtain mutability in a deinit function when I don’t actually need to mutate the structure in order to invalidate or release it. Well, that is true, but it seems to be more of a semantic issue. Besides, if we consider the invalidation process to be some sort of mutation, we could argue that the value does undergo a mutation (which is even expressed in the language by setting self.* = undefined; in the end).

  • It is inconsistent with the current allocator interface and, as such, would demand deep changes in Zig. Yes, but I’d (personally) like to see that happen rather sooner than later (before Zig’s standard library is stable, as this could be a real pain).

I hope sharing my thoughts on these matters won’t result in getting bashed (I’ve seen that on other forums a lot). I understand that this is (likely) a highly controversial topic (maybe even a frustrating one, depending on where you’re coming from), but I also hope that there is enough openness here to at least discuss these issues from a technical and semantic (or even philosophical) point of view. And I also don’t expect Zig to change in the way I like. I know that different people have different ideas on what’s the best way to go. So thank you all for your attention and/or reasoned participation in this discussion.

And my apologies if this post was (once again) a bit long. However, I think it’s worth saying that (I believe) the nearby answers to this question may be misleading, and that this topic really deserves some deep thinking and considerations here.

P.S.: If I missed (or misunderstood) something, then please let me know. In particular, I’d be interested in knowing if there are good reasons against a different allocator interface, i.e. one that sets a slice or pointer to undefined after releasing memory.

P.P.S.: Maybe one reason against a different allocator interface could be that the invalidation is incomplete anyway: I may create a copy of a slice or pointer, or obtain a subslice (NOTE: I mean without copying the data it refers to), then deallocate, thus invalidating my original slice/pointer, and the (const) copies of that slice/pointer are effectively invalidated also, even though they are const and not set to undefined. So the whole self.* = undefined; mechanism is a half-baked thing anyway? :man_shrugging: Demonstration:

const std = @import("std");

pub fn odd_free(alloc: std.mem.Allocator, ptr_or_slice: anytype) void {
    alloc.free(ptr_or_slice.*);
    // The following marks the slice, not the memory pointed-to by the
    // slice, as undefined:
    ptr_or_slice.* = undefined;
}

pub fn main() !void {
    var da = std.heap.DebugAllocator(.{}).init;
    defer _ = da.deinit();
    const alloc = da.allocator();
    var my_slice: []u8 = try alloc.dupe(u8, "Hello World!");
    const my_subslice: []u8 = my_slice[0..5];
    std.debug.print("{s}\n", .{my_subslice});
    // Note the `&` here:
    odd_free(alloc, &my_slice);
    // Now `my_slice` is a `var` and set to `undefined`,
    // but `my_subslice` is not. That's odd, indeed.
}

So reading the arguments in #6322 and #9814 gives some more insight into this issue as well. Perhaps it really depends on the nature of the struct (or datum), and whether it’s considered to be a const handle or a var structure. Some structs could be seen as “handles” that are usually copied.

It’s still difficult for me to grasp the difference here. But I feel like it’s (at least in part) semantics that matter (even though switching from pass-by-value to pass-by-reference, or vice versa, may have an impact on performance). Which brings me back to:

  • It depends.

And that brings me back to believing that things aren’t actually as bad as I thought, and that the allocator interface is fine (even if some things could be made a little more consistent in the standard library, but that’s not really an urgent matter now).

Anyway, some (future) guidance on this topic would be helpful, as it’s really difficult for me (as a newbie) to pick the right way in each case. Perhaps thinking less helps, but I’m not sure. I feel like other people struggle with this issue as well.

And my apologies if I was too quick with my critique :folded_hands:. I get the feeling that the people who work on this are going the right way and have a good intuition. Keep up the good work. :+1: