Convention for init / deinit

kudu · June 25, 2024, 11:59am

A few months ago I doscovered Zig and I like it. I started to learn, also by following the posts on Ziggit.

I want to iterate over the lines of a text in memory where the lines can end with LF, CR or CRLF, but not with LFCR. So, I wrote an iterator:

pub const LineIterator = struct {
    buffer: []const u8,
    index: usize = 0,

    const Self = @This();

    pub fn init(buffer: []const u8) Self {
        return .{ .buffer = buffer };
    }

    pub fn next(self: *Self) ?[]const u8 {
        if (self.index == self.buffer.len) return null;
        const slice = self.buffer[self.index..];
        const n = for (slice, 0..) |c, j| {
            if (c == '\n' or c == '\r') break j;
        } else {
            self.index += slice.len;
            return slice;
        };
        self.index += n + 1;
        if (slice[n] == '\r' and slice.len > n+1 and slice[n+1] == '\n')
            self.index += 1;
        return slice[0..n];
    }
};

What is the convention for init / deinit ?
Is it okay if my struct has an init but no deinit? Or should I rather replace init by a function outside of the struct ? (like e.g. std.mem.SplitIterator)

dimdin · June 25, 2024, 12:25pm

You can have init, deinit, both of them or none.

init initializes a struct, when its initialization is not trivial.
```
pub fn init(allocator: Allocator) Self
```
init might return Self or !Self when errors are possible.
deinit releases resources used by struct.
```
pub fn deinit(self: *Self) void
```
deinit can have extra arguments required to perform its job
```
pub fn deinit(self: *Self, allocator: Allocator) void
```

Yes, it is fine. deinit is used to release struct resources, if there is nothing to release there is no need for deinit.

It is a valid option, especially if the function outside of the struct have something that you need to pass to init.

dude_the_builder · June 25, 2024, 12:32pm

This is a very practical guideline. I usually don’t add an init if all you have to initialize is just one field (like in the Iterator):

var iter = Iterator{ .buffer = &array };

Although not a convention that I know of, I’ve also seen the initialization function named new when there’s no deinit.

var iter = Iterator.new(&array);

kudu · June 25, 2024, 12:44pm

Thank you both for the clarification (and for the videos @dude_the_builder)

mnemnion · June 25, 2024, 4:18pm

The question you want to ask is: what scenarios need to be supported?

If the LineIterator will always be used with a stack-allocated buffer, or if it’s always ok to create a buffer on the heap and then defer a free of the buffer, there’s no need for a separate deinit function.

But if you want LineIterator to work with a heap-allocated buffer, and you want to return LineIterators from a function, you’ll probably want both an init function which receives a number of bytes and an Allocator, and also a deinit which takes an Allocator and uses it to free the underlying buffer. It’s a bad habit to make calling sites keep track of what fields are pointing to heap memory and then free those fields manually. But again, if the buffer itself is always either an array on the stack, or exists as its own variable in scope which can be handler with defer free(buffer);, then there’s no compelling reason to have a deinit.

If the LineIterator might even end up somewhere where the original Allocator isn’t available, now you might start thinking in terms of having a LineIteratorUnmanaged, with a LineIterator which has a LineIteratorUnmanaged as a field, as well as a pointer to the Allocator which allocated the buffer. This doesn’t look to me like a data structure where that would be very helpful, but you might consider the middle ground with an init and deinit function, so that the LineIterator doesn’t have to be consumed in the same scope in which it’s created.

AndrewCodeDev · June 25, 2024, 11:03pm

I’m going to play bad-cop here a little bit and say that in general, iterators shouldn’t be in the business of allocating or managing memory. They should be focused on traversing through a data structure that handles memory itself that can spawn iterators to said memory.

mnemnion · June 25, 2024, 11:59pm

That’s a reasonable meta, sure. I meant it more generally than that, as when one should or shouldn’t have a deinit function. I agree that it should be rare for an iterator to survive longer than the while loop that consumes it. And for this application it doesn’t make sense for the iterator to be doing allocation in any case, since the data is already going to exist.

A follow-up observation: I wouldn’t call a []const u8 .buffer, which generally means something which is going to temporarily hold data in a mutable way. Someone reading your code like, say, me, might mistake the purpose of that field on this basis. .text or .lines will make it clearer what’s going on.

I’d say that most structs which have a .buffer are going to want a way to free that buffer’s memory. But you’re right, an iterator shouldn’t (generally) be passed around between functions, or be responsible for the memory it’s traversing.

kudu · June 26, 2024, 7:21am

Thanks for your helpful replies. I understand that the buffer must not be freed or go out of scope as long as the iterator is used, or or even any of the lines it returned.

I think this also applies to the std.mem.SplitIterator which I tried to use first:

const text = "a\nb\n";
var iter1 = std.mem.splitAny(u8, text: "\n\r");
var iter2 = std.mem.splitSequence(u8, text, "\r\n");

But I needed a combination of these two delimiter types.
The field is called .buffer bacause I used the code of SplitIterator as a “template”. But you @mnemnion are right, I will rename it to .text (for u8).

Thanks again for your help.

AndrewCodeDev · June 26, 2024, 7:45am

If you happen to need something more complex than this, I’ve got some regex iterators (both for splitting and matching) that you might like: GitHub - andrewCodeDev/Fluent: Fluent interface for REGEX, iteration, and algorithm chaining.

mnemnion · June 26, 2024, 4:15pm

This is understandable, because the std.mem functions are type generic, so they need a generic word for the contiguous memory they operate on.

It would be a useful convention, I think, if .buffer always meant []T (including specific T, such as u8, when that makes sense) and .region for []const T.

But I don’t think it’s all that important. If we zoom out a bit and look at the whole standard library, many chunks of memory are starting life as a mutable, then being passed as constant to various functions, so a case could be made for using a consistent name for both.

g41797 · July 2, 2025, 7:36am

Usually deinit returns void
Is it big NO-NO to return error if something wrong happened?

swenninger · July 2, 2025, 7:51am

The Zig Zen states

[...]
Resource allocation may fail; resource deallocation must succeed.
[...]

dimdin · July 2, 2025, 8:09am

It is a good practice to design your API so that deinit always succeed.

There are cases that deinit can fail and you cannot handle the failure there.
In this case you can return an error forcing the caller to handle the failure condition.
Usually deinit is called using defer to handle an error condition and cannot use return or try.
The options are:

to ignore the error defer _ = deinit();
to panic defer deinit() catch @panic("...");
to catch and handle somehow the error defer deinit() catch std.log.error("...", .{});

mnemnion · July 2, 2025, 4:50pm

Hard agree.

I don’t think this name should be used for a fallible function. Call it something else like finalize, or whatever makes sense (for a BufferedWriter this is flush, for example).

Maybe even stick to the fallible operations within that function, and do the infallible things afterward, in a deinit, so it can be deferred.

It’s not good when functions with conventional names do things which the name indicates they won’t. A deinit with any return type other than void is one of those, if it’s an error union, especially so.

There is at least one exception, the DebugAllocator returns an enum from deinit. A rule of thumb, rather than a law of nature.