Should I bother with deinit()?

I recently attempted to write a project in Zig, but was thwarted by this issue in zig-sqlite (I didn’t try very hard).

I ended up writing it in Go (see here), which I think was better for my sanity, but I’d like to at least partially make a Zig version for Edutainment purposes.

As a GC boy, I am new to memory management, and my question relates to whether or not deinit() functions are code clutter when using an Arena Allocator.

The project is a simple, short-lived CLI program that at most has to:

  1. Scan some folders
  2. Decrypt a sqlite db file.
  3. Load the sqlite db.
  4. Perform a handful of basic CRUD operations
  5. Save the db to disk
  6. Re-encrypt
  7. Shutdown

As I was handling some JSON conversion for a field, I ended up writing a few tests which required that I de-allocate memory properly, and it occurred to me that putting deinit() methods on my structs might be a waste of time.

IIUC, when using an arena allocator, not needing to manage lifetimes of structs inside that arena is one of the benefits, and all the calls you make to free are largely just slowing you down (at least in this context).

The way I see it, I have three choices:

  1. Put deinit() methods on everything that needs it, even though they’ll never be called in production. This clutters the codebase with things that aren’t strictly needed, but might make writing tests easier.
  2. Add comments documenting that these structs have memory that they aren’t cleaning up, but it’s okay because we expect them to be used with an arena allocator.
  3. Do nothing and have streamlined code that arguably has a flaw in it, but isn’t relevent to the actual use case.

Which should I go with? Are there any I missed?

Cheers :smiling_face_with_sunglasses:

Here’s the relevant code, any/all code review is welcome!

/// These are rows in the table, each one represents a backup
/// of an on-disk .env file
pub const EnvFile = struct {
    path: []const u8,
    // Storing the full path and the full dir is
    // a waste of space, but its a WIP.
    dir: []const u8,
    remotes: JsonStringSlice,
    sha256: []const u8,
    contents: []const u8,

    pub fn deinit(self: EnvFile, alloc: std.mem.Allocator) void {
        self.remotes.deinit(alloc);
    }
};

/// A list of strings, JSON serialized for zig-sqlite
pub const JsonStringSlice = struct {
    data: [][]const u8,

    pub const BaseType = []const u8;

    // AI Generated
    pub fn bindField(self: JsonStringSlice, alloc: std.mem.Allocator) !BaseType {
        var string = std.ArrayList(u8).init(alloc);
        try std.json.stringify(self.data, .{}, string.writer());

        return string.toOwnedSlice();
    }

    pub fn readField(alloc: std.mem.Allocator, value: BaseType) !JsonStringSlice {
        const parsed = try std.json.parseFromSlice(
            [][]const u8,
            alloc,
            value,
            .{},
        );
        defer parsed.deinit();
   
        // AI Generated
        var owned_data = try alloc.alloc([]const u8, parsed.value.len);
        for (parsed.value, 0..) |str, i| {
            owned_data[i] = try alloc.dupe(u8, str);
        }

        return .{ .data = owned_data };
    }

    pub fn deinit(self: JsonStringSlice, alloc: std.mem.Allocator) void {
        for (self.data) |str| {
            alloc.free(str);
        }
        alloc.free(self.data);
    }
};

test "JsonStringSlice parses" {
    const gpa = std.testing.allocator;

    const input =
        \\["foo", "bar"]
    ;

    const slice = try JsonStringSlice.readField(gpa, input);
    defer slice.deinit(gpa);

    try std.testing.expectEqualStrings(slice.data[0], "foo");
    try std.testing.expectEqualStrings(slice.data[1], "bar");
}

test "EnvFile doesn't leak" {
    const gpa = std.testing.allocator;

    const input =
        \\["foo", "bar"]
    ;

    const slice = try JsonStringSlice.readField(gpa, input);

    const contents =
        \\APP_ENV=prod
    ;

    var hasher = std.crypto.hash.sha2.Sha256.init(.{});
    hasher.update(contents);
    const hash = hasher.finalResult();

    const f = EnvFile{
        .path = "/home/user/project/.env",
        .dir = "/home/user/project/",
        .remotes = slice,
        .sha256 = &hash,
        .contents = contents,
    };
    defer f.deinit(gpa);
}

const std = @import("std");
const sqlite = @import("sqlite");

As long as the functions are not part of an API you expect other people to use (with their allocator of choice) I think it’s perfectly fine to omit deinit functions (although most of the deinit functions I write are about 5 LOC so not too much clutter there).

What I often end up doing is calling the Allocator arg either gpa or arena to indicate what kind of allocator a function expects (I think I got that from a blog post @kristoff wrote but I can’t find it right now).

However a deinit function is obviously strictly necessary if you’re dealing with resources that aren’t part of your arena like file handles that need to be closed.

If it’s a one-shot program like yours seems to be just leaking every allocation is also a viable strategy, DebugAllocator won’t be of much use then though.

EDIT: found it

4 Likes

FWIW in one-shot cmdline tools I just do a single deinit on the toplevel allocator (and also only to avoid the leak warning from the DebugAllocator).

If your code doesn’t benefit from recycling memory, then fine-grained freeing isn’t worth it, since when your process is killed all the memory goes back to the OS anyway (unless you’re on an Amiga of course!).

For libraries it’s alltogether different though, those should behave properly.

Also, obligatory: Memory Leaks & The Ultimate Garbage Collection - Discussions - Retro Computing

6 Likes

Most people don’t make explicit tests for leaks, they write tests for correctness, and the testing allocator detects leaks.

For deinit, the main reason we like allocators in Zig is because you abstract your business logic away from memory allocation. Software changes all the time and having that flexibility of choosing your allocation scheme can be useful, it also increases reusability. For example, that json parsing code from the stdlib can very simply run on a microcontroller, something that doesn’t even have an OS.

I love this solution, it’s subtle, but makes intention really clear without having to write a lengthy comment about memory management. Thanks!

1 Like

If you’re just learning manual memory management, I think it’s best to learn the discipline of understanding ownership. Relying an an Arena so you can just “not think about it”, is shortchanging your learning process. You can look at an arena as an optimization. Write a correct program first, then optimize. Once advanced, then you can design for arenas or other specialized allocators. But get the fundamentals down first.
In a library, probably best to let the library user decide allocation policy and not force it on them (or the later you).

10 Likes

I strongly agree with this sentiment, I think it’s a very good practice to meticulously free memory and close handles even when not technically needed because the OS will clean everything up when the process terminates anyway.

Shoutout to std.process.cleanExit(), which does nothing in debug builds but calls exit(0) in release builds. If you tack this on at the end of your main function you can use all the defers you want to practice properly deinitializing resources, without any overhead from unnecessary cleanup in release builds. The Zig compiler itself uses cleanExit() (and a handful of similar functions for handling fatal errors).

10 Likes

Tbf though, the whole point of memory arenas is to not think about invidual object ownership and lifetimes, but instead grouping individual object lifetimes and ownerwhip into few buckets. E.g. all allocations in the arena are owned by the arena and have the same lifetime that ends when the Arena is reset or discarded.

E.g. when using manual memory management you explicitly shouldn’t think about or deal with ownership and lifetime of individual objects, but instead think about what objects have the same lifetime boundaries and then put them into the same arena.

Manual memory management isn’t about mentally tracking each individual object (e.g. plastering your code with many individual free calls) - compilers and garbage collectors are better at this sort of housekeeping, but instead about structuring your code in a way that this inividual tracking isn’t needed.

6 Likes

Very true, that’s manual memory management lesson two, My argument is about pedagogy for newbies.

But when first starting out, doing it manually for a while with just a gpa helps develop that understanding of lifetimes. I see many people telling new users just to throw everything in an arena, which doesn’t help develop a sense of what objects have a lifetime affinity and belong there, and which have shorter or longer lifetimes and don’t belong there.

An arena is a bit of high-carbohydrate fast food. It’s a little too easy. Too many short-lifetime things added to your longer lasting arena (because it’s the easy allocator you have at hand), can lead to inefficient memory usage. Irrelevant for a one-shot tool tool, vital for a long-running app. This is where we need good tooling to observe the use of memory over time.

5 Likes

Just clarification - I agree with probably best because it does not mean always

If allocation policy means allocator, It depends on library

It’s OK for zig std library or similar style where code is visible , but for library with own internal multithreading,
I’d like to force(ask) user to give “me” malloc compatable allocator
we already had long discussion

I agree with not doing arenas immediately - not just for learning, but because there are “smells” that can guide you for where to use them. One smell is looping to free things in deinit(), another is watching for when you have to do a bunch of individual if (obj) free(obj) types of things in your deinit to free allocated objects in your struct. Those cases when to use arenas are easier to see if you go through the pain of individual allocs and frees first.

Thank you everyone for the excellent suggestions and discussion. The conclusion I’ve come to is that were this production code, it would be reasonable to exclude the deinit logic, but since this is an educational venture, it’s worthwhile to include them.

To the various people that suggested ditching the arena entirely and doing it the hard way- is this a reasonable thing to try?

The zig-sqlite docs have this to say:

NOTE: when you do allocate in bindField or readField make sure to pass a std.heap.ArenaAllocator-based allocator.

The binding or reading code does not keep tracking of allocations made in custom types so it can’t free the allocated data itself; it’s therefore required to use an arena to prevent memory leaks

So the best non-arena solution I can think of (as the only structs in my program are Db, Config, and EnvFile), would be to pass an arena allocator to sqlite, copy all the structs into a different sort of allocator, and then handle that memory “properly”. That feels a bit too crazy to me. Is there a more sensible middle ground?

Thanks

Depends on what exactly it means, if all the allocated memory is accessible via the custom type, then you dont need an arena. If there is allocated memory that is not accessible via the custom type, then you do need an arena.