How to fix a memory leak?

I’m working with a tagged union defined like this:

pub const Lex = union(enum) {
    EOF,
    // ... other variants ...
    SLASH,
    STRING: []const u8,
    NUMBER: []const u8,
    // ... other variants ...
};

I have a tokenization function that processes numbers like this:

'0'...'9' => {
    var num_builder = std.ArrayList(u8).init(allocator);
    defer num_builder.deinit();
    while (i < input.len and (std.ascii.isDigit(input[i]) or input[i] == '.')) : (i += 1) {
        try num_builder.append(input[i]);
    }
    const num_str = try num_builder.toOwnedSlice();
    try list.append(.{ .NUMBER = num_str });
    continue;
},

My concern is about memory management. I use an ArrayList to build the number string and then call toOwnedSlice() to get an owned slice, which I store in my Lex.NUMBER variant. However, I’m unsure how to properly clean up the memory returned by toOwnedSlice(). The defer num_builder.deinit() only cleans up the ArrayList’s internal buffer, not the slice returned by toOwnedSlice(), right?

When I run my application, I get a memory leak reported:

/usr/lib/zig/std/array_list.zig:483:41: 0x1061580 in addOne (main)
            try self.ensureTotalCapacity(newlen);
                                        ^
/usr/lib/zig/std/array_list.zig:262:49: 0x104f35e in append (main)
            const new_item_ptr = try self.addOne();
                                                ^
codecrafters/codecrafters-interpreter-zig/src/token.zig:187:43: 0x104e7c3 in tokenize (main)
                    try num_builder.append(input[i]);
                                          ^
codecrafters-interpreter-zig/src/main.zig:51:42: 0x10579b3 in main (main)
        const tokens = try token.tokenize(allocator, file_contents, &is_err);

How should I modify my code to prevent this leak?
Should I be managing the Lex.NUMBER slice’s memory differently?

You just need to define some kind of “deinit” for your token list that takes the allocator used to allocate them.

Parsers often use an arena for the tokens so that they are all freed in one block at once. Very often, tokens are not built, copied, and allocated at all; instead, slices are used that point back into the original buffer the source file was loaded from (which must, of course, live as long as you want the tokens to live).
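For example, here’s a minimal sketch of the second approach, reusing the names from your snippet (input, i, list), with no allocation at all:

'0'...'9' => {
    const start = i;
    while (i < input.len and (std.ascii.isDigit(input[i]) or input[i] == '.')) : (i += 1) {}
    // The token borrows from `input` instead of owning a copy, so there is
    // nothing to free, but `input` must outlive the token list.
    try list.append(.{ .NUMBER = input[start..i] });
    continue;
},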


I created a deinit for my Lex union:

    pub fn deinit(self: *const Lex, allocator: std.mem.Allocator) void {
        switch (self.*) {
            // Only the variants with heap-allocated payloads need freeing.
            .NUMBER, .STRING => |val| {
                allocator.free(val);
            },
            .IDENTIFIER => |val| {
                allocator.free(val.iden);
            },
            else => {},
        }
    }

and called it after using tokens:

        const tokens = try token.tokenize(allocator, file_contents, &is_err);
        defer {
            for (tokens.items) |tkn| {
                tkn.deinit(allocator);
            }
            tokens.deinit();
        }

Not very beautiful, but it works.

When you call toOwnedSlice:

const num_str = try num_builder.toOwnedSlice();

you will need to call allocator.free(num_str); to free the memory. The defer num_builder.deinit(); is no longer necessary (after toOwnedSlice() the list is empty, so deinit() is a harmless no-op).
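In isolation, the pairing looks like this (a minimal, self-contained sketch using std.testing.allocator, which reports leaks automatically):

const std = @import("std");

test "toOwnedSlice transfers ownership to the caller" {
    const allocator = std.testing.allocator;

    var builder = std.ArrayList(u8).init(allocator);
    defer builder.deinit(); // harmless no-op once toOwnedSlice() has emptied the list

    try builder.appendSlice("3.14");

    const owned = try builder.toOwnedSlice(); // the caller now owns this memory...
    defer allocator.free(owned); // ...and must free it, or the test reports a leak

    try std.testing.expectEqualStrings("3.14", owned);
}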


Sorry, yes, it looks like you’re already doing that in your deinit function.

That is generally how I’ve done it too.

There is a very interesting video from Andrew Kelley, “Programming without pointers”, that shows a different approach, though it’s way over my head to understand!


Do you know of any well-written parser in Zig that I could learn from, in particular the memory management approaches they usually use?

For staged allocation, like parsing, an arena is often a good choice. Usually there’s no reason to free tokens individually: they’re around for a while and then they’re gone. This is often true of other transient data as well.

It does mean that whatever is going to leave the parsing stage needs to be allocated separately, but that tradeoff is often a net positive.
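A minimal sketch of that staging, with a hypothetical makeToken standing in for whatever your tokenizer allocates per token:

const std = @import("std");

// Hypothetical stand-in for a tokenizer's per-token allocation.
fn makeToken(allocator: std.mem.Allocator, text: []const u8) ![]const u8 {
    return allocator.dupe(u8, text);
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    // Every token allocation goes into the arena; one deinit at the end of
    // the stage frees them all, so no per-token deinit is needed.
    var arena = std.heap.ArenaAllocator.init(gpa.allocator());
    defer arena.deinit();

    const num = try makeToken(arena.allocator(), "2.5");
    std.debug.print("token: {s}\n", .{num});
}

Anything that has to outlive the parse would be duped out of the arena into a longer-lived allocator before arena.deinit() runs.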


There’s lots of great knowledge in the std.json parser in the standard library.
