Enable to reuse a slice

Context:

I would like to read a file line by line and store each line as an element of a slice named lines.

I manage to do this with the code below (main.zig), however there is a problem when I want to use the slice: each element contains the last line of the file (see output below), i.e. the string “Test line 3”.

In other words, the contents of the slice looks good inside the loop iterating other each line of the file (“Fisrt loop” in section 3 in the code) but not during further usage (“Second loop” in section 4).

main.zig
const std = @import("std");

pub fn main() !void {
    //#0. Create a junk file for test
    var file = try std.fs.cwd().createFile(
        "junk_file.txt",
        .{ .read = true },
    );
    defer file.close();

    var str_from_i: []const u8 = undefined;
    var buf: [1024]u8 = undefined;
    var bytes_written = @as(void, undefined);
    for (1..4) |i| {
        bytes_written = try file.writeAll("Test line ");
        str_from_i = try std.fmt.bufPrint(&buf, "{}", .{i});
        bytes_written = try file.writeAll(str_from_i);
        if (i < 3) {
            bytes_written = try file.writeAll("\n");
        }
    }

    //#1. Get file size (source: https://stackoverflow.com/a/72334950/18251410)
    file = try std.fs.cwd().openFile("junk_file.txt", .{ .mode = .read_only });
    const file_size = (try file.stat()).size;

    //#2. Generate a slice with an allocator to store the lines (source: https://www.reddit.com/r/Zig/comments/13pjtp3/comment/jl9w6ng/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();
    defer {
        _ = gpa.deinit();
    }
    const dynamic_size: usize = file_size;
    var lines = try allocator.alloc([]u8, dynamic_size);
    defer allocator.free(lines);

    //#3. Read the file line by line (source: https://stackoverflow.com/a/68879352/18251410)
    file = try std.fs.cwd().openFile("junk_file.txt", .{});

    var buf_reader = std.io.bufferedReader(file.reader());
    var in_stream = buf_reader.reader();
    var i: u8 = 0;
    while (try in_stream.readUntilDelimiterOrEof(&buf, '\n')) |line| : (i += 1) {
        lines[i] = line;
        std.log.debug("First loop: lines[{}] = `{s}'", .{ i, lines[i] });
    }

    //#4. Second print of lines slice
    const number_of_lines = i;
    for (0..number_of_lines) |index| {
        std.log.debug("Second loop: lines[{}] = `{s}'", .{ index, lines[index] });
    }
}
Output
$ zig run main.zig
debug: First loop: lines[0] = `Test line 1'
debug: First loop: lines[1] = `Test line 2'
debug: First loop: lines[2] = `Test line 3'
debug: Second loop: lines[0] = `Test line 3'
debug: Second loop: lines[1] = `Test line 3'
debug: Second loop: lines[2] = `Test line 3'

Questions:

  1. Why the output is not the same in both loops?
  2. Is there a way to obtain the expected behavior (i.e. same output in both loops)?

N.B. I’m a complete newbie regarding zig and other similar “low level” languages (only notions about pointers).

zig version: 0.14.0-dev.2293+6d781e095

1 Like

in_stream.readUntilDelimiterOrEof(&buf, '\n') places the bytes read from the input stream in the memory located at &buf. This never changes across iterations, so for every line you’re effectively appending the same pointer to the lines array.

Try using readUntilDelimiterOrEofAlloc() instead, which will allocate dedicated memory for each line read from the input stream, and then remember to free this memory before you’re done with the lines array.

Since you want to have file to be fully in memory splited by lines I suggest just reading entire file to begin with. And later you can create slices to full file content.

const file_content = try std.fs.cwd().readFileAlloc(alloc, "junk_file.txt", 1000*1000*1000);
defer alloc.free(file_content);

const lines = std.ArrayList([]const u8).init(alloc);
defer lines.deinit();

var tokenizer = std.mem.tokenizeScalar(u8, file_content, '\n'); // or splitScalar if you want to leave empty lines
while (tokenizer.next()) |line| {
    try lines.append(line);
}

for (lines.items, 0..) |line, index| {
    std.log.debug("Second loop: lines[{}] = `{s}'", .{ index, line });
}
4 Likes

`Thanks a lot for your help!

Unfortunately, I don’t know how to free the memory because I’ve just passed an mem.Allocator constant to the readUntilDelimiterOrEofAlloc() function…

I’ve tried this:

    var gpa2 = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator2 = gpa2.allocator();
    defer {
        _ = gpa2.deinit();
    }
    const max_size: usize = 1024;
    while (try in_stream.readUntilDelimiterOrEofAlloc(allocator2, '\n', max_size)) |line| : (i += 1) {
        lines[i] = line;
        std.log.debug("First loop: lines[{}] = `{s}'", .{ i, lines[i] });
    }
But I get this error:
$ zig run main.zig
debug: First loop: lines[0] = `Test line 1'
debug: First loop: lines[1] = `Test line 2'
debug: First loop: lines[2] = `Test line 3'
debug: Second loop: lines[0] = `Test line 1'
debug: Second loop: lines[1] = `Test line 2'
debug: Second loop: lines[2] = `Test line 3'
error(gpa): memory address 0x7fd537287000 leaked: 
/usr/local/bin/zig/lib/std/array_list.zig:475:67: 0x10d849e in ensureTotalCapacityPrecise (main)
                const new_memory = try self.allocator.alignedAlloc(T, alignment, new_capacity);
                                                                  ^
/usr/local/bin/zig/lib/std/array_list.zig:452:51: 0x10d1a80 in ensureTotalCapacity (main)
            return self.ensureTotalCapacityPrecise(better_capacity);
                                                  ^
/usr/local/bin/zig/lib/std/array_list.zig:486:44: 0x10b6a2b in ensureUnusedCapacity (main)
            return self.ensureTotalCapacity(try addOrOom(self.items.len, additional_count));
                                           ^
/usr/local/bin/zig/lib/std/array_list.zig:305:42: 0x10a1a49 in appendSlice (main)
            try self.ensureUnusedCapacity(items.len);
                                         ^
/usr/local/bin/zig/lib/std/array_list.zig:358:33: 0x108393b in appendWrite (main)
            try self.appendSlice(m);
                                ^
/usr/local/bin/zig/lib/std/io.zig:360:27: 0x1066ccd in typeErasedWriteFn (main)
            return writeFn(ptr.*, bytes);
                          ^

error(gpa): memory address 0x7fd537287020 leaked: 
/usr/local/bin/zig/lib/std/array_list.zig:475:67: 0x10d849e in ensureTotalCapacityPrecise (main)
                const new_memory = try self.allocator.alignedAlloc(T, alignment, new_capacity);
                                                                  ^
/usr/local/bin/zig/lib/std/array_list.zig:452:51: 0x10d1a80 in ensureTotalCapacity (main)
            return self.ensureTotalCapacityPrecise(better_capacity);
                                                  ^
/usr/local/bin/zig/lib/std/array_list.zig:486:44: 0x10b6a2b in ensureUnusedCapacity (main)
            return self.ensureTotalCapacity(try addOrOom(self.items.len, additional_count));
                                           ^
/usr/local/bin/zig/lib/std/array_list.zig:305:42: 0x10a1a49 in appendSlice (main)
            try self.ensureUnusedCapacity(items.len);
                                         ^
/usr/local/bin/zig/lib/std/array_list.zig:358:33: 0x108393b in appendWrite (main)
            try self.appendSlice(m);
                                ^
/usr/local/bin/zig/lib/std/io.zig:360:27: 0x1066ccd in typeErasedWriteFn (main)
            return writeFn(ptr.*, bytes);
                          ^

error(gpa): memory address 0x7fd537287040 leaked: 
/usr/local/bin/zig/lib/std/array_list.zig:475:67: 0x10d849e in ensureTotalCapacityPrecise (main)
                const new_memory = try self.allocator.alignedAlloc(T, alignment, new_capacity);
                                                                  ^
/usr/local/bin/zig/lib/std/array_list.zig:452:51: 0x10d1a80 in ensureTotalCapacity (main)
            return self.ensureTotalCapacityPrecise(better_capacity);
                                                  ^
/usr/local/bin/zig/lib/std/array_list.zig:486:44: 0x10b6a2b in ensureUnusedCapacity (main)
            return self.ensureTotalCapacity(try addOrOom(self.items.len, additional_count));
                                           ^
/usr/local/bin/zig/lib/std/array_list.zig:305:42: 0x10a1a49 in appendSlice (main)
            try self.ensureUnusedCapacity(items.len);
                                         ^
/usr/local/bin/zig/lib/std/array_list.zig:358:33: 0x108393b in appendWrite (main)
            try self.appendSlice(m);
                                ^
/usr/local/bin/zig/lib/std/io.zig:360:27: 0x1066ccd in typeErasedWriteFn (main)
            return writeFn(ptr.*, bytes);
                          ^


Full code:
const std = @import("std");

pub fn main() !void {
    //#0. Create a junk file (just for demonstration)
    var file = try std.fs.cwd().createFile(
        "junk_file.txt",
        .{ .read = true },
    );
    defer file.close();

    var str_from_i: []const u8 = undefined;
    var buf: [1024]u8 = undefined;
    var bytes_written = @as(void, undefined);
    for (1..4) |i| {
        bytes_written = try file.writeAll("Test line ");
        str_from_i = try std.fmt.bufPrint(&buf, "{}", .{i});
        bytes_written = try file.writeAll(str_from_i);
        if (i < 3) {
            bytes_written = try file.writeAll("\n");
        }
    }

    //#1. Get file size (source: https://stackoverflow.com/a/72334950/18251410)
    file = try std.fs.cwd().openFile("junk_file.txt", .{ .mode = .read_only });
    const file_size = (try file.stat()).size;

    //#2. Generate a slice with an allocator to store the lines (source: https://www.reddit.com/r/Zig/comments/13pjtp3/comment/jl9w6ng/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();
    defer {
        _ = gpa.deinit();
    }
    const dynamic_size: usize = file_size;
    var lines = try allocator.alloc([]u8, dynamic_size);
    defer allocator.free(lines);

    //#3. Read the file line by line (source: https://stackoverflow.com/a/68879352/18251410)
    file = try std.fs.cwd().openFile("junk_file.txt", .{});

    var buf_reader = std.io.bufferedReader(file.reader());
    var in_stream = buf_reader.reader();
    var i: u8 = 0;
    var gpa2 = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator2 = gpa2.allocator();
    defer {
        _ = gpa2.deinit();
    }
    const max_size: usize = 1024;
    while (try in_stream.readUntilDelimiterOrEofAlloc(allocator2, '\n', max_size)) |line| : (i += 1) {
        lines[i] = line;
        std.log.debug("First loop: lines[{}] = `{s}'", .{ i, lines[i] });
    }

    //#4. Second print of slice
    const number_of_lines = i;
    for (0..number_of_lines) |index| {
        std.log.debug("Second loop: lines[{}] = `{s}'", .{ index, lines[index] });
    }
}

Much more cleaner indeed! Thanks a lot!

Just a correction:

I’ve used var lines instead of a const lines because I got the following error at the line try lines.append(line);:

main.zig:37:18: error: expected type '*array_list.ArrayListAligned([]const u8,null)', found '*const array_list.ArrayListAligned([]const u8,null)'
        try lines.append(line);
            ~~~~~^~~~~~~
Full working code:
const std = @import("std");

pub fn main() !void {
    //#0. Create a junk file (just for demonstration)
    var file = try std.fs.cwd().createFile(
        "junk_file.txt",
        .{ .read = true },
    );
    defer file.close();

    var str_from_i: []const u8 = undefined;
    var buf: [1024]u8 = undefined;
    var bytes_written = @as(void, undefined);
    for (1..4) |i| {
        bytes_written = try file.writeAll("Test line ");
        str_from_i = try std.fmt.bufPrint(&buf, "{}", .{i});
        bytes_written = try file.writeAll(str_from_i);
        if (i < 3) {
            bytes_written = try file.writeAll("\n");
        }
    }

    var gpa2 = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator2 = gpa2.allocator();
    defer {
        _ = gpa2.deinit();
    }

    const file_content = try std.fs.cwd().readFileAlloc(allocator2, "junk_file.txt", 1000 * 1000 * 1000);
    defer allocator2.free(file_content);

    var lines = std.ArrayList([]const u8).init(allocator2); //note the modification from const to var
    defer lines.deinit();

    var tokenizer = std.mem.tokenizeScalar(u8, file_content, '\n'); // or splitScalar if you want to leave empty lines
    while (tokenizer.next()) |line| {
        try lines.append(line);
    }

    for (lines.items, 0..) |line, index| {
        std.log.debug("Second loop: lines[{}] = `{s}'", .{ index, line });
    }
}

If you want to do it that way, you also need to iterate over lines and allocator.free() each of them before you leave the function scope (in addition to the allocator.free(lines) for the entire array you already had).

For example, the defer statement could become:

    defer {
        for (lines) |line| allocator.free(line);
        allocator.free(lines);
    }

For bonus points you might want to keep track of how many lines have actually been allocated, in the event that readUntilDelimiterOrEofAlloc() fails halfway through.

(Or just read the whole thing in memory as @AndrewKraevskii suggests :laughing:)

Thanks a lot!

  1. I still have a memory leak with this new defer statement (see code below). But I am going to learn more about allocators and come back here :sweat_smile:!
Full Code
const std = @import("std");

pub fn main() !void {
    //#0. Create a junk file (just for demonstration)
    var file = try std.fs.cwd().createFile(
        "junk_file.txt",
        .{ .read = true },
    );
    defer file.close();

    var str_from_i: []const u8 = undefined;
    var buf: [1024]u8 = undefined;
    var bytes_written = @as(void, undefined);
    for (1..4) |i| {
        bytes_written = try file.writeAll("Test line ");
        str_from_i = try std.fmt.bufPrint(&buf, "{}", .{i});
        bytes_written = try file.writeAll(str_from_i);
        if (i < 3) {
            bytes_written = try file.writeAll("\n");
        }
    }

    //#1. Get file size (source: https://stackoverflow.com/a/72334950/18251410)
    file = try std.fs.cwd().openFile("junk_file.txt", .{ .mode = .read_only });
    const file_size = (try file.stat()).size;

    //#2. Generate a slice with an allocator to store the lines (source: https://www.reddit.com/r/Zig/comments/13pjtp3/comment/jl9w6ng/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();
    defer {
        _ = gpa.deinit();
    }
    const dynamic_size: usize = file_size;
    var lines = try allocator.alloc([]u8, dynamic_size);
    defer {
        for (lines) |line| allocator.free(line);
        allocator.free(lines);
    }

    //#3. Read the file line by line (source: https://stackoverflow.com/a/68879352/18251410)
    file = try std.fs.cwd().openFile("junk_file.txt", .{});

    var buf_reader = std.io.bufferedReader(file.reader());
    var in_stream = buf_reader.reader();
    var i: u8 = 0;
    var gpa2 = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator2 = gpa2.allocator();
    defer {
        _ = gpa2.deinit();
    }
    const max_size: usize = 1024;
    while (try in_stream.readUntilDelimiterOrEofAlloc(allocator2, '\n', max_size)) |line| : (i += 1) {
        lines[i] = line;
        std.log.debug("First loop: lines[{}] = `{s}'", .{ i, lines[i] });
    }

    //#4. Second print of slice
    const number_of_lines = i;
    for (0..number_of_lines) |index| {
    std.log.debug("Second loop: lines[{}] = `{s}'", .{ index, lines[index] });
    }
}

Error message
$ zig run main_jmc.zig                                                                                                                                  
debug: First loop: lines[0] = `Test line 1'
debug: First loop: lines[1] = `Test line 2'
debug: First loop: lines[2] = `Test line 3'
error(gpa): memory address 0x7fee983d8000 leaked: 
/usr/local/bin/zig/lib/std/array_list.zig:475:67: 0x10d7e9e in ensureTotalCapacityPrecise (main_jmc)
                const new_memory = try self.allocator.alignedAlloc(T, alignment, new_capacity);
                                                                  ^
/usr/local/bin/zig/lib/std/array_list.zig:452:51: 0x10d1480 in ensureTotalCapacity (main_jmc)
            return self.ensureTotalCapacityPrecise(better_capacity);
                                                  ^
/usr/local/bin/zig/lib/std/array_list.zig:486:44: 0x10b642b in ensureUnusedCapacity (main_jmc)
            return self.ensureTotalCapacity(try addOrOom(self.items.len, additional_count));
                                           ^
/usr/local/bin/zig/lib/std/array_list.zig:305:42: 0x10a15e9 in appendSlice (main_jmc)
            try self.ensureUnusedCapacity(items.len);
                                         ^
/usr/local/bin/zig/lib/std/array_list.zig:358:33: 0x10835cb in appendWrite (main_jmc)
            try self.appendSlice(m);
                                ^
/usr/local/bin/zig/lib/std/io.zig:360:27: 0x1066e8d in typeErasedWriteFn (main_jmc)
            return writeFn(ptr.*, bytes);
                          ^

error(gpa): memory address 0x7fee983d8020 leaked: 
/usr/local/bin/zig/lib/std/array_list.zig:475:67: 0x10d7e9e in ensureTotalCapacityPrecise (main_jmc)
                const new_memory = try self.allocator.alignedAlloc(T, alignment, new_capacity);
                                                                  ^
/usr/local/bin/zig/lib/std/array_list.zig:452:51: 0x10d1480 in ensureTotalCapacity (main_jmc)
            return self.ensureTotalCapacityPrecise(better_capacity);
                                                  ^
/usr/local/bin/zig/lib/std/array_list.zig:486:44: 0x10b642b in ensureUnusedCapacity (main_jmc)
            return self.ensureTotalCapacity(try addOrOom(self.items.len, additional_count));
                                           ^
/usr/local/bin/zig/lib/std/array_list.zig:305:42: 0x10a15e9 in appendSlice (main_jmc)
            try self.ensureUnusedCapacity(items.len);
                                         ^
/usr/local/bin/zig/lib/std/array_list.zig:358:33: 0x10835cb in appendWrite (main_jmc)
            try self.appendSlice(m);
                                ^
/usr/local/bin/zig/lib/std/io.zig:360:27: 0x1066e8d in typeErasedWriteFn (main_jmc)
            return writeFn(ptr.*, bytes);
                          ^

error(gpa): memory address 0x7fee983d8040 leaked: 
/usr/local/bin/zig/lib/std/array_list.zig:475:67: 0x10d7e9e in ensureTotalCapacityPrecise (main_jmc)
                const new_memory = try self.allocator.alignedAlloc(T, alignment, new_capacity);
                                                                  ^
/usr/local/bin/zig/lib/std/array_list.zig:452:51: 0x10d1480 in ensureTotalCapacity (main_jmc)
            return self.ensureTotalCapacityPrecise(better_capacity);
                                                  ^
/usr/local/bin/zig/lib/std/array_list.zig:486:44: 0x10b642b in ensureUnusedCapacity (main_jmc)
            return self.ensureTotalCapacity(try addOrOom(self.items.len, additional_count));
                                           ^
/usr/local/bin/zig/lib/std/array_list.zig:305:42: 0x10a15e9 in appendSlice (main_jmc)
            try self.ensureUnusedCapacity(items.len);
                                         ^
/usr/local/bin/zig/lib/std/array_list.zig:358:33: 0x10835cb in appendWrite (main_jmc)
            try self.appendSlice(m);
                                ^
/usr/local/bin/zig/lib/std/io.zig:360:27: 0x1066e8d in typeErasedWriteFn (main_jmc)
            return writeFn(ptr.*, bytes);
                          ^

thread 96063 panic: Invalid free
/usr/local/bin/zig/lib/std/heap/general_purpose_allocator.zig:663:21: 0x106536c in freeLarge (main_jmc)
                    @panic("Invalid free");
                    ^
/usr/local/bin/zig/lib/std/heap/general_purpose_allocator.zig:876:31: 0x10432cc in free (main_jmc)
                self.freeLarge(old_mem, log2_old_align, ret_addr);
                              ^
/usr/local/bin/zig/lib/std/mem/Allocator.zig:98:28: 0x103da63 in free__anon_2341 (main_jmc)
    return self.vtable.free(self.ptr, buf, log2_buf_align, ret_addr);
                           ^
/home/srigal/cryptshare/1_Scripts/SAAAAAAAAA/src/main_jmc.zig:36:42: 0x103c652 in main (main_jmc)
        for (lines) |line| allocator.free(line);
                                         ^
/usr/local/bin/zig/lib/std/start.zig:617:37: 0x103b8c2 in posixCallMainAndExit (main_jmc)
            const result = root.main() catch |err| {
                                    ^
/usr/local/bin/zig/lib/std/start.zig:248:5: 0x103b49f in _start (main_jmc)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)
Aborted

  1. In the mean time I have follow-up questions: When you wrote:

If you want to do it that way

do you see “better” approaches?

  1. For bonus points you might want to keep track of how many lines have actually been allocated, in the event that readUntilDelimiterOrEofAlloc() fails halfway through.

This assume that I know the number of lines in the file, right (to be sure I understand)?

There are problems with your new code, but first here you go with a patch of my changes over your code and the revised sources:

Patch
$ diff -Naur original.zig main.zig 
--- original.zig	2024-12-02 10:44:39.976921577 +0000
+++ main.zig	2024-12-02 10:44:13.243951405 +0000
@@ -8,55 +8,44 @@
     );
     defer file.close();
 
-    var str_from_i: []const u8 = undefined;
-    var buf: [1024]u8 = undefined;
-    var bytes_written = @as(void, undefined);
     for (1..4) |i| {
-        bytes_written = try file.writeAll("Test line ");
-        str_from_i = try std.fmt.bufPrint(&buf, "{}", .{i});
-        bytes_written = try file.writeAll(str_from_i);
+        var buf: [1024]u8 = undefined;
+        const str = try std.fmt.bufPrint(&buf, "Test line {d}", .{i});
+        try file.writeAll(str);
         if (i < 3) {
-            bytes_written = try file.writeAll("\n");
+            try file.writeAll("\n");
         }
     }
 
     //#1. Get file size (source: https://stackoverflow.com/a/72334950/18251410)
-    file = try std.fs.cwd().openFile("junk_file.txt", .{ .mode = .read_only });
-    const file_size = (try file.stat()).size;
+    const file_size = (try file.stat()).size; // <- incorrect
 
     //#2. Generate a slice with an allocator to store the lines (source: https://www.reddit.com/r/Zig/comments/13pjtp3/comment/jl9w6ng/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)
     var gpa = std.heap.GeneralPurposeAllocator(.{}){};
+    defer if (gpa.deinit() != .ok) std.debug.print("memory leak detected!", .{});
     const allocator = gpa.allocator();
+
+    var lines = try allocator.alloc([]u8, file_size);
+    var allocated_lines: usize = 0;
     defer {
-        _ = gpa.deinit();
-    }
-    const dynamic_size: usize = file_size;
-    var lines = try allocator.alloc([]u8, dynamic_size);
-    defer {
-        for (lines) |line| allocator.free(line);
+        for (0..allocated_lines) |i| allocator.free(lines[i]);
         allocator.free(lines);
     }
 
     //#3. Read the file line by line (source: https://stackoverflow.com/a/68879352/18251410)
-    file = try std.fs.cwd().openFile("junk_file.txt", .{});
+    try file.seekTo(0);
 
     var buf_reader = std.io.bufferedReader(file.reader());
-    var in_stream = buf_reader.reader();
+    const in_stream = buf_reader.reader();
     var i: u8 = 0;
-    var gpa2 = std.heap.GeneralPurposeAllocator(.{}){};
-    const allocator2 = gpa2.allocator();
-    defer {
-        _ = gpa2.deinit();
-    }
-    const max_size: usize = 1024;
-    while (try in_stream.readUntilDelimiterOrEofAlloc(allocator2, '\n', max_size)) |line| : (i += 1) {
+    while (try in_stream.readUntilDelimiterOrEofAlloc(allocator, '\n', 1024)) |line| : (i += 1) {
         lines[i] = line;
         std.log.debug("First loop: lines[{}] = `{s}'", .{ i, lines[i] });
     }
+    allocated_lines = i;
 
     //#4. Second print of slice
-    const number_of_lines = i;
-    for (0..number_of_lines) |index| {
+    for (0..allocated_lines) |index| {
         std.log.debug("Second loop: lines[{}] = `{s}'", .{ index, lines[index] });
     }
 }
Revised source code
const std = @import("std");

pub fn main() !void {
    //#0. Create a junk file (just for demonstration)
    var file = try std.fs.cwd().createFile(
        "junk_file.txt",
        .{ .read = true },
    );
    defer file.close();

    for (1..4) |i| {
        var buf: [1024]u8 = undefined;
        const str = try std.fmt.bufPrint(&buf, "Test line {d}", .{i});
        try file.writeAll(str);
        if (i < 3) {
            try file.writeAll("\n");
        }
    }

    //#1. Get file size (source: https://stackoverflow.com/a/72334950/18251410)
    const file_size = (try file.stat()).size; // <- incorrect

    //#2. Generate a slice with an allocator to store the lines (source: https://www.reddit.com/r/Zig/comments/13pjtp3/comment/jl9w6ng/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer if (gpa.deinit() != .ok) std.debug.print("memory leak detected!", .{});
    const allocator = gpa.allocator();

    var lines = try allocator.alloc([]u8, file_size);
    var allocated_lines: usize = 0;
    defer {
        for (0..allocated_lines) |i| allocator.free(lines[i]);
        allocator.free(lines);
    }

    //#3. Read the file line by line (source: https://stackoverflow.com/a/68879352/18251410)
    try file.seekTo(0);

    var buf_reader = std.io.bufferedReader(file.reader());
    const in_stream = buf_reader.reader();
    var i: u8 = 0;
    while (try in_stream.readUntilDelimiterOrEofAlloc(allocator, '\n', 1024)) |line| : (i += 1) {
        lines[i] = line;
        std.log.debug("First loop: lines[{}] = `{s}'", .{ i, lines[i] });
    }
    allocated_lines = i;

    //#4. Second print of slice
    for (0..allocated_lines) |index| {
        std.log.debug("Second loop: lines[{}] = `{s}'", .{ index, lines[index] });
    }
}
  1. You don’t need to reopen the file every time to operate on it (if you’re still holding a reference to the original std.fs.File object, which you do here). I removed that.
  2. I’m not sure what you’re doing with var bytes_written = @as(void, undefined); but that is a useless variable (a variable of type void cannot store any information). Just remove it.
  3. You do:
    const file_size = (try file.stat()).size;  // <- size in bytes
...
    const dynamic_size: usize = file_size;
    var lines = try allocator.alloc([]u8, dynamic_size);  // <- allocates that many pointers to `[]u8`

which is allocating as many []u8 entries as there are bytes in the file; you want one per line, so this is going to allocate way more elements than needed.
4. Due to the above point, many of those []u8s are going to be uninitialized, so the cleanup I suggested earlier doesn’t work there as you may be trying to free random memory locations:

    defer {
        for (lines) |line| allocator.free(line);
        allocator.free(lines);
    }
  1. Not sure why you initialize a second GPA.
  2. The const number_of_lines = i; does precisely what I was suggesting in my last post, i.e. keep track of how many lines were actually allocated. If you make this into a variable and declare it just before the defer statement which deallocates memory, then you can use that to keep track of where you need to stop calling allocator.free():
    var allocated_lines: usize = 0;
    defer {
        for (0..allocated_lines) |i| allocator.free(lines[i]);
        allocator.free(lines);
    }
    ...
    while (try in_stream.readUntilDelimiterOrEofAlloc(allocator, '\n', 1024)) |line| : (i += 1) {
      ...
    }
    allocated_lines = i;

    for (0..allocated_lines) |index| {
        ...
    }

Yeah, again, since you’re effectively reading the entire file anyway, @AndrewKraevskii’s approach of just reading the whole contents in memory and then simply splitting on newlines is the easiest and more robust (and likely faster too?).

Additionally, as I pointed out above, you cannot know ahead of time how much memory you need to reserve because you don’t know how many lines are going to be in the file. (OK, you do know in this toy example, but that’s not going to be true in the general case!)

In order to allocate the right amount of memory with base allocator.alloc() you may need to allocate room for some elements, keep track of the elements you’ve filled and the remaining capacity, and once capacity is exhausted reallocate memory with a larger capacity, etc… Basically, you’re going to be manually doing the work of std.ArrayList() yourself. See @AndrewKraevskii’s post which also uses std.ArrayList() to make this aspect simpler as well.

2 Likes

Thanks a lot for your complete response.
However, there is a problem in the rendering of your post, I’m not able to figure out which lines have been modified in the patch code (sorry I’ve tried).
Also, the Revised source code is hidden…
Again, thanks a lot for your patience!

Bah, sorry, I completely messed up the markup on the previous post. I updated it now.

The other piece of this puzzle is that you don’t need a variable to capture the return value of a function which returns void. A fn doTheThing(self: *@This()) void {...} can just be called:

something.doTheThing();

Also, if a function returns a value and you don’t need that value, it can be discarded:

_ = something.returnsSomeValue();

This is legal for a void return as well, but you don’t need it there.

Last but not least: when you do need a void value, @as(void, undefined) is not how you create it. It’s simply {}, that’s the spelling of ‘instance of void’ in Zig.

3 Likes

Thanks again for taking the time to explain all of this. It was very instructive for me!
Another thing about allocator behavior, if I understand correctly, the line var lines = try allocator.alloc([]u8, file_size); doesn’t really “use” memory unless I assign some content to the variable line. Like if it just “reserve” a bunch of memory adresses but without locking this memory when exiting the current block scope. Then, we just (must) have to free the memory that we effectively use and not all the memory we allocate, right? For me, “allocate” means “use” but I think I missunderstood the concept…

Think of it in the literal sense of “allocation”: you request some memory, and the allocator gives it to you for you to use as needed. That memory stays allocated until freed, regardless of whether it is effectively used or not.

When manual memory management is employed, as is the case for Zig, allocations/deallocations need to be paired (each allocation has to be followed by exactly one deallocation) for the program to be correct. Allocating without deallocating leads to memory leaks.

2 Likes

You already solve the issue but I still don’t get why the output is not the same in the two for loops in my original code :sweat_smile:

I understand that the memory located at &buf contains the whole text file but the capture into the var line concerns only one line right? And the print of var lines is also correct inside the first loop.

To me it looks like at the exit of the loop, the content of all elements of var lines is overridden by the content of the last line only…

If you still have some of your kind patience left, I would be curious to know more about it.

Thank you very much for taking the time to explain this!

It was very instructive for me and now it’s clearer in my mind, I feel silly to not have figured out myself that if the compiler expect a void I could just avoid the variable before the function call :smiling_face:

1 Like

That’s what it looks like, but the problem was (as I previously mentioned):

  • In each iteration, you populated the contents of buf, then appended &buf to lists
  • The memory location of buf never changed across iterations, so every time you were adding the same pointer inside lists (as []u8 is a slice: slices are basically just a (pointer; length) pair)
  • In the final loop you printed the same string three times, because all three elements of lists pointed to the same memory location and thus contained the same contents.
1 Like