Odd Slice Behavior with Huge Local Function Variable

I’ve been writing some Advent code for fun in Zig, with the goal of doing it all with no heap usage. To support this, I wrote a useful struct that stores all the lines in a file with std.BoundedArray, but lets you access them via a slice of the lines ([]const []u8). Note that there isn’t a second const in the type, so modifying the bytes contained within this struct is considered fair game.

I think I’ve set everything up correctly so that there are no unintentional references to dangling memory, but I encountered some very odd behavior, which the code below demonstrates:

const std = @import("std");

pub fn BoundedArrayBackedLines(max_line_width: usize, max_num_lines: usize) type {
    return struct {
        _internal_buffer: OverallBuffer,
        _outer_slice_mem: [max_num_lines][]u8,

        const Self = @This();
        pub const BufferPerLine = std.BoundedArray(u8, max_line_width);
        pub const OverallBuffer = std.BoundedArray(BufferPerLine, max_num_lines);

        fn fromFile(fp: []const u8) !Self {
            const fh = try std.fs.cwd().openFile(fp, .{});
            defer fh.close();

            var ret: Self = undefined;
            ret._internal_buffer = OverallBuffer.init(0) catch unreachable;

            var line_idx: usize = 0;
            while (true) {
                var line_buffer = try std.BoundedArray(u8, max_line_width).init(0);
                fh.reader().streamUntilDelimiter(line_buffer.writer(), '\n', null) catch |err| switch (err) {
                    error.EndOfStream => break,
                    else => return err,
                };
                const item = try ret._internal_buffer.addOne();
                item.* = try BufferPerLine.init(0);
                try item.*.appendSlice(line_buffer.constSlice());
                ret._outer_slice_mem[line_idx] = item.*.slice();
                line_idx += 1;
            }

            return ret;
        }

        pub fn slice(self: *Self) []const []u8 {
            return self._outer_slice_mem[0..self._internal_buffer.len];
        }
    };
}

fn someFunction(slice: []const []u8) usize {
    std.debug.print("This is also fine:\n", .{});
    for (slice) |ln| {
        std.debug.print("{s}\n", .{ln});
    }

    // Huge variable chucked on the stack seems to nullify the bytes of the input slice!
    var big_var = std.BoundedArray(u8, 100000).init(0) catch unreachable;
    big_var.append(10) catch unreachable;
    big_var.append(12) catch unreachable;

    std.debug.print("This isn't (raw bytes):\n", .{});
    for (slice) |ln| {
        std.debug.print("{any}\n", .{ln});
    }

    return big_var.slice()[0] + big_var.slice()[1];
}

pub fn main() !void {
    var lines = try BoundedArrayBackedLines(100, 100).fromFile("test_file.txt");

    std.debug.print("This is fine:\n", .{});
    for (lines.slice()) |ln| {
        std.debug.print("{s}\n", .{ln});
    }

    std.debug.print("But this function works...? {d}\n", .{someFunction(lines.slice())});
}

The code more or less explains it, but my slice is getting “zeroed out” unintentionally when I throw a huge variable on the stack in a function that uses this slice. This makes me think I’m somehow referencing temporary memory, but I’ve double- and triple-checked and I can’t see where it would be happening… The output of this code for me (Zig version 0.13.0, x86 Linux) is as follows:

This is fine:
hello
from
test
file
This is also fine:
hello
from
test
file
This isn't (raw bytes):
{ 0, 0, 0, 0, 0 }
{ 0, 0, 0, 0 }
{ 0, 0, 0, 0 }
{ 0, 0, 0, 0 }
But this function works...? 22

The contents of test_file.txt is:

hello
from
test
file

Any help is greatly appreciated!

You are using BoundedArray. The contents of a BoundedArray are stored directly inside the struct. So if the containing struct is located on the stack, like here:

var ret: Self = undefined;
...
return ret;

then any pointers you take into the BoundedArray’s contents become invalid after returning.
And you do take such pointers and store them inside ret:

// item is stored in ret's BoundedArray → on the stack
const item = try ret._internal_buffer.addOne();
...
ret._outer_slice_mem[line_idx] = item.*.slice();
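
In miniature, the same lifetime bug looks like this (a sketch of my own for illustration, not code from your post):

const std = @import("std");

fn dangling() []const u8 {
    var arr = std.BoundedArray(u8, 8){};
    arr.appendSlice("hi") catch unreachable;
    // arr (and therefore its internal buffer) lives in this stack
    // frame; the returned slice points into that frame and is
    // invalid as soon as the function returns.
    return arr.slice();
}

The compiler accepts this without complaint, which is exactly why the bug is easy to miss.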

Now I see two ways to fix this:

  1. You could use an allocator and allocator.create(Self) to make sure that the result is stored at a fixed location on the heap (see the sketch after the snippet below).
  2. You could pass a pointer into the init function, pointing at where you want the object to live in the end. Note that this is a bit riskier, as it just moves the responsibility of not moving the thing in memory one layer up:
// main()
var lines: BoundedArrayBackedLines(100, 100) = undefined;
try lines.fromFile("test_file.txt");
...
fn fromFile(ret: *Self, fp: []const u8) !void {
    // body as before, but filling ret.* in place and returning void
}
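
For option 1, a minimal sketch of what the allocator route could look like (create here is a hypothetical helper of mine that reuses option 2’s pointer-taking fromFile):

pub fn create(allocator: std.mem.Allocator, fp: []const u8) !*Self {
    // Self gets a fixed heap address, so pointers into it stay
    // valid until the caller calls allocator.destroy().
    const self = try allocator.create(Self);
    errdefer allocator.destroy(self);
    try self.fromFile(fp); // fills the struct in place
    return self;
}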

Thank you, I just needed a second set of eyes. Indeed, the internal storage arrays of the BoundedArray get copied over successfully, but the pointer I stored with ret._outer_slice_mem[line_idx] = item.*.slice(); is invalid. Moving that pointer assignment into the slice() function like so ensures the pointer is correct:

        fn fromFile(fp: []const u8) !Self {
            const fh = try std.fs.cwd().openFile(fp, .{});
            defer fh.close();

            var ret: Self = undefined;
            ret._internal_buffer = OverallBuffer.init(0) catch unreachable;

            while (true) {
                var line_buffer = try std.BoundedArray(u8, max_line_width).init(0);
                fh.reader().streamUntilDelimiter(line_buffer.writer(), '\n', null) catch |err| switch (err) {
                    error.EndOfStream => break,
                    else => return err,
                };
                const item = try ret._internal_buffer.addOne();
                item.* = try BufferPerLine.init(0);
                try item.*.appendSlice(line_buffer.constSlice());
            }

            return ret;
        }

        pub fn slice(self: *Self) []const []u8 {
            // Rebuild the outer pointers from the struct's current
            // address on every call, so they can never go stale.
            for (self._internal_buffer.slice(), 0..) |*itm, idx| {
                self._outer_slice_mem[idx] = itm.*.slice();
            }
            return self._outer_slice_mem[0..self._internal_buffer.len];
        }

It’s slightly annoying that it has to “rebuild” the array every time, though; I’m going to have to noodle on that.
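
As an aside, one way to avoid the per-call rebuild might be to remember which address the pointers were built for, and only rebuild when the struct has moved. A sketch of my own (the _built_at field is my invention):

_built_at: ?*Self = null,

pub fn slice(self: *Self) []const []u8 {
    // Only rebuild when the struct has moved since the pointers
    // were last taken. Note: fromFile must set ret._built_at = null
    // explicitly, since it starts from `var ret: Self = undefined`.
    if (self._built_at != self) {
        for (self._internal_buffer.slice(), 0..) |*itm, idx| {
            self._outer_slice_mem[idx] = itm.*.slice();
        }
        self._built_at = self;
    }
    return self._outer_slice_mem[0..self._internal_buffer.len];
}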

Another option would be to not store the slice at all and instead use an iterator. This would mean that you could just return self._internal_buffer.slice()[idx] in the iterator.next() function without caching it anywhere.
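
A hedged sketch of what that could look like inside the struct (Iterator and iterator() are illustrative names, not existing API):

pub const Iterator = struct {
    parent: *Self,
    idx: usize = 0,

    pub fn next(it: *Iterator) ?[]u8 {
        if (it.idx >= it.parent._internal_buffer.len) return null;
        // The pointer is taken at the moment of use, so it always
        // reflects the struct's current location in memory.
        const itm = &it.parent._internal_buffer.slice()[it.idx];
        it.idx += 1;
        return itm.*.slice();
    }
};

pub fn iterator(self: *Self) Iterator {
    return .{ .parent = self };
}

Then while (it.next()) |ln| { ... } replaces the for loop, and no line pointers are ever cached.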

The standard solution for this is to use indices instead of pointers.
You don’t need to store the size of each slice: since the lines are stored one after the other, you just need the index where the next line starts.
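
A hedged sketch of that layout: a flat byte buffer plus line-start offsets (all names below are illustrative):

const std = @import("std");

pub fn IndexBackedLines(max_total_bytes: usize, max_num_lines: usize) type {
    return struct {
        bytes: std.BoundedArray(u8, max_total_bytes) = .{},
        // starts[i] is the offset where line i begins; one extra
        // sentinel entry marks one-past-the-end of the last line,
        // so no per-line length needs to be stored.
        starts: std.BoundedArray(usize, max_num_lines + 1) = .{},

        const Self = @This();

        pub fn append(self: *Self, ln: []const u8) !void {
            if (self.starts.len == 0) try self.starts.append(0);
            try self.bytes.appendSlice(ln);
            try self.starts.append(self.bytes.len);
        }

        // Offsets stay valid when Self is copied or moved,
        // unlike absolute pointers.
        pub fn line(self: *Self, idx: usize) []u8 {
            return self.bytes.slice()[self.starts.get(idx)..self.starts.get(idx + 1)];
        }
    };
}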

@LucasSantos91 and @IntegratedQuantum, maybe a better higher-level follow-up to both of your suggestions is: is there a “standard” way to create a 2D array accessible via [] operators without relying on the heap? The desired behavior I was going for here is being able to easily access elements via arr[row_idx][col_idx], which this struct accomplishes via arr.slice()[row_idx][col_idx]. Obviously a “normal” array type accomplishes this; however, I want something that is dynamically sized but with comptime-specified maximum bounds.

Using indices, would you still be able to return a slice of slices? Or would this change the method entirely, to something more like array_variable.get(row_index, col_index), where internally you work out how to index into your internal storage buffer?

You could do either one. Calculating the index into the internal storage can be seen as lazy slicing: if you take the lazy slices and eagerly accumulate the results into a buffer, you get a slice of slices.
The same can be said of the iterator that @IntegratedQuantum proposed: take the iterator and eagerly accumulate the results, and you get a slice of slices.
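
Building on the index-based sketch from earlier, both flavors might look like this (again illustrative names; line() is the per-line helper from that sketch):

pub fn count(self: *const Self) usize {
    // number of stored lines (starts holds one extra sentinel)
    return self.starts.len -| 1;
}

// Lazy 2D access: compute the element's position on demand
// instead of materializing a slice of slices.
pub fn get(self: *Self, row: usize, col: usize) u8 {
    return self.bytes.get(self.starts.get(row) + col);
}

// Eager accumulation: fill a caller-provided buffer with the
// lazy per-line slices to recover a slice of slices.
pub fn toSlices(self: *Self, out: [][]u8) []const []u8 {
    const n = self.count();
    for (out[0..n], 0..) |*o, i| o.* = self.line(i);
    return out[0..n];
}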
