Pointers to Temporary Memory

Taking the address of a temporary variable leaves the pointer dangling once the temporary goes out of scope.

This subject is closely related to object lifetimes; see Documentation - The Zig Programming Language.

Example 1: Temporary variable within function scope

fn foo() *const usize {
    // reserve memory on the stack for a usize called bar
    var bar: usize = 42;

    // once we exit this function, bar is out of scope,
    // invalidating the address of the returned pointer.
    return &bar;
}

Example 2: Temporary value from function parameter

fn foo(bar: usize) *const usize {
    // note that bar is passed by value, not pointer. This means
    // that bar will exist only within the function scope.

    // once we exit this function, bar is out of scope,
    // invalidating the address of the returned pointer.
    return &bar;
}
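
For contrast with Examples 1 and 2, here is a minimal sketch of two ways to hand a value back without pointing into a dead stack frame. The names fillBar and createBar are hypothetical, and std.heap.page_allocator is only used to keep the sketch short.

```zig
const std = @import("std");

// Option A: the caller owns the storage and passes a pointer in.
// `out` points at memory that outlives this call.
fn fillBar(out: *usize) void {
    out.* = 42;
}

// Option B: allocate on the heap and return the pointer.
// The caller must release it with `allocator.destroy`.
fn createBar(allocator: std.mem.Allocator) !*usize {
    const bar = try allocator.create(usize);
    bar.* = 42;
    return bar;
}

pub fn main() !void {
    // storage lives in main's frame, which outlives fillBar
    var on_stack: usize = 0;
    fillBar(&on_stack);

    // storage lives on the heap until we destroy it
    const allocator = std.heap.page_allocator;
    const on_heap = try createBar(allocator);
    defer allocator.destroy(on_heap);

    std.debug.print("{d} {d}\n", .{ on_stack, on_heap.* });
}
```

In both cases the memory behind the pointer belongs to something that outlives the function that writes to it.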

Example 3: Invalid temporary from init function

// let's imagine that a user wants a type that contains its own
// memory arena. From there, they want to assign an allocator
// to its internal arena to be used with other data structures

const Foo = struct {

    const Self = @This();

    arena: std.heap.ArenaAllocator,
    arena_allocator: std.mem.Allocator,

    // note that "Self" here means we are returning a new value
    pub fn init(backing_allocator: std.mem.Allocator) Self {

        // creating temporary arena with backing allocator
        var tmp_arena = std.heap.ArenaAllocator.init(backing_allocator);

        // this example is incorrect: tmp_arena.allocator() stores a
        // pointer to tmp_arena, a local that dies when init returns.
        // The arena member variable is a copy at a different address,
        // so the stored allocator never refers to it.

        return Self {
            .arena = tmp_arena, 
            .arena_allocator = tmp_arena.allocator()
        };
    }
};

One may believe that the problem was the tmp_arena variable and try to solve the issue by assigning from one member variable to another. This is still incorrect.

const Foo = struct {

    const Self = @This();

    arena: std.heap.ArenaAllocator,
    arena_allocator: std.mem.Allocator,

    // note that "Self" here means we are returning a new value
    pub fn init(backing_allocator: std.mem.Allocator) Self {

        // In this example, the user is trying to connect the member
        // variables to themselves to avoid the initial temporary arena.

        var self = Self {
            .arena = std.heap.ArenaAllocator.init(backing_allocator),
            .arena_allocator = undefined,
        };

        // here the user tries to assign from another member variable
        self.arena_allocator = self.arena.allocator();

        // the self variable is still a temporary. The memory of self
        // will go out of scope after we exit this function, causing the
        // allocator's pointer to reference invalid memory.
        return self;
    }
};

Instead, here is one way to approach this that will leave everything in a valid state.

const Foo = struct {

    const Self = @This();

    // remove the arena_allocator member variable
    allocator: std.mem.Allocator,

    pub fn init(allocator: std.mem.Allocator) Self {
        // copy parameter's pointer
        return Self { .allocator = allocator };
    }
};

// later...

// create an arena in the scope where it will be used
var arena = std.heap.ArenaAllocator.init(backing_allocator);

// pass a pointer from the arena into Foo's init function.
var foo = Foo.init(arena.allocator());

Here’s another approach where the init function takes a pointer to an instance of the struct to be initialized. This is an idiom seen often in C.

const std = @import("std");

const Foo = struct {
    const Self = @This();

    arena: std.heap.ArenaAllocator = undefined,
    arena_allocator: std.mem.Allocator = undefined,

    // We pass in a pointer to a mutable Foo. It could be
    // on a stack frame higher up the call stack or on the
    // heap.
    pub fn init(
        backing_allocator: std.mem.Allocator,
        self: *Self,
    ) void {

        // No problems here given that everything is placed
        // in memory that outlives this function's scope.
        self.arena = std.heap.ArenaAllocator.init(backing_allocator);
        self.arena_allocator = self.arena.allocator();

        // You can quickly confirm a pointer's address by
        // using the `{*}` format specifier.
        std.debug.print("init: {*} {*} {*}\n", .{
            self, // already a pointer
            &self.arena,
            &self.arena_allocator,
        });
    }

    pub fn deinit(self: *Self) void {
        // Free any memory allocated by the arena.
        self.arena.deinit();
    }
};

pub fn main() !void {
    // Let's use our old friend the GPA.
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Here Foo is instantiated in main's stack.
    var foo = Foo{};
    defer foo.deinit();

    // Even if Foo is in main's temporary memory, this call
    // is OK because there's no way Foo can become invalid
    // before init returns. Foo "outlives" the call to
    // init, or in other words, its lifetime is longer.
    Foo.init(allocator, &foo);

    // Let's confirm the addresses are the same.
    std.debug.print("main: {*} {*} {*}\n", .{
        &foo,
        &foo.arena,
        &foo.arena_allocator,
    });
}

In a sample run, this produces the following output (the addresses will differ between runs and machines):

init: main.Foo@16b69f030 heap.arena_allocator.ArenaAllocator@16b69f030 mem.Allocator@16b69f050
main: main.Foo@16b69f030 heap.arena_allocator.ArenaAllocator@16b69f030 mem.Allocator@16b69f050

Example 4: Slicing a copy of an array on stack

Freely adapted from this topic.

const std = @import("std");
const log = std.debug.print;

const ToyStr = struct {

    const CAP: usize = 9;
    buf: [CAP]u8 = undefined,

    // note that self is passed by value
    fn sliceMeNice(self: ToyStr, from: usize, to: usize) []const u8 {
        log("inside: {s}\n", .{self.buf[from .. to]});
        return self.buf[from .. to];
    }
};

pub fn main() !void {
    var ts = ToyStr{};
    @memcpy(ts.buf[0..], "aaabbbccc");
    const s = ts.sliceMeNice(0,6);
    log("outside: {s}\n", .{s});
}

Against expectation, this program does not output “aaabbb” in the main() function:

$ ./toy-str-fg 
inside: aaabbb
outside: bb

We pass the instance of ToyStr by value, so sliceMeNice works on a copy.
It slices that copy, but once the function returns, the stack memory that held the copy can be reused.
The slice still points to the same place on the stack, but there is no longer a valid copy at that place.

To make the program work correctly, just pass an instance of ToyStr by reference:

fn sliceMeNice(self: *ToyStr, from: usize, to: usize) []const u8 {
                     ^
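
For reference, here is a sketch of the whole program with that one change applied. It uses *const ToyStr rather than *ToyStr, which is an assumption on my part: the function only reads from the buffer, so a const pointer is enough.

```zig
const std = @import("std");

const ToyStr = struct {
    const CAP: usize = 9;
    buf: [CAP]u8 = undefined,

    // self is now a pointer, so the returned slice refers to the
    // caller's buffer instead of a copy in this function's frame
    fn sliceMeNice(self: *const ToyStr, from: usize, to: usize) []const u8 {
        return self.buf[from..to];
    }
};

pub fn main() !void {
    var ts = ToyStr{};
    @memcpy(ts.buf[0..], "aaabbbccc");
    const s = ts.sliceMeNice(0, 6);
    std.debug.print("outside: {s}\n", .{s});
}
```

The slice is valid for as long as ts itself is, which here means until main returns.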

This document needs more examples and solutions/workarounds.

Another approach to the same problem would be to create a single instance of the Foo type in reserved memory (either on the heap or in a buffer), assign its fields through the pointer, and then return the pointer to that memory.

Something like…

pub fn init(allocator: std.mem.Allocator, args...) *Self {...

It would be best to use an example other than a foo.allocator here though because it could get confusing with two allocators in the same init function.
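
A rough sketch of that idea, assuming a made-up Foo whose view field is a slice into its own buf (so there is only one allocator in play). The create/destroy names and the field layout are hypothetical, not taken from any library.

```zig
const std = @import("std");

// Hypothetical type: `view` is a slice into the struct's own `buf`,
// so it is only valid while the struct stays at the same address.
const Foo = struct {
    const Self = @This();

    buf: [16]u8,
    view: []u8,

    pub fn create(allocator: std.mem.Allocator) !*Self {
        // the instance lives on the heap, not in this stack frame
        const self = try allocator.create(Self);
        self.buf = [_]u8{0} ** 16;
        // safe: self.buf does not move when this function returns
        self.view = self.buf[0..4];
        return self;
    }

    pub fn destroy(self: *Self, allocator: std.mem.Allocator) void {
        allocator.destroy(self);
    }
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const foo = try Foo.create(allocator);
    defer foo.destroy(allocator);

    // writing through the view changes the struct's own buffer
    foo.view[0] = 'z';
    std.debug.print("{c}\n", .{foo.buf[0]});
}
```

The trade-off is that the caller now has to remember to call destroy, and every instance costs a heap allocation.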

Please add your examples to the Doc by editing the first post if you can think of more.


I don’t know if this fits, but this is what std.json does.

pub fn parseFromTokenSource(
    comptime T: type,
    allocator: Allocator,
    scanner_or_reader: anytype,
    options: ParseOptions,
) ParseError(@TypeOf(scanner_or_reader.*))!Parsed(T) {
    var parsed = Parsed(T){
        // allocate memory to hold the ArenaAllocator
        .arena = try allocator.create(ArenaAllocator),
        .value = undefined,
    };
    errdefer allocator.destroy(parsed.arena);
    // initialize an ArenaAllocator in that memory
    parsed.arena.* = ArenaAllocator.init(allocator);
    errdefer parsed.arena.deinit();

    parsed.value = try parseFromTokenSourceLeaky(T, parsed.arena.allocator(), scanner_or_reader, options);

    // after return, `parsed.arena` is still valid since it points to memory obtained from the allocator, not to this function's stack
    return parsed;
}
// Generic Type
pub fn Parsed(comptime T: type) type {
    return struct {
        arena: *ArenaAllocator,
        value: T,

        pub fn deinit(self: @This()) void {
            // get the allocator that we passed to `parseFromTokenSource`
            const allocator = self.arena.child_allocator;
            // free all the memory allocated by the arena
            self.arena.deinit();
            // let the allocator that created `self.arena` free `self.arena`
            allocator.destroy(self.arena);
        }
    };
}

Honestly, I think that’s a great example! It’s a very clever set of maneuvers.

If you want to add comments to the lines to make it crystal clear what’s happening, like I have in the examples above, then I think we could find it a home :)

I think the angle you’d want to take is an example of initializing fields using an allocator that doesn’t have temporaries. Sort of like a “this is a more advanced example of initializing fields and here’s why it doesn’t make something invalid.”

While I’m typing this though, maybe this deserves a spot on a best practices page?

Edit - I just really like the example. I definitely think we can find a place for it eventually, if not here.


There is an (open) issue concerning these footguns.


This seems a bit related?: Language Reference: Lifetime-and-Ownership
Maybe we can add the link somewhere?


I think this is a good one (I mean “this does not work - why ?”).
Making a copy of an array on stack, slicing it and returning the slice.


Agreed, add it!

I wonder if there’s a foolproof method of detecting when a stack address is returned. Having such a check as part of runtime safety could be quite helpful for programmers more used to higher-level languages.


Done, added example #4.
