Learning zig: stack and dangling pointers

tgirod · December 1, 2023, 5:01pm

Hello there,

I’m learning Zig, coming from higher-level languages (Go, mostly). I’m trying to get a clear understanding of stack allocations and how their lifetime is tied to their stack frame. I think I get the basic principle: if I declare a var inside a function, the value is allocated on the stack, attached to the frame of the current function. When the function exits, its stack frame is discarded, along with all the memory attached to it.

Now, here is an example:

const std = @import("std");

pub fn main() void {
    var val: u64 = 23;
    // val is allocated in main's stack frame

    std.debug.print("{} @ {}\n", .{ val, &val });
    // prints 23 @ u64@7ffc5047be10

    const b = Box.init(val);
    // passing val to init (by address or value, zig compiler is making the choice)

    std.debug.print("{} @ {}\n", .{ b.value.*, b.value });
    // prints 23 @ u64@7ffc5047bdf8
    // I thought here I would get u64@7ffc5047bde0, and some random value because of the dangling pointer, but apparently not?
}

const Box = struct {
    value: *const u64,

    fn init(v: u64) Box {
        std.debug.print("{} @ {}\n", .{ v, &v });
        // prints 23 @ u64@7ffc5047bde0 not the same address as val
        // v is allocated in init's stack frame?
        return Box{
            .value = &v, // storing the address of a stack allocated value, should result in a dangling ptr?
        };
    }
};

So obviously there is something I don’t understand … can anyone explain why b.value does not point to an invalid address? What am I missing here?

AndrewCodeDev · December 1, 2023, 5:20pm

In this line of code, you’re making a copy of the parameter v

fn init(v: u64) Box { ...

That memory is only reserved while the function call is still open. Once you leave the scope of the function, it can be repurposed. So by taking a pointer to it with .value = &v, you’re now referencing memory that will be repurposed.

So yes, that will be a dangling pointer. Here’s the thing though - the address may still be valid, but what lives at that address has no guarantees anymore.

More importantly though, you’re printing the address the pointer contains. You aren’t printing the value of what is being pointed to. To print the value of what the pointer is actually pointing to, you need to dereference it: b.value.* (the star operator dereferences the value).
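A minimal sketch of one way to avoid the dangling pointer, assuming the caller is able to own the storage: init takes a pointer instead of a copy, so the Box refers to memory in the caller's frame rather than in init's own frame. The pointer-taking signature is an illustration, not something from the original code.

```zig
const std = @import("std");

const Box = struct {
    value: *const u64,

    // Hypothetical variant of init: it receives a pointer to storage owned
    // by the caller, so nothing in init's own stack frame is referenced.
    fn init(v: *const u64) Box {
        return Box{ .value = v };
    }
};

pub fn main() void {
    var val: u64 = 22;
    val += 1; // mutated so that `var` is accepted by newer compilers

    const b = Box.init(&val);
    // b.value stays valid for as long as val is alive, i.e. until main returns.
    std.debug.assert(b.value == &val);
    std.debug.print("{d}\n", .{b.value.*});
}
```

The trade-off is that the Box must not outlive val; the lifetime is now the caller's responsibility.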

And welcome to the forum, @tgirod

AndrewCodeDev · December 1, 2023, 5:37pm

I’m going to add an addendum here because I think it may be helpful. Remember, a pointer is fundamentally just an integer: it keeps track of a numerical address. So let’s write some pseudocode…

val: int = 5; // has value 5 and assume it lives at address 12345

ptr: int = 12345; // has value 12345, assume it lives at 123XX

The variable ptr is an integer that just so happens to contain a number equal to the address of the variable val. No matter what happens to val, ptr will still hold that number until we change it. We can print that number just fine. The problem comes when we try to go get the memory at that address (the dereference operation). That memory may have been repurposed, and may now even lie outside the memory segment that our computer has assigned to our program (which gives us a segfault).
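The same idea in real Zig, as a rough sketch: @intFromPtr (the builtin's name in Zig 0.11 and later) turns a pointer into the plain integer it carries. Printing that integer is always harmless; only the dereference touches the memory behind it.

```zig
const std = @import("std");

pub fn main() void {
    var val: u64 = 4;
    val += 1; // mutated so that `var` is accepted by newer compilers

    const ptr: *u64 = &val;
    // The pointer viewed as a bare number.
    const addr: usize = @intFromPtr(ptr);

    // Printing the number never reads the memory it refers to.
    std.debug.print("address as integer: 0x{x}\n", .{addr});
    // The dereference does, which is safe here because val is still alive.
    std.debug.print("value behind it: {d}\n", .{ptr.*});
}
```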

dee0xeed · December 1, 2023, 5:42pm

relevant issue
it’s kinda closed as a duplicate of another one.

tgirod · December 1, 2023, 5:59pm

hey @AndrewCodeDev, thanks for the quick reply!

So, I got some of it right: v is discarded at the end of init, so &v is a dangling pointer. But some things are still mysterious.

Here is another example:

const std = @import("std");

const Box = struct {
    value: *const u64,

    fn one(v: u64) Box {
        std.debug.print("address of parameter v {}\n", .{&v});
        return .{
            .value = &v,
        };
    }

    fn two(v: u64) Box {
        var tmp = v;
        std.debug.print("address of var tmp {}\n", .{&tmp});
        return .{
            .value = &tmp,
        };
    }
};

pub fn main() void {
    const one = Box.one(23);
    std.debug.print("box one {}\n\n", .{one});
    const two = Box.two(23);
    std.debug.print("box two {}\n", .{two});
}

resulting in:

> zig run mem.zig
address of parameter v u64@7ffe80c80188
box one mem.Box{ .value = u64@7ffe80c80198 }

address of var tmp u64@7ffe80c80190
box two mem.Box{ .value = u64@7ffe80c80190 }

So here, when I’m using the address of the parameter directly inside of the Box, it gets changed. Whereas if I copy the value to tmp and use that address, it stays the same.

So I guess the compiler is doing something clever when using the address of a parameter in a function?

AndrewCodeDev · December 1, 2023, 7:06pm

I played around with the example on godbolt, and I’m starting to think that RLS (result location semantics) is coming into play here. What version of Zig are you on? I’ll play around with the example more later because that’s actually an interesting little difference you found there.

example.Box.initOne:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     rax, rdi
        mov     qword ptr [rbp - 16], rsi
        mov     qword ptr [rbp - 8], rsi
        lea     rcx, [rbp - 8]
        mov     qword ptr [rdi], rcx
        add     rsp, 16
        pop     rbp
        ret

And the second one…

example.Box.initTwo:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     rax, rdi
        mov     qword ptr [rbp - 16], rsi
        mov     qword ptr [rbp - 8], rsi
        lea     rcx, [rbp - 8]
        mov     qword ptr [rdi], rcx
        add     rsp, 16
        pop     rbp
        ret

They look identical on Zig trunk, so that may mean they don’t have the latest version on godbolt.

EDIT: It’s the latest version.

permutationlock · December 1, 2023, 7:14pm

I won’t claim to know exactly what is going on or why it does this, but if we do the following we get a different address printed each time:

pub fn one(v: u64) void {
    std.debug.print("{}\n", .{&v});
    std.debug.print("{}\n", .{&v});
    std.debug.print("{}\n", .{&v});
}

pub fn main() void {
    one(12);
}

u64@7ffd8bdf6700
u64@7ffd8bdf6710
u64@7ffd8bdf6720

An integer argument like this is generally going to be passed in a register, so to take its “address” the compiler actually needs to copy it to somewhere on the stack. It seems that each time an address is required, a new copy is being put on the stack.
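For contrast, a small sketch under the assumption that the function takes a pointer instead of a value: the address then belongs to the caller's variable, so there is no per-use copy and every call sees the same number. The helper name addressOf is made up for this illustration.

```zig
const std = @import("std");

// Hypothetical helper: returns the address it was handed, as an integer.
fn addressOf(p: *const u64) usize {
    return @intFromPtr(p);
}

pub fn main() void {
    var val: u64 = 11;
    val += 1; // mutated so that `var` is accepted by newer compilers

    const a = addressOf(&val);
    const b = addressOf(&val);

    // Both calls observe the address of the caller's variable.
    std.debug.assert(a == b);
    std.debug.assert(a == @intFromPtr(&val));
    std.debug.print("0x{x}\n0x{x}\n", .{ a, b });
}
```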

AndrewCodeDev · December 1, 2023, 7:24pm

That would certainly make sense in this case. It also explains why the two assembly code snippets are the same, because it would automatically promote it to the same address the OP is seeing. Just another reason not to take pointers to variables outside of their intended scope.

permutationlock · December 1, 2023, 7:29pm

Now this has me thinking: what is a situation where you should take the address of a parameter in Zig?

AndrewCodeDev · December 1, 2023, 7:30pm

Good question - I think maybe the topic for another thread?

tgirod · December 1, 2023, 7:55pm

Wow, thanks for the deep dive folks! It’s really another world when you get this close to the metal …

permutationlock · December 2, 2023, 7:36pm

It appears that the behavior you observed is a current compiler bug.

tgirod · December 2, 2023, 7:37pm

Good to know, and to be reminded that the paint is still fresh

fatihpense · December 4, 2023, 4:44pm

I’m also learning, and the dangling stack pointer issue bit me as well.

This doesn’t explain your issue, but it adds some relevant information.

One solution is to use an allocator to create the struct. I love how explicit it is in Zig.

const parsed = try allocator.create(std.json.Parsed(BookQuote));
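Applied to the Box from the original question, that could look roughly like the sketch below. It assumes std.heap.page_allocator for simplicity, and the allocator parameter and the deinit method are additions for illustration, not part of the original code.

```zig
const std = @import("std");

const Box = struct {
    value: *u64,

    // Copies v into heap memory, which outlives init's stack frame.
    fn init(allocator: std.mem.Allocator, v: u64) !Box {
        const slot = try allocator.create(u64);
        slot.* = v;
        return Box{ .value = slot };
    }

    // The memory is released only when the owner explicitly asks for it.
    fn deinit(self: Box, allocator: std.mem.Allocator) void {
        allocator.destroy(self.value);
    }
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const b = try Box.init(allocator, 23);
    defer b.deinit(allocator);

    // Safe: the pointee lives on the heap until deinit runs.
    std.debug.print("{d}\n", .{b.value.*});
}
```

The pointer stays valid after init returns because its lifetime is tied to the matching destroy call rather than to any stack frame.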

I had to extend the lifetime of objects to reuse them from Webassembly host language. Here is the video and source code for that: