Diving deep into anonymous struct literals

Anonymous struct literals are widely used throughout the Zig standard library, notably to implement the vtables of all allocators. The documentation, unfortunately, leaves a lot to be desired, so this thread carries over and expands a conversation that started in another thread that grew too long.

Some fundamental questions first:

  1. Where does &.{...} live: on the stack, on the heap, or in the data or text segment?
  2. How long does it live: local or static lifetime?
  3. If I instantiate the same anonymous struct literal twice, do I get two distinct objects with identical content (at two different addresses), or a single unique object (deduplicated) with both instances pointing to the same address?

The following example shows that the story is complex and, IMHO, goes against the Zig philosophy of “locality”: what you see on one screen is what you get. That is, you should not need to chase down where the values of all the fields in a struct literal come from in order to know where it is allocated. Unfortunately, as things stand now, one has to chase them carefully.

I am still learning Zig, have gaps in my understanding of how it works, and would like to hear from others on the subject.

//! example.zig
const std = @import("std");
const stdout = std.io.getStdOut().writer();

const S = struct {
    a: *const isize,
    b: *const isize,
};

// Memory for anonymous struct literals is allocated in relation to the memory of their fields.
// If all the fields are constants that live in the data or text segment, then the struct literal lives there
// too, and its multiple instantiations are deduplicated, so taking the address returns the same value.
// If at least one of the fields is a local value (a `var`), then the struct literal lives on the stack
// and every incarnation of the same struct literal has its own distinct address.

pub fn main() !void {
    const cn: isize = 42; // c1 == c2
    var vn: isize = 123;  // v1 != v2
    var c1 : *const S = &.{.a=&cn, .b=&cn}; _ = &c1;
    var c2 : *const S = &.{.a=&cn, .b=&cn}; _ = &c2;
    var v1 : *const S = &.{.a=&vn, .b=&vn}; _ = &v1;
    var v2 : *const S = &.{.a=&vn, .b=&vn}; _ = &v2;
    var cv1: *const S = &.{.a=&cn, .b=&vn}; _ = &cv1;
    var cv2: *const S = &.{.a=&cn, .b=&vn}; _ = &cv2;
    try stdout.print("c1 == c2 -> {}\n", .{c1==c2});
    try stdout.print("v1 == v2 -> {}\n", .{v1==v2});
    try stdout.print("cv1==cv2 -> {}\n", .{cv1==cv2});
    try stdout.print("&cn = {*}\n", .{&cn});
    try stdout.print("c1 = {*}\n", .{c1});
    try stdout.print("c2 = {*}\n", .{c2});
    try stdout.print("&vn = {*}\n", .{&vn});
    try stdout.print("v1 = {*}\n", .{v1});
    try stdout.print("v2 = {*}\n", .{v2});
    try stdout.print("cv1= {*}\n", .{cv1});
    try stdout.print("cv2= {*}\n", .{cv2});
}

Running it on macOS 13.6.3 with Zig 0.12.0-dev.1834+f36ac227b prints:

c1 == c2 -> true
v1 == v2 -> false
cv1==cv2 -> false
&cn = isize@1010b40f8
c1 = struct-lit-00.S@1010b40e8
c2 = struct-lit-00.S@1010b40e8
&vn = isize@16ee12998
v1 = struct-lit-00.S@16ee129b8
v2 = struct-lit-00.S@16ee129d0
cv1= struct-lit-00.S@16ee129e8
cv2= struct-lit-00.S@16ee12a00

I believe anonymous literals are a red herring here. What matters is whether the value is comptime or not. I believe

const stable_ptr = &comptime arbitrary-expr

either gets the desired guarantee here, or prints a clear error as to why the thing can’t be comptime.
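A minimal sketch of the `&comptime expr` idea, assuming a Zig 0.12/0.13-era compiler (`squares` and `stablePtr` are hypothetical names used only for illustration). The array is computed at compile time, so the returned pointer refers to an interned constant rather than to the callee's stack frame, and every call returns the same address.

```zig
const std = @import("std");

// Hypothetical helper: builds a small lookup table (evaluated at comptime below).
fn squares() [4]u32 {
    var out: [4]u32 = undefined;
    for (&out, 0..) |*o, i| o.* = @intCast(i * i);
    return out;
}

// `&comptime expr` forces expr to be evaluated at compile time; the result is an
// interned constant, so the pointer stays valid after this function returns.
fn stablePtr() *const [4]u32 {
    return &comptime squares();
}

pub fn main() void {
    const p1 = stablePtr();
    const p2 = stablePtr();
    std.debug.print("p1 == p2 -> {}\n", .{p1 == p2});
    std.debug.print("p1[3] = {}\n", .{p1[3]});
}
```

If `squares()` depended on a runtime value, the `comptime` would make that a compile error instead of quietly producing a stack temporary.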


I believe @matklad is right here. Notice the output of the program:

    const cn: isize = 42; // c1 == c2
    var vn: isize = 123; // v1 != v2
    @compileLog(&.{ .a = &cn, .b = &cn });
    @compileLog(&.{ .a = &cn, .b = &cn });
    @compileLog(&.{ .a = &vn, .b = &vn });
    @compileLog(&.{ .a = &vn, .b = &vn });
    @compileLog(&.{ .a = &cn, .b = &vn });
    @compileLog(&.{ .a = &cn, .b = &vn });

is

literals.zig:19:5: error: found compile log statement
    @compileLog(&.{ .a = &cn, .b = &cn });
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compile Log Output:
@as(*const struct{comptime a: *const isize = 42, comptime b: *const isize = 42}, .{.a = 42, .b = 42})
@as(*const struct{comptime a: *const isize = 42, comptime b: *const isize = 42}, .{.a = 42, .b = 42})
@as(*const struct{a: *isize, b: *isize}, [runtime value])
@as(*const struct{a: *isize, b: *isize}, [runtime value])
@as(*const struct{comptime a: *const isize = 42, b: *isize}, [runtime value])
@as(*const struct{comptime a: *const isize = 42, b: *isize}, [runtime value])

From what I know, comptime values are always interned. This is required for the correct semantics of things like generic functions.
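A small sketch of why this matters for generics (`Pair` is a hypothetical type constructor): calls with identical comptime arguments are memoized, so `Pair(u8)` evaluated twice yields the very same type, while a different argument yields a different type.

```zig
const std = @import("std");

// Hypothetical generic type constructor.
fn Pair(comptime T: type) type {
    return struct { a: T, b: T };
}

pub fn main() void {
    // Same comptime argument: the same (memoized) type.
    std.debug.print("Pair(u8) == Pair(u8)  -> {}\n", .{Pair(u8) == Pair(u8)});
    // Different comptime argument: a different type.
    std.debug.print("Pair(u8) == Pair(u16) -> {}\n", .{Pair(u8) == Pair(u16)});
}
```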


This I know. My problem with struct literals (anonymous or not) is different. To understand whether a given struct literal is interned or not, I have to follow all pointer fields to where they are defined. Being var or const makes all the difference. And when one changes a variable from var to const (or vice versa), it has a ripple effect on all struct literals referencing its address, and the compiler does not help. I would prefer to declare explicitly, with the comptime keyword, that I want a guarantee that the struct literal is interned. If it references an address on the stack and cannot be interned, that should be an error. Without the comptime keyword, a struct literal can be either interned or on the stack.

EDIT: I ❤️ Zig!!! I just checked, and Zig already implements my desired behavior. If I write comptime &.{...}, Zig guarantees comptime evaluation and interning; without comptime, it is context-dependent.
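A minimal sketch of that explicit form, mirroring the c1/c2 case from the first example. Assumptions: a 0.12-era compiler, `cn` moved to container level, and a named `S{...}` literal instead of `.{...}` so no coercion is involved. With `comptime` in front, both sites are evaluated at compile time; on the compiler versions discussed in this thread they resolve to the same interned object, just as c1/c2 did. If a field referenced a runtime `var`, the `comptime` would turn that into a compile error rather than silently placing the literal on the stack.

```zig
const std = @import("std");

const S = struct {
    a: *const isize,
    b: *const isize,
};

// Container-level constant: its address is comptime-known.
const cn: isize = 42;

pub fn main() void {
    // `comptime` demands compile-time evaluation. Replacing &cn with the
    // address of a local `var` here would be a compile error.
    var c1: *const S = comptime &S{ .a = &cn, .b = &cn };
    var c2: *const S = comptime &S{ .a = &cn, .b = &cn };
    _ = &c1;
    _ = &c2;
    // Compared at runtime, as in the original example.
    std.debug.print("c1 == c2 -> {}\n", .{c1 == c2});
    std.debug.print("c1.a.* = {}\n", .{c1.a.*});
}
```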


– The following posts were merged from the topic that originated this thread –


Literals are not stored in the same way that runtime values are stored. Check out the following example:

const std = @import("std");

pub fn main() !void {

    const x = "I am the first string.";
    const y = "I am a different string.";
    const z = "I am the first string.";

    std.debug.print(
        \\
        \\ address x: {*}
        \\ address y: {*}
        \\ address z: {*}
        \\
        , .{ 
            x.ptr,
            y.ptr,
            z.ptr,
        });
}

Here’s the output on my system:

address x: u8@10c6f76
address y: u8@10c6f8d
address z: u8@10c6f76

You can see that the first and third addresses are the same: they've been deduplicated by the compiler.

Likewise, the address of a comptime-known literal is the same everywhere it is used, which is what prevents vtables from being copied.

When you write an allocator function, you get a different literal for each function you instantiate. All instances of the same type (like ArenaAllocator) share the literal in that type's function. So the GPA's allocator function has a different literal than the ArenaAllocator's. See this example:

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const gpa_alloc1 = gpa.allocator();
    const gpa_alloc2 = gpa.allocator();

    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    const arena_alloc1_1 = arena.allocator();
    const arena_alloc1_2 = arena.allocator();

    // new arena allocator
    var arena2 = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    const arena_alloc2_1 = arena2.allocator();
    const arena_alloc2_2 = arena2.allocator();

    std.debug.print(
        \\
        \\
        \\gpa_alloc1 {*}
        \\gpa_alloc2 {*}
        \\
        \\arena_alloc1_1 {*}
        \\arena_alloc1_2 {*}
        \\
        \\arena_alloc2_1 {*}
        \\arena_alloc2_2 {*}
        \\
    , .{
        gpa_alloc1.vtable,
        gpa_alloc2.vtable,
        arena_alloc1_1.vtable,
        arena_alloc1_2.vtable,
        arena_alloc2_1.vtable,
        arena_alloc2_2.vtable,
    });
}

This prints:

gpa_alloc1 mem.Allocator.VTable@10d5288
gpa_alloc2 mem.Allocator.VTable@10d5288

arena_alloc1_1 mem.Allocator.VTable@10d52a0
arena_alloc1_2 mem.Allocator.VTable@10d52a0

arena_alloc2_1 mem.Allocator.VTable@10d52a0
arena_alloc2_2 mem.Allocator.VTable@10d52a0

The arena vtables are the same because every call to ArenaAllocator's allocator function references the same literal, even for different arenas. The GPA has its own allocator function, so its two vtables are identical to each other as well (but different from the arenas').
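A minimal sketch of the pattern, modeled on `std.mem.Allocator` but using a hypothetical `Shape` interface (`Shape`, `Square` and `Rect` are illustrative names, not std APIs). Each type's `shape()` function contains one `&.{...}` vtable literal whose fields are all comptime-known function pointers, so every instance of that type gets the same vtable address, while a different type gets a different one.

```zig
const std = @import("std");

// Hypothetical interface modeled on the std.mem.Allocator pattern (not a std API).
const Shape = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        area: *const fn (ctx: *anyopaque) u32,
    };

    fn area(self: Shape) u32 {
        return self.vtable.area(self.ptr);
    }
};

const Square = struct {
    side: u32,

    fn areaImpl(ctx: *anyopaque) u32 {
        const self: *Square = @ptrCast(@alignCast(ctx));
        return self.side * self.side;
    }

    fn shape(self: *Square) Shape {
        return .{
            .ptr = self,
            // Every field is a comptime-known function pointer, so this literal is
            // comptime-known and interned: all Squares share one VTable.
            .vtable = &.{ .area = &areaImpl },
        };
    }
};

const Rect = struct {
    w: u32,
    h: u32,

    fn areaImpl(ctx: *anyopaque) u32 {
        const self: *Rect = @ptrCast(@alignCast(ctx));
        return self.w * self.h;
    }

    fn shape(self: *Rect) Shape {
        // A different function, hence a different literal and a different VTable.
        return .{ .ptr = self, .vtable = &.{ .area = &areaImpl } };
    }
};

pub fn main() void {
    var a = Square{ .side = 2 };
    var b = Square{ .side = 3 };
    var r = Rect{ .w = 2, .h = 5 };
    const sa = a.shape();
    const sb = b.shape();
    const sr = r.shape();
    std.debug.print("squares share a vtable: {}\n", .{sa.vtable == sb.vtable});
    std.debug.print("square and rect share a vtable: {}\n", .{sa.vtable == sr.vtable});
    std.debug.print("areas: {} {} {}\n", .{ sa.area(), sb.area(), sr.area() });
}
```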

@slonik-az I agree that the documentation is poorly worded.


String literals are comptime-known and reside in the text segment. Their address is cast in stone; no surprises here. But the allocator function is not comptime: it is called at runtime. I can go in and reassign the gpa_alloc.vtable.free function pointer. Will this change affect only my allocator instance, or all instances of this type of allocator? What is the lifetime of the object the vtable pointer points to?

It changes it for every allocator of the same type, because they all refer to the same literal: instances of the same type call the same allocator function, which contains that one literal.

const std = @import("std");

// No-op stand-in for Allocator.VTable.free (0.12-era signature).
fn badFree(_: *anyopaque, _: []u8, _: u8, _: usize) void {
    return;
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};

    const allocator1 = gpa.allocator();

    const good_free = allocator1.vtable.free;

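    // WARNING: writing through @constCast to memory that is declared constant is
    // illegal behavior in Zig; it is done here only to demonstrate that the
    // vtable literal is shared between allocator instances.
    const vtable_ptr = @constCast(allocator1.vtable);

    vtable_ptr.free = &badFree;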

    const allocator2 = gpa.allocator();

    std.debug.print(
        \\ 
        \\ vtable address 1: {*}
        \\ vtable address 2: {*}
        \\
        \\ free address 1: {*}
        \\ free address 2: {*}
        \\
        \\ good free address: {*}
        \\ bad free address: {*}
        \\
        , .{  
            allocator1.vtable,
            allocator2.vtable,
            allocator1.vtable.free,
            allocator2.vtable.free,
            good_free,
            &badFree
        });
}

Here’s the output:

 vtable address 1: mem.Allocator.VTable@10d5388
 vtable address 2: mem.Allocator.VTable@10d5388

 free address 1: fn (*anyopaque, []u8, u8, usize) void@1025c50
 free address 2: fn (*anyopaque, []u8, u8, usize) void@1025c50

 good free address: fn (*anyopaque, []u8, u8, usize) void@1029410
 bad free address: fn (*anyopaque, []u8, u8, usize) void@1025c50

You can see that the new allocator also picked up the bad free. That literal behaves much like a var field in a struct.


Here is an example where the anonymous struct's address is different in every incarnation:

// file a.zig
const std = @import("std");
const stdout = std.io.getStdOut().writer();

const S = struct {
    a: *i32,
};

pub fn main() !void {
    var b: i32 = 42;
    var s1: *const S = &.{.a=&b}; _ = &s1;
    var s2: *const S = &.{.a=&b}; _ = &s2;
    try stdout.print("s1 = {*}\n", .{s1});
    try stdout.print("s2 = {*}\n", .{s2});
}

Running it with zig run a.zig prints:

s1 = a.S@16fb02a50
s2 = a.S@16fb02a60

Am I correct in thinking that the anonymous struct above is not a literal because it references a local var on the stack? Does an anonymous struct literal need to be comptime-known?


I tried an example on Godbolt, and it looks like we're getting __anon offsets for the following code:

const Data = struct {
   str: []const u8,
   num: i32,
};

fn get_struct() *const Data {
   const foo = Data{
      .str = "ok",
      .num = 42,
   };
   return &foo;
}

On Zig trunk, this compiles to the following instructions:

example.get_struct:
        push    rbp
        mov     rbp, rsp
        movabs  rax, offset __anon_3337
        pop     rbp
        ret

If we follow __anon_3337, we find our struct data:

__anon_3332:
        .asciz  "ok"

__anon_3337:
        .quad   __anon_3332
        .quad   2
        .long   42
        .zero   4

At __anon_3337 we have the address of the string literal “ok” (__anon_3332), the slice length (2), the i32 equal to 42, and 4 padding bytes. It looks like comptime constants and anonymous literals can be stored next to string literals in the data section.
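The layout in that listing can be cross-checked from Zig itself. A minimal sketch, assuming a 64-bit target: a slice is a (pointer, length) pair, which accounts for the two `.quad` entries, and the trailing `.zero 4` is padding that rounds the struct up to its 8-byte alignment. The field order shown in the listing is what the compiler chose there; for a non-`extern` struct, Zig does not guarantee field order.

```zig
const std = @import("std");

const Data = struct {
    str: []const u8,
    num: i32,
};

pub fn main() void {
    const slice_size = @sizeOf([]const u8); // pointer + length
    const padding = @sizeOf(Data) - slice_size - @sizeOf(i32);
    std.debug.print("@sizeOf([]const u8) = {}\n", .{slice_size});
    std.debug.print("@sizeOf(Data) = {}\n", .{@sizeOf(Data)});
    std.debug.print("padding bytes = {}\n", .{padding});
}
```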

In your example, you actually have two anonymous literals that are independent of each other: one referenced by s1, the other by s2. I'm not sure whether the intern pool deduplicates literals the same way it does strings, or whether that has specific requirements on the type's characteristics (such as “all const data”), since anything variable would break the deduplication process.

Let’s see if we can get closer by using something similar to what you’ve drafted:

const std = @import("std");

const S = struct {
    ptr: *usize,
};

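// NOTE: `ptr` is runtime-known, so this literal is a runtime temporary in
// anonPtr's stack frame; the returned pointer dangles once anonPtr returns.
pub fn anonPtr(ptr: *usize) *const S {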
    return &.{
        .ptr = ptr
    };
}

pub fn main() !void {
    var a: usize = 42;
    var b: usize = 43;
    var s1 = anonPtr(&a); _ = &s1;
    var s2 = anonPtr(&b); _ = &s2;
    std.debug.print(
        \\
        \\address of s1: {*}
        \\address of s2: {*}
        \\
        \\value of s1: {}
        \\value of s2: {}
        \\
        , .{
           s1, s2, s1.ptr.*, s2.ptr.* 
        });
}

This prints:

address of s1: main.S@7ffca2868508
address of s2: main.S@7ffca2868508

value of s1: 43
value of s2: 43

So we can see that both have the same address, and both now point to 43 (the last value assigned to the literal). The literal's values are neither constant nor comptime-known, yet we still get the same address.

We can only deduce here that it's the same literal; that's also what's happening in the vtable case.


It seems that one cannot consistently rely on anonymous struct literal deduplication. Whether you end up with one or several distinct instances depends on how the literal was initialized, as shown below.
Try commenting/uncommenting the const b: and var b: lines to see the impact on s1 and s2.

const std = @import("std");
const stdout = std.io.getStdOut().writer();

const S = struct {
    a: *const i32,
};

pub fn main() !void {
    // const b: i32 = 42; // s1 == s2
    var b: i32 = 42; // s1 != s2
    var s1: *const S = &.{.a=&b}; _ = &s1;
    var s2: *const S = &.{.a=&b}; _ = &s2;
    try stdout.print("s1 == s2 -> {}\n", .{s1==s2});
    try stdout.print("&b = {*}\n", .{&b});
    try stdout.print("s1 = {*}\n", .{s1});
    try stdout.print("s2 = {*}\n", .{s2});
}

Initializing from const b results in a single deduplicated struct literal sitting somewhere in the data segment (guessing from the addresses). Initializing from var b results in two struct literals on the stack (again guessing from the addresses).

With `var b: ...`
$ zig run tmp/struct-lit-00.zig
s1 == s2 -> false
&b = i32@16efbaa2c
s1 = struct-lit-00.S@16efbaa38
s2 = struct-lit-00.S@16efbaa48
With `const b: ...`
$ zig run tmp/struct-lit-00.zig
s1 == s2 -> true
&b = i32@100b740f0
s1 = struct-lit-00.S@100b740e8
s2 = struct-lit-00.S@100b740e8

I am using 0.12.0-dev.1834+f36ac227b on MacOS-13.6.3


I wouldn't expect the struct literal to be deduplicated here, since it's initialized with a pointer to a variable on the stack, which doesn't have a static (i.e., constant) address. I'd imagine any kind of literal optimization goes out the window once pointers to the stack are involved.

That said, the first time I saw these literals used in the std library, I was quite surprised. It looks like a bug unless you’re familiar with this pattern.

Yes, the reason for this behavior is quite clear. But it also means that when looking at &.{...}, one needs to scrutinize where the values of all the fields come from, whether they are constant or mutable, and where they live: on the stack, the heap, the data segment, etc. That kind of goes against the Zig philosophy.


I've been lurking for a while, but this topic got me thinking, and I'm curious: could this issue be solved at the tooling level? Say, for example, that zls analyzed anonymous struct literals, recursively tracked down the comptime-ness of their fields, and then told the text editor to highlight internable ones in one color and the others in another. If that could work, would it be a good idea?

If this were too slow, it could be an editor action that a programmer could request on particular struct literals as necessary.

This would solve the visibility problem for me as a coder, though by not fixing the issue at a syntactic level it might be a cause of future troubles…

@dhw9406 welcome to the party.

Can you define what you mean by

Which issue in particular are you referring to? Readability/reporting or something else?

Local reasoning about what my code is doing: given an anonymous struct literal on my screen, how do I know whether it's static or not without (manually) tracking down all of its field definitions?

I think your idea of adding something to zls to highlight them differently could be useful if zls can figure that out quickly enough.

In status quo, you can use @compileLog to print the value; if it depends on runtime values, it will print something like this:

pub fn main() !void {
    var n: i32 = 3;
    if (true) {
        n = 5;
    }

    @compileLog(n);
}

Output:

temp4.zig:7:5: error: found compile log statement
    @compileLog(n);
    ^~~~~~~~~~~~~~

Compile Log Output:
@as(i32, [runtime value])

So the [runtime value] tells you that it can’t be static, but it would be cool to get that info in an interactive way from the editor.
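Until editor support exists, a programmatic check is possible. A minimal sketch, assuming a 0.12-era compiler; `isComptimeKnown` is a hypothetical helper, not part of std. It relies on the behavior visible in the @compileLog output earlier in the thread: a comptime-known element of an anonymous literal becomes a `comptime` field of the literal's type. The helper must be `inline` so that comptime-known arguments stay comptime-known inside its body.

```zig
const std = @import("std");

/// Hypothetical helper (not in std): true if `val` is comptime-known at the call site.
/// Wraps `val` in a tuple literal and inspects whether its field became `comptime`.
inline fn isComptimeKnown(val: anytype) bool {
    return std.meta.fields(@TypeOf(.{val}))[0].is_comptime;
}

pub fn main() void {
    const cn: i32 = 42;
    var vn: i32 = 123;
    _ = &vn; // silence "local variable is never mutated"
    std.debug.print("cn:  {}\n", .{isComptimeKnown(cn)});
    std.debug.print("vn:  {}\n", .{isComptimeKnown(vn)});
    std.debug.print("&cn: {}\n", .{isComptimeKnown(&cn)});
    std.debug.print("&vn: {}\n", .{isComptimeKnown(&vn)});
}
```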

I’m wondering if this is the right behavior for a new C language.

In C, returning the address of a local struct is invalid, because the returned pointer is dangling, so I guess we can treat using it as UB?

In Zig, return &S{ ... } should also be invalid, but Zig improves on this: when all fields are comptime-known values, the pointer points to a fixed address in the data section.

Things get more interesting when some fields are runtime values: the returned pointers are still the same, but they point to the stack.

I tried the following C example:

#include <stdio.h>

typedef struct {
  int* a;
} S;


S* anonPtr(int* ptr) {
  S s = (S){ .a = ptr };
  return &s;
}

int main() {
  int a  = 5;
  int b  = 6;
  S* s1 = anonPtr(&a);
  S* s2 = anonPtr(&b);
  printf("%p--%p--%p\n", (void*)&a, (void*)s1, (void*)s2);
  printf("%d--%d\n", *(s1->a), *(s2->a));
  return 0;
}

Compiling it on my macOS (using cc) outputs:

anonymous-struct.c:10:11: warning: address of stack memory associated with local variable 's' returned [-Wreturn-stack-address]
  return &s;
          ^
1 warning generated.
0x16bc1e6c8--0x16bc1e670--0x16bc1e670
1807870208--1807870208

As we can see, s1 and s2 are still the same, but the values they point to are now garbage.

So Zig indeed behaves the same as C here, and it can prevent the garbage in some cases. Hooray!
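For comparison, a minimal sketch of one way to sidestep the dangling pointer when a field is runtime-known: return the struct by value (`anonVal` is a hypothetical name), so the caller receives its own copy instead of a pointer into a stack frame that no longer exists.

```zig
const std = @import("std");

const S = struct {
    ptr: *usize,
};

// Returning S by value copies it into the caller,
// so nothing points into anonVal's (already returned) stack frame.
fn anonVal(ptr: *usize) S {
    return .{ .ptr = ptr };
}

pub fn main() void {
    var a: usize = 42;
    var b: usize = 43;
    const s1 = anonVal(&a);
    const s2 = anonVal(&b);
    std.debug.print("value of s1: {}\nvalue of s2: {}\n", .{ s1.ptr.*, s2.ptr.* });
}
```

Each copy now keeps its own `ptr`, so s1 and s2 see 42 and 43 respectively.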