Execution model for comptime

I found this function in std/enums.zig

pub inline fn valuesFromFields(comptime E: type, comptime fields: []const EnumField) []const E {
    comptime {
        var result: [fields.len]E = undefined;
        for (&result, fields) |*r, f| {
            r.* = @enumFromInt(f.value);
        }
        const final = result;
        return &final;
    }
}

To me it looks like we return a reference to a locally defined array. If we did this with integers in C, it would be undefined behavior. But here we do it with an array of E values at comptime, so I guess the execution model of comptime has to differ from that of runtime. Is this code really well defined? If so, what is the correct mental model of comptime execution?


Welcome to the forum!

You’re in luck! The comptime semantics changed a while back, and mlugg wrote a post about how they work now, which explains the final = result part: https://ziggit.dev/t/comptime-mutable-memory-changes/3702
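
A minimal illustration of the rule that post describes, assuming Zig 0.12 or later (squares is a hypothetical name). A pointer to a comptime var must not escape into runtime-visible memory, so the value is first copied into a const, which is exactly what final = result does in the std code:

```zig
const std = @import("std");

// Container-level initializers are always evaluated at comptime.
const squares = blk: {
    var buf: [5]u32 = undefined; // comptime-mutable memory
    for (&buf, 0..) |*slot, i| slot.* = @intCast(i * i);
    // `break :blk &buf;` should be rejected: a pointer to a comptime `var`
    // may not escape. Copying into a `const` freezes the value, and a
    // pointer to that frozen value is allowed.
    const frozen = buf;
    break :blk &frozen;
};

pub fn main() void {
    // `squares` is a `*const [5]u32` pointing at static memory.
    std.debug.print("{any}\n", .{squares.*});
}
```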

As for returning a pointer to a local from an inline fn in general: that happens to work with the current compiler implementation, but it is not legal.
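
For a function that is not inline, one legal way to hand out a slice like this is to point at a container-level constant instead of a local. A sketch, where tagsOf and Holder are hypothetical names: the constant is evaluated at comptime once per E and lives in static memory, so returning a pointer to it is fine.

```zig
const std = @import("std");

/// Hypothetical helper: returns every tag of `E` without returning a
/// pointer to a function-local value.
fn tagsOf(comptime E: type) []const E {
    const Holder = struct {
        // Container-level constant: computed at comptime, stored in static
        // memory, valid for the lifetime of the program.
        const values = blk: {
            const fields = @typeInfo(E).@"enum".fields;
            var result: [fields.len]E = undefined;
            for (&result, fields) |*r, f| r.* = @enumFromInt(f.value);
            break :blk result; // break with the value, not a pointer
        };
    };
    return &Holder.values;
}

pub fn main() void {
    const Color = enum { red, green, blue };
    // A plain runtime call: no `inline`, no `comptime` at the call site.
    const tags = tagsOf(Color);
    for (tags) |t| std.debug.print("{s}\n", .{@tagName(t)});
}
```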


Thanks! Will study the links.


Another thing that seems relevant here, given that the parameters are comptime: inline in Zig is semantic. The language reference states: “Unlike normal function calls, arguments at an inline function callsite which are compile-time known are treated as Compile Time Parameters. This can potentially propagate all the way to the return value.”
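
A small sketch of that propagation, using hypothetical addInline and addNormal functions with identical bodies. Only the inline call yields a comptime-known result when the arguments are comptime-known:

```zig
const std = @import("std");

inline fn addInline(a: i32, b: i32) i32 {
    return a + b;
}

fn addNormal(a: i32, b: i32) i32 {
    return a + b;
}

pub fn main() void {
    // Both arguments are comptime-known, so inside the inline call they act
    // like comptime parameters and the result is comptime-known as well.
    const x = addInline(1200, 34);
    comptime std.debug.assert(x == 1234);

    // A normal call in runtime code yields a runtime value, so
    // `comptime std.debug.assert(y == 1234);` would not compile here.
    const y = addNormal(1200, 34);
    std.debug.print("{d} {d}\n", .{ x, y });
}
```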

If we extract the code, but remove inline:

const std = @import("std");
pub fn valuesFromFields(comptime E: type, comptime fields: []const std.builtin.Type.EnumField) []const E {
    comptime {
        var result: [fields.len]E = undefined;
        for (&result, fields) |*r, f| {
            r.* = @enumFromInt(f.value);
        }
        const final = result;
        return &final;
    }
}
pub fn main() void {
    const E = enum { A, B, C };
    const x = valuesFromFields(E, @typeInfo(E).@"enum".fields);
    std.debug.print("{any}\n", .{x});
}

…we’ll get a compile error:

error: function called at runtime cannot return value at comptime
        return &final;

This can be fixed either by adding inline back or by putting comptime in front of the call to valuesFromFields.
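
The second fix applied to the program above, as a sketch: the function is unchanged and still not inline; only the call site gains comptime.

```zig
const std = @import("std");

// Same body as above, still *not* marked inline.
fn valuesFromFields(comptime E: type, comptime fields: []const std.builtin.Type.EnumField) []const E {
    comptime {
        var result: [fields.len]E = undefined;
        for (&result, fields) |*r, f| {
            r.* = @enumFromInt(f.value);
        }
        const final = result;
        return &final;
    }
}

pub fn main() void {
    const E = enum { A, B, C };
    // The whole call is evaluated at compile time, so the `return` inside the
    // comptime block is legal, and `final` is an immutable value the compiler
    // can place in static memory for runtime code to read.
    const x = comptime valuesFromFields(E, @typeInfo(E).@"enum".fields);
    std.debug.print("{any}\n", .{x});
}
```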


Just to verify my understanding: the code I referenced in the standard library currently works, but it is illegal according to mlugg's comment on the issue. So this is a bad implementation in the standard library, even though the function is semantically inlined?

So there is no magic garbage collection or object persistence at comptime? The only reason it worked in this example is that the function was semantically inlined, so no reference to a local variable is actually, semantically, returned?

A somewhat related question that has been confusing me is where anonymous structs are stored when you only return a reference to them. It must depend on usage, right? For example, if you call a function with the syntax foo(&.{"bar", "baz"}), I see multiple possibilities. Nothing about the anonymous struct is dynamic, so it could just be a reference to a global static tuple created at compile time. On the other hand, if it contained dynamic data, like foo(&.{"bar", my_local_int}), I imagine an anonymous struct has to be created on the stack and a reference to it passed to the function.
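
A sketch of the two calling cases, with a hypothetical totalLen function that takes a slice. My understanding (not something this thread confirms) is that an all-comptime-known literal becomes a constant in static memory, while a literal with a runtime element is materialized as a temporary, in practice on the caller's stack, that stays valid while the call runs. Passing such a pointer down is fine; returning it is what goes wrong.

```zig
const std = @import("std");

/// Hypothetical function: sums the lengths of the strings it is given.
fn totalLen(parts: []const []const u8) usize {
    var n: usize = 0;
    for (parts) |p| n += p.len;
    return n;
}

pub fn main() void {
    // Every element is comptime-known: the literal can be a static constant.
    const a = totalLen(&.{ "bar", "baz" });

    // One element is runtime-known: the literal becomes a temporary array,
    // which `totalLen` only reads during the call.
    var runtime_str: []const u8 = "qux";
    _ = &runtime_str; // keep it runtime-known (silences "never mutated")
    const b = totalLen(&.{ "bar", runtime_str });

    std.debug.print("{d} {d}\n", .{ a, b });
}
```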

However, what happens when you return a reference to a runtime anonymous struct? For example, in std.Io.Writer:

/// Writes to `buffer` and returns `error.WriteFailed` when it is full.
pub fn fixed(buffer: []u8) Writer {
    return .{
        .vtable = &.{
            .drain = fixedDrain,
            .flush = noopFlush,
            .rebase = failingRebase,
        },
        .buffer = buffer,
    };
}

I do not understand where the vtable anonymous struct is stored, since we only return a reference to it. Is it stored in the calling function's stack frame? How long is the returned reference valid? It has to be stored on the stack somewhere, right? No allocator is used, and static storage does not seem possible to me in this situation. Does that mean that if the calling function returns this reference, it becomes invalid because that stack frame is then overwritten?


I didn’t realize that function bodies coerce to function pointers; I’ve been writing &fixedDrain this whole time. It doesn’t seem to be documented. Anyway.

The VTable is a constant value where all fields are compile-time known. Those go into static memory. They’re also interned, so duplicates of the same value end up with just one copy. I don’t know if that’s a language semantic we can rely on, or just a fact about the compiler as it is now, or even if absolutely everything gets interned (strings do).

It’s equivalent to this:

const fixed_vtable: Writer.VTable = .{
    .drain = fixedDrain,
    .flush = noopFlush,
    .rebase = failingRebase,
};

pub fn fixed(buffer: []u8) Writer {
    return .{
        .vtable = &fixed_vtable,
        .buffer = buffer,
    };
}

We do the same thing with "constant strings" all the time, I can see why it’s less clear in a case like this however.
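
A minimal, self-contained version of the same pattern, with a hypothetical Shape interface standing in for Writer. It writes &squareArea explicitly where the std code relies on a function body coercing to a pointer.

```zig
const std = @import("std");

/// Hypothetical interface, loosely modeled on the Writer pattern above.
const Shape = struct {
    vtable: *const VTable,
    size: f64,

    const VTable = struct {
        area: *const fn (size: f64) f64,
        name: *const fn () []const u8,
    };

    fn area(s: Shape) f64 {
        return s.vtable.area(s.size);
    }
};

fn squareArea(size: f64) f64 {
    return size * size;
}

fn squareName() []const u8 {
    return "square";
}

fn square(size: f64) Shape {
    return .{
        // Every field is a function known at compile time, so this anonymous
        // VTable is a comptime-known constant in static memory, just as a
        // named `const` declaration would be.
        .vtable = &.{ .area = &squareArea, .name = &squareName },
        .size = size,
    };
}

pub fn main() void {
    const s = square(3.0);
    // The vtable pointer is still valid after `square` has returned.
    std.debug.print("{s}: {d}\n", .{ s.vtable.name(), s.area() });
}
```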


Okay, yes, in my example I now see that it can be known at comptime. But I played around with an example of my own, and it turns out the compiler allows returning references to anonymous structs constructed locally at runtime.

const std = @import("std");

const MyThing = struct {
    num: i32,
    foo: []const u8,
};

fn bad(i: i32) *const MyThing {
    return &.{ .num = i, .foo = "hello" };
}

fn fine(i: i32) *const MyThing {
    _ = i;
    return &.{ .num = 7, .foo = "hello" };
}

pub fn main() void {
    const bar = bad(7);
    // const bar = fine(7);
    std.debug.print("hello {}\n", .{bar});
}

If you call fine(), the value is comptime-known, and the program compiles and runs as expected. However, if you call bad, we get a runtime crash in Debug mode:

❯ zig run  reference_to_anonymous_struct.zig
hello .{ .num = -1431655766, .foo = { General protection exception (no address available)
/nix/store/iyv5lnq9cwlsixg357zv8zcp4pv2q7ml-zig-0.15.2/lib/zig/std/Io/Writer.zig:1358:29: 0x1142b01 in printValue__anon_23104 (std.zig)
                for (value, 0..) |elem, i| {
                            ^
/nix/store/iyv5lnq9cwlsixg357zv8zcp4pv2q7ml-zig-0.15.2/lib/zig/std/Io/Writer.zig:1333:33: 0x1142411 in printValue__anon_23069 (std.zig)
                try w.printValue(ANY, options, @field(value, f.name), max_depth - 1);
                                ^
/nix/store/iyv5lnq9cwlsixg357zv8zcp4pv2q7ml-zig-0.15.2/lib/zig/std/Io/Writer.zig:1300:34: 0x1141d33 in printValue__anon_22884 (std.zig)
                return printValue(w, ANY, options, value, max_depth);
                                 ^
/nix/store/iyv5lnq9cwlsixg357zv8zcp4pv2q7ml-zig-0.15.2/lib/zig/std/Io/Writer.zig:1340:71: 0x11408c5 in printValue__anon_22823 (std.zig)
                .@"enum", .@"union", .@"struct" => return w.printValue(fmt, options, value.*, max_depth),
                                                                      ^
/nix/store/iyv5lnq9cwlsixg357zv8zcp4pv2q7ml-zig-0.15.2/lib/zig/std/Io/Writer.zig:700:25: 0x1140064 in print__anon_22781 (std.zig)
        try w.printValue(
                        ^
/nix/store/iyv5lnq9cwlsixg357zv8zcp4pv2q7ml-zig-0.15.2/lib/zig/std/debug.zig:231:23: 0x113fbaf in print__anon_22682 (std.zig)
    nosuspend bw.print(fmt, args) catch return;
                      ^
/home/oskar/Coding/_molijoxer/zig_buggy_stuff/reference_to_anonymous_struct.zig:20:20: 0x113e884 in main (reference_to_anonymous_struct.zig)
    std.debug.print("hello {}\n", .{bar});
                   ^
/nix/store/iyv5lnq9cwlsixg357zv8zcp4pv2q7ml-zig-0.15.2/lib/zig/std/start.zig:618:22: 0x113dabd in posixCallMainAndExit (std.zig)
            root.main();
                     ^
/nix/store/iyv5lnq9cwlsixg357zv8zcp4pv2q7ml-zig-0.15.2/lib/zig/std/start.zig:232:5: 0x113d351 in _start (std.zig)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)
fish: Job 1, 'zig run  reference_to_anonymous…' terminated by signal SIGABRT (Abort)

Quite a nasty footgun: such a small change makes the difference between a well-defined program and illegal behavior that goes undetected at runtime. So I guess the bad function gets compiled to the equivalent of

fn bad2(i: i32) *const MyThing {
    const tmp: MyThing = .{ .num = i, .foo = "hello" };
    return &tmp;
}

which is an obvious return-a-reference-to-a-local-variable bug. I saw that there is an open issue that aims to detect these at compile time in the future, and I guess it would also catch references to anonymous structs like this one.

I was imagining that the compiler could do some fancy reasoning about the result location semantics of the anonymous reference and place the struct in the caller's stack frame instead of the callee's. But that does not appear to be the case.
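
If the goal is for the bytes to end up in the caller's frame, Zig lets you say so explicitly. A sketch of two legal rewrites of bad, reusing MyThing from above; byValue and initInto are hypothetical names.

```zig
const std = @import("std");

const MyThing = struct {
    num: i32,
    foo: []const u8,
};

// Option 1: return by value. The bytes land in the caller's result location.
fn byValue(i: i32) MyThing {
    return .{ .num = i, .foo = "hello" };
}

// Option 2: the caller owns the storage and passes a pointer to it.
fn initInto(out: *MyThing, i: i32) void {
    out.* = .{ .num = i, .foo = "hello" };
}

pub fn main() void {
    const a = byValue(7);

    var b: MyThing = undefined; // lives in main's frame
    initInto(&b, 7);

    std.debug.print("{d} {s} / {d} {s}\n", .{ a.num, a.foo, b.num, b.foo });
}
```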

Seems clever, but would that be a good thing? Wouldn’t it be better to discourage the temptation to return a local reference? Should that be seen as a superpower or as a breach of reasonable expectations? I’d expect compiler efforts to detect this and emit an error to be more to the benefit of mankind. The comptime exception seems reasonable, even if a little eyebrow-raising, as your own experience shows.

Fun fact: on master, if you change const to var, the compiler catches that it’s returning a reference to a local variable and errors out.


The intention is to turn that into “checked illegal behavior”. It’s illegal already, the compiler is not able to catch it yet. It’s a tough problem, and it’s not possible to catch every possible case of a reference to stack memory escaping its valid scope. Not without lifetimes, and Zig is not going to have those.

But something simple like you’ve illustrated, that’s being actively worked on.

The change doesn’t look as small with more experience with the language. Functions are constant (static) and have a defined place in memory, so a VTable struct constructed directly from the names of functions will be as well. It’s possible to have runtime-known function pointers, but they wouldn’t be direct references to the names of those functions.

I think it’s better style to make v-tables into declarations and return them that way, not least because it makes it clearer that they’re static. That said, the concept of comptime ‘knowability’ permeates the language. It takes some experience to get a knack for, but it’s not actually mysterious, and it will profoundly affect your understanding of how Zig works when you do.
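
A sketch of why declarations help once the function choice is only known at runtime (VTable, inc, dec, and pick are hypothetical names). Building the vtable inline from a runtime choice would produce a temporary, while selecting between two named constants returns pointers into static memory.

```zig
const std = @import("std");

const VTable = struct {
    op: *const fn (x: i32) i32,
};

fn inc(x: i32) i32 {
    return x + 1;
}

fn dec(x: i32) i32 {
    return x - 1;
}

// V-tables as named declarations: clearly static, one copy each.
const inc_vtable: VTable = .{ .op = &inc };
const dec_vtable: VTable = .{ .op = &dec };

fn pick(up: bool) *const VTable {
    // `return &.{ .op = if (up) &inc else &dec };` would build a
    // runtime-known temporary (the choice depends on `up`), and the returned
    // pointer would dangle just like in `bad` above. Choosing between two
    // static declarations keeps the pointer valid.
    return if (up) &inc_vtable else &dec_vtable;
}

pub fn main() void {
    var up = true;
    _ = &up; // keep `up` runtime-known
    const vt = pick(up);
    std.debug.print("{d}\n", .{vt.op(41)});
}
```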

That would be breaking the type contract. If you say you’re returning a Foo, there will be room on the stack for a Foo; if you say you’re returning a *Foo, there will be room for one of those. Normally you want that, because *Foo-returning functions allocate as a general rule. Having the compiler complain when your *Foo points at an about-to-be-invalid Foo is something we all want, but just “making it work” would be magic, which Zig doesn’t do.

Just keep asking: where are the bytes? You’ll get the hang of it!