Init block to const-cast - does it copy?

While I agree that in general we shouldn’t use “Zig should work like Language X” as a criterion, I think it’s worth understanding why those languages do it the way they do.

Function calls are a real thing: they move stack pointers, have preludes and returns, and so on. A function call is a scope, but not a visible one: it isn’t inline with the rest of the code, and (optimization aside) it has an effect, one that is not compatible with extending the lifetime of function-local variables past the call. A block isn’t a function call.

Within a function scope, all the variables are visible: visible to the programmer, that is. So it makes sense to me (and to the authors of the aforementioned programming languages) that taking a reference out of one of those block scopes does not oblige the compiler to break your code’s intention. I simply don’t understand why we would want that.

This feels dirty. I know it works right now, but it really doesn’t seem right to assume pointers to variables in a block should be valid beyond the block itself. It’d be ideal if returning a value from a block would simply be copy-less whenever the optimizer thinks it should be.

Maybe it feels dirty due to an allegiance to curly braces, but I have to agree (strongly) with @mnemnion - it could be seen as a “pattern” to use blocks within functions like this, where one of the main ingredients is returning a reference to the result, for use in the function. Since it’s a clean way to avoid a big copy, a clean way to scope internals in the block, and a nice way to wind up with a const, it seems like a rather valuable idiom.

Though I guess I can’t immediately see a problem with this, either. However, the & really adds a bit of clarity: you’re declaring, “I want the reference to this thing” - it’s visually clear that a copy isn’t happening. The other way (the compiler deciding to return a reference by default, or when efficient), you have to just guess or test in order to know (… or post, like a schmo did to start this thread, since it only “seemed” likely, and the schmo was unsure that there was “no magic”). I.e., I prefer the visible “magic” of allowing a reference to an otherwise-block-scoped var to the invisible magic of “maybe” (or even “always”) returning a reference rather than a copy.
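For anyone skimming, here is a minimal sketch of the idiom being discussed. The names (sumViaInitBlock, table, t) are made up for illustration, and it leans on the status-quo behavior described in this thread: whether the pointer is guaranteed to stay valid after the block ends is exactly what is being debated, so treat it as a sketch and not as a statement of language rules.

```zig
const std = @import("std");

// Hypothetical helper for illustration; not code from this thread.
fn sumViaInitBlock() u32 {
    // `table` ends up as a const pointer to storage declared inside the block.
    const table: *const [4]u32 = init: {
        var t = [_]u32{ 1, 2, 3, 4 };
        t[1] = 5; // mutate freely while still inside the block
        break :init &t; // the & asks for the address, not a copy of the array
    };

    // Only read access is possible through `table` from here on.
    return table[0] + table[1] + table[2] + table[3];
}

pub fn main() void {
    std.debug.print("sum = {d}\n", .{sumViaInitBlock()});
}
```

Dropping the & (and changing the annotation to [4]u32) gives the by-value version, where any copy is left to the optimizer.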

If you can identify why it doesn’t feel right to you, that may help in understanding the tradeoff for the feature it provides.

For me, it doesn’t feel quite right because I’ve gotten used to the Rust rule that owned values are dropped, and any references to those values are invalidated, at the end of a block. That’s necessary for Rust’s lifetime-checked version of RAII but it doesn’t seem to apply to Zig. Zig defer/errdefer statements do run at the end of their enclosing block, but that has nothing to do with lifetimes since the compiler doesn’t manage lifetimes – the defer statement can be whatever you want. So I’m happy to let those feelings fade away.

I think my reasoning is similar - a block creates a scope, and a scope should meet consistent expectations, regardless of whether it’s from a function, control flow, or a block. That way, refactoring a block out into its own function, or inlining a function as a block, won’t come with nasty hidden consequences.

One of those expectations - exemplified by function scope - is that it’s UB to use a pointer to a variable defined within the scope once outside of it.

This feels like a natural extension of the more general idea that variables do not ‘exist’ and cannot be accessed outside of their scope.

I also think it’s a reasonable thing to make it UB due to the potential for some niche optimizations (reusing stack space within a function?)

Well, a sequel demands a little attention.

I moved my code from a harness, and out to global const space. I tweaked here and there to comptime everything. All good except this (break :init &c_ - well, the real code is different, but this is the critical equivalent). The compile error? “global variable contains reference to comptime var”. This makes sense, except for the fact that all that moves on into runtime is a const, not an actual variable. Here is the reference explanation I found - @mlugg posted back in 2024, with his compiler update at the time. His prescription for “solving” the problem is to “copy first”:

pub const my_name: []const u8 = name: {
    var buf: [5]u8 = undefined;
    @memcpy(&buf, "mlugg");
    const final = buf; // if you don't do this, then you'll get the error
    break :name &final;
};

In our case, of course, we can just omit the ‘&’, which forces a copy (as discussed earlier in this thread, and verified on godbolt).

BUT - here’s the interesting thing… since this is now comptime domain, that copy is NOT a runtime copy, so godbolt tells me that, in this case, break :init c_ in a comptime block is “like” break :init &c_ in a runtime block, in that the assembly doesn’t contain the blob of mov lines. I may not be saying this right, but: “a comptime copy happens at comptime, so you won’t see any “copy” in the assembly code, which (only) indicates what will happen at runtime.”

In short, break :init &c_ will not work in a comptime block (er, with a var, in a block, which is global), and, furthermore, break :init c_, though it does a copy, does the copy at comptime, so you don’t wind up with copy stuffs in the binary, because there’s no runtime copying that has to happen. Interesting. Great. Please correct any details I’ve misinterpreted.
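To make the two working variants concrete, here is a small sketch. The names table, table_ref and c_ are stand-ins (my real code is different), and the comments only restate what is described above; I have not re-verified the generated assembly for this exact snippet.

```zig
const std = @import("std");

// By value: the block runs during compile-time evaluation of the global
// initializer, so the copy out of c_ is not a runtime copy.
pub const table: [4]u8 = init: {
    var c_ = [_]u8{ 0, 0, 0, 0 };
    c_[1] = 5;
    break :init c_;
};

// By reference, following the "copy first" prescription: take the address
// of a const copy, never of the comptime var itself.
pub const table_ref: *const [4]u8 = init: {
    var c_ = [_]u8{ 0, 0, 0, 0 };
    c_[1] = 5;
    const final = c_;
    break :init &final;
};

pub fn main() void {
    std.debug.print("{d} {d}\n", .{ table[1], table_ref[1] });
}
```

Writing break :init &c_ directly in the second block is the form that produced the “global variable contains reference to comptime var” error for me.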

This use case is the reason I have been continuously following this issue.

As for the present, I think the appropriate approach is this: do not deliberately hide the instance inside a block, but after initialization, provide it with an immutable interface and continue to use that interface.

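    var c_instance = [_]u8{1, 2, 3};
    {
        // mutable view, used only during initialization
        const c: []u8 = &c_instance;
        c[1] = 5;
    }
    // from this point on, c_instance is never changed.
    {
        // read-only view for the rest of the code
        const c: []const u8 = &c_instance;
        _ = c; // placeholder use; an unused local is a compile error
    }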

The compiler options were not set correctly. It should be “-O ReleaseFast”.

No. With ReleaseFast (and ReleaseSafe), the copy from the var to the const is optimized away.

I think this captures the essence of the differing intuitions about what’s best here.

Rust scopes are very much a matter of lifetimes. Very common problems of the borrow XOR mutable kind are solved with scopes, which are created strictly to bound lifetimes in the code. There is considerable angst about how lifetimes and scopes should interact. For example, the way an if let bounds lifetimes is broadly considered a bug, because it continues through the else and you generally do not want it to.

It’s a valid way to think about scopes, but I don’t think it’s a good way for Zig scopes to work. Zig has a number of nice mechanisms for using blocks to return values: labeled breaks, use of if / for / while as expressions, etc.

Returning references using this mechanism may not be mandatory to good code, but it’s intuitive, and in my opinion at least, it expresses the code’s intention better than relying on copy elision to work.

Someone who thinks of identifiers as carrying object identity might naturally conclude that the lifetime of the object should be bound to the reference to it: but this is a bad way to think about Zig. Zig uses data identity: all identifiers are references to some specific bytes in memory, and it’s very important to know where those bytes are, and what’s happening to them through the code.

That is not, in and of itself, an argument for or against scoping data lifetimes to blocks. One could easily argue that, as a matter of type in the broad sense, a block-scoped lifetime is the easiest to reason about. It does bound actions through defer, after all: it’s not purely lexical, it’s the lifetime of something.

It’s certainly the only sane thing for Rust to do: but Rust tracks lifetimes, so the stricture this imposes allows for elaborate things to be written in exchange.

I assess languages primarily in terms of mechanical sympathy, which in this context is about being able to reason about the actual result of writing source code, the things which plausibly will or will not happen. Zig is right at the top of the game there.

From that perspective, I think Zig should agree with C: there’s a reason that automatic lifetimes correspond to function scope, and that reason is the stack. In the C standard, automatic lifetimes are an abstraction, but here in ‘the real world’, they’re a data stack. Inlining happens, copy elision happens, all sorts of “as if” things happen, but ultimately there’s a real stack, it gets deeper as function calls nest, then it gets shallow, then it gets deep, and the second one is overwriting the same memory as the first one. That is the reason why we can’t pass references up the function call stack, even though it would be nice.

There are some issues with how the status quo Zig compiler communicates with LLVM about data liveness, so the optimizer is not reusing unreachable memory inside of scopes the way that it should. This is not a good reason to define block scopes the way that Rust does.

Changing how things actually work now will predictably have one effect: it will break a bunch of code, and not necessarily in ways which are easy to find and correct.

What would we get in return? I acknowledge we would get “it would better accord with how some users feel that things should work” but I’m not willing to value that.

What practical positive outcome would we get from the change? I don’t accept “block scopes would work like function calls” as positive either: function calls are real, the stack is real, you can’t write good Zig without knowing how the stack works, and once you do, there’s no mystery.

What’s the upside? I don’t see it.

I’ve changed my mind and agree. I like the practicality of the pattern. I now think it’s ok to pass a reference out of a block through the block return value. Any block memory not dominated by a reference passing through the block return I think could still be eligible to be optimized without an enormous additional complexity given the limited scope of analysis. I agree that a line can be drawn at the function boundary.

The question whether this initialization pattern can be used with inline functions still remains. People have differing intuitions what an inline function means – is it just a named reusable block with no independent stack frame, and hence the same rules apply (you can pass a reference to local var out), or should inline functions preserve function semantics?

That one has been decided: inline functions behave like not-inline functions for these purposes.

I waffled on the question myself, but I think this was the right decision. If Zig didn’t have semantic inlining, it wouldn’t be something to waffle about, but I think making inline behave like other kinds of function is the right call there. “You can’t return references to locals from a function” should be a complete sentence, basically.

That’s great. That further clarifies that in Zig, a function is not just a fancy block. We can make meaningful distinctions between functions and blocks without confusion.

What is lost is an intuitive way to make reusable blocks, but that would be a distinct language proposal. Not sure that there’s enough of a use case for the idea.

There is an issue megathread on this general subject. It was closed due to arrogance.

>> replies with laughing face <<

Totally agree, but I’m now inspired to ask: it’s only valuable for blocks… not for, e.g., for/if/etc., right? We wouldn’t want to assign a variable declared just outside an if to the address of a const/var declared within the if, would we? I can’t see the advantage of that like I can for init blocks. Perhaps I could see the same advantage for a switch assignment. Oh, I guess there are if-assignments, too. Perhaps I could feel the value here for any assignments that depend on a block; within the block, a “local” var could be referenced beyond that sub-scope, to be available to the function containing the block.

Are there nesting concerns to consider? I don’t know enough about compiler internals, but am aware, as you say, that it’s impossible to pass references up a function call stack (since the stack memory needs to be available as the depth returns from shallow to new-deep); the same is not at issue when we’re talking about nested blocks within a function, right? So, then, theoretically, a var declared deep in an if-for-while-for-if, then const-referenced for “return” all the way up that stack, though still within the function, could still be a sound piece of memory through the end of the scope of that function… right?

Right. You can think of all locals in a function as being in the same “storage scope”, ignoring optimizations when stack space can be shared, or deallocated if not referenced. Or you can imagine that all stack space needed by the function is allocated when called (and this may be what is currently happening).
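A rough sketch of the nested case, under the same “one storage scope per function” assumption. The function pick and its locals are hypothetical, and this relies on current behavior as described in this thread, not on a documented guarantee.

```zig
const std = @import("std");

// Hypothetical example: a reference to a local declared inside a nested
// if/else is passed out through the labeled block, but never out of the function.
fn pick(flag: bool) u32 {
    const p: *const u32 = outer: {
        if (flag) {
            var a: u32 = 10;
            a += 1;
            break :outer &a;
        } else {
            var b: u32 = 20;
            b += 2;
            break :outer &b;
        }
    };
    // The pointer is only dereferenced inside the function that owns the storage.
    return p.*;
}

pub fn main() void {
    std.debug.print("{d} {d}\n", .{ pick(true), pick(false) });
}
```

Returning p itself instead of p.* would cross the function boundary, which is the line everyone here seems to agree on.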

I think they fixed that with the last edition.

This makes great sense. I hope it stays this way.