A counter proposal to @mlugg's "make by-ref capture semantics consistent" proposal

This is a response to @mlugg’s proposal “make by-ref capture semantics consistent” (Issue #23509 · ziglang/zig · GitHub), which proposes to make all implicit-ref captures explicit.

How about making all captures implicit-ref instead, even array captures?
In the following code, the first two captures compile, but the third one fails to compile.

const print = @import("std").debug.print;

pub fn main() void {
    var o: ?u32 = 0;
    if (o) |*x| x.* = 123;
    print("{any}\n", .{ o }); // 123
    
    var arr: [2]u32 = .{0, 0};
    for (&arr) |*x| x.* = 123;
    print("{any}\n", .{ arr }); // { 123, 123 }
    
    var arr2: [2]u32 = .{0, 0};
    _ = &arr2;
    for (arr2) |*x| x.* = 123; // error
}

As @mlugg’s proposal states, this is an inconsistency. Unlike that proposal, I would argue that by-value for (anArray) is not a necessary syntax for iterating over array elements, and that the current by-value semantics confuse new Zig programmers:

const assert = @import("std").debug.assert;

pub fn main() void {
    var arr: [2]u32 = .{0, 0};
    for (arr, 0..) |x, i| {
        if (i == 0) arr[1] = 123;
        if (i == 1) assert(x == 123); // panic
    }
}

Would it be a good idea to make for (anArray) implicit-ref as well? For the cases that do need a copy of anArray, we can make the copy before the loop. (This adds no net implicitness: the current by-value semantics already make an implicit copy of the iterated array. Remove one implicit behavior, add one, so zero increase.)
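A minimal sketch of the "make the copy before the loop" idea, written so that it behaves the same under today's semantics. The names lastSeen and snapshot are made up for illustration and are not part of either proposal.

```zig
const std = @import("std");

// Hypothetical helper: iterates an explicit snapshot while mutating the original.
fn lastSeen() u32 {
    var arr: [2]u32 = .{ 0, 0 };
    const snapshot = arr; // explicit copy, made before the loop
    var seen: u32 = 0;
    for (&snapshot, 0..) |x, i| {
        if (i == 0) arr[1] = 123; // writes to arr, never to snapshot
        seen = x; // always reads from snapshot
    }
    return seen; // 0: the write to arr[1] is not visible through snapshot
}

pub fn main() void {
    std.debug.print("{d}\n", .{lastSeen()});
}
```

Because the loop operand is a pointer to the snapshot, it does not matter whether for over a bare array copies or references: the copy is spelled out in the source.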

This would make the semantics consistent, keep the language change small, and keep the language simple.

Another idea is to disallow iterating over arrays altogether.


Since this is basically a comment on the linked issue, it might be more helpful to the core team to treat the issue as “copy-on-write” and open a linked issue on Codeberg to continue discussion?

This is the first time I have seen if (o) |*x| x.* = 123; or for (&arr) |*x| x.* = 123; written that way, and it is honestly pretty confusing. I don’t understand why the pointer is created with the dereference operator * in the capture block instead of the address-of operator & on the operand.

I don’t know enough yet to form an opinion but I give my two cents anyway because no one can stop me (and if you can, don’t):

// if example => if "opt" has a value, then change it to 456

var opt: ?u32 = 123;

if (opt) |*x| x.* = 456;              // makes me uncomfortable
    
if (opt) |_| opt.? = 456;             // a little redundant
    
opt = if (opt != null) 456 else null; // makes sense

opt = if (opt != null) 456;           // literally perfect but does not compile

well, both are done.

for (arr) |x| loops over the array as a value; semantically it copies arr. what it yields are elements of the (copied) array.

for (&arr) |x| loops over the array by reference; semantically the &arr functions as it always does, taking the address of the array. what it yields are elements of the (referenced) array. similar to how variable names declared as function arguments are always const, x is const

for (&arr) |&x| is unambiguous from a parsing perspective and could do the work that is currently done by for (&arr) |*x|, but for my money, I like it less than the status quo. The reason is that we are not actually taking the address of x; we are introducing a new variable name, x, but we are doing so by taking a pointer. In that way, I like to think of *x as being a distant cousin of the x: *T syntax.
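For reference, a small sketch putting the three status-quo forms side by side. The function name demo is made up for illustration; the behavior shown is my understanding of current Zig, not a statement about either proposal.

```zig
const std = @import("std");

// Hypothetical demo function exercising the three loop forms discussed above.
fn demo() [2]u32 {
    var arr: [2]u32 = .{ 1, 2 };
    var sum: u32 = 0;
    for (arr) |x| sum += x; // array operand, value capture
    for (&arr) |x| sum += x; // pointer operand, value capture; x is const
    for (&arr) |*x| x.* = sum; // pointer operand, pointer capture; x is a *u32
    return arr; // .{ 6, 6 }
}

pub fn main() void {
    std.debug.print("{any}\n", .{demo()});
}
```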

opt = if (opt != null) 456; not working is a little surprising to me, actually. I guess it probably would work if you wrote

opt = if (opt != null) 456 else null; although it probably is not what you intend, since opt still has type ?u32 and you seem to intend to shadow it with a variable of type u32, which is probably a Zig non-starter


A ternary if / if-expression without an else is strange and doesn’t make sense. If you just want to assign 456 when opt isn’t null, you can use:

if(opt != null) opt = 456;

probably this should have been a reply to the thread and not to me :)


Am I missing something? I don’t see how the compiler could optimize such loops; the captures would have to be “volatile” in a sense.

I expected this to work, but it doesn’t.

opt orelse _ = 456;

BTW, when using a |*x| capture, we can read the old value and set a new value.
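A short sketch of that read-then-write pattern. The helper name bump is made up for illustration.

```zig
const std = @import("std");

// Hypothetical helper: if the optional holds a value, increment it in place
// and return the value it held before; otherwise return null.
fn bump(opt: *?u32) ?u32 {
    if (opt.*) |*x| {
        const old = x.*; // read the old payload through the pointer capture
        x.* = old + 1; // write the new payload through the same pointer
        return old;
    }
    return null;
}

pub fn main() void {
    var opt: ?u32 = 123;
    const old = bump(&opt);
    std.debug.print("{any} {any}\n", .{ old, opt });
}
```

A plain |x| capture would only give the old value; writing back would need a second access such as opt.? = ....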


Sorry, could you elaborate? I have difficulty understanding this.

To satisfy

const assert = @import("std").debug.assert;

pub fn main() void {
    var arr: [2]u32 = .{0, 0};
    for (arr, 0..) |x, i| {
        if (i == 0) arr[1] = 123;
        if (i == 1) assert(x == 123); // panic
    }
}

the compiler would have to load the value from memory on each iteration. That is highly unexpected and restrictive behavior for a low-level language.

Do you mean loading the whole array once? Is this related to the CPU cache? I’m not an expert on this, but if so, I think what you said is reasonable.

After thinking about it for some time, I think my proposal is not a good one. Currently, for (anArray), for (anArrayPointer), for (aSlice), and for (aManyItemPointer, 0..n) all use by-value semantics. Changing the semantics for arrays alone would create a new inconsistency.

I personally agree with the following points:
It’s absolutely necessary to explicitly distinguish between value capture and reference capture.
I want value capture to be the default behavior, not reference capture.
I don’t want the default behavior to change depending on the type being captured.

The current Zig already meets my three key requirements.

Nitpicking a bit, |*x| might look a bit weird. As a C programmer, I can smile knowingly and understand what it means: the captured result can be obtained by dereferencing x, so x is the reference capture. But now we’re in Zig, where dereferencing is written x.* and there is no *x syntax. Sometimes I wonder why it’s not |x.*| here.


The variable x is declared in the capture. Its type is a pointer, so *x is at least similar to the syntax for pointer params, locals, etc., although the type is not included. x.*, OTOH, is an expression that dereferences, not a declaration.


We already have & to generate references/pointers.

var x: u8 = 0;   // x is a value

const y = &x;    // we use & before a value to create a reference
y.* = 2;         // we use * after a reference to access a value

As I understand it, the capture block of an if statement is the thing that unpacks the optional and offers its value (if found).

We just saw that to create a reference to a value we must apply the & operator before it. So logically it should be written as:

var opt: ?u32 = 123;

if (opt) |&x|    // we use & before a value to create a reference
    x.* = 456;   // we use * after a reference to access a value

Sadly, the x identifier means something different inside the capture block (value of opt) and in the if block (reference to the value of opt).

& returns a pointer to its operand. |&x| makes sense mnemonically but not logically.

Isn’t

if (opt) |&x|    // we use & before a value to create a reference
    x = 456;

more logical?

Perhaps

for (arr) |x:*| x.* = 123;

But what I think I want is

|x: *u32|

and

|x: u32|

i think

if (opt) |*x| x.* = 456;

is the most logical. since x is scoped into just the other side of the if, your proposal of assigning to it like a var variable makes it less obvious that it actually affects the outer scope.