Better optionals ergonomics

ivanhernandez · October 17, 2023, 12:02am

As a newcomer to Zig, I am probably missing something important… But I would say there is a problem with how optionals work in conjunction with conditionals (and, in the same way, with loops).

In this (silly and incomplete) example:

const Foo = struct{};

fn doSomethingWithFoo(foo: *Foo) void {
    _ = foo; // whatever
}

fn doAThing(optional_foo: ?*Foo) void {
    if (optional_foo) |foo| {
      doSomethingWithFoo(foo);
    }
}

I want you to focus on the conditional within doAThing(). This syntax works with optional values: it lets you use the value inside the optional once it is known to be non-null.

It’s nice to be able to do this, but most of the time you’re forced to provide different names for the variable holding the optional and for the captured value inside the if block, even when you don’t really care. In this example, I believe there’s no sense in naming the function’s parameter optional_foo. You would naturally name this parameter foo, but that would clash with the name of the capture.

If you try to name both foo (spoiler alert, it is not allowed):

fn doAThing(foo: ?*Foo) void {
    // Here, foo is ?*Foo 
    if (foo) |foo| {            // |foo| shadows outer foo
      doSomethingWithFoo(foo);  // within this scope, foo is *Foo
    }
}

The compiler reports an error because it does not allow shadowing the outer parameter. I don’t fully understand the rationale for this, but I believe shadowing should be allowed for this specific use case. Not only that, it would enable this kind of syntax sugar:

fn doAThing(foo: ?*Foo) void {
    if (foo) {
      doSomethingWithFoo(foo);
    }
}

Here, an optional variable used as the condition would not need a capture list, and the compiler would assume a capture with the same name as the variable. That way, the programmer’s burden of having to decide on two names is alleviated, and the behaviour would be similar to other languages (e.g. C# and Kotlin allow using optional variables as if they were not optional, as long as null analysis proves null is not possible within the scope).

I understand this is a very specific scenario and that the current implementation is far more general (e.g. it allows arbitrary expressions as the condition), but I believe this pattern is common enough (in my experience with other languages and small experiments with Zig) to be considered as a valid language enhancement proposal.

dude_the_builder · October 17, 2023, 12:27am

You could see this the other way around: by being extra explicit when naming variables, you are actually relieved of the burden of inventing names, because the types tell you what the names should be:

fn doSomethingWithFooPtr(foo_ptr: *Foo) void {}

fn doAThing(foo_opt_ptr: ?*Foo) void {
    if (foo_opt_ptr) |foo_ptr| {
        doSomethingWithFooPtr(foo_ptr);
    }
}

There’s also the very handy orelse for even better ergonomics:

fn doAThing(foo_opt_ptr: ?*Foo) void {
    const foo_ptr = foo_opt_ptr orelse return;
    doSomethingWithFooPtr(foo_ptr);
}
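
As a further sketch of the same idea, `orelse` also combines with `return <value>` and with `continue`, so the unwrapped name lives in the enclosing scope instead of a nested block. The `value` field and the functions `valueOrZero` and `sumPresent` are hypothetical names made up for this illustration; they do not come from the thread.

```zig
const std = @import("std");

// Hypothetical type for illustration only.
const Foo = struct { value: u32 };

// `orelse return 0` exits early with a fallback when the optional is null.
fn valueOrZero(foo: ?*const Foo) u32 {
    const f = foo orelse return 0;
    return f.value;
}

// `orelse continue` skips the null entries of a slice of optionals.
fn sumPresent(items: []const ?u32) u32 {
    var total: u32 = 0;
    for (items) |maybe_item| {
        const item = maybe_item orelse continue;
        total += item;
    }
    return total;
}

test "orelse with return and continue" {
    const foo = Foo{ .value = 7 };
    try std.testing.expectEqual(@as(u32, 7), valueOrZero(&foo));
    try std.testing.expectEqual(@as(u32, 0), valueOrZero(null));
}
```

The trade-off is the same one discussed above: two names are still needed, but the optional one tends to be short-lived.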

AndrewCodeDev · October 17, 2023, 1:27am

Using your hypothetical syntax, consider the following:

fn doAThing(foo: ?Foo) void {
    if (foo) {
      foo = null; // was the optional or the value set to null?
    } else {
      foo = null; // and how about here?
    }
}

To clear that up, we’d have to introduce more rules about the scope of a variable. I personally don’t like rules like this.

This opens up to a broader discussion about syntactical sugar.

In general, Zig’s syntax is very sugar-free. I’m quite happy with that, as I think a lot of people who use Zig are as well. There are several places where this works quite well, and other contexts where it can still be a bit annoying (variable initialization for loops, for instance).

squeek502 · October 17, 2023, 5:32am

I usually go with maybe_foo in scenarios like this.

IntegratedQuantum · October 17, 2023, 8:23am

I also find this annoying sometimes, but I think it is good that variable names don’t change in meaning depending on the scope.
Your solution would make the code easier to write, but harder to read, because in order to understand a variable we’d need to check all ifs in the current scope.
And honestly, even if you are too lazy to use better variable names, just add an underscore. I do that quite often (shame on me) and I think it’s still a lot easier to read than your proposed syntax, while being almost as easy to write:

if (foo) |_foo| {
  doSomethingWithFoo(_foo);
}

ivanhernandez · October 17, 2023, 8:32pm

The thing is, I would rather not write type details in variable names (or have to add variables explicitly) because of limitations in the language.

On the other hand, I did not think of using orelse for this case, which happens to solve the problem in this example quite well (although in a more realistic problem it would force me to create a distinct function for “unwrapping” the value, I believe it would be justified in that case).

Thanks for the feedback.

ivanhernandez · October 17, 2023, 8:57pm

Of course, I understand my suggestion may not align well with Zig’s design ideas (as I said, I have only just started to scratch the surface of the language).

Anyway, I still find it inconvenient to have to be this explicit in that example, but as @dude_the_builder showed, there are other mechanisms in the language that I did not consider because of my lack of familiarity with Zig (my fault).

About your example, you’re right that it looks hard to read, but I would say it would only be valid if the parameter foo were of type ??Foo (is this a valid type?). In my interpretation, the semantics would be the current ones (except for allowing variable shadowing), so inside the if branch foo would be of type Foo, and assigning null there would only lead to a compilation error. In the else branch, foo would be the function parameter itself… I see that having two foo names referring to different things can be misleading, if that’s what you’re trying to point out. And although it works OK in other languages, it may not fit Zig well.
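
On the parenthetical question, nested optionals such as `??Foo` do appear to be accepted by the compiler. A minimal sketch, using `u32` in place of `Foo` and assuming the usual coercion from `?u32` to `??u32`, shows that the two levels of null are distinct values:

```zig
const std = @import("std");

test "nested optionals are distinct at each level" {
    // Outer optional is null: there is no inner optional at all.
    const outer_null: ??u32 = null;
    // Outer optional is present, but the inner one is null.
    const inner_null: ??u32 = @as(?u32, null);
    // Both levels are present.
    const both: ??u32 = @as(?u32, 5);

    try std.testing.expect(outer_null == null);
    try std.testing.expect(inner_null != null);
    try std.testing.expect(inner_null.? == null);
    try std.testing.expect(both.?.? == 5);
}
```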

So, let’s finish this proposal here.

ivanhernandez · October 17, 2023, 9:24pm

That’s what felt like an antipattern to me, and what I tried to circumvent with the proposed syntax (but, as it seems, it has its own share of inconvenience).

AndrewCodeDev · October 17, 2023, 9:27pm

I just want to point out that we support conversation about possible syntax variations. It’s fine to bring it up, and if you look at the GitHub issues, you’ll find a complete history of ongoing conversations about this (look at the issues surrounding anytype). I can’t guarantee that people will agree, but it’s usually an interesting conversation.

ivanhernandez · October 17, 2023, 9:34pm

It’s never about having to write the code, but about having to read it. You’re right that this syntax would make the origin of a variable more obscure.

I still find if (foo) |_foo| (or if (maybe_foo) |foo|) awkward, but reviewing Zig’s compiler codebase, I see that most of the time the conditional contains a struct field access expression. The scenario in my example is probably not as common in Zig as I thought beforehand.

The fault is on my side… I should have familiarized myself more with the language before proposing something like this.

AndrewCodeDev · October 17, 2023, 9:37pm

In general, the attitude about Zig is that there shouldn’t be an “advanced” syntax that creates supersets of the language (I’m looking at C++ in this case and its lambda captures, generic variadics, fractal interpretation of the word static/inline, etc).

Zig is more on the “context free” side of things - so keywords like fn and such.

So a statement like if (optional) introduces context. We need to know that this is an optional value and not just a predicate. Since we don’t have custom definable conversion operators (thankfully), optional can’t be used directly as a boolean value.

Otherwise we could do statements like if (foo or bar) where foo or bar could be optionals.

This is completely accepted syntax where conversion operators are definable.

tauoverpi · October 18, 2023, 1:36pm

A possible solution would be the ability to assert(foo != null), where from that point on foo is *Foo: since foo can no longer be null below that line, it can be used to access the unwrapped value directly. The same could apply to if (foo != null and bar != null) { ... }, where both are “unwrapped” within the if block. There’s a similar proposal to this; I should probably update it if this case is missing.
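
For comparison, here is a sketch of what can be written today without any flow-sensitive unwrapping: an explicit null check followed by the `.?` operator, or nested payload captures. The `value` field and the functions `sumChecked` and `sumCaptured` are hypothetical names used only for this illustration.

```zig
const std = @import("std");

// Hypothetical type for illustration only.
const Foo = struct { value: u32 };

// Explicit null checks followed by `.?`, which asserts non-null
// (safety-checked in Debug and ReleaseSafe builds).
fn sumChecked(foo: ?*const Foo, bar: ?*const Foo) ?u32 {
    if (foo != null and bar != null) {
        return foo.?.value + bar.?.value;
    }
    return null;
}

// The same logic with nested payload captures and no `.?`.
fn sumCaptured(foo: ?*const Foo, bar: ?*const Foo) ?u32 {
    if (foo) |f| {
        if (bar) |b| return f.value + b.value;
    }
    return null;
}
```

In the first form the compiler does not connect the check to the later `.?`, so each use is verified again at runtime in safe builds; the proposal above would move that knowledge to compile time.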

kristoff · October 18, 2023, 1:48pm

btw having no weird sleight-of-hand games with identifiers helps simplify Zig tooling. this is usually an underappreciated benefit of not having complicated rules for those

Luke · October 18, 2023, 2:51pm

Interesting. So complexity must be handled somewhere, and the trade-off Zig makes is to share the burden with the developer, where other recent languages tend to hide some of that complexity away, making everything from language design to tooling harder.

It could be a winning move for Zig with the adoption of tools like Copilot, which can help handle that complexity outside of the language.

Thanks for that high level perspective!

AndrewCodeDev · October 18, 2023, 4:57pm

The if (foo and bar) case is one of the reasons I’m strongly opposed to the idea. Consider the following…

What about in the case of if (foo or bar)? In this case, the or statement would not tell you which of the optionals was not null. And the case of xor is comical because one of them had to fail lol.

So in that case, you’d still have to check foo and bar again before you use them.

I see that what you proposed is foo and bar, but it’s the same issue as above: it invents new rules that complicate things further. In other words, now there’s a rule surrounding and vs or for optionals.

ivanhernandez · October 18, 2023, 5:22pm

Agreed. Easy grammar and (arguably) semantics are the reason why Go tooling got very good, very fast, and why good C++ tooling has been an oxymoron since the mid-90s.

On the other hand, Zig’s comptime is quite demanding on tooling, too. I would say you must always balance your options and decide… isn’t that the fun (although hard) part of language design?

tauoverpi · October 18, 2023, 8:58pm

The rule extends beyond optionals to anything that can be directly compared and is of a value type.

github.com/ziglang/zig

proposal: learn comptime-known-ranges from branches on runtime values

opened 08:37AM - 16 Sep 22 UTC

tauoverpi

proposal

Related to #12863

Instead of narrowing (creating a new type) one could attach the reduced range to the type within the block and treat any branches outside of that range as dead code without changing the underlying type. Consider the example given:

```zig
const std = @import("std");

const E = enum {
    a,
    b,
    c,
    d,
};

pub fn main() !void {
    var e: E = .a;
    switch (e) {
        .a => {},
        .b => {},
        else => |narrow| {
            switch (narrow) {
                .c => {},
                .d => {},
            }
        },
    }
}
```

Here `@TypeOf(narrow) == E` while the range of `narrow` only includes the later `c` and `d` tags. If we change `e` to be `const` then the range of `e` within the same scope that `narrow` is introduced is equivalent but the underlying type remains the same.

This can be expanded upon to include other forms of branching such as:

```zig
extern fn foo() E;

test {
    const e = foo();
    if (e != .a) {
        // .a is not to be part of the range
        switch (e) {
            .b, c, d => {},
        }
    } else {
        // .a is the only tag within range
        switch (e) {
            .a => {},
        }
    }
}
```

Thus zig can learn the range of a value at compile-time by keeping track of the set/range of possible values that remain within a branch. By keeping the same underlying type it's possible to operate on integers:

```zig
extern fn foo() u8;

test {
    const e = foo();
    switch (e) {
        0 => switch (e) {
            // only possible value, `e` is comptime known here
            0 => {},
        },
        1...10 => switch (e) {
            // Only possible range, matching on other values is comptime known to be false.
            // Note that the value of `e` remains runtime-known, only the possible range is
            // comptime known
            1...10 => {},
        },
        else => switch (e) {
            // `e` is known to be 11 or above given the other branches
            11...255 => {},
        },
    }
}
```

And implicitly cast based on the range:

```zig
extern fn foo() u8;

test {
    const e = foo();
    if (e < 0x10) {
        // `e` remains as `u8` here but we can avoid the
        // `@intCast` as the range is comptime-known
        var s: u4 = e;
        _ = s;
    }
}
```

If the range of a value on the right hand side of a truncating operation is comptime-known then the range of the result is guaranteed to be below the top of the range thus any use of `%` and `&` narrows the range of the type. The same applies to `comptime_int` on the right hand side:

```zig
extern fn foo() u8;

test {
    const bar: u4 = foo() % 16;
    const baz: u4 = foo() & 15;
}
```

The same applies for addition and multiplication:

```zig
extern fn foo() u8;

test {
    const x: u4 = foo() % 16;
    const y: u8 = foo() + x; // range 0...(255 + 15) (possible overflow)
    const z: u8 = foo() +| x; // range 0...255 (no overflow)
    _ = x;
    _ = y;
    _ = z;
}
```

Since the type of the value doesn't change, the value can still be passed safely to functions without casts:

```zig
extern fn foo() E;
extern fn bar(E) void;

test {
    const e = foo();
    if (e != .a) {
        bar(e); // no cast, `e` keeps it's type `E`
    }
}
```

However integers can be narrowed when the range of the value is within the range of a smaller destination type:

```zig
extern fn foo() u32;
extern fn bar(u8) void;

test {
    const value = foo();
    const small: u4 = foo() & 15;
    if (value < 256) {
        const sub = value -| small; // range 0...255
        const add = value +| small; // range 0...(255 + 15)
        bar(sub); // within range, narrowing is safe here
        bar(add); // compile error: expected type 'u8', found 'u32' range 0...(255 + 15)
    }
    bar(value); // compile error: expected type 'u8', found 'u32' range 0...4294967295
}
```

Note that if the range is larger than that of the underlying type then overflow is possible.

## Operators

Thus the standard operators should now result in:

```zig
extern fn foo() u32;

const x: u32 = foo();
const y: u32 = foo();

// the range of both `x` and `y` in this example
const m = math.maxInt(@TypeOf(x, y));

assert(
    @TypeOf(x + 1) == u32 range(1...m + 1) // possible overflow
    and @TypeOf(x + y) == u32 range(0...m + m) // possible overflow
    and @TypeOf(x - 1) == u32 range(-1...m) // possible underflow
    and @TypeOf(x - y) == u32 range(-m...m) // possible underflow
    and @TypeOf(x * 2) == u32 range(0...m * 2) // possible overflow
    and @TypeOf(x * y) == u32 range(0...m * m) // possible overflow
    and @TypeOf(x | y) == u32 range(0...m | m)
    and @TypeOf(x << 5) == u32 range(0...(m << 5) & m)
    and @TypeOf(x >> 5) == u32 range(0...m >> 5)
    and @TypeOf(~x) == u32 range(0...m)
    and @TypeOf(x <<| y) == u32 range(0...m)
    and @TypeOf(x +| 1) == u32 range(1...m)
    and @TypeOf(x +| y) == u32 range(0...m)
    and @TypeOf(x -| 1) == u32 range(0...m - 1)
    and @TypeOf(x -| y) == u32 range(0...m)
    and @TypeOf(x *| 2) == u32 range(0...m)
    and @TypeOf(x *| y) == u32 range(0...m)
    and @TypeOf(x % 8) == u32 range(0...7)
    and @TypeOf(x % y) == u32 range(0...m -| 1) // range of y - 1
    and @TypeOf(x & y) == u32 range(0...m & m) // the minimum range of the two
    and @TypeOf(x +% 1) == u32 range(0...m)
    and @TypeOf(x +% y) == u32 range(0...m)
    and @TypeOf(x -% 1) == u32 range(0...m)
    and @TypeOf(x -% y) == u32 range(0...m)
    and @TypeOf(x *% 2) == u32 range(0...m)
    and @TypeOf(x *% y) == u32 range(0...m)
);
```

note: the `range(n...m)` syntax is not part of the proposal

## Extension

For safety, allow function parameters and return types to express the range desired and throw a compile-error when the range cannot be satisfied:

```zig
fn foo(x: u32 range(0...0xffff)) u32 range(0...0xff) {
    return x & 0xff;
}
```

or maybe as an expression:

```zig
fn foo(x: u32, a: []u32) where (x + 1 < a.len) u32 {
    return a[x] + a[x + 1];
}
```

However ranges for parameters/return types requires more thought.

morezig · October 19, 2023, 11:17pm

We can unwrap multiple optionals with a helper function that takes a tuple, and use destructuring for convenience:

var opt_a: ?i32 = null;
var opt_b: ?f32 = 2.2;
if (unwrapAll(.{ opt_a, opt_b })) |unwrapped| {
    const a, const b = unwrapped;
    std.debug.print("a = {}, b = {}\n", .{ a, b });
} else {
    std.debug.print("unwrap failed", .{});
}

It was fun to implement it by reifying the type with @Type:

fn UnwrappedType(comptime T: type) type {
    switch (@typeInfo(T)) {
        .Struct => |struct_info| {
            var unwrapped_fields: [struct_info.fields.len]std.builtin.Type.StructField = undefined;
            inline for (struct_info.fields, 0..) |field, i| {
                switch (@typeInfo(field.type)) {
                    .Optional => |field_info| {
                        unwrapped_fields[i] = .{
                            .name = field.name,
                            .type = field_info.child,
                            .default_value = field.default_value,
                            .is_comptime = field.is_comptime,
                            .alignment = 0,
                        };
                    },
                    else => @compileError("all fields must be optional type!"),
                }
            }

            return @Type(.{
                .Struct = .{
                    .layout = .Auto,
                    .fields = &unwrapped_fields,
                    .decls = &.{},
                    .is_tuple = true,
                },
            });
        },
        else => @compileError("parameter must be struct type!"),
    }
}

fn unwrapAll(tuple: anytype) ?UnwrappedType(@TypeOf(tuple)) {
    var result: UnwrappedType(@TypeOf(tuple)) = undefined;
    inline for (tuple, 0..) |opt_field, i| {
        if (opt_field) |field| {
            result[i] = field;
        } else {
            break;
        }
    } else {
        return result;
    }
    return null;
}