Why can't Zig infer array sizes like this when it looks so consistent?

kaubonbon · June 1, 2025, 1:10pm

Hi people, I’m quite new to Zig, and while doing the Ziglings exercises I ran into the following question about arrays.

If Zig can infer the type of a variable like this: var test1 = 5;
And Zig generally declares types like this: var test2 : u8 = 6;
And declares arrays like this: var test3 : [3]u8 = .{1,2,3};

Why can’t it infer the length in arrays like this: var test4 : [_]u8 = {1,2,3};

Wouldn’t that be the most consistent-looking way? Is there a logical reason for writing it like this instead:
var test5 = [_]u8{1,2,3};
?

Because I also wouldn’t declare a non array variable like this:
var test6 = u8{255};

What am I missing?

Sze · June 1, 2025, 1:49pm

This needs a dot in front of the brace
var test3 : [3]u8 = .{1,2,3};

Zig has only started to shift its focus towards the var name : type = <expr or literal that uses result location> syntax in recent versions.

That said I think [_]u8{1,2,3} makes more sense, because your hypothetical syntax would allow errors like this:

var test4 : [_]u8 = <runtime value>;

With the [_]u8 being directly in front of the array literal it is impossible to use the syntax with a runtime value, which is good because you can’t create an array with a length that is only known at runtime, the length always needs to be known at comptime.
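
A minimal sketch of the point above, using only syntax that is valid today. The names `annotated`, `inferred`, and `firstN` are made up for illustration. It shows that the inferred length ends up in the array type at compile time, and that a length known only at runtime is expressed with a slice rather than an array:

```zig
const std = @import("std");

// Explicit length in the annotation; the anonymous literal `.{ ... }`
// takes its type from the annotation.
const annotated: [3]u8 = .{ 1, 2, 3 };

// Length inferred from the literal; `[_]u8` is only accepted directly
// in front of the `{ ... }` initializer.
const inferred = [_]u8{ 1, 2, 3 };

// Hypothetical helper: when the length is only known at runtime,
// the result is a slice, not an array.
fn firstN(buf: []const u8, n: usize) []const u8 {
    return buf[0..n];
}

test "inferred length is part of the array type" {
    try std.testing.expect(@TypeOf(inferred) == [3]u8);
    try std.testing.expect(@TypeOf(annotated) == @TypeOf(inferred));
    try std.testing.expectEqual(@as(usize, 2), firstN(&inferred, 2).len);
}
```

Saved to a file, this should run with `zig test`. Both declarations end up with the same type, `[3]u8`; the two spellings differ only in where the type is written.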

kaubonbon · June 1, 2025, 2:21pm

Thx, I corrected it above.

I see… But the compiler could easily determine whether the value is fixed at compile time by checking for a [_] in the type, and give me an error if it isn’t, couldn’t it?

Sze · June 1, 2025, 2:29pm

I think it could check, but I would still say that the current syntax is easier and can be used more directly and locally, without having to do more work for slightly fancier syntax.

While I am not totally opposed to it, I think it would make the compiler implementation more complex, for no obvious benefit besides making the syntax look more regular. But I guess a compiler developer would have to say, whether that is actually true, I don’t have enough in-depth knowledge to be completely sure.

It would also require adding a new error message for something that currently just can’t be expressed, because it is not valid syntax.

kaubonbon · June 1, 2025, 2:47pm

+ you admitted it is slightly fancier

From a Zig beginner’s point of view, the current array syntax is not consistent with the declaration of other variables, which for me was worth writing this topic, because on the ziglang website it says:

“A Simple Language - Focus on debugging your application rather than debugging your programming language knowledge.”

So at some points I must seek insight, why the syntax is like it is, when I don’t understand why the syntax is not ‘simpler’.

At the moment I acknowledge the one point for the current array syntax:

more compiler friendly

Sze · June 1, 2025, 2:56pm

I guess that is fair, but I also think that Lisp has a nice, simple, regular syntax, while others find it to be an absolutely unreadable mess of parentheses.

So overall I would say it is difficult to say what would be simple to most people or even the target audience.

Maybe someone else has another perspective/insight on the topic.

castholm · June 1, 2025, 6:33pm

Relevant proposal (not accepted but filed by a core member):

github.com/ziglang/zig

Proposal: allow inferred-size array in type annotations

opened 12:53PM - 17 Apr 24 UTC

mlugg

proposal frontend

I originally wrote up this proposal in a comment on #5038. I've remained a fan of it since then, so thought it was worth promoting to a proper proposal.

---

The biggest blocker to eliminating `T{ ... }` syntax from the language (which is a broadly discouraged syntax form) is the existence of inferred-size array literals. Today, the syntax `[_]T{ ... }` defines an array whose length is inferred from the number of elements provided. There is no way to achieve the same thing with type annotations, since `[_]T` is not an actual type (arrays in Zig always have a fixed length encoded in the type).

I propose that we permit the syntax `[_]T` in a type annotation on `const` and `var` (both local and container scope). It could optionally also be allowed as the operand to `@as`. Like the `[_]T{ ... }` syntax, this results in a normal fixed-size array, and is a specific syntax form: for instance, `const x: ([_]T) = ...` is disallowed, just as `([_]T){ ... }` is disallowed today.

When this "type" is used, the expression with this type is given a new kind of result location. All peers of this expression must be array initialization literals with lengths matching the expected length. I think the best way to get an idea of how this would work is to look at the implementation. The new form of result location would be implemented in `AstGen` like this:

```zig
/// This expression is the initialization expression of a var decl whose type is an inferred-length array.
/// Every result sub-expression must use array initialization syntax. The array's length should be written
/// to `chosen_len` so the caller can retroactively set the array length.
inferred_len_array_ptr: struct {
    /// The array pointer to store results into.
    ptr: PtrResultLoc,
    /// This is initially `null`, and is set when an expression consumes this result location.
    /// If an expression has a length which does not match the currently-set one, it can use `src_node` to emit an error.
    chosen_len: *?struct {
        len: u32,
        src_node: Ast.Node.Index,
    },
},
```

When we encounter the first peer array initialization, its length is written to `chosen_len`. Later peers will check that their length matches the other length, and emit an error if not. The `var`/`const` decl will create a stack allocation whose length in the ZIR is retroactively rewritten to match the length of the initialization expressions.

This kind of result location will immediately trigger an error when encountered for any expression other than array initializers, such as struct inits and any syntax form which calls `AstGen.rvalue`.

Here is what the proposal looks like in practice:

```zig
// these are all valid
const x: [_]u8 = .{ 1, 2, 3 };
const y: [_]u8 = if (condition) .{ 1, 2 } else switch (x) {
    .foo => .{ 3, 4 },
    .bar => .{ 5, 6 },
    else => .{ 7, 8 },
};
const z: [_][]const u8 = blk: {
    if (foo) break :blk .{ "hello", "world" };
    break :blk .{ "foo", "bar" };
};

// this is invalid
// error: array length cannot be determined
// note: result must be array initialization expression
const a: [_]u8 = @as([3]u8, .{ 1, 2, 3 });
const b: [_]i16 = blk: {
    const result: [2]i16 = .{ 1, 2 };
    break :blk result;
};
const c: [_]u8 = if (cond) .{ 1, 2 } else something_else;

// this is also invalid
// error: array length '3' does not match array length '2'
// note: array with length '2' here
// note: inferred-length array must have a fixed length
const d: [_]u8 = if (cond) .{ 1, 2 } else .{ 3, 4, 5 };
```

Implementing this proposal would solve the primary blocker for accepting #5038. I personally believe that proposal to be the right direction for the language, but even if it is not accepted, I feel that this proposal is beneficial, because it brings the language further in line with our preference for direct type annotations over explicitly-typed expressions.
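
For comparison, here is a rough sketch of how the first two "valid" cases from the proposal can be spelled with syntax that already works, assuming the branch lengths match. The function name `pick` is hypothetical and not part of the proposal:

```zig
const std = @import("std");

// Today's spelling of `const x: [_]u8 = .{ 1, 2, 3 };`.
const x = [_]u8{ 1, 2, 3 };

// Each branch repeats the `[_]u8` prefix. Both branches have length 2,
// so the expression has the single type `[2]u8`.
fn pick(cond: bool) [2]u8 {
    return if (cond) [_]u8{ 1, 2 } else [_]u8{ 3, 4 };
}

test "current-syntax equivalents" {
    try std.testing.expect(@TypeOf(x) == [3]u8);
    const a = pick(true);
    const b = pick(false);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 1, 2 }, &a);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 3, 4 }, &b);
}
```

The repeated `[_]u8` prefix in every branch is the kind of explicitly-typed expression the proposal wants to replace with a single annotation on the declaration.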

kaubonbon · June 1, 2025, 7:38pm

Thanks, now I have some read-up to do

vulpesx · June 2, 2025, 10:14am

simple, regular and unreadable are not mutually exclusive

ericlang · June 2, 2025, 11:11am

Hahaha I have to think about that one.

That said: the one and only array declaration should be

[a, b, c]

Braces are for scope and blocks.
& is an address.

kaubonbon · June 2, 2025, 6:20pm

That’s true.
The sentence you just posted is a good example.
You could also have said:
“Things that are simple can still be unreadable. And also, things that are unreadable, still can be simple.”
(Now, hopefully I understood correctly.)

The more we strip down a language in terms of vocabulary, the simpler we have to communicate. (But also the longer the conversation might be)
On the other hand, a vast vocabulary offers sophisticated expressions and shorthands. (But if we have one unique word for every expression, we have obviously overdone it.)

I like both approaches for their respective traits.
As with many things I have the feeling that the middle-ground is the best thing.

There is a certain pleasure in decoding obfuscated code or language. There is always a time and place for that.

There is also a certain pleasure in decoding the meaning of something great that comes in clear language.

For me personally, in programming, a language should have a single way to do a given task, and it should encourage writing simple, readable code.
But there is only so much the language itself can do, to achieve this.

I got carried away …

kaubonbon · June 2, 2025, 6:27pm

With 3K open issues, there is a possibility (even if just a small one) that the language goes back to that.