Proposal: Disallow arithmetic binary operators with operand types that differ from their result types

#24908 proposes a rule for the binary operator ++ to propagate the result type to its operands.
As far as I know, most binary operators currently do not propagate the result type to their operands; this is the current behavior:

test "shift, wrapping and saturating" {
    const a: u8 = 255;
    const b: u8 = 255;
    var c: u32 = a +% b;
    try std.testing.expectEqual(254, c);
    c = a +| b;
    try std.testing.expectEqual(255, c);
    c = a << 1;
    try std.testing.expectEqual(254, c);
    const d: u16 = 255;
    c = a + d;
    try std.testing.expectEqual(510, c);
}

If we accept the concept of type deduction for both the operands and the result of a binary operation, such as ++, then we might expect:

    const a: u8 = 255;
    const b: u8 = 255;
    var c: u32 = a +% b;
    try std.testing.expectEqual(510, c); // This is wrong in the current version

However, the objection is that this rule can also be surprising and lead to misuse.

Therefore, my proposal is to infer the result type of binary operations from the operands and prohibit implicit conversion of the operand types to the result type.

This means that the following will become a compilation error:

    const a: u8 = 255;
    const b: u8 = 255;
    var c: u32 = a +% b;
    try std.testing.expectEqual(254, c);

We would need to rewrite it as follows to keep the behavior consistent with the current version:

    const a: u8 = 255;
    const b: u8 = 255;
    var c: u32 = @intCast(@as(u8, a +% b));
    try std.testing.expectEqual(254, c);

Or convert the operand types to achieve the other expected result:

    const a: u8 = 255;
    const b: u8 = 255;
    var c: u32 = @as(u32, a) +% @as(u32, b);
    try std.testing.expectEqual(510, c);

Adding this restriction also works well with #22182.

I don’t understand why this should be a compile error. With the current ‘Zig semantics’ the result is expected to be 254, since Zig has no integer promotion like C (for better or worse): the result type of wraparound-adding two u8 values is u8, and this result then gets assigned to a u32, which is allowed because no information is lost. Why should this operation require any casting when it is already ‘well-defined’?
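Spelled out in code, this is how I read the current behavior (a small sketch):

    const a: u8 = 255;
    const b: u8 = 255;
    const r = a +% b; // r is inferred as u8; 255 +% 255 wraps to 254
    const c: u32 = r; // plain widening coercion u8 -> u32, no cast needed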

try std.testing.expectEqual(510, c); // This is wrong in the current version

This expectEqual doesn’t make any sense for wraparound-adding two u8 IMHO.

For non-wrapping + a compile error might make sense if the compiler knows the inputs at compile time (e.g. moving the runtime overflow error to a comptime error).
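For what it’s worth, Zig already does this when both operands are comptime-known; a minimal sketch (the exact diagnostic may differ between compiler versions):

    const a: u8 = 255;
    const b: u8 = 255;
    const c: u32 = a + b; // compile error: overflow of integer type 'u8' with value '510'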

Please don’t add even more @-clutter if possible; it’s already way past the acceptable pain point :slight_smile:

What’s arguable IMHO is whether a non-wrapping add should produce a result type which can fit the expression result (e.g. adding two u8 values could lead to an inferred result type of u9 and thus cannot overflow), but the result of a wrapping add of two u8 values should always be a u8.

PS: I sometimes wonder if the C designers ran into all those issues too and then simply “invented” integer promotion as the solution :wink:


If Zig refused to infer result types for any arithmetic or binary operator, then I could assume that Zig’s behavior was consistent, and I would expect the result to be 254.
However, there are existing proposals (such as #16489 and #24167) that imply the concept of result type propagation will be extended to operands.
If Zig were to adopt result type propagation for some operators in the future but not for others, it would be very confusing. If the above proposals are accepted, I would undoubtedly assume that the expected result here is 510.
At the same time, I agree that some people would expect 254 based on past behavior. To avoid this inconsistency among different users, I believe the only solution is to disallow the automatic type conversion here.

IMHO this would only make sense for a non-wrapping add, but not for a wrapping add.

The idea of extending the result type to fit the result of an expression isn’t bad IMHO, but for wraparound operations I would expect that the result is not extended. E.g. the result of a regular add of two u8’s is u9 (in CPUs the 9th bit is basically the carry flag), but with a modulo-add the carry flag should be ignored (e.g. the result doesn’t need an extra bit) because a wraparound within 8 bits is expected.

Funny enough, the rules for integer arithmetic are much simpler down on the CPU level than up in Zig. Maybe it makes sense to expose the concept of a carry flag to Zig’s integer arithmetic :wink:
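Incidentally, Zig already exposes something carry-like: @addWithOverflow returns the wrapped result together with an overflow bit. A small sketch:

    const a: u8 = 255;
    const b: u8 = 255;
    const res = @addWithOverflow(a, b);
    // res[0] == 254: the wrapped u8 result
    // res[1] == 1:   the overflow bit, i.e. the "carry"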

The problem is simply one of consistency. I understand that most people’s intuition here is 254, as most languages don’t adopt result location semantics. However, once you accept that Zig will adopt result location semantics for most operators in the future and propagate the result type to the operands, the inconsistency of the wraparound operation here becomes a significant dissonance. In practice, cases where the operand and result types of a wraparound operation don’t match should be rare, so explicitly specifying them is acceptable.

I don’t see an inconsistency tbh.

The result of adding two u8 without wraparound is guaranteed to fit into a u9, so the result type is u9:

u8 + u8 => u9


but the result of adding two u8 with wraparound is u8, so the result type is u8:

u8 +% u8 => u8

A wraparound into a u9 simply doesn’t make sense since the MSB will always be zero anyway, and if you allow an additional bit, then adding two u8 values cannot wrap around; the ‘wraparound’ would spill into the additional bit instead.
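Both rules can be emulated with today’s builtins; a small sketch (the u9 here is the proposed result type, not what current Zig infers):

    const a: u8 = 255;
    const b: u8 = 255;
    const wide: u9 = @as(u9, a) + b; // 255 + 255 == 510 always fits into u9
    const wrapped: u8 = a +% b;      // wraps within u8: 254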

FWIW, I see wraparound operations more in the ‘bit-operation corner’ than the ‘arithmetic corner’; for instance, modulo operations should mostly only be applied to unsigned integers.


This is the way most languages think.
Current proposals to propagate the result location type to the operands effectively introduce a new way of thinking: when the result location of an expression is u32, the operand types of the binary expression are inferred to be u32. Consequently, the operands are first converted to the inferred type before the operation is performed.
Proposals #16489 and #24167 both follow this logic. If we accept this logic for some operators, we naturally assume that other operators will follow suit.
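Under that logic, the earlier example would behave as if it were rewritten like this (hypothetical semantics, not what current Zig does):

    const a: u8 = 255;
    const b: u8 = 255;
    const c: u32 = a +% b;
    // ...would be treated as:
    // const c: u32 = @as(u32, a) +% @as(u32, b); // == 510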

IMHO this is a bad proposal then :smiley:

I would expect that a complex expression is split into simple subexpressions, and each subexpression has a result type, which is then carried upward through the expression AST to the root. E.g. if I have something like this:

const a: u2 = 2;
const b: u2 = 3;
const c: u1 = 1;
const d = c + (a + b);


or in types:

? = u1 + (u2 + u2);

I would expect this to resolve in the following steps:

? = u1 + (u2 + u2)
// subexpression u2 + u2 resolves to u3
? = u1 + u3;
// maybe u1 is extended to u3, but that doesn't matter for the result
? = u3 + u3;
// the result type becomes u4
u4 = u3 + u3;


and if the left-hand side already defines a type, that doesn’t matter, since u4 can be assigned to u32 without loss of information… but IMHO this shouldn’t mean that all expression inputs are first promoted to u32 (which wouldn’t work anyway if the left-hand-side type needs to be inferred from the expression result).

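The same walkthrough can be emulated in today’s Zig with explicit widening casts (the uN types are the ones derived in the steps above):

    const a: u2 = 2;
    const b: u2 = 3;
    const c: u1 = 1;
    const inner: u3 = @as(u3, a) + b; // u2 + u2 => u3, value 5
    const d: u4 = @as(u4, c) + inner; // u1 + u3 => u4, value 6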
PS: the interesting part is now, what happens when I try to modulo-add two types of different width… does the wraparound work on the shorter or the longer type? IMHO the longer type should be picked:

? = u1 +% (u2 +% u2);
// subexpression u2 +% u2 wraps within u2, result stays u2
? = u1 +% u2;
// u1 is extended to u2, the wraparound happens in u2
u2 = u2 +% u2;
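As far as I know, today’s peer type resolution already picks the wider type for mixed-width wrapping operands, so this can be tried out directly:

    const x: u1 = 1;
    const y: u2 = 3;
    const z = x +% y; // peer type is u2: 1 +% 3 wraps to 0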

Honestly, I don’t dislike your paradigm at all. It’s the automatic type deduction paradigm, and it’s the paradigm Rust uses.
But what bothers me most about Zig right now is that it currently has two distinct paradigms: the result type paradigm and the automatic type deduction paradigm. Most current Zig proposals encourage the result type paradigm, not the automatic type deduction paradigm. Furthermore, most existing proposals attempt to enable backward deduction of existing expressions based on the result type, rather than forward automatic type deduction.
While I prefer the automatic type deduction paradigm, I’d prefer a consistent paradigm across all Zig operators. If Zig advocates for the result type paradigm for some operators, I’d like all operators to follow it. This way, I can have appropriate expectations about operator behavior without having to remember inconsistent behavior across different operators.


PS: of course my proposal also means that the type of c is u65 here :smiley:

const a: u64 = ...;
const b: u64 = ...;
const c = a + b;


that’s why it would make more sense to use u63 as the ‘natural word size’ unsigned integer type, e.g. usize should be changed from 64 to 63 bits (or maybe better: dropped completely from the language, since what would be the result type of adding two usizes? :stuck_out_tongue_winking_eye:)

This specific problem could be solved with the proposed ranged-integer-types which could more precisely infer the required result type (and might prevent the result type from always being one bit wider).

An interesting side effect of this ‘expression result type inference’ is that an unsigned add can never overflow, since the result type will always have enough room to hold the result - but at the cost of potentially spilling into the next 8/16/32/64/… bit wide type.

But in any case, the more I think about it, the less a hardwired-width usize type makes sense.

I think you might like the idea of combining this with that proposal.
However, Zig is unlikely to move in this direction, preferring to explicitly specify the type of the result value rather than automatically inferring it.

Yeah I’m aware of the proposal and I think it makes a lot of sense.

But IMHO ranged integer types would fit even better with inferring the type of subexpressions to a type wide enough to hold the subexpression result, instead of promoting all subexpressions to some arbitrary ‘left-hand-side type’.

PS: another obvious downside of my simplistic “width-extension” idea is that each subexpression needs to add one bit to the result type (ignoring multiplication here for now)… this would need to happen because without comptime-known inputs the compiler needs to expect that each expression input might be MAX_INT - 1 (where each uN type has its own MAX_INT of course), so it always needs to add one bit to make room for the result.

With ranged-integer-types this explosion could be tamed a bit but not entirely prevented (because the upper bound for a ranged type could be smaller than the MAX_INT - 1 for a 2^N type).

The advantage of such a ‘growing result type’ is that in turn a lot of overflow checks for subexpressions could be removed.

But once multiplication comes into play this idea would quickly run out of ‘hardware bits’ (since the result type would not grow by just 1 bit, but by the sum of the input widths, e.g. u8 * u8 => u16 etc…).

Another way to tame the bit explosion is to use a type provided on the left-hand side as an upper limit:

const a: u32 = ...;
const b: u32 = ...;
const c: u32 = a * b;


without the explicit type or comptime-known inputs, c would be inferred as u64 since that’s needed to hold a ‘worst case result’, but since the explicit type is provided, the compiler now knows that no subexpression may go beyond 32 bits (which then of course requires adding overflow checks for subexpressions that may produce a value that doesn’t fit into 32 bits).
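One possible lowering with today’s builtins would be to do the math in the worst-case type and runtime-check the narrowing at the end; a sketch, not what the compiler currently does:

    fn mulChecked(a: u32, b: u32) u32 {
        const wide = @as(u64, a) * b; // worst-case width, cannot overflow
        return @intCast(wide);        // safety-checked narrowing back to u32
    }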


E.g. maybe a combination of inferred subexpression types and an ‘upper bound’ result type makes the most sense? I.e. subexpression result types would grow to hold the subexpression result, but with an optional upper bound for all subexpressions if the left-hand side has an explicit type.

Without an ‘upper bound type’ it’s still ‘dangerous’ (in terms of performance expectations) though, because if all inputs are u64 here:

const g = a * b * c * d * e * f;


the inferred type for g would need to be u384:

? = u128 * u64 * u64 * u64 * u64;
? = u192 * u64 * u64 * u64;
? = u256 * u64 * u64;
? = u320 * u64;
? = u384;


but I guess that’s sort of expected when fully embracing ‘variable width integer arithmetic’ :wink:
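FWIW that worst-case chain can already be spelled out manually, since Zig supports arbitrary-width integers; a sketch following the steps above:

    fn mul6(a: u64, b: u64, c: u64, d: u64, e: u64, f: u64) u384 {
        const ab = @as(u128, a) * b;
        const abc = @as(u192, ab) * c;
        const abcd = @as(u256, abc) * d;
        const abcde = @as(u320, abcd) * e;
        return @as(u384, abcde) * f;
    }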

A common arithmetic scenario involves values computed in runtime loops. In general, I think runtime overflow is unavoidable there, and the scenarios where compile-time type extension helps are very limited.


I find it quite jarring that to understand what var c: u32 = a +% b; does, you need to know the types of a and b.
For me this operation should work in u32.
BTW if you compile in ReleaseFast the code will return the expected value 510. So Zig is not picking the more hardware-friendly computation.

It also becomes a footgun. If someone changes the type of ‘a’ to u16, then the behavior changes. This is not obvious to detect if ‘a’ is the output of a function and not explicitly typed.
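That footgun is easy to demonstrate with current semantics:

    const b: u8 = 255;
    const a8: u8 = 255;
    const a16: u16 = 255;
    const c1: u32 = a8 +% b;  // operands wrap in u8     => 254
    const c2: u32 = a16 +% b; // peer type is u16, no wrap => 510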

If that really happens, it would be a compiler bug under current semantics. Could you please share your full code that reproduces this?

It seems you get a similar class of footguns when doing the calculation based on the result type:
if someone changes the type of c to u8 then the behavior changes.
This is not obvious to detect if this is the input of a function and not an explicitly named variable.

It also becomes a footgun. If someone changes the type of ‘a’ to u16, then the behavior changes.

The alternative is C-style integer promotion which is just a different bucket of footguns.

BTW if you compile in ReleaseFast the code will return the expected value 510.

Hmm, I can’t reproduce that here (and if it did happen, that would clearly be a bug), e.g. this code:

pub fn main() void {
    const a: u8 = 255;
    const b: u8 = 255;
    const c: u32 = a +% b;
    @import("std").debug.print("c: {}\n", .{c});
}


prints the same wraparound result (254) both with zig build-exe bla.zig and with zig build-exe -OReleaseFast bla.zig.

Did you maybe test with a non-wrapping + instead of wrapping +%?

You’re right, it was with a plain +.

My point stands though.

C can’t do what I suggested because there is no propagation of the result type.

In C++ it wouldn’t work either because ‘+’ can be overridden.