Arithmetic Compatibility

I’ve had this post on ice for, well, awhile now. Zig 0.16 has taken some steps in this direction, which shows, I hope, a willingness to consider taking things further.

Without further ado:

Arithmetic Compatibility

Since Andrew had encouraging things to say about this general idea, it seems worthwhile to brainstorm a bit about how it can work. Especially after 0.16 comes with several improvements aimed squarely at less-painful arithmetic.

Zig’s casting rule is very simple, that’s good, and for casting it works just fine. Empirically, the way it’s used in arithmetic expressions is not ergonomic, this is an acknowledged pain point.

Any change needs all these things:

  • The rule must also be simple. It’s not possible to be simpler than the casting rule, so it won’t be, but it should be as simple as possible given that.

  • It must not make the language less safe. The casting rule imposes friction on downcasts, this encourages the use of wide-enough types, and biases code toward unsigned values for indices, both of which are, broadly speaking, good.

  • It must work for arbitrary integer ranges. That issue hasn’t been accepted, but it has support on the core team, and well, I for one hope we get it. So if a change to how this works helps make that possible, that’s good.

I think what follows meets all those criteria. We introduce a rule called arithmetic compatibility, that’s arithMETic, not aRITHmetic: adjective, not noun. It’s not a cast or a promotion, it’s a rule which determines when the combination of two operations and the result location are mutually compatible.

This is the rule: arithmetic is allowed when the image of the operation has less than, or equal to, the number of invalid results as if the operands were in the domain of the result type. Concretely, for C = A • B, we ask: if A and B are the same type as C, what percentage of results of the operation are invalid? Then we ask again for the actual types of A and B, if the result is greater, the operation is not allowed, if it’s equal or less, the operation is allowed.

The rule is very simple, but, it is abstract and this is somewhat demanding on the user. But it does have to be, in order for it to work with Zig’s unusually wide variation in integer types, and to have a prayer of working with integer ranges.

I will now illustrate some of the more common results of this rule. I’ll stick with operands of the same bitwidth, which is the most important case. Don’t ignore that there’s a result location here, that’s important.

  • uN += iN: Allowed, as is -. Precisely 75% of results of signed-unsigned addition and subtraction fit into the unsigned result type, with both unsigned, it’s 50% + 1.

  • uN *= iN: Not allowed. Any negative value of iN gives an invalid result, and this makes the domain smaller than unsigned multiplication. If the signed value is small enough, the rule allows it: I’m not sure that it should. A negative value is never legal here, and it doesn’t seem like eliding the cast for this operation is fulfilling the obligation to keep things at least as safe as they are now. But this would complicate the rule.

  • iN += uN: Not allowed. This turns out to have the same analysis as uN += iN, for nontrivial reasons.

  • fN •= (i|u)M for any M: allowed. Our result location is a float, we accept that this comes with loss of precision in exchange for a very, very large range with saturation at the high and low ends, and underflow to zero for division. Note that this operation cannot panic under any circumstances, even if is / and the integer is 0, so we’re emphatically not doing anything unsafe here.

A nice thing about this rule is that it’s calculable. For one thing, that’s essential in order for the compiler to determine what to do, for another, we can give a clear error message even with arbitrary integer ranges.

Another advantage of considering the result location is we could have more cases of @as(T, a + b) and less of the dreaded @as(T, @intCast(a)) + @as(T, @intCast(b)). I will say: Rust code requires more a as T type code, but it’s much lighter on the page. This alone would make me miss that less when I’m writing Zig.

A caveat: this rule doesn’t work for shifting, but, shifting is not arithmetic. I propose no changes to how shifting currently works in Zig. It’s also basically meaningless for bit masking, which is also working just fine as it is.

Indexing: where I finally came down on this is that it’s a cast, and so the current rule holds. I bring it up simply because of the fact that a usize is almost always absurdly larger than the actual legal range of an index, and if you think about it that way: a [5]T has five usize values which are legal, and five isize values which are legal. An @intCast on a number does nothing in unsafe modes, and safe modes have bounds checking, so it’s kind of double-dipping.

But, I do think it’s good that this encourages the use of unsigned values for index types, and I expect that the arithmetic compatibility rule will ease the pain of working with offset types / deltas enough that relaxing this constraint wouldn’t matter in the end. Right now, the choice is between clearer signed arithmetic with a cast for every index, or a bunch of muddy and visually-heavy casting to keep an index unsigned, but then indexing doesn’t need a cast. This would remove that dichotomy, and actually strengthen the use of unsigned index types.

I could be persuaded otherwise. The “at least as safe” precept mostly holds; not if we consider cases like i8 vs u8 for a [256]T, but there’s no restriction on usize and below for that array in current Zig either, and “less safe” only kicks in here with valid positive indices which the signed type under consideration cannot represent.

It’s more conservative to see how much it matters after introducing arithmetic compatibility. Given where I’ve felt the pain in writing this kind of code, I suspect it will be much less worth addressing.

Simplification

This rule is elegant, and simple in that sense. I considered tripling the size of this post by brain-dumping all the number theory of how to improve on a brute-force calculation, but: really this is only necessary for arbitrary ranges. It’s not clear that we’re going to have those, so I’ll just note that it’s fast for addition and subtraction (constant time), and there are ways to memorize results from multiplication and division, such that the answer can be interpolated correctly.

For the numbers we do have, there are only a few rules here:

  • Integers are automatically promoted to float when the result location and other operand are floats. Zig 0.16 has that already for casts: intolerant of loss of precision, which is correct for casts, but the premise of arithmetic compatibility is that float *= usize has a float result type, so the code has ‘opted in’ to the usual tradeoffs found in float calculation.

  • When the result location is unsigned, addition and subtraction may have either operand signed or unsigned, so long as both have equal or lesser bit width to the result location. Multiplication and division must have both operands unsigned. This is done through ‘natural’ arithmetic in two’s complement, safety checks confirm the result is >= the left operand for addition, and <= the left operand for subtraction.

  • When the result location is signed, any unsigned participants must be of lesser bit width than the result location, for all four arithmetic operations.

This summary isn’t quite a mere consequence of applying the rule, specifically it disallows any signed participants in multiplication and division for an unsigned result. I think that’s better in the absence of arbitrary ranged types, that’s where things get complex and a general rule is needed. For a system like we have, which is purely power-of-two based, you can modify the rule and say that, rather than comparing to the result location, compare as though each operand has its bit width, but the sign of the result location, and then you get exactly as sketched above. You can see why that doesn’t work for arbitrary ranges, because sign is not a central fact about a range in the same way as it is for power-of-two arithmetic.

Bonus Round

While I’m here, I would like to enter a plea for signed division. I do agree with the logic which removed it from the language: popular languages disagree about how to handle it, this makes things inherently ambiguous, taking a side arbitrarily is not the correct resolution. The “/ is not valid with signed operands” part, I’m in.

But punting to builtins is also bad, frankly. It’s just weird, division is an operator and using a function for it is, well, it’s unheard of, if we don’t count Lisp. We should be wary of arguments from popularity, but look: a lot of code, especially numeric code, gets translated from somewhere else, and making division a function screws up operator precedence in a major way. I have personally spent a few hours tracking down a bug caused by this, and I didn’t like it. Algebra is a language, and this makes it unduly hard to read arithmetic operations in Zig which employ signed division.

If I told you there’s a language where you can’t use / for signed divison, because it’s ambiguous whether truncation or flooring is intended, and therefore, you have to use either /- or /%, would you be at all confused about which one does what? I hope not! Zig already uses composite operators to handle several tricky corners of overflow-able arithmetic on fixed-sized values, to great effect I might add, and I hold to the opinion that adding these would have an improving effect.

This is the once-buggy line I had to fix:

bin_mid = @divTrunc(bin_max - bin_min, 2) + bin_min;

Binary search, it’s hardly a recondite operation.

I had transcribed it thus:

bin_mid = bin_max - @divTrunc(bin_min, 2) + bin_min;

Like every mistake, it’s obvious: in retrospect. What I want you to notice is that the precedence did, in fact, change. That kind of thing has consequences.

Compare:

bin_mid = (bin_max - bin_min) /% 2 + bin_min;

YMMV, but I find this much easier to read and understand.

I can make the same case for @rem and @mod, but not as confidently. The problem there is that % is itself ambiguous, and tacking another operator onto it can’t fix it. That said, once the user knows about /- and /%, learning what %- and %% do is a matter of analogy, which is something we’re pretty good at, as a general rule.

I come away from the Wikipedia article on modulo with the impression that describing truncated remainder as “remainder” and floored remainder as modulus is clear for contextual social reasons, but not because it faithfully reflects how mathematics deals with the same ambiguity: a remainder is a modulus, the several possibilities are handled with noun phrases, not by drawing an artificial distinction between the one and the other.

The relevance of that observation being that “floored remainder” and “truncated remainder” are fully cognate with their equivalent terms in addition, and both results are considered a modulus. So maybe we can handle symbolic definitions for those operators as well.

// Before
(@divFloor(a, b) * b) + @mod(a, b) == a
// After
(a /- b * b) + @mod(a, b) == a;
// Full change:
(a /- b * b) + a %- b == a;

// Before
(@divTrunc(a, b) * b) + @rem(a, b) == a
// After
a /% b * b + @rem(a, b) == a;
// Full change:
a /% b * b + a %% b == a;

One with redundant parens (like the original), one without. I mean redundant in terms of the result, I personally favor parens for anything more complicated than the above, meaning I wouldn’t use them for this but reasonable people can disagree which is clearer.

I’d even hazard that adding the third-choice lines to the documentation would go a long way toward fixing the meaning in users’ minds.

Even in the somewhat artifical context of documenting the meaning of those two builtins, I find the latter two easier to comprehend.

Readability

These things aren’t connected except insofar as the intention is to make Zig code using signed values easier to read. I consider that goal valuable. There’s a balance to be struck between

  • Communicate intent precisely

And

  • Favor reading code

I believe that the amount of casting in status quo, as well as handling division by function builtins, is worse for both of these goals. A systems language in which numerics-heavy code appears unfamiliar and must be heavily punctuated with casts is alienating one of its core audiences. I think it would be great, by the way, if the core language supported complex numbers and matrices. But one thing at a time.

21 Likes

I’ll read this again tonight but I like the signed-division/operator part only because for me that is a real readability problem (in my own world). The binary search example makes that point pretty well: replacing / with builtins can change the visual eyeball structure and precedence of arithmetic in a way that invites bugs.

Intersting post!

1 Like

Completely appreciate and agree with this. In many cases, thinking about overflow is already required; there’s no hiding from it. Thinking about it wrt/ a rule that frees you to proceed with unencumbered arithmetic, after making conclusions about your types, seems very reasonable.

But my brain isn’t working quite right, it seems. Help me understand a basic example from your first bullet:

So, for instance: u4’s range is 0 to 15. i4’s is -8 to 7. By “percent”, do you mean the simple “possible outcomes”, or the combinatorial possibilities? If simple outcomes, my simpleton look at this one says that the smallest possibility is -8 (0 + -8) and the largest possibility is 22 (15 + 7); that gives me 8 possibilities below the RL (-1 through -8) and 7 above (16 through 22). That would be 15 out of range. With 15 in range, I’d call it a 50%er, which would still meet the criteria (both unsigned would give a range of 0 to 30, with 16-30 all being out of range), but not with as much margin. Anyway, I’m doing something wrong here, or you’re talking about combinatorial possibilities, and I haven’t bothered my brain to work that one out yet….

2 Likes

u4 + u4 has a total of 256 possible inputs, of which 129 do not overflow. The + 1 is because 0 adds to everything.

u4 + i4 has the same total number of possibilities, of which 192 are within the range.

3 Likes

I read it again. I like the readable operator for signed division it would work well and be useful. Pretty clever. My question: It seems likle the arithmetic legality depend on the result location not just the operand(s)? Does that mean the mixed expression can become legal or illegal depending on what it’s being assigned or cast to?

1 Like

intuitively, this should be the case imo. To pick a somewhat extreme example to demonstrate:

u8 * i8 shouldn’t be legal to fit into an u8 without a cast… but should fit into an i32 just fine!

2 Likes

As long as its boringly obvious (i need things like that lol). Sometimes I get all goofed up regarding whether or not i’m for or against moving complexity from the code into the language. But agreed with the OP its it’s solving a real pain in the a thing.

1 Like

Got it. To clarify, (2^(bitwidth))^2 offers the full set of permutations (including duplicates); overflow count calculated depending on the scenario. (As opposed to my simpleton analysis of outcomes-only). That’s reasonable, and a meaningful metric for the decision. All very welcome. Any more discussion on the intersection with integer ranges, or is that only the stuff of your “triple-sized brain-dump post”… for later?

1 Like

I find this quite a good idea but would maybe even extent it to addition. When converting anything to float a possible loss of precision is expected and everything should have this is the back of their mind anyway. When you then possibly go back from float to integer you need to decide what to do with that loss of precision - round, truncate, whatever.

The thing then is, that then one can quite easily write a user-space function for only allowing casts that don’t lose precision to replicate the 0.16 behaviour. An this is then possible both at comptime with a @compileErr and at runtime by just returning an error.

1 Like

My fear is that with the rule of number of invalid results is somewhat difficult to do from the programmer’s perspective, you will have to do calculations in your head to know if your code is legal or not. What If instead of that the rule in the compiler was something simpler and easier to remember that achieved a similar effect most of the time?

In the binary search example you could use /2 because 2 is comptime_int.

pub fn main() void {
    const a: i32 = 10;
    const b: i32 = -13;
    const c: i32 = (a + b) / 2;
    std.debug.print("{}\n", .{c});
}

Personally I am a big fan of +% and +| operators that were introduced for different “flavors” of addition, and I think something like that for the division like you mentioned would be great to use. Perhaps a different ergonomic operator could be introduced for signed to unsigned conversions in particular.

1 Like

I imagined a tool. It should be pretty easy to engage a tool whose sole purpose is for this. In fact, I’m guessing it might be pretty easy(?) to build a needed int-size, based on anticipated operations and needed maximums, with something like math.IntFittingRange() (“like” meaning: including new functions that might be written for the purpose). In other words - a built-in tool that you use to construct your int type sizes, if that suits you, even instead of doing your analysis and then hard-coding your type (perhaps with a comment saying how you got there). Built-in type-width calculation could also flex, then, when your needs changed and you anticipated the need for a wider range… or when runtime overflows indicated that you’re going out of range.

Brain, tool, or inline-tool - it would take a little work, sure, but… for the tradeoff? Zig is all about the low-level shaping, for its advantages, and the goal of legible mid-logic code (possibly at the expense of more complex setup code) is a reasonable one.

1 Like

I’ve made this. It’s not builtin but it’s been interesting to make.

1 Like

Yes, this is the bit that makes it all hang together. Arithmetic compatibility is two operands, an operator, and a result location. Specifically:

 iN += uN

Is not allowed,

uN += iN

Is allowed. Writing this in abstracted form using = is.. it’s easier to do it this way.

That’s my fault, I was using as “any of four arithmetic operators”, but should have said that. There are contexts where that could be implicit, this isn’t one of them.

The compiler will just tell you that part, what I’m aiming to get right here is: we get more arithmetic which is allowed, but not by making any operation less safe.

It’s all a bit galaxy-brained because I was thinking about the integer ranges proposal, which means, unique types across two arbitrary integers: and this rule scales to that, but, it simply has to be abstract for that to work.

Personally, I would rather have an operator like u8 + u8 create a result of type u9, and u8 * u8 have a result of u16. Then they are fully legal for their range of inputs. Modulo and saturating versions are already fully defined so they can be left alone. Basically I’d like to have to tell the compiler what to do in the overflow case. Store it, wrap it, or clip it.

I know this would be deeply unpopular though because it breaks 50+ years of C loose arithmetic.

What really bakes my noodle though is that u9 = u8 + u8 does not preserve any carry-bit. You have to promote one of the operands first. Similar with multiplication.

6 Likes

I wonder if you’d like that in practice. I’ve had similar thoughts and concluded: no, it would get in my way despite being “the right thing”.

Although for integer ranges it’s nearly essential, or, something like it is.

This is latent in the idea of involving the result location in the resolution of arithmetic, but I left it latent for a reason. Way too much judgement call, and leaning into this philosophy hard enough veers toward, as you say,

I do not want the compiler trying to figure out what I mean. It’s ‘obvious’ for simple demonstrations but it quickly becomes less so.

I’ve mulled this one over too, and I can’t come up with a rule for it that really works. Maybe someone will have better luck.

Hmnnn… like:

// y and z are u8s
const x = y + z; // @TypeOf(x) == u9

?

Would you still be able to:

const x: u10 = y + z;

?

Presumably that’s essential…

I don’t understand this one…

In u9 = u8 + u8 the ‘carry bit’ is the topmost bit of the u9 result, isn’t it? E.g. in CPUs, the carry flag is simply the ‘missing’ topmost result bit because the result register usually has the same width as the operand registers, so that ‘extra’ result bit must live somewhere else, and that’s the carry flag.

In general I like the idea of automatically extending the result to the required width, but it quickly explodes, e.g. consider:

u160 = u32 * u32 * u32 * u32 * u32

…e.g. in multiplication, the required result bits quickly add up, but those result bits are only really used when the operand values actually use their entire bit-range - and those ‘wide multiplications’ can’t be done efficiently down on the CPU.

PS: ah I see what you mean: trying to add two u8’s into an u9 currently creates an overflow error:

const std = @import("std");

pub fn main() void {
    const a: u8 = 255;
    const b: u8 = 2;
    const c: u9 = a + b;
    std.debug.print("c: {}\n", .{c});
}
bla.zig:6:21: error: overflow of integer type 'u8' with value '257'
    const c: u9 = a + b;
                  ~~^~~

…that current behaviour is definitely unintuitive… the input types should ‘somehow’ be promoted to the result type… (question is what should happen when the result type needs to be inferred - use the widest input type? or the ‘required bits’ to hold the result type without overflow?).

4 Likes

It would have to be the former, wouldn’t it? Auto-widening just seems like it would spell trouble. I’m all for a tool that helps you calculate, and explicitly declare, or even a comptime tool to calculate width, because then it’s visible that that’s what you really want, but I can’t see how magic RL bitwidth deduction can avoid trouble. I don’t see how it could be clear to readers of the code.

I like @mnemnion‘s original proposal: just a rule dictating whether an arithmetic operation is allowed.

I’m less convinced about the bonus round. Yet, anyway. But I could just like the clarity of @divFloor and company, and if I stare at the “before and afters” long enough, I do feel the gravity of those hieroglyphic operators. I’d just have to look them up for quite awhile. :slight_smile:

2 Likes

I think this is the most egregious problem with Zig ATM. The inserted “safety check” here actually can crash programs that are totally valid.

Recently I had a 2D index stored in a struct as u8 and converting the indices into an offset was in fact doing the math in u8 and panicking.

I believe this goes against Zig philosophy :

  • it punishes you for using small ints to store data (DoD)
  • it uses hidden control flow to insert panics
  • it doesn’t reflect what hardware is doing (computing in u64)
3 Likes