Adding a signed integer to an unsigned integer

kj4tmp · August 29, 2024, 5:30am

Forgive me for such a simple question, but why can’t I do this?

pub fn seek(address: u16, amount: i16) u16 {
    return address +% amount;
}

test {
    std.testing.expectEqual(@as(u16, 3), seek(0, 3));
}

jeff@jeff-debian:~/repos/zecm$ zig test src/sii.zig 
src/sii.zig:538:20: error: incompatible types: 'u16' and 'i16'
    return address +% amount;
           ~~~~~~~~^~~~~~~~~
src/sii.zig:538:12: note: type 'u16' here
    return address +% amount;
           ^~~~~~~
src/sii.zig:538:23: note: type 'i16' here
    return address +% amount;

I see a similar example in

github.com/ziglang/zig/blob/master/lib/std/io/fixed_buffer_stream.zig#L78

    if (n == 0) return error.NoSpaceLeft;

    return n;
}

pub fn seekTo(self: *Self, pos: u64) SeekError!void {
    self.pos = @min(std.math.lossyCast(usize, pos), self.buffer.len);
}

pub fn seekBy(self: *Self, amt: i64) SeekError!void {
    if (amt < 0) {
        const abs_amt = @abs(amt);
        const abs_amt_usize = std.math.cast(usize, abs_amt) orelse std.math.maxInt(usize);
        if (abs_amt_usize > self.pos) {
            self.pos = 0;
        } else {
            self.pos -= abs_amt_usize;
        }
    } else {
        const amt_usize = std.math.cast(usize, amt) orelse std.math.maxInt(usize);

but this seems like a lot of code to write for this.

AndrewCodeDev · August 29, 2024, 6:11am

It may be worth mentioning that abs_amt = @abs(amt) is unsigned: @abs returns an unsigned integer when given a signed one. See Documentation - The Zig Programming Language.

One reason is that the range of a signed integer is asymmetric, for example [-128, 127] for i8. @abs can’t map -128 to +128 within an i8 because that overflows by one. Returning an unsigned integer of the same bit width frees up what was the sign bit, which more than compensates for that.
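A small sketch of that point, using a hypothetical helper named magnitude (not from the thread): @abs on an i8 yields a u8, so even the lowest i8 value has a representable result.

```zig
const std = @import("std");

// Hypothetical helper for illustration: the result type of @abs on an i8 is u8.
fn magnitude(x: i8) u8 {
    return @abs(x);
}

test "@abs maps the lowest i8 to 128 without overflow" {
    try std.testing.expect(@TypeOf(@abs(@as(i8, -1))) == u8);
    try std.testing.expectEqual(@as(u8, 128), magnitude(-128));
    try std.testing.expectEqual(@as(u8, 127), magnitude(127));
}
```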

If you need to go from unsigned to signed and always guarantee that the cast will work, the unsigned type needs at least one bit fewer than the signed target:

u7 can safely cast to i8
u31 can safely cast to i32
etc...

Then use @intCast to widen the unsigned value to the signed target type and do your arithmetic that way.
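A minimal sketch of that widening approach, using the u7 and i8 pair from the list. The function name step is hypothetical. As far as I know, the u7 to i8 direction coerces implicitly because every u7 value fits in an i8, so the only explicit cast left is the narrowing one at the end.

```zig
const std = @import("std");

// Hypothetical helper: move a u7 position by a signed i8 delta.
fn step(pos: u7, delta: i8) u7 {
    // Every u7 value (0..127) fits in i8, so this is a plain coercion.
    const signed_pos: i8 = pos;
    // Widen once more so the sum itself cannot overflow: i8 + i8 always fits in i16.
    const sum = @as(i16, signed_pos) + delta;
    // Safety-checked narrowing: panics in safe builds if sum is outside 0..127.
    return @intCast(sum);
}

test "step with positive and negative deltas" {
    try std.testing.expectEqual(@as(u7, 70), step(100, -30));
    try std.testing.expectEqual(@as(u7, 127), step(0, 127));
}
```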

As you’re noticing, Zig makes it painful (not terribly, but noticeable) on purpose. There are a lot of issues with mixing lanes and Zig makes those apparent.

Sze · August 29, 2024, 11:16am

related topic:

mnemnion · August 29, 2024, 2:51pm

So C has a bunch of implicit conversion rules. It’s supposed to make arithmetic expressions easy to write, and it does do that. But it’s a notorious source of bugs. The actual rules are quite complex, the results can be surprising, and it’s far too easy to hit undefined behavior.

Zig made a different choice. It has exactly one rule:

Type coercions are only allowed when it is completely unambiguous how to get from one type to another, and the transformation is guaranteed to be safe.

This is much better, and also it can be extremely annoying. You’ve hit the case I find most annoying: addition or subtraction between ‘peer’ signed and unsigned types.

This is legal:

fn unsignedMinus(a: usize, b: usize) usize {
    return a - b;
}

But this is not:

fn signedUnsignedPlus(a: usize, b: isize) usize {
    return a + b;
}

That’s despite the fact that both of these functions hold the same hazard: the returned value might be negative. The second one also poses a risk of overflow, but of course it’s easy to run that risk with two usize as well.

Just because it’s annoying doesn’t mean I disagree with it. Adding more rules to make writing code more ergonomic adds back some of the complexity we’re trying to get away from.

But I have this function in probably a majority of my libraries:

inline fn cast(T: type, v: anytype) T {
    return @intCast(v);
}

I think this is central enough to doing useful things with integer values in Zig that it should probably be a builtin. Before result location semantics, @intCast used to take two arguments, and this is exactly how it worked. Writing @as(usize, @intCast(v)) is very heavyweight and ends up obscuring the equation. Arithmetic bugs aren’t type conversion bugs, but they’re still bugs.
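For illustration, a sketch of how such a helper might be used for the signed-plus-unsigned case. The cast function mirrors the one above, with comptime spelled out on the type parameter. signedUnsignedPlus is the rejected function rewritten with it. Both conversions are safety-checked, so this assumes that a fits in isize and that the sum is non-negative.

```zig
const std = @import("std");

inline fn cast(comptime T: type, v: anytype) T {
    return @intCast(v);
}

// Sketch: convert to a common signed type, add, convert back.
// Panics in safe builds if a > maxInt(isize) or if the sum is negative.
fn signedUnsignedPlus(a: usize, b: isize) usize {
    return cast(usize, cast(isize, a) + b);
}

test "signed offset applied to an unsigned value" {
    try std.testing.expectEqual(@as(usize, 7), signedUnsignedPlus(10, -3));
    try std.testing.expectEqual(@as(usize, 13), signedUnsignedPlus(10, 3));
}
```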

jibal · November 14, 2024, 8:51am

This is extremely difficult to get right and very hard to tell whether the code is right or wrong by reading it, and therefore is extremely error-prone … the opposite of the intent.

Note that if foo has an unsigned type then foo - 3 compiles but foo + -3 doesn’t, because Zig’s overly simplistic rules require -3 to be converted to foo’s type, but it can’t be because Zig views signed and unsigned integer types as being incommensurate, when in fact they aren’t. Even if the signed value is a smaller type they can’t be added … as far as Zig is concerned, you can no more add signed and unsigned numbers than you can add numbers and structs.

One might be inclined to try to use @bitCast, which is almost certain to introduce a bug.

I know that people tend to rationalize every Zig design decision but this one is clearly wrong … the language requires a conversion and disallows the conversion.

A simple rule would be to allow the arithmetic but require that the type of the result is specified – sometimes one wants to add a positive increment to a signed value, resulting in a signed value, and sometimes one wants to add a signed offset to an unsigned value, resulting in an unsigned value. C’s mistake is to always make the result unsigned, and Zig’s mistake is to disallow the arithmetic altogether.

Zig has overflow detection (in safe builds) and should depend on that rather than trying to prevent something at comptime that it makes no sense to prevent. After all, adding two values of the same type is allowed but can overflow … unless you use +% or +| … but you can’t even use those between signed and unsigned values, which makes no sense since the result is well-defined.

Calder-Ty · November 14, 2024, 11:45am

Please refrain from copy/pasting LLM output, especially as a means of providing an authority for your argument. The forum is meant to be a discussion between people.

Sze · November 14, 2024, 2:52pm

I think you should use more paragraphs, having it as one continuous wall of text almost made me stop reading it and after reading it, it is still difficult to focus in on specific points. That also contributes to me not feeling like I really have a good understanding of the point you are trying to make.

Maybe you are onto a good idea with specifying the result type, but I haven’t spent enough time thinking about the details of this topic, to be sure if I completely agree.

Currently I am relatively fine with requiring explicit casts, it can be annoying at times, but after a while you create functions for the things that are needed repeatedly and then you can create tests for those functions to be sure there aren’t any weird boundary/overflow issues.
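As one possible shape for such a helper, here is a sketch for the u16/i16 case from the original question. The name seekChecked is hypothetical. It widens both operands to i32, which can hold every u16 and every i16, and uses std.math.cast to report an out-of-range result as null instead of wrapping or panicking. The tests pin down the boundaries.

```zig
const std = @import("std");

// Hypothetical helper: returns null when address + amount does not fit in u16.
fn seekChecked(address: u16, amount: i16) ?u16 {
    // i32 holds every u16 and every i16, so this addition cannot overflow.
    const sum = @as(i32, address) + amount;
    return std.math.cast(u16, sum);
}

test "seekChecked boundaries" {
    try std.testing.expectEqual(@as(?u16, 3), seekChecked(0, 3));
    try std.testing.expectEqual(@as(?u16, 0), seekChecked(3, -3));
    try std.testing.expectEqual(@as(?u16, null), seekChecked(0, -1));
    try std.testing.expectEqual(@as(?u16, null), seekChecked(65535, 1));
}
```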

mnemnion · November 14, 2024, 3:44pm

I removed the LLM post, just like I would remove a copy/paste of the first page of results from a search engine.

Please, everyone: think carefully before sharing chatbot content here. It’s not a blanket ban (e.g. you’re working on some code with a chatbot, you need to share that code as part of a thread, ok) but we’re here to share our own thoughts primarily, and LLMs don’t think. They’re also notoriously sycophantic and will back you up on almost anything, it’s noise.

As far as this goes, I agree with you. For a u type and an i type (in that order), where the u result type is at least as wide as the u operand and the i operand is no wider than the u operand, it’s actually safer to add and subtract those types than it is to add and subtract two u of that width. Safer in the sense that a larger share of the possible operand pairs produce a result that fits the result type (no overflow or underflow) if one of them is signed. For i and u of equal width, it turns out to be 3/4 of the pairs, and only just over 1/2 for two u, so this is not a small difference.
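Those fractions can be checked by brute force for 8-bit operands. The sketch below covers addition only, and the function names are hypothetical. It counts how many of the 65536 operand pairs give a sum that fits in u8.

```zig
const std = @import("std");

// Counts pairs (u: u8, i: i8) whose sum fits in u8.
fn countMixedFits() u32 {
    var count: u32 = 0;
    var u: i32 = 0;
    while (u <= 255) : (u += 1) {
        var i: i32 = -128;
        while (i <= 127) : (i += 1) {
            const sum = u + i;
            if (sum >= 0 and sum <= 255) count += 1;
        }
    }
    return count;
}

// Counts pairs (a: u8, b: u8) whose sum fits in u8.
fn countUnsignedFits() u32 {
    var count: u32 = 0;
    var a: u32 = 0;
    while (a <= 255) : (a += 1) {
        var b: u32 = 0;
        while (b <= 255) : (b += 1) {
            if (a + b <= 255) count += 1;
        }
    }
    return count;
}

test "share of 8-bit operand pairs that fit the u8 result" {
    // 49152 / 65536 is exactly 3/4.
    try std.testing.expectEqual(@as(u32, 49152), countMixedFits());
    // 32896 / 65536 is one half (32768) plus 128 pairs.
    try std.testing.expectEqual(@as(u32, 32896), countUnsignedFits());
}
```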

What we’d get in particular is a much cleaner and easier to read expression of this pattern:

unsigned_index += signed_offset_of_index;

Instead of this mouthful which I have many copies of in one of my projects:

idx = @intCast(@as(isize, idx) + op.label);

YMMV but I see the difference between these two as pure added noise.

The problem, and it is a problem, is that this replaces a very simple rule with a more complex rule. Note that I didn’t say complicated, and I do think that it would be worth the extra complexity, but a) it’s not clear that I would win that argument and b) it seems like a poor time to bring it up in the evolution of the language. Soon, perhaps, but not now.

I’m certainly not one to defend every last decision in the language, but I’m a staunch advocate of the philosophy which guides those decisions. If anyone wants to add complexity to Zig, they’re going to need to justify it in terms of: simplicity elsewhere, correctness, some positive result which is impossible or unduly difficult without a language feature, and so on.

I think adding an arithmetic compatibility rule would clear that bar by virtue of those first two terms. But if the language team accepted every proposal for ‘just this one little feature’ without pushback, it wouldn’t be Zig anymore.

kj4tmp · November 14, 2024, 4:57pm

I’m still a bit confused. Could someone perhaps explain it as a list of operations?

Take the example of:

const num1: u8 = 1;
const num2: i8 = 2;
const sum: u8 = num1 + num2;

Something like this?

Compiler sees + operator and starts peer type resolution between num1 and num2.
Peer type resolution results in …? And -128 of num2 is incompatible?
Compiler gives error of incompatible types.

Why doesn’t the compiler cast both to u8 and change the operator to - to handle a negative operand? I guess this would require extra unseen load / store operations because of the two’s complement representation of signed integers, which goes against how low-level Zig wants to be.

jibal · November 14, 2024, 6:32pm

If idx is usize then you can’t cast it to isize (and if that were allowed, it could overflow). Maybe there are situations where this works but it doesn’t work in my code, where I have a function that takes an isize offset parameter and tries to add it to a usize base. If the offset argument happens to be a negative comptime_int, then the compiler is convinced that it can’t be converted to usize.

I ended up with

const i: usize = if (offset < 0)
                     base -% @as(usize, @intCast(-offset))
                 else
                     base +% @as(usize, @intCast(offset));

but I have no confidence that it always produces the right result.
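One way to build confidence is an exhaustive test on a narrow type. The sketch below is an 8-bit analogue of the expression above, not the original code, and the function names are hypothetical. It swaps -offset for @abs(offset), since negating the lowest signed value overflows in its own type, and it compares every input pair against a reference computed in a wider type.

```zig
const std = @import("std");

// 8-bit analogue of the branchy expression, with @abs in place of -offset.
fn addBranchy(base: u8, offset: i8) u8 {
    const mag: u8 = @abs(offset);
    return if (offset < 0) base -% mag else base +% mag;
}

// Reference: add in a wider signed type, then reduce modulo 256.
fn addReference(base: u8, offset: i8) u8 {
    const sum = @as(i16, base) + offset;
    return @intCast(@mod(sum, 256));
}

test "branchy version matches the reference for every u8/i8 pair" {
    var b: u16 = 0;
    while (b <= 255) : (b += 1) {
        var o: i16 = -128;
        while (o <= 127) : (o += 1) {
            const base: u8 = @intCast(b);
            const offset: i8 = @intCast(o);
            try std.testing.expectEqual(addReference(base, offset), addBranchy(base, offset));
        }
    }
}
```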

I wrote C code for 30 years and learned how to avoid casts in most circumstances … every one of them is a code smell and a potential bug. Zig does a lot of things right but this is an area where it could benefit from some of that field experience and do things a bit differently.

dee0xeed · November 14, 2024, 6:49pm

it does:

const i: usize = if (offset < 0)
                     base -% @as(usize, @intCast(-offset))
                 else
                     base +% @as(usize, @intCast(offset));

(opening and closing triple back ticks are on their own lines)

mnemnion · November 14, 2024, 7:33pm

Fixed the formatting for you, hope you don’t mind.

No, this is correct, I was writing it from memory and got that wrong. Which makes the use of peer type resolution between operands of addition and subtraction more annoying, rather than less.

But this is a help thread, rather than a brainstorming session, so it’s not the most useful place to either pick at the status quo or try to work out a better system to replace it.

So in the interests of offering a solution, I’ve used a function like this:

pub inline fn u2i(v: usize) isize {
    return @intCast(v);
}

Which can be generalized to other types where necessary. It comes with disadvantages, because the top half of the usize range will panic if you do this, and so it’s pretty unsatisfying to risk that just to perform an operation which would work out just fine without the casting.

But if, as is frequently the case, you never expect to see unsigned values bigger than 2^31 - 1, this will get the job done.

jibal · November 14, 2024, 8:01pm

In safe builds. On real hardware with safety checks removed it would work fine.

(assuming @bitSizeOf(usize) == 32, which it generally isn’t)

In that case just use isize everywhere rather than usize. Which also will work fine on real hardware in unsafe builds with array/slice bounds checks removed. (Which encourages unsafe builds, unfortunately.)

mnemnion · November 14, 2024, 8:16pm

Good point actually, so you can do this instead:

pub inline fn u2i(v: usize) isize {
    @setRuntimeSafety(false);
    return @intCast(v);
}

As wacky as that is, it’s rational to do it… and this is a great illustration of why Zig’s rule here is, counterintuitively, not as good as it could be.

For anyone who might be confused as to why this is not a terrible idea, read up on two’s complement. Signed and unsigned numbers are a thing programmers know about, CPUs (simplifying considerably!) do not.
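A small sketch of the two's complement point, with a hypothetical helper named offsetBits. It uses @bitCast to make the reinterpretation explicit. As far as I understand the language reference, an out-of-range @intCast with safety disabled is unchecked illegal behavior rather than a defined wraparound, so @bitCast seems the safer way to say "same bits, different type".

```zig
const std = @import("std");

// Hypothetical helper: reinterpret the bits of a signed offset as unsigned.
// No range check is involved; the bit pattern is unchanged.
fn offsetBits(off: i8) u8 {
    return @bitCast(off);
}

test "wrapping add on the raw bits matches signed arithmetic" {
    // -3 and 253 share the bit pattern 0xFD.
    try std.testing.expectEqual(@as(u8, 0xFD), offsetBits(-3));
    // 200 + 253 wraps to 197, which is 200 - 3.
    const base: u8 = 200;
    try std.testing.expectEqual(@as(u8, 197), base +% offsetBits(-3));
}
```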

jibal · November 15, 2024, 1:29am

Simplifying to the point of simply being wrong. While the sign bit is not treated specially for addition and subtraction, that is not true for multiplication and division. Also, there are signed and unsigned comparisons … the former must account for the sign bit while the latter need not. It’s precisely because CPUs treat them differently that they need to be distinguished in programming languages designed around real hardware … otherwise, all numbers could be signed (as are comptime_int, which aren’t tied to hardware, which is why there’s no need for comptime_unsigned_int).
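To make that distinction concrete, a sketch with a hypothetical helper named asSigned. It takes one bit pattern, 0xFF, and shows that comparison, division, and right shift give different answers depending on whether the bits are read as u8 or i8.

```zig
const std = @import("std");

// Hypothetical helper: read the same 8 bits as a signed value.
fn asSigned(bits: u8) i8 {
    return @bitCast(bits);
}

test "same bits, different results for compare, divide and shift" {
    const bits: u8 = 0xFF; // 255 as u8
    const signed = asSigned(bits); // -1 as i8

    // Comparison: 255 > 1, but -1 is not.
    try std.testing.expect(bits > 1);
    try std.testing.expect(!(signed > 1));

    // Division: 255 / 2 is 127, while -1 / 2 truncates to 0.
    try std.testing.expectEqual(@as(u8, 127), bits / 2);
    try std.testing.expectEqual(@as(i8, 0), @divTrunc(signed, 2));

    // Right shift: logical for unsigned, arithmetic (sign-preserving) for signed.
    try std.testing.expectEqual(@as(u8, 0x7F), bits >> 1);
    try std.testing.expectEqual(@as(i8, -1), signed >> 1);
}
```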

I started out as an assembly language programmer and then programmed in C for many years. Anyone not familiar with the details of twos-complement arithmetic is hobbled when writing in close-to-the-metal languages like Zig (as opposed say to Java, at least in the early versions that I used that had no unsigned types), and I strongly recommend that they become familiar with it.

I also recommend reading What Every Computer Scientist Should Know About Floating-Point Arithmetic if you ever do any math programming.

Along the same lines:

Edit: Sorry, the Unicode link I put here before was ancient and is obsolete. This one is up to date:
The Absolute Minimum Every Software Developer Must Know About Unicode in 2023 (Still No Excuses!) @ tonsky.me

(However, it links to What every software developer must know about Unicode in 2023 | Hacker News which has some valid criticism … anyway, this is a complete diversion from the topic at hand; sorry about that.)

mnemnion · November 15, 2024, 2:50am

Yes, that is the topic at hand.

jibal · November 15, 2024, 3:22am

Courtesy of an answer on a StackOverflow page (What’s the simple way of mixing signed-ness calculation in zig? - Stack Overflow) that referenced what Zig’s translate-c generates (what a good idea!),

I came up with this:

    const idx = base +% @as(usize, @bitCast(@as(isize, off)));

which is working for me even when off is comptime_int. I figure I’ll put that in a function in my library, along with some of mnemnion’s suggestions.

Thanks for the discussion.
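For reference, a sketch of the same @bitCast approach applied to the seek function from the original question. This is one possible wrapping version under the assumption that wraparound is the desired behavior, as +% in the question suggests.

```zig
const std = @import("std");

// Adds a signed offset to an unsigned address, wrapping on overflow.
pub fn seek(address: u16, amount: i16) u16 {
    // Reinterpret the i16 bits as u16, then rely on wrapping addition.
    return address +% @as(u16, @bitCast(amount));
}

test "seek with positive and negative amounts" {
    try std.testing.expectEqual(@as(u16, 3), seek(0, 3));
    try std.testing.expectEqual(@as(u16, 7), seek(10, -3));
    // Going below zero wraps around, matching +% semantics.
    try std.testing.expectEqual(@as(u16, 65535), seek(0, -1));
}
```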