One reason is that this prevents overflow when the range of a signed integer is asymmetric, e.g. [-128, 127] for an i8. It can’t map -128 to positive 128 because that would overflow by one. In this case, adding that extra bit (what was the sign bit becomes an extra value bit in an unsigned integer of the same bit width) more than compensates for that.
If you need to go from unsigned to signed and always guarantee that the cast will work, you can use an unsigned integer with one less bit and cast it to a signed integer with one more bit:
u7 can safely cast to i8
u31 can safely cast to i32
etc...
Then, using @intCast, widen the unsigned value to the signed target type and do your arithmetic that way.
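A minimal sketch of that pattern, with illustrative names (my example, not from the post):

fn offsetIndex(idx: u31, delta: i32) i32 {
    // every u31 value fits in an i32, so the widening can never fail
    return @as(i32, idx) + delta; // plain coercion is enough here; @intCast also works
}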
As you’re noticing, Zig makes it painful (not terribly so, but noticeably) on purpose. There are a lot of issues with mixing lanes, and Zig makes those apparent.
So C has a bunch of implicit conversion rules. It’s supposed to make arithmetic expressions easy to write, and it does do that. But it’s a notorious source of bugs. The actual rules are quite complex, the results can be surprising, and it’s far too easy to hit undefined behavior.
Type coercions are only allowed when it is completely unambiguous how to get from one type to another, and the transformation is guaranteed to be safe.
This is much better, and also it can be extremely annoying. You’ve hit the case I find most annoying: addition or subtraction between ‘peer’ signed and unsigned types.
That’s despite the fact that both of these functions hold the same hazard: the returned value might be negative. The second one also poses a risk of overflow, but of course it’s easy to run that risk with two usize as well.
Just because it’s annoying doesn’t mean I disagree with it. Adding more rules to make writing code more ergonomic adds back some of the complexity we’re trying to get away from.
But I have this function in probably a majority of my libraries:
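(The function itself isn’t reproduced in this excerpt; a plausible sketch of such a cast helper, where the name and shape are my guess rather than the original:)

pub fn cast(comptime T: type, value: anytype) T {
    // wraps the builtin so the target type can be written inline,
    // e.g. cast(usize, v) instead of @as(usize, @intCast(v))
    return @intCast(value);
}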
I think this is central enough to doing useful things with integer values in Zig that it should probably be a builtin. Before result location semantics, @intCast used to take two arguments, and this is exactly how it worked. Writing @as(usize, @intCast(v)) is very heavyweight and ends up obscuring the equation; arithmetic bugs aren’t type conversion bugs, but they’re still bugs.
This is extremely difficult to get right, it is very hard to tell just by reading the code whether it is right or wrong, and it is therefore extremely error-prone … the opposite of the intent.
Note that if foo has an unsigned type then foo - 3 compiles but foo + -3 doesn’t, because Zig’s overly simplistic rules require -3 to be converted to foo’s type, but it can’t be, because Zig views signed and unsigned integer types as being incommensurate, when in fact they aren’t. Even if the signed value is of a smaller type, they can’t be added … as far as Zig is concerned, you can no more add signed and unsigned numbers than you can add numbers and structs.
One might be inclined to try to use @bitCast, which is almost certain to introduce a bug.
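A quick illustration of why (my example, not from the post): @bitCast reinterprets the bits, so a small negative value silently becomes a huge unsigned one.

const x: i32 = -3;
const y: u32 = @bitCast(x); // y == 4294967293, i.e. 2^32 - 3, not anything like "minus three"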
I know that people tend to rationalize every Zig design decision but this one is clearly wrong … the language requires a conversion and disallows the conversion.
A simple rule would be to allow the arithmetic but require that the type of the result is specified – sometimes one wants to add a positive increment to a signed value, resulting in a signed value, and sometimes one wants to add a signed offset to an unsigned value, resulting in an unsigned value. C’s mistake is to always make the result unsigned, and Zig’s mistake is to disallow the arithmetic altogether.
Zig has overflow detection (in safe builds) and should depend on that rather than trying to prevent something at comptime that it makes no sense to prevent. After all, adding two values of the same type is allowed but can overflow … unless you use +% or +| … but you can’t even use those between signed and unsigned values, which makes no sense since the result is well-defined.
Please refrain from copy/pasting LLM output, especially as a means of lending authority to your argument. The forum is meant to be a discussion between people.
I think you should use more paragraphs; having it as one continuous wall of text almost made me stop reading, and even after reading it, it is still difficult to focus on specific points. That also contributes to me not feeling like I really have a good understanding of the point you are trying to make.
Maybe you are onto a good idea with specifying the result type, but I haven’t spent enough time thinking about the details of this topic to be sure whether I completely agree.
Currently I am relatively fine with requiring explicit casts. It can be annoying at times, but after a while you create functions for the things that are needed repeatedly, and then you can write tests for those functions to be sure there aren’t any weird boundary/overflow issues.
I removed the LLM post, just like I would remove a copy/paste of the first page of results from a search engine.
Please, everyone: think carefully before sharing chatbot content here. It’s not a blanket ban (e.g. you’re working on some code with a chatbot and you need to share that code as part of a thread, ok) but we’re here to share our own thoughts primarily, and LLMs don’t think. They’re also notoriously sycophantic and will back you up on almost anything; it’s noise.
As far as this goes, I agree with you. For a u type and an i type (in that order), with a u result type whose width is ≥ that of the u operand, and an i operand whose width is ≤ that of the u operand, it’s actually safer to add and subtract those types than it is to add and subtract two u values of that same width. Safer in the sense that a larger share of the possible operand pairs fit in that result type (no overflow or underflow) when one of them is signed. For i and u of equal width it works out to 3/4 of the pairs, versus only slightly more than 1/2 for two u, so this is not a small difference.
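To make that counting concrete, here is a brute-force check for 8-bit operands with a u8 result range (my addition, not part of the original post):

const std = @import("std");

test "share of operand pairs whose sum fits in [0, 255]" {
    var mixed: u32 = 0; // one u8 operand, one i8 operand
    var both_unsigned: u32 = 0; // two u8 operands
    var a: i32 = 0;
    while (a <= 255) : (a += 1) {
        var b: i32 = -128;
        while (b <= 127) : (b += 1) {
            if (a + b >= 0 and a + b <= 255) mixed += 1;
        }
        var c: i32 = 0;
        while (c <= 255) : (c += 1) {
            if (a + c <= 255) both_unsigned += 1;
        }
    }
    try std.testing.expectEqual(@as(u32, 49152), mixed); // exactly 3/4 of the 65536 pairs
    try std.testing.expectEqual(@as(u32, 32896), both_unsigned); // just over 1/2
}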
What we’d get in particular is a much cleaner and easier to read expression of this pattern:
unsigned_index += signed_offset_of_index;
Instead of this mouthful which I have many copies of in one of my projects:
idx = @intCast(@as(isize, idx) + op.label);
YMMV but I see the difference between these two as pure added noise.
The problem, and it is a problem, is that this replaces a very simple rule with a more complex rule. Note that I didn’t say complicated, and I do think that it would be worth the extra complexity, but a) it’s not clear that I would win that argument and b) it seems like a poor time to bring it up in the evolution of the language. Soon, perhaps, but not now.
I’m certainly not one to defend every last decision in the language, but I’m a staunch advocate of the philosophy which guides those decisions. If anyone wants to add complexity to Zig, they’re going to need to justify it in terms of: simplicity elsewhere, correctness, some positive result which is impossible or unduly difficult without a language feature, and so on.
I think adding an arithmetic compatibility rule would clear that bar by virtue of those first two terms. But if the language team accepted every proposal for ‘just this one little feature’ without pushback, it wouldn’t be Zig anymore.
Compiler sees + operator and starts peer type resolution between num1 and num2.
Peer type resolution results in …? And the -128 of num2 is incompatible?
Compiler gives an error about incompatible types.
Why doesn’t the compiler cast both to u8 and change the operator to - to handle the negative operand? I guess this would require extra, unseen load/store operations because of the two’s-complement representation of signed integers, which is against how low-level Zig wants to be.
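For reference, a minimal reproduction of the situation being described (variable names are just illustrative):

fn add(num1: u8, num2: i8) u8 {
    return num1 + num2; // fails to compile: peer type resolution finds no common type for u8 and i8
}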
If idx is usize then you can’t cast it to isize (and if that were allowed, it could overflow). Maybe there are situations where this works but it doesn’t work in my code, where I have a function that takes an isize offset parameter and tries to add it to a usize base. If the offset argument happens to be a negative comptime_int, then the compiler is convinced that it can’t be converted to usize.
I ended up with
const i: usize = if (offset < 0)
    base -% @as(usize, @intCast(-offset))
else
    base +% @as(usize, @intCast(offset));
but I have no confidence that it always produces the right result.
I wrote C code for 30 years and learned how to avoid casts in most circumstances … every one of them is a code smell and a potential bug. Zig does a lot of things right but this is an area where it could benefit from some of that field experience and do things a bit differently.
Fixed the formatting for you, hope you don’t mind.
No, this is correct; I was writing it from memory and got that wrong. Which makes the use of peer type resolution between operands of addition and subtraction more annoying, rather than less.
But this is a help thread, rather than a brainstorming session, so it’s not the most useful place to either pick at the status quo or try to work out a better system to replace it.
So in the interests of offering a solution, I’ve used a function like this:
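(The function body isn’t included in this excerpt; judging from the description that follows, it plausibly looked something like this, with the name and exact shape being my guess:)

fn addOffset(base: usize, offset: isize) usize {
    // the inner @intCast panics (in safe builds) when base > maxInt(isize),
    // i.e. for the top half of the usize range; the outer one panics if the
    // sum is negative or otherwise doesn't fit back into usize
    return @intCast(@as(isize, @intCast(base)) + offset);
}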
Which can be generalized to other types where necessary. It comes with disadvantages, because the top half of the usize range will panic if you do this, and so it’s pretty unsatisfying to risk that just to perform an operation which would work out just fine without the casting.
But if, as is frequently the case, you never expect to see unsigned values bigger than 2^31 - 1, this will get the job done.
In safe builds. On real hardware with safety checks removed it would work fine.
(assuming @bitSizeOf(usize) == 32, which it generally isn’t)
In that case just use isize everywhere rather than usize. Which also will work fine on real hardware in unsafe builds with array/slice bounds checks removed. (Which encourages unsafe builds, unfortunately.)
As wacky as that is, it’s rational to do it… and this is a great illustration of why Zig’s rule here is, counterintuitively, not as good as it could be.
For anyone who might be confused as to why this is not a terrible idea, read up on two’s complement. Signed and unsigned numbers are a thing programmers know about, CPUs (simplifying considerably!) do not.
Simplifying to the point of simply being wrong. While the sign bit is not treated specially for addition and subtraction, that is not true for multiplication and division. Also, there are signed and unsigned comparisons … the former must account for the sign bit while the latter need not. It’s precisely because CPUs treat them differently that they need to be distinguished in programming languages designed around real hardware … otherwise, all numbers could be signed (as are comptime_int, which aren’t tied to hardware, which is why there’s no need for comptime_unsigned_int).
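A small illustration of the distinction (mine, not the poster’s): the same bit pattern behaves identically under wrapping addition but differently under comparison.

const std = @import("std");

comptime {
    const a: u8 = 0xFF; // 255 when the bits are read as unsigned
    const b: i8 = @bitCast(a); // -1 when the same bits are read as signed
    std.debug.assert(a +% 1 == 0 and b +% 1 == 0); // addition doesn't care about the sign bit
    std.debug.assert(a > 0 and !(b > 0)); // comparison has to
}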
I started out as an assembly language programmer and then programmed in C for many years. Anyone not familiar with the details of two’s-complement arithmetic is hobbled when writing in close-to-the-metal languages like Zig (as opposed, say, to Java, at least in the early versions I used, which had no unsigned types), and I strongly recommend becoming familiar with it.