On type choices and "idiomatic" way to add a negative number to usize

I’m writing a parser that keeps the current character position as i of type usize. Later I realized that I sometimes need to shift it. Shifting forward is a no-brainer: the shift: usize is added to i with the saturating operator +|. However, when I changed shift to the type isize and tried to add it to usize, I obviously got an incompatible-types error.

var i: usize = 100;
const shift: isize = -20;
// i + shift magic
expect(i == 80);
expect(@TypeOf(i) == usize);

Tinkering with @as didn’t give me a result and I’m left with the fact that I don’t know how to add negative integers to unsigned ones.

Update:

While writing this post, I got the solution:

var i: usize = 100;
const shift: isize = -20;
// magic 
const i_shifted: usize = @max(0, @as(isize, @intCast(i)) + shift);
// works fine
assert(i_shifted == 80);
assert(@TypeOf(i_shifted) == usize);

However,

(1) I’m unsure whether this is the right way to do it (just in case: shift and i should be assumed subject to runtime changes).

(2) I’m unsure about my type choices. Let me explain.

My current stream of bytes is represented as a string literal in the source code, but I plan to move to reading from a file. With a bit of C background, I got used to the idea (not sure where I picked it up) that usize represents the “maximum size of addressable memory” and is suitable for generic “indexing”. In its turn, isize is, um… its signed counterpart? Basically, using usize puts you in a “safe zone”: if a platform supports only 16-bit size integers, your program will allegedly switch to that width and keep working.

However, after looking at zig/lib/std/fs/File.zig, I noticed that the types used for specifying seek offsets are u64 and i64. Now I wonder whether Zig is limited to 64-bit platforms and filesystems when it comes to handling files in a generic, platform-independent way.

I generally try to avoid confusing type casts in arithmetic like your

const i_shifted: usize = @max(0, @as(isize, @intCast(i)) + shift);

I think it would be better and easier to understand if you use signed integers and then cast on use:

var i: isize = 100;
const i_shifted: isize = i + shift;
// Potential usage:
...string[@intCast(i_shifted)]...

Zig should work fine on 32-bit platforms. Here the 64-bit int is actually a requirement of the operating system.
The reason the OS needs 64-bit offsets, even on 32-bit platforms, probably has to do with supporting disks larger than 4 GiB.


I think I would try to never have a negative shift value: at the first point where you might need to make a decision, capture the current index and hold on to it until you know what to do with it.

  • So my question is do you really need negative shift and why?
  • Shouldn’t i = 0; with shift = -20 be an error instead of i remaining zero silently?

I think you should think about the boundary conditions of your code, use asserts and tests to make sure those work correctly.
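To make the capture-the-index suggestion concrete, here is a minimal sketch. The Parser struct and the parseDigits/fail names are hypothetical illustrations, not code from the thread; the point is that the token's start position is remembered up front, so error reporting never needs a negative shift:

```zig
const std = @import("std");

const Parser = struct {
    input: []const u8,
    i: usize = 0,

    // Capture the start index before consuming; on failure, report against
    // `start` instead of shifting `self.i` backwards afterwards.
    fn parseDigits(self: *Parser) ![]const u8 {
        const start = self.i;
        while (self.i < self.input.len and std.ascii.isDigit(self.input[self.i])) : (self.i += 1) {}
        if (self.i == start) return self.fail(start, "expected a digit");
        return self.input[start..self.i];
    }

    fn fail(self: *Parser, pos: usize, msg: []const u8) error{SyntaxError} {
        _ = self;
        std.debug.print("syntax error at byte {d}: {s}\n", .{ pos, msg });
        return error.SyntaxError;
    }
};
```

With this shape, every error site already holds a valid usize position, and the question of adding a negative number to a usize never comes up.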


You’ll get undefined behaviour if i is larger than what fits in an isize, so you’ll never be able to use the full range of usize, which, in turn, defeats the purpose of using a usize in the first place. It would have been better to just go with an isize. In order to properly apply a signed offset to an unsigned number, do this:

const s: isize = -3;
var u: usize = 5;
if (s < 0)
    u -|= @intCast(-s)
else
    u +|= @intCast(s);

You’ll get undefined behaviour if i is larger than what fits in a isize

That’s right.

It would have been better to just go with an isize

But I don’t want the end user to define or change i to a negative number (and to have to add assertions everywhere to ensure i didn’t accidentally become negative).

That’s interesting, but how does this trick work and eliminate the need to cast u as @as(isize, @intCast(u)) before the operation takes place?

Also, I often see @intCast(number) without any context about what it should be cast to. Is it like a hint to the compiler to figure that out itself?

  • Shouldn’t i = 0; with shift = -20 be an error instead of i remaining zero silently?

Nope. It is by design that it should be “trimmed” to 0 silently.

  • So my question is do you really need negative shift and why?

The i of the parser is already there, representing where the process ended, and sometimes I need to print a syntax error with a bit of a shift to hint to the user where things went wrong. Say the parser stopped at 100, but I know the error begins at 98. So I run self.syntaxError(-2, "Be mindful of your actions my friend.")
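A hedged sketch of what such a helper could look like (this syntaxError is my reconstruction, not the poster's actual code), using the saturating pattern from this thread so a large negative shift simply clamps at 0:

```zig
const std = @import("std");

// Hypothetical sketch: apply a possibly-negative shift to the current
// position with saturation, then report the error at the shifted position.
fn syntaxError(i: usize, shift: isize, msg: []const u8) void {
    // @abs(shift) yields a usize here, so -| never sees a signed operand.
    const pos = if (shift < 0) i -| @abs(shift) else i +| @as(usize, @intCast(shift));
    std.debug.print("syntax error at byte {d}: {s}\n", .{ pos, msg });
}
```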

I think you should think about the boundary conditions of your code

I certainly should and I do (at least, to the best of my skills) :slight_smile:

@intCast, like many other conversion-related builtins such as @ptrCast and @enumFromInt, uses its result type to determine what to cast to. There is a PR currently open/being worked on to document the concept of result types (and the broader topic of result location semantics, or RLS, in general): langref: add basic documentation of RLS by mlugg · Pull Request #18043 · ziglang/zig · GitHub

Briefly, the compiler is using context to figure out what the intended type is, as in the simplest example:

const a: usize = @intCast(x);

Here, the compiler knows that the result type of @intCast(x) is usize from context, so that’s the type you get from the cast. There are many other scenarios where the compiler can do this, and if you ever end up in a scenario where the compiler can’t determine the result type, you can use @as to provide one explicitly: @as(usize, @intCast(x)).


Isn’t this one of the situations for which @bitCast is designed?

i = @bitCast(@as(isize, @bitCast(i)) + shift);

You would still need case checks to avoid weird overflow:

test "false positive overflow" {
    var n: usize = std.math.maxInt(isize);
    const m: isize = 1;
    const expected = n + 1; // the mathematically correct result fits in a usize
    // the following is overflow (a panic in safe builds) unless we switch to '+%'
    n = @bitCast(@as(isize, @bitCast(n)) + m);
    try std.testing.expectEqual(expected, n);
}
test "real overflow ignored" {
    var n: usize = std.math.maxInt(usize);
    const m: isize = 1;
    n = @bitCast(@as(isize, @bitCast(n)) + m);
    try std.testing.expectEqual(@as(usize, 0), n);
}

Instead of casting u to an isize, we cast s to a usize.
The difference between signed and unsigned variants is one bit. Consider a u8 and an i8.
u8 can go from 0 to 255.
i8 can go from -128 to 127.
The absolute value of an i8 can go from 0 to 128, which always fits in a u8. So if the signed number is positive, we can cast it and add it to the unsigned number. If it is negative, we flip it to positive, cast it, and subtract it from the unsigned number.


And then it can underflow, with disastrous results: in safe modes the program exits with a panic, in unsafe modes you get undefined behavior. When talking about pointers or array indexes, on current 64-bit architectures addressable memory is at most 48 bits, so casting to isize and subtracting two isize numbers should be fine. Of course, one needs to check whether the result is non-negative before casting back to usize.
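The "check the result is non-negative before casting back" step described above might look like this (applyShiftChecked is an illustrative name, not from the thread):

```zig
// Sketch: signed arithmetic with an explicit non-negative check.
// Assumes i fits in an isize (true for real memory indices on 64-bit targets).
fn applyShiftChecked(i: usize, shift: isize) ?usize {
    const r = @as(isize, @intCast(i)) + shift;
    if (r < 0) return null; // let the caller decide how to handle going below zero
    return @intCast(r);
}
```

Returning an optional makes the out-of-range case explicit at every call site instead of silently clamping.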

Where does it underflow?
I couldn’t find a place where it would underflow; note, for example, the use of the +|= and -|= operators.

Alternative operators are provided for wrapping and saturating arithmetic on all targets. +% and -% perform wrapping arithmetic while +| and -| perform saturating arithmetic.

Runtime Integer Values


Thank you. It’s a good example of where things can go wrong with unconscious bit fiddling. It took me some time to process.

Ok, I realized what my problem was: believe it or not, somehow my brain got stuck on the idea that since unsigned integers cannot be negative, you can’t subtract from them! :smiley: So I think the answer from @LucasSantos91 solves the problem in the optimal way (given my level of understanding):


Not an underflow, but it can overflow here:

-s can overflow when s is the minimum value of its type. So it would be safer to use @abs(s) instead of @intCast(-s) here.


How? Why?


For the sake of simplicity, consider i8, which ranges from -128 to 127.
So there is one leftover negative value, -128, that cannot be negated, since +128 can’t be represented in an i8.
@abs resolves this problem by returning a u8 if you give it an i8.


So here is the resulting function applyShift2 and some tests:

const std = @import("std");

// make it a function to make it easier to discuss alternative functions
fn applyShift(orig: usize, shift: isize) usize {
    const s: isize = shift;
    var u: usize = orig;
    if (s < 0)
        u -|= @intCast(-s)
    else
        u +|= @intCast(s);
    return u;
}

fn applyShift2(orig: usize, shift: isize) usize {
    const s: isize = shift;
    var u: usize = orig;
    if (s < 0)
        u -|= @abs(s)
    else
        u +|= @intCast(s);
    return u;
}

// like 2 but without the var
pub fn applyShift3(u: usize, s: isize) usize {
    return if (s < 0) u -| @abs(s) else u +| @as(usize, @intCast(s));
}

const shiftFn = *const fn (usize, isize) usize;

test "overflow and underflow" {
    const t = std.testing;
    const utils = struct {
        fn testIt(f: shiftFn) !void {
            try t.expectEqual(@as(usize, 0), f(0, 0));
            try t.expectEqual(@as(usize, @intCast(std.math.maxInt(isize))), f(0, std.math.maxInt(isize)));
            try t.expectEqual(@as(usize, 0), f(0, std.math.minInt(isize)));
            try t.expectEqual(@as(usize, std.math.maxInt(usize)), f(std.math.maxInt(usize), 0));
            try t.expectEqual(@as(usize, std.math.maxInt(usize)), f(std.math.maxInt(usize), std.math.maxInt(isize)));
        }
    };
    // try utils.testIt(applyShift); // integer overflow
    try utils.testIt(applyShift2);
    try utils.testIt(applyShift3);
}

I wonder if there is some other test that should be added; I currently can’t come up with more.
applyShift3 is like 2 but uses an if expression to avoid the var, and I quite like how it becomes a fairly readable one-liner.


Completely forgot about @abs. You’re right.