Determining lower/upper bound for safe conversion from f32 to i32

chung-leong · March 30, 2024, 5:02pm

How can I determine the minimum and maximum values for f32 that can safely be converted to i32 using @intFromFloat()? The following code triggers an error:

const std = @import("std");

test "intFromFloat" {
    const f: f32 = std.math.maxInt(i32);
    const i: i32 = @intFromFloat(f);
    std.debug.print("{d} => {d}\n", .{ f, i });
}

test.zig:5:34: error: float value '2147483648' cannot be stored in integer type 'i32'
    const i: i32 = @intFromFloat(f);

mperillo · March 30, 2024, 5:41pm

You need to know the number of bits used in the mantissa.

As an example, see the Wikipedia article on the single-precision floating-point format, in particular the section "Precision limitations on integer values".

maksverver · March 30, 2024, 5:53pm

The largest value that can be stored exactly in both i32 and f32 is 0b1111111_11111111_11111111_10000000 == (1 << 31) - (1 << 7) == 2147483520.

That’s 31 binary digits of which the leading 24 are 1 and the trailing 7 are 0. That’s because i32 can store at most 31 bits (the 32nd bit is used to store the sign) so the value cannot exceed (1 << 31) - 1, and f32 can store at most 24 significant bits (of which 23 bits are stored explicitly in the mantissa, while the leading 1 bit is implicit), so you arrive at the above value as the largest integer that is exactly representable by both i32 and f32.
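To make the bound concrete, here is a minimal sketch, assuming Zig 0.11 or later (where the builtin is named @intFromFloat). The helper name f32ToI32 is hypothetical and not part of std. It relies on two facts: -(1 << 31) is a power of two and therefore exactly representable in f32, and (1 << 31) is the first f32 value that is too large for i32.

```zig
const std = @import("std");

/// Hypothetical helper: returns null instead of triggering the
/// "cannot be stored in integer type" error for out-of-range input.
fn f32ToI32(f: f32) ?i32 {
    // Valid range is [-2^31, 2^31). Writing the test as a negated
    // conjunction also rejects NaN, since every comparison with NaN is false.
    if (!(f >= -2147483648.0 and f < 2147483648.0)) return null;
    return @intFromFloat(f);
}

test "bounds for f32 -> i32" {
    // 24 one bits followed by 7 zero bits.
    const largest: i32 = (1 << 31) - (1 << 7);
    try std.testing.expectEqual(@as(i32, 2147483520), largest);

    // Largest f32 that still fits in i32.
    try std.testing.expectEqual(@as(?i32, largest), f32ToI32(2147483520.0));
    // The next f32 up is 2^31, which is out of range.
    try std.testing.expectEqual(@as(?i32, null), f32ToI32(2147483648.0));
    // The lower bound is minInt(i32) itself.
    try std.testing.expectEqual(@as(?i32, std.math.minInt(i32)), f32ToI32(-2147483648.0));
}
```

Running this with zig test should pass. Comparing against the float constant 2147483648.0 rather than against maxInt(i32) sidesteps the rounding problem from the original question.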

There is a bit more going on, though. When you assign a constant to an f32, Zig apparently rounds it to the nearest representable value. That’s why, in your example, std.math.maxInt(i32), which is (1 << 31) - 1 (representable by i32 but not f32), gets rounded up to (1 << 31) (representable by f32 but not i32). Similarly, there are some values higher than 2147483520 that you can assign to f32, but they will be rounded down to 2147483520, so they are then again representable by i32.
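The rounding can be observed directly. This is a small sketch, assuming Zig 0.11 or later; it uses @floatFromInt on an i32 rather than a literal assignment, since that builtin is documented to produce the closest floating-point value. Near the top of the i32 range, adjacent f32 values are 128 apart.

```zig
const std = @import("std");

test "i32 values near the top of the range round to the nearest f32" {
    // maxInt(i32) == 2147483647 is closer to 2^31 than to 2147483520.
    const up: f32 = @floatFromInt(@as(i32, std.math.maxInt(i32)));
    try std.testing.expectEqual(@as(f32, 2147483648.0), up);

    // Values slightly above 2147483520 round back down to it.
    const down_a: f32 = @floatFromInt(@as(i32, 2147483521));
    try std.testing.expectEqual(@as(f32, 2147483520.0), down_a);

    // 2147483583 is 63 above 2147483520 and 65 below 2^31, so it rounds down too.
    const down_b: f32 = @floatFromInt(@as(i32, 2147483583));
    try std.testing.expectEqual(@as(f32, 2147483520.0), down_b);
}
```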

Note that this means it’s not true that all values between 0 and 2147483520 can safely round-trip between i32 and f32, because they are still rounded due to the limited precision of f32. For example, assigning 2147483520 - 1 == 2147483519 to an f32 causes it to be rounded up to 2147483520 too. The range of non-negative integers that can all be safely converted from i32 to f32 and back to i32 is from 0 to (1 << 24) inclusive (!), again because of the 24-bit mantissa of f32. In other words, (1 << 24) + 1 is the smallest positive integer that cannot be represented by f32.
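A round-trip check illustrates the difference between "fits after rounding" and "survives unchanged". This is a sketch assuming Zig 0.11 or later; roundTrips is a hypothetical helper name.

```zig
const std = @import("std");

/// Hypothetical helper: true if n survives i32 -> f32 -> i32 unchanged.
fn roundTrips(n: i32) bool {
    const f: f32 = @floatFromInt(n);
    // The conversion may have rounded up to 2^31, which no longer fits in i32.
    if (f >= 2147483648.0) return false;
    const back: i32 = @intFromFloat(f);
    return back == n;
}

test "round-trip limits" {
    // Every integer up to 1 << 24 is exact in f32.
    try std.testing.expect(roundTrips(1 << 24));
    // (1 << 24) + 1 is the first one that is not.
    try std.testing.expect(!roundTrips((1 << 24) + 1));
    // Larger values can still be exact when they need at most 24 significant bits.
    try std.testing.expect(roundTrips((1 << 24) + 2));
    try std.testing.expect(roundTrips(2147483520));
    // 2147483519 rounds up to 2147483520, so it comes back changed.
    try std.testing.expect(!roundTrips(2147483519));
    // maxInt(i32) rounds up to 2^31 and cannot be converted back at all.
    try std.testing.expect(!roundTrips(std.math.maxInt(i32)));
}
```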

dimdin · March 30, 2024, 6:13pm

This prints 2147483647.

    const max_i = std.math.maxInt(i32);
    std.debug.print("{d}\n", .{ max_i });

Please note the last digit (it is not 8).
The precision of a 32-bit float is between 6 and 9 significant decimal digits, and the code converts a number with 10 decimal digits. This number cannot be represented exactly as a 32-bit binary float.