0.15.2 Is this a bug? f32 gives a different result from f16, f64 & f128

Hi Guys.

Sorry to bug you, but it seems 0.15.2 handles f32 differently from f16, f64 and f128. I solved my issue by moving to f64, though it appears f16 would also work. I tested the same code in 0.12.0 and got the same result, so clearly I am missing something here.

When adding 9.6 and -4 I get 5.6 for f16, f64 and f128, but 5.6000004 for f32. I would expect them all to be the same, just with different levels of precision. The fact that f16 gives me the correct answer while f32 does not is perplexing.

Tried it on Windows 11 and Linux (output included as comments in the uploaded main.zig).

// a:f16 + b:f16 = 5.6e0 , 5.6
// a:f32 + b:f32 = 5.6000004e0,5.6000004
// a:f64 + b:f64 = 5.6e0,5.6
// a:f128 + b:f128 = 5.6e0,5.6

Is this a bug or have I stuffed up somewhere? I am guessing the latter but just can’t see it.
9.6 plus 4, by the way, is just fine.

All the best

main.zig (1.1 KB)

const std = @import("std");

pub fn main() !void {
    const a16: f16 = 9.6;
    const b16: f16 = -4;
    std.debug.print("a:f16 + b:f16 = {e} , {d} \n", .{ a16 + b16, a16 + b16 });

    const a32: f32 = 9.6;
    const b32: f32 = -4;
    std.debug.print("a:f32 + b:f32 = {e},{d} \n", .{ a32 + b32, a32 + b32 });

    const a64: f64 = 9.6;
    const b64: f64 = -4;
    std.debug.print("a:f64 + b:f64 = {e},{d} \n", .{ a64 + b64, a64 + b64 });

    const a128: f128 = 9.6;
    const b128: f128 = -4;
    std.debug.print("a:f128 + b:f128 = {e},{d} \n", .{ a128 + b128, a128 + b128 });
}

// Windows output:
// a:f16 + b:f16 = 5.6e0 , 5.6
// a:f32 + b:f32 = 5.6000004e0,5.6000004
// a:f64 + b:f64 = 5.6e0,5.6
// a:f128 + b:f128 = 5.6e0,5.6
//
// zig version: 0.15.2
// winver: 24H2 (OS Build 26100.7171)

// Linux output:
// a:f16 + b:f16 = 5.6e0 , 5.6
// a:f32 + b:f32 = 5.6000004e0,5.6000004
// a:f64 + b:f64 = 5.6e0,5.6
// a:f128 + b:f128 = 5.6e0,5.6
//
// uname -a:
// Linux debian 6.10.3-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.3-1 (2024-08-04) x86_64 GNU/Linux
// zig version: 0.15.2

The results are surprising, but they’re correct. It’s just a coincidence that you get extra decimal digits when using f32. If you want more details:

  • f16 has 10 significand/mantissa bits (excluding the implicit leading 1). Therefore, 9.6 rounds to the floating-point (FP) number 9.6015625 = (1 + 205 / 2^10) × 2^3, while 4 is represented exactly as (1 + 0 / 2^10) × 2^2. Their difference is D = (1 + 410 / 2^10) × 2^2 = 5.6015625. Importantly, D is closer to 5.6 than any other 16-bit FP number, so the decimal string "5.6" has enough digits to unambiguously specify D.

  • f64 has 52 significand bits. 9.6 rounds to (1 + 900,719,925,474,099 / 2^52) × 2^3, while 4 is still represented exactly. Their difference is D = (1 + 1,801,439,850,948,198 / 2^52) × 2^2 ≈ 5.5999999999999996. Again, D is closer to 5.6 than any other 64-bit FP number, so Zig prints D as 5.6.

  • The same thing happens for f128.
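The f16 and f64 cases above are easy to check with Python's standard library as a calculator (the arithmetic is language-independent, so the same bit patterns apply to Zig): `struct`'s `'e'` format rounds a value through IEEE binary16, and Python's built-in `float` is IEEE binary64. The helper `as_f16` is just a round-trip through half precision, not anything from the thread.

```python
import struct

def as_f16(x: float) -> float:
    """Round x to the nearest IEEE binary16 value, returned as a double."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# f16: 9.6 rounds to (1 + 205/2^10) * 2^3 = 9.6015625 exactly.
assert as_f16(9.6) == 9.6015625
# The difference 5.6015625 is exactly representable in f16, and "5.6"
# is its shortest round-tripping decimal, so Zig prints 5.6.
assert as_f16(9.6) - 4 == 5.6015625

# f64: 9.6 - 4 lands exactly on the double nearest to 5.6, so the
# shortest round-trip string is again "5.6".
assert 9.6 - 4 == 5.6     # both sides round to the same binary64 value
print(repr(9.6 - 4))      # -> 5.6
```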

When we use f32 (23 significand bits), we get that 9.6 rounds to (1 + 1,677,722 / 2^23) × 2^3, and D = (1 + 3,355,444 / 2^23) × 2^2 ≈ 5.6000004. However, if we decrement the significand, we find that the next FP number down is D′ = (1 + 3,355,443 / 2^23) × 2^2 ≈ 5.59999990, which is closer to 5.6! Therefore, Zig prints D as 5.6000004, which has enough digits to specify D instead of D′.
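The f32 case can be verified the same way with `struct`'s `'f'` format (again Python purely as a calculator; the helper `as_f32` just round-trips a double through binary32 and is not part of the thread's code):

```python
import struct

def as_f32(x: float) -> float:
    """Round x to the nearest IEEE binary32 value, returned as a double."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

a = as_f32(9.6)                  # (1 + 1_677_722/2^23) * 2^3
d = as_f32(a - 4)                # D  = (1 + 3_355_444/2^23) * 2^2
assert d == 5.600000381469727    # D's value, exact in f32 and f64

# The next f32 down, D' = (1 + 3_355_443/2^23) * 2^2, is closer to 5.6,
# so the string "5.6" parses back to D', not D:
bits = struct.unpack('<I', struct.pack('<f', d))[0]
d_prime = struct.unpack('<f', struct.pack('<I', bits - 1))[0]
assert abs(d_prime - 5.6) < abs(d - 5.6)
assert as_f32(5.6) == d_prime       # "5.6" would not round-trip to D...
assert as_f32(5.6000004) == d       # ...so Zig must print "5.6000004"
```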

Hopefully my explanation doesn’t have any errors…


Both results are correct. This comes down to floating-point precision and rounding error: floating-point is an approximation, not an exact representation of every decimal number. Honestly, I’m surprised f64 and f128 didn’t give you the 5.600… result but with more digits. 5.6 for f16 is expected given how low its precision is.


Honestly, I’m surprised f64 and f128 didn’t give you the 5.600… result but with more digits. 5.6 for f16 is expected given how low its precision is.

I agree. I was expecting f16 to give me a low-resolution answer, but it gave me the correct one, while f32 gave me the incorrect one. I was running on the understanding that

More digits → closer to the correct answer

But I seem to be wrong. I was expecting 5.600001, 5.6000000001, 5.60000000000001, etc.,
not 5.6000004. Where does the 4 come from …

I am even more confused that 9.6 + 4 gave me the correct answer but 9.6 - 4 did not.
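For what it's worth, the 9.6 + 4 case can be checked the same way (Python here is just a calculator; the helper `as_f32` round-trips a double through binary32). The f32 sum 13.600000381… carries exactly the same rounding error as the difference, but it happens to also be the f32 value *nearest* to 13.6, so its shortest round-tripping string really is "13.6". In the subtraction case the nearest f32 to 5.6 is one step below the computed result, which is where the extra digits come from.

```python
import struct

def as_f32(x: float) -> float:
    """Round x to the nearest IEEE binary32 value, returned as a double."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

a = as_f32(9.6)        # 9.600000381..., the f32 nearest to 9.6
s = as_f32(a + 4)      # 13.600000381..., carries the same error as a
# ...but the sum is ALSO the f32 nearest to 13.6, so "13.6" round-trips:
assert as_f32(13.6) == s
# whereas the difference is NOT the f32 nearest to 5.6:
assert as_f32(5.6) != as_f32(a - 4)
```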

Clearly I need to go back to school.

Thanks guys for explaining that I am an idiot.

All good for the soul

LOL

If this were a bug, it would be a hardware bug in your processor (which has happened before, e.g. the Pentium FDIV bug), not in Zig. Zig is just converting your code into machine-code instructions.

Could have also been a bug in Zig’s float printing implementation.


or bad codegen for either the printing or float operations

For the printing… yes, possible but unlikely.

For the operations… no, that’s my point: a single FP add is a single assembly instruction, so there’s nothing to get wrong, especially as it’s f32 giving the unintuitive value. If it were f16 there would be more scope for that, since some processors don’t support the type natively and it has to be emulated.
