Bitshifting: Integer changes types when in array for some reason

Howdy again everyone. :cowboy_hat_face:

I am trying to get the output of an integer shifted 8 bits to the left.

This works:

const std = @import("std");

pub fn main() !void {
  const num = 42;
  const flipped = num << 8;

  std.debug.print("{d}\n", .{flipped}); // == 10752
}

But for some reason I get an error with this:

const std = @import("std");

pub fn main() !void {
  const nums = [1]u8{ 42 };
  const flipped = nums[0] << 8;

  std.debug.print("{d}\n", .{flipped});
}

I get this error when I compile:
error: type 'u3' cannot represent integer value '8'

Is this a glitch, or is there something I am not getting? I normally don’t shift bits in any of my projects so this is all new to me. Any tips, hints, and explanations would be greatly appreciated :slight_smile: . As usual, thank you again for the help!

In the first example you don’t specify the type of the integer, so it is a comptime_int (i.e. of unspecified size). In the second, you say that it is a u8. A u8 can’t be shifted left by 8 bits, because that would shift the entire value out. If you specify the array with a larger element type (e.g. u16 or usize), it should work.
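
For example, here is the second snippet with the element type widened to u16, as suggested:

const std = @import("std");

pub fn main() !void {
  // With u16 elements the shift amount has type u4, which can hold 8.
  const nums = [1]u16{ 42 };
  const flipped = nums[0] << 8;

  std.debug.print("{d}\n", .{flipped}); // == 10752
}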

5 Likes

The language reference for the << operator says:

Bit Shift Left a << b

  • Moves all bits to the left, inserting new zeroes at the least-significant bit.
  • b must be comptime-known or have a type with log2 number of bits as a.

The u3 in the error message is derived from log2(bits of a) = log2(8) = 3, and 8 cannot fit in a u3; it needs a u4.

To get a u4 for b, you need a u16 for a instead of a u8.


Why does this make sense?

For every value a: u8, a << 8 results in 0.
a << 7 still makes sense, because the least significant bit of a survives as the most significant bit.
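
For instance, a minimal sketch of both rules:

const std = @import("std");

pub fn main() !void {
    const a: u8 = 1;
    std.debug.print("{d}\n", .{a << 7}); // 128: 7 fits in u3, the low bit becomes the high bit
    // a << 8 would not compile: 8 does not fit in u3

    const b: u16 = 42;
    std.debug.print("{d}\n", .{b << 8}); // 10752: 8 fits in u4, the shift amount type for u16
}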

7 Likes

This is a good illustration of why Zig requires the shift amount to have the log₂ integer width of the value being shifted: if you’re shifting a u64, you need to do it with a u6.

For one thing, the result of over-shifting an integer varies by CPU. Some treat it as clearing the value, while others will give you the unmodified value back.

But on a deeper level, an over-shift is never what you want. If you wanted to set the value to zero, shifting the bits out isn’t a good way to express that. If you wanted the value to be unchanged by the shift, and don’t want to special-case checking for that, you can apply a modulus to the shift amount, and this will be correct on every platform.

At some point you’ll likely be passing in a dynamic value for shifting, and if you can make it the correct width in advance, you should do that. But sometimes you’ll have a u8 and need to use it to shift a u64.

You have two options there. If it’s invalid for the number to be larger than a u6 can represent, you’ll want to use @intCast:

const shiftand: u6 = @intCast(byte);
// Now it's legal to shift a u64

Passing a value which is too large for u6 becomes safety-checked illegal behavior when you take this approach.

If you’re only interested in the low bits, and it’s fine for the number to not fit in a u6, use @truncate:

const shiftand: u6 = @truncate(byte); // @truncate needs a known result type
// Now you have the low six bits of the byte
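
Putting both together, a minimal self-contained sketch (the function and test names here are mine, purely for illustration):

const std = @import("std");

// The shift amount must be at most 63: assert that with @intCast.
fn shiftChecked(value: u64, byte: u8) u64 {
    const shiftand: u6 = @intCast(byte); // safety-checked if byte > 63
    return value << shiftand;
}

// Only the low six bits of the shift amount matter: @truncate them.
fn shiftLowBits(value: u64, byte: u8) u64 {
    const shiftand: u6 = @truncate(byte); // equivalent to byte % 64
    return value << shiftand;
}

test "dynamic shift amounts" {
    try std.testing.expectEqual(@as(u64, 8), shiftChecked(1, 3));
    try std.testing.expectEqual(@as(u64, 16), shiftLowBits(1, 68)); // 68 % 64 == 4
}

The @truncate version is the explicit modulus from above: keeping the low six bits is the same as taking the shift amount mod 64.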

I do a lot of this in RuneSet, which uses u64 masks to store the lower six bits of UTF-8 code units, and the higher two bits to determine which masks to use when.

2 Likes

IMHO this is a “technically correct” error message which is going to create a lot of confusion for new Zig devs.

1 Like

It already has :slight_smile: When I encountered this compiler message for the very first time I was completely at a loss for a while. But I soon realized what was going on: the compiler simply forbids me to shift a u16 (for example) right by more than 15 bits. We need 4 bits to hold 15; any wider integer could potentially hold a larger number, and shifting right by a number > 15 would effectively zero the variable, which is likely not what the programmer wanted.

This is exactly why Zig restricts the integer width for bitshifting: you would certainly expect over-shifting to zero the variable, but on IA-32 architectures, it doesn’t.

The 8086 does not mask the shift count. However, all other IA-32 processors (starting with the Intel 286 processor) do mask the shift count to 5 bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions.

So if you’re shifting a 32-bit value, the shiftand is masked off to a maximum count of 31. What happens if you shift by 32? The count is masked off to zero, so it won’t zero your variable; it will leave it unaltered. Other architectures will zero it. This is perilous!

A language has choices here:

  • Leave the result implementation-defined (major footgun).
  • Define one of these outcomes and emit assembly to ensure it (defensible, but not the best choice, because in practice over-shifting represents an error, and there will be circumstances where the emitted object code is suboptimal).
  • Make it illegal behavior to over-shift.

Zig has an excellent integer type system, where an integer type can have any specified width up to u65536, so it can not only declare over-shifting illegal, it can enforce it.[0]

Also, if you think about the implications of an architecture masking off the high bits before a shift, then using @intCast or @truncate to get the appropriate width should be free in Fast/Small release modes.

[0]: This works because ‘ordinary’ integer widths are powers of two. The interaction between oddball integer widths and shiftands is not quite as clean, but in practice this isn’t a problem. Exercise for the interested reader.

3 Likes

That is horrific! Never knew about that, thanks for the info.

2 Likes