Understanding arbitrary bit-width integers

The Zig language supports arbitrary bit-widths for integers. I see that this can be useful, for instance, to specify the boundaries of a parameter: e.g. an i7 only goes from -64 to 63.

However, besides giving a “more precise typing system”, what’s the purpose / application? To my understanding, an i7 will still occupy a full byte in memory (packed structs seem to be an exception here)? Does it, for example, help in arithmetic operations (performance-wise) to know there are only so many bits to consider?
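A quick check (my own sketch) seems to back up the full-byte part:

const std = @import("std");

test "i7 range and ABI size" {
    // Outside of packed structs, an i7 still occupies one full byte.
    try std.testing.expectEqual(1, @sizeOf(i7));
    try std.testing.expectEqual(-64, std.math.minInt(i7));
    try std.testing.expectEqual(63, std.math.maxInt(i7));
}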

p.s. I just started learning zig for fun, and normally, I don’t dabble on the bit-level :wink:

I use u31 pretty often because it can coerce to an i32 as well as a usize without requiring explicit casting.
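For example (a minimal sketch of my own, not from the original post):

const std = @import("std");

test "u31 coerces to i32 and usize" {
    const items = [_]u8{ 1, 2, 3 };
    const index: u31 = 2;
    const signed: i32 = index; // every u31 value fits in an i32
    const byte = items[index]; // u31 coerces to usize for indexing
    try std.testing.expectEqual(@as(i32, 2), signed);
    try std.testing.expectEqual(@as(u8, 3), byte);
}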

Apart from that, the compiler could theoretically do more optimizations on them. For example, a ?u31 or !u31 can be packed into 4 bytes (the spare bit can hold the null or error tag), whereas a ?u32 or !u32 requires 8 bytes.
But that optimization isn’t implemented yet.
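You can see the current state with @sizeOf (a quick sketch of mine):

const std = @import("std");

pub fn main() void {
    // With the optimization, ?u31 could keep the null tag in the spare
    // 32nd bit and fit in 4 bytes; without it, both optionals report
    // the same size.
    std.debug.print("?u31: {} bytes\n", .{@sizeOf(?u31)});
    std.debug.print("?u32: {} bytes\n", .{@sizeOf(?u32)});
}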

And then I think it’s also useful when you need big numbers. I have used u128 a couple of times when u64 was too small.
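One typical case (my example, not necessarily the poster’s): the full product of two u64 values needs up to 128 bits.

fn fullProduct(a: u64, b: u64) u128 {
    // Widen before multiplying so the result can never overflow.
    return @as(u128, a) * @as(u128, b);
}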

Additionally, low-bit-width numbers are sometimes useful for their overflow behavior:

var inu8: u8 = 0; // example starting value
inu8 = (inu8 + 1) % 16; // wrap manually at 16
var inu4: u4 = 0; // example starting value
inu4 +%= 1; // wrapping addition does the same - much simpler to use

A couple of things we use oddly-sized integers for at TigerBeetle:


Sometimes you have to use integers with a bit width of less than 8, and this is forced by the compiler.
Consider this code:

    pub fn init(a: Allocator, ctx_len: u5) !BitPredictor {
        var bp = BitPredictor{};
        bp.p0 = try a.alloc(u16, @as(u32, 1) << ctx_len);
        @memset(bp.p0, P0MAX / 2);
        return bp;
    }

Any ctx_len type wider than u5 could potentially produce an out-of-range shift, so the compiler performs some smart checks. Let’s try using u6 instead of u5. We’ll get this error:

src/bit-predictor.zig:17:49: error: expected type 'u5', found 'u6'
        bp.p0 = try a.alloc(u16, @as(u32, 1) << ctx_len);
                                                ^~~~~~~
src/bit-predictor.zig:17:49: note: unsigned 5-bit int cannot represent all possible unsigned 6-bit values

The compiler deduced that a u32 can be shifted left by at most 31 positions, so the shift amount must be a u5 (or a narrower integer).
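The same rule holds in isolation (a minimal sketch of mine):

const std = @import("std");

test "shift amount for a u32 must fit in a u5" {
    const x: u32 = 1;
    const amount: u5 = 31; // u5 covers exactly 0..31, every legal shift for a u32
    try std.testing.expectEqual(@as(u32, 1) << 31, x << amount);
    // std.math.Log2Int(T) computes this shift-amount type generically.
    try std.testing.expect(std.math.Log2Int(u32) == u5);
}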


Thanks a lot for your replies and examples! I haven’t actively coded in a language before that gave me so much control - still getting used to it. More control means more responsibility, I guess :wink:

Non-zero or non-negative?

Here’s a neat trick you can pull at comptime with the help of arbitrary-width integers:

Suppose you want to assign a number to a set of functions based on their signatures. Functions with different arguments or return values would get different numbers. Functions with the same arguments and return values would get the same number.

First, the sample input:

const ns = struct {
    fn apple(arg1: u32, arg2: u32) void {
        _ = arg1;
        _ = arg2;
    }

    fn orange(arg1: u32, arg2: u32) u32 {
        return arg1 + arg2;
    }

    fn banana(arg1: u32, arg2: u32) void {
        _ = arg1;
        _ = arg2;
    }
};

As you can see, apple and banana have the same signature. If you run this code:

std.debug.print("apple: {s}\n", .{@typeName(@TypeOf(ns.apple))});
std.debug.print("orange: {s}\n", .{@typeName(@TypeOf(ns.orange))});
std.debug.print("banana: {s}\n", .{@typeName(@TypeOf(ns.banana))});

you would get:

apple: fn(u32, u32) void
orange: fn(u32, u32) u32
banana: fn(u32, u32) void

Now, the code for the counter:

const counter = create: {
    comptime var next = 0;

    break :create struct {
        fn get(comptime anything: anytype) comptime_int {
            _ = anything;
            const slot = next;
            next += 1;
            return slot;
        }
    };
};

Due to comptime memoization, counter.get() will only increment the counter if the argument given is something it hasn’t seen before. Since @typeName() returns the same text string for apple and banana, you should get the same number, right?

const apple_slot = counter.get(@typeName(@TypeOf(ns.apple)));
const orange_slot = counter.get(@typeName(@TypeOf(ns.orange)));
const banana_slot = counter.get(@typeName(@TypeOf(ns.banana)));
std.debug.print("{d} {d} {d}\n", .{ apple_slot, orange_slot, banana_slot });

Output:

0 1 2

Nope. This is because strings are fat pointers. Two identical strings stored at different memory locations will be considered different by Zig. Here’s where arbitrary bit-width integers come in. By converting strings into giant integers, we can force Zig to compare at comptime the actual data that the pointers point to:

fn signature(comptime f: anytype) comptime_int {
    const name = @typeName(@TypeOf(f));
    comptime var int = 0;
    inline for (name) |c| {
        int = (int << 8) | @as(comptime_int, @intCast(c));
    }
    return int;
}

const apple_slot = counter.get(signature(ns.apple));
const orange_slot = counter.get(signature(ns.orange));
const banana_slot = counter.get(signature(ns.banana));
std.debug.print("\n{d} {d} {d}\n", .{ apple_slot, orange_slot, banana_slot });

Result:

0 1 0

Mind officially blown! :open_mouth: Thanks for sharing this.

You can also get this functionality by using the type itself, unless I am missing something.

const apple_slot = counter.get(@TypeOf(ns.apple));
const orange_slot = counter.get(@TypeOf(ns.orange));
const banana_slot = counter.get(@TypeOf(ns.banana));
std.debug.print("{d} {d} {d}\n", .{ apple_slot, orange_slot, banana_slot });

Output:

0 1 0

Curious. I didn’t realize that @typeName() would give you a different pointer even when the input is the same.

std.debug.print("{d} {d}\n", .{
    @intFromPtr(@typeName(@TypeOf(ns.apple)).ptr),
    @intFromPtr(@typeName(@TypeOf(ns.apple)).ptr),
});

2164496 2164514

In any event, in the original code I was using substrings of the @typeName() as the key. The code above was simplified from that.


I think this is worth linking here, @cancername - a great example of why, at the moment, arbitrary bit widths come with a downside (a performance penalty): LLVM seems “confused”

Issue on github: LLVM: Non power of two integer arithmetic emits slow assembly · Issue #19616 · ziglang/zig · GitHub

I wouldn’t call them smart. I hit this too often: I know the value, the compiler tries to be smart, and I just wind up with casts all over the place to placate it.

How do you get compiler-enforced non-zero numbers with bit widths?