# Understanding arbitrary bit-width integers

The Zig language supports arbitrary bit-width integers. I can see this being useful, for instance, to specify the bounds of a parameter: e.g. an `i7` only goes from -64 to 63.

However, besides giving a "more precise type system", what's the purpose / application? To my understanding, an `i7` will still occupy a full byte in memory (packed structs seem to be an exception here)? Does it, for example, help performance-wise in arithmetic operations to know there are only so many bits to consider?

p.s. I just started learning Zig for fun, and normally I don't dabble at the bit level.

I use `u31` pretty often because it can coerce to an `i32` as well as a `usize` without requiring explicit casting.
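A minimal sketch of that coercion (`takeSigned` and `takeIndex` are hypothetical helpers, not from the thread):

```zig
const std = @import("std");

fn takeSigned(x: i32) i32 {
    return x;
}

fn takeIndex(x: usize) usize {
    return x;
}

pub fn main() void {
    const n: u31 = 42;
    // A u31 fits in an i32 (31 value bits plus a sign bit) and, on the
    // usual targets, in a usize, so both calls coerce implicitly with
    // no @intCast needed.
    std.debug.print("{d} {d}\n", .{ takeSigned(n), takeIndex(n) });
}
```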

Apart from that, the compiler could theoretically do more optimizations on them. For example, a `?u31` or `!u31` can be put into 4 bytes, whereas a `?u32` or `!u32` requires 8 bytes.
But that optimization isn't implemented yet.
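You can check what your compiler currently does with `@sizeOf` (a sketch; the exact sizes depend on the compiler version, which is why no values are asserted here):

```zig
const std = @import("std");

pub fn main() void {
    // With the proposed niche optimization, ?u31 could store the null
    // flag in the payload's spare bit and fit in 4 bytes; today both
    // optionals pay for a separate flag plus padding.
    std.debug.print("?u31: {d} bytes, ?u32: {d} bytes\n", .{
        @sizeOf(?u31), @sizeOf(?u32),
    });
}
```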

And then I think it’s also useful when you need big numbers. I have used `u128` a couple of times when `u64` was too small.
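One common case where `u64` falls short is holding the full product of two `u64` values; `std.math.mulWide` returns the double-width type directly (a sketch):

```zig
const std = @import("std");

pub fn main() void {
    const a: u64 = std.math.maxInt(u64);
    const b: u64 = 3;
    // mulWide(u64, ...) returns a u128, so the product cannot overflow.
    const wide: u128 = std.math.mulWide(u64, a, b);
    std.debug.print("{d}\n", .{wide});
}
```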

Additionally, low-bit-width numbers are sometimes useful for their overflow behavior:

```zig
var inu8: u8 = ...;
inu8 = (inu8 + 1) % 16;

var inu4: u4 = ...;
inu4 +%= 1; // This is much simpler to use.
```
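A complete, runnable version of the snippet above, with concrete starting values filled in for illustration:

```zig
const std = @import("std");

pub fn main() void {
    var in_u8: u8 = 15;
    in_u8 = (in_u8 + 1) % 16; // manual wrap at 16: 15 -> 0

    var in_u4: u4 = 15;
    in_u4 +%= 1; // wrapping add: a u4 overflows from 15 back to 0

    // prints "0 0"
    std.debug.print("{d} {d}\n", .{ in_u8, in_u4 });
}
```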

A couple of things we use oddly-sized integers for at TigerBeetle:


Sometimes the compiler forces you to use integers with bit width < 8.
Consider this code:

```zig
pub fn init(a: Allocator, ctx_len: u5) !BitPredictor {
    var bp = BitPredictor{};
    bp.p0 = try a.alloc(u16, @as(u32, 1) << ctx_len);
    @memset(bp.p0, P0MAX / 2);
    return bp;
}
```

Any type of `ctx_len` wider than `u5` could potentially overflow the shift, so the compiler performs some smart checks. Let's try using `u6` instead of `u5`. We get this error:

```
src/bit-predictor.zig:17:49: error: expected type 'u5', found 'u6'
    bp.p0 = try a.alloc(u16, @as(u32, 1) << ctx_len);
                                            ^~~~~~~
src/bit-predictor.zig:17:49: note: unsigned 5-bit int cannot represent all possible unsigned 6-bit values
```

The compiler deduced that a `u32` can only be shifted left by 0 to 31 bits, so the shift amount is at most a `u5`, and you have to use `u5` (or a narrower integer).
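If you don't want to hard-code `u5`, the standard library can compute the shift-amount type for you: `std.math.Log2Int(T)` is an integer type just wide enough to name every bit position of `T` (a sketch):

```zig
const std = @import("std");

pub fn main() void {
    // Log2Int(u32) == u5: exactly enough bits for shift amounts 0..31.
    const Shift = std.math.Log2Int(u32);
    const ctx_len: Shift = 12;
    const table_size = @as(u32, 1) << ctx_len;
    std.debug.print("{s} {d}\n", .{ @typeName(Shift), table_size });
}
```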


Thanks a lot for your replies and examples! I haven't actively coded in a language that gives me this much control before, and I'm still getting used to it. More control means more responsibility, I guess.

Non-zero or non-negative?

Here’s a neat trick you can pull at comptime with the help of arbitrary-width integers:

Suppose you want to assign a number to a set of functions based on their signatures. Functions with different arguments or return values would get different numbers. Functions with the same arguments and return values would get the same number.

First, the sample input:

```zig
const ns = struct {
    fn apple(arg1: u32, arg2: u32) void {
        _ = arg1;
        _ = arg2;
    }

    fn orange(arg1: u32, arg2: u32) u32 {
        return arg1 + arg2;
    }

    fn banana(arg1: u32, arg2: u32) void {
        _ = arg1;
        _ = arg2;
    }
};
```

As you can see, `apple` and `banana` have the same signature. If you run this code:

```zig
std.debug.print("apple: {s}\n", .{@typeName(@TypeOf(ns.apple))});
std.debug.print("orange: {s}\n", .{@typeName(@TypeOf(ns.orange))});
std.debug.print("banana: {s}\n", .{@typeName(@TypeOf(ns.banana))});
```

you would get:

```
apple: fn(u32, u32) void
orange: fn(u32, u32) u32
banana: fn(u32, u32) void
```

Now, the code for the counter:

```zig
const counter = create: {
    comptime var next = 0;

    break :create struct {
        fn get(comptime anything: anytype) comptime_int {
            _ = anything;
            const slot = next;
            next += 1;
            return slot;
        }
    };
};
```

Due to comptime memoization, `counter.get()` will only increment the counter if the argument given is something it hasn't seen before. Since `@typeName()` returns the same text string for `apple` and `banana`, you should get the same number, right?

```zig
const apple_slot = counter.get(@typeName(@TypeOf(ns.apple)));
const orange_slot = counter.get(@typeName(@TypeOf(ns.orange)));
const banana_slot = counter.get(@typeName(@TypeOf(ns.banana)));
std.debug.print("{d} {d} {d}\n", .{ apple_slot, orange_slot, banana_slot });
```

Output:

```
0 1 2
```

Nope. This is because strings are fat pointers (a pointer plus a length). Two identical strings stored at different memory locations are considered different by Zig. Here's where arbitrary bit-width integers come in. By converting the strings into giant integers, we can force Zig to compare, at comptime, the actual data that the pointers point to:

```zig
fn signature(comptime f: anytype) comptime_int {
    const name = @typeName(@TypeOf(f));
    comptime var int = 0;
    inline for (name) |c| {
        int = (int << 8) | @as(comptime_int, @intCast(c));
    }
    return int;
}

const apple_slot = counter.get(signature(ns.apple));
const orange_slot = counter.get(signature(ns.orange));
const banana_slot = counter.get(signature(ns.banana));
std.debug.print("\n{d} {d} {d}\n", .{ apple_slot, orange_slot, banana_slot });
```

Result:

```
0 1 0
```

Mind officially blown! Thanks for sharing this.

You can also get this functionality by using the type itself unless I am missing something.

```zig
const apple_slot = counter.get(@TypeOf(ns.apple));
const orange_slot = counter.get(@TypeOf(ns.orange));
const banana_slot = counter.get(@TypeOf(ns.banana));
std.debug.print("{d} {d} {d}\n", .{ apple_slot, orange_slot, banana_slot });
```
```
0 1 0
```

Curious. I didn’t realize that `@typeName()` gives you a different pointer even when the input is the same.

```zig
std.debug.print("{d} {d}\n", .{
    @intFromPtr(@typeName(@TypeOf(ns.apple)).ptr),
    @intFromPtr(@typeName(@TypeOf(ns.apple)).ptr),
});
```
```
2164496 2164514
```

In any event, in the original code I was using sub-strings of the `@typeName()` output as keys. The code above was simplified from that.


I think this is worth linking here, @cancername - a great example of why, at the moment, arbitrary bit widths come with a downside (a performance penalty): LLVM seems "confused".

I wouldn’t call those checks smart. I hit this too often: I know the value is in range, but the compiler tries to be smart, and I just wind up with casts all over the place to placate it.

How do you get compiler-enforced non-zero numbers with bit widths?