How to inspect the content of a value bit by bit?

To make it easier to work through the examples in On type choices and "idiomatic" way to add a negative number to usize, I decided to implement a simple function that inspects (prints) the content of an integer bit by bit. As a working solution, I wrote the following:

const std = @import("std");

pub fn dumpBits(comptime T: anytype, comptime value: T) ![]const u8 {
    if (@typeInfo(T) != .Int) return error.ValueShouldBeAnInteger;

    const T_info = @typeInfo(T);
    const Uint = std.meta.Int(.unsigned, T_info.Int.bits);
    comptime var mask: Uint = 1 << (T_info.Int.bits - 1);
    const casted: Uint = @bitCast(value);

    comptime var out: []const u8 = "";
    inline while (mask != 0) : (mask /= 2) {
        const bit = if (casted & mask == 0) "0" else "1";
        out = out ++ bit;
    }

    return out;
}


pub fn main() !void {
    std.log.debug("max usize: {s}", .{try intToBits(u8, std.math.maxInt(u8))});
    std.log.debug("      -42: {s}", .{try intToBits(i8, -42)});
}

It gives:

debug:    max u8: 11111111
debug:       -42: 11010110

Feels right. However, I felt the solution could be a bit simpler and more generic, for example one that would also let me print the bits of a struct. So I made a sketch that doesn’t work:

pub fn dumpBits(comptime T: anytype, comptime value: T) ![]const u8 {
    comptime var out: []const u8 = "";
    const len = @bitSizeOf(T);
    const casted: [len]u1 = @bitCast(value); // error because [len]u1 with padding takes ~8x more bits
    inline for (casted) |bit| {
        out = out ++ if (bit == 1) "1" else "0";
    }
    return out;
}

I really like this solution, and it would be nice if it worked, because we simply cast something into an array of bits and print them one by one. However, running this function:

pub fn main() !void {
    const Struct = packed struct {
        f1: u8,
        f2: i8,
    };
    const data = Struct{ .f1 = 1, .f2 = -1 };
    std.log.debug("{s}", .{try dumpBits(Struct, data)});
}

Gives:

howtos/q_bitFiddling.zig:23:29: error: @bitCast size mismatch: destination type '[16]u1' has 121 bits but source type 'q_bitFiddling.main.Struct' has 16 bits
    const casted: [len]u1 = @bitCast(value); // error because [len]u1 with padding takes ~8x more bits
                            ^~~~~~~~~~~~~~~

At this point, I’m unsure how to handle padding, bit casting like this, and potential concerns with endianness (like when I need to think about them).
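
For reference, here is a quick sketch (illustrative only) that makes the size mismatch visible; each element of a u1 array is stored in its own byte, so the array is far bigger than the 16-bit struct:

const std = @import("std");

pub fn main() void {
    const Struct = packed struct { f1: u8, f2: i8 };
    // The array version pads every u1 element out to a byte, hence the
    // mismatch reported by @bitCast.
    std.debug.print("@bitSizeOf(Struct) = {}\n", .{@bitSizeOf(Struct)});
    std.debug.print("@bitSizeOf([16]u1) = {}\n", .{@bitSizeOf([16]u1)});
}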


You could also use the b formatting option for formatted output (std.log.debug, etc).
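
For example, a minimal sketch using the same -42 value as above:

const std = @import("std");

pub fn main() void {
    const value: i8 = -42;
    const bits: u8 = @bitCast(value);
    // {b:0>8} = binary, zero-padded to 8 digits.
    std.log.debug("      -42: {b:0>8}", .{bits}); // prints 11010110 with the "debug: " prefix
}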


I think std.mem.asBytes is your friend here. I haven’t dug into what it actually does or tested it with lots of different types, but it seems to do what you want here.

Here is some code adapted from some of my code where I also wanted to look at memory; it prints bytes in a similar way to some hexdump tools:

const std = @import("std");

fn dumpBits(value: anytype) !void {
    printBits(std.mem.asBytes(&value));
    std.debug.print("\n", .{});
    printBytes(std.mem.asBytes(&value));
    std.debug.print("\n", .{});
}

fn printBits(memory: []const u8) void {
    printMemoryFmt("{b:0>8} ", memory);
}

fn printBytes(memory: []const u8) void {
    printMemoryFmt("{X:0>2} ", memory);
}

fn printMemoryFmt(comptime fmt: []const u8, memory: []const u8) void {
    // const wrap = 8 * 4; // this is what I used for bytes originally to fit on my screen/terminal
    const wrap = 8 * 2;
    var i: u32 = 0;
    for (memory) |byte| {
        std.debug.print(fmt, .{byte});
        i += 1;
        if (i % 8 == 0) {
            std.debug.print("   ", .{});
        } else if (i % 4 == 0) {
            std.debug.print(" ", .{});
        }
        if (i >= wrap) {
            std.debug.print("\n", .{});
            i = 0;
        }
    }
}

pub fn main() !void {
    const Struct = packed struct {
        f1: u8,
        f2: i8,
    };
    const data = [1]Struct{.{ .f1 = 1, .f2 = -1 }} ** 20;

    try dumpBits(data);
}

Here is the output:

00000001 11111111 00000001 11111111  00000001 11111111 00000001 11111111    00000001 11111111 00000001 11111111  00000001 11111111 00000001 11111111    
00000001 11111111 00000001 11111111  00000001 11111111 00000001 11111111    00000001 11111111 00000001 11111111  00000001 11111111 00000001 11111111    
00000001 11111111 00000001 11111111  00000001 11111111 00000001 11111111    
01 FF 01 FF  01 FF 01 FF    01 FF 01 FF  01 FF 01 FF    
01 FF 01 FF  01 FF 01 FF    01 FF 01 FF  01 FF 01 FF    
01 FF 01 FF  01 FF 01 FF 

This type of cast is possible with some tricks:

const vector: @Vector(len, u1) = @bitCast(value); // Vectors are packed in memory.
const casted: [len]u1 = vector; // But they can also coerce to an array of the same type.
// Note that we can't use the vector directly (for (vector) |bit|), since vectors can't be iterated.

Hm… it seems to work, but the output is strange (I put a space on the byte boundary myself):

10000000 11111111

Why is one of them 10000000 instead of 00000001? It looks reversed. My machine is little endian, but I assumed that only affects byte order, not bit order.

Speaking of @Sze’s answer: I think it’s cool. Looks like a disassembly :slight_smile:


I think this is interesting, thank you for asking this question! I learned two new things today:

  • comptime value: anytype is interesting; I had not really considered it specifically, or how it differs from value: anytype. It lets you express some comptime things neatly.
  • the vector-to-array conversion trick from @IntegratedQuantum’s answer

Combining the two:

pub fn dumpBits(comptime value: anytype) []const u8 {
    return dumpBitsImpl(@TypeOf(value), value);
}

pub fn dumpBitsImpl(comptime T: type, comptime value: T) []const u8 {
    comptime var out: []const u8 = "";
    const len = @bitSizeOf(T);
    const vector: @Vector(len, u1) = @bitCast(value);
    const casted: [len]u1 = vector;
    inline for (casted) |bit| {
        out = out ++ if (bit == 1) "1" else "0";
    }
    return out;
}

Comparing the output of this function, I noticed that within every byte the bits are arranged from least significant to most significant. I suspect this is just what @bitCast happens to do when it casts to a u1 vector, but it would be good to dig into the source code and figure out whether that is actually the cause and why it happens.

const std = @import("std");

fn dump(value: anytype) !void {
    printBits(std.mem.asBytes(&value));
    std.debug.print("\n", .{});
    printBytes(std.mem.asBytes(&value));
    std.debug.print("\n", .{});
}

fn printBits(memory: []const u8) void {
    printMemoryFmt("{b:0>8} ", memory);
}

fn printBytes(memory: []const u8) void {
    printMemoryFmt("{X:0>2} ", memory);
}

fn printMemoryFmt(comptime fmt: []const u8, memory: []const u8) void {
    const wrap = 8 * 2;
    var i: u32 = 0;
    for (memory) |byte| {
        std.debug.print(fmt, .{byte});
        i += 1;
        if (i % 8 == 0) {
            std.debug.print("   ", .{});
        } else if (i % 4 == 0) {
            std.debug.print(" ", .{});
        }
        if (i >= wrap) {
            std.debug.print("\n", .{});
            i = 0;
        }
    }
}

pub fn dumpBits(comptime value: anytype) []const u8 {
    return dumpBitsImpl(@TypeOf(value), value);
}

pub fn dumpBitsImpl(comptime T: type, comptime value: T) []const u8 {
    comptime var out: []const u8 = "";
    const len = @bitSizeOf(T);
    const vector: @Vector(len, u1) = @bitCast(value);
    const casted: [len]u1 = vector;
    inline for (casted) |bit| {
        out = out ++ if (bit == 1) "1" else "0";
    }
    return out;
}

pub fn main() !void {
    const Struct = packed struct {
        f1: u8,
        f2: i8,
    };
    const data = [1]Struct{.{ .f1 = 0b11101101, .f2 = 0 }} ** 4;

    try dump(data);
    std.debug.print("{s}\n", .{dumpBits(data)});
}

Outputs:

11101101 00000000 11101101 00000000  11101101 00000000 11101101 00000000    
ED 00 ED 00  ED 00 ED 00    
1011011100000000101101110000000010110111000000001011011100000000
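
If you want the more familiar most-significant-bit-first output from this approach, one option is to walk the array in reverse; a minimal sketch, assuming the LSB-first element order observed above:

pub fn dumpBitsMsbFirst(comptime T: type, comptime value: T) []const u8 {
    comptime var out: []const u8 = "";
    const len = @bitSizeOf(T);
    const vector: @Vector(len, u1) = @bitCast(value);
    const bits: [len]u1 = vector;
    // Walk from the last element down to element 0 (the LSB),
    // so the most significant bit ends up first in the string.
    comptime var i: usize = len;
    inline while (i > 0) {
        i -= 1;
        out = out ++ if (bits[i] == 1) "1" else "0";
    }
    return out;
}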

I don’t think you need to be concerned with endianness, because you simply want to see the value printed, so it is fine to see it with whatever endianness it happens to have (but see @AndrewCodeDev’s answer below). Regarding the other two, padding and bit casting, I am not sure I understand all the details that could come up with those. I looked a bit at the std.mem.asBytes code and it is very readable. (I think looking at many different types and their asBytes result types would be a good exercise, which I haven’t done yet, to gain more intuition for how types are mapped to their bytes.)
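
A tiny sketch of that exercise with a plain integer, to see the byte order that asBytes exposes:

const std = @import("std");

pub fn main() void {
    const x: u32 = 0x11223344;
    // asBytes gives a view of the value's memory; on a little-endian machine
    // the least significant byte comes first, so this should print 44 33 22 11.
    for (std.mem.asBytes(&x)) |byte| {
        std.debug.print("{X:0>2} ", .{byte});
    }
    std.debug.print("\n", .{});
}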

One thing about bitcasting is that it only works on types with defined memory layout.
When I change Struct to be a non-packed struct, I get:

dumpbytes2.zig:43:38: error: TODO: implement writeToMemory for type '[4]dumpbytes2.main.Struct'
    const vector: @Vector(len, u1) = @bitCast(value);
                                     ^~~~~~~~~~~~~~~

I wonder whether that should instead be a compile error telling me that the struct isn’t allowed? The @bitCast docs say: “It’s a compile error to bitcast a value of undefined layout.”


Bit order usually follows the same endianness as the byte order for a given computer system. That is, in a big endian system the most significant bit is stored at the lowest bit address; in a little endian system, the least significant bit is stored at the lowest bit address.

Byte and Bit Order Dissection | Linux Journal.

You have to reverse the order of struct fields for endianness because of this… for instance, they provide an example:

struct iphdr {
#if defined(__LITTLE_ENDIAN_BITFIELD)
        __u8    ihl:4,
                version:4;
#elif defined (__BIG_ENDIAN_BITFIELD)
        __u8    version:4,
                ihl:4;
#else
...

You can see that the bit field struct is being reversed (ihl and version are swapped).
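
For comparison, Zig defines the layout of a packed struct relative to its backing integer: the first declared field occupies the least significant bits. A minimal sketch (the Nibbles struct is just an illustration of the same two fields):

const std = @import("std");

const Nibbles = packed struct {
    ihl: u4,
    version: u4,
};

pub fn main() void {
    const n = Nibbles{ .ihl = 5, .version = 4 };
    // ihl lands in bits 0..3 and version in bits 4..7 of the backing u8,
    // so this should print 45.
    std.debug.print("{X:0>2}\n", .{@as(u8, @bitCast(n))});
}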


I wasn’t aware that bit order is also based on endianness; somewhere I had picked up the idea that “endianness is only about bytes”, which apparently is wrong, so I will read the article later.
I also haven’t dealt with individual bits a lot, so maybe that is why I haven’t thought much about bit order.


It’s all good - chances are, if you had this impression then a lot of people do as well (so it’s valuable public discussion - I tell myself that when I make mistakes lol). I came across this when writing quantization logic for floats to u8 using packed structs… I ended up playing around with an absolute value function that just toggles the sign bit like a boolean… you usually don’t run into this stuff if you’re not doing something really particular with bits :slight_smile:
