Casting to and from packed structs

I’m working on a toy 8086 emulator and I’m decoding assembly at the moment. That means I’m taking in a stream of bytes and trying to interpret the opcodes, operands, etc. contained within them.

The mov instruction is encoded as two bytes:

    6   1 1    2  3   3
|------|-|-| |--|---|---|
   op   d w   mo reg  rm

The details of what those fields mean aren’t terribly relevant. I thought I would be clever and write out a packed struct containing these fields so that I could @bitCast either a u16 or [2]u8 to my struct.

pub const RawMov = packed struct(u16) {
    op: u6,
    d: u1,
    w: u1,
    mod: u2,
    reg: u3,
    rm: u3,
};

What I discovered implementing this was pretty surprising: the bits are all kinds of out of order! I’m hoping that someone can make sense of this for me.

  • The fields of a packed struct are filled from the least significant bit (LSB) first. How are the bits of the fields themselves filled?
  • Reading a [2]u8 from a std.io.Reader then casting to [16]u1 also did weird things if I remember correctly, but I flailed for long enough that I could be misremembering.
  • Casting a u16 to [16]u1 puts the LSB at index 0, which is reverse of what I’d expect.
  • If you slice the [16]u1 and @bitCast to a u<N>, it will reverse back those bits.
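To make the first bullet concrete, here’s a small sketch (mine, not from the original post): bit-casting a u16 to the struct above shows the first declared field landing in the low bits.

```zig
const std = @import("std");

pub const RawMov = packed struct(u16) {
    op: u6,
    d: u1,
    w: u1,
    mod: u2,
    reg: u3,
    rm: u3,
};

pub fn main() void {
    // Written MSB-first, this literal is rm=001, reg=011, mod=11,
    // w=1, d=0, op=100010 -- because fields fill from the LSB up,
    // the FIRST declared field comes from the LOW end of the integer.
    const raw: RawMov = @bitCast(@as(u16, 0b001_011_11_1_0_100010));
    std.debug.print("op={b} d={} w={} rm={b}\n", .{ raw.op, raw.d, raw.w, raw.rm });
}
```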

Here’s what I want (for now I’m doing this via masks and bit shifts):

  • Given a sequence of bits already encoded in the correct order
  • Cast those bits into a struct with minimal ceremony

Is that possible? What are the rules that I seem to be missing?

I think (not sure!) the compiler is free to reorder the fields of a plain struct — I believe by size, or by declaration order when sizes are equal — though a packed struct should have a defined layout. I don’t know a solution.

Be careful! When a CPU reads assembly, it reads from low addresses to high addresses: it first reads the op/d/w byte, then the mod/reg/rm byte.

Notice that in this way of drawing it, reading left to right, the byte order runs from low to high while the bit order runs from high to low; the illustration has an inherent contradiction.

The memory model I’m used to is drawn like this: from low bytes to high bytes, arranged from right to left. It’s clearer to me this way:

  2  3   3       6   1 1
|--|---|---| |------|-|-|
 mo reg  rm     op   d w

A packed struct strictly follows the declaration order of its fields from low bits to high bits, at both the byte and bit level, so it should be:


pub const RawMov = packed struct(u16) {
    w: u1,
    d: u1,
    op: u6,
    rm: u3,
    reg: u3,
    mo: u2,
};

Perhaps defining it byte by byte is a better approach.

pub const MovOpcode = packed struct(u8) {
    w: u1,
    d: u1,
    op: u6,
};
pub const ModRM = packed struct(u8) {
    rm: u3,
    reg: u3,
    mo: u2,
};
pub const RawMov = packed struct {
    opcode: MovOpcode,
    modrm: ModRM,
};

The fields’ own bits are also laid out LSB -> MSB.
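A tiny sketch of what that means (my example, with a hypothetical two-nibble struct): within each field, bit significance is preserved; only the fields’ starting positions are assigned from the LSB up.

```zig
const std = @import("std");

// Hypothetical struct for illustration: two nibbles in one byte.
const Pair = packed struct(u8) {
    lo: u4,
    hi: u4,
};

pub fn main() void {
    const p: Pair = @bitCast(@as(u8, 0xA5));
    // lo occupies bits 0-3 (0x5), hi occupies bits 4-7 (0xA); inside
    // each field the bits keep their usual significance.
    std.debug.print("lo={x} hi={x}\n", .{ p.lo, p.hi });
}
```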

Arrays are neither bit- nor byte-packed, so a [16]u1 has the same layout and size as a [16]u8. You can cast a [2]u8 to your struct directly.
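A sketch of both points, using the corrected field order from the earlier reply:

```zig
const std = @import("std");

const RawMov = packed struct(u16) {
    w: u1,
    d: u1,
    op: u6,
    rm: u3,
    reg: u3,
    mod: u2,
};

pub fn main() void {
    // Array elements are byte-aligned: a u1 element still occupies a
    // whole byte, so [16]u1 is 16 bytes, the same as [16]u8.
    std.debug.print("{} {}\n", .{ @sizeOf([16]u1), @sizeOf([16]u8) });
    // A [2]u8 is exactly 16 bits, so it bit-casts straight to the struct.
    const bytes = [2]u8{ 0b10001001, 0b11011001 };
    const raw: RawMov = @bitCast(bytes);
    std.debug.print("op={b}\n", .{raw.op});
}
```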

Your machine is almost certainly little endian, so the least significant byte is first. Also see above on arrays.

Again, see above about arrays.

I would guess @npc1054657282 is right about the original source of your problem being a mismatch between your packed struct and the encoded bytes.


The important thing to realize here is that packed structs are literally just integers; their fields don’t have any significance beyond being convenient aliases for a group of logically consecutive bits. As integers, their byte order depends on endianness, their bit order doesn’t. On a little-endian system:

pub fn main() !void {
    const int: u16 = 0b10101010_11001100;
    std.debug.print("{b}\n", .{int});
    const bytes: [2]u8 = @bitCast(int);
    std.debug.print("{b}{b}\n", .{ bytes[0], bytes[1] });
    const read = std.mem.readInt(u16, &bytes, .little);
    std.debug.print("{b}\n", .{read});
}
const std = @import("std");

Output:

1010101011001100
1100110010101010
1010101011001100

You’d have to use peekInt/takeInt or std.mem.readInt with the appropriate endianness.
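For instance, std.mem.readInt makes the byte order explicit regardless of the host machine (a sketch with two arbitrary example bytes):

```zig
const std = @import("std");

pub fn main() void {
    const bytes = [2]u8{ 0b10001001, 0b11011001 };
    // Assemble the two stream bytes into a u16 with a stated byte
    // order, independent of the machine's native endianness.
    const le = std.mem.readInt(u16, &bytes, .little);
    const be = std.mem.readInt(u16, &bytes, .big);
    std.debug.print("{b}\n{b}\n", .{ le, be });
}
```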


Thanks everyone, I’ve got it working now so I’ll add some notes here for posterity.

Just to be clear, the instructions I’m decoding are a stream of individual bytes, so the byte order as they arrive is correct, and doesn’t have anything to do with the endianness of my machine. If I were to buffer them and cast them to an integer value, the endianness would matter, as @Justus2308 shows above.

So, the given bytes I start out with are:

10001001 11011001

My new struct definition is:

// NOTE: Within each byte, fields are declared in the opposite
//       order from how they appear in the (MSB-first) diagram,
//       but the bytes themselves stay in the same order as the
//       byte stream.
pub const RawMov = packed struct(u16) {
    // byte 0
    w: u1,
    d: u1,
    op: u6,
    // byte 1
    rm: u3,
    reg: u3,
    mod: u2,
};

I buffer and cast like so:

var buf: [2]u8 = undefined;
buf[0] = 0b10001001;
buf[1] = 0b11011001;
const raw: RawMov = @bitCast(buf[0..2].*);
std.debug.assert(raw.op == 0b100010);