Where is the data in an aligned integer?

Consider a u24. On most systems, this will align to 4, and thus require 4 bytes to store.

Is the data for the u24 in the lowest memory addresses of the 4 bytes (bytes 0,1,2) , the highest memory addresses (bytes 1,2,3), or does this depend on the endian-ness of the host?

I guarantee you that a u24 is a u32 with bounds checks. Anything else would make arithmetic impossible.

By itself, of course. As a field of a packed struct, that’s a different story. But it would be shifted down.
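
A rough sketch of how to check the "u24 is stored like a u32" claim on your own target. The helper name paddingBits is made up for illustration; the u24 result is printed rather than asserted because storage size depends on the target ABI.

```zig
const std = @import("std");

// Hypothetical helper: number of storage bits that carry no value bits.
fn paddingBits(comptime T: type) usize {
    return @sizeOf(T) * 8 - @bitSizeOf(T);
}

test "u24 has 24 value bits but usually more storage" {
    // The value width is fixed by the type itself.
    try std.testing.expect(@bitSizeOf(u24) == 24);
    // Storage must be at least as wide as the value.
    try std.testing.expect(@sizeOf(u24) * 8 >= @bitSizeOf(u24));
    // Target-dependent numbers: print them instead of asserting.
    std.debug.print("sizeOf={d} alignOf={d} paddingBits={d}\n", .{
        @sizeOf(u24),
        @alignOf(u24),
        paddingBits(u24),
    });
}
```

Running it with zig test shows how many bytes a standalone u24 actually occupies on the host.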

Running this on x86 (little endian):

const std = @import("std");

pub fn main() !void {
    var foo: u24 = undefined;
    foo = 0x112233;
    const ptr: [*]const u8 = @alignCast(@ptrCast(&foo));
    std.debug.print("0={x} 1={x} 2={x} 3={x}\n", .{
        ptr[0],
        ptr[1],
        ptr[2],
        ptr[3],
    });
}

displays:

0=33 1=22 2=11 3=aa

I’m surprised to see the unused bits set to undefined. I assume that’s some sort of safety measure, at the cost of having to mask it off in Debug and (maybe?) Safe modes?

I guess the main reason I ask is packed structs.

As we know, packed structs have a backing integer. I wonder where the padding due to alignment is? It's definitely not between the fields, so I am wondering which side of the fields it's on.

Nope, that’s an artifact of the initial undefined:

const std = @import("std");

test "unused u24 bits" {
    const foo24: u24 = 0x112233;
    const ptr: [*]const u8 = @alignCast(@ptrCast(&foo24));
    // prints 0=33 1=22 2=11 3=0
    std.debug.print("0={x} 1={x} 2={x} 3={x}\n", .{
        ptr[0],
        ptr[1],
        ptr[2],
        ptr[3],
    });
}

That’s in debug mode, to be clear.

So it's interesting that it leaves the high byte undefined; I would have expected it to get zeroed on assignment. Huh.

Sorry what padding? There’s no padding inside a packed struct, do you mean the padding of a non-aligned backing integer for a packed struct?

There’s only one place it can be: the standard defines the least significant bit as the first field. So that padding must be high bits.

correct, I’m referring to the padding of the backing integer due to alignment, not inside the struct.

The padding is always last: packed struct guaranteed in-memory layout

To be perhaps excessively pedantic, the padding is always high. It pays to be careful about the distinction between least and most significant aka low and high, on the one hand, and first and last on the other. The latter could mean the same thing, or could be something which depends on the endianness of the target.
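
To make the low/high versus first/last distinction concrete, here is a small sketch. The Nibbles type and the toBacking helper are made up for illustration. It only looks at the backing integer's value, not at bytes in memory, so it should hold on any endianness.

```zig
const std = @import("std");

// Hypothetical type: two 4-bit fields, so the backing integer is a u8.
const Nibbles = packed struct {
    first: u4,
    second: u4,
};

// Hypothetical helper: reinterpret the struct as its backing integer.
fn toBacking(n: Nibbles) u8 {
    return @bitCast(n);
}

test "first field lands in the least significant bits" {
    const n = Nibbles{ .first = 0x3, .second = 0xA };
    // first occupies bits 0..3, second occupies bits 4..7.
    try std.testing.expectEqual(@as(u8, 0xA3), toBacking(n));
}
```

"First" maps to "low" in the integer; which memory address that ends up at is a separate, endian-dependent question.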

Summarizing packed struct:

  • Fields are laid out in declaration order, starting from the least significant bits of the backing integer.
  • Padding is like an extra field at the end of the struct (the most significant bits).
  • Bytes in fields are ordered in the host endianness (little endian for x86).

x86 example:

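// No explicit backing integer: the fields total 56 bits, so an explicit
// packed struct(u64) would be rejected for a size mismatch. The backing
// integer is inferred as u56 and padded out for storage.
const T = packed struct {
    foo: u24,
    bar: u32,
};

const t = T{ .foo = 0x102030, .bar = 0x01020304 };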

The byte order is:

30 20 10 04 03 02 01 AA
|      | |         |
-------- -----------
   foo       bar

(AA is the padding)
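
A sketch for checking that diagram on a little-endian host. It assumes the backing integer is inferred from the fields (u56 here) rather than declared explicitly, and it deliberately does not assert anything about the padding byte, whose contents are not something to rely on.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Same field layout as the example above, with an inferred u56 backing integer.
const T = packed struct {
    foo: u24,
    bar: u32,
};

test "byte layout of T on little endian" {
    // The expected bytes below only apply to little-endian targets.
    if (builtin.cpu.arch.endian() != .little) return error.SkipZigTest;

    const t = T{ .foo = 0x102030, .bar = 0x01020304 };
    const bytes = std.mem.asBytes(&t);

    // Compare only the 7 bytes that hold field data; skip the padding.
    try std.testing.expectEqualSlices(
        u8,
        &[_]u8{ 0x30, 0x20, 0x10, 0x04, 0x03, 0x02, 0x01 },
        bytes[0..7],
    );
}
```

On a big-endian target the test is skipped rather than failed, since the same value would be laid out in a different byte order.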

This test succeeds on big and little endian:

const std = @import("std");

test "where is the data in a u24" {
    var memory: [4]u8 align(4) = .{ 0x00, 0x11, 0x22, 0xaa };
    const number: *u24 = @alignCast(@ptrCast(&memory));

    switch (@import("builtin").cpu.arch.endian()) {
        .little => {
            try std.testing.expectEqual(@as(u24, 0x221100), number.*);
            std.debug.print("ran little endian\n", .{});
        },
        .big => {
            try std.testing.expectEqual(@as(u24, 0x001122), number.*);
            std.debug.print("ran big endian\n", .{});
        },
    }
}

Ran using the standard build script and zig build test -Dtarget=powerpc64-linux, on an x86_64 host with QEMU.

jeff@jeff-debian:~/repos/gatorcat$ zig build test -Dtarget=powerpc64-linux
test
└─ run test stderr
ran big endian
jeff@jeff-debian:~/repos/gatorcat$ zig build test
test
└─ run test stderr
ran little endian

which means the data is in the lowest memory addresses.

Does this contradict you @mnemnion (data is always in “high” bits) or am I testing incorrectly?

What I said was:

Equivalently, the data is in the low bits. The first field of a packed struct has the even/odd bit, and so on.

Ah, I was thinking “high” = “most significant” for some reason.

So “high” = “highest memory address”, which is where the padding is.
