What is alignment?

  1. What is alignment? Why do I need to care about it?
  2. What are the consequences of unaligned loads? Unaligned stores?
  3. Why is a type’s alignment always a power of 2?
  4. When I don’t set alignment explicitly with align(), what alignment is given? Why?
  5. How do other programming languages deal with alignment? Has Zig been designed differently or the same? Why?
8 Likes

Basically it’s about memory addresses. A u8 (byte) is the unit that addresses in machine code actually index: address 1 is the second byte in the address space, and so on. However, some values are larger than one byte, such as u16 (two bytes making up a 16-bit integer). Reading and writing these to/from CPU registers uses the same byte addresses, but machines can usually only read and write such values efficiently if their location in memory starts at a multiple of their size in bytes. So reading a u16 from address 1 is unaligned, but reading it from address 2 is aligned.
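A small sketch of the u16 example above, using `std.mem.isAligned` from the standard library. The helper name `isNaturallyAligned` is made up for illustration, and it assumes `@alignOf(u16)` is 2, which holds on mainstream targets but not necessarily on every exotic one.

```zig
const std = @import("std");

// Hypothetical helper: true when `addr` is a multiple of T's natural alignment.
fn isNaturallyAligned(comptime T: type, addr: usize) bool {
    return std.mem.isAligned(addr, @alignOf(T));
}

pub fn main() void {
    // @alignOf(u16) is 2 on mainstream targets, so odd addresses are unaligned.
    std.debug.print("@alignOf(u16) = {d}\n", .{@alignOf(u16)});
    std.debug.print("address 1 aligned for u16? {}\n", .{isNaturallyAligned(u16, 1)});
    std.debug.print("address 2 aligned for u16? {}\n", .{isNaturallyAligned(u16, 2)});
}
```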

Unaligned memory access is possible in Zig with pointer casting or packed structs, but may generate much slower machine code. Usually the penalty is only reduced performance.
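For the pointer-casting route, a minimal sketch: an `*align(1)` pointer tells the compiler the address may be unaligned, so it has to emit a load that works for any address. `readU16Unaligned` is a hypothetical name, not a std function.

```zig
const std = @import("std");

// Hypothetical helper: reads a u16 from any byte offset, aligned or not.
// The align(1) in the pointer type is what makes the unaligned load legal.
fn readU16Unaligned(bytes: []const u8, offset: usize) u16 {
    std.debug.assert(offset + 2 <= bytes.len);
    const p: *align(1) const u16 = @ptrCast(&bytes[offset]);
    return p.*;
}

pub fn main() void {
    const buf = [_]u8{ 0x00, 0x34, 0x12, 0x00 };
    // Offset 1 is odd, so this is an unaligned u16 load.
    const v = readU16Unaligned(&buf, 1);
    std.debug.print("0x{x}\n", .{v}); // 0x1234 on little-endian targets
}
```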

When defining and allocating normal structs, the compiler ensures good alignment by inserting padding, so usually you needn’t worry about it too much. I think normal structs are even allowed to reorder fields to minimize padding while keeping alignment good, but I don’t know whether that’s the case or whether I’m mixing it up with Rust’s behavior.
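To see the padding, a sketch comparing `extern` structs (which keep declaration order, following C ABI rules) against a regular struct, whose layout the compiler is free to choose. The sizes in the comments assume `u32` has 4-byte alignment, as on mainstream targets; for the regular struct no particular size is assumed.

```zig
const std = @import("std");

// extern: declaration order is kept, so padding shows up.
const Unordered = extern struct {
    a: u8, // offset 0, then 3 bytes of padding so `b` is 4-aligned
    b: u32, // offset 4
    c: u8, // offset 8, then 3 bytes of trailing padding -> 12 bytes
};

// Same fields, ordered largest-alignment-first by hand.
const Ordered = extern struct {
    b: u32,
    a: u8,
    c: u8, // 2 bytes of trailing padding -> 8 bytes
};

// Regular struct: the compiler may reorder these fields itself.
const Auto = struct {
    a: u8,
    b: u32,
    c: u8,
};

pub fn main() void {
    std.debug.print("Unordered: {d} bytes\n", .{@sizeOf(Unordered)});
    std.debug.print("Ordered:   {d} bytes\n", .{@sizeOf(Ordered)});
    std.debug.print("Auto:      {d} bytes\n", .{@sizeOf(Auto)});
}
```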

1 Like

The memory address needs to be a multiple of the alignment to be optimal/valid.
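Rounding an address up to the next valid multiple is a common operation. This sketch uses `std.mem.alignForward` with the `(comptime T, addr, alignment)` signature from Zig 0.11 onward (older versions differ), next to a hand-written `alignUp` (a hypothetical name) that shows the bit trick for power-of-two alignments.

```zig
const std = @import("std");

// Hypothetical helper: round `addr` up to a multiple of the power-of-two `a`
// by adding a-1 and then clearing the low log2(a) bits.
fn alignUp(addr: usize, a: usize) usize {
    std.debug.assert(std.math.isPowerOfTwo(a));
    return (addr + a - 1) & ~(a - 1);
}

pub fn main() void {
    const addr: usize = 0x1003;
    std.debug.print("std:  0x{x}\n", .{std.mem.alignForward(usize, addr, 8)}); // 0x1008
    std.debug.print("hand: 0x{x}\n", .{alignUp(addr, 8)}); // 0x1008
}
```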

You care about it mostly for performance: CPUs fetch memory in batches aligned to a static grid, and alignment ensures data is placed optimally in this grid to minimise reads/writes. Some instructions may be faster with certain alignments, and some instructions require certain alignments for correctness.

Unaligned accesses can increase the number of reads/writes, slowing down the program; some instructions may be slower, some may halt your program on incorrect alignment, and some may cause undefined behaviour.

Halting and undefined behaviour can be ignored if you’re not writing inline asm. You can assume the compiler will choose an instruction that at least works with the chosen alignment, even if it’s not optimal; if that’s not the case, it’s a bug and should be reported.

I can’t speak to all the reasons, but the most useful one is that an address aligned to a larger power of 2 is also aligned to every smaller power of 2.
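A sketch of why this works: for a power-of-two alignment `a`, "is a multiple of `a`" is the same as "the low log2(a) bits are zero", so the check is a single AND, and clearing more low bits also clears fewer. (With non-powers of 2 this breaks: 6 is a multiple of 6 but not of 4.) `isAlignedTo` is a hypothetical helper name.

```zig
const std = @import("std");

// Hypothetical helper: alignment check as a bit mask (power-of-two `a` only).
fn isAlignedTo(addr: usize, a: usize) bool {
    std.debug.assert(std.math.isPowerOfTwo(a));
    return (addr & (a - 1)) == 0;
}

pub fn main() void {
    const addr: usize = 0x2040; // a multiple of 64, but not of 128
    var a: usize = 1;
    while (a <= 128) : (a *= 2) {
        std.debug.print("aligned to {d}: {}\n", .{ a, isAlignedTo(addr, a) });
    }
}
```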

For types like integers, which are purely a sequence of bytes, it is the number of bytes rounded up to the next power of 2, if it isn’t one already.
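You can inspect this directly. The exact numbers are target-dependent, so this sketch prints them rather than assuming them; on x86_64 I'd expect `u24` to report alignment 4, but I haven't checked every target.

```zig
const std = @import("std");

pub fn main() void {
    // Print size and alignment for a few integer widths, including odd ones.
    inline for (.{ u8, u16, u24, u32, u48, u64 }) |T| {
        std.debug.print("{s}: size {d}, align {d}\n", .{ @typeName(T), @sizeOf(T), @alignOf(T) });
    }
}
```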

For composite types it is the largest alignment of all fields, because that guarantees all fields will be correctly aligned (given correct padding).
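A quick check of the "largest field alignment" rule for a regular struct. It uses the variadic `@max` builtin, which I believe needs Zig 0.11 or newer.

```zig
const std = @import("std");

const Mixed = struct {
    flag: u8,
    count: u32,
    tag: u16,
};

pub fn main() void {
    // The struct's alignment should equal its most-aligned field's alignment.
    std.debug.print("@alignOf(Mixed) = {d}, max field align = {d}\n", .{
        @alignOf(Mixed),
        @max(@alignOf(u8), @alignOf(u32), @alignOf(u16)),
    });
}
```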

Padding ensures the next field/element is properly aligned. Except for extern and packed types, Zig arranges the fields from largest to smallest alignment, which means no padding between fields (padding that could otherwise bloat the size). There may still be padding after the last field so that, when storing an array of the type, the next element is properly aligned; this means indexing can ignore alignment.
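The trailing-padding part is easiest to see with an `extern` struct, whose layout is fixed. In this sketch the size is rounded up to a multiple of the alignment, so array elements sit exactly `@sizeOf(Item)` bytes apart and every `value` stays 4-aligned (assuming the usual 4-byte alignment of `u32`).

```zig
const std = @import("std");

const Item = extern struct {
    value: u32, // offset 0
    kind: u8, // offset 4, followed by 3 bytes of trailing padding
};

pub fn main() void {
    std.debug.print("offset of kind: {d}\n", .{@offsetOf(Item, "kind")});
    std.debug.print("size {d}, align {d}\n", .{ @sizeOf(Item), @alignOf(Item) });
    // Arrays place elements exactly @sizeOf(Item) apart, with no extra gaps.
    std.debug.print("[3]Item is {d} bytes\n", .{@sizeOf([3]Item)});
}
```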

Pretty much the same, though they may be more or less sophisticated with field ordering. I can’t think of another language that makes alignment part of the type system and gives you as much control over it as Zig. The closest is probably C, which lets you override field alignment like Zig, but I don’t think it does anything else. I’m not a C expert, though.

3 Likes

Although the underlying layer of the computer is addressed by bytes, the actual access granularity is not based on bytes, but on word length.
If you specify a ‘u64’ alignment of 1 byte, it may be located at 0x1003–0x100a, and because of the word-length granularity the computer will need to access 0x1000–0x1007 and 0x1008–0x100f separately to obtain the full ‘u64’. A normally aligned ‘u64’ in this range can only start at 0x1000 or 0x1008, so it is accessed more efficiently. In contrast, 1-byte-aligned memory needs no padding, so it occupies less memory at the cost of less efficient access.
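The arithmetic in the 0x1003 example can be written out as a tiny function: divide the first and last byte addresses by the word size and count the distinct words. `chunksTouched` is a hypothetical name, and an 8-byte word is assumed as in the example.

```zig
const std = @import("std");

// Hypothetical helper: how many `word`-sized, word-aligned chunks an access
// of `size` bytes starting at `addr` touches.
fn chunksTouched(addr: usize, size: usize, word: usize) usize {
    std.debug.assert(size > 0);
    const first = addr / word;
    const last = (addr + size - 1) / word;
    return last - first + 1;
}

pub fn main() void {
    // u64 at 0x1003..0x100a: spans two 8-byte words.
    std.debug.print("{d}\n", .{chunksTouched(0x1003, 8, 8)}); // 2
    // u64 at 0x1008: fits in one word.
    std.debug.print("{d}\n", .{chunksTouched(0x1008, 8, 8)}); // 1
}
```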
Cache lines, memory buses, etc. involve a larger alignment granularity, and specifying alignment ensures that only one cache line/one burst is loaded on a cache/memory miss.
Another key application of memory alignment is solving the problem of false sharing in multi-threaded code. If two objects are on one cache line, then even if they belong to completely different threads with no contention between them, writing to one object affects access efficiency for the other, because the cache coherence protocol works at cache-line granularity. Therefore, making objects cache-line aligned effectively mitigates this: two objects that are both cache-line aligned cannot share a cache line.
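A sketch of the false-sharing fix: two per-thread counters, each forced onto its own cache line with a field `align()`. A 64-byte line is an assumption here; recent Zig versions expose a per-architecture `std.atomic.cache_line`, which may be larger (e.g. on x86_64).

```zig
const std = @import("std");

const cache_line = 64; // assumption: 64-byte cache lines

// The align() on the field raises the struct's alignment to 64, and its size
// is padded to a multiple of 64, so array neighbours never share a line.
const PaddedCounter = struct {
    value: u64 align(cache_line) = 0,
};

const iterations: usize = 1_000_000;

// Each thread only touches its own counter, so there is no data race.
fn work(counter: *PaddedCounter, n: usize) void {
    var i: usize = 0;
    while (i < n) : (i += 1) counter.value += 1;
}

pub fn main() !void {
    var counters = [2]PaddedCounter{ .{}, .{} };
    const t0 = try std.Thread.spawn(.{}, work, .{ &counters[0], iterations });
    const t1 = try std.Thread.spawn(.{}, work, .{ &counters[1], iterations });
    t0.join();
    t1.join();
    std.debug.print("{d} {d}\n", .{ counters[0].value, counters[1].value });
}
```

Without the `align(cache_line)`, both `u64`s would likely share one line, and the two threads would keep invalidating each other's copy.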

4 Likes

Came back just to clarify: the performance penalty of unaligned memory access is not that big of a deal, even on relatively old processors, and recent ones show no measurable difference.

That is not counting niche or exotic processors, nor ‘ancient’ ones either.

This applies specifically to the reads/writes themselves. Unaligned memory could lead to more cache misses, which is a bigger deal, but that only matters if it’s your bottleneck.

Profiling is your best friend :slight_smile:

2 Likes

I worked a lot with packed structs and never noticed speed differences when experimenting with non-packed versions inside a ‘non-packed clone testprogram’.
Often they perform even better because the amount of memory used (when having arrays or lists) is smaller.

1 Like

Are function pointers special? What’s the alignment of a function?

Function alignment on platforms with fixed-length instruction sets has a minimum requirement. For example, functions on the ARM platform must be at least 4-byte aligned.
Platforms with variable-length instruction sets, such as x64, allow functions to be 1-byte aligned, but the compiler generally still sets a default alignment. The reason is the same as for data alignment: performance.
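In Zig you can request a function's alignment explicitly, and it then shows up in the type of a pointer to that function. This sketch follows the function-alignment pattern I remember from the language reference; I haven't verified it on every backend or target (on WebAssembly, for instance, a function "address" is not a code address at all).

```zig
const std = @import("std");

// Request 16-byte alignment for this function's machine code.
fn aligned16() align(16) i32 {
    return 42;
}

pub fn main() void {
    // The requested alignment is part of the function pointer's type.
    std.debug.print("{s}\n", .{@typeName(@TypeOf(&aligned16))});
    std.debug.print("result: {d}\n", .{aligned16()});
}
```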

1 Like
3 Likes

I like to think of an aligned pointer as a ‘shifted’ pointer. The bigger the alignment, the fewer pointer bits you need to address an item. For example, with a 256-byte alignment the pointer ‘loses’ 8 bits of information, since the lowest 8 bits are always hardwired to zero, and if you’re careful you could use those lower pointer bits to store some other information (not sure if that’s allowed in Zig though; theoretically Zig could use those pointer bits for its own needs).
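A sketch of the "shifted pointer" view: a 256-byte-aligned buffer always has an address ending in eight zero bits, so the address can be shifted right by 8 and back without losing anything. This only inspects the address; it doesn't store anything in the spare bits.

```zig
const std = @import("std");

pub fn main() void {
    var buf: [256]u8 align(256) = undefined;
    buf[0] = 1;
    const addr = @intFromPtr(&buf);
    const low_bits = @ctz(@as(usize, 256)); // 8
    std.debug.print("addr = 0x{x}, low {d} bits = {d}\n", .{ addr, low_bits, addr & 0xff });
    // Shift the always-zero bits out, then back in: the address survives.
    const shifted = addr >> low_bits;
    std.debug.assert((shifted << low_bits) == addr);
}
```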

Whenever you see some hard alignment rule you can almost be sure that the hardware engineers wanted to (or had to) save a few address lines.

For instance with 256-byte alignment you only need 8 address bits (instead of 16) to address 64 kilobytes and if typical addressable items are close to 256 bytes anyway, why waste address bits for a granularity that will never be needed?
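The arithmetic here is 256 blocks × 256 bytes = 65536 bytes = 64 KiB. As a sketch, a hypothetical 8-bit "block number" can stand in for a 16-bit byte address, at the price of only being able to name 256-byte-aligned locations.

```zig
const std = @import("std");

const granularity = 256; // bytes per addressable block (2^8)

// Hypothetical "compressed address": an 8-bit block number expanded to a
// 16-bit byte address.
fn blockToAddress(block: u8) u16 {
    return @as(u16, block) * granularity;
}

pub fn main() void {
    std.debug.print("block 0   -> 0x{x}\n", .{blockToAddress(0)}); // 0x0
    std.debug.print("block 1   -> 0x{x}\n", .{blockToAddress(1)}); // 0x100
    std.debug.print("block 255 -> 0x{x}\n", .{blockToAddress(255)}); // 0xff00
}
```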

  1. What are the consequences of unaligned loads? Unaligned stores?

Nothing on modern CPUs (where ‘modern’ means ca 2010, and including virtual CPUs like WASM). In the past you’d get a segfault-like panic on some CPUs when trying to access unaligned data (for instance trying to read a 16-bit word from an ‘odd’ address). After that you’d get a performance hit (because unaligned accesses were split into two accesses), today on modern CPUs unaligned accesses are fine.

However alignment still has one nice effect: an item aligned to its (2^N) size can never straddle cache lines (unless the item is bigger than the cache line). And AFAIK read/write across cache line is still a performance hit (because a new cache line may need to be pulled in).
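A brute-force sketch of the straddling claim, assuming a 64-byte cache line: scan a few lines' worth of addresses and count which 8-byte accesses cross a line boundary. `straddles` is a hypothetical helper name.

```zig
const std = @import("std");

const line = 64; // assumed cache-line size in bytes

// True if the `size`-byte access at `addr` crosses a cache-line boundary.
fn straddles(addr: usize, size: usize) bool {
    return addr / line != (addr + size - 1) / line;
}

pub fn main() void {
    var aligned_splits: usize = 0;
    var unaligned_splits: usize = 0;
    var addr: usize = 0;
    while (addr < 4 * line) : (addr += 1) {
        if (straddles(addr, 8)) {
            if (addr % 8 == 0) aligned_splits += 1 else unaligned_splits += 1;
        }
    }
    // Aligned 8-byte items never cross a line; offsets 57..63 of each line do.
    std.debug.print("aligned: {d}, unaligned: {d}\n", .{ aligned_splits, unaligned_splits }); // aligned: 0, unaligned: 28
}
```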

Also important to mention: while on regular CPUs alignment has become mostly irrelevant, GPU programming is still full of alignment restrictions. There’s a whole alignment zoo for pretty much anything between 4 and 256 bytes (the ‘alignment zoo’ has basically replaced the old ‘buffer zoo’).

  1. Why is a type’s alignment only powers of 2?

…basically because of the ‘address bits’ thing. But you could also think of array indices as a sort of ‘compressed’ aligned pointer (with the alignment being the size of the array items, and the lower zero bits shifted out to make room at the top so that you can address more memory with fewer bits), and this would be an example of a non-power-of-2 “alignment”. The downside is that non-power-of-2 alignment complicates the address computation: you may need an actual multiplication instead of a bitshift, and all code that converts the index into a pointer needs to know the item size, while an aligned pointer doesn’t need to carry this information since it’s already ‘pre-shifted’ (at the cost of wasted lower bits).
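The multiply-versus-shift difference as a sketch: element addresses for an arbitrary item size need a multiplication, while a power-of-two size reduces it to a shift by log2(size). Both function names are hypothetical.

```zig
const std = @import("std");

// Any item size: base + index * size.
fn addrByMul(base: usize, index: usize, size: usize) usize {
    return base + index * size;
}

// Power-of-two item size 2^log2_size: a shift replaces the multiply.
fn addrByShift(base: usize, index: usize, comptime log2_size: std.math.Log2Int(usize)) usize {
    return base + (index << log2_size);
}

pub fn main() void {
    const base: usize = 0x1000;
    // 12-byte items (not a power of two): multiplication only.
    std.debug.print("item 5, 12-byte items: 0x{x}\n", .{addrByMul(base, 5, 12)}); // 0x103c
    // 16-byte items: multiply and shift agree.
    std.debug.print("item 5, 16-byte items: 0x{x} / 0x{x}\n", .{ addrByMul(base, 5, 16), addrByShift(base, 5, 4) }); // 0x1050 / 0x1050
}
```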

PS: in a way, “unaligned” accesses are also just 8-bit aligned accesses :wink: With today’s cache-line architecture it’s an interesting thought experiment to shift regular 64-bit pointers 3 bits to the left (the upper bits are unused anyway) to get bit-addressable pointers. This would fit nicely with Zig’s packed structs and flexible-bit-width integers. IIRC there’s an ARM extension which has such bit-pointers.
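The thought experiment can be sketched in a few lines: shift a byte address left by 3 and put the bit offset (0..7) in the freed low bits. This is purely illustrative, with hypothetical names, and it assumes the top 3 bits of the byte address are unused (otherwise they are shifted out).

```zig
const std = @import("std");

// Hypothetical "bit pointer": byte address << 3, with the bit index in bits 0..2.
fn toBitAddr(byte_addr: u64, bit: u3) u64 {
    return (byte_addr << 3) | bit;
}

fn byteOf(bit_addr: u64) u64 {
    return bit_addr >> 3;
}

fn bitOf(bit_addr: u64) u3 {
    return @truncate(bit_addr);
}

pub fn main() void {
    const p = toBitAddr(0x1000, 5);
    std.debug.print("bit pointer 0x{x} -> byte 0x{x}, bit {d}\n", .{ p, byteOf(p), bitOf(p) }); // 0x8005 -> 0x1000, 5
}
```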

7 Likes

Daniel Lemire did some performance tests in 2012 which pretty much confirm that unaligned accesses on ‘modern’ (as of 2012) desktop CPUs don’t make a performance difference:

It makes sense when you consider that modern CPUs load and store data a whole cache line at a time, and for accesses within a cache line alignment simply doesn’t matter, since it’s just like accessing a bit range from a 256- or 512-bit-wide on-chip “register”.

Alignment does still matter for reads/writes across cache lines, though, simply because an item that’s aligned to its own (2^N) size can never straddle cache lines. This is a more subtle side effect than direct unaligned reads/writes.

10 Likes

Attaching an alignment to function pointers is pure pedantry, if you ask me. Positions of functions in memory is determined by the compiler. There’s no danger of programmers accidentally creating misaligned function pointers. Even if you’re generating functions at runtime, you’d be protected on the instruction encoding side. You’d use a [*]u32 to write the function (if the platform is ARM or RISC-V), not a function pointer.

Function pointers should have the same alignment as *anyopaque because for all intents and purposes functions are opaque.

2 Likes

Read/write across page boundaries too, where in the worst case a single instruction can cause two page faults.

But in practice, I think you’d only see that severely impacting performance in memory constrained environments, where much of the program is regularly swapped out.

3 Likes

Worth noting too, that function pointers aren’t regular pointers in WebAssembly. They’re function table indices. If you do a @intFromPtr(), you get something like 5 or 41.

Function pointers themselves ought to be treated as opaque.

4 Likes

I think it’s relatively unlikely the compiler will use those bits but I wouldn’t be comfortable just rawdogging tagged pointers either way.

Something a little more type safe and less error prone is preferable IMO – and Zig gives us the tools required for bit precise data structures.

I imagine something like this is a lot more ergonomic to use than masking pointers manually :slight_smile:

const std = @import("std");

fn TaggedPointer(comptime T: type, comptime D: type, comptime alignment: usize) type {
    // An alignment of 2^n leaves the lowest n address bits always zero.
    const unused_bits = @ctz(alignment);
    if (@bitSizeOf(D) > unused_bits) @compileError("data does not fit within aligned pointer");

    // Packed struct fields start at the least significant bit,
    // so `data` lives in the always-zero low bits of the address.
    return packed struct(usize) {
        data: D,
        _padding: std.meta.Int(.unsigned, unused_bits - @bitSizeOf(D)) = 0,
        addr: std.meta.Int(.unsigned, @bitSizeOf(*anyopaque) - unused_bits),

        pub fn init(p: *const align(alignment) T, data: D) @This() {
            return .{
                .data = data,
                .addr = @truncate(@intFromPtr(p) >> unused_bits),
            };
        }

        // Reinterprets an untagged pointer (its data bits are all zero).
        pub fn asTagged(p: *const align(alignment) T) @This() {
            return @bitCast(@intFromPtr(p));
        }

        pub fn ptr(p: @This()) *const align(alignment) T {
            // for architectures with Top-Byte-Ignore a @bitCast(p) would be enough if @sizeOf(D) <= 8
            return @ptrFromInt(@as(usize, p.addr) << unused_bits);
        }

        pub fn ptrMut(p: @This()) *align(alignment) T {
            return @ptrFromInt(@as(usize, p.addr) << unused_bits);
        }
    };
}
2 Likes

I’ve got a related question. Consider the case when we implement an interface in Zig. The interface stores the field ptr: *anyopaque. That field is passed to the method implementation, where it’s converted to a pointer of the concrete type. When we do the conversion, we use @alignCast in addition to @ptrCast. Here is a typical example:

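const std = @import("std");

// A minimal interface: a type-erased pointer plus a table of function pointers.
const Writer = struct {
    ptr: *anyopaque,
    vtable: struct {
        write: *const fn (ptr: *anyopaque, data: []const u8) anyerror!void,
    },

    fn write(w: Writer, data: []const u8) anyerror!void {
        return w.vtable.write(w.ptr, data);
    }
};

const File = struct {
    id: i32,

    fn write(ptr: *anyopaque, data: []const u8) anyerror!void {
        // This re-establishes the type: *anyopaque -> *File
        const self: *File = @ptrCast(@alignCast(ptr));
        // the implementation of write
        std.debug.print("file {d}: {s}\n", .{ self.id, data });
    }

    // Here we implement the interface Writer
    fn writer(self: *File) Writer {
        return Writer{
            .ptr = self,
            .vtable = .{
                .write = File.write,
            },
        };
    }
};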

We know two things:

  1. As mentioned above, the alignment is associated with the type.
  2. The ptr has the same alignment, because it’s a pointer of the same type.

Given that, why do we do the @alignCast?

When you say the performance penalty has no measurable difference, are you including multiplying the operation millions / billions of times? I haven’t done any benchmarking on this, but I’m curious if you have (or anyone else). If there is an immeasurably small difference, at what order of magnitude does it start to become an interestingly large difference?

Because *anyopaque has the type alignment of 1, the actual pointer alignment will be of the underlying type, but that’s not present in the type information anymore, because you erased the type information.

Hence, you use @alignCast to assert that it is the correct alignment, this is required by the compiler.

@alignCast only asserts the alignment it doesn’t change it, the name is misleading.
Also, it is unnecessary when the input and result have the same alignment in type information, you won’t get a compile error in this situation.
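A small sketch of the round trip: erase a `*u32` to `*anyopaque`, then restore it. The `@alignCast` only checks (in Debug/ReleaseSafe builds) that the address satisfies `@alignOf(u32)`; the address itself is unchanged.

```zig
const std = @import("std");

pub fn main() void {
    var storage: u32 = 0xdeadbeef;
    // Erase the type: the pointee alignment recorded in the type drops to 1.
    const erased: *anyopaque = &storage;
    // Restore it: @alignCast asserts the alignment, nothing is moved or copied.
    const restored: *u32 = @ptrCast(@alignCast(erased));
    std.debug.print("same address: {}\n", .{@intFromPtr(restored) == @intFromPtr(&storage)});
    std.debug.print("value: 0x{x}\n", .{restored.*});
}
```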

4 Likes

Of course, if there is a difference, you will see it when you scale it up enough.

I was referring to this blog post which @floooh conveniently linked right after I read it lol, it also has a follow-up that is linked in that blog.

At least with the test load in that post and its follow-up, the difference is mostly not worth considering.

Depending on the cpu, operations, and load, it ranges from extremely small to unmeasurable.

There will be much bigger optimisations you can do in your programs.

3 Likes

I guessed that the type alignment of *anyopaque is 1 too. However, it’s 8. The alignment of anyopaque is 1, but here we inspect the alignment of the pointer itself. Let me share some more code:

const std = @import("std");

test "test alignment" {
    const Rectangle = struct {
        x: u32,
        y: u32,
        w: u32,
        h: u32,
    };

    try std.testing.expectEqual(@alignOf(*anyopaque), @alignOf(*Rectangle));
}

The test fails only if the alignments of the types are used rather than their pointers.

That’s where my confusion grows from.
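The two numbers being compared are different things: `@alignOf(*T)` is the alignment of the pointer value itself (usize-sized), while @alignCast is about the pointee alignment recorded in the pointer type. This sketch uses `std.meta.alignment`, which, in the std versions I'm aware of, returns the pointee alignment for pointer types; check it against your Zig version.

```zig
const std = @import("std");

const Rectangle = struct { x: u32, y: u32, w: u32, h: u32 };

pub fn main() void {
    // Alignment of the pointer values themselves: both are pointer-sized.
    std.debug.print("@alignOf(*anyopaque) = {d}\n", .{@alignOf(*anyopaque)});
    std.debug.print("@alignOf(*Rectangle) = {d}\n", .{@alignOf(*Rectangle)});
    // Pointee alignment carried in the pointer type: this is what differs.
    std.debug.print("pointee align of *anyopaque: {d}\n", .{std.meta.alignment(*anyopaque)});
    std.debug.print("pointee align of *Rectangle: {d}\n", .{std.meta.alignment(*Rectangle)});
}
```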