Use of align in Zig

I was building a simple directory watcher in Zig, using Google Gemini for reference. Here is the code:

const std = @import("std");
const linux = std.os.linux;

pub fn main() !void {
    const fd_usize = linux.inotify_init1(0);
    const fd: i32 = @intCast(fd_usize);
    defer std.posix.close(fd);

    // var buf: [4096]u8 align(@alignOf(linux.inotify_event)) = undefined;
    var buf: [4096]u8 = undefined;

    if (fd < 0) {
        std.debug.print("inotify init", .{});
        std.process.exit(1);
    }
    _ = linux.inotify_add_watch(fd, ".", linux.IN.MODIFY | linux.IN.DELETE | linux.IN.CREATE | linux.IN.MOVED_TO);
    std.log.info("Watching current dir for changes", .{});

    while (true) {
        const length = try std.posix.read(fd, &buf);

        var i: usize = 0;
        while (i < length) {
            const event = @as(*linux.inotify_event, @ptrCast(@alignCast(&buf[i])));

            // Check if the event has a filename associated with it
            if (event.len > 0) {
                const name = event.getName() orelse "generic";

                if (event.mask & linux.IN.CREATE != 0) {
                    std.debug.print("File Created: {s}\n", .{name});
                } else if (event.mask & linux.IN.MODIFY != 0) {
                    std.debug.print("File Modified: {s}\n", .{name});
                } else if (event.mask & linux.IN.DELETE != 0) {
                    std.debug.print("File Deleted: {s}\n", .{name});
                }
            }
            // Move to the next event in the buffer
            i = i + @sizeOf(linux.inotify_event) + event.len;
        }
    }
}

Here, when declaring the buffer that the file descriptor is read into, Gemini uses the upper (commented-out) line:

// var buf: [4096]u8 align(@alignOf(linux.inotify_event)) = undefined;
    var buf: [4096]u8 = undefined;

The program compiles with both versions. What is the use of this align?

align syntax in zig

In the case of const/var name: T align(n), it specifies that the variable will be stored at an address aligned to n.

You can do the same thing with fields: name: T align(n) specifies that the field must be aligned to n in memory. This can affect the total alignment of the type it is in, depending on the alignment of the other fields.

When taking a pointer to a field/variable, the pointer will have the alignment specified, see next paragraph for what that means.

There is a similar application with pointers: *align(n) T specifies that the address being pointed to is at alignment n, and this alignment is part of the type of the pointer. This works for any kind of pointer with any child type. If you have a field/variable that is a pointer, you can specify both, e.g. name: *align(x) T align(y).

In all cases, if you don’t specify an explicit alignment, it will instead implicitly use the alignment of the type of the variable/field or the child type of the pointer.
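To tie this back to the question, here is a minimal sketch of the three spellings, assuming a recent Zig release (0.12 or later, which the std.posix calls in the question suggest). Event and Packet are hypothetical types made up for illustration; Event only mimics the fixed-size header of linux.inotify_event so the sketch does not depend on Linux.

```zig
const std = @import("std");

// Hypothetical stand-in for the header of linux.inotify_event.
const Event = extern struct {
    wd: i32,
    mask: u32,
    cookie: u32,
    len: u32,
};

// Hypothetical struct: a field with its own alignment raises the
// alignment of the whole struct.
const Packet = struct {
    tag: u8,
    payload: [16]u8 align(8),
};

pub fn main() void {
    // Without `align`, a byte array only promises alignment 1.
    var plain: [64]u8 = undefined;
    // With `align`, the first byte is guaranteed to sit at a multiple
    // of @alignOf(Event).
    var aligned: [64]u8 align(@alignOf(Event)) = undefined;
    @memset(&plain, 0);
    @memset(&aligned, 0);

    // The alignment is part of the pointer type.
    std.debug.assert(@TypeOf(&plain) == *[64]u8);
    std.debug.assert(@TypeOf(&aligned) == *align(@alignOf(Event)) [64]u8);
    std.debug.assert(@alignOf(Packet) == 8);

    // The pointer type already proves the alignment, so @ptrCast alone
    // is enough here; no @alignCast is needed.
    const first: *Event = @ptrCast(&aligned);
    std.debug.print("wd={d} len={d}\n", .{ first.wd, first.len });
}
```

With the plain buffer, the same cast needs @alignCast, which is checked at runtime in Debug and ReleaseSafe builds. An unaligned buffer may still happen to land on a suitable address, which would explain why both versions compile and appear to work.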

what is alignment in the first place

Data is only allowed to live at a memory address that is a multiple of its alignment. An alignment of 1 allows any address, as every address is a multiple of 1. An alignment of 0 is disallowed for most types, as only 0 is a multiple of 0, and address 0 is often the null address. Only zero-sized types can have an alignment of 0, as they have no data to put in memory.

Alignments are powers of two, so 1, 2, 4, 8, etc. are valid alignments, but 3 is not.
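As a rough illustration of both rules, here is a small sketch. The helpers isValidAlignment and isAlignedTo are hypothetical names written for this post, not standard library functions.

```zig
const std = @import("std");

// Hypothetical helper: an alignment is valid when it is a non-zero power of two.
fn isValidAlignment(n: usize) bool {
    return n != 0 and (n & (n - 1)) == 0;
}

// Hypothetical helper: an address satisfies an alignment when it is a
// multiple of that alignment.
fn isAlignedTo(addr: usize, alignment: usize) bool {
    return addr % alignment == 0;
}

pub fn main() void {
    var word: u32 = 0;
    word += 1; // mutate it so it is a genuine runtime variable
    const addr = @intFromPtr(&word);
    std.debug.print("u32 at 0x{x} is a multiple of {d}: {}\n", .{
        addr,
        @alignOf(u32),
        isAlignedTo(addr, @alignOf(u32)),
    });

    for ([_]usize{ 1, 2, 3, 4, 8 }) |n| {
        std.debug.print("{d} is a valid alignment: {}\n", .{ n, isValidAlignment(n) });
    }
}
```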

Alignment is a result of how CPUs interact with memory. Modern processors can deal with improperly aligned data with no measurable performance difference (in simple cases), but older or niche architectures will suffer a performance hit or even trigger a fault if data is improperly aligned.

Therefore, alignment is important for the compiler to generate correct and performant code for older and niche processors.

All types have an alignment. For composite types such as structs or unions, the alignment is the largest alignment among their fields. Their size may also be rounded up to a multiple of their alignment, which ensures that in contiguous data, such as arrays or slices, each element is properly aligned.

With extern structs, or in languages that don’t reorder fields, like C, padding is inserted so that the following field(s) are aligned.
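A small sketch of that padding, using two hypothetical extern structs with the same fields in a different order. The numbers in the comments assume a target where u32 has alignment 4, which is the case on mainstream desktop and server targets.

```zig
const std = @import("std");

// Hypothetical layout: the u32 in the middle forces padding on both sides.
const Padded = extern struct {
    a: u8, // offset 0, followed by 3 bytes of padding
    b: u32, // offset 4
    c: u8, // offset 8, followed by 3 bytes of tail padding
};

// Same fields, largest first: less padding is needed.
const Reordered = extern struct {
    b: u32, // offset 0
    a: u8, // offset 4
    c: u8, // offset 5, followed by 2 bytes of tail padding
};

pub fn main() void {
    std.debug.print("Padded:    size={d} align={d} offsets a={d} b={d} c={d}\n", .{
        @sizeOf(Padded),
        @alignOf(Padded),
        @offsetOf(Padded, "a"),
        @offsetOf(Padded, "b"),
        @offsetOf(Padded, "c"),
    });
    std.debug.print("Reordered: size={d} align={d} offsets b={d} a={d} c={d}\n", .{
        @sizeOf(Reordered),
        @alignOf(Reordered),
        @offsetOf(Reordered, "b"),
        @offsetOf(Reordered, "a"),
        @offsetOf(Reordered, "c"),
    });
}
```

Both structs take the alignment of their largest field, and both sizes are multiples of it, so every element of an array of either type stays aligned.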

I’ve been curious about the impacts of alignment and recently came across this article: Dot product on misaligned data – Daniel Lemire's blog

I can’t speak to the quality of the experiment, but it seems to show that unaligned accesses aren’t the performance hit that I see a lot of people claim that they are. Maybe this is still the case for microcontrollers or maybe it’s just “folk wisdom” that’s been passed down while hardware has gotten better?

Hah, I linked that in a comment on lobste.rs a few days ago and this is the third or fourth time I’ve seen it referenced since. Very Baader–Meinhof.

I’ve been meaning to try to put together some benchmarks. I suspect that, for sequential access over sufficiently large datasets, unaligned data will end up faster than aligned data, even if unaligned access is slightly slower in certain microbenchmarks. Packed more tightly, the data takes up fewer cache lines overall (and prefetching should negate much of the cost of crossing a cache line), and it causes fewer page faults overall (especially with something like madvise MADV_SEQUENTIAL).

But for the data to be useful I’d have to spin up more than a few cloud instances with dedicated vCPUs from different generations and manufacturers, and I haven’t bothered to write a script to automate it yet.

Another use of alignment is to isolate thread-contended variables on their own cache line. Data structures intended for multithreaded use often rely on atomic reads and writes for correctness. Between-core synchronization of memory acts on cache lines; there’s no smaller unit which can be synchronized.

Atomic operations are generally on one machine word or less, but if other operations happen on that same cache line, this can get expensive, because again, the CPU can move a cache line, but nothing smaller. So it’s important to isolate these contended words on a cache line, with padding and alignment specific to the CPU.

You can’t do that by just requesting a cache line’s worth of memory, because 64 or 128 bytes (whatever the line size is) doesn’t have to be cache-line aligned. The block could straddle two cache lines, which would permit adjacent data to affect synchronization (this is called false sharing). So to guarantee isolation, you ask for a cache line’s worth of memory, with cache-line alignment. The word can go anywhere in that.
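A minimal sketch of that idea, assuming std.atomic.cache_line as the line size for the target. IsolatedCounter is a hypothetical type for illustration, and it uses a plain u64 to keep the sketch short; real contended code would wrap the field in an atomic type.

```zig
const std = @import("std");

// Cache line size the standard library assumes for the target CPU.
const cache_line = std.atomic.cache_line;

// Hypothetical counter type: aligning the field to a cache line also
// rounds the struct size up to a whole cache line, so two neighbouring
// counters can never share a line.
const IsolatedCounter = struct {
    value: u64 align(cache_line) = 0,
};

pub fn main() void {
    var counters = [_]IsolatedCounter{ .{}, .{} };
    counters[0].value += 1;
    counters[1].value += 2;

    const first = @intFromPtr(&counters[0].value);
    const second = @intFromPtr(&counters[1].value);
    std.debug.print("cache line={d} size={d} distance={d}\n", .{
        cache_line,
        @sizeOf(IsolatedCounter),
        second - first,
    });
}
```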

It’s also true that in the ordinary case, ‘natural alignment’ doesn’t matter as much as it used to. It can be useful for other reasons, like pointer tagging, but for most ‘modern’ CPUs it’s no longer a big deal to store a word at an unaligned offset.

Fortunately, Zig lets you under-align pointers as well, if that’s what you need. Just have to ask for it.
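For example, here is a sketch of reading a u32 from an arbitrary byte offset through an *align(1) pointer. readU32At is a hypothetical helper written for this post, and it assumes the bytes are stored little-endian.

```zig
const std = @import("std");

// Hypothetical helper: reads a little-endian u32 starting at any byte offset.
fn readU32At(bytes: []const u8, offset: usize) u32 {
    // align(1) tells the compiler the address may be any byte boundary,
    // so it must not assume the natural alignment of u32 for this load.
    const p: *align(1) const u32 = @ptrCast(bytes[offset..][0..4]);
    return std.mem.littleToNative(u32, p.*);
}

pub fn main() void {
    const bytes = [_]u8{ 0xff, 0x78, 0x56, 0x34, 0x12, 0xff };
    // Offset 1 is not a multiple of 4, yet the read is well defined.
    std.debug.print("0x{x}\n", .{readU32At(&bytes, 1)});
}
```

Because the pointer type only claims alignment 1, no @alignCast is needed and there is no alignment safety check to trip over.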
