Did you know @Vector(2, u64) is 16-byte aligned? I didn’t until today! In fact, I thought types would always have at most 8-byte alignment on a 64-bit architecture. I’m working on a (generic) data structure, and internally I’m doing some wonky (but safe) things, among other things pointer casting. My data structure works with primitive and aggregate types, until I tried it out with vector types, which failed with “error: cast increases pointer alignment”.
The part that errors is essentially something like this:
(Runnable example)
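In case the link goes stale, the failing pattern is roughly this sketch (the offending cast is left commented out so the file still compiles, with the alignment fact it trips over asserted instead):

```zig
const std = @import("std");

test "vector alignment exceeds word size" {
    // On x86_64, @Vector(2, u64) is 16-byte aligned while [2]u64 is 8-byte aligned.
    try std.testing.expect(@alignOf(@Vector(2, u64)) >= @alignOf([2]u64));

    const arr: [2]u64 = .{ 1, 2 };
    // This cast is rejected at compile time with
    // "error: cast increases pointer alignment":
    // const v: *const @Vector(2, u64) = @ptrCast(&arr);
    _ = arr;
}
```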
Works just fine because, as expected, u128 has 8-byte alignment (of course this will give nonsensical numeric values, and maybe more unexpected results depending on your endianness, but that’s beside the point).
My questions are:
Is there some acrobatics I can do with @alignCast, @ptrCast and friends to make the first example work?
Would it be unsafe to do so? Can I run into actual trouble because of the 16-byte alignment thing?
Are there other types that have alignment greater than 8 bytes on a 64-bit architecture?
We override the pointer alignment from the vector’s “natural alignment” to our expected 8. (To be more portable, you can replace 8 with @alignOf(u64).)
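A minimal version of that workaround might look like the following sketch (it assumes the current array-like memory layout of vectors, which, as noted below, isn’t formally guaranteed):

```zig
const std = @import("std");

test "down-aligned vector pointer" {
    const arr: [2]u64 = .{ 1, 2 };
    // Annotating the pointer type with align(@alignOf(u64)) means the
    // @ptrCast no longer increases alignment; the compiler emits
    // unaligned loads through this pointer where needed.
    const v: *const align(@alignOf(u64)) @Vector(2, u64) = @ptrCast(&arr);
    try std.testing.expectEqual(arr[0], v.*[0]);
}
```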
Question 2: the alignment thing isn’t a problem when you use this strategy. However, what is a problem is the fact that we might change the in-memory layout of vectors down the line. The layout of vectors is a bit up in the air currently - you shouldn’t necessarily assume that reinterpreting memory between vectors and arrays is safe! It’s likely that this specific case will continue to work (but there’s a fair chance this won’t be guaranteed by the language spec, so will effectively be UB), but exotic integers are trickier and will almost certainly work differently to how they do today…
Question 3: off the top of my head, vectors are the only type I know whose natural alignment exceeds the target’s word size. However, user-defined structs and unions are able to have the alignment of any field specified, by writing e.g. x: u32 align(64), and the type itself will inherit the alignment of its most-aligned field. By specifying alignment yourself, you can make it any power of two up to and including 1 << 28 bytes (256 MiB).
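For instance, a minimal illustration of that inheritance:

```zig
const std = @import("std");

const OverAligned = struct {
    x: u32 align(64),
    y: u8,
};

test "struct inherits its most-aligned field's alignment" {
    // The struct as a whole picks up the 64-byte alignment of x.
    try std.testing.expect(@alignOf(OverAligned) == 64);
}
```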
That does it, thanks. I missed the align keyword entirely. For anyone else reading:
Values which have the same representation at runtime can be cast to increase the strictness of the qualifiers, no matter how nested the qualifiers are:
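For example, adding const coerces implicitly, even through nested pointer types:

```zig
const std = @import("std");

test "coercion to stricter qualifiers" {
    var x: u32 = 1;
    x = 2;
    const p: *u32 = &x;
    // *u32 coerces to *const u32 implicitly; the same works for
    // qualifiers nested deeper, e.g. *[]*u32 to *const []const *const u32.
    const q: *const u32 = p;
    try std.testing.expect(q.* == 2);
}
```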
I don’t know of a short syntax to do this for constants (because apparently we can’t use align on arrays), but you could over align a pointer that points to valid memory and then use that. Here is an example that uses alignedAlloc to get such a pointer:
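Something along these lines (a sketch; note that the type of alignedAlloc’s alignment parameter has changed across Zig versions, and reading the vector assumes the current array-like element layout):

```zig
const std = @import("std");

test "over-aligned memory via alignedAlloc" {
    const gpa = std.testing.allocator;
    // Ask for [2]u64 worth of storage with 16-byte alignment up front,
    // so the resulting pointer can legally back a *const @Vector(2, u64).
    const buf = try gpa.alignedAlloc(u64, 16, 2);
    defer gpa.free(buf);
    buf[0] = 1;
    buf[1] = 2;
    const v: *const @Vector(2, u64) = @ptrCast(buf.ptr);
    try std.testing.expect(v.*[1] == 2);
}
```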
That’s interesting, because mlugg’s solution is about down-aligning the *const @Vector(2, u64) to a *const align(8) @Vector(2, u64) but yours is in the case where we must “up-align” the [2]u64 to [2]align(16)u64 (which isn’t valid so we need to jump through hoops).
Oh and if anyone is curious, I ended up doing this with my data structure:
Which feels very sketchy, but now allows me to reinterpret any pointer to the items without worrying about @alignOf(T) being greater than 8 (or 4 on a 32-bit architecture).
(The whole reason I want to do something like that is that the data structure allocates two separate slices for itself; for cache locality I thought it’d be optimal to put the memory for both of them in one allocation, so that both slices are close to each other in memory. What made this tricky is that the slice types differ. I’m in the process of benchmarking whether this matters.)
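A hypothetical sketch of that single-allocation trick (the names and helper are mine, not from the original code; real code would also want overflow checks on the size math):

```zig
const std = @import("std");

/// Hypothetical helper: carve two differently-typed slices out of one
/// allocation, aligning the whole block to the stricter of the two
/// element alignments and padding the second slice's offset.
fn DualBuf(comptime A: type, comptime B: type) type {
    const al = @max(@alignOf(A), @alignOf(B));
    return struct {
        bytes: []align(al) u8,
        as: []A,
        bs: []B,

        fn init(gpa: std.mem.Allocator, n_a: usize, n_b: usize) !@This() {
            // Round the second slice's byte offset up to B's alignment.
            const off_b = std.mem.alignForward(usize, n_a * @sizeOf(A), @alignOf(B));
            const bytes = try gpa.alignedAlloc(u8, al, off_b + n_b * @sizeOf(B));
            return .{
                .bytes = bytes,
                .as = @as([*]A, @ptrCast(bytes.ptr))[0..n_a],
                .bs = @as([*]B, @alignCast(@ptrCast(bytes.ptr + off_b)))[0..n_b],
            };
        }

        fn deinit(self: @This(), gpa: std.mem.Allocator) void {
            gpa.free(self.bytes);
        }
    };
}

test "two slices, one allocation" {
    const d = try DualBuf(u8, u64).init(std.testing.allocator, 3, 2);
    defer d.deinit(std.testing.allocator);
    d.as[0] = 1;
    d.bs[1] = 2;
    try std.testing.expect(d.bs[1] == 2);
}
```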
I think with vectors, up-alignment is more likely to give better performance for big data sets (wide vectors): if things are properly aligned and the compiler is able to make use of SIMD operations, then more data can be processed with fewer instructions. I would guess that if you down-align the vectors, then either SIMD isn’t used for the start of the data, or the compiler does some trickery that I wouldn’t know of, because I am not deeply familiar with SIMD (but that probably has some overhead).
So if you end up benchmarking things, make sure to benchmark lots of different scenarios, and maybe even look at the generated code and try to understand it, if you really want to know. But this is just my vague advice; I eventually want to get practical experience playing around with SIMD, but so far other things have always grabbed my attention.
To create good benchmarks you would probably need to do a deep dive on all that performance-counter stuff, another area I haven’t explored yet. Without it, it is sometimes difficult to get meaningful measurements, or at least it is easier to get clear ones with it. From what I have heard, aligning things to cache lines, and minimizing the number of cache lines that need to be touched, may be more important. In the ideal case you would be able to tell whether your cache lines are fully packed with useful data, and when, where, and how many cache misses you have, etc.
I wonder whether somebody has already created a utility library for Zig that can help access/measure/evaluate that kind of information.
SSE instructions that read directly from memory require 16-byte-aligned (128-bit-aligned) addresses. If the compiler can’t be sure that the vector is correctly aligned, then it has to encode the operation as an unaligned load followed by an operation on a register. So the cost is an extra instruction plus a register; the latter probably matters more, I suspect.