I have some binary data (bytecode) in a []const u8 and I need to cast it to a [*c]const u32 for use in Vulkan. I have tried using @ptrCast(@alignCast(code.ptr)), but this gives a runtime panic for ‘incorrect alignment’. I don’t understand why or how to fix it.
Note: code.len is guaranteed to be a multiple of 4.
This is half of the solution. However, it only checks that the data actually has the correct alignment; if it doesn’t, you will still get a runtime panic in safe modes and undefined behavior in unsafe modes.
@alignCast checks the alignment; it doesn’t move the data to create a different alignment.
The easiest thing to do would be to use the []align(@alignOf(u32)) u8 type for the binary data to begin with, so it already has the right alignment.
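As a rough sketch of that suggestion (the helper name `asWords` is made up for illustration, and this assumes Zig 0.11 or later, where @ptrCast takes its result type from context): once the alignment is part of the slice type, the cast needs no @alignCast, so there is no alignment check left to fail at runtime.

```zig
const std = @import("std");

// Hypothetical helper: view bytes that are already 4-byte aligned as u32 words.
// No copy is made; the result points into `code`.
fn asWords(code: []align(@alignOf(u32)) const u8) []const u32 {
    // The caller promised (as in the question) that the length is a multiple of 4.
    std.debug.assert(code.len % @sizeOf(u32) == 0);
    // The pointer type already carries align(4), so @ptrCast alone is enough.
    const ptr: [*]const u32 = @ptrCast(code.ptr);
    return ptr[0 .. code.len / @sizeOf(u32)];
}
```

The `.ptr` of the returned slice is a `[*]const u32`, which should coerce to the `[*c]const u32` that the translated C API expects.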
Otherwise you have a choice: either check whether the data has the correct alignment and, if not, get a correctly aligned buffer and copy the data over, or always copy the data (even if it’s already aligned). It is basically your choice whether to make the copy conditional or not.
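Both options could look roughly like this. The function names are hypothetical, and allocating a `[]u32` to get aligned storage is just one way to do it; the sketch assumes Zig 0.11 or later for the two-argument @memcpy.

```zig
const std = @import("std");

// Hypothetical helper: does the slice start on a u32 boundary?
fn isWordAligned(code: []const u8) bool {
    return std.mem.isAligned(@intFromPtr(code.ptr), @alignOf(u32));
}

// Hypothetical helper: always copy into freshly allocated u32 storage.
// A []u32 allocation is aligned for u32 by construction.
// The caller owns the result and releases it with `allocator.free`.
fn copyToWords(allocator: std.mem.Allocator, code: []const u8) ![]u32 {
    std.debug.assert(code.len % @sizeOf(u32) == 0);
    const words = try allocator.alloc(u32, code.len / @sizeOf(u32));
    // View the destination as bytes so both sides of the copy have the same type and length.
    @memcpy(std.mem.sliceAsBytes(words), code);
    return words;
}
```

For the conditional variant, you would call `isWordAligned` first and only fall back to `copyToWords` when it returns false; the price is that you then have to remember whether there is a copy to free.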
Alignment is about where the thing starts in memory. align(1) is arbitrary alignment; with align(2) the least significant bit of the address needs to be 0, with align(4) the 2 least significant bits need to be 0, and so on. You can basically think of it as needing to fit into different sized slots that are aligned to addresses that can be divided by powers of two. When the alignment is wrong your slot isn’t aligned and instead is partially in one slot and partially in another.
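The "low bits must be zero" rule can be written as a one-line mask check. This is only an illustration (the function name is made up, and std.mem.isAligned already covers the same check); the addresses in the test are arbitrary example values.

```zig
const std = @import("std");

// Hypothetical helper: for a power-of-two `alignment`, an address is aligned
// exactly when its low log2(alignment) bits are all zero.
fn lowBitsClear(addr: usize, alignment: usize) bool {
    return (addr & (alignment - 1)) == 0;
}

test "low bits decide alignment" {
    try std.testing.expect(lowBitsClear(0x1000, 4)); // ...00: fine for align(4)
    try std.testing.expect(!lowBitsClear(0x1002, 4)); // ...10: not a multiple of 4
    try std.testing.expect(lowBitsClear(0x1002, 2)); // but fine for align(2)
    try std.testing.expect(lowBitsClear(0x1003, 1)); // align(1) accepts any address
}
```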
If the alignment isn’t correct you need to look at the address and the code that placed your data and see how you can change that code so that your alignment requirements are satisfied.
A u32 has an alignment of 4, that is, it should always be located at an address that is a multiple of 4. A u8 has an alignment of 1, so it can be located at any address.
Just having your bytecode be a multiple of 4 is not sufficient to guarantee it will be at an address that is a multiple of 4.
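A small test makes the difference between length and address concrete. It is a sketch with made-up names: a sub-slice that starts one byte into an aligned buffer still has a length that is a multiple of 4, but it no longer starts on a 4-byte boundary.

```zig
const std = @import("std");

// Hypothetical helper: does the slice start on a u32 boundary?
fn startsOnWordBoundary(bytes: []const u8) bool {
    return std.mem.isAligned(@intFromPtr(bytes.ptr), @alignOf(u32));
}

test "a length that is a multiple of 4 says nothing about the address" {
    // Force the backing buffer itself onto a 4-byte boundary.
    const buf align(@alignOf(u32)) = [_]u8{0} ** 8;
    const whole: []const u8 = buf[0..];
    const shifted: []const u8 = buf[1..5]; // 4 bytes, starting at base + 1

    try std.testing.expect(shifted.len % 4 == 0);
    try std.testing.expect(startsOnWordBoundary(whole));
    try std.testing.expect(!startsOnWordBoundary(shifted));
}
```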
When loading your shader bytecode, if you’re doing it at runtime, allocate some space with the right alignment, and load it directly there. If you’re doing it at comptime through @embedFile, you don’t control the address where Zig will embed your file, but you can simply create some comptime space with the proper alignment. Here’s how I’m doing it in my codebase:
So assuming you’re talking about the bits of the address, align(4) needs the addresses to be divisible by 4. Ok, why? Is it because cpus can’t load 4 bytes starting from a non align(4) address, so it would need 2 loads or something like that?
Maybe this deserves an explain topic, or a doc topic with some links to learning resources on memory alignment. I had never had to deal with alignment before using Zig, and I think others may be in the same spot.
Many CPUs can operate on unaligned data (some can only deal with aligned data), but even when they can, the operations used to do it are still different, whether in the instructions themselves or only in the resulting micro-ops (what the CPU uses internally to execute the instructions). I am not deep enough into CPU knowledge to tell you whether the instruction is always different for unaligned access. As for data being described as having a natural alignment, I think that basically comes from how the best-performing operations expect their data to be aligned.
If someone feels like they can do a good job of describing the details accurately, they certainly could create an explain or docs topic; however, I am not sure I can do a really good job of it.
Mostly I would point you towards the things Casey Muratori has done for good explanations of hardware and CPUs and for getting a better understanding of how they operate; he is certainly more practiced at teaching and explaining these things.
The Handmade Hero guide is really good for finding snippets / explainers in the various episodes; you can use it to search for different topics and get the related segments in the episodes.
Also take a look at
There is also Table of Contents - by Casey Muratori - Computer, Enhance!
I haven’t looked into that yet, it is a paid course, but it is basically a more focused version of the bits and pieces of explanations you can find peppered throughout Handmade Hero and probably also more than that.
I think explaining alignment makes more sense in the context of how the hardware works, and that is a big topic. I have learned about it through various videos and talks and by reading about things here and there, but it is difficult to pinpoint where and when exactly.
I still think a topic about alignment specifically could make sense (and it could also link to other relevant resources / learning material), but I am currently too overloaded with other things to think about it in more detail right now. Maybe later.
Seems like it would be fairly harmless for the compiler to guarantee at least usize alignment for embedded files, making the return type *align(@sizeOf(usize)) const [N:0]u8.
An ideal solution would allow setting an alignment of choice, but defaulting to word alignment would solve for a lot of use cases at basically zero cost. A 100,000 branch quota is a pretty big workaround!
Sure, and it’s great that it works, I’m always seeing new uses of comptime which impress me. But it has the vibe of using a howitzer to swat a fly. I’d guess that after “this file is actually bytes”, “this file is words” (or half-words in your case) is the most common thing to need @embedFile to do, and aligning the pointer to a word is basically free.
Being able to do a comptime shuffle means it matters less for sure.
Great find! I tried some approaches with type- and align-casting the pointer, and none of that worked (I didn’t expect it to, but the experiment was worth running). Dereferencing the pointer at comptime and aligning the resulting value apparently does work. Neat.
It even makes sense semantically, at runtime that would create a copy of the data, and there’s nothing preventing such a copy from being aligned however the code directs it to be. But since it’s happening at comptime, when addresses aren’t quite real yet, it should be easy to optimize that expression to be a pure operation on the type (I don’t know if this is happening, just that it could be).
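For anyone skimming, my reading of the dereference trick is roughly the sketch below. It is not the exact code from the earlier post: the names are made up, and a string literal stands in for `@embedFile("...")` so the snippet is self-contained (both have the type `*const [N:0]u8`).

```zig
const std = @import("std");

// Stand-in for the result of @embedFile: a pointer to a 0-terminated byte array.
// The bytes are arbitrary example data, 8 of them so they form two u32 words.
const raw = "\x03\x02\x23\x07\x00\x00\x01\x00";

// Dereferencing gives the array by value; the new constant can then ask for
// whatever alignment it likes.
const code align(@alignOf(u32)) = raw.*;

// Hypothetical accessor: view the aligned copy as u32 words.
fn codeWords() []const u32 {
    // &code is `*align(4) const [8:0]u8`, so no @alignCast is needed here.
    const ptr: [*]const u32 = @ptrCast(&code);
    return ptr[0 .. code.len / @sizeOf(u32)];
}
```

In real use the first constant would be the `@embedFile` call, and the file length would still have to be a multiple of 4.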