Optimizing atomic modifications of packed structs

Continuing the discussion from What's everybody working on? (July Edition), about embedded programming. (The context from the other thread is required for this to make sense)

My codebase here.

I was just talking about this with my colleague, who noticed that my MMIO code emitted extra stores to the stack that ultimately seemed to go unused. Folding everything into a single inline for, as @PinieP did in their linked reply, instead of the two I had, seems to get rid of those stores! Thanks for that tip!

My colleague also worried that the compiler might put more between the ldrex and strex instructions than just the actual modification of the register value, because they are in two separate asm volatile expressions, with the modification itself written in Zig in between. He says that, for this reason, the Linux kernel codebase makes the whole register modification a single asm volatile expression so that it can't get broken up. Is that fear founded? I didn't see it happen, but then, my codebase is really small right now.

Edit: Clarification

We would need to look at the code to see whether this particular instance is correct but, in general, yes, your colleague is right. The compiler treats each inline asm block as a single isolated unit, which specifies its inputs and outputs. Outside that block, both before and after, Zig is allowed to use registers however it sees fit. So if you have multiple assembly blocks, you cannot assume anything about the state of the machine in between them.
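
To make that concrete, here is a minimal sketch of the single-block approach, written in C with GCC-style inline asm rather than Zig, since that is the form your colleague mentioned from the Linux kernel. The name mmio_modify and its parameters are made up for this example; they are not taken from your codebase or from the kernel. The point is that the load-exclusive, the bit manipulation, and the store-exclusive all sit in one asm statement, so the compiler has no opportunity to schedule spills or other memory accesses between ldrex and strex. On targets without ldrex/strex the sketch falls back to a compare-and-swap loop using the compiler's atomic builtins, which is only there so the example builds and runs elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: atomically replace the bits selected by `mask`
 * in *reg with the corresponding bits of `value`.
 * Returns the register contents observed before the modification. */
static inline uint32_t mmio_modify(volatile uint32_t *reg,
                                   uint32_t mask, uint32_t value)
{
    value &= mask; /* never touch bits outside the mask */

#if defined(__arm__) && defined(__ARM_FEATURE_LDREX) && (__ARM_FEATURE_LDREX & 4)
    uint32_t old, tmp, failed;

    /* The whole read-modify-write lives in ONE asm statement, so nothing
     * the compiler generates can end up between ldrex and strex. */
    __asm__ volatile(
        "1: ldrex   %[old], [%[addr]]\n"          /* exclusive load          */
        "   bic     %[tmp], %[old], %[mask]\n"    /* clear the field         */
        "   orr     %[tmp], %[tmp], %[val]\n"     /* insert the new bits     */
        "   strex   %[fail], %[tmp], [%[addr]]\n" /* try the exclusive store */
        "   cmp     %[fail], #0\n"
        "   bne     1b\n"                         /* retry if it failed      */
        : [old] "=&r"(old), [tmp] "=&r"(tmp), [fail] "=&r"(failed)
        : [addr] "r"(reg), [mask] "r"(mask), [val] "r"(value)
        : "cc", "memory");
    return old;
#else
    /* Fallback for targets without ldrex/strex: a compare-and-swap loop
     * built on the GCC/Clang __atomic builtins. */
    uint32_t old = __atomic_load_n(reg, __ATOMIC_RELAXED);
    uint32_t desired;
    do {
        desired = (old & ~mask) | value;
    } while (!__atomic_compare_exchange_n(reg, &old, desired, 1,
                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED));
    return old;
#endif
}
```

With two separate asm blocks, by contrast, the values loaded by the first block have to travel through compiler-managed registers (or the stack) to reach the second one, and any store the compiler emits in between may, depending on the core, clear the exclusive monitor and make the strex fail. The same structure should carry over to a single Zig asm volatile expression with the equivalent inputs, outputs, and clobbers.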