Testing asm volatile generates "incorrect" code

consider the following code, which inspects interrupt status before/after actually disabling all CPU interrupts:

export var mask: u32 = 0;
export var flag: bool = false;

export fn intrTest() void {
    mask = get_PRIMASK();  // correctly retrieves 'primask' register
    flag = isEnabled(); // incorrectly returns true, regardless of 'primask'
    asm volatile ("cpsid i" ::: "memory"); // disable interrupts
    mask = get_PRIMASK(); // ditto
    flag = isEnabled(); // ditto
}

pub fn get_PRIMASK() u32 {
    const m: u32 = 0;
    asm volatile (
        \\mrs %[m], primask        
        :
        : [m] "r" (m),
        : "memory"
    );
    return m;
}

pub fn isEnabled() bool {
    return (get_PRIMASK() == 0);
}

[[ i’m targeting a cortex M0+… here’s a godbolt link… ]]

the results of compilation under zig 0.13.0 are as follows:

intrTest:
        movs    r0, #0
        mrs     r0, primask
        ldr     r1, .LCPI0_0
        str     r0, [r1, #4]
        mrs     r0, primask
        movs    r2, #1
        strb    r2, [r1]
        cpsid i
        mrs     r0, primask
        str     r0, [r1, #4]
        mrs     r0, primask
        strb    r2, [r1]
        bx      lr
.LCPI0_0:
        .long   .L_MergedGlobals

i’m suspicious of the movs r2, #1 instruction, as it effectively “asserts” that the primask register equals 0 – isEnabled in my source code…

continuing further, the strb r2, [r1] instruction that occurs after interrupts are disabled with cpsid i simply stores the same value of 1…

while there is a mrs r0, primask instruction at the beginning of my isEnabled function, isn’t it strange that there’s no subsequent code that compares this value to 0???

while i originally though the adjacent calls to get_PRIMASK were giving the compiler problems here, a smaller example that only uses isEnabled exhibits the same problem… for whatever reason, the results of get_PRIMASK inside isEnabled are simply being swallowed…

i just recently discovered this problem in much larger codebase – where the generated code had the same anomaly… the results are easily inspected using a debugger, logic analyzer, etc…

needless to say, having isEnabled always return true is leading to all sorts race conditions in my embedded apps :crazy_face:

Consider adding a memory fence after saving the flag.
e.g.

    flag = isEnabled(); // incorrectly returns true, regardless of 'primask'
    asm volatile ("dmb");

EDIT:
You can also use the zig builtin:

    flag = isEnabled(); // incorrectly returns true, regardless of 'primask'  
    @fence(.seq_cst);

The following is heavier and safer:

    @atomicStore(bool, &flag, isEnabled(), .seq_cst);

adding the fence has no effect… the same movs r2, #1 instruction is present, instead of a test on primask…

[[ my understanding is that the cortex-m0+ architecture, while supporting the dmb instruction, treats it as a nop… fences are required on some the more advanced architectures with more complex instruction pipelines ]]

try:

pub fn get_PRIMASK() u32 {
    return asm volatile (
        \\mrs r0, primask
        : [ret] "={r0}" (-> u32)
    );
}

EDIT:
or

pub fn get_PRIMASK() u32 {
    return asm volatile (
        \\mrs %[ret], primask
        : [ret] "=r" (-> u32)
    );
}
1 Like

i believe that works!!!

to increase my understanding, explain what’s different in your version of get_PRIMASK from my original…

Before entering the optimizer the code for:

pub fn get_PRIMASK() u32 {
    const m: u32 = 0;
    asm volatile (
        \\mrs %[m], primask        
        :
        : [m] "r" (m),
        : "memory"
    );
    return m;
}

is:

get_PRIMASK:
        sub     sp, #4
        movs    r0, #0
        str     r0, [sp]
        mrs     r0, primask
        add     sp, #4
        bx      lr

for:

pub fn get_PRIMASK() u32 {
    return asm volatile (
        \\mrs %[ret], primask
        : [ret] "=r" (-> u32)
    );
}

is

get_PRIMASK:
        mrs     r0, primask
        bx      lr

Normally the optimizer must erase the stack manipulation in the first case and inline both functions as:

        mrs     r0, primask

I believe that when the compiler optimizes the first/more complex case, it triggers a bug, because the resulting code makes no sense.

Providing some clarification here because there is very little documentation around inline ASM, and I would like to save anyone from tearing their hair out in the future :melting_face:

First, the optimizer will never optimize your inline assembly. As a matter of fact, it does not even analyze it. Hence the need to tell the compiler what registers/memory you are clobbering. If assembly were analyzed, the compiler would know exactly what is clobbered already.

Second, I’ll explain why the first example doesn’t work. What you’re asking the compiler to do is enter your inline asm block with the value (0) into an arbitrary register (hint: this allows the compiler to optimize the boundaries of your inline asm, but not the asm itself, because the compiler is allowed to choose the best register)! The inline asm block then reads primask into a register, but does not change the value of m. For that, you need a pointer. I think this is what you were going for (although dimdin’s suggestion is optimal):

asm volatile(
    \\ mrs r0, primask
    \\ str r0, %[m]
    :
    : [m] "p" (&m)
    : "r0", "memory"
)

The thread solution works because you are allocating an output register. This tells the compiler: at the end of my inline asm expression, the result of the assembly block should be in this register. So you read primask into the register, your asm block ends, and the value is correctly returned from the block.

Hope this helps!

6 Likes

I don’t think this does what you think it does. m is const, so you can’t write to it, even in assembly. Also, you’re not passing the address of m to the assembly, you’re passing the value of m, which is 0.
This function is equivalent to this:

/*Assembly*/ mrs (register containing 0), primask
/*Zig*/return 0;

Since get_PRIMASK always returns 0, is_enabled can be optimized to simply return true;. Hence why there is no comparison to 0.

I think what you meant was:

pub fn get_PRIMASK() u32 {
    var m: u32 = 0;
    asm volatile (
        \\mrs %[m], primask        
        :
        : [m] "r" (&m),
        : "memory"
    );
    return m;
}

But @dimdin code is better anyways.

2 Likes

I thought I might chirp in here because I have a few head wounds in this area. In my particular case I needed access to the v7m load and store instructions that perform a memory access in an unprotected manner, i.e. ldr?t and str?t. I started by cribbing the clang header file from CMSIS version 6. For ldrbt that looks like (I advise welding goggles when reading CMSIS code):

__STATIC_FORCEINLINE uint8_t __LDRBT(volatile uint8_t *ptr)
{
  uint32_t result;

  __ASM volatile ("ldrbt %0, %1" : "=r" (result) : "Q" (*ptr) );
  return ((uint8_t)result);    /* Add explicit type cast here */
}

After many trials, I found the following inserted the correct code:

pub inline fn ldrbt(
    ptr: *const mmio.ByteRegister,
) mmio.ByteRegister {
    return asm volatile ("ldrbt %[ret], [%[ptr]]"
        : [ret] "=r" (-> mmio.ByteRegister),
        : [ptr] "r" (ptr),
    );
}

The lesson I learned was to drive into my head that the three sections after the assembly language template are for outputs, inputs, and clobbers, in that order. In this case, the output is the byte retrieved via the input pointer. Just have to keep repeating to myself what I’m trying to accomplish. Many fewer bruises now.