Inline assembly blocks not executed sequentially?

fixer · October 7, 2024, 2:35am

I’m not sure if this is directly related to Zig or LLVM, but I have a strange issue with inline assembly. I have an interrupt handler and I want to use inline assembly in it. I have this code:

export fn PendSV_Handler() void {
    asm volatile ("" ++
            "   CPSID       I                                       \n" ++
            "   LDR     R2,     [%[current_task]]                   \n" ++
            "   CMP.W   R2,     #0                                  \n" ++ // if current_task != null
            "   BEQ.N   SpEqlNextSp                                 \n" ++
            "   PUSH    {R4-R11}                                    \n" ++ // push registers r4-r11 on the stack
            "   STR     SP,     [R2, %[offset]]                     \n" ++ // save the current stack pointer in current_task
            "SpEqlNextSp:                                           \n" ++
            "   LDR     SP, [%[next_task], %[offset]]               \n" ++ // Set stack pointer to next_task stack pointer
            "   STR     %[next_task],   [%[current_task], #0x00]    \n" ++ // Set current_task to next_task
            "   POP     {r4-r11}                                    \n" ++ // pop registers r4-r11
            "   CPSIE   I                                           \n" //    enable interrupts
        :
        : [current_task] "l" (&OsTask.TaskControl.current_task),
          [next_task] "l" (OsTask.TaskControl.next_task),
          [offset] "l" (OsCore.g_stack_offset),
        : "R2"
    );
}

It turns into this disassembly.

When the function is entered the first thing that happens is the input parameters are moved into registers and then ‘CPSID I’ is executed. There are 6 instructions executed before ‘CPSID I’. This makes sense to me. However, ‘CPSID I’ needs to be executed immediately when the function is entered. So I changed the code so that ‘CPSID I’ is in its own inline assembly thinking that this would cause it to execute prior to the input parameters moving into registers.

export fn PendSV_Handler() void {
    asm volatile ("" ++
            "   CPSID       I                                       \n"); //  disable interrupts
    asm volatile ("" ++
            "   LDR     R2,     [%[current_task]]                   \n" ++
            "   CMP.W   R2,     #0                                  \n" ++ // if current_task != null
            "   BEQ.N   SpEqlNextSp                                 \n" ++
            "   PUSH    {R4-R11}                                    \n" ++ // push registers r4-r11 on the stack
            "   STR     SP,     [R2, %[offset]]                     \n" ++ // save the current stack pointer in current_task
            "SpEqlNextSp:                                           \n" ++
            "   LDR     SP, [%[next_task], %[offset]]               \n" ++ // Set stack pointer to next_task stack pointer
            "   STR     %[next_task],   [%[current_task], #0x00]    \n" ++ // Set current_task to next_task
            "   POP     {r4-r11}                                    \n" ++ // pop registers r4-r11
            "   CPSIE   I                                           \n" //    enable interrupts
        :
        : [current_task] "l" (&OsTask.TaskControl.current_task),
          [next_task] "l" (OsTask.TaskControl.next_task),
          [offset] "l" (OsCore.g_stack_offset),
        : "R2"
    );
}

However, the disassembly doesn’t have ‘CPSID I’ execute first. The input parameters are still partially moved into registers first. Now there are 3 instructions that are executed before ‘CPSID I’.

I thought that the two separate inline assembly blocks would execute sequentially, but that isn’t the case. Is this intended functionality? Is there a way for me to force ‘CPSID I’ to execute immediately when the function is entered?

permutationlock · October 7, 2024, 5:55am

I haven’t done any ARM inline assembly in Zig, but maybe take a look at the naked calling convention, e.g. callconv(.Naked).

LucasSantos91 · October 7, 2024, 12:11pm

Technically, they are being executed sequentially, the second one is coming after the first one. They are not being executed back to back.
Separating your inline assembly did nothing for you here. The instructions you’re seeing before the block are the function prologue and the inline assembly prologue. To meet the ABI requirements, the function needs to save some state that will be restored later, hence the function prologue and epilogue. The inline assembly is a black box to the compiler. For the compiler to make sense of it, you specify the inputs, outputs, and clobbers, and based on those the compiler inserts a prologue and an epilogue around the block.
In order to remove the function prologue and epilogue, you need to use the naked calling convention. This is usually sufficient to remove the inline assembly prologue and epilogue as well.
FYI, separating inline assembly blocks is usually a bad idea. The compiler may reorder your blocks. To prevent that, you need to mark the first block as volatile, or the second block needs to depend on an output of the first block. But each inline assembly block will still have its own prologue and epilogue.

fixer · October 7, 2024, 5:14pm

I don’t think this is quite right. PendSV_Handler is an interrupt handler, so it is never called from software. There isn’t a function prologue. All of the instructions before CPSID I are the assembly input parameters (&OsTask.TaskControl.current_task, OsTask.TaskControl.next_task, and OsCore.g_stack_offset) being moved into registers. If it were a function prologue, the disassembly would have instructions pushing registers onto the stack.

I’m not sure what distinction you’re making between sequentially and back to back. The two blocks aren’t executed sequentially. Three instructions from the second block are executed. Then the single instruction from the first block is executed, and then the rest of the instructions from the second block are executed.

It did do something. Instead of the 6 instructions for moving the inputs into registers executing before the CPSID I only 3 of those instructions are executed before CPSID I.

Aren’t they already marked as volatile? “asm volatile( xxx ) ;” ? Does something more need to be used here to make them volatile? Why would the compiler reorder the inline assembly blocks if inline assembly is a black box to the compiler?

Thank you both for suggesting the naked calling convention. I didn’t know this calling convention existed. I did try it out, but it did not solve my issue. The only change that happened when I changed PendSV_Handler to naked was that the last instruction changed from BX LR to UDF #254. I think this makes sense, as PendSV_Handler is an interrupt handler so it doesn’t have a prologue.

andy_mango · October 7, 2024, 6:17pm

On ARMv7-M (and ARMv8-M), exception handlers may be written as exported functions in Zig with no special considerations. If you want more control, the .Naked calling convention means taking total control in assembly. The following is an SVC handler that uses .Naked, dispatches system and device services through a jump table, and returns a value to the caller. It might give you some hints, although SVC is a bit different in that it has an actual call site in a program.

fn svcHandler() callconv(.Naked) SvcResult {
    asm volatile (
        \\tst    lr,#4
        \\ite    eq
        \\mrseq  ip,msp
        \\mrsne  ip,psp
        \\push   {ip}
        \\ldr    r0,[ip, #24]
        \\ldrb   r0,[r0, #-2]    @ SVC immed operand
        \\cmp    r0,#(2f-1f)/4   @ number of realms
        \\ittt   cs
        \\movcs  r0,#1           @ SvcResult.unknown_realm
        \\strcs  r0,[ip]
        \\bxcs   lr
        \\adr    r1,1f
        \\ldr    r1,[r1, r0, lsl #2]
        \\push   {r1}
        \\ldm    ip,{r0-r3}  @ regs from exc frame
        \\pop    {ip}        @ handler pointer
        \\push   {lr}
        \\blx    ip          @ handler call
        \\pop    {lr}
        \\pop    {ip}
        \\str    r0,[ip]     @ return value
        \\bx     lr
        \\.align 2
        \\1:
        \\.word  sysSvcHandler
        \\.word  devSvcHandler
        \\2:
    );
}
comptime {
    @export(&svcHandler, .{ .name = "svcHandler", .linkage = .strong });
}

fixer · October 7, 2024, 7:11pm

I think my question is getting lost. I am writing this inline assembly in an interrupt handler, but my question isn’t about interrupt handlers. I don’t think the interrupt handler is playing a part in my issue.

My issue is: I have two inline assembly blocks one right after the other; I would expect all of the assembly instructions (in this case it’s just one instruction) in the first assembly block to execute before any of the assembly instructions in the second block are executed. This is not happening. Instead, 3 instructions from the second assembly block are executed first. Then the instruction from the first assembly block is executed. Then the remaining instructions from the second block are executed.

My question is: Is there a way to force the compiler to execute the instruction from the first assembly block before any instructions from the second are executed?

This is striking me as a possible bug. If the inline assembly blocks are black boxes to the compiler then I would expect the compiler to execute them exactly as written and not do any rearranging.

andy_mango · October 7, 2024, 8:15pm

To my knowledge and experience, if you want to guarantee instruction sequencing in a function, you need to use the .Naked calling convention and supply the entire function body as an inline assembly block. Hence the example I posted. What you are seeing is the compiler fetching the inputs to your second inline block. The order of those loads is not guaranteed: the compiler has no idea that you consider it critical to execute cpsid i as the very first instruction. It only has to make sure that the inputs are loaded before entering the second inline block. I don’t consider this surprising behavior. It’s the way gcc works, and I suspect clang does too. I also suspect that if you wrote this as an exported Zig function it would work properly. If you are creating a critical section by disabling interrupts, I don’t think a few instructions of prologue and epilogue would matter.
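
A minimal sketch of that approach, assuming a Thumb-2 target and a Zig version like the one used in this thread (where the convention is spelled `.Naked`). The whole handler body lives in one assembly block, so `cpsid i` is the first instruction at the handler's symbol. The comment line is only a placeholder for the real context-switch instructions. A naked function gets no compiler-generated return, so the block ends with its own `bx lr`. The `UDF` seen after switching to naked is most likely the trap emitted for falling off the end of such a function.

```zig
export fn PendSV_Handler() callconv(.Naked) void {
    asm volatile (
        \\cpsid i        @ first instruction at the handler's entry point
        \\@ ... context-switch instructions would go here ...
        \\cpsie i
        \\bx    lr       @ explicit exception return; no epilogue is generated
    );
}
```

Because the block has no input operands, the compiler has nothing to load into registers ahead of it.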

fixer · October 7, 2024, 8:47pm

Aren’t @export and the export keyword equivalent? I thought that they were. I’ll give it a try with @export and see what happens.

andy_mango · October 7, 2024, 9:44pm

The export keyword implicitly gives the function a .C calling convention. If you use .Naked, the export needs to be done using @export since it gives you more options for the edge cases.

fixer · October 7, 2024, 11:15pm

The export keyword and @export create the same disassembly when used with naked. The only thing naked does is change the last instruction from BX LR to UDF #254.

I’m going to export the variables I need to use in the assembly code and reference them directly instead of using the inline assembly input parameters.
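
One way that plan could look, as an untested sketch rather than a definitive implementation. `Task`, `stack_ptr`, `current_task`, and `next_task` are hypothetical stand-ins for the `OsTask.TaskControl` fields, and the saved stack pointer is assumed to sit at offset 0 instead of `OsCore.g_stack_offset`. It also assumes a Thumb-2 core (Cortex-M3 or later) for `movw`/`movt`, and the `.Naked` spelling used in this thread. The addresses are built inside the assembly itself, so there are no input operands for the compiler to set up before `cpsid i`.

```zig
// Hypothetical task record; the saved stack pointer is the first field,
// so its offset from the task pointer is 0.
const Task = extern struct {
    stack_ptr: usize,
};

// Exported so the assembly below can refer to them by symbol name.
export var current_task: ?*Task = null;
export var next_task: ?*Task = null;

export fn PendSV_Handler() callconv(.Naked) void {
    asm volatile (
        \\cpsid i                              @ disable interrupts first
        \\movw  r0, #:lower16:current_task     @ r0 = &current_task
        \\movt  r0, #:upper16:current_task
        \\movw  r1, #:lower16:next_task        @ r1 = &next_task
        \\movt  r1, #:upper16:next_task
        \\ldr   r1, [r1]                       @ r1 = next_task (assumed non-null)
        \\ldr   r2, [r0]                       @ r2 = current_task
        \\cmp   r2, #0
        \\beq   1f                             @ nothing to save if current_task is null
        \\push  {r4-r11}                       @ save callee-saved registers
        \\mov   r3, sp
        \\str   r3, [r2]                       @ current_task.stack_ptr = sp
        \\1:
        \\ldr   r3, [r1]                       @ r3 = next_task.stack_ptr
        \\mov   sp, r3
        \\str   r1, [r0]                       @ current_task = next_task
        \\pop   {r4-r11}                       @ restore the next task's registers
        \\cpsie i                              @ enable interrupts
        \\bx    lr                             @ exception return
    );
}
```

Only r0-r3 are used as scratch registers, since the hardware already stacks them on exception entry.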

LucasSantos91 · October 8, 2024, 3:57am

You have to tell Zig that, hence why you need naked. Otherwise, it will add a prologue and epilogue.

That’s not necessary, the compiler can save the data however it wants, including in registers.

They are.

No. That’s a prologue, either the function’s or the assembly block’s.

Exactly as you ordered it in code, just separated by prologues and epilogues.

I meant it didn’t help you accomplish what you wanted, not that it didn’t change anything.

I was just describing what is needed to achieve ordering, not commenting on your code. As I said before, separating the block is not what you want anyways.