Confusing ReleaseSmall optimization of `asm volatile` code

While experimenting with a custom _start function and syscall wrappers on macOS (the target is aarch64-macos), I noticed that -O ReleaseSmall produces some odd optimizations that look incorrect.

Here’s a minimal example:

pub export fn main() u8 {
    _ = myWrite(.out, "hi", 2);
    return 0;
}

pub const Fd = extern struct {
    num: i32,

    pub const out: Fd = .{ .num = 1 };
};

const write_sys = 0x200_0004;
pub export fn myWrite(fd: Fd, ptr: [*]const u8, n: usize) usize {
    return syscall3(@intCast(fd.num), @intFromPtr(ptr), n, write_sys);
}

pub inline fn syscall3(arg0: usize, arg1: usize, arg2: usize, sys: u64) usize {
    return asm volatile ("svc #0x80"
        : [ret] "={x0}" (-> usize),
        : [sys] "{x16}" (sys),
          [arg0] "{x0}" (arg0),
          [arg1] "{x1}" (arg1),
          [arg2] "{x2}" (arg2),
        : .{});
}

This defines an inline syscall3 function that issues a syscall with three arguments, and uses it to wrap the write syscall as myWrite.

Reading the generated assembly on Compiler Explorer without optimizations, you should see this snippet in the main function:

        adrp    x8, ___anon_1943@PAGE
        add     x8, x8, ___anon_1943@PAGEOFF
        ldr     x0, [x8]
        adrp    x1, ___anon_1947@PAGE
        add     x1, x1, ___anon_1947@PAGEOFF
        mov     w8, #2
        mov     x2, x8
        bl      _example.myWrite

and this snippet, outside the function, defines those ___anon values:

___anon_1943:
        .long   1

___anon_1947:
        .asciz  "hi"

The first three instructions of the first snippet load Fd.out into x0, the first argument to myWrite, correctly. The remaining instructions load the string pointer into x1 and the length into x2, and then call myWrite.

Now, if we look at the generated assembly with ReleaseSmall optimizations on Compiler Explorer, we see this as the contents of the main function:

        adrp    x1, ___anon_1947@PAGE
        add     x1, x1, ___anon_1947@PAGEOFF
        mov     w16, #4
        movk    w16, #512, lsl #16
        mov     x0, #0
        mov     w2, #2
        svc     #0x80
        mov     w0, #0
        ret

This is syscall3 and myWrite inlined with the passed arguments. It puts the string pointer into the second argument (x1), the syscall number into w16, and the number of characters into the third argument (w2), as it should.
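The mov/movk pair that fills w16 can be checked against write_sys with a little arithmetic. Here is a minimal sketch, where low_half and high_half are names made up for illustration and only write_sys comes from the example above:

```zig
const std = @import("std");

// Hypothetical names for the two immediates in the ReleaseSmall listing.
const low_half: u32 = 4; // mov  w16, #4
const high_half: u32 = @as(u32, 512) << 16; // movk w16, #512, lsl #16

// Same value as write_sys in the original example.
const write_sys: u32 = 0x200_0004;

test "mov/movk pair rebuilds write_sys" {
    // movk keeps the low 16 bits and overwrites bits 16..31,
    // so the final register value is the OR of the two halves.
    try std.testing.expectEqual(write_sys, low_half | high_half);
}
```

Running this with zig test should pass, which suggests the syscall number survives inlining intact and only the value placed in x0 is in question.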

However, the first argument, x0, is set to #0, which tells the write syscall to write the string to file descriptor 0 (stdin) instead of file descriptor 1 (stdout). On my macOS machine, running this code without optimizations correctly writes to stdout and the string appears on the terminal. Running it with optimizations also shows the string on the terminal, but it is written to stdin, which is neither the same behavior nor what I intended.

I checked a few other versions on Compiler Explorer, and the equivalent code shows the same apparent miscompilation at least as far back as Zig 0.10.0, although that version emits mov x0, xzr rather than mov x0, #0. Both are ways to zero a register.

Is this use case not supported, or is this a compiler bug, or something else?