Cross compile Zig fails on target with illegal instruction

I tried cross-compiling the default Zig app for my target board, which runs a Microchip SAMA5D3 (Cortex-A5), but I get an illegal instruction error.

From the Host:

> mkdir armtest
> cd armtest
> zig init-exe
> zig build -Dtarget=arm-linux-gnueabihf
> file zig-out/bin/armtest 
zig-out/bin/armtest: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, with debug_info, not stripped

… copy to target and run …

~ # /armtest
Illegal instruction (core dumped)
~ # uname -a
Summit Linux V2470034 4.19.231 #1 PREEMPT none armv7l GNU/Linux
~ # strace /armtest
execve("/armtest", ["/armtest"], 0xbeafddb0 /* 14 vars */) = 0
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0x98b10} ---
+++ killed by SIGILL (core dumped) +++
Illegal instruction (core dumped)

Any help on how I could further troubleshoot this?
Zig version: 0.12.0-dev.47+0461a64a9
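
A possible first troubleshooting step, sketched under some assumptions: for SIGILL, the si_addr in the strace output is the address of the faulting instruction, so it can be mapped back to a function name with addr2line. The helper name resolve_addr and the ADDR2LINE variable are made up for this illustration, and the sketch assumes an addr2line that can read ARM ELF files (for example llvm-addr2line, or the one from an ARM cross toolchain).

```shell
# Sketch: map a faulting address (si_addr from strace) to a function name.
# resolve_addr and ADDR2LINE are hypothetical names used for illustration.
# Assumes an addr2line that understands ARM ELF; override with e.g.
#   ADDR2LINE=llvm-addr2line
resolve_addr() {
    # $1 = path to the unstripped binary, $2 = address such as 0x98b10
    "${ADDR2LINE:-addr2line}" -f -e "$1" "$2"
}

# Usage on the host, where the binary with debug_info lives:
#   resolve_addr zig-out/bin/armtest 0x98b10
```

This only names the function; it does not say which instruction in it is the problem.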

You probably need to tell zig build which CPU model to target, using the mcpu option. If you don’t, Zig picks a baseline CPU. Looking around in the std lib, I think the baseline CPU for ARM is Cortex-A7.

I modified my build.zig to specify the target and CPU for a Cortex-A5.

    const exe = b.addExecutable(.{
        .name = "armtest",
        // In this case the main source file is merely a path, however, in more
        // complicated build scripts, this could be a generated file.
        .root_source_file = .{ .path = "src/main.zig" },
        .target = .{
            .cpu_arch = std.Target.Cpu.Arch.arm,
            .cpu_model = std.zig.CrossTarget.CpuModel{ .explicit = &std.Target.arm.cpu.cortex_a5 },
            .cpu_features_add = std.Target.arm.cpu.cortex_a5.features,
            .os_tag = std.Target.Os.Tag.linux,
            .abi = std.Target.Abi.gnueabihf,
        },
        .optimize = optimize,
    });

I still get an ‘illegal instruction’ error. I captured a core dump:

Core was generated by `./armtest'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00098b70 in memset (dest=0xaa000 <os.linux.tls.main_thread_tls_buffer> "", c=0 '\000', len=32) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/compiler_rt/memset.zig:21
21                  if (n == 0) break;
(gdb) where
#0  0x00098b70 in memset (dest=0xaa000 <os.linux.tls.main_thread_tls_buffer> "", c=0 '\000', len=32) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/compiler_rt/memset.zig:21
#1  0x0002e478 in os.linux.tls.prepareTLS (area=...) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/os/linux/tls.zig:280
#2  0x0002c774 in os.linux.tls.initStaticTLS (phdrs=...) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/os/linux/tls.zig:339
#3  0x0002bcb4 in start.posixCallMainAndExit () at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/start.zig:404
#4  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) bt  full
#0  0x00098b70 in memset (dest=0xaa000 <os.linux.tls.main_thread_tls_buffer> "", c=0 '\000', len=32) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/compiler_rt/memset.zig:21
        d = <optimized out>
        n = <optimized out>
        d = <optimized out>
        n = <optimized out>
#1  0x0002e478 in os.linux.tls.prepareTLS (area=...) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/os/linux/tls.zig:280
        dtv = 0x0
        tcb_ptr = 0x20
#2  0x0002c774 in os.linux.tls.initStaticTLS (phdrs=...) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/os/linux/tls.zig:339
        tls_area = {ptr = 0xaa000 <os.linux.tls.main_thread_tls_buffer> "", len = 32}
        tp_value = 0
#3  0x0002bcb4 in start.posixCallMainAndExit () at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/start.zig:404
        auxv = 0xbe960dec
        at_hwcap = 123094
        phdrs = {ptr = 0x10034, len = 7}
        argv = 0xbe960da4
        argc = 1
        envp_optional = 0xbe960dac
        envp_count = 15
        envp = {ptr = 0xbe960dac, len = 15}
#4  0x00000000 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) quit

Total guess, but it seems like SAMA5D3 does not support NEON, so maybe doing:

.cpu_features_sub = std.Target.arm.featureSet(&.{ .neon }),

might help.

@squeek502, correct, the SAMA5D3 does not support NEON.

I added that line to my build.zig but still see the illegal instruction.

The core dump appears to be the same.

Core was generated by `/armtest'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00098b70 in memset (dest=0xaa000 <os.linux.tls.main_thread_tls_buffer> "", c=0 '\000', len=32) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/compiler_rt/memset.zig:21
21                  if (n == 0) break;
(gdb) where
#0  0x00098b70 in memset (dest=0xaa000 <os.linux.tls.main_thread_tls_buffer> "", c=0 '\000', len=32) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/compiler_rt/memset.zig:21
#1  0x0002e478 in os.linux.tls.prepareTLS (area=...) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/os/linux/tls.zig:280
#2  0x0002c774 in os.linux.tls.initStaticTLS (phdrs=...) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/os/linux/tls.zig:339
#3  0x0002bcb4 in start.posixCallMainAndExit () at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/start.zig:404
#4  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) bt full
#0  0x00098b70 in memset (dest=0xaa000 <os.linux.tls.main_thread_tls_buffer> "", c=0 '\000', len=32) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/compiler_rt/memset.zig:21
        d = <optimized out>
        n = <optimized out>
        d = <optimized out>
        n = <optimized out>
#1  0x0002e478 in os.linux.tls.prepareTLS (area=...) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/os/linux/tls.zig:280
        dtv = 0x0
        tcb_ptr = 0x20
#2  0x0002c774 in os.linux.tls.initStaticTLS (phdrs=...) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/os/linux/tls.zig:339
        tls_area = {ptr = 0xaa000 <os.linux.tls.main_thread_tls_buffer> "", len = 32}
        tp_value = 0
#3  0x0002bcb4 in start.posixCallMainAndExit () at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/start.zig:404
        auxv = 0xbee3bdf4
        at_hwcap = 123094
        phdrs = {ptr = 0x10034, len = 7}
        argv = 0xbee3bdb4
        argc = 1
        envp_optional = 0xbee3bdbc
        envp_count = 13
        envp = {ptr = 0xbee3bdbc, len = 13}
#4  0x00000000 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
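
The at_hwcap value in frame #3 of these backtraces gives an independent check of the NEON question, since it is what the kernel reports about the CPU. A minimal sketch of decoding it follows; hwcap_flag is a hypothetical helper, and the bit numbers are the ones defined for 32-bit ARM in the Linux header arch/arm/include/uapi/asm/hwcap.h.

```shell
# Sketch: decode selected bits of a 32-bit ARM AT_HWCAP value.
# hwcap_flag is a hypothetical helper; the bit numbers come from the
# Linux uapi header arch/arm/include/uapi/asm/hwcap.h.
hwcap_flag() {
    # $1 = hwcap value (decimal), $2 = bit number; prints "yes" or "no"
    if [ $(( ($1 >> $2) & 1 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

hwcap=123094    # at_hwcap from frame #3 of the backtrace

echo "vfp      (bit 6):  $(hwcap_flag "$hwcap" 6)"
echo "neon     (bit 12): $(hwcap_flag "$hwcap" 12)"
echo "vfpv3    (bit 13): $(hwcap_flag "$hwcap" 13)"
echo "vfpv3d16 (bit 14): $(hwcap_flag "$hwcap" 14)"
echo "vfpv4    (bit 16): $(hwcap_flag "$hwcap" 16)"
```

For 123094 (0x1E0D6) the neon bit is clear and the vfpv3d16 bit is set, which is consistent with a core that has VFP with 16 double registers and no NEON unit.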

As a test, try removing all features:

        .target = .{
            .cpu_arch = std.Target.Cpu.Arch.arm,
            .cpu_model = std.zig.CrossTarget.CpuModel{ .explicit = &std.Target.arm.cpu.cortex_a5 },
            .cpu_features_sub = std.Target.arm.cpu.cortex_a5.features,
            .os_tag = std.Target.Os.Tag.linux,
            .abi = std.Target.Abi.gnueabihf,
        },

This is just to check whether removing some of the enabled features could fix it.

@squeek502, removing all features still resulted in an illegal instruction error.

Ok, last attempt before this should be reported as a bug.

The full list of features enabled when using -target arm-linux-gnueabihf -mcpu cortex_a5 was obtained by running this on a dummy main.c file:

zig build-obj main.c -target arm-linux-gnueabihf -mcpu cortex_a5 --verbose-cc

(in theory there’s --verbose-llvm-cpu-features for this but I couldn’t get that to do anything)

So, this should actually disable all CPU features that get enabled for the cortex_a5 cpu target:

.cpu_features_sub = std.Target.arm.featureSet(&.{
    .aclass,      .d32,    .db,        .dsp,     .fp16,      .fp64,           .fpregs,          .fpregs64,
    .v4t,         .v5t,    .v5te,      .v6,      .v6k,       .v6m,            .v6t2,            .has_v7,
    .has_v7clrex, .v8m,    .mp,        .neon,    .perfmon,   .ret_addr_stack, .slow_fp_brcc,    .slowfpvfmx,
    .slowfpvmlx,  .thumb2, .trustzone, .v7a,     .vfp2,      .vfp2sp,         .vfp3,            .vfp3d16,
    .vfp3d16sp,   .vfp3sp, .vfp4,      .vfp4d16, .vfp4d16sp, .vfp4sp,         .vmlx_forwarding,
}),

If that doesn’t work, then it seems like that’s got to be a bug. If it does work, though, then try narrowing it down to the minimal set of features you need to disable to get it to work.
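
The narrowing-down step could be done by hand or with a small greedy loop like the sketch below. Everything here is hypothetical: minimize_disabled is an illustrative helper, and try_build is a placeholder that you would replace with "rebuild with these features in cpu_features_sub, copy the binary to the target, run it, and return 0 if it works". The stand-in body only pretends that disabling neon is what matters, so that the sketch runs on its own.

```shell
# Sketch: greedily shrink a list of disabled CPU features.
# try_build is a placeholder; replace its body with a real build-and-run
# check. The stand-in below only PRETENDS that disabling neon is enough.
try_build() {
    case " $* " in
        *" neon "*) return 0 ;;
        *) return 1 ;;
    esac
}

minimize_disabled() {
    remaining="$*"
    for feature in "$@"; do
        # Candidate: everything still disabled except this one feature.
        candidate=$(printf '%s\n' $remaining | grep -vxF "$feature" | tr '\n' ' ')
        if try_build $candidate; then
            remaining=$candidate    # still works, so re-enable the feature
        fi
    done
    echo $remaining
}

minimize_disabled vfp4 neon d32 fp16
```

With the stand-in check this prints neon. A greedy pass like this needs one build per feature and can miss interactions between features, so treat the result as a starting point.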

@squeek502, now a segmentation fault rather than illegal instruction, feels like progress!

coredump:

Core was generated by `/armtest'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000224f8 in _start () at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/start.zig:243
243         asm volatile (switch (native_arch) {
(gdb) where
#0  0x000224f8 in _start () at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/start.zig:243
(gdb) bt full
#0  0x000224f8 in _start () at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/start.zig:243
No locals.

Instead of guessing, you should use the disasm command in gdb to find out which instruction is causing Illegal Instruction. It will point right at it.

@andrewrk, do you mean disas? I get an unknown command error for disasm.
Running just disas gives a dump of assembly, but I’ll have to figure out which line the failure happened on; this command is new to me.

Program terminated with signal SIGILL, Illegal instruction. 
#0  0x00098b70 in memset (dest=0xaa000 <os.linux.tls.main_thread_tls_buffer> "", c=0 '\000', len=32) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/compiler_rt/memset.zig:21
21                  if (n == 0) break; 
(gdb) where
#0  0x00098b70 in memset (dest=0xaa000 <os.linux.tls.main_thread_tls_buffer> "", c=0 '\000', len=32) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/compiler_rt/memset.zig:21

(gdb) disas /m memset
...
0x00098b70 <+40>:    vdup.8  q8, r1
...

It appears that VDUP was still used even with neon removed from the feature set.

From my build.zig:

    const exe = b.addExecutable(.{
        .name = "armtest",
        .root_source_file = .{ .path = "src/main.zig" },
        .target = .{
            .cpu_arch = std.Target.Cpu.Arch.arm,
            .cpu_model = std.zig.CrossTarget.CpuModel{ .explicit = &std.Target.arm.cpu.cortex_a5 },
            .cpu_features_sub = std.Target.arm.featureSet(&.{.neon}),
            .os_tag = std.Target.Os.Tag.linux,
            .abi = std.Target.Abi.gnueabihf,
        },
        .optimize = optimize,
    });

Yes I meant disas. It prints assembly instructions and then a => token pointing directly at the assembly instruction that caused the Illegal Instruction signal.
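
If the disas listing is long, the marked line can be filtered out of a saved copy of the output. This is a minimal sketch; faulting_line is a hypothetical helper, and the sample input is three lines copied from the memset dump in this thread. Inside gdb itself, x/i $pc prints just the instruction at the program counter.

```shell
# Sketch: pull the faulting instruction out of a saved gdb "disas" listing.
# faulting_line is a hypothetical helper; gdb prefixes the current
# instruction with "=>". Inside gdb, "x/i $pc" prints that one instruction.
faulting_line() {
    grep '^=>'
}

# Sample input: three lines copied from the memset dump in this thread.
faulting_line <<'EOF'
   0x00098b6c <+36>:    bic     lr, r2, #15
=> 0x00098b70 <+40>:    vdup.8  q8, r1
   0x00098b74 <+44>:    add     r12, r0, lr
EOF
```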

@andrewrk, thanks, it does appear to be calling VDUP.8, which is a NEON instruction (NEON should be disabled by the target settings in my build.zig).
I can create a bug report for the issue.

Core was generated by `/armtest'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00098b70 in memset (dest=0xaa000 <os.linux.tls.main_thread_tls_buffer> "", c=0 '\000', len=32) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/compiler_rt/memset.zig:21
21                  if (n == 0) break;
(gdb) where
#0  0x00098b70 in memset (dest=0xaa000 <os.linux.tls.main_thread_tls_buffer> "", c=0 '\000', len=32) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/compiler_rt/memset.zig:21
#1  0x0002e478 in os.linux.tls.prepareTLS (area=...) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/os/linux/tls.zig:280
#2  0x0002c774 in os.linux.tls.initStaticTLS (phdrs=...) at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/os/linux/tls.zig:339
#3  0x0002bcb4 in start.posixCallMainAndExit () at /home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/lib/std/start.zig:404
#4  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) disas
Dump of assembler code for function memset:
   0x00098b48 <+0>:     push    {r4, r5, r11, lr}
   0x00098b4c <+4>:     add     r11, sp, #8
   0x00098b50 <+8>:     cmp     r2, #0
   0x00098b54 <+12>:    beq     0x98ba4 <memset+92>
   0x00098b58 <+16>:    cmp     r2, #16
   0x00098b5c <+20>:    bcs     0x98b6c <memset+36>
   0x00098b60 <+24>:    mov     r3, r2
   0x00098b64 <+28>:    mov     r12, r0
   0x00098b68 <+32>:    b       0x98b98 <memset+80>
   0x00098b6c <+36>:    bic     lr, r2, #15
=> 0x00098b70 <+40>:    vdup.8  q8, r1
   0x00098b74 <+44>:    add     r12, r0, lr
   0x00098b78 <+48>:    and     r3, r2, #15
   0x00098b7c <+52>:    mov     r4, lr
   0x00098b80 <+56>:    mov     r5, r0
   0x00098b84 <+60>:    vst1.8  {d16-d17}, [r5]!
   0x00098b88 <+64>:    subs    r4, r4, #16
   0x00098b8c <+68>:    bne     0x98b84 <memset+60>
   0x00098b90 <+72>:    cmp     lr, r2
   0x00098b94 <+76>:    beq     0x98ba4 <memset+92>
   0x00098b98 <+80>:    strb    r1, [r12], #1
   0x00098b9c <+84>:    subs    r3, r3, #1
   0x00098ba0 <+88>:    bne     0x98b98 <memset+80>
   0x00098ba4 <+92>:    pop     {r4, r5, r11, pc}
End of assembler dump.
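
To see how many instructions in a listing like this depend on NEON, a rough filter can help. The sketch below is only a heuristic: neon_lines is a hypothetical helper, and its mnemonic list (vdup, vld1 to vld4, vst1 to vst4) is deliberately short and not exhaustive. Many other v-prefixed instructions, such as vadd.f32 or vldr, are plain VFP and are fine on a core without NEON.

```shell
# Sketch: list lines of a disassembly that use a few Advanced SIMD (NEON)
# mnemonics. neon_lines is a hypothetical helper; the mnemonic list is
# deliberately short and NOT exhaustive.
neon_lines() {
    grep -E '[[:space:]](vdup|vld[1-4]|vst[1-4])\.'
}

# Sample input: four lines copied from the memset dump above.
neon_lines <<'EOF'
   0x00098b6c <+36>:    bic     lr, r2, #15
=> 0x00098b70 <+40>:    vdup.8  q8, r1
   0x00098b84 <+60>:    vst1.8  {d16-d17}, [r5]!
   0x00098b98 <+80>:    strb    r1, [r12], #1
EOF
```

This prints the vdup.8 and vst1.8 lines. The same filter could be fed the output of an ARM-capable disassembler (for example llvm-objdump -d on the binary, assuming that tool is installed) to check whether functions other than memset are affected.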

Hmmm, so it’s in the function memset, which is intended to be provided by glibc when you target arm-linux-gnueabihf. The way this works is that Zig’s compiler_rt provides memset as a weak symbol, which allows it to be overridden by libc. Unfortunately, it looks like glibc is not actually overriding the compiler_rt-provided memset.

So I think there are two issues here:

  1. compiler_rt not respecting the CPU features, although I’d like you to double check that zig is in fact seeing the target CPU features as not including neon or any other CPU feature that depends on neon.
  2. The issue tracked as ziglang/zig #13303 on GitHub: “weak memcmp symbol from compiler_rt is not overridden when building Python 3.11.0 on Linux and MacOS”.

“compiler_rt not respecting the CPU features, although I’d like you to double check that zig is in fact seeing the target CPU features as not including neon or any other CPU feature that depends on neon”

@andrewrk, while looking into how to double check this, I built with verbose output enabled, and that showed something interesting.

> zig build --verbose
/home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/zig build-exe /home/mjohn/zig/workspace/armtest/src/main.zig --cache-dir /home/mjohn/zig/workspace/armtest/zig-cache --global-cache-dir /home/mjohn/.cache/zig --name armtest -target arm-linux-gnueabihf -mcpu cortex_a5-neon --listen=- 

I found the -mcpu cortex_a5-neon interesting as that feature should be removed based on the build.zig. Is there anything else I can run to remove neon support?

Even passing the target and cpu options from the command line appears to include neon:

> zig build --verbose -Dtarget=arm-linux-gnueabihf -Dcpu=cortex_a5
/home/mjohn/zig/zig-linux-x86_64-0.12.0-dev.47+0461a64a9/zig build-exe /home/mjohn/zig/workspace/armtest/src/main.zig --cache-dir /home/mjohn/zig/workspace/armtest/zig-cache --global-cache-dir /home/mjohn/.cache/zig --name armtest -target arm-linux-gnueabihf -mcpu cortex_a5-neon --listen=- 

EDIT:

Actually, it looks like cortex_a5-neon means all cortex_a5 features minus neon.
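
To make that reading of the -mcpu string concrete, here is a small sketch that splits such a string into the model name and its +feature / -feature modifiers. describe_mcpu is a hypothetical helper and not part of Zig. It relies on CPU and feature names using underscores rather than hyphens, as cortex_a5 does, so a - can only mean "remove this feature".

```shell
# Sketch: split a Zig -mcpu string into its model and feature modifiers.
# describe_mcpu is a hypothetical helper, not part of Zig.
describe_mcpu() {
    printf '%s\n' "$1" | awk '{
        model = $0
        sub(/[+-].*/, "", model)            # text before the first + or -
        print "model: " model
        rest = substr($0, length(model) + 1)
        while (match(rest, /^[+-][^+-]+/)) {
            sign = substr(rest, 1, 1)
            name = substr(rest, 2, RLENGTH - 1)
            label = (sign == "+") ? "enable: " : "disable: "
            print label name
            rest = substr(rest, RLENGTH + 1)
        }
    }'
}

describe_mcpu cortex_a5-neon
```

For cortex_a5-neon this prints "model: cortex_a5" followed by "disable: neon", so the verbose build line is already asking for NEON to be turned off.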

I’m looking to jump back into this issue, as I’m curious whether compiler_rt now uses the standard library definitions from the cross compiler rather than the defaults provided by Zig. Is anyone aware of progress on this issue (#13303 on GitHub)? I’m wondering if it’s worth the effort to update my build.zig file to test with the latest Zig.

Edit: (2024-03-22) I went through the effort of updating my build.zig file, and it looks like cross-compiling is still broken.