Illegal instruction hit in build executable

Hello. Zig day 1 for me.
I’m running into, I think, a sanitizer issue.

I see a lot of chatter on how to work around this to assist in troubleshooting. I think my problem is unusual in that it crops up in the build executable itself, and those workarounds do not help when I apply them.

This is on Linux (tried clean installs of Debian/Fedora/Mint) and only within VirtualBox (!?). I’ve tried 0.14 and the daily builds, with the same behavior.

This happens with the default project generated by ‘zig init’. I follow up with ‘zig build’ and boom 💥

I’ve traced it down to lib/compiler_rt/memset.zig:21 via gdb in the generated build executable.

I’ve tried not to spew too much information here. Since I’m so new to this, I did as much homework as I could in order to get the conversation started. My brain won’t let me let go of this just yet. Are there next steps I should be taking to narrow this down? What would you like to see?

I, for one, would like to understand this. I’m learning a lot from the experience so far. Thank you for that 👍

An illegal instruction error happens when an executable (in this case the compiler) uses an instruction that your processor (or, here, most likely VirtualBox) does not support, such as certain vector extensions.

This might help you too: the Stack Overflow question “How to enable AVX / AVX2 in VirtualBox 6.1.16 with Ubuntu 20.04 64bit?” (tagged tensorflow).

If you can’t get it working, it might also be worth trying the 32 bit compiler (zig-linux-x86-0.*), which has fewer hardware requirements, but I would only use that as a last resort, since it is slower.

It sounds like you got the wrong binary for your platform.

It looks like I do have the AVX/AVX2 problem. Good find!

I’ve tried all the tricks outlined in the above threads and still have the problem.

I then tried x86 to see what would happen and I get the same SIGILL, but this time in a different spot — lib/std/os/linux/tls.zig:519 (in 0.14)

I’ll keep chipping at this.
Thanks for the pointers.

Maybe I just figure out how to build from source and this goes away for me?

EDIT: zig 0.13 works out of the box for me.

I don’t think I have the wrong binary since I can ‘zig init’ successfully.

Does it show you what the illegal instruction is exactly?

As for compiling from source, it takes a while (even longer, since you’d need to do it inside the VM that you intend to use it in), but if you want to go for it, then this is the link with the (as far as I know) easiest instructions: Building Zig from source for dummies? - #6 by matklad

Nothing more than the line where gdb breaks.
I’ll go figure out how to disasm there and get the exact instruction for you.

Is the idea to make sure this is an AVX/2 instruction and not a sanitizer trap?

EDIT: it is VMOVD - https://uops.info/html-instr/VMOVD_XMM_R32.html

Try to build using the flag --mcpu baseline.
The x86_64 baseline features are: std.Target.x86.cpu.x86_64.

You can also use the minus to remove a feature (e.g. --mcpu native-avx2) or the plus to add a feature (e.g. --mcpu baseline+avx2).
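To check what a given CPU selection actually compiles in, a small program can report the target feature set it was built with. This is only a sketch: the helper name compiledWithAvx is made up for illustration, and it reports the compile-time selection, not what the CPU (or VirtualBox) supports at run time. The x86_64 baseline model does not include AVX, so a baseline build should print false.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper: true if this binary was compiled with AVX enabled.
/// This is the compile-time CPU selection, not a run-time CPU check.
fn compiledWithAvx() bool {
    const is_x86 = builtin.cpu.arch == .x86_64 or builtin.cpu.arch == .x86;
    return is_x86 and std.Target.x86.featureSetHas(builtin.cpu.features, .avx);
}

pub fn main() void {
    std.debug.print("AVX enabled at compile time: {}\n", .{compiledWithAvx()});
}
```

Building it once with the default (native) CPU and once with the baseline CPU selected, then comparing the two outputs, shows whether native detection is what pulls AVX in.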

Just tried. I think this is still related to my problem — anything like that is ignored by whatever is building the build executable.

Flags like that are only used/consumed by the build executable for the further stages?

That said, I tried:

zig build-exe --mcpu baseline main.zig

… (to avoid the generated build executable directly) and it fails with:

error: unrecognized parameter: ‘--mcpu’

But I am happy to report that a plain old ‘zig build-exe main.zig’ does work and produces the main executable. When executed, though, it fails for the same reason the build executable does (the VMOVD).

That’s an SSE2 vector instruction, so even older than AVX. Edit: Never mind, it’s AVX.
There must be something wrong with your VirtualBox installation or config.

Is that also the same instruction that fails if you use the 32 bit version?

UD2
Sorry for such a brutal hack as a screenshot:

And thank you for continuing to walk me through this.

Well, now that is a trap. I don’t understand why the mmap would fail there though. We’d need to get the errno value.

Also is this still the compiler crashing? It looks to me like it started generating some output there, at least there seems to be a cache directory.

👍

I don’t think this is the compiler crashing. This is the build executable crashing, which is why we’re in the cache dir. It just so happens that it crashes on UD2 for x86 and VMOVD for x86_64.

This is still day one for me and as far as I can tell, ‘zig build’ ends up generating an executable named build (which further pulls the strings to progress the build?) and that executable is what breaks for me. So I can’t even get to the step where we hit making main.zig … that’s why I tried ‘zig build-exe’ to skip past all that.

Also, v0.13 works out of the box for me.

And I think the root problem is VirtualBox, but it happens to expose some interesting behavior to me. We can investigate as deep as you guys/gals feel like digging. I’m in!

mmap is returning an error. It could be that you’re just running out of memory. If you could see the value of errno at the moment of crash, it would narrow this down.

“This is the build executable crashing”

That’s good news! It means that we can now change the code it’s using and add some debugging statements.

If you go into tls.zig and replace the trap with the following code, then we can get the error code:


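        // Replaces the bare @trap(): report the errno decoded from mmap's return value.
        if (@as(isize, @bitCast(begin_addr)) < 0) {
            var buf: [4096]u8 = undefined;
            // Printing is not set up this early, so the message goes into the panic and is read back in gdb.
            @panic(std.fmt.bufPrint(&buf, "Error: {}", .{std.posix.errno(begin_addr)}) catch "bufPrint error");
        }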

Now sadly printing isn’t initialized at this point in the program, so you will still need to use gdb to get the actual output. After the error happens you can use print msg in gdb to get the error string.

OH MAN! I haven’t even had the time (yet) to realize that it’s all right there! This is seriously cool.

I tried using gdb to just grab errno for @LucasSantos91, but now realize why this isn’t easily possible.

I slightly modified what you had fed me.
Here we are:

SUCCESS (!?)

Could you also try printing the begin_addr? I’m starting to think that it might just be a valid address which just happens to have a negative value.
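One possible explanation for the SUCCESS result: a raw Linux syscall reports failure as a value in the range -4095 to -1 (the negated errno), and anything outside that range is a normal result. Below is a minimal sketch of that decoding, with u32/i32 standing in for usize/isize on 32-bit x86. The helper name rawErrno and the sample address are made up for illustration; this is not the std implementation.

```zig
const std = @import("std");

/// Hypothetical helper: decodes a raw 32-bit Linux syscall return value.
/// Values in [-4095, -1] are -errno; anything else is a successful result.
fn rawErrno(rc: u32) ?u16 {
    const signed: i32 = @bitCast(rc);
    if (signed < 0 and signed > -4096) {
        const code: u16 = @intCast(-signed);
        return code;
    }
    return null;
}

test "a high mapping address is not an error code" {
    // 0xF7D0_0000 is a made-up address above 2 GiB: negative as i32, but not an errno.
    try std.testing.expect(rawErrno(0xF7D0_0000) == null);

    // -12 is -ENOMEM on Linux, so it decodes to errno 12.
    const enomem: u32 = @bitCast(@as(i32, -12));
    try std.testing.expect(rawErrno(enomem).? == 12);
}
```

Under that convention, an address with the top bit set looks negative but carries no error, which would match an errno of SUCCESS.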

You are right:

(Again, sorry for the screenshots, but simple copy-paste isn’t working for me for some reason)

Alright then, according to the mmap documentation, only -1 is a failure, so you can change this to if (@as(isize, @bitCast(begin_addr)) == -1) @trap(); and it should all work?
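To make the difference between the two checks concrete, here is a small sketch comparing the original `< 0` test with the `== -1` test suggested above. It uses u32/i32 as stand-ins for usize/isize on a 32-bit target so it behaves the same on any host; the helper names and the sample address are made up for illustration.

```zig
const std = @import("std");

/// Hypothetical stand-in for the original check: traps on any value with the top bit set.
fn signCheckFails(addr: u32) bool {
    return @as(i32, @bitCast(addr)) < 0;
}

/// Hypothetical stand-in for the suggested check: only -1 counts as failure.
fn minusOneCheckFails(addr: u32) bool {
    return @as(i32, @bitCast(addr)) == -1;
}

test "a valid high address trips the sign check but not the -1 check" {
    const high_addr: u32 = 0xF7D0_0000; // made-up address above 2 GiB
    try std.testing.expect(signCheckFails(high_addr));
    try std.testing.expect(!minusOneCheckFails(high_addr));

    // The all-ones value is -1 when reinterpreted as signed, so both checks reject it.
    try std.testing.expect(signCheckFails(0xFFFF_FFFF));
    try std.testing.expect(minusOneCheckFails(0xFFFF_FFFF));
}
```

Running this with `zig test` should pass on any host, since the integer widths are fixed rather than taken from the target.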
