Aarch64/armv8 PinePhone target strange memory issues

Target: aarch64/armv8 PinePhone A53 target (Zig 0.15.2 running on Asahi Linux).

Some strange memory issues again this week. For example, @intFromBool was returning either 0xff or 0xfe (I don't recall which, sorry). I also had a var that should have been initialised to 0 but showed the same strange value (found because an if statement was failing). An auto-numbered(?) enum(u8) also showed a similarly high value until I explicitly assigned some non-zero values.

Currently @floatFromInt is crashing, but since this is on the device, I have no way to debug the reason. What I can say is that when I take the calculated value (0x7070) from the crashing var and use it as a const literal instead, the @floatFromInt does not crash…

I was very reluctant to post, since I know it is very little information to go on. But combined with the other strange memory issues and the previous issues I have mentioned (an array-of-arrays crash, and a crash on non-aligned 16-bit access to the middle bytes of a packed u32 struct), I was wondering if someone could please have a look at the compiler for this target.

I found workarounds for the other issues, but this one has me stumped. I even had a look at the assembly, but couldn't locate how it was doing the conversion. I was expecting to find something along the lines of FCVTZS (although for int-to-float the instruction would be SCVTF or UCVTF). I did identify that the core does not have FEAT_FP16, and switched to f32, but it made no difference.

I know that the Zig aarch64 / Asahi Linux support is very new, so maybe I am the only one using this obscure combination :) Zig itself is 100% stable on Asahi; it is only on my target device that I am seeing these issues, i.e. when I try to create test cases locally, they work.

Any thoughts or ideas would be appreciated.

Building in ReleaseSmall with:

    .target = b.resolveTargetQuery(.{
        .cpu_arch = .aarch64,
        .os_tag = .freestanding,
        .abi = .none,
        // CPU model taken from the aarch64 namespace so it matches .cpu_arch
        .cpu_model = .{ .explicit = &std.Target.aarch64.cpu.cortex_a53 },
    }),

I don’t know much about this area, so I can only offer a few suggestions:
Would this still happen if the backend is switched to LLVM?
Would this still happen if the allocator is changed (if possible, consider c_allocator)?
Can a minimal reproducible case be generated?

I am more inclined to think that there is a memory corruption issue in the code, such as a use-after-free (UAF).
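For the LLVM question above, a minimal build.zig sketch of how the backend could be forced. It assumes the Zig 0.15-style `std.Build` API; the executable name `kernel` and the path `src/main.zig` are hypothetical placeholders. My understanding is that release modes typically already go through LLVM, so this would mostly matter for Debug builds.

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.resolveTargetQuery(.{
        .cpu_arch = .aarch64,
        .os_tag = .freestanding,
        .abi = .none,
        // Assumption: CPU model from the aarch64 namespace, matching .cpu_arch.
        .cpu_model = .{ .explicit = &std.Target.aarch64.cpu.cortex_a53 },
    });

    const exe = b.addExecutable(.{
        .name = "kernel", // hypothetical name
        .root_module = b.createModule(.{
            .root_source_file = b.path("src/main.zig"), // hypothetical path
            .target = target,
            .optimize = .ReleaseSmall,
        }),
        // Explicitly request the LLVM backend instead of a self-hosted one.
        .use_llvm = true,
    });

    b.installArtifact(exe);
}
```

If the symptoms are identical with and without `.use_llvm = true`, a backend-specific codegen bug becomes less likely.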


Okay, so I finally figured out the float stuff. The FEAT_FP16 flag got me wondering whether there was something similar for normal FP usage. It turns out that CPACR_EL1 has an access control field for FP (FPEN) that makes FP instructions trap if it is not set! :)
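For anyone hitting the same trap, here is a rough sketch of the kind of early-boot fix involved. It is not the exact code from my project: the names `withFpEnabled` and `enableFpSimd` are made up for illustration, and it assumes the code runs at EL1 on bare metal. The architectural part is that CPACR_EL1.FPEN lives in bits [21:20], and 0b11 stops FP/SIMD instructions from trapping at EL0 and EL1.

```zig
const std = @import("std");

/// CPACR_EL1.FPEN occupies bits [21:20]; 0b11 disables trapping of
/// FP/SIMD instructions at both EL0 and EL1.
const cpacr_fpen_mask: u64 = 0b11 << 20;

/// Pure helper (hypothetical name): returns `cpacr` with FPEN set to 0b11.
fn withFpEnabled(cpacr: u64) u64 {
    return cpacr | cpacr_fpen_mask;
}

/// Hypothetical early-boot routine. Assumes it runs at EL1 before the
/// first FP/SIMD instruction executes.
pub fn enableFpSimd() void {
    // Read the current value so other fields are preserved.
    const old = asm volatile ("mrs %[ret], cpacr_el1"
        : [ret] "=r" (-> u64),
    );
    // Write it back with FPEN set, then synchronise with ISB so the
    // change takes effect before any following FP instruction.
    asm volatile (
        \\msr cpacr_el1, %[val]
        \\isb
        :
        : [val] "r" (withFpEnabled(old)),
    );
}

test "FPEN bits are set and other bits are preserved" {
    try std.testing.expectEqual(@as(u64, 0x300000), withFpEnabled(0));
    try std.testing.expectEqual(@as(u64, 0x300001), withFpEnabled(1));
}
```

Since the compiler is free to use SIMD registers for ordinary copies and zeroing, a routine like this probably wants to run as early as possible in the startup path.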

I also just realised that my test using a const, which worked, was probably evaluated at compile time and was thus a false test. Hence my false assumption that this was also related to reading the memory behind the var.
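To illustrate the difference, a small sketch (not my original test; the variable names are made up). With a comptime-known operand the int-to-float conversion itself is done by the compiler, whereas a value read through a volatile pointer stays runtime-known even in optimised builds, so the conversion has to execute on the device.

```zig
const std = @import("std");

test "comptime-known vs runtime-known operand" {
    // Comptime-known: the compiler folds the conversion, so no
    // int-to-float instruction is needed for it at run time.
    const literal: u16 = 0x7070;
    const folded: f32 = @floatFromInt(literal);

    // Runtime-known: the volatile load hides the value from the
    // optimiser, so the conversion is performed at run time.
    var raw: u16 = 0x7070;
    const runtime_value = @as(*volatile u16, &raw).*;
    const converted: f32 = @floatFromInt(runtime_value);

    try std.testing.expectEqual(folded, converted);
    try std.testing.expectEqual(@as(f32, 28784.0), converted);
}
```

On a host machine both paths give the same result; on a target where FP access traps, only the runtime path would be expected to fault.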

What a roller-coaster. But it is actually quite funny, since I am ~7,000 LoC in and this is my first need for a float! I’m not convinced the phone is really at -14.2 °C, but Zig code is pretty cool ;)
