Mach-O linker: __thread_bss TLS offset misaligned when __thread_data is present (x86_64-macos)
Summary
Zig 0.15.1’s Mach-O linker can place __thread_bss at a misaligned offset within the TLS template when both __thread_data and __thread_bss sections are present. This causes runtime crashes (EXC_I386_GPFLT on native x86_64, misaligned-address panic under Rosetta) for any thread_local variable in __thread_bss that requires 16-byte alignment (e.g., std::mutex, std::condition_variable, or explicit alignas(16)).
Minimal Reproducer
tls_padding.c:
_Thread_local long tls_init_a = 0xAAAAAAAAAAAAAAAA;
_Thread_local long tls_init_b = 0xBBBBBBBBBBBBBBBB;
main.cpp:
#include <cstdio>
#include <cstdint>
struct alignas(16) Aligned16 {
    char data[32];
};
static thread_local Aligned16 tls_var;
int main() {
    tls_var.data[0] = 42;
    uintptr_t addr = reinterpret_cast<uintptr_t>(&tls_var);
    if (addr % 16 != 0) {
        printf("FAIL: tls_var at %p is not 16-byte aligned (mod 16 = %lu)\n",
               &tls_var, addr % 16);
        return 1;
    }
    printf("OK: tls_var at %p is 16-byte aligned\n", &tls_var);
    return 0;
}
Build (cross-compile from Linux):
zig cc -target x86_64-macos -c tls_padding.c -o tls_padding.o
zig c++ -target x86_64-macos -o test_tls main.cpp tls_padding.o
Result on macOS x86_64: crashes with misaligned address / EXC_I386_GPFLT.
Without tls_padding.o linked, it works fine.
Root Cause
Inspecting the output binary’s Mach-O sections:
__thread_data: addr=0x100060f38 size=0x10 align=2^3 (8)
__thread_bss: addr=0x100060f50 size=0x30 align=2^4 (16)
The TLS offset of __thread_bss = 0x100060f50 - 0x100060f38 = 0x18. Since 0x18 mod 16 = 8, variables in __thread_bss requiring 16-byte alignment are misaligned at runtime.
The linker aligns __thread_bss correctly in virtual address space (VA 0x...f50 is 16-byte aligned), but the TLS template offset (0x18) is not. At runtime, dyld allocates a 16-byte-aligned block per thread and indexes into it using these offsets. Since base + 0x18 is only 8-byte aligned, the movaps instruction used to initialize the aligned struct faults.
Why it happens: __thread_data starts at VA 0x...f38 (which is 8 mod 16, due to the preceding __thread_vars section). After 0x10 bytes of __thread_data, the end is at 0x...f48 (also 8 mod 16). The linker rounds up to the next 16-byte-aligned VA (0x...f50), which inserts 8 bytes of padding and places __thread_bss 0x18 bytes past the start of __thread_data. That distance results from aligning in VA space, not in offset space. The correct TLS offset is align_up(0x10, 16) = 0x10, not 0x18.
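The arithmetic can be checked at compile time. The following C++ sketch (C++14 or later) plugs in the addresses from the section dump above and compares the VA-derived offset with the offset obtained by aligning relative to the start of the template. The helper align_up and the constant names are illustrative only and are not taken from the Zig linker.

```cpp
#include <cstdint>

// Round `value` up to the next multiple of `alignment` (a power of two).
constexpr std::uint64_t align_up(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Values copied from the section dump in this report.
constexpr std::uint64_t kThreadDataVa   = 0x100060f38;
constexpr std::uint64_t kThreadDataSize = 0x10;
constexpr std::uint64_t kThreadBssVa    = 0x100060f50;
constexpr std::uint64_t kThreadBssAlign = 16;

// Offset as it appears in the binary: the difference of the two VAs.
constexpr std::uint64_t kOffsetFromVa = kThreadBssVa - kThreadDataVa;

// Offset when alignment is applied relative to the template start.
constexpr std::uint64_t kOffsetInTemplate =
    align_up(kThreadDataSize, kThreadBssAlign);

static_assert(kThreadBssVa % 16 == 0, "the VA itself is 16-byte aligned");
static_assert(kOffsetFromVa == 0x18, "VA-derived offset");
static_assert(kOffsetFromVa % 16 == 8, "misaligned relative to a 16-byte-aligned base");
static_assert(kOffsetInTemplate == 0x10, "offset aligned in template space");
```

If the file compiles, all four statements hold: the section VA is aligned, yet the offset derived from it is not.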
The pattern is deterministic: an even number of 8-byte _Thread_local variables triggers the bug (their combined size causes __thread_data to land at 8 mod 16 in VA space); odd numbers happen to be fine.
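One possible model of the parity pattern is sketched below in C++ (C++14 or later). It rests on two assumptions that this report does not verify: that every thread-local variable contributes one 24-byte descriptor to __thread_vars, and that __thread_vars itself starts on a 16-byte boundary. The function bss_offset_from_va and the constant kDescriptorSize are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Round `value` up to the next multiple of `alignment` (a power of two).
constexpr std::uint64_t align_up(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// ASSUMPTION: one 24-byte descriptor per thread-local variable, and
// __thread_vars begins at an address that is 0 mod 16.
constexpr std::uint64_t kDescriptorSize = 24;

// Distance from the start of __thread_data to the start of __thread_bss when
// alignment is applied in VA space. Models `n_data` initialized 8-byte
// variables plus one zero-initialized variable that needs 16-byte alignment.
// Addresses are relative to the (assumed 16-byte-aligned) start of __thread_vars.
constexpr std::uint64_t bss_offset_from_va(std::uint64_t n_data) {
    const std::uint64_t data_va  = kDescriptorSize * (n_data + 1);
    const std::uint64_t data_end = data_va + 8 * n_data;
    const std::uint64_t bss_va   = align_up(data_end, 16);
    return bss_va - data_va;
}

static_assert(bss_offset_from_va(1) % 16 == 0, "odd count: offset is aligned");
static_assert(bss_offset_from_va(2) == 0x18,   "even count: matches the dump above");
static_assert(bss_offset_from_va(3) % 16 == 0, "odd count: offset is aligned");
static_assert(bss_offset_from_va(4) % 16 == 8, "even count: offset is misaligned");
```

Under these assumptions the model reproduces the 0x18 offset from the section dump for two initialized variables. In the model, the parity effect comes from the descriptor count changing where __thread_data starts.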
Real-world Impact
This causes a segfault in barretenberg (C++ cryptography library) when cross-compiled with Zig and linked with a Rust static library (issue). The Rust lib introduces __thread_data (from chrono, crossbeam, regex, std crates), and C++ thread_local variables containing std::mutex (16-byte aligned on libc++) crash at construction.
Suggested Fix
When computing TLS section offsets within the template, use align_up(accumulated_offset, section_alignment) rather than VA_bss - VA_data. The virtual addresses may have alignment properties that don’t match the runtime TLS block’s base alignment.
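A minimal sketch of the suggested computation, written in C++ (C++14 or later) rather than in the linker's own language. TlsSection and template_offset are hypothetical names, not Zig linker types or functions. The sketch only shows the accumulate-and-align idea using the section sizes and alignments from this report.

```cpp
#include <cstddef>
#include <cstdint>

// Round `value` up to the next multiple of `alignment` (a power of two).
constexpr std::uint64_t align_up(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical description of one section of the TLS template.
struct TlsSection {
    std::uint64_t size;       // size in bytes
    std::uint64_t alignment;  // required alignment in bytes, power of two
};

// Offset of section `index` within the template, computed by accumulating
// sizes and aligning the running offset rather than subtracting VAs.
constexpr std::uint64_t template_offset(const TlsSection* sections,
                                        std::size_t index) {
    std::uint64_t cursor = 0;
    for (std::size_t i = 0; i <= index; ++i) {
        cursor = align_up(cursor, sections[i].alignment);
        if (i == index) {
            break;
        }
        cursor += sections[i].size;
    }
    return cursor;
}

// __thread_data (size 0x10, align 8) followed by __thread_bss (size 0x30, align 16).
constexpr TlsSection kSections[] = {{0x10, 8}, {0x30, 16}};

static_assert(template_offset(kSections, 0) == 0x00, "__thread_data offset");
static_assert(template_offset(kSections, 1) == 0x10, "__thread_bss offset");
```

For the offsets stored in the binary to match these values, the VA layout has to agree with them as well. One way to achieve that would be to start the first TLS section at the largest alignment required by any TLS section, so that VA differences and template offsets coincide. Whether that is the right change for Zig's linker is not something this report establishes.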