Hi, posting here because this feels too big for a Zulip chat.
While trying to get stack traces in a pretty specific use case I stumbled upon a dynamic library that has a separate debuginfo file that contains some DW_UT_partial units.
As expected after reading the std/debug/Dwarf.zig code, the parsing fails because the parser gives up when encountering a unit that is not a DW_UT_compile:
fn scanAllFunctions(di: *Dwarf, gpa: Allocator, endian: Endian) ScanError!void {
    var fr: Reader = .fixed(di.section(.debug_info).?);
    var this_unit_offset: u64 = 0;
    while (this_unit_offset < fr.buffer.len) {
        fr.seek = @intCast(this_unit_offset);
        const unit_header = try readUnitHeader(&fr, endian);
        if (unit_header.unit_length == 0) return;
        const next_offset = unit_header.header_length + unit_header.unit_length;
        const version = try fr.takeInt(u16, endian);
        if (version < 2 or version > 5) return bad();
        var address_size: u8 = undefined;
        var debug_abbrev_offset: u64 = undefined;
        if (version >= 5) {
            const unit_type = try fr.takeByte();
            if (unit_type != DW.UT.compile) return bad(); // <=== here
            address_size = try fr.takeByte();
            debug_abbrev_offset = try readFormatSizedInt(&fr, unit_header.format, endian);
        } else {
            debug_abbrev_offset = try readFormatSizedInt(&fr, unit_header.format, endian);
            address_size = try fr.takeByte();
        }
and so I get stack traces like this one:
info: loading system libc...
info: testing libc printf segfault...
Segmentation fault at address 0x8
???:?:?: 0x7f0f7fb5155d in ??? (/lib64/libc.so.6)
???:?:?: 0x7f0f7fa45697 in ??? (/lib64/libc.so.6)
???:?:?: 0x7f0f7fa46513 in ??? (/lib64/libc.so.6)
???:?:?: 0x7f0f7fa3a2c2 in ??? (/lib64/libc.so.6)
{project_path}/examples/segfault.zig:37:15: 0x12a64b6 in main (segfault.zig)
{zig_install_dir}/lib/std/start.zig:750:30: 0x12a7013 in callMain (std.zig)
{zig_install_dir}/lib/std/start.zig:203:5: 0x12663a1 in _start (std.zig)
It makes me a little sad, because when the separate debuginfo file is not accessible, I at least get function names from the .so file's symbol table:
info: loading system libc...
info: testing libc printf segfault...
Segmentation fault at address 0x8
???:?:?: 0x7f862093155d in __strlen_avx2 (/lib64/libc.so.6)
???:?:?: 0x7f8620825697 in __printf_buffer (/lib64/libc.so.6)
???:?:?: 0x7f8620826513 in __vfprintf_internal (/lib64/libc.so.6)
???:?:?: 0x7f862081a2c2 in __printf (/lib64/libc.so.6)
{project_path}/examples/segfault.zig:37:15: 0x12a64b6 in main (segfault.zig)
{zig_install_dir}/lib/std/start.zig:750:30: 0x12a7013 in callMain (std.zig)
{zig_install_dir}/lib/std/start.zig:203:5: 0x12663a1 in _start (std.zig)
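A minimal sketch of how the marked check could be relaxed, assuming the only goal is to stop rejecting the whole section when a DW_UT_partial unit shows up. In DWARF 5, full and partial compilation units share the same header layout, so the bytes that follow unit_type can be read the same way for both. The names `UnitAction` and `classifyUnitType` are hypothetical and not part of std, and skipping the remaining unit kinds instead of failing is an assumption of the sketch:

```zig
const std = @import("std");

// DWARF 5 unit type codes (DW_UT_*), redefined locally so the sketch
// does not depend on any particular std version.
const DW_UT_compile: u8 = 0x01;
const DW_UT_type: u8 = 0x02;
const DW_UT_partial: u8 = 0x03;
const DW_UT_skeleton: u8 = 0x04;
const DW_UT_split_compile: u8 = 0x05;
const DW_UT_split_type: u8 = 0x06;
const DW_UT_lo_user: u8 = 0x80;
const DW_UT_hi_user: u8 = 0xff;

/// Hypothetical: what a scanner could do with a unit of a given type.
const UnitAction = enum { scan, skip, invalid };

/// Hypothetical helper, not part of std.
fn classifyUnitType(unit_type: u8) UnitAction {
    return switch (unit_type) {
        // Compile and partial units have the same header layout.
        DW_UT_compile, DW_UT_partial => .scan,
        // These headers carry extra fields (type signature, dwo id).
        // Skipping them rather than failing is an assumption here.
        DW_UT_type, DW_UT_skeleton, DW_UT_split_compile, DW_UT_split_type => .skip,
        // Vendor-defined unit types.
        DW_UT_lo_user...DW_UT_hi_user => .skip,
        // Anything else is not a valid DWARF 5 unit type.
        else => .invalid,
    };
}

test "partial units are scanned like compile units" {
    try std.testing.expectEqual(UnitAction.scan, classifyUnitType(DW_UT_partial));
    try std.testing.expectEqual(UnitAction.scan, classifyUnitType(DW_UT_compile));
}
```

Since the offset of the next unit is computed from unit_length before unit_type is read, a `.skip` result could simply advance to the next unit, and only `.invalid` would still return bad().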
I made a small patch to the std lib to test this hypothesis, and it went well until encountering a DW_AT_abstract_origin referencing a DIE in another unit, which seems to be unsupported judging from this comment:
// Follow the DIE it points to and repeat
const ref_offset = try this_die_obj.getAttrRef(AT.abstract_origin, this_unit_offset, next_offset);
fr.seek = @intCast(ref_offset);
this_die_obj = (try parseDie(
    &fr,
    attrs_bufs[2],
    abbrev_table, // wrong abbrev table for different cu
    unit_header.format,
    endian,
    address_size,
)) orelse return bad();
So I guarded the parseDie call with a check that ref_offset is within the range of the current unit, breaking with null otherwise, so that a cross-unit reference no longer fails the whole DWARF file. After that, stack traces were finally “more complete”:
info: loading system libc...
info: testing libc printf segfault...
Segmentation fault at address 0x8
../sysdeps/x86_64/multiarch/strlen-avx2.S:76: 0x7fbfd605055d in __strlen_avx2 (../sysdeps/x86_64/multiarch/strlen-avx2.S)
/usr/src/debug/glibc-2.43-4.fc44.x86_64/stdio-common/vfprintf-process-arg.c:443:17: 0x7fbfd5f44697 in __printf_buffer (vfprintf-internal.c)
/usr/src/debug/glibc-2.43-4.fc44.x86_64/stdio-common/vfprintf-internal.c:1548:7: 0x7fbfd5f45513 in __vfprintf_internal (vfprintf-internal.c)
/usr/src/debug/glibc-2.43-4.fc44.x86_64/stdio-common/printf.c:33:10: 0x7fbfd5f392c2 in __printf (printf.c)
{project_path}/examples/segfault.zig:37:15: 0x103762f in main (segfault)
{zig_install_dir}/lib/std/start.zig:203:5: 0x1021eed in _start (segfault)
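The guard described above can be sketched as a standalone function. This is not the actual patch; `RefKind` and `resolveLocalRef` are hypothetical names, and `unit_len` is assumed to be the unit's total size in bytes. The underlying fact is that DW_FORM_ref1/2/4/8/udata values are offsets from the start of the containing unit, while DW_FORM_ref_addr is an offset from the start of .debug_info and may therefore point into another unit:

```zig
const std = @import("std");

/// How a DIE reference attribute encodes its target.
const RefKind = enum {
    /// DW_FORM_ref1/2/4/8/udata: offset from the start of the containing unit.
    unit_relative,
    /// DW_FORM_ref_addr: offset from the start of .debug_info.
    section_relative,
};

/// Hypothetical helper, not part of std. Returns the absolute .debug_info
/// offset of the referenced DIE if it lies inside the current unit, or
/// null if the reference leaves the unit (or the arithmetic overflows).
fn resolveLocalRef(kind: RefKind, value: u64, unit_start: u64, unit_len: u64) ?u64 {
    const unit_end = std.math.add(u64, unit_start, unit_len) catch return null;
    const target = switch (kind) {
        .unit_relative => std.math.add(u64, unit_start, value) catch return null,
        .section_relative => value,
    };
    if (target < unit_start or target >= unit_end) return null;
    return target;
}

test "cross-unit reference yields null instead of an error" {
    // Unit covering bytes [0x100, 0x180) of .debug_info (made-up numbers).
    try std.testing.expectEqual(@as(?u64, 0x120), resolveLocalRef(.unit_relative, 0x20, 0x100, 0x80));
    try std.testing.expectEqual(@as(?u64, null), resolveLocalRef(.section_relative, 0x40, 0x100, 0x80));
}
```

Returning null only gives up on that one chain of references. Following it properly would presumably need the abbrev table of the unit that owns the target DIE, which is what the comment in the std code points out.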
Questions are:
- is it worth an issue or a PR, considering how niche the use case is?
- if it is, how would one create a test for it? I can’t see a way besides using specific ELF and debuginfo files that reproduce the issue without the patch, but that would mean including a binary blob fixture somewhere…
And one opinion:
I think failing to parse DWARF from the separate debuginfo file should fall back to getting info from the original ELF file if possible, instead of giving up completely. Note that this is doable in userland thanks to customizable SelfInfo, but having this behavior by default makes more sense to me.
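To make the fallback idea concrete, here is a small sketch of the lookup order I have in mind. It does not use the real SelfInfo API; `SymtabEntry`, `DwarfError`, `symtabLookup` and `resolveName` are all hypothetical, and the addresses in the test are made up:

```zig
const std = @import("std");

/// Hypothetical stand-in for one function entry of an ELF symbol table.
const SymtabEntry = struct { addr: u64, size: u64, name: []const u8 };

/// Hypothetical error set for a failed DWARF parse.
const DwarfError = error{InvalidDebugInfo};

/// Linear scan for the symbol whose [addr, addr + size) range contains pc.
fn symtabLookup(symtab: []const SymtabEntry, pc: u64) ?[]const u8 {
    for (symtab) |entry| {
        if (pc >= entry.addr and pc - entry.addr < entry.size) return entry.name;
    }
    return null;
}

/// Prefer the DWARF result. If DWARF parsing failed or found nothing,
/// fall back to the symbol table, and only then print "???".
fn resolveName(
    dwarf_result: DwarfError!?[]const u8,
    symtab: []const SymtabEntry,
    pc: u64,
) []const u8 {
    const from_dwarf: ?[]const u8 = dwarf_result catch null;
    if (from_dwarf) |name| return name;
    return symtabLookup(symtab, pc) orelse "???";
}

test "falls back to the symbol table when DWARF parsing fails" {
    const symtab = [_]SymtabEntry{
        .{ .addr = 0x1000, .size = 0x40, .name = "__strlen_avx2" },
    };
    try std.testing.expectEqualStrings(
        "__strlen_avx2",
        resolveName(error.InvalidDebugInfo, &symtab, 0x1010),
    );
}
```

The point is only the ordering: a DWARF failure degrades the trace to symbol names instead of dropping straight to ???.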
Also, vaguely related: https://codeberg.org/ziglang/zig/issues/31790.
I understand that loading dynamic libraries without linking libc and still having perfect stack traces is not the most wanted feature for the Zig std lib, but hey, it is my thing.