Remote Inspection of a Stack Trace from a Freestanding Embedded Target

haydenridd · December 20, 2024, 3:26am

Currently, it’s relatively complex to log a full stack trace on a freestanding embedded target, as explained in this post:
https://andrewkelley.me/post/zig-stack-traces-kernel-panic-bare-bones-os.html

However, you don’t actually need to log the full stack trace on target. It’s enough to log the “raw” stack trace of addresses, like so (MicroZig’s default panic handler does this):

error: microzig PANIC: PANIC
error: stack trace:
error:   0: 0x10010277
error:   1: 0x1000A549
error:   2: 0x10005A83
error:   3: 0x10001777
error:   4: 0x10000219
error:   5: 0x100001BF
error:   6: 0x100003DF
error:   7: 0x100006A3
error:   8: 0x20041F87
info: triggering breakpoint...

You can reconstruct the more verbose stack trace (file, function, line numbers, etc.) on your PC assuming you have the exact build of the .elf that’s currently running on your embedded target. I started to mess around with this idea like so:

const std = @import("std");
const DW = std.dwarf;
var gpa: std.heap.GeneralPurposeAllocator(.{}) = .{};
const alloc = gpa.allocator();

pub fn main() !void {
    var sections: DW.DwarfInfo.SectionArray = DW.DwarfInfo.null_section_array;
    var di = try std.debug.readElfDebugInfo(
        alloc,
        "zig-out/firmware/some-firmware.elf",
        null,
        null,
        &sections,
        null,
    );
    const symbol_info = di.getSymbolAtAddress(alloc, 0x10010277) catch |err| return err;
    defer symbol_info.deinit(alloc);
    const li = symbol_info.line_info.?;
    std.debug.print("{s}:{d}:{d}\n", .{ li.file_name, li.line, li.column });
}

And after getting an immediate panic:

thread 89350 panic: index out of bounds: index 11259029135298612, len 1811696
/home/hayden/.zvm/0.13.0/lib/std/debug.zig:1177:74: 0x104ccdf in readElfDebugInfo (trace_analyzer)
        const str_shdr: *const elf.Shdr = @ptrCast(@alignCast(&mapped_mem[math.cast(usize, str_section_off) orelse return error.Overflow]));
                                                                         ^
/home/hayden/Documents/iot_poc/pico_fw/trace_analyzer.zig:7:44: 0x105af49 in main (trace_analyzer)
    var di = try std.debug.readElfDebugInfo(
                                           ^
/home/hayden/.zvm/0.13.0/lib/std/start.zig:524:37: 0x103cb35 in posixCallMainAndExit (trace_analyzer)
            const result = root.main() catch |err| {
                                    ^
/home/hayden/.zvm/0.13.0/lib/std/start.zig:266:5: 0x103c651 in _start (trace_analyzer)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)
Aborted (core dumped)

I quickly realized that the debug info inspection facilities in std.debug seem to assume you’re inspecting a binary built for the same system you’re compiling for. This little “trace analyzer” I’ve written is compiled for 64-bit x86, so, as a quick example, the Ehdr type in std.elf is selected by the host’s pointer size (8 bytes) and resolves to Elf64_Ehdr, when in reality the ELF I’m inspecting is for a 32-bit MCU (4-byte pointers) and needs Elf32_Ehdr:

pub const Ehdr = switch (@sizeOf(usize)) {
    4 => Elf32_Ehdr,
    8 => Elf64_Ehdr,
    else => @compileError("expected pointer size of 32 or 64"),
};

This is almost certainly the source of my panic.
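One way to confirm the mismatch without going through std.debug is to read the ELF identification bytes directly: the byte at offset 4 (EI_CLASS) is 1 for a 32-bit ELF and 2 for a 64-bit one. Below is a minimal sketch, assuming a POSIX `od` is available; `elf_class` is a hypothetical helper name, not part of any existing tool.

```shell
#!/bin/sh
# elf_class FILE: print 32 or 64 based on the EI_CLASS byte of an ELF file.
# (Hypothetical helper for illustration.)
elf_class() {
    # Bytes 0..3 must be the ELF magic "\x7fELF"; dump them as hex.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || { echo "not an ELF file" >&2; return 1; }
    # EI_CLASS is the single byte at offset 4: 1 = ELFCLASS32, 2 = ELFCLASS64.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo "unknown ELF class: $class" >&2; return 1 ;;
    esac
}
```

Running `elf_class zig-out/firmware/some-firmware.elf` should report 32 for firmware built for a 32-bit MCU. A host tool would need to branch on this byte rather than on its own `@sizeOf(usize)`.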

So, long-winded way of asking: am I going to have to write my own ELF debug info examiner, or is there a different way to leverage std.debug that I’m not thinking of? This kind of “after the fact” stack trace analysis could be super useful for constrained embedded targets where you don’t have the flash space to load lots of debug info into the binary itself.

haydenridd · December 20, 2024, 5:22am

After more poking around, std/debug/Dwarf.zig provides some more illumination with this comment:

https://github.com/ziglang/zig/blob/0ff0bdb4a71d8fd055272abfdadea2f23f99574a/lib/std/debug/Dwarf.zig#L3

          //! Implements parsing, decoding, and caching of DWARF information.
          //!
          //! This API does not assume the current executable is itself the thing being
          //! debugged, however, it does assume the debug info has the same CPU
          //! architecture and OS as the current executable. It is planned to remove this
          //! limitation.
          //!
          //! For unopinionated types and bits, see `std.dwarf`.
          
          const builtin = @import("builtin");
          const native_endian = builtin.cpu.arch.endian();
          
          const std = @import("../std.zig");

So it does look like I’ll need to write at least some of the debug info parsing myself for the time being. However, I would love to hear if anyone else has gone down this road before!

dimdin · December 20, 2024, 10:42am

llvm-symbolizer example:

❯ cat test.zig
const std = @import("std");

pub fn main() void {
    foo();
}

fn foo() void {
    @trap();
}

❯ ./test
Illegal instruction at address 0x1037044
/home/din/test.zig:8:5: 0x1037044 in foo (test)
    @trap();
    ^
/home/din/test.zig:4:8: 0x1034fd8 in main (test)
    foo();
       ^
/home/din/zig/0.13.0/lib/std/start.zig:514:22: 0x1034889 in posixCallMainAndExit (test)
            root.main();
                     ^
/home/din/zig/0.13.0/lib/std/start.zig:266:5: 0x10343f1 in _start (test)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)

❯ llvm-symbolizer -pa --obj=./test 0x1037044 0x1034fd8 0x1034889 0x10343f1
0x1037044: test.foo at /home/din/test.zig:8:5

0x1034fd8: test.main at /home/din/test.zig:4:8

0x1034889: start.callMain at /home/din/zig/0.13.0/lib/std/start.zig:514:22
 (inlined by) start.callMainWithArgs at /home/din/zig/0.13.0/lib/std/start.zig:482:20
 (inlined by) start.posixCallMainAndExit at /home/din/zig/0.13.0/lib/std/start.zig:438:36

0x10343f1: _start at /home/din/zig/0.13.0/lib/std/start.zig:266:5
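Applied to the MicroZig panic log from the first post, the same approach could be sketched as follows. This is only a sketch: `extract_addrs` and `symbolize_log` are hypothetical helper names, the log format is assumed to match the `error:   N: 0x...` lines shown earlier, and the `-pa` and `--obj` flags are the ones used in the command above.

```shell
#!/bin/sh
# extract_addrs: read a panic log on stdin, print one frame address per line.
# (Hypothetical helper; assumes lines shaped like "error:   0: 0x10010277".)
extract_addrs() {
    # Keep only the hex address from numbered stack-frame lines;
    # every other log line (panic message, info lines) is dropped.
    sed -n 's/^error:[[:space:]]*[0-9][0-9]*:[[:space:]]*\(0x[0-9A-Fa-f][0-9A-Fa-f]*\)[[:space:]]*$/\1/p'
}

# symbolize_log ELF: resolve every address found in the log on stdin
# against the given ELF file (must be the exact build running on target).
symbolize_log() {
    # Word splitting of the address list is intentional here.
    # shellcheck disable=SC2046
    llvm-symbolizer -pa --obj="$1" $(extract_addrs)
}
```

With the log saved to a file (here a hypothetical `panic.log`), usage would look like `symbolize_log zig-out/firmware/some-firmware.elf < panic.log`.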