Logging a stack trace on bare metal

debrisapron · April 28, 2024, 8:41pm

Hi lovely Zig community! This is about an embedded (freestanding) project I’m writing with Zig, where I have been wrestling with a persistent problem of trying to locate the source of panics. I have a custom panic handler that looks like this:

pub fn panic(msg: []const u8, _: ?*std.builtin.StackTrace, _: ?usize) noreturn {
    hal.ardPrint("!!!PANIC!!!\n");
    // msg is a slice; passing msg.ptr assumes the text is NUL-terminated.
    hal.ardPrint(msg.ptr);
    hal.ardPrint("\nNow my watch is ended.\n");
    hal.ardFlush();
    while (true) {}
}

Here hal.ardPrint and hal.ardFlush just call the corresponding Arduino commands to print to the serial output. This works great, but all it gives me is the error message, e.g. “integer overflow”, which on its own is not super helpful. What I want is to print out the stack trace line by line, but I am absolutely stuck. I have tried copying large chunks of std.debug and hacking them to do what I want, but ultimately I cannot figure out how to do this in the absence of stdout or stderr. I read this article but quickly found myself totally out of my depth with all this DWARF stuff. Can anybody help out?

dimdin · April 28, 2024, 9:15pm

The last panic parameter, ret_addr: ?usize, is the return address of the code that panicked.
From that address and the symbols addresses from elf you can find the function that panics.
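A minimal sketch of getting that address out over serial so it can be looked up later. The helper name formatAddr is made up for illustration, and the usage comment assumes hal.ardPrint accepts a NUL-terminated C string, which the thread does not confirm.

```zig
const std = @import("std");

/// Formats `addr` as "0x..." into `buf` and returns the NUL-terminated text.
/// Falls back to "0x?" if `buf` is too small. (Hypothetical helper.)
pub fn formatAddr(buf: []u8, addr: usize) [:0]const u8 {
    const text: [:0]const u8 = std.fmt.bufPrintZ(buf, "0x{x}", .{addr}) catch return "0x?";
    return text;
}

// Possible use inside the panic handler, with the third parameter named
// `ret_addr` instead of `_` (hal.ardPrint's signature is an assumption):
//
//     var buf: [2 + 2 * @sizeOf(usize) + 1]u8 = undefined;
//     if (ret_addr) |addr| hal.ardPrint(formatAddr(&buf, addr).ptr);
```

std.fmt.bufPrintZ only writes into the caller's buffer, so it does not need stdout, stderr or an allocator.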

debrisapron · April 28, 2024, 9:45pm

Thanks for the response dimdin. Yeah I sort of figured out that the ret_addr was the important thing by looking at the source, but that’s as far as I’ve got. I actually found this function which I think is what I need to reproduce, but it relies on this DebugInfo thing which is a total mystery to me.

Also I have to be honest, I understand about 10% of this sentence:

From that address and the symbols addresses from elf you can find the function that panics.

What is elf, what are the symbol addresses, how can I use these things to find the function and how would I print it out?

dimdin · April 28, 2024, 10:32pm

ELF is an executable file format; it is what the linker outputs.
Normally you generate a .bin or .hex file from that ELF file.
There are tools that can read ELF files and print the addresses for each symbol (such as nm, readelf, objdump).

Using objdump you can disassemble the ELF file (objdump -d zig-out/bin/name).
The output looks like:

1039126:       48 8b bd 60 ff ff ff    mov    -0xa0(%rbp),%rdi
103912d:       48 8b b5 48 ff ff ff    mov    -0xb8(%rbp),%rsi
1039134:       e8 c7 f0 ff ff          call   1038200 <builtin_panicOutOfBounds__316>
1039139:       31 c0                   xor    %eax,%eax

If the return address passed to panic is 0x1039139, you know that the previous instruction (the call at 0x1039134) is the one that called panic.

dimdin · April 28, 2024, 10:46pm

dumpStackTrace is used in the default panic handler; it calls getSelfDebugInfo to retrieve DebugInfo and writeStackTrace to send the stack trace with the debug info to a stream.

debrisapron · April 28, 2024, 11:16pm

OK this is great info, so I can look up the ret_addr in this dump, find the call and then presumably iterate to find the call before that and so on. This definitely seems doable but also quite complicated! I guess this is a hard problem because all the panic handling code is designed assuming a standard POSIX (or Windows) environment?

dimdin · April 29, 2024, 9:35pm

Yes, the code for handling and formatting panic is big.
StackTrace is an array with addresses, you can handle them as return addresses.
getSelfDebugInfo and getSymbolAtAddress do the magic to find the symbol, file and line number from the debugging info of the ELF file.
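A rough sketch of treating a StackTrace as a list of return addresses, without any of the debug-info machinery. The helper name capturedAddresses is made up for illustration. It relies on the index and instruction_addresses fields of std.builtin.StackTrace as they existed around the time of this thread. Note that the trace passed to panic is optional and may well be null.

```zig
const std = @import("std");

/// Returns the addresses actually recorded in `trace`.
/// `index` counts every frame seen and can exceed the buffer length,
/// so it is clamped to the number of slots available.
pub fn capturedAddresses(trace: *const std.builtin.StackTrace) []const usize {
    const n = @min(trace.index, trace.instruction_addresses.len);
    return trace.instruction_addresses[0..n];
}

// Possible use inside a panic handler (the printing is left out here):
//
//     if (maybe_trace) |trace| {
//         for (capturedAddresses(trace)) |ret_addr| {
//             // ret_addr - 1 falls inside the call instruction itself,
//             // which is the address to look up in the objdump output.
//             _ = ret_addr - 1;
//         }
//     }
```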

tracy · May 2, 2024, 4:15pm

Unfortunately the stdlib code for loading symbols from DWARF sections quickly gets into areas that require more of an operating system than you might want at this point.

For freestanding, you might get some use out of kubkon/zig-dwarfdump on GitHub (a dwarfdump utility, but in Zig). It has some code that can go from a slice of bytes into parsed DWARF info. It’s up to you to provide the bytes of the DWARF sections.

If your kernel is an ELF file then you can probably read your own binary to get those sections. If you’re using a binary image, then those sections are stripped out when you do the objcopy. You’ll need some other way to convey the bytes of the DWARF sections to your runtime.

I opted for a build-time tool that would extract a pared-down table of function addresses & names, then append that to the image after the objcopy. Then there’s some runtime code to look up an address in that table, returning the name of the function at that address.
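The runtime half of that idea could look roughly like the sketch below. This is not the actual tool described above. The Symbol layout and the lookup function are assumptions for illustration, and the table is assumed to be sorted by ascending start address.

```zig
const std = @import("std");

/// One entry of a hypothetical build-time symbol table.
pub const Symbol = struct {
    addr: usize, // start address of the function
    name: []const u8,
};

/// Returns the name of the last function starting at or before `addr`,
/// or null if `addr` lies below the first entry.
/// Assumes `table` is sorted by ascending `addr`.
pub fn lookup(table: []const Symbol, addr: usize) ?[]const u8 {
    var lo: usize = 0;
    var hi: usize = table.len;
    // Binary search for the first entry whose start address is > addr.
    while (lo < hi) {
        const mid = lo + (hi - lo) / 2;
        if (table[mid].addr <= addr) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo == 0) return null;
    return table[lo - 1].name;
}
```

Because the table stores only start addresses, an address past the end of the last function still resolves to that function. Storing sizes as well would let the lookup reject such addresses.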

0xfadead · November 12, 2024, 3:30am

Andrew Kelley has his own blog post about this:
https://andrewkelley.me/post/zig-stack-traces-kernel-panic-bare-bones-os.html