Just wanted to let you know that I recently found the time to work on this bad idea, and I made some progress.
I can now do this:
// main.zig
const std = @import("std");
const dynamic_library_loader = @import("dynamic_library_loader.zig");
pub fn main() !void {
    var gpa: std.heap.DebugAllocator(.{ .stack_trace_frames = 10 }) = .init;
    const allocator = gpa.allocator();
    defer if (gpa.deinit() != .ok) @panic("Memory check failed");
    try dynamic_library_loader.init(.{ .debug = false });
    defer dynamic_library_loader.deinit(allocator);
    const lib = try dynamic_library_loader.load(allocator, "libc.so.6");
    const printf_sym = try lib.getSymbol("printf");
    const printf_addr = printf_sym.addr;
    const printf: *const fn ([*:0]const u8, ...) callconv(.c) c_int = @ptrFromInt(printf_addr);
    _ = printf("Hello, World!\n");
}
$ zig run src/main.zig
Hello, World!
The famous libc printf, called from a static executable that is not linked against libc
It should be noted that no tricks are used, like the ones in detour and similar projects. This is a real dynamic loader (with lots of TODOs).
This is currently limited to Linux x86-64, and it has only been tested on my machine.
I can even do this, thanks to the customizable SelfInfo system:
// main.zig
const std = @import("std");
const dynamic_library_loader = @import("dynamic_library_loader.zig");
pub const debug = struct {
    pub const SelfInfo = dynamic_library_loader.CustomSelfInfo;
};
pub fn main() !void {
    var gpa: std.heap.DebugAllocator(.{ .stack_trace_frames = 10 }) = .init;
    const allocator = gpa.allocator();
    defer if (gpa.deinit() != .ok) @panic("Memory check failed");
    try dynamic_library_loader.init(.{ .debug = false });
    defer dynamic_library_loader.deinit(allocator);
    const lib = try dynamic_library_loader.load(allocator, "libc.so.6");
    const sprintf_sym = try lib.getSymbol("sprintf");
    const sprintf_addr = sprintf_sym.addr;
    const sprintf: *const fn ([*c]u8, [*c]const u8, ...) callconv(.c) c_int = @ptrFromInt(sprintf_addr);
    var buf: [128:0]u8 = undefined;
    // trigger a segfault to test stack traces
    _ = sprintf(&buf, @ptrFromInt(0x8));
}
$ zig run src/main.zig
Segmentation fault at address 0x8
../sysdeps/x86_64/multiarch/strchr-sse2.S:41:0: 0x7fb48767cf23 in __strchrnul_sse2 (../sysdeps/x86_64/multiarch/strchr-sse2.S)
./stdio-common/printf-parse.h:82:34: 0x7fb48762c669 in __find_specmb (vfprintf-internal.c)
./libio/iovsprintf.c:62:3: 0x7fb48764b208 in __vsprintf_internal (iovsprintf.c)
./stdio-common/sprintf.c:30:10: 0x7fb487629700 in __sprintf (sprintf.c)
/home/tibbo/Dev/Project/DynLoader/src/main.zig:26:16: 0x11a1e4f in main (main.zig)
    _ = sprintf(&buf, @ptrFromInt(0x8));
               ^
/home/tibbo/Builds/Zig/zig-x86_64-linux-0.16.0-dev.1225+bf9082518/lib/std/start.zig:696:37: 0x11a2523 in callMain (std.zig)
            const result = root.main() catch |err| {
                                    ^
Unwind error at address `/proc/self/exe:0x11a2523` (unwind info invalid), remaining frames may be incorrect
/home/tibbo/Builds/Zig/zig-x86_64-linux-0.16.0-dev.1225+bf9082518/lib/std/start.zig:237:5: 0x11889a1 in _start (std.zig)
    asm volatile (switch (native_arch) {
    ^
[1] 15724 IOT instruction zig run src/main.zig
An almost perfect stack trace from a segfault in a dynamically loaded library! There is still an unwind error that I need to debug, but the ability to get a stack trace was extremely useful.
I consider the prototyping almost done and will soon proceed to a complete rewrite, as I now have a much clearer view of the subject. I just want to open a window and display something using X11 and EGL before that.
If you are curious, you can find the current dynamic_library_loader.zig file here. For the prototyping phase I decided to put everything into this one file.
How does it work?
Here is a breakdown of the main steps:
- Map the .so library ELF file
- Parse the ELF file
- Collect every symbol, with versions, visibility, binding, etc.
- Collect PT_LOAD segments
- Read PT_GNU_RELRO segments to collect the final permissions
- Collect the PT_TLS segment to handle thread-local storage later
- Read the PT_DYNAMIC segment
- Collect dependencies from DT_NEEDED
- Collect relocations
- Even the RELR ones, for which documentation is very sparse even though they are used by every dynamic library on my system…
- Collect DT_INIT, DT_INIT_ARRAY, DT_FINI, DT_FINI_ARRAY
- Map PT_LOAD segments, trying to handle the file offset vs. memory offset mess, and honor permissions
- Repeat the previous steps with all discovered dependencies if needed
- For each newly loaded library, handle TLS
- Set the new TLS area. The initial TLS area is set by the zig startup process before main. It must be resized, adding new TLS blocks for each library that needs it, so we gather info about the current one using std.os.tls.area_desc. But there is a problem: in addition to extending the area to lower addresses to place the new TLS block, we also extend it toward higher addresses to give room for the libc pthread structure. More on this after.
- Set the new thread pointer (the fs register), and set the first word (as required by the ABI) and the third word (to respect the pthread struct layout) after TP to the TP address.
- Save the offset of the new TLS block for later.
- Process normal relocations
- For TPOFF64 relocations, we use the TLS offset saved previously.
- For the JUMP_SLOT ones, we check before applying them if the symbol is a function that should be handled by ourselves, like dlopen, because we are the dynamic loader now. A function that is part of the public API of libc’s ld should be implemented by us.
- Process IRELATIVE relocations, in a separate pass because those resolver functions can depend on other relocations.
- Update permissions to their final state now that relocations have been applied.
- Call the function from DT_INIT and the functions from DT_INIT_ARRAY, taking care of the fact that the first init function of libc expects argc, argv, and the environment. A fact that I discovered only by reading glibc source code…
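Since RELR documentation is sparse, here is a minimal sketch of how the encoding can be decoded, following the scheme glibc uses: an even entry is the address of one word to relocate, and an odd entry is a bitmap covering the 63 words that follow the last position. The names applyRelr, image and load_bias are made up for this example and are not taken from dynamic_library_loader.zig. The mapped library is modeled as a slice of 64-bit words starting at virtual address 0, so the function can be tested without mapping anything.

```zig
const std = @import("std");

/// Applies RELR relocations to `image`, a word-addressed view of a library
/// whose first word sits at virtual address 0 (an assumption of this sketch).
/// RELR only encodes relative relocations, so every selected word simply
/// gets `load_bias` added to it.
fn applyRelr(image: []u64, relr: []const u64, load_bias: u64) void {
    var next: usize = 0; // word index where the next bitmap starts
    for (relr) |entry| {
        if ((entry & 1) == 0) {
            // Even entry: byte offset of a single word to relocate.
            const index: usize = @intCast(entry / @sizeOf(u64));
            image[index] +%= load_bias;
            next = index + 1;
        } else {
            // Odd entry: after dropping the tag bit, bit i selects word `next + i`.
            var bits = entry >> 1;
            var i: usize = 0;
            while (bits != 0) {
                if ((bits & 1) != 0) image[next + i] +%= load_bias;
                bits >>= 1;
                i += 1;
            }
            // A bitmap always covers 63 words, whether or not their bits are set.
            next += 63;
        }
    }
}

test "address entry followed by a bitmap entry" {
    var image = [_]u64{0x10} ** 8;
    // 0x8 relocates word 1; 0b1011 is a bitmap selecting words 2 and 4.
    const relr = [_]u64{ 0x8, 0b1011 };
    applyRelr(&image, &relr, 0x1000);
    const expected = [_]u64{ 0x10, 0x1010, 0x1010, 0x10, 0x1010, 0x10, 0x10, 0x10 };
    try std.testing.expectEqualSlices(u64, &expected, &image);
}
```

In a real loader the words would be written through pointers into the mapped segments rather than a slice, and this has to happen while the pages are still writable, before the final permissions are applied.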
About TLS
When Zig initializes the TLS area for the current static executable, it uses this layout on Linux x86-64:
-----------------------------------------------
| TLS Blocks | ABI TCB | Zig TCB | DTV struct |
-------------^---------------------------------
             `-- The TP register points here.
A more convenient layout, when loading libc, would be something like this:
                       | POTENTIAL PTHREAD STRUCT ====>
---------------------------------------------------------
| TLS Blocks | Zig TCB | ABI TCB | *DTV | *SELF | SPACE
-----------------------^---------------------------------
                       `-- The TP register points here.
just in case alexrp sees this
When mapping a TLS block from a loaded library, I compute the size of the new area using the _thread_db_sizeof_pthread symbol, just in case (I get rid of the Zig TCB struct for now). I feel so lucky that std.os.tls.area_desc is a pub var. And it seems I will eventually have a good reason to implement __tls_get_addr.
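To make the offset bookkeeping concrete, here is a hedged sketch of the arithmetic on x86-64, where TLS blocks sit below TP and TPOFF64 values are therefore negative. The helpers alignUp, nextTlsOffset, tpoff64 and writeSelfPointers are hypothetical names invented for this example, and the sketch assumes each PT_TLS segment's p_vaddr is already aligned to its p_align. The real loader may well compute things differently.

```zig
const std = @import("std");

/// Rounds `value` up to a multiple of `alignment` (a power of two; 0 and 1 mean no alignment).
fn alignUp(value: u64, alignment: u64) u64 {
    if (alignment <= 1) return value;
    return (value + alignment - 1) & ~(alignment - 1);
}

/// Distance from TP down to the start of a new TLS block, given the distance
/// `used` already taken by the blocks placed before it.
fn nextTlsOffset(used: u64, memsz: u64, alignment: u64) u64 {
    return alignUp(used + memsz, alignment);
}

/// Value written for an R_X86_64_TPOFF64 relocation: the symbol's position relative to TP.
fn tpoff64(sym_value: u64, addend: i64, block_offset: u64) i64 {
    const sym: i64 = @intCast(sym_value);
    const off: i64 = @intCast(block_offset);
    return sym + addend - off;
}

/// Stores TP in the first and third words located at TP.
fn writeSelfPointers(tcb: []u64, tp: u64) void {
    tcb[0] = tp; // self pointer required by the x86-64 TLS ABI
    tcb[2] = tp; // `self` slot in the header of glibc's pthread struct
}

test "TLS offsets grow downward from TP" {
    // First library: 0x28 bytes of TLS, 16-byte aligned.
    const first = nextTlsOffset(0, 0x28, 0x10);
    try std.testing.expectEqual(@as(u64, 0x30), first);
    // Second library: 4 bytes of TLS, 4-byte aligned, placed below the first.
    const second = nextTlsOffset(first, 0x4, 0x4);
    try std.testing.expectEqual(@as(u64, 0x34), second);
    // A symbol 8 bytes into the first block lives at TP - 0x28.
    try std.testing.expectEqual(@as(i64, -0x28), tpoff64(0x8, 0, first));
}
```

The offset returned by nextTlsOffset is the value worth saving per library, since every TPOFF64 relocation against that library's TLS symbols needs it.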
About custom SelfInfo
Unfortunately, it seems there is no convenient way to add modules to be parsed when unwinding using the default implementation of SelfInfo (lib/std/debug/SelfInfo/Elf.zig).
just in case mlugg sees this
But I have to say, making SelfInfo customizable is pure genius. I just copied the default implementation, added an extra_phdr_infos array, and adapted the findModule logic to use it. With #25668 merged, the unwinding process is able to print a full stack trace! Again, it was immensely useful.
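For readers wondering what the adapted lookup amounts to, here is a rough sketch of the idea only: given a list of runtime-loaded libraries, find the one whose loaded segments contain a faulting address. ExtraModule, Segment and findExtraModule are hypothetical names for illustration. The actual code builds on a copy of lib/std/debug/SelfInfo/Elf.zig and its extra_phdr_infos array, whose types are not reproduced here.

```zig
const std = @import("std");

/// Address range of one PT_LOAD segment, before the load bias is applied.
const Segment = struct { vaddr: usize, memsz: usize };

/// Hypothetical record kept for each library loaded at runtime.
const ExtraModule = struct {
    name: []const u8,
    load_bias: usize,
    segments: []const Segment,
};

/// Returns the module whose loaded segments contain `address`, or null.
fn findExtraModule(modules: []const ExtraModule, address: usize) ?*const ExtraModule {
    for (modules) |*module| {
        for (module.segments) |seg| {
            const start = module.load_bias + seg.vaddr;
            if (address >= start and address < start + seg.memsz) return module;
        }
    }
    return null;
}

test "look up a module by address" {
    const libc_segments = [_]Segment{
        .{ .vaddr = 0x0, .memsz = 0x1000 },
        .{ .vaddr = 0x2000, .memsz = 0x500 },
    };
    const modules = [_]ExtraModule{
        .{ .name = "libc.so.6", .load_bias = 0x10000, .segments = &libc_segments },
    };
    // 0x12010 falls inside the second segment (0x12000..0x12500).
    const hit = findExtraModule(&modules, 0x12010).?;
    try std.testing.expectEqualStrings("libc.so.6", hit.name);
    // 0x11800 lies in the gap between the two segments.
    try std.testing.expect(findExtraModule(&modules, 0x11800) == null);
}
```

Once the module is known, the unwinder can subtract the load bias and look the address up in that library's debug and unwind information.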
Just a final question: should I continue to post updates here? (I don’t know whether this counts as necroposting.)
And apologies if my English sounds a bit off.