Loading `libvulkan.so.1` on Linux with `std.ElfDynLib`

At the moment, my Linux application framework seizer uses software rendering to achieve its goal of static linking. Long term, I’m not really satisfied with that as a solution. GPUs are amazingly powerful, and recreating everything they do on the CPU side is just not feasible. The stumbling block for seizer is that:

  1. Graphics libraries require dynamic linking or dlopen
  2. Linux has no system libc, so we must bring our own dlopen to access graphics libraries
  3. Graphics libraries use functions from glibc or musl
  4. glibc and musl don’t support this use case; they only support being used with their own dynamic linker.

(See also this post in my topic for shimizu)

Currently, I have two ideas for getting around this problem:

  1. Work with the various upstreams (mesa, glibc, musl, etc.) to support being loaded by third-party dlopen implementations.
  2. Write my own GPU library. (Been meaning to watch Raw dogging linux graphics (DRM) - YouTube)

For this topic, I am focused on idea 1.


Before filing any issues, I want to make sure that I have something concrete to go off of. I’ve created a repo here with a small program that attempts to load libvulkan.so.1.

My first attempt looked something like this:

pub fn main() !void {
    var vulkan_dynlib = try std.DynLib.open("libvulkan.so.1");
    defer vulkan_dynlib.close();

    const vkGetInstanceProcAddr = vulkan_dynlib.lookup(vk.PfnGetInstanceProcAddr, "vkGetInstanceProcAddr") orelse return error.vkGetInstanceProcAddrNotFound;

    const vkb: VulkanBaseDispatch = try VulkanBaseDispatch.load(vkGetInstanceProcAddr);
    std.log.debug("vkCreateInstance = {}", .{vkb});
}

const VulkanBaseDispatch = vk.BaseWrapper(vulkan_apis);

const vk = @import("vulkan");
const vulkan_apis: []const vk.ApiInfo = &.{
    vk.features.version_1_0,
};

const dl_lib = @import("dlopen-mesa-with-zig_lib");
const std = @import("std");

Running the program gives us this output, and we are given a stark reminder of the uphill battle we may be facing:

error: ElfHashTableNotFound
/home/geemili/code/zig/build-master/stage3/lib/zig/std/dynamic_library.zig:346:45: 0x103e340 in open (dlopen-mesa-with-zig)
            .hashtab = maybe_hashtab orelse return error.ElfHashTableNotFound,
                                            ^
/home/geemili/code/zig/build-master/stage3/lib/zig/std/dynamic_library.zig:32:28: 0x103b126 in open (dlopen-mesa-with-zig)
        return .{ .inner = try InnerType.open(path) };
                           ^
/home/geemili/code/dlopen-mesa-with-zig/src/main.zig:2:25: 0x103af78 in main (dlopen-mesa-with-zig)
    var vulkan_dynlib = try std.DynLib.open("libvulkan.so.1");
                        ^

My system libvulkan.so.1 contains no DT_HASH entry, only a DT_GNU_HASH entry, and std.DynLib only knows how to look symbols up through DT_HASH. Glibc recently disabled DT_HASH by default. While the technical merits can be debated, what this unquestionably represents is a disregard for backwards compatibility. Issues similar to this are why seizer only supports a built-in software renderer at the moment. But I digress.
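As a rough illustration of the check that fails here, the sketch below walks a dynamic section and records which hash tables are present. The tag values are the standard ELF ones; the `Dyn` struct, `HashTables`, and `findHashTables` are hypothetical names used for illustration, not part of Zig’s standard library.

```zig
const std = @import("std");

// Dynamic tag values from the ELF gABI (DT_HASH) and the GNU extension (DT_GNU_HASH).
const DT_NULL: i64 = 0;
const DT_HASH: i64 = 4;
const DT_GNU_HASH: i64 = 0x6ffffef5;

/// Hypothetical stand-in for an Elf64_Dyn entry: a tag plus a value or address.
const Dyn = struct { tag: i64, val: u64 };

/// Which symbol hash tables the library advertises (null means absent).
const HashTables = struct { sysv: ?u64 = null, gnu: ?u64 = null };

/// Walk the dynamic entries until DT_NULL and remember the hash table addresses.
fn findHashTables(dynamic: []const Dyn) HashTables {
    var found: HashTables = .{};
    for (dynamic) |entry| {
        switch (entry.tag) {
            DT_NULL => break,
            DT_HASH => found.sysv = entry.val,
            DT_GNU_HASH => found.gnu = entry.val,
            else => {},
        }
    }
    return found;
}
```

A loader that only handles `sysv` has nothing to work with when that field comes back null, which matches the ElfHashTableNotFound error above.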

To solve Zig’s standard library not supporting DT_GNU_HASH, I copied ElfDynLib from Zig’s standard library and added support for looking up symbols using DT_GNU_HASH. I used “ELF: better symbol lookup via DT_GNU_HASH” by flapenguin.me for reference. (UPDATE: made a pull request adding DT_GNU_HASH support to Zig’s standard library)
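For readers unfamiliar with DT_GNU_HASH, here is a minimal sketch of its two cheapest pieces: the hash function and the bloom filter pre-check for 64-bit ELF. This is not the code from my copy of ElfDynLib or from the pull request; the function names and the way the bloom words and shift are passed in are assumptions for illustration.

```zig
const std = @import("std");

/// GNU symbol hash: start at 5381, then h = h * 33 + c with 32-bit wraparound.
fn gnuHash(name: []const u8) u32 {
    var h: u32 = 5381;
    for (name) |c| h = h *% 33 +% c;
    return h;
}

/// Bloom filter pre-check for 64-bit ELF. `bloom` is the array of bloom words
/// and `bloom_shift` the shift stored in the DT_GNU_HASH header (both assumed
/// to have been read already; `bloom` must not be empty). Returns false only
/// if the symbol is definitely absent.
fn bloomMayContain(bloom: []const u64, bloom_shift: u5, h: u32) bool {
    const word = bloom[(h / 64) % bloom.len];
    const bit1 = @as(u64, 1) << @as(u6, @truncate(h));
    const bit2 = @as(u64, 1) << @as(u6, @truncate(h >> bloom_shift));
    const mask = bit1 | bit2;
    return (word & mask) == mask;
}
```

Only when the bloom filter says "maybe" does the lookup go on to walk the bucket and chain arrays and compare names.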

Having shaved that yak, I replaced std.DynLib with our copy of ElfDynLib, and ran it again. This time we get:

~/code/dlopen-mesa-with-zig> zig build run
debug: bits found in bloom filter, symbol vkGetInstanceProcAddr may exist
debug: found hash match for 0x53f1e7aa
debug: found symbol "vkGetInstanceProcAddr"
Segmentation fault at address 0x0
???:?:?: 0x0 in ??? (???)

Running it with lldb, we can see the segmentation fault is a jump to address 0, made from inside vkGetInstanceProcAddr:

~/code/dlopen-mesa-with-zig> lldb ./zig-out/bin/dlopen-mesa-with-zig
(lldb) target create "./zig-out/bin/dlopen-mesa-with-zig"
Current executable set to '/home/geemili/code/dlopen-mesa-with-zig/zig-out/bin/dlopen-mesa-with-zig' (x86_64).
(lldb) run
Process 3009743 launched: '/home/geemili/code/dlopen-mesa-with-zig/zig-out/bin/dlopen-mesa-with-zig' (x86_64)
debug: bits found in bloom filter, symbol vkGetInstanceProcAddr may exist
debug: found hash match for 0x53f1e7aa
debug: found symbol "vkGetInstanceProcAddr"
debug: vkGetInstanceProcAddr = fn (vk.Instance, [*:0]const u8) callconv(.c) ?*const fn () callconv(.c) void@7ffff7f0e4e0
Process 3009743 stopped
* thread #1, name = 'dlopen-mesa-wit', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x0)
    frame #0: 0x0000000000000000
error: memory read failed for 0x0
(lldb) bt
* thread #1, name = 'dlopen-mesa-wit', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x00007ffff7f0e51b
    frame #2: 0x000000000103d7d3 dlopen-mesa-with-zig`vk.BaseWrapper(loader=0x00007ffff7f0e4e0).load__anon_4774 at vk.zig:27075:27
    frame #3: 0x000000000103be42 dlopen-mesa-with-zig`main.main at main.zig:8:64
    frame #4: 0x000000000103bc9b dlopen-mesa-with-zig`start.posixCallMainAndExit [inlined] start.callMain at start.zig:656:37
    frame #5: 0x000000000103bc81 dlopen-mesa-with-zig`start.posixCallMainAndExit [inlined] start.callMainWithArgs at start.zig:616:20
    frame #6: 0x000000000103bc08 dlopen-mesa-with-zig`start.posixCallMainAndExit(argc_argv_ptr=0x00007fffffffe3e0) at start.zig:571:36
    frame #7: 0x000000000103b84e dlopen-mesa-with-zig`start._start at start.zig:271:5
(lldb) fr sel 1
frame #1: 0x00007ffff7f0e51b
->  0x7ffff7f0e51b: testl  %eax, %eax
    0x7ffff7f0e51d: jne    0x7ffff7f0e550
    0x7ffff7f0e51f: movq   0x603e2(%rip), %rbx
    0x7ffff7f0e526: movq   -0x38(%rbp), %rax
(lldb) disassemble -s 0x7ffff7f0e4e0 -e 0x7ffff7f0e51b+16
    0x7ffff7f0e4e0: endbr64 
    0x7ffff7f0e4e4: pushq  %rbp
    0x7ffff7f0e4e5: movq   %rsp, %rbp
    0x7ffff7f0e4e8: pushq  %r15
    0x7ffff7f0e4ea: pushq  %r14
    0x7ffff7f0e4ec: pushq  %r13
    0x7ffff7f0e4ee: leaq   0x3352c(%rip), %r13
    0x7ffff7f0e4f5: pushq  %r12
    0x7ffff7f0e4f7: movq   %rdi, %r12
    0x7ffff7f0e4fa: pushq  %rbx
    0x7ffff7f0e4fb: subq   $0x18, %rsp
    0x7ffff7f0e4ff: movq   %fs:0x28, %rbx
    0x7ffff7f0e508: movq   %rbx, -0x38(%rbp)
    0x7ffff7f0e50c: movq   %rsi, %rbx
    0x7ffff7f0e50f: movq   %r13, %rsi
    0x7ffff7f0e512: movq   %rbx, %rdi
    0x7ffff7f0e515: callq  *0x6055d(%rip)
->  0x7ffff7f0e51b: testl  %eax, %eax
    0x7ffff7f0e51d: jne    0x7ffff7f0e550
    0x7ffff7f0e51f: movq   0x603e2(%rip), %rbx
    0x7ffff7f0e526: movq   -0x38(%rbp), %rax
    0x7ffff7f0e52a: fs     
(lldb) 

vkGetInstanceProcAddr is loaded at 0x7ffff7f0e4e0, and the return address in frame #1 of the backtrace is 0x7ffff7f0e51b.

Opening libvulkan.so.1 with rizin, we can seek to sym.vkGetInstanceProcAddr at 0x000284e0:

~/code/dlopen-mesa-with-zig> rizin /usr/lib/libvulkan.so.1
ERROR: Cannot determine entrypoint, using 0x00007040.
 -- The unix-like reverse engineering framework.
[0x00007040]> aaa
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls
[x] Analyze len bytes of instructions for references
[x] Check for classes
[x] Analyze local variables and arguments
[x] Type matching analysis for all functions
[x] Applied 0 FLIRT signatures via sigdb
[x] Propagate noreturn information
[x] Integrate dwarf function information.
[x] Resolve pointers to data sections
[x] Use -AA or aaaa to perform additional experimental analysis.
[0x00007040]> s sym.vkGetInstanceProcAddr
[0x000284e0]> 

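Doing a bit of math, we can map the return address back to an address in the file:

  • 0x000284e0 + (0x7ffff7f0e51b - 0x7ffff7f0e4e0)
  • 0x000284e0 + 59
  • 0x0002851b

That is the instruction right after the call, so the call we want to see is the one just before it, at 0x00028515.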
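The same arithmetic as a small helper, in case it is useful when chasing other addresses from a backtrace. The function name is made up for this example; it only assumes the symbol’s runtime address and its address in the file are both known.

```zig
const std = @import("std");

/// Translate a runtime address inside a loaded function back to the address
/// the disassembler shows for the file, given where the same symbol lives at
/// runtime and in the file.
fn runtimeToFileAddr(runtime_addr: u64, runtime_sym: u64, file_sym: u64) u64 {
    return file_sym + (runtime_addr - runtime_sym);
}
```

With the values from this session, `runtimeToFileAddr(0x7ffff7f0e51b, 0x7ffff7f0e4e0, 0x284e0)` gives 0x2851b, the return address. The 6-byte indirect call that pushed it starts at 0x28515.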
[0x000284e0]> pd 20
            ;-- vkGetInstanceProcAddr:
┌ sym.vkGetInstanceProcAddr(int64_t arg1, const char *s1);
│           ; arg int64_t arg1 @ rdi
│           ; arg const char *s1 @ rsi
│           ; var int var_48h @ stack - 0x48
│           ; var int64_t var_40h @ stack - 0x40
│           0x000284e0      endbr64                                    ; RELOC TARGET 64 vkGetInstanceProcAddr @ 0x000284e0
│           0x000284e4      push  rbp
│           0x000284e5      mov   rbp, rsp
│           0x000284e8      push  r15
│           0x000284ea      push  r14
│           0x000284ec      push  r13
│           0x000284ee      lea   r13, [str.vkGetInstanceProcAddr]     ; 0x5ba21 ; "vkGetInstanceProcAddr"
│           0x000284f5      push  r12
│           0x000284f7      mov   r12, rdi                             ; arg1
│           0x000284fa      push  rbx
│           0x000284fb      sub   rsp, 0x18
│           0x000284ff      mov   rbx, qword fs:[0x28]
│           0x00028508      mov   qword [var_40h], rbx
│           0x0002850c      mov   rbx, rsi                             ; arg2
│           0x0002850f      mov   rsi, r13                             ; const char *s2
│           0x00028512      mov   rdi, rbx                             ; const char *s1
│           0x00028515      call  qword [reloc.strcmp]                 ; [reloc.strcmp:8]=0x892b8 reloc.target.strcmp
│           0x0002851b      test  eax, eax
│       ┌─< 0x0002851d      jne   0x28550
│       │   0x0002851f      mov   rbx, qword [reloc.vkGetInstanceProcAddr.88908] ; [0x88908:8]=0x284e0 sym.vkGetInstanceProcAddr

rizin helpfully informs us that 0x00028515 is a call to reloc.target.strcmp.

0x00028515  call  qword [reloc.strcmp]   ; [reloc.strcmp:8]=0x892b8 reloc.target.strcmp

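This is no doubt a missing feature of std.DynLib: the slot that the call reads through reloc.strcmp (0x892b8 in the file) was never filled in, so the indirect call jumps to address 0. As far as I can tell, ElfDynLib doesn’t support relocations or recursively loading dynamic libraries.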
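To make "supporting relocations" a bit more concrete, here is a sketch of how the value for the most common x86-64 dynamic relocations is computed. The relocation type numbers come from the x86-64 psABI; `relocValue` and its parameters are hypothetical, and a real loader would also have to look the symbol up across all loaded libraries and write the result to `base + r_offset`.

```zig
const std = @import("std");

// Relocation type numbers from the x86-64 psABI.
const R_X86_64_64 = 1;
const R_X86_64_GLOB_DAT = 6;
const R_X86_64_JUMP_SLOT = 7;
const R_X86_64_RELATIVE = 8;

/// Compute the 64-bit value to store for one RELA entry.
/// `base` is the load address of the library, `sym_addr` the already-resolved
/// runtime address of the referenced symbol (unused for RELATIVE).
/// Returns null for types this sketch does not handle (TLS relocations, etc.).
fn relocValue(base: u64, r_type: u32, sym_addr: u64, addend: i64) ?u64 {
    const a: u64 = @bitCast(addend);
    return switch (r_type) {
        R_X86_64_RELATIVE => base +% a, // B + A
        R_X86_64_GLOB_DAT, R_X86_64_JUMP_SLOT => sym_addr, // S
        R_X86_64_64 => sym_addr +% a, // S + A
        else => null,
    };
}
```

In the crash above, the slot for strcmp would be a symbol-valued relocation of this kind: once the loader has found strcmp in libc, it stores that address in the slot and the indirect call works.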

Anyway, that’s all for now. I’ll pick up from here when/if I get around to it.

10 Likes

Following this adventure with great interest.

For what it’s worth, I didn’t get very far with ElfDynLib because I realized that, while it was quite possible to map dynamic libraries and run them, they all depended on libc-specific behavior, so it kinda just made more sense to link libc if you want shared libraries.

That being said, I think attacking the problem again with the express goal of being able to load libvulkan.so on the 10 most popular Linux distros, or something like that, is a fantastic idea.

If you get stuck, I recommend reading the musl or glibc source code, since that’s where the dynamic linking logic we are all using today lives. Specifically, the start code that calls main.

4 Likes

Spent some time on this today, and I got it to the point where it’s recursively loading libraries and applying relocations. I hope I’m doing the relocations correctly; I haven’t made a test program to check. Maybe I should do that.

Currently stuck at handling relocations that have to do with thread local storage:

debug: needed library: 0x1c73 libc.so.6
debug: needed library: 0x869c ld-linux-x86-64.so.2
debug: ld-linux-x86-64.so.2 loaded at 0x712c95391000 - 0x712c953c9310
warning: unhandled relocation(libc.so.6)[0001]: 0x001e7c68 <- "" (0x0)	elf.R_X86_64.TPOFF64	56
warning: unhandled relocation(libc.so.6)[0002]: 0x001e7c70 <- "" (0x0)	elf.R_X86_64.TPOFF64	48
warning: unhandled relocation(libc.so.6)[0003]: 0x001e7c78 <- "" (0x0)	elf.R_X86_64.TPOFF64	72
warning: unhandled relocation(libc.so.6)[0004]: 0x001e7c80 <- "" (0x0)	elf.R_X86_64.TPOFF64	88
warning: unhandled relocation(libc.so.6)[0005]: 0x001e7c88 <- "" (0x0)	elf.R_X86_64.TPOFF64	80
warning: unhandled relocation(libc.so.6)[0006]: 0x001e7c90 <- "" (0x0)	elf.R_X86_64.TPOFF64	100
warning: unhandled relocation(libc.so.6)[0007]: 0x001e7c98 <- "" (0x0)	elf.R_X86_64.TPOFF64	120
warning: unhandled relocation(libc.so.6)[0008]: 0x001e7ca0 <- "" (0x0)	elf.R_X86_64.TPOFF64	128
warning: unhandled relocation(libc.so.6)[0009]: 0x001e7cc0 <- "" (0x0)	elf.R_X86_64.TPOFF64	24
warning: unhandled relocation(libc.so.6)[0010]: 0x001e7cd8 <- "" (0x0)	elf.R_X86_64.TPOFF64	40
warning: unhandled relocation(libc.so.6)[0011]: 0x001e7cf0 <- "" (0x0)	elf.R_X86_64.TPOFF64	16
warning: unhandled relocation(libc.so.6)[0012]: 0x001e7d78 <- "" (0x0)	elf.R_X86_64.TPOFF64	96
warning: unhandled relocation(libc.so.6)[0013]: 0x001e7ef8 <- "" (0x0)	elf.R_X86_64.TPOFF64	0
warning: unhandled relocation(libc.so.6)[0014]: 0x001e7fd8 <- "" (0x0)	elf.R_X86_64.TPOFF64	8
warning: unhandled relocation(libc.so.6)[0015]: 0x001e7ff0 <- "" (0x0)	elf.R_X86_64.TPOFF64	32
warning: unhandled relocation(libc.so.6)[0074]: 0x001e7ed0 <- "__libc_dlerror_result" (0x5f7)	elf.R_X86_64.TPOFF64	0
debug: libc.so.6 loaded at 0x712c95402000 - 0x712c955f3c78
debug: libvulkan.so.1 loaded at 0x712c9580e000 - 0x712c958971b0
debug: vkEnumerateInstanceVersion = fn (*u32) callconv(.c) vk.Result@712c958417a0
General protection exception (no address available)
???:?:?: 0x712c954a47af in ??? (???)
Unwind information for `???:0x712c954a47af` was not available, trace may be incomplete

???:?:?: 0x712c954a964c in ??? (???)
???:?:?: 0x712c9583626e in ??? (???)
???:?:?: 0x712c95842461 in ??? (???)
???:?:?: 0x712c958608f8 in ??? (???)
???:?:?: 0x712c958417f0 in ??? (???)
/home/geemili/code/dlopen-mesa-with-zig/src/main.zig:16:46: 0x1049859 in main (dlopen-mesa-with-zig)
    const result = vkEnumerateInstanceVersion(&version);
                                             ^
/home/geemili/code/zig/build-master/stage3/lib/zig/std/start.zig:656:37: 0x104277a in posixCallMainAndExit (dlopen-mesa-with-zig)
            const result = root.main() catch |err| {
                                    ^
/home/geemili/code/zig/build-master/stage3/lib/zig/std/start.zig:271:5: 0x104232d in _start (dlopen-mesa-with-zig)
    asm volatile (switch (native_arch) {
    ^

Running it in lldb we get some disassembly:

Process 450185 stopped
* thread #1, name = 'dlopen-mesa-wit', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: 0x00007ffff7b847af
->  0x7ffff7b847af: movq   %rcx, %fs:(%rax)
    0x7ffff7b847b3: addq   $0x60, %rcx
    0x7ffff7b847b7: movq   %rcx, %rax
    0x7ffff7b847ba: leaq   0x7f0(%rcx), %rdx
(lldb) 

I don’t understand how thread local storage works at the moment, so I’ll have to come back to this. I probably also need to handle .tdata and .tbss.
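For reference, a sketch of how R_X86_64_TPOFF64 is usually resolved, as far as I understand the x86-64 layout where a module’s static TLS block sits below the thread pointer. The names here are hypothetical, and this assumes the loader has already reserved static TLS space for the module at `module_tls_offset` bytes below the thread pointer, which is exactly the part I have not implemented.

```zig
const std = @import("std");

/// Value to store for an R_X86_64_TPOFF64 relocation, assuming the module's
/// TLS block starts `module_tls_offset` bytes below the thread pointer.
/// `sym_value` is the symbol's offset inside the module's TLS segment
/// (0 for the local, unnamed relocations in the log above).
/// The result is an offset relative to the thread pointer (%fs on x86-64),
/// so it is normally negative.
fn tpoff64(sym_value: u64, addend: i64, module_tls_offset: u64) i64 {
    const s: i64 = @intCast(sym_value);
    const off: i64 = @intCast(module_tls_offset);
    return s + addend - off;
}
```

As an example with a made-up block offset of 128, the first unhandled relocation in the log (addend 56, no symbol) would resolve to 56 - 128 = -72, and code would then access that variable at %fs:-72.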

I am wondering at this point if the system libc.so.6 could be replaced with a theoretical ziglibc. libvulkan.so.1 itself only imports a fairly small set of symbols:

     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __strcat_chk@GLIBC_2.3.4 (2)
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND getenv@GLIBC_2.2.5 (3)
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __snprintf_chk@GLIBC_2.3.4 (2)
     4: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND dlerror@GLIBC_2.34 (4)
     5: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND free@GLIBC_2.2.5 (3)
     6: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND abort@GLIBC_2.2.5 (3)
     7: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __errno_location@GLIBC_2.2.5 (3)
     8: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strncpy@GLIBC_2.2.5 (3)
     9: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strncmp@GLIBC_2.2.5 (3)
    10: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTable
    11: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND secure_getenv@GLIBC_2.17 (5)
    12: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __isoc23_sscanf@GLIBC_2.38 (6)
    13: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND qsort@GLIBC_2.2.5 (3)
    14: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fread@GLIBC_2.2.5 (3)
    15: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strtod@GLIBC_2.2.5 (3)
    16: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND readlink@GLIBC_2.2.5 (3)
    17: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fclose@GLIBC_2.2.5 (3)
    18: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND opendir@GLIBC_2.2.5 (3)
    19: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strlen@GLIBC_2.2.5 (3)
    20: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __stack_chk_fail@GLIBC_2.4 (7)
    21: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strchr@GLIBC_2.2.5 (3)
    22: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND pthread_mutex_destroy@GLIBC_2.2.5 (3)
    23: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND snprintf@GLIBC_2.2.5 (3)
    24: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strrchr@GLIBC_2.2.5 (3)
    25: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fputs@GLIBC_2.2.5 (3)
    26: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND memset@GLIBC_2.2.5 (3)
    27: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strncat@GLIBC_2.2.5 (3)
    28: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND closedir@GLIBC_2.2.5 (3)
    29: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fputc@GLIBC_2.2.5 (3)
    30: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strcspn@GLIBC_2.2.5 (3)
    31: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strtok_r@GLIBC_2.2.5 (3)
    32: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND calloc@GLIBC_2.2.5 (3)
    33: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strcmp@GLIBC_2.2.5 (3)
    34: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND dlopen@GLIBC_2.34 (4)
    35: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __memmove_chk@GLIBC_2.3.4 (2)
    36: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __memcpy_chk@GLIBC_2.3.4 (2)
    37: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    38: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND memcpy@GLIBC_2.14 (8)
    39: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __isoc23_strtol@GLIBC_2.38 (6)
    40: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fileno@GLIBC_2.2.5 (3)
    41: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND readdir@GLIBC_2.2.5 (3)
    42: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND pthread_mutex_unlock@GLIBC_2.2.5 (3)
    43: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND malloc@GLIBC_2.2.5 (3)
    44: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __vsnprintf_chk@GLIBC_2.3.4 (2)
    45: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __strncpy_chk@GLIBC_2.3.4 (2)
    46: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND realloc@GLIBC_2.2.5 (3)
    47: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND memmove@GLIBC_2.2.5 (3)
    48: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND access@GLIBC_2.2.5 (3)
    49: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fopen@GLIBC_2.2.5 (3)
    50: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND dlsym@GLIBC_2.34 (4)
    51: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __memset_chk@GLIBC_2.3.4 (2)
    52: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __strncat_chk@GLIBC_2.3.4 (2)
    53: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
    54: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strerror@GLIBC_2.2.5 (3)
    55: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND dlclose@GLIBC_2.34 (4)
    56: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND pthread_mutex_init@GLIBC_2.2.5 (3)
    57: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fstat@GLIBC_2.33 (9)
    58: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.2.5 (3)
    59: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strstr@GLIBC_2.2.5 (3)
    60: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND pthread_mutex_lock@GLIBC_2.2.5 (3)
    61: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __ctype_tolower_loc@GLIBC_2.3 (10)
    62: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  UND stderr@GLIBC_2.2.5 (3)

Though I guess the drivers that libvulkan.so.1 loads would probably rely on many more symbols.

5 Likes

I can give you an overview of how Thread Local Storage works.

Basically it’s to make threadlocal variables work. This is what it says on the tin - there is one copy of the data per thread. If the data has an initialization value, it needs to be initialized (memcpy from .tdata) when any new thread is created. If the data is zero (.tbss), then it only needs to be memset to zero when a new thread is created, but the data doesn’t need to be stored in the shared object.
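A minimal sketch of that initialization step, assuming the caller has already allocated a per-thread block of the right size and located the module’s .tdata image. The function name is made up for this example; real implementations also deal with alignment and with where the block sits relative to the thread pointer.

```zig
const std = @import("std");

/// Initialize one thread's copy of a module's TLS data.
/// `tdata` is the initialized image stored in the shared object; everything
/// in `block` past it corresponds to .tbss and just gets zeroed.
fn initTlsBlock(block: []u8, tdata: []const u8) void {
    std.debug.assert(tdata.len <= block.len);
    @memcpy(block[0..tdata.len], tdata);
    @memset(block[tdata.len..], 0);
}
```

Every new thread gets this treatment for each loaded module that has TLS, which is why whoever creates threads needs to know about every module’s .tdata and .tbss.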

Although this seems like a simple problem, it can’t really be solved without getting the linker involved. If you think about it, it makes sense - threads are created at runtime, and hence you have some dynamic linking to do.

Unfortunately, this is where things start to be tightly coupled with libc. If the application uses pthreads for instance, pthreads is going to have some implementation of accessing .tdata and .tbss and initializing the thread local storage area.

Likewise if you use std.Thread.spawn from Zig, that code has data to initialize the thread-local storage based on examining the process’s own program sections on startup.

I don’t recall all the details about what might go wrong here, but hopefully you can see that there are some bits to coordinate (literally) and if a thread is spawned in one shared object and calls into the other, or vice versa, and they don’t agree on conventions, then the threadlocal storage won’t be initialized properly and probably it will crash.

Anyway, might be worth having a look at the start code in zig that sets up TLS, std.Thread.spawn, std.os.linux.tls, and the equivalent code in glibc and/or musl that you are trying to interact with by virtue of dynamically loading libvulkan.so.

It might be worth constructing a simpler version of what you’re trying to do - a simple C program that spawns a thread, accesses threadlocal storage, and does not do much else. Then turn this into a .so that dynamically links glibc just like libvulkan.so, and try to load it with your code. Next, try having Zig spawn a thread and call into the .so from the Zig thread, and make sure TLS still works. Then vice versa. Should be a way to test your relocations too, before you go full ham on the real world example.
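On the Zig side, the thread-spawning half of such a test could start from something like the sketch below. It only checks Zig’s own TLS (no shared object involved yet), and `counter`, `worker`, and `spawnAndObserve` are names invented for the example; the interesting version would have `worker` call into the loaded .so.

```zig
const std = @import("std");

// One copy per thread; every new thread should start from the initial value.
threadlocal var counter: u32 = 42;

fn worker(seen: *u32) void {
    counter += 1; // modifies only this thread's copy
    seen.* = counter;
}

/// Spawn one thread, let it bump its own copy of `counter`, and report what it saw.
fn spawnAndObserve() !u32 {
    var seen: u32 = 0;
    const thread = try std.Thread.spawn(.{}, worker, .{&seen});
    thread.join();
    return seen;
}
```

If TLS is set up correctly, the spawned thread sees 43 while the spawning thread’s `counter` is still 42. A crash or a shared value here would point at the TLS setup rather than at Vulkan.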

7 Likes