When testing createInstance with Snektron/vulkan-zig (the Vulkan binding generator for Zig), it returns a LayerNotPresent error. I’m not a Vulkan specialist, but maybe it is a “normal” error and not the result of a mistake while dynamically loading the library.
I updated the gist. I will integrate your changes dealing with .so files without versym table later today, right now I need another break.
@geemili I updated the gist to handle the no versym case. Let me know if vulkan custom allocators are still needed, and if any additional change is required on your side.
I’ll switch to the EGL/X11 test since it’ll be much faster and easier for me to understand the details of any potential failures. EGL also uses dlopen, so I suspect I’ll run into the same kind of issues as with vulkan.
EDIT: just added a modification to handle the fact that musl doesn’t have a __libc_early_init symbol. I tested loading vulkan using musl’s libc.so and it worked, I got the correct vulkan version.
EDIT: sorry, it appears that libvulkan.so required libc.so.6, so in fact glibc was used. I don’t know how I can properly test musl.
EDIT again: I think I succeeded in using musl libc.so and the vulkan lib you linked earlier. But it is segfaulting in get_random_secret:
I still get a general protection exception in mimalloc using the latest code.
General protection exception (no address available)
???:?:?: 0x729ead0576e5 in _mi_heap_get_free_small_page (../mimalloc/src/mimalloc.c)
Unwind error at address `/lib64/libc.so:0x729ead0576e5` (unwind info unavailable), remaining frames may be incorrect
???:?:?: 0x729ead4cb66a in get_unix_settings_path (../loader/settings.c)
???:?:?: 0x729ead4cbff5 in update_global_loader_settings (../loader/settings.c)
???:?:?: 0x729ead4cee45 in vkEnumerateInstanceVersion (../loader/trampoline.c)
/home/geemili/code/geemili/dynamic_linking_adventures/examples/vulkan_version.zig:28:39: 0x11a35ac in main (vulkan_version.zig)
switch (vkEnumerateInstanceVersion(&vk_version)) {
^
/home/geemili/.local/share/ziglang/0.16.0-dev.1220+95c76b1b4/lib/std/start.zig:696:37: 0x11a3d73 in callMain (std.zig)
const result = root.main() catch |err| {
^
/home/geemili/.local/share/ziglang/0.16.0-dev.1220+95c76b1b4/lib/std/start.zig:237:5: 0x118b9a1 in _start (std.zig)
asm volatile (switch (native_arch) {
^
I did figure out why the reallocation callback was failing, and now I end up with vkCreateInstance returning error_incompatible_driver because dlopen isn’t implemented:
info(dynamic_library_loader): loading: libvulkan.so.1 [/lib64/libvulkan.so.1]
error(dynamic_library_loader): == TODO: SHT_NOTE: .note.gnu.build-id
error(dynamic_library_loader): == TODO: SHT_GNU_HASH: .gnu.hash
error(dynamic_library_loader): == TODO: SHT_FINI_ARRAY: .fini_array
error(dynamic_library_loader): => TODO: PT_PHDR
error(dynamic_library_loader): == TODO: DT_SONAME: 0x1c17
error(dynamic_library_loader): => TODO: DT_FLAGS: 0x8
error(dynamic_library_loader): => TODO: DT_FLAGS_1: 0x1
error(dynamic_library_loader): == TODO: DT_SYMENT_OR_ADDRNUM: 0x18
error(dynamic_library_loader): == TODO: DT_GNU_HASH: 0x2188
error(dynamic_library_loader): => TODO: PT_GNU_STACK
error(dynamic_library_loader): => TODO: PT_NOTE
info(dynamic_library_loader): loading: libc.so [/lib64/libc.so]
error(dynamic_library_loader): == TODO: SHT_NOTE: .note.gnu.build-id
error(dynamic_library_loader): == TODO: SHT_GNU_HASH: .gnu.hash
error(dynamic_library_loader): == TODO: SHT_HASH: .hash
error(dynamic_library_loader): => TODO: PT_PHDR
error(dynamic_library_loader): => TODO: DT_FLAGS: 0x8
error(dynamic_library_loader): => TODO: DT_FLAGS_1: 0x1
error(dynamic_library_loader): == TODO: DT_SYMENT_OR_ADDRNUM: 0x18
error(dynamic_library_loader): == TODO: DT_GNU_HASH: 0x9d48
error(dynamic_library_loader): == TODO: DT_HASH_OR_PPC64_NUM: 0xcd8c
error(dynamic_library_loader): => TODO: PT_GNU_STACK
error(dynamic_library_loader): => TODO: PT_NOTE
info(dynamic_library_loader): _dl_debug_state
info(dynamic_library_loader): _dl_debug_state
info(dynamic_library_loader): calling init function for libc.so at 0x7ae69a198900 (initial address: 0x57900)
info(dynamic_library_loader): calling init function for libvulkan.so.1 at 0x7ae69a5ba39f (initial address: 0x7e39f)
info(dynamic_library_loader): calling 2 init_array functions for libvulkan.so.1 (0x7f750)
info(dynamic_library_loader): calling init_array[0] for libvulkan.so.1 at 0x7ae69a56eaa0 (initial address: 0x32aa0)
info(dynamic_library_loader): calling init_array[1] for libvulkan.so.1 at 0x7ae69a5986a0 (initial address: 0x5c6a0)
info: vkCreateInstance = fn (*const vulkan_create_instance.vulkan.InstanceCreateInfo, *const vulkan_create_instance.vulkan.AllocationCallbacks, **vulkan_create_instance.vulkan.Instance__opaque_28717) callconv(.c) vulkan_create_instance.vulkan.Result@7ae69a5af0b0
info(vulkan): general: Loader Message(0): No valid vk_loader_settings.json file found, no loader settings will be active
info(vulkan): general: Loader Message(0): Searching for implicit layer manifest files
info(vulkan): general: Loader Message(0): In following locations:
info(vulkan): general: Loader Message(0): /etc/xdg/vulkan/implicit_layer.d
info(vulkan): general: Loader Message(0): /etc/vulkan/implicit_layer.d
info(vulkan): general: Loader Message(0): /usr/local/share/vulkan/implicit_layer.d
info(vulkan): general: Loader Message(0): /usr/share/vulkan/implicit_layer.d
info(vulkan): general: Loader Message(0): Found the following files:
info(vulkan): general: Loader Message(0): /usr/share/vulkan/implicit_layer.d/VkLayer_MESA_device_select.json
info(vulkan): general: Loader Message(0): Found manifest file /usr/share/vulkan/implicit_layer.d/VkLayer_MESA_device_select.json (file version 1.0.0)
info(vulkan): general: Loader Message(0): Searching for explicit layer manifest files
info(vulkan): general: Loader Message(0): In following locations:
info(vulkan): general: Loader Message(0): /etc/xdg/vulkan/explicit_layer.d
info(vulkan): general: Loader Message(0): /etc/vulkan/explicit_layer.d
info(vulkan): general: Loader Message(0): /usr/local/share/vulkan/explicit_layer.d
info(vulkan): general: Loader Message(0): /usr/share/vulkan/explicit_layer.d
info(vulkan): general: Loader Message(0): Found the following files:
info(vulkan): general: Loader Message(0): /usr/share/vulkan/explicit_layer.d/VkLayer_MESA_overlay.json
info(vulkan): general: Loader Message(0): /usr/share/vulkan/explicit_layer.d/VkLayer_INTEL_nullhw.json
info(vulkan): general: Loader Message(0): Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_MESA_overlay.json (file version 1.0.0)
info(vulkan): general: Loader Message(0): Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_INTEL_nullhw.json (file version 1.0.0)
info(vulkan): general: Loader Message(0): Searching for driver manifest files
info(vulkan): general: Loader Message(0): In following locations:
info(vulkan): general: Loader Message(0): /etc/xdg/vulkan/icd.d
info(vulkan): general: Loader Message(0): /etc/vulkan/icd.d
info(vulkan): general: Loader Message(0): /usr/local/share/vulkan/icd.d
info(vulkan): general: Loader Message(0): /usr/share/vulkan/icd.d
info(vulkan): general: Loader Message(0): Found the following files:
info(vulkan): general: Loader Message(0): /usr/share/vulkan/icd.d/nouveau_icd.x86_64.json
info(vulkan): general: Loader Message(0): /usr/share/vulkan/icd.d/intel_icd.x86_64.json
info(vulkan): general: Loader Message(0): /usr/share/vulkan/icd.d/virtio_icd.x86_64.json
info(vulkan): general: Loader Message(0): /usr/share/vulkan/icd.d/radeon_icd.x86_64.json
info(vulkan): general: Loader Message(0): /usr/share/vulkan/icd.d/lvp_icd.x86_64.json
info(vulkan): general: Loader Message(0): /usr/share/vulkan/icd.d/intel_hasvk_icd.x86_64.json
info(vulkan): general: Loader Message(0): Found ICD manifest file /usr/share/vulkan/icd.d/nouveau_icd.x86_64.json, version 1.0.1
debug(vulkan): general: Loader Message(0): Searching for ICD drivers named /usr/lib/libvulkan_nouveau.so
warning(dynamic_library_loader): substitutes: unimplemented dlopen called: /usr/lib/libvulkan_nouveau.so, 0x1
error(vulkan): general: Loader Message(0):
error(vulkan): general: Loader Message(0): loader_icd_scan: Failed loading library associated with ICD JSON /usr/lib/libvulkan_nouveau.so. Ignoring this JSON
info(vulkan): general: Loader Message(0): Found ICD manifest file /usr/share/vulkan/icd.d/intel_icd.x86_64.json, version 1.0.1
debug(vulkan): general: Loader Message(0): Searching for ICD drivers named /usr/lib/libvulkan_intel.so
warning(dynamic_library_loader): substitutes: unimplemented dlopen called: /usr/lib/libvulkan_intel.so, 0x1
error(vulkan): general: Loader Message(0):
error(vulkan): general: Loader Message(0): loader_icd_scan: Failed loading library associated with ICD JSON /usr/lib/libvulkan_intel.so. Ignoring this JSON
info(vulkan): general: Loader Message(0): Found ICD manifest file /usr/share/vulkan/icd.d/virtio_icd.x86_64.json, version 1.0.1
debug(vulkan): general: Loader Message(0): Searching for ICD drivers named /usr/lib/libvulkan_virtio.so
warning(dynamic_library_loader): substitutes: unimplemented dlopen called: /usr/lib/libvulkan_virtio.so, 0x1
error(vulkan): general: Loader Message(0):
error(vulkan): general: Loader Message(0): loader_icd_scan: Failed loading library associated with ICD JSON /usr/lib/libvulkan_virtio.so. Ignoring this JSON
info(vulkan): general: Loader Message(0): Found ICD manifest file /usr/share/vulkan/icd.d/radeon_icd.x86_64.json, version 1.0.1
debug(vulkan): general: Loader Message(0): Searching for ICD drivers named /usr/lib/libvulkan_radeon.so
warning(dynamic_library_loader): substitutes: unimplemented dlopen called: /usr/lib/libvulkan_radeon.so, 0x1
error(vulkan): general: Loader Message(0):
error(vulkan): general: Loader Message(0): loader_icd_scan: Failed loading library associated with ICD JSON /usr/lib/libvulkan_radeon.so. Ignoring this JSON
info(vulkan): general: Loader Message(0): Found ICD manifest file /usr/share/vulkan/icd.d/lvp_icd.x86_64.json, version 1.0.1
debug(vulkan): general: Loader Message(0): Searching for ICD drivers named /usr/lib/libvulkan_lvp.so
warning(dynamic_library_loader): substitutes: unimplemented dlopen called: /usr/lib/libvulkan_lvp.so, 0x1
error(vulkan): general: Loader Message(0):
error(vulkan): general: Loader Message(0): loader_icd_scan: Failed loading library associated with ICD JSON /usr/lib/libvulkan_lvp.so. Ignoring this JSON
info(vulkan): general: Loader Message(0): Found ICD manifest file /usr/share/vulkan/icd.d/intel_hasvk_icd.x86_64.json, version 1.0.1
debug(vulkan): general: Loader Message(0): Searching for ICD drivers named /usr/lib/libvulkan_intel_hasvk.so
warning(dynamic_library_loader): substitutes: unimplemented dlopen called: /usr/lib/libvulkan_intel_hasvk.so, 0x1
error(vulkan): general: Loader Message(0):
error(vulkan): general: Loader Message(0): loader_icd_scan: Failed loading library associated with ICD JSON /usr/lib/libvulkan_intel_hasvk.so. Ignoring this JSON
error(vulkan): general: Loader Message(0): vkCreateInstance: Found no drivers!
info: error creating vulkan instance: error_incompatible_driver
error: VkCreateInstanceFailed
/home/geemili/code/geemili/dynamic_linking_adventures/examples/vulkan_create_instance.zig:68:13: 0x11a11f3 in main (vulkan_create_instance.zig)
return error.VkCreateInstanceFailed;
^
run:vulkan_create_instance
└─ run exe vulkan_create_instance failure
error: process exited with error code 1
failed command: ./.zig-cache/o/28691b72e25e0b5370413ccf505aa838/vulkan_create_instance
Build Summary: 1/3 steps succeeded (1 failed)
run:vulkan_create_instance transitive failure
└─ run exe vulkan_create_instance failure
error: the following build command failed with exit code 1:
.zig-cache/o/41f59f69053bb7a7e2e8ec82e6863023/build /home/geemili/.local/share/ziglang/0.16.0-dev.1220+95c76b1b4/zig /home/geemili/.local/share/ziglang/0.16.0-dev.1220+95c76b1b4/lib /home/geemili/code/geemili/dynamic_linking_adventures .zig-cache /home/geemili/.cache/zig --seed 0x4983a16a -Z961077f2d340ff06 run:vulkan_create_instance --color on
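The loader log above flags DT_HASH and DT_GNU_HASH as TODO, so symbol lookup presumably goes another way for now. For reference, a lookup through either table starts by hashing the symbol name. The two functions below are a minimal sketch of those hashes as I understand them from the ELF documentation; the names `sysv_hash` and `gnu_hash` are mine and nothing here is taken from the gist.

```c
#include <assert.h>
#include <stdint.h>

/* Hash used by DT_HASH (SysV) tables. */
uint32_t sysv_hash(const char *name)
{
    uint32_t h = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        h = (h << 4) + *p;
        uint32_t g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;   /* fold the top nibble back in */
        h &= ~g;            /* and clear it */
    }
    return h;
}

/* Hash used by DT_GNU_HASH tables: h = h * 33 + c, seeded with 5381. */
uint32_t gnu_hash(const char *name)
{
    uint32_t h = 5381;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++)
        h = (h << 5) + h + *p;
    return h;
}
```

Hashing is only the first step: the bucket and chain walk (and the bloom filter in the GNU variant) still have to be implemented on top of it.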
You could install a virtual machine and/or use something like Distrobox.
Seems likely that this section of code is relevant (ldso/dynlink.c:1768):
/* Stage 2b sets up a valid thread pointer, which requires relocations
 * completed in stage 2, and on which stage 3 is permitted to depend.
 * This is done as a separate stage, with symbolic lookup as a barrier,
 * so that loads of the thread pointer and &errno can be pure/const and
 * thereby hoistable. */
void __dls2b(size_t *sp, size_t *auxv)
{
    /* Setup early thread pointer in builtin_tls for ldso/libc itself to
     * use during dynamic linking. If possible it will also serve as the
     * thread pointer at runtime. */
    search_vec(auxv, &__hwcap, AT_HWCAP);
    libc.auxv = auxv;
    libc.tls_size = sizeof builtin_tls;
    libc.tls_align = tls_align;
    if (__init_tp(__copy_tls((void *)builtin_tls)) < 0) {
        a_crash();
    }
    struct symdef dls3_def = find_sym(&ldso, "__dls3", 0);
    if (DL_FDPIC) ((stage3_func)&ldso.funcdescs[dls3_def.sym-ldso.syms])(sp, auxv);
    else ((stage3_func)laddr(&ldso, dls3_def.sym->st_value))(sp, auxv);
}
You’re right, libc.auxv seems uninitialized when entering get_random_secret. I need to understand how and when __dls2b is called, to be able to call it manually if necessary.
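For context on why an unset libc.auxv is fatal: the auxiliary vector is a flat array of (key, value) pairs terminated by a zero key, and code that needs an entry scans it from the start. Here is a minimal sketch of such a scan, written from my understanding of the format rather than copied from musl; `auxv_lookup` and the `AUXV_*` names are made up for the example, with values matching the `AT_*` constants in `<elf.h>`.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of a few AT_* keys so the sketch needs no Linux headers. */
enum { AUXV_NULL = 0, AUXV_PAGESZ = 6, AUXV_HWCAP = 16, AUXV_RANDOM = 25 };

/* Scan a (key, value) pair array terminated by AUXV_NULL.
 * Returns 1 and stores the value in *out when the key is present,
 * returns 0 and leaves *out untouched otherwise. */
int auxv_lookup(const size_t *auxv, size_t key, size_t *out)
{
    assert(auxv != NULL && out != NULL);
    for (; auxv[0] != AUXV_NULL; auxv += 2) {
        if (auxv[0] == key) {
            *out = auxv[1];
            return 1;
        }
    }
    return 0;
}
```

If the pointer handed to such a scan is null or garbage, the very first read faults, which would fit a crash in a function that looks for an auxv entry.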
Yes, but since we are the dynamic loader here, we cannot call the real ld. We basically need to do the minimal amount of work it does to end up with a working libc. For instance, for glibc I added this in the loader:
Well… I see no non-hacky way of initializing the libc structure.
I think I will sleep on it. The more I think about it, the more I’m convinced that your initial idea of implementing a good subset of libc in zig land is the way to go. If we can load a dynamic library, relocate it fully, map its TLS block correctly, all that in pure Zig, then maybe skipping loading libc when it is a dependency and providing needed symbols from zig is the most elegant solution (I saw on your repo that you were heading toward it). At least it feels less like a dirty hack and more portable (as in less dependent on the specific libc required), which was my initial goal. I got distracted by the good feeling that believing I was conquering libc gave me.
So, I think the next step for me will be: “this undefined symbol should come from libc, here is its Zig implementation.” Hopefully, I will come up with an idea to avoid having to embed a full libc in a static executable that wants to load dynamic libraries… And I still have this proof of concept that seems to work with glibc.
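The “this symbol should come from libc, here is its replacement” idea boils down to a lookup order: check a table of substitutes first, and only then fall back to the symbols exported by the loaded libraries. Below is a minimal sketch of that order in C (the real project does this in Zig); every name in it (`symbol_entry`, `resolve_symbol`, `sub_dlopen`, and so on) is hypothetical and only there for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*generic_fn)(void);

struct symbol_entry {
    const char *name;
    generic_fn  impl;
};

/* Hypothetical stand-ins: one loader-provided substitute, two library symbols. */
void sub_dlopen(void) {}
void lib_dlopen(void) {}
void lib_puts(void) {}

/* Symbols the loader wants to provide itself, whatever the libraries export. */
const struct symbol_entry substitutes[] = {
    { "dlopen", sub_dlopen },
};

/* Linear search; returns NULL when the name is absent. */
generic_fn find_in(const struct symbol_entry *table, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].impl;
    return NULL;
}

/* Substitutes win; library exports are only a fallback. */
generic_fn resolve_symbol(const char *name,
                          const struct symbol_entry *loaded, size_t loaded_count)
{
    generic_fn f = find_in(substitutes,
                           sizeof substitutes / sizeof substitutes[0], name);
    return f ? f : find_in(loaded, loaded_count, name);
}
```

With this order, a library that imports `dlopen` ends up bound to the loader's version even when a loaded libc exports one too, while anything without a substitute still resolves normally.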
This topic is crazy fascinating to me, as someone who digs stuff like this and OS kernels and such. Is there a chance we could somehow (probably way down the road) do something like this for Windows? I know that it’s possible to develop apps targeting it without any need for the windows C runtime, it’s just extremely painful to actually do (which, really, is more a problem of the Win32 API than it is of anything else). More asking this because I am primarily a Windows user out of necessity. Well, and I’m curious. We obviously couldn’t replicate the Windows library logic entirely, since it’s closed-source and all, but we could use LoadLibrary and such and delegate to it. I’m definitely following this topic – it’ll be really cool to find out where this goes!
I’m afraid I lack the required technical knowledge to provide accurate information, as I’m not a windows user. IIUC there’s no such thing as a truly static executable on Windows, right? I was under the impression that cross-compiling and distributing for windows was far easier than distributing software that works across all linux distributions.
Are there cases where it’s not possible to use LoadLibrary? Or maybe I’m missing something?
diff --git a/original/musl-1.2.5/ldso/dynlink.c b/musl-1.2.5/ldso/dynlink.c
index 324aa85..bfc5140 100644
--- a/original/musl-1.2.5/ldso/dynlink.c
+++ b/musl-1.2.5/ldso/dynlink.c
@@ -1767,6 +1767,14 @@ hidden void __dls2(unsigned char *base, size_t *sp)
else ((stage3_func)laddr(&ldso, dls2b_def.sym->st_value))(sp, auxv);
}
+extern void __pre_dls2b(size_t *auxv)
+{
+ search_vec(auxv, &__hwcap, AT_HWCAP);
+ libc.auxv = auxv;
+ libc.tls_size = sizeof builtin_tls;
+ libc.tls_align = tls_align;
+}
+
/* Stage 2b sets up a valid thread pointer, which requires relocations
* completed in stage 2, and on which stage 3 is permitted to depend.
* This is done as a separate stage, with symbolic lookup as a barrier,
@@ -1775,13 +1783,11 @@ hidden void __dls2(unsigned char *base, size_t *sp)
void __dls2b(size_t *sp, size_t *auxv)
{
+ __pre_dls2b(auxv);
+
/* Setup early thread pointer in builtin_tls for ldso/libc itself to
* use during dynamic linking. If possible it will also serve as the
* thread pointer at runtime. */
- search_vec(auxv, &__hwcap, AT_HWCAP);
- libc.auxv = auxv;
- libc.tls_size = sizeof builtin_tls;
- libc.tls_align = tls_align;
if (__init_tp(__copy_tls((void *)builtin_tls)) < 0) {
a_crash();
}
And I confirmed that calling this function after loading the library allowed me to get the correct vulkan version and reach the dlopen calls in libvulkan.so.1. This is obviously not a good solution, as it requires an upstream change, but I thought you might be interested. Since the resulting libc.so has no dependencies, it can almost be considered “shippable” (it is a binary, so not really).
I updated the gist to include the call if the __pre_dls2b symbol is found, and a correction to accommodate the fact that musl does not have initial TLS data.
@geemili, I also created a github repository with the POC organized as a project (the gist will stay as it is from now on). Collaboration to make it work on musl and to implement libc’s functions would be greatly appreciated, but there’s absolutely no obligation.
No, you can make a static binary on Windows. And yes, we can use the Windows library loader, even if its logic is opaque. Making a library without libc is annoying but doable (I think we would need to include a couple of symbols for stack probes and such), but other than that the Win32 API can do mostly everything libc provides and then some. So it would just come down to having the Zig stdlib call only Win32 APIs and never, ever call into libc. Windows does have some weird problems and conventions that msvcrt normally takes care of, but I think we can sidestep those? I.e., it handles WinMain/main distinctions and all that.
Again, tested only on my machine with my glibc version, but still. The std.Io.sleep call, which uses the zig initial TLS area, works with both the LLVM and the self-hosted backends.
I decided to defer the dlopen replacement implementation for now, as I often hit bugs that are hard to debug.
Next step is starting zig threads, which implies cooperation with the std.Thread.LinuxThreadImpl.spawn implementation (because the pthread struct wants to be at TP, but the TLS area descriptor knows nothing about it).
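To make the layout constraint concrete: on x86_64 (TLS “variant II”), static TLS blocks sit below the thread pointer and the thread control block starts at it, so each module's block is addressed as TP minus an offset. The sketch below computes those offsets using the usual rule from the ELF TLS ABI (add the size, then round up to the module's alignment). It assumes x86_64 and power-of-two alignments, and all function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round value up to a multiple of align (align must be a power of two). */
size_t tls_round_up(size_t value, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Offset of the next module's block, measured downward from the thread
 * pointer, given the offset already used by previously placed modules. */
size_t tls_next_offset(size_t prev_offset, size_t size, size_t align)
{
    return tls_round_up(prev_offset + size, align);
}

/* Address of a module's block for a thread whose thread pointer is tp. */
uintptr_t tls_block_address(uintptr_t tp, size_t offset)
{
    return tp - offset;
}
```

For example, a first module of size 0x28 with alignment 8 lands at offset 0x28, and a second one of size 0x10 with alignment 16 lands at offset 0x40. Anything that lives at TP itself, such as a thread control structure, has to be accounted for separately from these offsets.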
EDIT:
screenshot of the window appearing, for the glory, and to promote the nnd debugger which was so useful
That’s because on Windows libc is implemented cleanly on top of the Win32 API (and not the other way around or a mix like quite a few UNIX systems do stuff).
I can now load libEGL.so and libGL.so. That means I can send OpenGL commands to an X11 window (itself coming from a dynamically loaded libX11.so.6). I can even load a libraylib.so file downloaded from GitHub and start making a game as a static, non-libc executable. libraylib.so itself will still use libc, though.
Here is a curated list of what new things it took:
Intercepting and executing dl functions in zig, like dlopen, dlsym, dladdr, etc. (all these functions have an equivalent in the lib anyway)
Implementing TLSDESC relocations, and the related resolver with its unique calling convention
Discovering that C++ unwinding calls non-public ld functions, like _dl_find_object or _dl_find_dso_for_object, so they need an implementation too
Understanding that changing the thread pointer (the fs_base) to implement static area resizing will not work reliably when pthread is involved, because it stores and uses the absolute value of the thread pointer in quite a lot of places; so the first time the TLS area is restructured (before any thread is created), a surplus is requested to make space for future libraries’ static TLS data (note that the pedantic behavior here should really be to refuse to load a dynamic library that has static TLS)
Understanding that IFUNC symbols are not always subject to an IRELATIVE relocation, so they need to be resolved when referenced
Making a “bridge” between pthread functions and std.Thread functions; for instance, pthread_create calls std.Thread.spawn (it is still incomplete as only pthread_create and pthread_join are implemented)
… and many other details that seem irrelevant until you crash at a random memory address.
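The IFUNC item from the list above can be sketched as follows: when a looked-up symbol has type STT_GNU_IFUNC, its value is not the function itself but a resolver that must be called to obtain the real address. This is a simplified sketch, not the project's code. The struct mirrors the layout of `Elf64_Sym` from `<elf.h>` but is declared locally, `symbol_address` and `demo_resolver` are hypothetical names, and the resolver is called without arguments, which is an assumption (some architectures pass hardware capability values to it).

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of Elf64_Sym, to keep the sketch header-independent. */
struct sym64 {
    uint32_t st_name;
    uint8_t  st_info;   /* low 4 bits: symbol type */
    uint8_t  st_other;
    uint16_t st_shndx;
    uint64_t st_value;
    uint64_t st_size;
};

enum { SYM_TYPE_FUNC = 2, SYM_TYPE_GNU_IFUNC = 10 };  /* STT_FUNC, STT_GNU_IFUNC */

typedef uintptr_t (*ifunc_resolver)(void);

/* Hypothetical resolver used for the demonstration: it "selects" a fixed address. */
uintptr_t demo_resolver(void)
{
    return (uintptr_t)0x1234;
}

/* Final address of a symbol: for an IFUNC, run the resolver found at the
 * relocated value and use whatever it returns. */
uintptr_t symbol_address(const struct sym64 *sym, uintptr_t load_base)
{
    uintptr_t addr = load_base + (uintptr_t)sym->st_value;
    if ((sym->st_info & 0xf) == SYM_TYPE_GNU_IFUNC)
        addr = ((ifunc_resolver)addr)();
    return addr;
}
```

Doing this at symbol lookup time, rather than only when processing IRELATIVE relocations, covers the case where a library references an IFUNC symbol through an ordinary relocation.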
The pthread bridging thing was a deliberate decision: because of how TLS works, spawning threads cannot be done by two different actors. I chose to put zig in charge. I have to say that I feel at least half of libc’s complexity comes from threads and TLS, and the opinion that more of it should be handled by a monolithic kernel (like linux) grows stronger every day. This strange relationship between the kernel and libc is a bad idea IMO, and maybe I’m starting to understand what @KilianHanich says:
So far, patching the zig standard library hasn’t been needed, which is good news.
Side note: libLLVM.so.19.1 is really big… For the raylib example, it’s like 70% of the loading time (~50 libs are loaded).
@geemili, I’d like to focus more on musl-based systems now, so I will set up a Chimera machine and see where it goes. In the meantime, feel free to test the vulkan_musl and vulkan_advanced_musl examples from the repo.