Dynamic linking without libc adventures

Wow!

$ zig run src/test.zig
...
info: vulkan version = 4211016
$ echo $?
0

Amazing :slight_smile: I almost gave up when I felt I was near the “brute force programming” zone… But then it clicked.

@geemili, is this version number plausible?

EDIT: it is: 1.4.328
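The packed integer uses the bit layout of the VK_API_VERSION_VARIANT/MAJOR/MINOR/PATCH macros from vulkan_core.h: a 3-bit variant, a 7-bit major, a 10-bit minor and a 12-bit patch. Here is a standalone C sketch that needs no Vulkan headers (decode_vk_version and vk_version_parts are our own names, not Vulkan API):

```c
#include <stdint.h>

/* Fields of a packed Vulkan API version:
 * bits 29-31 variant, 22-28 major, 12-21 minor, 0-11 patch. */
typedef struct {
    uint32_t variant, major, minor, patch;
} vk_version_parts;

static vk_version_parts decode_vk_version(uint32_t v)
{
    vk_version_parts p;
    p.variant = v >> 29;
    p.major = (v >> 22) & 0x7Fu;
    p.minor = (v >> 12) & 0x3FFu;
    p.patch = v & 0xFFFu;
    return p;
}
```

Since 4211016 = (1 << 22) | (4 << 12) | 328, decode_vk_version(4211016) gives variant 0 and version 1.4.328, which matches the edit above.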

When testing createInstance with Snektron/vulkan-zig (a Vulkan binding generator for Zig), it returns a LayerNotPresent error. I’m not a Vulkan specialist, but this may be a “normal” error rather than the result of a mistake in the dynamic loading.

I updated the gist. I will integrate your changes for .so files without a versym table later today. Right now I need another break.


@geemili, I updated the gist to handle the no-versym case. Let me know if Vulkan custom allocators are still needed, and if any additional changes are required on your side.

I’ll switch to the EGL/X11 test, since it will be much faster and easier for me to understand the details of any failures. EGL also uses dlopen, so I suspect I’ll run into the same kind of issues as with Vulkan.

EDIT: I just added a modification to handle the fact that musl doesn’t have a __libc_early_init symbol. I tested loading Vulkan using musl’s libc.so and it worked: I got the correct Vulkan version.

EDIT: sorry, it turns out that libvulkan.so requires libc.so.6, so glibc was actually used. I don’t know how to properly test musl.

EDIT again: I think I succeeded in using musl’s libc.so with the Vulkan library you linked earlier. But it segfaults in get_random_secret:


(sorry for the screenshot)

I’ve been nerd-sniped again… I will download the musl sources and try to understand what is happening.

I still get a general protection exception in mimalloc using the latest code.

General protection exception (no address available)
???:?:?: 0x729ead0576e5 in _mi_heap_get_free_small_page (../mimalloc/src/mimalloc.c)
Unwind error at address `/lib64/libc.so:0x729ead0576e5` (unwind info unavailable), remaining frames may be incorrect
???:?:?: 0x729ead4cb66a in get_unix_settings_path (../loader/settings.c)
???:?:?: 0x729ead4cbff5 in update_global_loader_settings (../loader/settings.c)
???:?:?: 0x729ead4cee45 in vkEnumerateInstanceVersion (../loader/trampoline.c)
/home/geemili/code/geemili/dynamic_linking_adventures/examples/vulkan_version.zig:28:39: 0x11a35ac in main (vulkan_version.zig)
    switch (vkEnumerateInstanceVersion(&vk_version)) {
                                      ^
/home/geemili/.local/share/ziglang/0.16.0-dev.1220+95c76b1b4/lib/std/start.zig:696:37: 0x11a3d73 in callMain (std.zig)
            const result = root.main() catch |err| {
                                    ^
/home/geemili/.local/share/ziglang/0.16.0-dev.1220+95c76b1b4/lib/std/start.zig:237:5: 0x118b9a1 in _start (std.zig)
    asm volatile (switch (native_arch) {
    ^

I only see two uses of _mi_heap_get_free_small_page in mimalloc’s codebase: src/alloc.c:147 and src/alloc-aligned.c:189.

I did figure out why the reallocation callback was failing, and now I end up with vkCreateInstance returning error_incompatible_driver because dlopen isn’t implemented:

info(dynamic_library_loader): loading: libvulkan.so.1 [/lib64/libvulkan.so.1]
error(dynamic_library_loader):     == TODO: SHT_NOTE: .note.gnu.build-id
error(dynamic_library_loader):     == TODO: SHT_GNU_HASH: .gnu.hash
error(dynamic_library_loader):     == TODO: SHT_FINI_ARRAY: .fini_array
error(dynamic_library_loader):     => TODO: PT_PHDR
error(dynamic_library_loader):         == TODO: DT_SONAME: 0x1c17
error(dynamic_library_loader):         => TODO: DT_FLAGS: 0x8
error(dynamic_library_loader):         => TODO: DT_FLAGS_1: 0x1
error(dynamic_library_loader):         == TODO: DT_SYMENT_OR_ADDRNUM: 0x18
error(dynamic_library_loader):         == TODO: DT_GNU_HASH: 0x2188
error(dynamic_library_loader):     => TODO: PT_GNU_STACK
error(dynamic_library_loader):     => TODO: PT_NOTE
info(dynamic_library_loader): loading: libc.so [/lib64/libc.so]
error(dynamic_library_loader):     == TODO: SHT_NOTE: .note.gnu.build-id
error(dynamic_library_loader):     == TODO: SHT_GNU_HASH: .gnu.hash
error(dynamic_library_loader):     == TODO: SHT_HASH: .hash
error(dynamic_library_loader):     => TODO: PT_PHDR
error(dynamic_library_loader):         => TODO: DT_FLAGS: 0x8
error(dynamic_library_loader):         => TODO: DT_FLAGS_1: 0x1
error(dynamic_library_loader):         == TODO: DT_SYMENT_OR_ADDRNUM: 0x18
error(dynamic_library_loader):         == TODO: DT_GNU_HASH: 0x9d48
error(dynamic_library_loader):         == TODO: DT_HASH_OR_PPC64_NUM: 0xcd8c
error(dynamic_library_loader):     => TODO: PT_GNU_STACK
error(dynamic_library_loader):     => TODO: PT_NOTE
info(dynamic_library_loader): _dl_debug_state
info(dynamic_library_loader): _dl_debug_state
info(dynamic_library_loader): calling init function for libc.so at 0x7ae69a198900 (initial address: 0x57900)
info(dynamic_library_loader): calling init function for libvulkan.so.1 at 0x7ae69a5ba39f (initial address: 0x7e39f)
info(dynamic_library_loader): calling 2 init_array functions for libvulkan.so.1 (0x7f750)
info(dynamic_library_loader): calling init_array[0] for libvulkan.so.1 at 0x7ae69a56eaa0 (initial address: 0x32aa0)
info(dynamic_library_loader): calling init_array[1] for libvulkan.so.1 at 0x7ae69a5986a0 (initial address: 0x5c6a0)
info: vkCreateInstance = fn (*const vulkan_create_instance.vulkan.InstanceCreateInfo, *const vulkan_create_instance.vulkan.AllocationCallbacks, **vulkan_create_instance.vulkan.Instance__opaque_28717) callconv(.c) vulkan_create_instance.vulkan.Result@7ae69a5af0b0
info(vulkan): general: Loader Message(0): No valid vk_loader_settings.json file found, no loader settings will be active
info(vulkan): general: Loader Message(0): Searching for implicit layer manifest files
info(vulkan): general: Loader Message(0):    In following locations:
info(vulkan): general: Loader Message(0):       /etc/xdg/vulkan/implicit_layer.d
info(vulkan): general: Loader Message(0):       /etc/vulkan/implicit_layer.d
info(vulkan): general: Loader Message(0):       /usr/local/share/vulkan/implicit_layer.d
info(vulkan): general: Loader Message(0):       /usr/share/vulkan/implicit_layer.d
info(vulkan): general: Loader Message(0):    Found the following files:
info(vulkan): general: Loader Message(0):       /usr/share/vulkan/implicit_layer.d/VkLayer_MESA_device_select.json
info(vulkan): general: Loader Message(0): Found manifest file /usr/share/vulkan/implicit_layer.d/VkLayer_MESA_device_select.json (file version 1.0.0)
info(vulkan): general: Loader Message(0): Searching for explicit layer manifest files
info(vulkan): general: Loader Message(0):    In following locations:
info(vulkan): general: Loader Message(0):       /etc/xdg/vulkan/explicit_layer.d
info(vulkan): general: Loader Message(0):       /etc/vulkan/explicit_layer.d
info(vulkan): general: Loader Message(0):       /usr/local/share/vulkan/explicit_layer.d
info(vulkan): general: Loader Message(0):       /usr/share/vulkan/explicit_layer.d
info(vulkan): general: Loader Message(0):    Found the following files:
info(vulkan): general: Loader Message(0):       /usr/share/vulkan/explicit_layer.d/VkLayer_MESA_overlay.json
info(vulkan): general: Loader Message(0):       /usr/share/vulkan/explicit_layer.d/VkLayer_INTEL_nullhw.json
info(vulkan): general: Loader Message(0): Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_MESA_overlay.json (file version 1.0.0)
info(vulkan): general: Loader Message(0): Found manifest file /usr/share/vulkan/explicit_layer.d/VkLayer_INTEL_nullhw.json (file version 1.0.0)
info(vulkan): general: Loader Message(0): Searching for driver manifest files
info(vulkan): general: Loader Message(0):    In following locations:
info(vulkan): general: Loader Message(0):       /etc/xdg/vulkan/icd.d
info(vulkan): general: Loader Message(0):       /etc/vulkan/icd.d
info(vulkan): general: Loader Message(0):       /usr/local/share/vulkan/icd.d
info(vulkan): general: Loader Message(0):       /usr/share/vulkan/icd.d
info(vulkan): general: Loader Message(0):    Found the following files:
info(vulkan): general: Loader Message(0):       /usr/share/vulkan/icd.d/nouveau_icd.x86_64.json
info(vulkan): general: Loader Message(0):       /usr/share/vulkan/icd.d/intel_icd.x86_64.json
info(vulkan): general: Loader Message(0):       /usr/share/vulkan/icd.d/virtio_icd.x86_64.json
info(vulkan): general: Loader Message(0):       /usr/share/vulkan/icd.d/radeon_icd.x86_64.json
info(vulkan): general: Loader Message(0):       /usr/share/vulkan/icd.d/lvp_icd.x86_64.json
info(vulkan): general: Loader Message(0):       /usr/share/vulkan/icd.d/intel_hasvk_icd.x86_64.json
info(vulkan): general: Loader Message(0): Found ICD manifest file /usr/share/vulkan/icd.d/nouveau_icd.x86_64.json, version 1.0.1
debug(vulkan): general: Loader Message(0): Searching for ICD drivers named /usr/lib/libvulkan_nouveau.so
warning(dynamic_library_loader): substitutes: unimplemented dlopen called: /usr/lib/libvulkan_nouveau.so, 0x1
error(vulkan): general: Loader Message(0): 
error(vulkan): general: Loader Message(0): loader_icd_scan: Failed loading library associated with ICD JSON /usr/lib/libvulkan_nouveau.so. Ignoring this JSON
info(vulkan): general: Loader Message(0): Found ICD manifest file /usr/share/vulkan/icd.d/intel_icd.x86_64.json, version 1.0.1
debug(vulkan): general: Loader Message(0): Searching for ICD drivers named /usr/lib/libvulkan_intel.so
warning(dynamic_library_loader): substitutes: unimplemented dlopen called: /usr/lib/libvulkan_intel.so, 0x1
error(vulkan): general: Loader Message(0): 
error(vulkan): general: Loader Message(0): loader_icd_scan: Failed loading library associated with ICD JSON /usr/lib/libvulkan_intel.so. Ignoring this JSON
info(vulkan): general: Loader Message(0): Found ICD manifest file /usr/share/vulkan/icd.d/virtio_icd.x86_64.json, version 1.0.1
debug(vulkan): general: Loader Message(0): Searching for ICD drivers named /usr/lib/libvulkan_virtio.so
warning(dynamic_library_loader): substitutes: unimplemented dlopen called: /usr/lib/libvulkan_virtio.so, 0x1
error(vulkan): general: Loader Message(0): 
error(vulkan): general: Loader Message(0): loader_icd_scan: Failed loading library associated with ICD JSON /usr/lib/libvulkan_virtio.so. Ignoring this JSON
info(vulkan): general: Loader Message(0): Found ICD manifest file /usr/share/vulkan/icd.d/radeon_icd.x86_64.json, version 1.0.1
debug(vulkan): general: Loader Message(0): Searching for ICD drivers named /usr/lib/libvulkan_radeon.so
warning(dynamic_library_loader): substitutes: unimplemented dlopen called: /usr/lib/libvulkan_radeon.so, 0x1
error(vulkan): general: Loader Message(0): 
error(vulkan): general: Loader Message(0): loader_icd_scan: Failed loading library associated with ICD JSON /usr/lib/libvulkan_radeon.so. Ignoring this JSON
info(vulkan): general: Loader Message(0): Found ICD manifest file /usr/share/vulkan/icd.d/lvp_icd.x86_64.json, version 1.0.1
debug(vulkan): general: Loader Message(0): Searching for ICD drivers named /usr/lib/libvulkan_lvp.so
warning(dynamic_library_loader): substitutes: unimplemented dlopen called: /usr/lib/libvulkan_lvp.so, 0x1
error(vulkan): general: Loader Message(0): 
error(vulkan): general: Loader Message(0): loader_icd_scan: Failed loading library associated with ICD JSON /usr/lib/libvulkan_lvp.so. Ignoring this JSON
info(vulkan): general: Loader Message(0): Found ICD manifest file /usr/share/vulkan/icd.d/intel_hasvk_icd.x86_64.json, version 1.0.1
debug(vulkan): general: Loader Message(0): Searching for ICD drivers named /usr/lib/libvulkan_intel_hasvk.so
warning(dynamic_library_loader): substitutes: unimplemented dlopen called: /usr/lib/libvulkan_intel_hasvk.so, 0x1
error(vulkan): general: Loader Message(0): 
error(vulkan): general: Loader Message(0): loader_icd_scan: Failed loading library associated with ICD JSON /usr/lib/libvulkan_intel_hasvk.so. Ignoring this JSON
error(vulkan): general: Loader Message(0): vkCreateInstance: Found no drivers!
info: error creating vulkan instance: error_incompatible_driver
error: VkCreateInstanceFailed
/home/geemili/code/geemili/dynamic_linking_adventures/examples/vulkan_create_instance.zig:68:13: 0x11a11f3 in main (vulkan_create_instance.zig)
            return error.VkCreateInstanceFailed;
            ^
run:vulkan_create_instance
└─ run exe vulkan_create_instance failure
error: process exited with error code 1
failed command: ./.zig-cache/o/28691b72e25e0b5370413ccf505aa838/vulkan_create_instance

Build Summary: 1/3 steps succeeded (1 failed)
run:vulkan_create_instance transitive failure
└─ run exe vulkan_create_instance failure

error: the following build command failed with exit code 1:
.zig-cache/o/41f59f69053bb7a7e2e8ec82e6863023/build /home/geemili/.local/share/ziglang/0.16.0-dev.1220+95c76b1b4/zig /home/geemili/.local/share/ziglang/0.16.0-dev.1220+95c76b1b4/lib /home/geemili/code/geemili/dynamic_linking_adventures .zig-cache /home/geemili/.cache/zig --seed 0x4983a16a -Z961077f2d340ff06 run:vulkan_create_instance --color on
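About the reallocation callback mentioned earlier: the post doesn’t say what was wrong with it, so this is only a generic C sketch of the rules a Vulkan pfnReallocation callback has to follow. A NULL original behaves like an allocation, size 0 behaves like a free and returns NULL, contents are preserved up to the smaller of the two sizes, the original stays valid if the call fails, and the requested alignment is honored. The callback is never told the old size, so the sketch records it in a small header in front of each block. The function names and the header are our own; the signatures mirror the Vulkan callback types but use `int` for the allocation scope so that no Vulkan headers are needed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stored just before each aligned block so realloc/free can find it. */
typedef struct {
    void *raw;   /* pointer returned by malloc */
    size_t size; /* size requested by the caller */
} alloc_header;

static alloc_header *header_of(void *p) { return (alloc_header *)p - 1; }

/* alignment is a power of two (Vulkan guarantees this). */
static void *aligned_alloc_with_size(size_t size, size_t alignment)
{
    if (alignment < _Alignof(alloc_header)) alignment = _Alignof(alloc_header);
    if (size > SIZE_MAX - alignment - sizeof(alloc_header)) return NULL;
    void *raw = malloc(size + alignment + sizeof(alloc_header));
    if (raw == NULL) return NULL;
    uintptr_t start = (uintptr_t)raw + sizeof(alloc_header);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    alloc_header *h = (alloc_header *)aligned - 1;
    h->raw = raw;
    h->size = size;
    return (void *)aligned;
}

static void free_fn(void *user, void *p)
{
    (void)user;
    if (p != NULL) free(header_of(p)->raw);
}

static void *alloc_fn(void *user, size_t size, size_t alignment, int scope)
{
    (void)user;
    (void)scope;
    return aligned_alloc_with_size(size, alignment);
}

static void *realloc_fn(void *user, void *original, size_t size,
                        size_t alignment, int scope)
{
    if (original == NULL) return alloc_fn(user, size, alignment, scope);
    if (size == 0) {
        free_fn(user, original);
        return NULL;
    }
    void *fresh = aligned_alloc_with_size(size, alignment);
    if (fresh == NULL) return NULL; /* original must remain valid */
    size_t old = header_of(original)->size;
    memcpy(fresh, original, old < size ? old : size);
    free_fn(user, original);
    return fresh;
}
```

This sketch always allocates a new block and copies the data. An in-place grow would be faster, but the copy keeps the alignment guarantee simple.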

You could install a virtual machine and/or use something like Distrobox.

In case you missed it :slight_smile:

Here’s the source for get_random_secret:

static inline uint64_t get_random_secret()
{
	uint64_t secret = (uintptr_t)&secret * 1103515245;
	for (size_t i=0; libc.auxv[i]; i+=2)
		if (libc.auxv[i]==AT_RANDOM)
			memcpy(&secret, (char *)libc.auxv[i+1]+8, sizeof secret);
	return secret;
}
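get_random_secret walks libc.auxv as (key, value) pairs until it reaches the AT_NULL (0) key. For AT_RANDOM, the value points at 16 random bytes supplied by the kernel, and musl copies bytes 8 to 15. If libc.auxv was never set, the first read of `libc.auxv[i]` dereferences a null pointer, which is consistent with the segfault. Here is a defensive version of the same lookup, written as a C sketch (auxv_lookup is our own helper, not musl code; the AT_* constants come from Linux’s <elf.h>):

```c
#include <stddef.h>
#include <elf.h> /* AT_NULL, AT_RANDOM, ... */

/* auxv is a flat array of (key, value) pairs terminated by key AT_NULL.
 * Returns 1 and stores the value if key is present, 0 otherwise. */
static int auxv_lookup(const size_t *auxv, size_t key, size_t *out)
{
    if (auxv == NULL) return 0; /* get_random_secret has no such guard */
    for (size_t i = 0; auxv[i] != AT_NULL; i += 2) {
        if (auxv[i] == key) {
            *out = auxv[i + 1];
            return 1;
        }
    }
    return 0;
}
```

In a normal process you would call getauxval(AT_RANDOM), but getauxval is itself a libc function, so a libc-free loader has to walk the array on its own.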

This section of code seems likely to be relevant (ldso/dynlink.c:1768):

/* Stage 2b sets up a valid thread pointer, which requires relocations
 * completed in stage 2, and on which stage 3 is permitted to depend.
 * This is done as a separate stage, with symbolic lookup as a barrier,
 * so that loads of the thread pointer and &errno can be pure/const and
 * thereby hoistable. */

void __dls2b(size_t *sp, size_t *auxv)
{
	/* Setup early thread pointer in builtin_tls for ldso/libc itself to
	 * use during dynamic linking. If possible it will also serve as the
	 * thread pointer at runtime. */
	search_vec(auxv, &__hwcap, AT_HWCAP);
	libc.auxv = auxv;
	libc.tls_size = sizeof builtin_tls;
	libc.tls_align = tls_align;
	if (__init_tp(__copy_tls((void *)builtin_tls)) < 0) {
		a_crash();
	}

	struct symdef dls3_def = find_sym(&ldso, "__dls3", 0);
	if (DL_FDPIC) ((stage3_func)&ldso.funcdescs[dls3_def.sym-ldso.syms])(sp, auxv);
	else ((stage3_func)laddr(&ldso, dls3_def.sym->st_value))(sp, auxv);
}

You’re right, libc.auxv seems uninitialized when entering get_random_secret. I need to understand how and when it is called, so that I can call it manually if necessary.

Is src/env/__libc_start_main.c relevant?

EDIT: Hmm, I think ldso/dlstart.c might be more relevant.

Yes, but since we are the dynamic loader here, we cannot call the real ld.so. We basically need to do the minimum amount of work it does to end up with a working libc. For instance, for glibc I added this to the loader:

    if (is_libc_so) {
        const dl_tls_static_size: *usize = @ptrFromInt(rtld_global_ro.address + 704);
        const dl_tls_static_align: *usize = @ptrFromInt(rtld_global_ro.address + 712);
        dl_tls_static_size.* = new_tls_area_desc.block.size;
        dl_tls_static_align.* = new_tls_area_desc.alignment;
    }

(the values are wrong, but it is prototyping, so…)
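For reference, here is how a static TLS size and alignment could be derived from the loaded modules’ PT_TLS segments. This C sketch uses the x86-64 “Variant II” rule from Drepper’s ELF TLS document: blocks are placed downward from the thread pointer, block m starts at TP - offset_m, and offset_m = align_up(offset_{m-1} + size_m, align_m). It is not glibc’s exact computation, because glibc’s dl_tls_static_size also accounts for the TCB and its own surplus. The surplus parameter only mimics the idea of reserving room for libraries loaded later. All names are hypothetical.

```c
#include <stddef.h>

typedef struct {
    size_t size;   /* PT_TLS p_memsz */
    size_t align;  /* PT_TLS p_align, a power of two (0 treated as 1) */
    size_t offset; /* output: the block starts at TP - offset */
} tls_module;

static size_t align_up(size_t x, size_t a) { return (x + a - 1) & ~(a - 1); }

/* Lays out the modules below the thread pointer, in order.
 * Returns the total number of bytes below TP, including the surplus;
 * *max_align receives the strictest alignment (TP must be aligned to it). */
static size_t layout_static_tls(tls_module *mods, size_t n, size_t surplus,
                                size_t *max_align)
{
    size_t offset = 0, amax = 1;
    for (size_t i = 0; i < n; i++) {
        size_t a = mods[i].align ? mods[i].align : 1;
        offset = align_up(offset + mods[i].size, a);
        mods[i].offset = offset;
        if (a > amax) amax = a;
    }
    *max_align = amax;
    return align_up(offset + surplus, amax);
}
```

For example, modules of (size, align) (20, 8), (4, 4) and (16, 16) land at offsets 24, 28 and 48, and a 100-byte surplus rounds the total up to 160 bytes.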

I found this searching for the definition of the libc struct in musl:

struct __libc {
	char can_do_threads;
	char threaded;
	char secure;
	volatile signed char need_locks;
	int threads_minus_1;
	size_t *auxv;
	struct tls_module *tls_head;
	size_t tls_size, tls_align, tls_cnt;
	size_t page_size;
	struct __locale_struct global_locale;
};

#ifndef PAGE_SIZE
#define PAGE_SIZE libc.page_size
#endif

extern hidden struct __libc __libc;
#define libc __libc

I will try to populate auxv and see where it goes.
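The Zig executable is the process the kernel started, so it already has the real auxv on its initial stack and only needs to hand it to libc. musl’s __dls2b receives the same two pointers (sp and auxv). For reference, this C sketch shows where auxv sits in the System V initial stack layout. It assumes sizeof(char *) == sizeof(size_t), as on x86-64, and initial_stack and parse_initial_stack are hypothetical names:

```c
#include <stddef.h>

typedef struct {
    size_t argc;
    char **argv;
    char **envp;
    size_t *auxv;
} initial_stack;

/* sp points at argc on the initial process stack:
 * argc, argv[0..argc-1], NULL, envp[...], NULL, auxv pairs..., AT_NULL */
static initial_stack parse_initial_stack(size_t *sp)
{
    initial_stack s;
    s.argc = sp[0];
    s.argv = (char **)(sp + 1);
    s.envp = s.argv + s.argc + 1; /* skip argv and its NULL terminator */
    char **e = s.envp;
    while (*e != NULL) e++;
    s.auxv = (size_t *)(e + 1);   /* auxv starts right after envp's NULL */
    return s;
}
```

Once the loader has that pointer, populating auxv amounts to storing it in __libc.auxv. That is exactly the step with no public entry point.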

(it is crazy how musl’s source code is wayyy more discoverable than glibc’s…)

This is effectively the function I would like to call :slight_smile:

void __init_libc(char **envp, char *pn)

Well… I see no non-hacky way of initializing the libc structure.

I think I will sleep on it. The more I think about it, the more I’m convinced that your initial idea of implementing a good subset of libc in Zig land is the way to go. If we can load a dynamic library, relocate it fully, and map its TLS block correctly, all in pure Zig, then skipping libc when it is a dependency and providing the needed symbols from Zig may be the most elegant solution (I saw in your repo that you were heading in that direction). At least it feels less like a dirty hack and more portable (as in less dependent on the specific libc required), which was my initial goal. I got distracted by the thrill of believing I was conquering libc.

So, I think my next step will be: “this undefined symbol should come from libc; here is its Zig implementation.” Hopefully, I will come up with a way to avoid embedding a full libc in a static executable that wants to load dynamic libraries… And I still have this proof of concept that seems to work with glibc.

Thanks for your help :slight_smile:


This topic is crazy fascinating to me, as someone who digs stuff like this and OS kernels and such. Is there a chance we could (probably way down the road) do something like this for Windows? I know it’s possible to develop apps targeting it without the Windows C runtime; it’s just extremely painful to actually do (which is really more a problem of the Win32 API than of anything else). I’m mostly asking because I am primarily a Windows user out of necessity. Well, and I’m curious. We obviously couldn’t replicate the Windows library-loading logic entirely, since it’s closed-source, but we could use LoadLibrary and such and delegate to it. I’m definitely following this topic; it’ll be really cool to find out where it goes!

I’m afraid I lack the technical knowledge to provide accurate information, as I’m not a Windows user. IIUC there’s no such thing as a truly static executable on Windows, right? I was under the impression that cross-compiling and distributing for Windows was far easier than distributing software that works across all Linux distributions.
Are there cases where it’s not possible to use LoadLibrary? Or maybe I’m missing something?

Just a last update concerning musl.

I recompiled musl with this addition:

diff --git a/original/musl-1.2.5/ldso/dynlink.c b/musl-1.2.5/ldso/dynlink.c
index 324aa85..bfc5140 100644
--- a/original/musl-1.2.5/ldso/dynlink.c
+++ b/musl-1.2.5/ldso/dynlink.c
@@ -1767,6 +1767,14 @@ hidden void __dls2(unsigned char *base, size_t *sp)
        else ((stage3_func)laddr(&ldso, dls2b_def.sym->st_value))(sp, auxv);
 }
 
+extern void __pre_dls2b(size_t *auxv)
+{
+       search_vec(auxv, &__hwcap, AT_HWCAP);
+       libc.auxv = auxv;
+       libc.tls_size = sizeof builtin_tls;
+       libc.tls_align = tls_align;
+}
+
 /* Stage 2b sets up a valid thread pointer, which requires relocations
  * completed in stage 2, and on which stage 3 is permitted to depend.
  * This is done as a separate stage, with symbolic lookup as a barrier,
@@ -1775,13 +1783,11 @@ hidden void __dls2(unsigned char *base, size_t *sp)
 
 void __dls2b(size_t *sp, size_t *auxv)
 {
+       __pre_dls2b(auxv);
+
        /* Setup early thread pointer in builtin_tls for ldso/libc itself to
         * use during dynamic linking. If possible it will also serve as the
         * thread pointer at runtime. */
-       search_vec(auxv, &__hwcap, AT_HWCAP);
-       libc.auxv = auxv;
-       libc.tls_size = sizeof builtin_tls;
-       libc.tls_align = tls_align;
        if (__init_tp(__copy_tls((void *)builtin_tls)) < 0) {
                a_crash();
        }

And I confirmed that calling this function after loading the library allowed me to get the correct Vulkan version and to reach the dlopen calls in libvulkan.so.1. This is obviously not a good solution, as it requires an upstream change, but I thought you might be interested. Since the resulting libc.so has no dependencies, it can almost be considered “shippable” (it is a binary, so not really).

I updated the gist to include the call if the __pre_dls2b symbol is found, and a correction to accommodate the fact that musl does not have initial TLS data.
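The “call it only if the symbol exists” logic follows the same idea as skipping __libc_early_init on musl. The actual loader is written in Zig; this C sketch uses a flat symbol table as a hypothetical stand-in for the loader’s real hash-table lookup (sym_entry, find_symbol and call_optional_auxv_hook are our own names):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of a library's exported symbols. */
typedef struct {
    const char *name;
    void *addr; /* relocated address */
} sym_entry;

static void *find_symbol(const sym_entry *syms, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(syms[i].name, name) == 0) return syms[i].addr;
    return NULL;
}

/* Calls an optional void(size_t *auxv) initializer if the library exports it.
 * Returns 1 if it ran, 0 if the symbol is absent (stock musl, or glibc). */
static int call_optional_auxv_hook(const sym_entry *syms, size_t n,
                                   const char *name, size_t *auxv)
{
    void *addr = find_symbol(syms, n, name);
    if (addr == NULL) return 0;
    /* object-to-function pointer conversion, as POSIX permits for dlsym() */
    void (*hook)(size_t *) = (void (*)(size_t *))addr;
    hook(auxv);
    return 1;
}
```

Once libc.so is loaded and relocated, the loader would call call_optional_auxv_hook(syms, n, "__pre_dls2b", auxv). With an unpatched musl the call is a no-op.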

@geemili, I also created a GitHub repository with the POC organized as a project (the gist will stay as it is from now on). Collaboration to make it work on musl and to implement libc’s functions would be greatly appreciated, but there’s absolutely no obligation.


No, you can make a static binary on Windows. And yes, we can use the Windows library loader, even if its logic is opaque. Making a library without libc is annoying but doable (I think we would need to include a couple of symbols for stack probes and such), but other than that the Win32 API can do almost everything libc provides and then some. So it would just come down to having the Zig stdlib call only Win32 APIs and never, ever call into libc. Windows does have some weird problems and conventions that msvcrt normally takes care of (e.g., the WinMain/main distinction), but I think we can sidestep those?

I’ll just note that we already have full support for producing libc-less binaries for Windows. In fact, we don’t link libc by default.


I can now open an X11 window from a non-libc statically linked executable :slight_smile:

Here is the interesting part:


    const lib_x11 = try dll.load(allocator, "libX11.so.6");

    const xOpenDisplay: *Xlib.XOpenDisplay = @ptrFromInt((try lib_x11.getSymbol("XOpenDisplay")).addr);
    const xDefaultScreen: *Xlib.XDefaultScreen = @ptrFromInt((try lib_x11.getSymbol("XDefaultScreen")).addr);
    const xRootWindow: *Xlib.XRootWindow = @ptrFromInt((try lib_x11.getSymbol("XRootWindow")).addr);
    const xCreateSimpleWindow: *Xlib.XCreateSimpleWindow = @ptrFromInt((try lib_x11.getSymbol("XCreateSimpleWindow")).addr);
    const xBlackPixel: *Xlib.XBlackPixel = @ptrFromInt((try lib_x11.getSymbol("XBlackPixel")).addr);
    const xWhitePixel: *Xlib.XWhitePixel = @ptrFromInt((try lib_x11.getSymbol("XWhitePixel")).addr);
    const xMapWindow: *Xlib.XMapWindow = @ptrFromInt((try lib_x11.getSymbol("XMapWindow")).addr);
    const xFlush: *Xlib.XFlush = @ptrFromInt((try lib_x11.getSymbol("XFlush")).addr);
    const xCloseDisplay: *Xlib.XCloseDisplay = @ptrFromInt((try lib_x11.getSymbol("XCloseDisplay")).addr);

    const display = xOpenDisplay(null) orelse return error.UnableToCreateDisplay;
    const screen = xDefaultScreen(display);
    const root = xRootWindow(display, screen);
    const window = xCreateSimpleWindow(display, root, 100, 100, 400, 300, 1, xBlackPixel(display, screen), xWhitePixel(display, screen));

    _ = xMapWindow(display, window);
    _ = xFlush(display);

    try std.Io.sleep(io, .fromSeconds(3), .awake);

    _ = xCloseDisplay(display);

Again, this is tested only on my machine with my glibc version, but still. The std.Io.sleep call, which uses Zig’s initial TLS area, works with both the LLVM and the self-hosted backends.

I decided to defer the dlopen replacement implementation for now, as I often hit bugs that are hard to debug.

The next step is starting Zig threads, which requires cooperating with the std.Thread.LinuxThreadImpl.spawn implementation (because the pthread struct wants to be at TP, but the TLS area descriptor knows nothing about it).
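On x86-64, the thread pointer (fs_base) points at a TCB whose first word must contain its own address, so that `mov %fs:0` yields TP. glibc’s pthread struct begins with a tcbhead_t header. The C struct below mirrors the first fields of x86-64 glibc’s tcbhead_t from memory, and should be treated as an assumption to check against sysdeps/x86_64/nptl/tls.h for your glibc version. It shows why “self” sits at offset 16 and why the stack protector canary is read from %fs:0x28. The names tcb_header and init_tcb_header are hypothetical, and the offsets only hold on x86-64.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed mirror of the start of x86-64 glibc's tcbhead_t. */
typedef struct {
    void *tcb;             /* offset 0: points to the TCB itself (= TP) */
    void *dtv;             /* offset 8: dynamic thread vector */
    void *self;            /* offset 16: also points to the TCB (= TP) */
    int multiple_threads;  /* offset 24 */
    int gscope_flag;       /* offset 28 */
    uintptr_t sysinfo;     /* offset 32 */
    uintptr_t stack_guard; /* offset 40: -fstack-protector reads %fs:0x28 */
} tcb_header;

_Static_assert(offsetof(tcb_header, self) == 16, "self expected at offset 16");
_Static_assert(offsetof(tcb_header, stack_guard) == 0x28, "canary at fs:0x28");

/* What a loader must do before installing tp as the new fs_base. */
static void init_tcb_header(tcb_header *tp)
{
    tp->tcb = tp;
    tp->self = tp;
}
```

Code that caches these absolute pointers is one reason the thread pointer cannot simply be moved later.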

EDIT:

screenshot of the window appearing, for the glory, and to promote the nnd debugger which was so useful :slight_smile:


That’s because on Windows libc is implemented cleanly on top of the Win32 API (and not the other way around or a mix like quite a few UNIX systems do stuff).


Another update:

I can now load libEGL.so and libGL.so. That means I can send OpenGL commands to an X11 window (itself coming from a dynamically loaded libX11.so.6). I can even load a libraylib.so file downloaded from GitHub and start making a static, non-libc executable game :slight_smile: (libraylib.so will still use libc, though).

Here is a curated list of the new things it took:

  • Intercepting and executing dl functions in zig, like dlopen, dlsym, dladdr, etc. (all these functions have an equivalent in the lib anyway)
  • Implementing TLSDESC relocations, and the related resolver with its unique calling convention
  • Discovering that C++ unwinding calls non-public ld functions, like _dl_find_object or _dl_find_dso_for_object, so they need an implementation too
  • Understanding that changing the thread pointer (fs_base) to resize the static TLS area will not work reliably when pthread is involved, because pthread stores and uses the absolute value of the thread pointer in quite a lot of places; so the first time the TLS area is restructured (before any thread is created), a surplus is requested to make room for future libraries’ static TLS data (note that the pedantic behavior here should really be to refuse to load a dynamic library that has static TLS)
  • Understanding that IFUNC symbols are not always subject to an IRELATIVE relocation, so they need to be resolved when referenced
  • Making a “bridge” between pthread functions and std.Thread functions; for instance, pthread_create calls std.Thread.spawn (it is still incomplete as only pthread_create and pthread_join are implemented)

… and many other details that seem irrelevant until you crash at a random memory address.
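Regarding the IFUNC point: a symbol of type STT_GNU_IFUNC has an st_value that points at a resolver, not at the implementation. Any reference to it therefore has to call the resolver, whether or not an IRELATIVE relocation is involved. Here is a minimal C sketch using the types from <elf.h> (resolve_symbol_address is our own name). It assumes the x86-64 convention of calling the resolver with no arguments; other architectures, such as AArch64, pass hwcap information to it.

```c
#include <elf.h>
#include <stdint.h>

typedef void *(*ifunc_resolver)(void);

/* Returns the address that callers of `sym` should actually use. */
static uintptr_t resolve_symbol_address(const Elf64_Sym *sym, uintptr_t load_bias)
{
    uintptr_t addr = load_bias + sym->st_value;
    if (ELF64_ST_TYPE(sym->st_info) == STT_GNU_IFUNC) {
        /* st_value locates the resolver; calling it picks the implementation */
        ifunc_resolver resolver = (ifunc_resolver)addr;
        addr = (uintptr_t)resolver();
    }
    return addr;
}
```

glibc’s own resolvers consult CPU feature data inside the library, so they presumably need that part of libc to be initialized before the first call.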

The pthread bridging was a deliberate decision: because of how TLS works, threads cannot be spawned by two different actors, so I chose to put Zig in charge. I have to say I feel that at least half of libc’s complexity comes from threads and TLS, and my opinion that more of it should be handled by a monolithic kernel (like Linux) grows stronger every day. This strange relationship between the kernel and libc is a bad idea IMO, and maybe I’m starting to understand what @KilianHanich says:

So far, patching the zig standard library hasn’t been needed, which is good news.

Side note: libLLVM.so.19.1 is really big… In the raylib example, it accounts for about 70% of the loading time (~50 libs are loaded).

@geemili, I’d like to focus more on musl-based systems now, so I will set up a Chimera machine and see where it goes. In the meantime, feel free to test the vulkan_musl and vulkan_advanced_musl examples from the repo.


This is starting to remind me of my adventures writing a glibc ↔ bionic libc translation layer… https://github.com/Cloudef/android2gnulinux (the project has since been forked and is maintained at https://gitlab.com/android_translation_layer/bionic_translation)

Interesting read and nice work. I wonder how stable all this ends up being in the end.


We’ll see, but I will try not to fall into the “my machine only” trap.

For now, the “specific, non-standard” things are:

  • Writing auxv, tls_static_size, and tls_static_align directly into the rtld_global_ro struct of glibc
  • Writing auxv, tls_static_size, and tls_static_align directly into the __libc struct of musl
  • Writing the current fs_base value directly into the field at offset 16 of the pthread struct
  • _dl_find_object and _dl_find_dso_for_object are not part of libc’s public API, so implementations can break
  • Manually calling glibc’s __libc_early_init
  • Manually calling glibc’s __ctype_init
  • Calling the first entry of glibc’s init_array with args (the spec says those functions take no args)

Also, it’s almost certain that some part of libc remains uninitialized :slight_smile:
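Since _dl_find_object has to be provided by the loader, its core is a range lookup: given a code address (typically from an unwinder), find the loaded object whose mapping contains it and return data such as its PT_GNU_EH_FRAME address. Here is a C sketch over a sorted array of hypothetical loader records. The loaded_object struct and find_object are our own names; glibc’s real interface uses struct dl_find_object and returns 0 or -1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record the loader keeps for each mapped object. */
typedef struct {
    uintptr_t map_start;  /* lowest mapped address */
    uintptr_t map_end;    /* one past the highest mapped address */
    const void *eh_frame; /* PT_GNU_EH_FRAME address, what unwinders need */
} loaded_object;

/* objs must be sorted by map_start, with non-overlapping ranges.
 * Returns the object containing pc, or NULL if pc is in no object. */
static const loaded_object *find_object(const loaded_object *objs, size_t n,
                                        uintptr_t pc)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < objs[mid].map_start) hi = mid;
        else if (pc >= objs[mid].map_end) lo = mid + 1;
        else return &objs[mid];
    }
    return NULL;
}
```

The binary search keeps the lookup cheap even with ~50 libraries loaded, which matters because unwinders may call it once per frame.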

2 Likes