Lost and confused about rpath, dlopen, link time

Hi there,

I thought I had a basic but good enough understanding of dlopen/LoadLibrary, rpath, LD_LIBRARY_PATH … Turns out I was very wrong: I have no idea what I’m doing.

I am trying to write a very small cross-platform (macOS, Linux, Windows) sample to use OpenGL ES 3.0 through Google Angle with zig.

The repo is here: https://github.com/mpalomas/angle-gles.git

To achieve that I use native libraries: Google Angle (prebuilt binaries + headers), GLFW and Imgui with its OpenGL backend (both fully built in build.zig).

If you have time and run zig build run you should at least see some logs about the OpenGL ES version, and a white window. And if you are on Windows, you will also see a basic imgui window.

And that’s my problem: despite my effort, I am struggling with OS differences regarding RPath and potentially even link time… Basically I can only get the imgui part to build and run on Windows. On Linux and MacOS, it triggers link errors related to all OpenGL functions used.

Meaning, if you are on Linux or MacOS, in main.zig, you can comment out the comptime if around line 109 so cImGui_ImplOpenGL3_InitEx gets called: it will no longer build, you will get link errors.

I am puzzled because in principle, imgui is supposed to use dlopen to load symbols for the OpenGL ES libs coming from Angle (deployed in bin alongside the exe). And it works on Windows ?!

GLFW also works this way, and since you can see the window and the logs, it definitely works. To make it work, I had to adjust rpath (see build.zig) but nothing fancy:

if (target.result.os.tag == .macos) {
    exe_mod.addRPathSpecial("@executable_path/.");
} else if (target.result.os.tag == .linux) {
    exe_mod.addRPathSpecial("$ORIGIN");
}

But for the imgui lib, I cannot get it to work. It seems like imgui is looking for the OpenGL symbols at link time but only on MacOS and Linux? I must have missed something, either in my build.zig, or a macro to define…

Since what I get are link errors, I actually tried to linkSystemLibrary libGLESv2. That solves my link errors… but then it fails to run because it cannot find the lib! Despite my rpath adjustment.

Anyway, I am not asking you to spend time and debug for me, but instead:

  • How would you debug this kind of link or runtime issue? Any guideline, advice?
  • Maybe some zig compiler options to get more verbose details?
  • Any cli tool I could use to analyze the binaries (check rpath…) ?

Thank you in advance!

I’ll answer from a mostly Linux perspective, since mostly-Linux is what I work with.

Any cli tool I could use to analyze the binaries (check rpath…) ?

readelf can be used for getting detailed information about binaries on Linux. The -d switch gets you dynamic linkage information. You should be able to see the rpath and the expected location of any linked dynamic libraries.

On Windows I quite like Dependencies. It has both a CLI and a GUI.

Maybe some zig compiler options to get more verbose details?

There is --verbose and --verbose-link to make Zig print the relevant commands it forwards to system tools, though I’m not sure how useful that is.

How would you debug this kind of link or runtime issue? Any guideline, advice?

How are you building and running code on the different platforms? Different physical machines? VMs? Docker? It’s hard to give advice without knowing those details.

I’m rather fond of starting from no dependencies in a chroot jail and then resolving link errors as they come, but admittedly that’s a hassle to set up for GUI applications.

Thanks for the suggestions.

Windows is actually the only OS where it works as expected, strangely :rofl: .

I will try the tools and options you mentioned. I develop and build on physical machines yes: a MacBook Pro M1, a workstation under Ubuntu Linux, and my professional laptop on Windows.

Just an example of link errors on MacOS (very similar errors on Linux and… no error on Windows)

error: undefined symbol: _glActiveTexture
    note: referenced by /Users/mpalomas/dev/zig/angle-gles/.zig-cache/o/85545b53d9a1de73a2a5434464a8242d/libdcimgui.a(imgui_impl_opengl3.o):__Z32ImGui_ImplOpenGL3_RenderDrawDataP10ImDrawData

This glActiveTexture function, and all the others, are supposed to be loaded at runtime, not linked/resolved during the build.

This glActiveTexture function, and all the others, are supposed to be loaded at runtime, not linked/resolved during the build.

If symbols exist in the user code they must exist at link time. A function called through dlopen would show up as an indirect call through a handle, so wouldn’t usually have its own symbol (excuse the language):

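void *handle = dlopen("libname.so.2", ...); // load the library
void (*fn)(void) = (void (*)(void))dlsym(handle, "function_name"); // look up the address of function_name
fn(); // call function_name() indirectly through the pointer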

If the library isn’t present, you should get a runtime error rather than a build-time error.
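To make that concrete, here is a slightly fuller sketch of the same pattern with error handling. It is only an illustration: call_from_library and unary_fn are made-up names, and it assumes the looked-up function has the signature double f(double). Note that the object file contains no reference to the target function at all, only to dlopen, dlsym, dlerror and dlclose, so a missing library can only show up at runtime.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumed signature of the function we look up: double f(double). */
typedef double (*unary_fn)(double);

/* Hypothetical helper: load `lib`, look up `sym`, call it with `x`
 * and store the result in *out. Returns 0 on success, -1 on failure. */
static int call_from_library(const char *lib, const char *sym,
                             double x, double *out) {
    /* Failure here is a runtime error, not a link error. */
    void *handle = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    /* dlsym only returns an address; nothing is called yet. */
    unary_fn fn = (unary_fn)dlsym(handle, sym);
    if (fn == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    *out = fn(x);   /* indirect call through the pointer */
    dlclose(handle);
    return 0;
}
```

On a glibc-based Linux you could try it with something like call_from_library("libm.so.6", "cos", 0.0, &result); the library name is an assumption about your system, and older glibc versions also need -ldl on the link line. Passing a library name that does not exist makes the helper return -1 at runtime while the build itself still succeeds.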

I’m not intimately familiar with any of the libraries, but maybe there’s a compilation switch for the libraries you’re missing on Linux/MacOS to exclude the symbols when they’re being runtime-loaded that isn’t required for Windows builds?

Yes, I’m on the same page. I’m thinking there must be a compilation switch somewhere with a different default on Windows vs the others.
I will do more investigation over the weekend.
My backup plan is to rewrite entirely in Zig the Imgui OpenGL backend, which is causing trouble.
Will post my findings and updates at the end of the week.

This should happen automatically. Perhaps OP is inadvertently calling these functions by name instead of by pointer, which is causing the linker to include them. Presumably, on Windows, the linker found these functions to include.

Ok guys I’ve figured it out! First thanks a lot, I needed different ideas and perspective on the issue.
So, my build.zig was correct, and it was indeed a “compilation switch”, i.e. the “IMGUI_IMPL_OPENGL_ES3” macro that I added with addCMacro.
The problem is that, in the imgui backend, this switch is actually rather confusing: it is guarded by tons of other #defines, nested and hard to figure out.
Turns out that, for some reason, with this switch it would use a “loader” storing function pointers, resolved at runtime, only on Windows. On Linux and MacOS, it would instead directly #include some GL headers that declare the functions, so they were expected to be found at link time.
A bit weird and unexpected, but everything is working as expected now!
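For anyone landing here later, the difference between the two modes can be sketched in plain C. This is not the actual imgui code: DEMO_DIRECT_DECLARATIONS, demo_init and demo_cos are made-up names, and cos from libm stands in for a GL function such as glActiveTexture. The library name libm.so.6 is an assumption that holds on glibc-based Linux.

```c
#include <dlfcn.h>
#include <stdio.h>

#ifndef DEMO_DIRECT_DECLARATIONS
/* Loader mode: the object file only holds a function pointer.
 * There is no undefined "cos" symbol for the linker to resolve. */
typedef double (*cos_fn)(double);
static cos_fn demo_cos = NULL;

/* Resolve the pointer at runtime; returns 1 on success, 0 on failure. */
static int demo_init(void) {
    void *lib = dlopen("libm.so.6", RTLD_NOW);
    if (lib == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 0;
    }
    demo_cos = (cos_fn)dlsym(lib, "cos");
    return demo_cos != NULL;
}
#else
/* Direct mode: a plain declaration, as a GL header would provide.
 * Every call leaves an undefined symbol that must be satisfied
 * at link time (here by linking with -lm). */
extern double cos(double);
#define demo_cos cos
static int demo_init(void) { return 1; }
#endif
```

Calling code looks identical in both modes (demo_init() once, then demo_cos(x)), which is why the switch is easy to miss. Built as-is, only dlopen and dlsym need to be available at link time; built with -DDEMO_DIRECT_DECLARATIONS, the link step needs the library that provides cos, which is the same kind of undefined-symbol situation I was hitting on Linux and MacOS.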
