Debugging and allocating memory in code loaded from dynamic libraries

Hello everyone, I’m trying to play around with basic hot-reloading through dynamic libraries.

I’m struggling a bit with debugging crashes whenever I call a dynamically loaded function with any allocation at all. Even debug printing crashes for some reason. However, I’m able to call the function fine if for example it returns a simple u8.

Can anyone provide any guidance on what I might be missing? For context, I’m using a modified version of the default build script for the exe and shared library. The only major modifications are removing the unneeded parts (the static lib for the exe, and the exe for the shared library) and swapping out addStaticLibrary for addSharedLibrary. Stepping through the code in gdb doesn’t work either. I’m using the latest available Zig version (0.12.0-dev.1856+94c63f31f).

Here’s my main program:

const std = @import("std");
const Dynlib = std.DynLib;

pub fn main() !void {
    var lib = try Dynlib.open("./foo/zig-out/lib/libfoo.so");

    const fn_test: *fn () u8 = lib.lookup(*fn () u8, "_test").?;
    std.debug.print("{?}\n", .{fn_test});

    const ret = fn_test();
    std.debug.print("{?}\n", .{ret});

    lib.close();
}

And here’s my test library:

const std = @import("std");

export fn _test() u8 {
    // Calling std.debug.print crashes!
    return 69;
}

When you export a function, it automatically becomes callconv(.C), and function pointers should be const in Zig, so the correct type for the call to lookup is:

*const fn () callconv(.C) u8

Changing this does not solve the issue though. I’m not sure what is going on tbh, but I also hadn’t used std.DynLib before this. Hopefully someone more knowledgeable can weigh in.

The Zig standard library test doesn’t seem to be doing anything special, but it is also just loading and executing a simple arithmetic function.

I was able to walk through the dynamic library code in gdb by stepping through assembly instructions one at a time. Calling std.os.write(2, "hello\n") from within _test did seem to invoke the correct system call, but the pointer it passed as the text buffer to write did not point to the string “hello\n” at runtime.

Below is another example of unusual behavior with string literals.

// foo.zig
const array_str = "hello".*;

export fn foo() [*:0]const u8 {
    return &array_str;
}

export fn foo2() [*:0]const u8 {
    return "hello";
}

// main.zig
const std = @import("std");
const Dynlib = std.DynLib;
const Foo = *const fn () callconv(.C) [*:0]const u8;

pub fn main() !void {
    var lib = try Dynlib.open("./libfoo.so");
    defer lib.close();

    const foo = lib.lookup(Foo, "foo").?;
    const str = foo();
    std.debug.print("str: {s}\n", .{str[0..5]});

    const foo2 = lib.lookup(Foo, "foo2").?;
    const str2 = foo2();
    std.debug.print("str2: {s}\n", .{str2[0..5]});
}

> zig build-lib -dynamic foo.zig
> zig build-exe main.zig
> ./main
str: hello
str2:

Clearly something is wrong, or I am doing something very wrong.


So I tried recreating the code in C and it worked perfectly. Loading the C library from the Zig program, however, failed with an ElfHashTableNotFound error. Searching for the cause of this error, I found this comment: DynLib fails to open libGL.so · Issue #5360 · ziglang/zig · GitHub, which recommends linking libc so that its dlopen function is used instead of Zig’s.

Doing so made both the C and Zig libraries work fine (including stepping through them in GDB). I can see from the discussion that Zig’s dlopen expects some different things than the libc implementation does, but I’m not quite sure why the code blows up when both ends are Zig.
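
For reference, here is a minimal sketch of what the libc route looks like from a C host, assuming a POSIX system that provides dlopen and dlsym. The helper name call_int_symbol is hypothetical, and this is not the exact program used in the test described above; it only illustrates the open, lookup, call, close sequence that libc performs.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open the shared library at `path`, look up `symbol`,
 * call it as `int (*)(void)`, and store the result in `*out`.
 * Returns 0 on success, -1 if the library or the symbol cannot be found. */
int call_int_symbol(const char *path, const char *symbol, int *out) {
    /* RTLD_NOW resolves all undefined symbols before dlopen returns,
     * so relocation failures are reported here rather than at call time. */
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return -1;
    }

    /* POSIX allows converting the void * from dlsym to a function pointer. */
    int (*fn)(void) = (int (*)(void))dlsym(handle, symbol);
    if (fn == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    *out = fn();
    dlclose(handle);
    return 0;
}
```

With the libraries from this thread, the call would look something like call_int_symbol("./foo/zig-out/lib/libfoo.so", "_test", &ret). On glibc versions older than 2.34 the program needs to be linked with -ldl.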

For the sake of completeness, here’s the code I tested with:

const std = @import("std");

export fn _test() c_int {
    std.debug.print("Foo\n", .{});

    var allocator = std.heap.GeneralPurposeAllocator(.{}){};
    const alloc = allocator.allocator();

    const bytes = alloc.alloc(u8, 1024) catch return -1;
    std.debug.print("Just allocated 1024 bytes\n", .{});
    alloc.free(bytes);
    std.debug.print("Just freed 1024 bytes\n", .{});

    return 69;
}

/* C version of the library */
#include <stdio.h>

int foo(void) {
    printf("Hello from C\n");
    return 69;
}

Anyway, I think things could be cleared up much more with a practical example somewhere (docs, zig news, ???). I’m down to write it myself, but I still don’t understand the root of the problem. Any pointers from someone knowledgeable are appreciated.
