Zig-to-Zig Dynamic Libraries/Modules

Am I right in understanding that the engine and game code in this case will be a unified whole?
And thanks to various compilation optimizations, we won’t have to rebuild the entire project but will only replace small parts, correct?

Honestly, I’d be very happy about that, especially knowing that hot reloading is often a headache for current major game engines. This is mainly because they implement it by splitting the code into dynamic libraries that get reloaded after changes are made. As one might guess, this is not entirely seamless because it creates a clear boundary between what gets updated and what doesn’t. Perhaps at the compiler level, this process can be made smoother, as the compiler can more precisely control the generated and replaced machine code. In contrast, a library is a large and monolithic piece of code, and the replacement is usually done by swapping the addresses of called functions (as far as I understand).
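
To make the "swapping the addresses of called functions" part concrete, here is a minimal in-process sketch. It assumes nothing about how any particular engine does it: `State`, `Host`, `updateV1` and `updateV2` are made-up names, and a real reload would get the new pointer from a freshly loaded library rather than from the same binary.

```zig
const std = @import("std");

// Hypothetical game state owned by the host; it survives a reload.
const State = struct { frame: u32 = 0, score: u32 = 0 };

// The host only ever calls game code through this pointer type.
const UpdateFn = *const fn (state: *State) void;

fn updateV1(state: *State) void {
    state.frame += 1;
    state.score += 1;
}

// Stand-in for the same function after an edit and a rebuild.
fn updateV2(state: *State) void {
    state.frame += 1;
    state.score += 10;
}

const Host = struct {
    state: State = .{},
    update: UpdateFn = &updateV1,

    // "Reloading" here is just repointing the call target; state is untouched.
    fn reload(self: *Host, new_update: UpdateFn) void {
        self.update = new_update;
    }

    fn tick(self: *Host) void {
        self.update(&self.state);
    }
};

pub fn main() void {
    var host: Host = .{};
    host.tick(); // runs updateV1
    host.reload(&updateV2);
    host.tick(); // runs updateV2
    std.debug.print("frame={d} score={d}\n", .{ host.state.frame, host.state.score });
}
```

The boundary mentioned above is visible even in this toy: everything behind `update` can change, while `State` and `Host` cannot change shape without restarting.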

If Zig manages to implement smooth hot code reloading, it will be a huge win.

As for me, I just used a game engine as an example. In reality, I am currently working on an OS kernel and it seems I’ve come to the conclusion of implementing a small API generator that will generate all the bindings from the source code by traversing the AST. Thanks to Zig for providing std.zig.Ast in the standard library. Then, I’ll simply include this tool in my build.zig to seamlessly integrate it into the project. I think this is the best thing I can do to achieve development convenience without creating lots of hacks and dirty solutions.

There seems to be some cross-talk here about what’s wanted, so let me try a worked example.

Consider a Zig application, let’s make it an image editor. This is for end users, and it’s never reasonable to expect end users to run a compiler.

Image editing is inherently open-ended, there’s always another exotic file type to convert to and from, filters are an unbounded set, and so on. Also, many, even most, of these features are computationally demanding, so it’s not so simple as embedding Lua in the binary and calling it done.

So that’s where plugins come into the picture. End users can download them and drop them in a folder, and it Just Works. Since this is a Zig-native program, it’s using idiomatic Zig: structs aren’t extern, there are slices everywhere, tagged unions, and so on.

So exposing all of the surface area through the C ABI is just going to be painful. It’s at best a bunch of extra shrink-wrapping and ceremony, which has to be carried out on the other side of the system boundary.

It’s right to want something better than this, and Zig should have it. But I don’t think the language is ready. The compiler team needs basically complete freedom to compile Zig into any object code which fulfills the language semantics, and I think that will remain true for who knows how long, but a while longer yet.

I agree that most things which are done using runtime linkage should be statically compiled instead, but not everything. But I also see it as a niche, and not an urgent feature to add to the language, so in the meantime those who want to architect code this way are going to have to go through the C ABI, and get used to writing ptr: [*]const u8, len: usize, a lot.

Definitely a part of any respectable Zig Maximalist platform, however.
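
For readers who haven’t done the pointer-plus-length dance yet, this is roughly what it looks like. It is only a sketch: `plugin_sum_bytes` and `sumBytes` are invented names, and both sides live in one file here so that it can be run as-is.

```zig
const std = @import("std");

// Hypothetical plugin entry point. `export` gives it the C calling convention
// and an unmangled symbol name. A `[]const u8` parameter is not accepted in a
// C-callconv function, hence the pointer + length pair.
export fn plugin_sum_bytes(ptr: [*]const u8, len: usize) u32 {
    const bytes: []const u8 = ptr[0..len]; // rebuild the slice on this side
    var total: u32 = 0;
    for (bytes) |b| total +%= b;
    return total;
}

// What the caller has to do on its side of the boundary: take the slice apart.
fn sumBytes(bytes: []const u8) u32 {
    return plugin_sum_bytes(bytes.ptr, bytes.len);
}

pub fn main() void {
    std.debug.print("{d}\n", .{sumBytes("abc")});
}
```

The ceremony is small for one function, but it repeats for every slice, optional and tagged union that crosses the boundary.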

Yes, an image editor that has to deal with many data formats is very similar to what I’ve described, a data uploader. When some new data format appears, I write one more decoder, set up one more instance of the uploading service (which loads exactly this new plugin) and voilà.

Do I understand correctly that this is about some zig-specific dynamic library format?

To expand on that a bit, the language itself is not stable, and won’t be until 1.0. Committing to a stable ABI and calling convention would be premature.

Preparing an object file for runtime linkage is just a different process from statically linking a bunch of code into one binary. Do that too early and the team would have to carry that complexity forward at a time when internals are rapidly changing (incremental compilation, for one thing).

Right now, we can make shared libraries, but have to use export and the C ABI. That’s an acceptable status quo, as far as I’m concerned.

Why? (seriously asking, I presented reasons why I believe it’s a good default and I might well be wrong. What’s your argument?)

To me that sounds like you made an argument about there being two things one could have talked about. If they are the same, then what’s the difference?

Agree. Except for “right now”.

I don’t know of any Zig feature that would not work with the platform shared library/DLL formats.

The examples, insofar as I understood them correctly, were all about how clumsy or redundant it is to declare external interfaces. If that needs fixing, I believe the Zig syntax could be fixed without affecting the format of shared libraries.

A shared library is a build artifact, just like an executable. There you have to handle argv, envp, and exit codes (+ stuff), here flat functions and constants. If Zig can easily interface with and produce such a type of artifact I cannot see why it should create its own format, if all that this would do is to support syntax in Zig: A member function in a struct is just syntax, the struct is still just a struct (no vtables) and the function is just a function (no lambda context capturing or other magic).

The only thing that seems to be different between C and Zig is alignment and calling conventions, and that also differs between C and C (clang vs gcc, …) in subtle ways.

Such a new format would just create friction that would make it harder to do interop and I don’t see how that would be a selling point for Zig. If it’s used, you lock out C and others, if it’s not used it makes Zig more complex. Where is the win?

The names are different: “so” emphasizes “shared”, while “dll” emphasizes “dynamically loaded”.

Because it is just stupid to have decoders for many very different data formats in a single library. Imagine a library for loading images that can work with any image format that exists.

I don’t want this to go completely off topic, no more than it did. But…

You did not answer my question regarding awfulness. But you did say that my misunderstanding (the one you suspect in the first quoted comment) is based on the difference between SO and DLL. Now you say this is all about names, ignoring that everything I said about SO and DLL is that these are actually the same concepts named differently.

If this is actually only about which of us two gets to win, then this is fine with me.

I thought or at least hoped that you saying I am wrong has the potential that there is something I can learn to avoid mistakes and not just about one of us feeling better.

We are of course both entitled to our opinions, but you said “this is awful”, not something like “i find this awful”, which are two entirely different things to say.

I’m not saying this to stir up trouble, I just think your approach could benefit from some reflection. Mine might too, I just don’t know how I triggered this reaction.

I did. See above.
Ok, once again - I have to load some (telemetry) data into DB.
There are dozens of formats of such data, example.

My approach is:

  • have a single executable (uploader)
  • have many instances of a service, all of which run that single executable
  • but every instance loads a decoder specific to a data format.

And it is quite natural to have a separate (dynamic) library for each protocol/data format.
Having “all-in-one” decoders is like having a huge image processing library which can decode everything (jpeg, png etc).
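
A rough sketch of that uploader/decoder split, using `std.DynLib` from the standard library. Everything specific is an assumption made for illustration: the `decode` symbol name, its signature, and the path are invented, and `callconv(.C)` is the spelling used elsewhere in this thread (newer compilers prefer `.c`).

```zig
const std = @import("std");

// Hypothetical entry point that every decoder plugin is assumed to export.
const DecodeFn = *const fn (ptr: [*]const u8, len: usize) callconv(.C) i32;

const Decoder = struct {
    lib: std.DynLib,
    decode: DecodeFn,

    fn load(path: []const u8) !Decoder {
        var lib = try std.DynLib.open(path);
        errdefer lib.close();
        // "decode" is an assumed symbol name, not an established convention.
        const decode = lib.lookup(DecodeFn, "decode") orelse return error.SymbolNotFound;
        return .{ .lib = lib, .decode = decode };
    }

    fn unload(self: *Decoder) void {
        self.lib.close();
    }
};

pub fn main() void {
    // Hypothetical path; each service instance would be configured with its own.
    var decoder = Decoder.load("./decoders/libformat_a.so") catch |err| {
        std.debug.print("cannot load decoder: {s}\n", .{@errorName(err)});
        return;
    };
    defer decoder.unload();

    const payload: []const u8 = "raw telemetry bytes";
    const rc = decoder.decode(payload.ptr, payload.len);
    std.debug.print("decode returned {d}\n", .{rc});
}
```

The executable stays the same for every format; only the path handed to `load` differs per service instance.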

I never said anything about your misunderstanding.
“Now”… Now? I used the word “names” (not “mechanisms”, not “entities”, just “names”) in the very first reply to you. And I only meant that

  • dynamic library can be shared by many apps, like libc.so or win32api.dll
  • dynamic library can be used by one app only (“plugins”)

The names kinda imply that an so (“shared object”) has to be shared, but actually it doesn’t; and dll stresses “dynamically”, but that does not mean it can’t be shared.

About static vs dynamic linking.
For the first case (“general purpose libraries”) static linking is not such a bad idea, especially if the linker “embeds” only those functions that are actually used rather than the entire library.
But for the second case it is simply impossible to statically link something that may only appear in the future.

Hmm, I think that was

And I just gave an example from my practice, where using plugins is more or less reasonable.

I don’t know if our discussion is welcome in regard to the original post. So I’m tempted to stop here.

On the other hand, I really don’t like ending a discussion because it got heated or is obviously mostly about misunderstandings or misinterpretations. So if anybody wants me to shut up, just say so.

I did. See above.

I don’t know which question you already answered or which question the following text is answering. The question I meant was why you think my default of linking everything into a single binary is “awful”.

I ask this, because I mentioned a whole lot of reasons why I believe it’s a good default. You did not argue with any of these reasons, neither did you explain why your approach is objectively better.

You did say that this approach proved to be successful, but that is not an argument for which approach is better (by any metrics that you are free to choose, just like I chose my metrics when making my arguments).

Natural as in encapsulating logic in functions, related functions in files and sets of files in modules? Sure. But I do not believe that DLL/SO are a tool that is well suited to organize code. See, I said “I do not believe”, because I might well miss something. This is an invitation for you to tell me that I’m wrong because something.

How am I to understand this:

“Look” - you thought I wasn’t considering and should take a second look.

“there are two X” + “guy is talking about X2” = “You thought it was X1, so look again”

That’s how I read your comment. What should I have understood instead?

I completely disagree with this. It is perfectly possible. You are actually delivering your plugins, just because somebody might possibly use them. And you are doing that together with everything else your project delivers in that single .deb package.

If you have 500 decoders and your users typically only use 3 of them, then it would be idiotic to link the whole bunch into a single executable. But that’s what I meant when I said “in the absence of requirements”, a single executable is a sane default.

And idiotic as it would be to pack all the decoders into a single executable, it might not even have a significant impact. But if you are facing the decision whether to implement it one way or the other, going the DLL way has an immediate impact. It’s more coding work. There are several points of failure. Nothing bad, it’s just additional complexity that - in the absence of conditions changing the picture - is not delivering a counter value (that I can see).

I never said what you do is wrong. You may have good reasons and your solution may well be better than mine in this particular context. But as a general rule of thumb, I believe the simple way is also the better way in the scope of the discussion (and I enumerated the reasons why I think so).

I’m really irritated, because I don’t understand how what I say is offending you.

Consider the arguments I presented for why statically linked binaries, with all the benefits of Zig’s comptime, have the potential to run faster while using less memory. Either I’m wrong and you didn’t tell me, or I’m right and performance, memory size, security and maintenance overhead are irrelevant. But what, then, is relevant when choosing a delivery format for software?

That’s basically the same question as the one I asked again at the top.

The closest you came to answering this was “It’s natural” and I’m fine with that, if this is why you prefer your solution, because it’s none of my business what you prefer. But that wouldn’t justify you telling me that my approach is “awful”. And here the circle is closing again.

I never said and much less meant that using plugins is unreasonable. I said that using DLLs is unattractive. And the context was: “should zig have its own DLL format?”. And the context for that in turn was: “I had to write all my declarations twice - how annoying” (not literal quotes). And it turned out that this was said in the context of defining an interface for device drivers in a kernel that would not use any implementation of shared libraries or DLLs, but a kind of kernel module that had to be specific for that kernel.

I definitely feel justified to say that shared libraries are unattractive (to receive first class Zig support). Unattractive is not unreasonable. It’s just something you would not prefer unless you have a reason to.

But even without the context. I can see how you might disagree with me. But why is that raising the temperature so much?

I really feel a bit bad about the friction of the whole discussion. If I offended you then please know that this was not my intention, but also that I don’t think you needed to feel that way.

Please turn down the temperature a bit, and put in more effort to understand what the other is saying. Thanks.

Any Zig-native ABI for runtime linkage would be in addition to the C ABI it already has. That’s not going anywhere.

As for reasons to even have it, the low hanging fruit which I see are slices and optionals. Zig also has a much richer collection of integer types, that’s probably a smaller thing but it’s there.

Most data structures in Zig are Zig-native structs and/or tagged unions. The “no defined layout” rule makes it more challenging to handle those across compilation units, although I suspect that as long as both the binary and the plugin are compiled with identical Zig compilers using identical settings for release mode and the like, you’d get the same layout on both sides.

But that’s very brittle, supporting runtime linkage would need to be less dependent on accidentals. I don’t think the compiler team should be dedicating time to making that possible now, but it will be nice to have later and I do hope it stays on the agenda.
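
A small sketch of the layout point, with made-up types `Native` and `Stable`. The asserted numbers assume a target where `u32` is 4-byte aligned, which is the usual case; nothing is asserted about the plain struct beyond a trivial lower bound, because the language promises nothing there.

```zig
const std = @import("std");

// Plain Zig struct: field order, padding and size are up to the compiler.
const Native = struct { tag: u8, value: u32, flag: u8 };

// `extern struct`: laid out like the equivalent C struct on the target.
const Stable = extern struct { tag: u8, value: u32, flag: u8 };

comptime {
    // These hold wherever u32 is 4-byte aligned.
    std.debug.assert(@offsetOf(Stable, "tag") == 0);
    std.debug.assert(@offsetOf(Stable, "value") == 4);
    std.debug.assert(@offsetOf(Stable, "flag") == 8);
    std.debug.assert(@sizeOf(Stable) == 12);
}

pub fn main() void {
    // Printed rather than asserted: the result may differ between compilers.
    std.debug.print("Native: size={d} tag@{d} value@{d} flag@{d}\n", .{
        @sizeOf(Native),
        @offsetOf(Native, "tag"),
        @offsetOf(Native, "value"),
        @offsetOf(Native, "flag"),
    });
    std.debug.print("Stable: size={d}\n", .{@sizeOf(Stable)});
}
```

Whatever the first line prints is an accident of the compiler version and settings, which is exactly the brittleness described above.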

As a final point, the compiler can choose whatever specific calling convention it wants for functions, and that’s the biggest place where retaining flexibility is important. For now.

oooops, i’m sorry, i hadn’t realized what the topic starter’s point was.

… and just to clarify my ramblings above
there are two ways of “dynamic linking”

  • implicit (default on Linux)
  • explicit (via POSIX dlopen() or Windows LoadLibrary(), does not matter at all)

in all of my previous posts I meant the second one (explicit loading).

Does ELF or PE/COFF prevent calling API functions written in Zig/Pascal/C from programs written in Zig/Pascal/C?

Do the ELF or PE/COFF file formats contain anything that really conflicts with an ABI (calling conventions and whatnot)?

Calling conventions are orthogonal to the binary file format.

That is it, but

const Api = @import("my-api.zig").MyApi;
const log = @import("std").debug.print;

fn method1() callconv(.Zig) void {
    log("la-m1-2\n", .{});
}

fn method2() callconv(.C) void {
    log("la-m2-2\n", .{});
}

export const api: Api = .{
    .method1 = &method1,
    .method2 = &method2,
};

does not compile:

$ zig build-lib -dynamic my-lib-a-2.zig -O ReleaseSmall
my-lib-a-2.zig:5:24: error: no field named 'zig' in enum 'builtin.CallingConvention'
fn method1() callconv(.zig) void {
                      ~^~~
/opt/zig-0.14/lib/std/builtin.zig:167:31: note: enum declared here
pub const CallingConvention = enum(u8) {
                              ^~~~

Then I do not understand anything at all :frowning:

fn method1() void {
    log("la-m1-2\n", .{});
}

$ zig build-lib -dynamic my-lib-a-2.zig -O ReleaseSmall
my-lib-a-2.zig:14:6: error: expected type '*const fn () callconv(.C) void', found '*const fn () void'
    .method1 = &method1,
    ~^~~~~~~~~~~~~~~~~~
my-lib-a-2.zig:14:6: note: pointer type child 'fn () void' cannot cast into pointer type child 'fn () callconv(.C) void'
my-lib-a-2.zig:14:6: note: calling convention 'Unspecified' cannot cast into calling convention 'C'

What calling convention should be used to be it … um… zig, zigger, ziggest? … :slight_smile:
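
For what it’s worth, a sketch of a variant that should build, under the assumption (suggested by the second error message) that `MyApi` is an extern struct of C-callconv function pointers. The real my-api.zig is not shown in the thread, so `Api` below is a stand-in, and the `calls` counter is invented just to make the effect observable. The idea is that the default Zig calling convention is left unspecified on purpose, so every function stored in an exported table needs `callconv(.C)`.

```zig
const std = @import("std");

// Stand-in for `MyApi` from my-api.zig, which the thread does not show.
const Api = extern struct {
    method1: *const fn () callconv(.C) void,
    method2: *const fn () callconv(.C) void,
};

var calls: u32 = 0; // hypothetical counter, only here to observe the calls

// Both functions use the C calling convention so their pointers match `Api`.
fn method1() callconv(.C) void {
    calls += 1;
}

fn method2() callconv(.C) void {
    calls += 10;
}

export const api: Api = .{
    .method1 = &method1,
    .method2 = &method2,
};

pub fn main() void {
    api.method1();
    api.method2();
    std.debug.print("calls={d}\n", .{calls});
}
```

So the honest answer to "which one is the Zig one" seems to be: none that can cross a library boundary yet, which matches the earlier point about the compiler keeping its freedom there.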