Writing runtime-extensible code in Zig is massively inconvenient

I’ve documented the status quo here:

Run zig build to verify.


This is the essence of why improving the status quo, as discussed here, is a difficult ask.

In brief: the C ABI is a lie. The C standard does not mandate an ABI, but due to the prevalence of dynamic linking in C, the various operating systems and compilers converge around a strict convention for code which dynamically links with other code.

Here's why that's an issue: let's say Zig defined its own enriched ABI. There's no way for that object code to advertise that it's laid out differently from what the C consensus demands, because our object formats lack any such facility (unless you want to count symbol mangling, which I don't).

C++ solves this problem by creating enormous problems for everyone; Zig, like most other languages, solves it by punting to the least common denominator.

Mostly, Zig operates on a closed-world assumption, and mostly, that works for what Zig is good at. My one ask would be to define a layout for slices already, an official C struct consisting of the pointer first and the length second. There’s really no reason not to do this.
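To make the ask concrete, here is a C-side sketch of that layout for a `[]const u8`, assuming pointer first and length second. The names `ZigConstU8Slice` and `count_byte` are made up for illustration, and Zig does not currently promise this layout for native slices. In Zig you can already spell such a type by hand as an `extern struct { ptr: [*]const u8, len: usize }`, which does get C layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of a Zig []const u8, assuming the proposed
   pointer-first, length-second layout (not guaranteed by Zig today). */
typedef struct {
    const uint8_t *ptr;
    size_t len;
} ZigConstU8Slice;

/* Pin the layout so any drift fails at compile time. */
static_assert(offsetof(ZigConstU8Slice, ptr) == 0, "ptr must come first");
static_assert(offsetof(ZigConstU8Slice, len) == sizeof(const uint8_t *),
              "len must directly follow ptr");

/* A C-ABI function taking a slice by value, as either side could export it. */
size_t count_byte(ZigConstU8Slice s, uint8_t needle) {
    assert(s.ptr != NULL || s.len == 0); /* empty slices may have a NULL ptr */
    size_t n = 0;
    for (size_t i = 0; i < s.len; i++) {
        if (s.ptr[i] == needle) n++;
    }
    return n;
}
```

With the layout fixed like this, a slice can cross a dynamic boundary by value without either side guessing which field comes first.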

But doing anything more sophisticated requires work at the operating-system and object-format level, which no one seems to be interested in. The freedom Zig gains from not guaranteeing a defined layout is mostly about calling convention, rather than the ins and outs of how data formats themselves are laid out and evolve.

My point being, dynamic linkers fly completely blind. If object code is going to play in that arena, it needs to either advertise how it works, which basically calls for a new object format, or do what C++ does and cause problems, or emit interfaces in a C-compatible manner. One of these is very challenging, the other is rude, so that leaves door #3.

Cursed, maybe; a hack, definitely; but I'm not convinced it needs improvement (and I filed an issue expressing my hope that this behavior would be guaranteed going forward). Use of @setRuntimeSafety clearly expresses what's going on there.

Honestly, I think people are overthinking the problem. The only real problem is that Zig prefers not to be locked into a fixed memory layout for objects, yet dynamic linking requires two binaries to agree on memory layout. One could argue that anytype and comptime are another problem, but by definition they cannot be allowed to cross a dynamic boundary, since a dynamic boundary is a runtime construct.

In practice, when compiling, we need a way to tell the compiler that a given Zig source file replaces the struct known as MySuperUtf8String, switching it to a packed struct that looks like this. The compiler can then attempt to compile the code and fail if there are missing fields, mismatched types, or whatnot. Obviously one could hand-edit these, but let's not go editing std files with packed structs just to support dynamic linking.
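As a rough illustration of that compile-time failure, here is a C sketch in which the agreed layout is written down once and any replacement that removes, reorders, or resizes a field stops the build. The fields of `MySuperUtf8String` below are assumptions, not the layout from the post, the `EXPECT_FIELD` macro is hypothetical, and it uses an ordinary C-layout struct rather than a Zig `packed struct`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frozen layout; field names and order are illustrative only. */
typedef struct {
    const uint8_t *bytes;   /* UTF-8 data, not NUL-terminated */
    size_t byte_len;        /* length in bytes */
    size_t codepoint_count; /* cached number of code points */
} MySuperUtf8String;

/* The contract: each field at a fixed offset with a fixed size. A removed
   field makes offsetof fail to compile; a moved or resized one trips the
   static_assert with a message naming the field. */
#define EXPECT_FIELD(T, f, off, sz)                                     \
    static_assert(offsetof(T, f) == (off), #T "." #f " moved");         \
    static_assert(sizeof(((T *)0)->f) == (sz), #T "." #f " resized")

EXPECT_FIELD(MySuperUtf8String, bytes, 0, sizeof(const uint8_t *));
EXPECT_FIELD(MySuperUtf8String, byte_len,
             sizeof(const uint8_t *), sizeof(size_t));
EXPECT_FIELD(MySuperUtf8String, codepoint_count,
             sizeof(const uint8_t *) + sizeof(size_t), sizeof(size_t));

/* No hidden padding or extra trailing fields either. */
static_assert(sizeof(MySuperUtf8String) ==
                  sizeof(const uint8_t *) + 2 * sizeof(size_t),
              "MySuperUtf8String has unexpected size");
```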

Similarly, we need the ability to generate packed structs as an output of our build pipeline. We can't do this at comptime, because comptime is not allowed to have side effects, so we have to do it from build.zig through some sort of observer.

This way, if people want a static API, they can agree on a packed struct convention and build both binaries with it, quite possibly a convention they asked Zig to produce for their first release. However, if they want optimal performance in their main app and can accept rebuilding plugins for each version, they can have an API that changes with each build.
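One way the "rebuild plugins for each version" option could be enforced, sketched in C under the assumption that the build pipeline stamps a per-build ID into both binaries: only a tiny header stays frozen, and the host refuses anything stamped for a different build. `HOST_API_ID`, `PluginHeader` and `host_accepts` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: an ID the build pipeline would regenerate whenever the
   plugin-facing layout changes (e.g. a hash of the generated struct
   definitions). Here it is just a hard-coded constant. */
#define HOST_API_ID UINT64_C(0x3f2a9c0d5e71b4a8)

/* Every plugin exports one of these. It is the only struct whose layout
   has to stay frozen across host builds, so pin it. */
typedef struct {
    uint64_t api_id;  /* must equal the host's HOST_API_ID */
    const char *name; /* for diagnostics only */
} PluginHeader;

static_assert(offsetof(PluginHeader, api_id) == 0, "api_id must stay first");

/* Returns 1 if the plugin was built against this exact host build, else 0. */
int host_accepts(const PluginHeader *p) {
    if (p == NULL) return 0;
    if (p->api_id != HOST_API_ID) {
        fprintf(stderr, "plugin '%s' targets a different host build; rebuild it\n",
                p->name ? p->name : "(unnamed)");
        return 0;
    }
    return 1;
}
```

Everything behind the header can then change freely between builds, which is the trade-off described above: full layout freedom in exchange for rebuilding plugins.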

That still leaves slices as the oddball, but as you said, a slice is really just a struct.

Once this is done, the rest is just a library in std.

For my use cases, where I want to add plugin support to a larger system, I would need a library that lets me export structs and functions from the host. But that is just vtables to be filled out when the module is loaded.
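A minimal C sketch of "vtables to be filled out when loading the module", assuming the host passes a table of function pointers to a plugin entry point. `HostApi`, `plugin_init` and the other names are hypothetical. In a real setup the plugin half lives in a shared library and the host finds `plugin_init` with `dlsym()` or `GetProcAddress()` after loading it; that step is left out so the sketch stays in one file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical host API exposed to plugins as C-ABI function pointers.
   Only this struct's layout has to be agreed on. */
typedef struct {
    void (*log)(const char *msg);
    int (*add_counter)(int delta);
} HostApi;

/* ---- host implementation ---- */
static int host_counter = 0;
static void host_log(const char *msg) { printf("[host] %s\n", msg); }
static int host_add_counter(int delta) { host_counter += delta; return host_counter; }

static const HostApi host_api = { host_log, host_add_counter };

const HostApi *host_get_api(void) { return &host_api; }

/* ---- plugin side (normally compiled into the shared library) ---- */
static const HostApi *plugin_host = NULL;

/* Hypothetical entry point: receives the host vtable and keeps it. */
int plugin_init(const HostApi *api) {
    if (api == NULL || api->log == NULL || api->add_counter == NULL) return -1;
    plugin_host = api;
    plugin_host->log("plugin loaded");
    return 0;
}

/* Plugin code reaches the host only through the stored vtable. */
int plugin_do_work(void) {
    assert(plugin_host != NULL && "plugin_init must run first");
    return plugin_host->add_counter(5);
}
```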

One could also imagine someone making a closed-source library that leaves an API for you, complete with a loadmodule method that grabs all the addresses it needs for its vtables from your system and hooks them into the library. This would keep their binary size down and ensure that you don't end up with two dynamic libraries that disagree on whether the pointer or the length is the first member of a slice. :wink:
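The reverse direction, where the library's own load function pulls the addresses it needs from the host by name, might look like this in C. The resolver callback, the export names and `lib_load_module` are all assumptions for illustration, standing in for whatever lookup mechanism such a library would actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function pointer used only for transport; ISO C allows casting
   between function pointer types as long as calls use the real type. */
typedef void (*AnyFn)(void);
typedef AnyFn (*Resolver)(const char *name);

/* ---- host side: the application exports functions by name ---- */
static size_t host_str_len(const char *s) { return strlen(s); }
static int host_max_int(int a, int b) { return a > b ? a : b; }

static const struct { const char *name; AnyFn fn; } host_exports[] = {
    { "str_len", (AnyFn)host_str_len },
    { "max_int", (AnyFn)host_max_int },
};

AnyFn host_resolve(const char *name) {
    for (size_t i = 0; i < sizeof host_exports / sizeof host_exports[0]; i++)
        if (strcmp(host_exports[i].name, name) == 0) return host_exports[i].fn;
    return NULL;
}

/* A resolver that exports nothing, to exercise the failure path. */
AnyFn resolve_nothing(const char *name) { (void)name; return NULL; }

/* ---- library side: the vtable it needs from whoever loads it ---- */
static struct {
    size_t (*str_len)(const char *);
    int (*max_int)(int, int);
} lib_vt;

/* Hypothetical loadmodule: resolve every required symbol or fail with -1. */
int lib_load_module(Resolver resolve) {
    AnyFn f;
    if ((f = resolve("str_len")) == NULL) return -1;
    lib_vt.str_len = (size_t (*)(const char *))f;
    if ((f = resolve("max_int")) == NULL) return -1;
    lib_vt.max_int = (int (*)(int, int))f;
    return 0;
}

/* Library code that only reaches the host through the vtable. */
int lib_longest(const char *a, const char *b) {
    assert(lib_vt.str_len != NULL && lib_vt.max_int != NULL);
    return lib_vt.max_int((int)lib_vt.str_len(a), (int)lib_vt.str_len(b));
}
```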

As for how other languages will support our packed structs, that's really not my concern. It's possible, but it will take effort.

This is not a small thing, but I don't see a real way around it. Until we have a clean way to replace certain parts of a code base with dynamic replacements (generated packed structs with vtables), dynamically linking Zig DLLs will be one massive hack.

Once you have dynamic libraries around, you have to know exactly how they’re laid out. They can sit and ferment for years and people are going to want them to still link. C solves this problem by being very old and set in its ways. C++ solves this problem poorly.

It’s not about Zig defining a layout, it’s about that layout ossifying and never being changeable again. That’s ok for C because C never changes, but is it? Is it ok for C? Why is int still 32 bits again? ABI. ABI is why.

The solution to this problem is metadata in the object code. It’s a hard problem, mostly for social reasons. But reinventing the severe (!) problems C++ has had, which are caused by ABI compatibility, is not a solution to it, and pretending that, right now, we know what Zig object code should look like in five years, is also not a solution to it.
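To give a feel for what "metadata in the object code" could carry, here is a C sketch of a per-type layout descriptor that a loader could compare before trusting a foreign binary. No current object format stores anything like this for Zig types; `FieldDesc`, `TypeDesc` and `layout_compatible` are hypothetical, and the sketch ignores the harder questions of where such records live and who agrees on their format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-field layout record. */
typedef struct {
    const char *name;
    size_t offset;
    size_t size;
} FieldDesc;

/* Hypothetical per-type layout record. */
typedef struct {
    const char *type_name;
    size_t size;
    size_t field_count;
    const FieldDesc *fields;
} TypeDesc;

/* Look up a field by name; NULL if absent. */
static const FieldDesc *find_field(const TypeDesc *t, const char *name) {
    for (size_t i = 0; i < t->field_count; i++)
        if (strcmp(t->fields[i].name, name) == 0) return &t->fields[i];
    return NULL;
}

/* Returns 1 if code built against `expected` can read `actual`: every known
   field must be present at the same offset with the same size. Extra fields
   in `actual` are tolerated, which is only safe if the consumer never
   allocates or copies the type by value. */
int layout_compatible(const TypeDesc *expected, const TypeDesc *actual) {
    assert(expected != NULL && actual != NULL);
    if (strcmp(expected->type_name, actual->type_name) != 0) return 0;
    for (size_t i = 0; i < expected->field_count; i++) {
        const FieldDesc *want = &expected->fields[i];
        const FieldDesc *got = find_field(actual, want->name);
        if (got == NULL || got->offset != want->offset || got->size != want->size)
            return 0;
    }
    return 1;
}

/* Example descriptor for a real struct, built from offsetof/sizeof. */
typedef struct { const unsigned char *ptr; size_t len; } ByteSlice;
static const FieldDesc byte_slice_fields[] = {
    { "ptr", offsetof(ByteSlice, ptr), sizeof(((ByteSlice *)0)->ptr) },
    { "len", offsetof(ByteSlice, len), sizeof(((ByteSlice *)0)->len) },
};
const TypeDesc byte_slice_desc = { "ByteSlice", sizeof(ByteSlice), 2, byte_slice_fields };
```

A descriptor from a build that put `len` before `ptr` fails this check, so the loader would refuse the binary instead of silently misreading every slice.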

Better than under-thinking it.

As I’ve said elsewhere on this board, the ability to dynamically link with Zig-native object code is a good long-term goal for the language. But I think you’re discounting the problems which would come with just YOLOing out an 80% solution here.