Making opaque a real type

chung-leong · July 24, 2024, 12:55pm

Currently, opaque is not something we can create an instance of. We can only create pointers to opaque. To hide implementation details we might do something like this:


const Context = struct {
    field1: usize,
    field2: usize,
};
const ContextOpaquePtr = *align(@alignOf(Context)) opaque {};

pub fn startContext() !ContextOpaquePtr {
    const allocator = gpa.allocator();
    const ptr = try allocator.create(Context);
    ptr.* = .{ .field1 = 123, .field2 = 456 };
    return @ptrCast(ptr);
}

The shapeless nature of opaque basically forces us to use the heap, increasing code size and slowing things down. Most of the time we don’t really need that level of secrecy. It’s okay for the caller to know the dimensions and memory alignment of our structures. Opaqueness alone is sufficient.

So here’s my idea: allow the assignment of a backing type to an opaque.
Such an opaque would inherit its size and alignment from the backing type. This info allows the caller to provide memory storage on the stack for what is otherwise an unknown structure. The example above can then be implemented like so:

const Context = struct {
    field1: usize,
    field2: usize,
};
const ContextOpaque = opaque(Context) {};

pub fn startContext() ContextOpaque {
    const cxt: Context = .{ .field1 = 123, .field2 = 456 };
    return @bitCast(cxt);
}

In addition, there would be a built-in function @expose(), which would let us selectively expose decls from the backing type:

const Context = struct {
    field1: usize,
    field2: usize,

    fn getX(self: Context) usize {
        return self.field1;
    }
};
const ContextOpaque = opaque(Context) {
    pub const getX = @expose("getX");
};

The built-in would automatically cast the function so that self is of the opaque type.

mnemnion · July 24, 2024, 4:41pm

Interesting notion. What do you see as the advantage of this approach?

I think of anyopaque as Zig’s answer to void * in C: a type-generic way to include a reference to some memory, where a consumer is expected to know what that reference means. opaque itself is a way to provide more type safety than you get with void *, because that lets you have distinct categories of otherwise-unknown value.

This gets used for applications like a classic context object, which is defined as a function pointer of a given signature, and an *anyopaque pointer which is an argument to that function pointer. Then the function casts the pointer to what it knows it to be. That means that the code consuming the context object doesn’t have to be comptime-generic, which can be inconvenient or impossible.

What sort of tasks do you see a size-bearing opaque type as achieving? I don’t personally think hiding fields from consumers of some struct is worthwhile, and that seems to be the consensus: it wouldn’t be difficult to add private fields, but that idea has been rejected when proposed.

So I’m not seeing what a structure-erased blob with a size and alignment would really get us. Could you describe an application to go with the mechanism? You can make a pointer to some stack-allocated struct into an *anyopaque now, as long as you’re passing it down-stack so that the memory remains valid:

const std = @import("std");

const StackStruct = struct {
    []const u8,
    []const u8,
    []const u8,
};

fn printStackStruct(erased: *const anyopaque) void {
    const unerased: *const StackStruct = @ptrCast(@alignCast(erased));
    std.debug.print("{s}\n", .{unerased[1]});
}

test "opaque pointer" {
    const on_stack: StackStruct = .{ "fee", "fie", "foe" };
    const eraser: *const anyopaque = @ptrCast(&on_stack);
    printStackStruct(eraser);
}
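
The context-object pattern described earlier in this post (a function pointer plus an *anyopaque argument) might look roughly like the following. This is only a sketch: Callback, Counter and their members are names invented for illustration, not taken from any library.

```zig
const std = @import("std");

// Hypothetical type-erased callback: a function pointer plus the opaque
// context pointer that gets passed back to it.
const Callback = struct {
    context: *anyopaque,
    func: *const fn (context: *anyopaque, value: u32) void,

    fn call(self: Callback, value: u32) void {
        self.func(self.context, value);
    }
};

// Hypothetical consumer-side type that hides behind the erased pointer.
const Counter = struct {
    total: u32 = 0,

    // The function knows what the erased pointer really is and casts it back.
    fn add(context: *anyopaque, value: u32) void {
        const self: *Counter = @ptrCast(@alignCast(context));
        self.total += value;
    }

    fn callback(self: *Counter) Callback {
        return .{ .context = self, .func = add };
    }
};

test "type-erased context object" {
    var counter: Counter = .{};
    const cb = counter.callback();
    cb.call(2);
    cb.call(40);
    try std.testing.expectEqual(@as(u32, 42), counter.total);
}
```

Code that receives a Callback never needs to know about Counter, so it does not have to be comptime-generic.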

chung-leong · July 24, 2024, 7:21pm

A good example is this struct in zig-sqlite:

pub const Db = struct {
    const Self = @This();

    db: *c.sqlite3,
    // ...
};

Basically, all it contains is a C pointer. The content of the struct has the same size as a pointer to a struct. Right now, I have to allocate 4/8 bytes off the heap in order to return it as an opaque pointer. It would be more efficient if I could just return 4/8 opaque bytes. One less indirection and no need to link in code for an allocator.

I’m working on an example for my project Zigar. Here’s some actual code:

var gpa = std.heap.GeneralPurposeAllocator(.{}){};

const SqliteOpaquePtr = *align(@alignOf(sqlite.Db)) opaque {};

pub fn openDb(path: [:0]const u8) !SqliteOpaquePtr {
    const allocator = gpa.allocator();
    const db_ptr = try allocator.create(sqlite.Db);
    errdefer allocator.destroy(db_ptr);
    db_ptr.* = try sqlite.Db.init(.{
        .mode = .{ .File = path },
        .open_flags = .{},
        .threading_mode = .MultiThread,
    });
    return @ptrCast(db_ptr);
}

pub fn closeDb(db_op: SqliteOpaquePtr) void {
    const db_ptr: *sqlite.Db = @ptrCast(db_op);
    db_ptr.deinit();
    const allocator = gpa.allocator();
    allocator.destroy(db_ptr);
}

While I could simply return sqlite.Db, that would lead to many of the struct’s functions getting exported to JavaScript, where they are more or less useless because the most critical ones require comptime arguments and I can’t marshal calls to them. The struct is only useful when it comes back into Zig. Like this:

pub const Album = struct {
    AlbumId: u32,
    Title: []const u8,
    ArtistId: u32,
    Artist: []const u8,
};

const FindAlbumsIterator = Iterator(Album,
    \\SELECT a.AlbumId, a.Title, b.ArtistId, b.Name AS Artist
    \\FROM albums a
    \\INNER JOIN artists b ON a.ArtistId = b.ArtistId
    \\WHERE a.Title LIKE '%' || ? || '%'
);

pub fn findAlbums(db_op: SqliteOpaquePtr, title: []const u8) !FindAlbumsIterator {
    return try FindAlbumsIterator.init(db_op, .{title});
}

Conceptually, what I’m suggesting is not hard to understand. In the real world after all, opaque objects do have dimensions that we can observe. We are able to allocate space for opaque objects. The opposite is the counter-intuitive notion.

mnemnion · July 24, 2024, 8:31pm

No conceptual difficulty here, certainly. Figuring out what it was meant to be good for is a different matter, thanks for providing an example.

Although I’m not seeing why it would be necessary in this case. The sqlite.Db struct either exists on the heap (in which case, allocation is fine, and a *anyopaque is also fine), or it exists on the stack. In which case, you can make a pointer to the stack copy, and hand that down to the code which uses it.

Not seeing any middle ground. Maybe the sqlite.Db lives in another struct? But that’s the same thing with another level of indirection.

It’s not obvious to me why that would be, but I’ll take your word for it. Is that just how your library works? Because if the goal is “pass a struct but don’t provide the decls to JavaScript”, perhaps what you’re using to marshal the struct into JavaScript-land could be configured to not automatically provide all decls as functions?

mnemnion · July 24, 2024, 8:32pm

But then again, since it’s an extern struct, you can @bitCast it into a big integer and pass that around. If you want.

AndrewCodeDev · July 24, 2024, 8:37pm

I was also thinking a buffer of u8 that you can cast.

Sze · July 24, 2024, 8:47pm

What about something like this:

const std = @import("std");

// faking sqlite
const c = struct {
    const sqlite3 = struct {
        here: u64 = 0,
        are: u64 = 232321,
        some: u64 = 434,
        fields: u64 = 971,
    };
};

const Ptr = *c.sqlite3;

pub const Db = struct {
    const Self = @This();

    db: Ptr,

    fn fromOpaque(o: OpaqueDb) Db {
        return .{ .db = @ptrFromInt(@as(usize, @bitCast(o.db))) };
    }

    // putting both methods here because opaque is supposed to be minimal
    fn toOpaque(self: Db) OpaqueDb {
        return .{ .db = @bitCast(@intFromPtr(self.db)) };
    }
};

pub const OpaqueDb = struct {
    db: [@sizeOf(Ptr)]u8 align(@alignOf(Ptr)), // using field with alignment
};

pub fn main() !void {
    var sqlite3 = c.sqlite3{};
    const sqlite3_ptr = &sqlite3;
    const internal_handle: Db = .{ .db = sqlite3_ptr };
    std.debug.print("internal_handle: {}\n", .{internal_handle});

    const external_handle = internal_handle.toOpaque();
    std.debug.print("external_handle: {}\n", .{external_handle});

    const reinternal_handle = Db.fromOpaque(external_handle);
    std.debug.print("reinternal_handle: {}\n", .{reinternal_handle});

    std.debug.print("\n--------------------------\n", .{});
    if (internal_handle.db == reinternal_handle.db) {
        std.debug.print("seems to work?\n", .{});
    } else {
        std.debug.print("fail\n", .{});
    }

    std.debug.print("OpaqueDb:\n", .{});
    std.debug.print("sizeOf: {d}\n", .{@sizeOf(OpaqueDb)});
    std.debug.print("alignOf: {d}\n", .{@alignOf(OpaqueDb)});
}

Output:

internal_handle: opaquewithfieldalign.Db{ .db = opaquewithfieldalign.c.sqlite3{ .here = 0, .are = 232321, .some = 434, .fields = 971 } }
external_handle: opaquewithfieldalign.OpaqueDb{ .db = { 104, 196, 241, 25, 252, 127, 0, 0 } }
reinternal_handle: opaquewithfieldalign.Db{ .db = opaquewithfieldalign.c.sqlite3{ .here = 0, .are = 232321, .some = 434, .fields = 971 } }

--------------------------
seems to work?
OpaqueDb:
sizeOf: 8
alignOf: 8

chung-leong · July 24, 2024, 9:53pm

The struct is being returned, not being handed down. And of course we can’t return a pointer to a stack variable. That was my point about being forced into allocating from the heap.

With my code, what actually happens when a struct is returned by value is that the struct will first land in the stack frame of a comptime-generated function that I call a “thunk”. The thunk copies the bytes into a JavaScript ArrayBuffer. The struct will then be floating around in garbage-collected memory until it’s passed to the thunk of another function, which does the reverse.

A lot of what I do is very similar to how you would marshal a function call across a process boundary or even across the network. Thanks to the magic of Zig comptime programming, everything can be done seamlessly and automatically.

Imagine, if you will, your computer is running an app that performs RPC to a server sitting in, say, Szczebrzeszyn, Poland. The first call would establish some sort of a context, stored in a struct that only the server in Szczebrzeszyn can understand. If this struct is returned by pointer, then the server in Szczebrzeszyn must keep a copy of it in memory so that when it receives the address again from your computer, it can find those bytes again. Just sending these opaque bytes down the wire is far simpler. And there’s no drawback unless the struct is larger than a network packet.

Yes, it’s a design choice made for the purpose of ease of use. If it’s public, it’s available. There’s really no other way to implement this. Even if tags get implemented eventually, the issue of third-party code would remain (programmers can’t stick tags into other people’s code). Converting other people’s structs into blobs of anonymous bytes seems the easiest solution. If I don’t fully understand what these things are, then it’s entirely reasonable for me to not expose them to programmers using my code.

chung-leong · July 24, 2024, 10:00pm

It wouldn’t work since it’s not an extern struct. I can always make it work by being evil, but we don’t want that now, do we?

mnemnion · July 25, 2024, 12:09am

Ok, that makes sense. I can think of other ways to handle it but I agree that just turning it into a blob and back would be the way to go.

You can do that with std.mem.toBytes, or, since you’re copying it anyway, asBytes. Then on the other end it’s just std.mem.bytesToValue, also with a bytesAsValue no-copy variant.

You could even make it type safe:

const SqlBlob = struct {
   blob: [@sizeOf(SqliteDb)]u8,
};
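
A minimal sketch of the byte round trip described above, assuming a made-up Handle struct (its name and fields are purely illustrative). To restore the value it copies into std.mem.asBytes with @memcpy rather than calling std.mem.bytesToValue, only so that the sketch does not depend on the exact parameter form of that function:

```zig
const std = @import("std");

// Hypothetical stand-in for a struct whose fields the caller should not touch.
const Handle = struct {
    id: u32,
    name: []const u8,
};

test "round trip through raw bytes" {
    const original: Handle = .{ .id = 7, .name = "main.db" };

    // Copy the struct's in-memory representation into a byte array.
    const bytes = std.mem.toBytes(original);
    try std.testing.expectEqual(@as(usize, @sizeOf(Handle)), bytes.len);

    // ...the bytes could now sit in foreign memory for a while...

    // Copy them back into properly typed storage.
    var restored: Handle = undefined;
    @memcpy(std.mem.asBytes(&restored), &bytes);

    try std.testing.expectEqual(original.id, restored.id);
    try std.testing.expectEqualStrings(original.name, restored.name);
}
```

The bytes are only meaningful to the same build that produced them; nothing here inspects individual fields through the byte array.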

chung-leong · July 25, 2024, 9:50am

Why make programmers jump through hoops? Why make them do evil stuff like transforming a struct into a byte array when we can make the language semantically sounder? There’s no reason why an opaque should behave like anyopaque. “Something” is not “anything”. Anything can have any size. Something has some size.

Right now, pointers to opaques are like C pointers in that they point to memory regions of unknown extent. That creates problems when you try to duplicate or relocate data structures. You just don’t know how many bytes you need to copy.

kristoff · July 25, 2024, 10:26am

I’m not sure I understand the point of this discussion.

Usually opaque types are used when a system wants to give you ABI stability, so for example a database might have a plugin system and give you an opaque pointer so that when a new database version comes out, plugins don’t need to be recompiled even if some types in the database have changed shape and/or size.

As I understand this is pretty much the only reason why one would use opaque types, so if you start undoing the ABI stability guarantee, you lose the only advantage this system gives you.

chung-leong · July 25, 2024, 11:30am

I don’t think the ABI stability question is all that relevant to Zig. Zig modules generally will make some use of comptime so you have to compile from source.

The main motivation of using opaque types is to prevent consumers of your code from creating dependencies on implementation details. Today you might be using library X. Some time later you might switch to library Y. That’d break people’s code if they had been accessing X’s API directly.

mnemnion · July 25, 2024, 2:01pm

What hoops?

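const std = @import("std");

pub fn Blob(T: type) type {
    return struct {
        // Array element types cannot be marked const in Zig; the bytes are
        // immutable wherever the blob itself is held as a const value.
        blob: [@sizeOf(T)]u8,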

        const BlobType = @This();

        pub fn init(obj: T) BlobType {
            return BlobType{.blob = std.mem.toBytes(obj)};
        }

        pub fn restore(blob: BlobType) T {
            return std.mem.bytesToValue(T, blob.blob);
        }
    };
}

How is this hoops? It’s the type you want, as a tiny userspace library. This niche use case doesn’t justify changes to the type system, because the type system can accommodate the application, as-is.

My definition of evil doesn’t include doing something which is provided by the standard library, so that we can have, for instance, and in particular, allocators.

What I’ve sketched out here cannot stop working, because it’s fundamental to how Zig operates. You’re not “allowed” to interpret the byte sequence, because Zig types don’t have a specified memory layout. But they do have a size, which is why @sizeOf always works, and they have a realized memory layout. It might change between compiler releases, it’s internal API, and it’s a mistake to read a subset of those bytes and expect to find a particular field. But round-tripping like this is a-ok. You are guaranteed by the semantics of the language to get your original instance struct back.

Read the code for one of the allocators. You’ll see what I mean.

There’s nothing type unsound about this construction, either. It’s a type-specific opaque container, where the data is const, so userspace can’t mess it up without trying really hard. To make the idea clear, I wrote restore as a method on the blob, but for your purposes you might want a layer of indirection so that restoration isn’t a temptation in user code. Obviously if they can get at std.mem then they can just do it themselves, but look: the proposal is to hand off data in opaque form, which means they have the data, so any sufficiently determined user is going to be able to rehydrate it.

If that’s a non-option, fine, then you pretty much have to give them a pointer and retain the data yourself.

That part isn’t even really true! You could literally encrypt the bytes and give them that, and then decrypt it on the far side of the JavaScript boundary. It’s all about how paranoid your code needs to be.

It actually makes perfect sense to me and @chung-leong did a good job of explaining why. A JavaScript runtime is an abstraction boundary, and this lets him marshal Zig-specific data across the abstraction boundary without having to maintain it in the Zig heap. That would mean having to keep track of it, and potentially garbage-collect it, because the data is heading into a dynamic language, so as soon as the last reference to that data goes out of scope, it’s gone.

Philosophically, I would not personally erase the identity of a handle like this, but then again, the reasons given are good: the methods pertaining to the struct require significant comptime configuration to be used correctly, so that’s a source of error and frustration in user programs which might try to use those methods. So maybe I would, but primarily, it’s not my library, so those aren’t my decisions to make.

kristoff · July 25, 2024, 9:00pm

The ABI stability question is extremely relevant for Zig because it’s a mechanism that is both used in the wild (meaning that Zig will need to interoperate with such systems) and also desirable in specific use cases by pure Zig projects as well. Not having to recompile plugins when the host application changes is a truly desirable feature in some situations.

I would be extremely skeptical of a library that tries to use opaque types to enforce a public interface. Semver and properly documenting what is considered public interface is how you prevent consumers from creating dependencies on implementation details (or at least making them aware of the fact that you don’t promise to not break their code if they do).

chung-leong · July 25, 2024, 10:00pm

Since that’s your assertion, please give me one example of an existing Zig package, one that you can fetch through the Zig package manager, which behaves in the manner you described, namely that a change in version would not trigger recompilation.

kristoff · July 25, 2024, 10:51pm

It seems that I’m talking about a mechanism that you’re not aware of, let me provide you with some links:

General concept:
https://hackaday.com/2018/07/12/its-all-in-the-libs-building-a-plugin-system-using-dynamic-loading/

Some examples:
https://www.sqlite.org/loadext.html
https://redis.io/docs/latest/develop/reference/modules/
https://big-elephants.com/2015-10/writing-postgres-extensions-part-i/
https://nginx.org/en/docs/dev/development_guide.html#Modules
https://tree-sitter.github.io/tree-sitter/creating-parsers

And here’s one concrete Zig example, a Redis module I wrote in Zig a while ago:
https://github.com/kristoff-it/redis-cuckoofilter

mnemnion · July 26, 2024, 1:06am

As the docs put it:

[opaque] is typically used for type safety when interacting with C code that does not expose struct details.

In C, this is reasonably common, for a few reasons, already adequately covered.

If you want a Zig dynamic library for whatever reason, it’s going to be using the C ABI, so the opaque pointer approach is justifiable.

If someone wanted to make module code which blobs up a struct, there’s nothing stopping that. You could really drive the point in by casting the byte array with @bitCast into a big ol’ unsigned integer, if you wanted to.
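
As a rough sketch of that integer variant, assuming hypothetical helpers BlobInt, toInt and fromInt and a made-up Pair struct (none of these come from the standard library):

```zig
const std = @import("std");

// Hypothetical example struct.
const Pair = struct { a: u32, b: u32 };

// Unsigned integer type with one bit for every bit of T's storage.
fn BlobInt(comptime T: type) type {
    return std.meta.Int(.unsigned, 8 * @sizeOf(T));
}

// Copy the value's bytes and reinterpret the byte array as one integer.
fn toInt(value: anytype) BlobInt(@TypeOf(value)) {
    return @bitCast(std.mem.toBytes(value));
}

// Reverse direction: integer to byte array, then bytes into typed storage.
fn fromInt(comptime T: type, int: BlobInt(T)) T {
    const bytes: [@sizeOf(T)]u8 = @bitCast(int);
    var value: T = undefined;
    @memcpy(std.mem.asBytes(&value), &bytes);
    return value;
}

test "struct to integer and back" {
    const pair: Pair = .{ .a = 1, .b = 2 };
    const as_int = toInt(pair); // a u64 here, since Pair occupies 8 bytes
    const back = fromInt(Pair, as_int);
    try std.testing.expectEqual(pair.a, back.a);
    try std.testing.expectEqual(pair.b, back.b);
}
```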

I don’t think that would be a popular choice, however. Feel free to experiment, and if it catches on, maybe it’s worth adding more language support for doing that.

But you’ve already identified a situation in your own code where it does make sense: a host/guest program, where you want to be able to pass Zig assets into a hosted runtime, and have guest code reach back into Zig with that data. I could see this being useful in Wasm, or JavaScript like you’re doing now. Lua, maybe. Any situation where the nature of the data is such that it has to be handed back to the Zig host for anything useful to happen to it.

What I’m not seeing is what promoting a byte blob to a special sort of opaque type is adding to the picture. It’s been proposed already, and it didn’t catch on.

No proposal based on access control has been accepted, and it will probably stay that way. I consider that a good thing, because Zig is a low-level language, structs are literal regions of memory carrying data. It’s possible to obscure what that data means, but not what it is, so sufficiently determined user code can always figure the rest of it out, and all that private fields or sized opaque types (which is just a struct where all the fields are private) can do is make life frustrating and annoying for whoever has to make some internal use of an asset for which some of the type metadata has been excluded.

It’s not even difficult to do this ‘by hand’, it is in fact very easy. There’s no advantage in making it easier than it already is.