Workflow when writing C bindings?

kj4tmp · March 5, 2025, 5:55am

I am writing some Zig bindings (zenoh-zig) for a C library (zenoh-c).

I was planning on the following workflow:

1. Use the build system to depend on the pre-compiled C static libraries (done).
2. Port the examples over to Zig using @cImport() (in progress).
3. Use what I learn from porting the examples to write an easier-to-use Zig API.

Is this a good workflow?

An alternative workflow could be:

Use translate-c only once, and modify the generated Zig code. Will this be easier or harder to maintain as upstream changes?

Questions:

How do I know what I can delete from the translate-c output?
Is there a guide somewhere for writing bindings for C code?
The library was originally written in Rust, and there is a bunch of loan/move stuff that I don’t understand. Where can I learn what this means?

Here is a snippet of what the ported code looks like. Note: this code does not compile because I couldn’t figure out how to compare the bytes behind a [*c] pointer; this is absolutely brutal.

const std = @import("std");

const zenoh = @import("zenoh");

test "wrapping raw bytes into a z_bytes_t" {
    var payload: zenoh.c.z_owned_bytes_t = undefined;
    const input_bytes: []const u8 = &[_]u8{ 1, 2, 3, 4 };
    var output_bytes: zenoh.c.z_owned_slice_t = undefined;
    _ = zenoh.c.z_bytes_copy_from_buf(&payload, input_bytes.ptr, 4);
    _ = zenoh.c.z_bytes_to_slice(zenoh.c.z_bytes_loan_mut(&payload), &output_bytes);
    try std.testing.expectEqualSlices(u8, input_bytes, zenoh.c.z_slice_data(zenoh.c.z_slice_loan(&output_bytes)));
}
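The missing piece is turning the [*c] pointer into a Zig slice, which needs an explicit length because a C pointer carries none. A minimal sketch; `sliceFromC` is a hypothetical helper name, and it assumes the library exposes the length next to the data (zenoh-c appears to have a `z_slice_len` alongside `z_slice_data`, but check the header):

```zig
const std = @import("std");

/// Hypothetical helper: view `len` bytes behind a C pointer as a Zig slice.
/// Slicing a [*c] pointer requires an explicit end index.
fn sliceFromC(ptr: [*c]const u8, len: usize) []const u8 {
    // A [*c] pointer may be null; treat that as an empty slice.
    if (ptr == null) return &[_]u8{};
    return ptr[0..len];
}
```

In the test above that would become something like `sliceFromC(zenoh.c.z_slice_data(loaned), zenoh.c.z_slice_len(loaned))`, with `loaned` being the result of `z_slice_loan(&output_bytes)`.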

kj4tmp · March 5, 2025, 6:08am

This is a great intro to writing C bindings: Blog: Understanding How Zig and C Interact

And this is a good intro to the move and loan stuff: Rust Ownership, Move and Borrow - Part 1

floooh · March 5, 2025, 8:07am

I would try to automate the process by extracting the C API declaration either via clang -Xclang -ast-dump=json -c [c_src_path] or using Zig’s comptime reflection to iterate over the @cImport’ed struct (disclaimer: I haven’t tried the Zig approach yet - in recent times at least).

With that extracted AST information you can then make your Zig output to be more ‘Zig idiomatic’ (at least naming convention, but maybe also things like accepting slices instead of ptr/size pairs etc…).
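To illustrate the ‘slices instead of ptr/size pairs’ idea, a minimal sketch: `c_sum_bytes` is a made-up C-style function (defined in Zig so the example runs without a C library), and `sumBytes` is the kind of wrapper a generator could emit for it.

```zig
const std = @import("std");

// Hypothetical stand-in for a C function taking a pointer/length pair.
fn c_sum_bytes(data: [*c]const u8, len: usize) callconv(.c) u32 {
    var total: u32 = 0;
    for (data[0..len]) |b| total += b;
    return total;
}

/// Generated Zig-side wrapper: one slice parameter instead of a ptr/len pair,
/// so the length can never disagree with the buffer.
pub fn sumBytes(data: []const u8) u32 {
    return c_sum_bytes(data.ptr, data.len);
}
```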

In the sokol headers I use the first approach and run that automatically via GH Actions: Merge pull request #1214 from waywardmonkeys/reduce-typo-count · floooh/sokol@123f30c · GitHub

Here’s how I generate a simplified JSON from Clang’s very verbose AST-dump (I put some restrictions regarding the C API on myself to simplify the data extraction though - this is easy because I also control the C API, for instance nested unnamed structs are not allowed, and no unions in general):

floooh/sokol/blob/master/bindgen/gen_ir.py

#-------------------------------------------------------------------------------
#   Generate an intermediate representation of a clang AST dump.
#-------------------------------------------------------------------------------
import re, json, sys, subprocess

def is_api_decl(decl, prefix):
    if 'name' in decl:
        return decl['name'].startswith(prefix)
    elif decl['kind'] == 'EnumDecl':
        # an anonymous enum, check if the items start with the prefix
        first = get_first_non_comment(decl['inner'])
        return first['name'].lower().startswith(prefix)
    else:
        return False

def get_first_non_comment(items):
    return next(i for i in items if i['kind'] != 'FullComment')

def strip_comments(items):
    return [i for i in items if i['kind'] != 'FullComment']

This file has been truncated.

…the gen_zig.py script then takes this ‘ir.json’ as input and generates a Zig module:

floooh/sokol/blob/master/bindgen/gen_zig.py

#-------------------------------------------------------------------------------
#   Generate Zig bindings.
#
#   Zig coding style:
#   - types are PascalCase
#   - functions are camelCase
#   - otherwise snake_case
#-------------------------------------------------------------------------------
import gen_ir
import os, shutil, sys
import textwrap

import gen_util as util

module_names = {
    'slog_':    'log',
    'sg_':      'gfx',
    'sapp_':    'app',
    'stm_':     'time',
    'saudio_':  'audio',

This file has been truncated.

…and the result then looks like this (for a somewhat simple C API):

floooh/sokol-zig/blob/master/src/sokol/audio.zig

// machine generated, do not edit

//
// sokol_audio.h -- cross-platform audio-streaming API
//
// Project URL: https://github.com/floooh/sokol
//
// Do this:
//     #define SOKOL_IMPL or
//     #define SOKOL_AUDIO_IMPL
// before you include this file in *one* C or C++ file to create the
// implementation.
//
// Optionally provide the following defines with your own implementations:
//
// SOKOL_DUMMY_BACKEND - use a dummy backend
// SOKOL_ASSERT(c)     - your own assert macro (default: assert(c))
// SOKOL_AUDIO_API_DECL- public function declaration prefix (default: extern)
// SOKOL_API_DECL      - same as SOKOL_AUDIO_API_DECL
// SOKOL_API_IMPL      - public function implementation prefix (default: -)

This file has been truncated.

nurpax · March 5, 2025, 8:58am

or using Zig’s comptime reflection to iterate over the @cImport’ed struct

Not sure if your idea was to use this reflection to generate Zig source code and check that into Git, or to fully build the binding API at comptime. If the latter, this approach might produce an API that’s hard for programmers and ZLS to inspect. At least IME, ZLS tends to give up with many comptime constructs and it’d suck if there’s no IDE support for types and functions when using the bindings API. Your current approach doesn’t have any of these downsides.

floooh · March 5, 2025, 1:56pm

…the first option, i.e. it would be a separate Zig exe which @cImports the C header, uses comptime code to iterate over the reflection info in the imported struct (which would replace the clang AST-dump step), ‘ziggifies’ the C API, and then writes a Zig module output file which would be committed to git.

I.e. quite similar to the current approach, but replacing Clang and all the Python code with Zig.
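A rough sketch of what the comptime-reflection half of such a generator might look like, written against Zig 0.14: it walks the public declarations of a C namespace, keeps the functions with a given prefix, and prints camelCase aliases that the exe could write to the output file. The `c` struct is a hypothetical stand-in for an `@cImport` result (so it runs without a header); the `ml_` prefix and the alias format are made up, and a real generator would also emit wrapper functions and type renames.

```zig
const std = @import("std");

// Hypothetical stand-in for `@cImport(@cInclude("mylib.h"))`.
const c = struct {
    pub fn ml_open_file(path: [*c]const u8) callconv(.c) c_int {
        _ = path;
        return 0;
    }
    pub fn ml_close_file(handle: c_int) callconv(.c) void {
        _ = handle;
    }
    pub const ML_VERSION: c_int = 3; // not a function, must be skipped
};

/// `open_file` -> `openFile`. Names longer than `buf` are not handled.
fn toCamelCase(buf: []u8, snake: []const u8) []const u8 {
    var len: usize = 0;
    var upper_next = false;
    for (snake) |ch| {
        if (ch == '_') {
            upper_next = true;
            continue;
        }
        buf[len] = if (upper_next) std.ascii.toUpper(ch) else ch;
        upper_next = false;
        len += 1;
    }
    return buf[0..len];
}

/// Emit `pub const zigName = c.c_name;` for every prefixed function in `ns`.
fn writeAliases(writer: anytype, comptime ns: type, comptime prefix: []const u8) !void {
    inline for (@typeInfo(ns).@"struct".decls) |decl| {
        const is_fn = @typeInfo(@TypeOf(@field(ns, decl.name))) == .@"fn";
        if (is_fn and comptime std.mem.startsWith(u8, decl.name, prefix)) {
            var buf: [128]u8 = undefined;
            const zig_name = toCamelCase(&buf, decl.name[prefix.len..]);
            try writer.print("pub const {s} = c.{s};\n", .{ zig_name, decl.name });
        }
    }
}
```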

kj4tmp · March 6, 2025, 4:43am

Currently I have generated the Zig code using translate-c, and I am hand-editing it to remove the [*c] pointers and replace them with *T wherever I can.

What happens if I screw up? Will I get a compile error? What happens when my dependency changes, will I get a compile error then? How do I know that my bindings are accurate?

vulpesx · March 6, 2025, 5:01am

Nope. If you are lucky, a segfault or some other runtime error. This is an example of how unsafe C is.
When interfacing with C you have to deal with it, because there is no way Zig can know what kind of pointer it should be; that’s why [*c] exists.

It depends on how it changes.
If the signatures of types/functions change in an incompatible way, you should get compile errors.
If it starts using pointers differently without the above, then a segfault if you’re lucky.
You need to track the changes and make your own changes if necessary.

Again, if signatures are inaccurate that should be a compile error; beyond that it’s up to you to rtfm.

good luck
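A small sketch of the point above: the compiler accepts both the raw [*c] signature and a hand-edited `*const T` one, and nothing checks either against what the C side really does. `Config`, `rawGetValue`, `getValue` and `valueOrDefault` are hypothetical names, defined in Zig so the example runs on its own.

```zig
const std = @import("std");

const Config = extern struct { value: u32 };

// translate-c style: may be null, may point at one item or many; Zig can't tell.
fn rawGetValue(cfg: [*c]const Config) callconv(.c) u32 {
    return cfg.*.value;
}

// Hand-edited style: promises exactly one non-null Config. The edit compiles,
// but only the C library's documentation can tell you whether it is true.
fn getValue(cfg: *const Config) callconv(.c) u32 {
    return cfg.value;
}

// If the C side may legitimately pass NULL, an optional pointer is the honest type.
fn valueOrDefault(cfg: [*c]const Config) u32 {
    const maybe: ?*const Config = cfg;
    return if (maybe) |p| p.value else 0;
}
```

So when hand-editing, `?*T` is the safer replacement whenever the C API allows NULL, and `*T` only where it guarantees a single valid object.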

floooh · March 6, 2025, 7:58am

No, all sorts of weird things can happen, from parameters arriving with wrong values on the Zig side, to memory corruption, to ‘impossible’ crashes. That’s why it is a good idea not to write bindings manually.

In my bindings I generate a Zig wrapper function which then calls the ‘raw’ C function. This Zig wrapper function can do any type conversions or ‘safe casts’ if needed (mostly the Zig function just calls the C function directly, but at least this gives an additional safety layer: if the args of the Zig wrapper function are not type-compatible with the C function args, there will be a compile error).
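A minimal sketch of such a wrapper with a ‘safe cast’, assuming a made-up C function that takes an `int` (defined in Zig here so it runs standalone). `setSampleCount` and `ValueOutOfRange` are hypothetical names, not sokol API; the wrapper takes an idiomatic `usize` and uses `std.math.cast` instead of an unchecked truncation.

```zig
const std = @import("std");

// Hypothetical stand-in for a raw C function taking an `int`.
fn c_set_sample_count(count: c_int) callconv(.c) void {
    _ = count;
}

pub const Error = error{ValueOutOfRange};

/// Zig wrapper: idiomatic parameter type, checked conversion, then the raw call.
/// If the raw signature changes to an incompatible type, this call stops compiling.
pub fn setSampleCount(count: usize) Error!void {
    const c_count = std.math.cast(c_int, count) orelse return error.ValueOutOfRange;
    c_set_sample_count(c_count);
}
```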

kj4tmp · March 6, 2025, 8:32am

The translate-c step gave me a lot of this:

(The Moved types roughly double the number of types).

pub extern fn z_query_reply_err(this_: [*c]const LoanedQuery, payload: [*c]MovedBytes, options: [*c]QueryReplyErrOptions) Result;

pub const MovedBytes = extern struct {
    _this: OwnedBytes = @import("std").mem.zeroes(OwnedBytes),
};

pub const OwnedBytes = extern struct {
    _0: [40]u8 = @import("std").mem.zeroes([40]u8),
};

pub fn z_bytes_move(arg_x: [*c]OwnedBytes) callconv(.c) [*c]MovedBytes {
    var x = arg_x;
    _ = &x;
    return @as([*c]MovedBytes, @ptrCast(@alignCast(x)));
}
pub fn z_bytes_take(arg_this_: [*c]OwnedBytes, arg_x: [*c]MovedBytes) callconv(.c) void {
    var this_ = arg_this_;
    _ = &this_;
    var x = arg_x;
    _ = &x;
    this_.* = x.*._this;
    z_internal_bytes_null(&x.*._this);
}

The “moved” and “owned” stuff comes, I think, from a Rust style. It looks like I could just delete all the “moved” variants and keep only “Owned”. What utility do these “moved” types offer?

Take a look at z_query_reply_err: if I change the signature from [*c]MovedBytes to *OwnedBytes, I don’t think I am losing any information, because in Zig all function parameters are constant. I might as well rename OwnedBytes to just Bytes and delete MovedBytes and z_bytes_move?
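One plausible reading of the moved types, based only on the translated `z_bytes_move` / `z_bytes_take` above: because `MovedBytes` is a distinct type, a function that takes ownership cannot be handed an `*OwnedBytes` by accident; the caller has to write `move(&x)`, which marks the hand-off at the call site. The callee then leaves the source empty (zenoh-c does this with `z_internal_bytes_null`; the all-zero default below is only a stand-in). A minimal sketch with hypothetical `move` and `take` mirroring those functions:

```zig
const std = @import("std");

// Simplified copies of the translate-c types above (40-byte opaque blob).
const OwnedBytes = extern struct {
    _0: [40]u8 = [_]u8{0} ** 40,
};
const MovedBytes = extern struct {
    _this: OwnedBytes = .{},
};

/// Like `z_bytes_move`: same memory, but now typed as "being moved".
fn move(x: *OwnedBytes) *MovedBytes {
    return @ptrCast(x);
}

/// Like `z_bytes_take`: copy the contents out, then reset the source so a
/// later drop on it is harmless. `take(&b, &a)` would not compile, because
/// *OwnedBytes does not coerce to *MovedBytes.
fn take(dst: *OwnedBytes, src: *MovedBytes) void {
    dst.* = src._this;
    src._this = .{};
}
```

Deleting the Moved types keeps the ABI identical, so it works; what is lost is that compile-time signal of where ownership leaves a variable.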

Sze · March 6, 2025, 8:33am

If I were to do something like this, my preferred approach would be similar to what @floooh describes, but I would try to implement it as a Zig program that parses/reads an API specification and generates the code at build time using something like: https://ziglang.org/learn/build-system/#generating-zig and 0.14.0 release-notes #Allow-Packages-to-Expose-Arbitrary-LazyPaths-by-Name

kj4tmp · March 6, 2025, 8:38am

I’ve already gotten the translate-c source down to about 2400 lines (from 5000) by just deleting duplicated type information and the extra C stuff.

I could write some code-gen to translate the headers better, but honestly that code-gen would probably also be … 2000 lines? I think when I am finished hand-editing it will be close to 1000 lines. It doesn’t seem wise to embark on a code-gen journey for the first round of bindings; maybe on the second go-around.

kj4tmp · March 6, 2025, 6:44pm

I think I’ve changed my mind about code-gen.
All I’m really doing is a bunch of find-and-replace operations. There’s no reason I can’t put those in a ZON file and just apply them serially to an array list of the code. Then when I update the dependency I can see the diff and add more / adjust if needed.
Maybe even run ast-check between each operation, etc.

Pipeline could look like:

1. translate-c build step
2. apply serial find-and-replace operations as described by a ZON file (deletions are just replacements with \n)
3. maybe run ast-check / zig fmt a few times
4. output file
5. compare the diff manually using git diff each time the dependency is updated
6. profit

I hope I don’t get addicted to this and start writing compilers.
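A minimal sketch of step 2, the serial find-and-replace pass, using `std.mem.replaceOwned`. `Replacement` and `applyAll` are hypothetical names; the steps are a plain Zig array here, and loading them from a ZON file is left out.

```zig
const std = @import("std");

/// One find-and-replace step (hypothetical shape for the ZON entries).
const Replacement = struct {
    find: []const u8,
    replace: []const u8,
};

/// Apply each step in order; every step sees the output of the previous one.
/// The caller owns the returned buffer.
fn applyAll(gpa: std.mem.Allocator, source: []const u8, steps: []const Replacement) ![]u8 {
    var current = try gpa.dupe(u8, source);
    errdefer gpa.free(current);
    for (steps) |step| {
        // Allocates a new buffer with every occurrence of `find` replaced.
        const next = try std.mem.replaceOwned(u8, gpa, current, step.find, step.replace);
        gpa.free(current);
        current = next;
    }
    return current;
}
```

Because the steps run serially, order matters: a pattern written against a renamed type (say `[*c]const Config`) only matches after the step that renames `z_owned_config_t` to `Config` has run.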

kj4tmp · March 7, 2025, 3:22am

If anyone is curious, this is how I have built the pipeline:

build.zig

    // binding generation
    const translate_c = b.addTranslateC(.{
        .link_libc = true,
        .optimize = optimize,
        .target = target,
        .root_source_file = zenoh_c_dep.path("include/zenoh.h"),
    });
    const gen_tool = b.addExecutable(.{
        .name = "generate_bindings",
        .root_source_file = b.path("tools/generate_bindings.zig"),
        .target = target,
        .optimize = optimize,
    });
    gen_tool.root_module.addAnonymousImport("raw_translate_c", .{ .root_source_file = translate_c.getOutput() });

    const run_gen_tool = b.addRunArtifact(gen_tool);
    const generated_file = run_gen_tool.addOutputFileArg("c2.zig");
    const update_source = b.addUpdateSourceFiles();
    update_source.addCopyFileToSource(generated_file, "src/c2.zig");
    const gen_step = b.step("gen", "Generate bindings from the zenoh-c dependency, modifies source files!");
    gen_step.dependOn(&run_gen_tool.step);
    gen_step.dependOn(&update_source.step);

tools/generate_bindings.zig

const std = @import("std");

const raw_translate_c = @embedFile("raw_translate_c");

pub fn main() !void {
    var arena_state = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena_state.deinit();
    const arena = arena_state.allocator();

    const args = try std.process.argsAlloc(arena);

    if (args.len != 2) fatal("wrong number of arguments", .{});

    const output_file_path = args[1];

    const output_file = std.fs.cwd().createFile(output_file_path, .{}) catch |err| {
        fatal("unable to open '{s}': {s}", .{ output_file_path, @errorName(err) });
    };
    defer output_file.close();

    // maybe create an array list from raw c import and modify as needed...
    // for now just write unchanged
    try output_file.writeAll(raw_translate_c);
    return std.process.cleanExit();
}

fn fatal(comptime format: []const u8, args: anytype) noreturn {
    std.debug.print(format ++ "\n", args);
    std.process.exit(1);
}

kj4tmp · March 8, 2025, 8:17am

Alright, I think I’ve got some decent bindings being generated, and a path forward for updating the dependency as needed. Now to the nitty-gritty: writing an idiomatic API.

Here is an example. The c library has this config object, that is used to initialize a networking session:

pub const Config = extern struct {
    _0: [1840]u8 = @import("std").mem.zeroes([1840]u8),
};
pub const LoanedConfig = extern struct {
    _0: [1840]u8 = @import("std").mem.zeroes([1840]u8),
};
pub extern fn z_config_clone(dst: *Config, this_: *const LoanedConfig) void;
pub extern fn z_config_default(this_: *Config) Result;
pub extern fn z_config_drop(this_: *Config) void;
pub extern fn z_config_loan(this_: *const Config) *const LoanedConfig;
pub extern fn z_config_loan_mut(this_: *Config) *LoanedConfig;
pub extern fn zc_config_from_env(this_: *Config) Result;
pub extern fn zc_config_from_file(this_: *Config, path: [*c]const u8) Result;
pub extern fn zc_config_from_str(this_: *Config, s: [*c]const u8) Result;
pub extern fn zc_config_get_from_str(this_: *const LoanedConfig, key: [*c]const u8, out_value_string: [*c]String) Result;
pub extern fn zc_config_get_from_substr(this_: *const LoanedConfig, key: [*c]const u8, key_len: usize, out_value_string: [*c]String) Result;
pub extern fn zc_config_insert_json5(this_: *LoanedConfig, key: [*c]const u8, value: [*c]const u8) Result;
pub extern fn zc_config_insert_json5_from_substr(this_: *LoanedConfig, key: [*c]const u8, key_len: usize, value: [*c]const u8, value_len: usize) Result;
pub extern fn zc_config_to_string(config: *const LoanedConfig, out_config_string: [*c]String) Result;

And I have prototyped the following:

pub const c = @import("c.zig");

pub const Error = error{ZenohError};

pub fn err(code: c.Result) Error!void {
    if (code < 0) {
        return error.ZenohError;
    }
}
pub const Config = struct {
    _c: c.Config,

    pub fn initDefault() Error!Config {
        // `var`, not `const`: the C function writes into it through the pointer.
        var c_config: c.Config = undefined;
        try err(c.z_config_default(&c_config));
        return Config{ ._c = c_config };
    }

    pub fn initFromEnv() Error!Config {
        var c_config: c.Config = undefined;
        try err(c.zc_config_from_env(&c_config));
        return Config{ ._c = c_config };
    }

    pub fn initFromFile(path: [:0]const u8) Error!Config {
        var c_config: c.Config = undefined;
        try err(c.zc_config_from_file(&c_config, path.ptr));
        return Config{ ._c = c_config };
    }

    pub fn initFromString(str: [:0]const u8) Error!Config {
        var c_config: c.Config = undefined;
        try err(c.zc_config_from_str(&c_config, str.ptr));
        return Config{ ._c = c_config };
    }

    // Take `self` by pointer: z_config_drop needs a mutable `*c.Config`.
    pub fn deinit(self: *Config) void {
        c.z_config_drop(&self._c);
    }
};

Is there a better way to approach this? I’m not sure how far I really want to go down the code-gen route. I don’t think I want to code-gen these idiomatic wrappers.

kj4tmp · March 10, 2025, 4:39am

So it turns out committing generated files was a bad idea. The underlying C library has different header files depending on the architecture (the sizes of some opaque types change between 64-bit and 32-bit architectures). What a pain.

Guess I need to fully generate inside the build system.

floooh · March 11, 2025, 10:21am

Would be interesting to know what causes the different sizes, and why those size differences are not automatically reflected on the Zig side.

For instance size_t/ssize_t/uintptr_t/intptr_t on the C side are 32- or 64-bits, but so is usize/isize on the Zig side. Same with pointers.

E.g. if you take a C struct and ‘translate’ that to a Zig extern struct with the types mapped via translateC, or manually via this mapping table (Documentation - The Zig Programming Language), the Zig-side types should be compatible with the C-side types on the same computer, even if the size is different for 32- vs 64-bit CPUs.
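This can be checked at compile time. A minimal sketch with two hypothetical structs: `Adaptive` uses pointer-sized fields and follows the target, while `Fixed` mimics a header that spells a type out as a fixed byte array, which stays the same size on every target no matter what the C side was generated for.

```zig
const std = @import("std");

// Pointer-sized fields: 16 bytes on 64-bit targets, 8 bytes on 32-bit targets.
const Adaptive = extern struct {
    ptr: ?*anyopaque,
    len: usize,
};

// A hard-coded byte blob: 32 bytes everywhere.
const Fixed = extern struct {
    _0: [32]u8,
};

comptime {
    if (@sizeOf(Adaptive) != 2 * @sizeOf(usize)) @compileError("unexpected Adaptive size");
    if (@sizeOf(Fixed) != 32) @compileError("unexpected Fixed size");
}
```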

vulpesx · March 11, 2025, 10:56am

That shouldn’t be an issue, since opaque types need to be behind pointers, whose size is always known.

kj4tmp · March 12, 2025, 3:09am

The issue with my first approach was that I was using translate-c on header files supplied with the library that were for a specific architecture, and then trimming the output after that using find-and-replace operations.

Specifically, the dependency zenoh-c uses generated header files, for example:

On 32 bit arch:

typedef struct ALIGN(4) z_owned_shm_mut_t {
  uint8_t _0[20];
} z_owned_shm_mut_t;

On 64 bit arch:

typedef struct ALIGN(8) z_owned_shm_mut_t {
  uint8_t _0[32];
} z_owned_shm_mut_t;

So the translate-c output will be different depending on the arch, so I cannot commit the output of translate-c into my repository (it will only be valid for a single architecture).
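Generating inside the build system is the robust fix. If committed bindings were ever kept anyway, a comptime size check would at least turn the mismatch into a compile error instead of memory corruption. A minimal sketch; the guard is a hypothetical addition (not something zenoh-c or translate-c emits), and the 20/32 sizes come from the header excerpts above. On a 32-bit target this file deliberately fails to compile.

```zig
const std = @import("std");

// Stand-in for a binding file generated on a 64-bit machine and committed.
const z_owned_shm_mut_t = extern struct {
    _0: [32]u8,
};

// Fail the build if the committed layout does not match the current target.
comptime {
    const expected: usize = switch (@sizeOf(usize)) {
        8 => 32, // 64-bit header: uint8_t _0[32]
        4 => 20, // 32-bit header: uint8_t _0[20]
        else => @compileError("no known z_owned_shm_mut_t layout for this pointer size"),
    };
    if (@sizeOf(z_owned_shm_mut_t) != expected)
        @compileError("z_owned_shm_mut_t was generated for a different architecture");
}
```

The headers also differ in alignment (ALIGN(4) vs ALIGN(8)), so a fuller guard would compare `@alignOf` as well.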

I have changed my pipeline to use the translate-c output directly, and I am now just importing it in the Zig code:

build.zig

    const translate_c = b.addTranslateC(.{
        .link_libc = true,
        .optimize = optimize,
        .target = b.resolveTargetQuery(.{ .cpu_arch = .x86_64, .os_tag = .linux, .abi = .musl }),
        .root_source_file = zenoh_c_dep.path("include/zenoh.h"),
    });
    zenoh.addImport("zenoh_c", translate_c.createModule());

root.zig

pub const c = @import("zenoh_c");
...
pub const Config = struct {
    _c: c.z_owned_config_t,

    pub fn initDefault() Error!Config {
        var c_config: c.z_owned_config_t = undefined;
        try err(c.z_config_default(&c_config));
        return Config{ ._c = c_config };
    }
};

And I was calling them opaque because they were in a file called zenoh_opaque.h (the C library makes some types “opaque” by hiding their contents as just bytes).