Converting a C API to Zig with the help of comptime

13 Likes

Cool write-up. I’ve said it before: the alternative to implementing declarations as part of type reification is, in fact, codegen.

So if the purpose of comptime is to replace macros, it hasn’t yet succeeded.

4 Likes

I think the idea is that it is better to have 90% of code using just comptime and the remaining 10% using explicit manual codegen, as an alternative to macros.

Because with macros, you end up with macros peppered throughout all of your code and library code, often used for trivial things that don’t really require codegen or macros at all. With the comptime-plus-codegen split, you at least get bigger pieces of simpler code.

4 Likes

Go tried that. Eventually, they lost the argument.

1 Like

In terms of effort, though, I would say the ratio is reversed: 10% on comptime, 90% on codegen. A lot of what the compiler does has to be replicated.

3 Likes

Go tried to be puritanical and avoid all metaprogramming except code generation. When every simple generic has to be code-generated, that is clearly a far more extreme standpoint than what Zig is doing, where lots of things can be done with comptime alone.

2 Likes

Re ‘codegen as the build-time version of comptime’, I did that for my Z80 emulator, and thought it would be really neat if the stdlib had a module which simplifies comptime build time(!) code generation, and goes beyond ‘write text to stdout’.

E.g. one idea would be an AST-builder API in the stdlib which allows you to programmatically build an AST and then emit the resulting Zig code, and which makes it easy to transform declarative data (e.g. a description of the Z80 ISA like this: chips/codegen/z80_desc.yml at master · floooh/chips · GitHub) into Zig code (e.g. the Z80 instruction decoder switch statement, like this: chipz/src/chips/z80.zig at 58a31ec5acebeaf910fe29ecff445be09f703b3c · floooh/chipz · GitHub) by building an AST programmatically instead of directly formatting text.

…on the other hand, such a codegen module should be flexible enough to just yolo it here and there and allow injecting freeform text into the output if ‘string building’ actually turns out to be more convenient than building an AST programmatically in specific situations.

(PS: if it isn’t clear, I see build-time code generation as the natural extension of Zig’s comptime, and also saw it as the replacement for C’s missing generic features, e.g. Zig’s comptime doesn’t need to match macro systems of other languages if it provides a streamlined way to generate code in build.zig)
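To make the ‘declarative data → generated Zig code’ step concrete, here is a deliberately tiny sketch in Python. The three-instruction table and the emitted switch body are invented for illustration; the real z80_desc.yml and decoder are far larger:

```python
# Hypothetical miniature of the 'declarative data -> Zig code' step.
# The instruction table stands in for something like z80_desc.yml.
INSTRUCTIONS = [
    (0x00, "NOP",   "/* no operation */"),
    (0x3C, "INC A", "self.a +%= 1;"),
    (0x76, "HALT",  "self.halted = true;"),
]

def emit_decoder() -> str:
    """Render the instruction table as the body of a Zig switch statement."""
    lines = ["switch (opcode) {"]
    for opcode, mnemonic, body in INSTRUCTIONS:
        lines.append(f"    0x{opcode:02X} => {{ {body} }}, // {mnemonic}")
    lines.append("    else => unreachable,")
    lines.append("}")
    return "\n".join(lines)

print(emit_decoder())
```

An AST-builder API would replace the string formatting here with node construction plus a final render step, but the data-to-code shape of the generator stays the same.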

3 Likes

Well I don’t want to grandstand on the subject too much.

All I know is, reading that document, there’s a point where all the fun stuff has to go out the window and @chung-leong is (as he mentioned) going to 10x the effort to get to the destination.

I see in what he’s done an application which Zig should fully support. Which it does not. It’s clear what it would take to do the whole job using comptime; the tools are not available, and they could be.

I see that as both suboptimal and unstable.

1 Like

I didn’t explain why I resorted to parsing the output from translate-c. There were two show-stoppers: (a) I couldn’t get the argument names of functions, and (b) different C enums all get translated as c_uint.

In theory, (a) could be fixed with a builtin that gives us the name of a function (so we no longer have to resort to a stupid hack) along with the names of its arguments.

Dealing with (b) would require support for custom types or type annotations, a feature that has frequently been requested for other reasons (e.g. distinguishing strings from arrays of numbers between 0 and 255).

5 Likes

I was more responding to this:

While comptime allows us to “translate” a function from C to Zig, it is not enough when we want to translate a whole API.

Which I view as actually mission-critical to Zig reaching the point it wants to go to.

The immediate problem is that Zig currently does not allow us to attach decls to a struct type through @Type(). We cannot create a new namespace with callable functions.

Right. Exactly that. It also means no one can write a library which generates CPython binding code from the export interface of a Zig module, or a library which fault-injects errors into the use of a Zig type’s functions. The list is unbounded; codegen is difficult, brittle, and scales poorly.

And even if we could, such a solution would be too difficult to use since this new namespace would essentially be a mystery box. We need actual code that tells us what functions are available, code from which we can generate autodoc.

I believe this can be overcome with the right implementation. Doc comments could be a part of the .decl types, for example, which are currently just a name.

In any case, I need to take the time to make a long-form case on this subject. It can’t go on the issue board, and it won’t be persuasive from the comment section.

1 Like

An alternative to parsing translate-c output is to make use of Zig’s builtin Clang to dump the AST.

Disclaimer: the above is very hacky code :face_without_mouth:

1 Like

clang ast-dumping is what I do for bindings generation of the sokol-headers.

Clang-ast-dump produces an extremely verbose JSON which I then reduce to a simplified JSON which has just the information needed for creating language bindings, all of this is in this python script:

…and then output-language specific scripts read this simplified JSON and produce the language bindings (for Zig such a script looks like this:

…since I control the C APIs (e.g. I’m not trying to create a bindings generator that works for all C APIs) I can do some shortcuts to simplify the language bindings generation. For instance I don’t allow C features like this in the public C APIs:

  • C unions are generally not allowed
  • all public symbols must have a common API-specific prefix (e.g. sg_), which is used to find the actually relevant symbols for the bindings generation (because the raw AST dump will contain everything that’s been included by the header)
  • global constants must be defined as an unnamed enum, not as a #define
  • nested anonymous structs are not allowed (e.g. all nested structs must have an explicit struct declaration outside their ‘container struct’)
  • parsing of function args and return values is hardcoded to a couple of cases and is extended on an ‘as-needed basis’
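As a minimal sketch of that reduce step (the JSON shape below mimics the node layout of `clang -Xclang -ast-dump=json`; the real sokol scripts handle far more node kinds and attributes than this):

```python
# Reduce a clang JSON AST dump to just the bindings-relevant metadata.
# The input here is a hand-written stand-in for a real dump.
import json

PREFIX = "sg_"  # only symbols with the common API prefix are relevant

def reduce_ast(ast: dict) -> list:
    decls = []
    for node in ast.get("inner", []):
        name = node.get("name", "")
        if not name.startswith(PREFIX):
            continue  # skip everything pulled in from other headers
        if node.get("kind") == "RecordDecl":
            decls.append({"kind": "struct", "name": name})
        elif node.get("kind") == "FunctionDecl":
            decls.append({"kind": "func", "name": name})
    return decls

dump = json.loads("""{"inner": [
  {"kind": "RecordDecl",   "name": "sg_desc"},
  {"kind": "FunctionDecl", "name": "sg_setup"},
  {"kind": "FunctionDecl", "name": "printf"}
]}""")
print(reduce_ast(dump))
```

The language-specific scripts then only ever see the small reduced JSON, never the raw dump.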
1 Like

I think this is really cool and I have had some success with it. If I can get it working it could be a fantastic way to maintain bindings that can be updated automatically.

A few issues I ran into:

  • There appears to be no way to generate an error set from an unnamed enum. When each enum value has a prefix, it should be possible.
  • Fails if a root struct is a typedef void
  • Please support using zigft as a build.zig.zon dependency. Perhaps I’m wrong, but I feel like “copy this file to your src/ folder” is not the Zig way.
1 Like

Can you point me to the API in question? That’s potentially a difficult case to deal with. If the C enum is unnamed, then its values would just be a bunch of c_uint constants floating around.

Can you clarify what you mean by that? void is not a container in C or Zig.

Sure. GitHub - Mindwerks/wildmidi: WildMIDI is a simple software midi player which has a core softsynth library that can be used with other applications.

You’re right that it’s not a struct but a handle. I would usually write a wrapper struct that stores a reference to the handle.

1 Like

I see what you mean now. Basically, you want the handle to be treated as an opaque, so you can do handle.method(...) instead of func(handle, ...). Sounds like a perfectly reasonable use case. So on detecting that c_root_struct refers to an int type, the generator should define a new packed struct backed by that int type.

P.S. And if it’s void*, a new opaque type should be defined.
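A hypothetical sketch of that generator decision (the function name and the C type strings are invented for illustration, not zigft’s actual internals):

```python
# Choose a Zig wrapper declaration for a C handle type:
# int-backed handles become packed structs, void* handles become opaques.
def gen_handle_wrapper(name: str, c_type: str) -> str:
    if c_type == "void*":
        # opaque pointer handle -> Zig opaque type
        return f"pub const {name} = opaque {{}};"
    int_map = {"unsigned int": "c_uint", "unsigned long": "c_ulong"}
    if c_type in int_map:
        # integer handle -> packed struct backed by the same int type
        zig_int = int_map[c_type]
        return f"pub const {name} = packed struct({zig_int}) {{ value: {zig_int} }};"
    raise ValueError(f"unhandled handle type: {c_type}")

print(gen_handle_wrapper("Midi", "void*"))
print(gen_handle_wrapper("Voice", "unsigned int"))
```

Either way the wrapper gives the handle its own namespace, so methods can hang off it.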

2 Likes

Why output to .zig files at all, and waste compute cycles re-parsing the generated AST? The build system could hypothetically just pass the AST directly to the compiler.

And at that point, wouldn’t it just be equivalent to Rust’s procedural macros?

At some point you want to debug the generated code in a regular debugger.

And at that point, wouldn’t it just be equivalent to Rust’s procedural macros?

AFAIK the output of Rust proc macros is not debuggable either (I might be wrong though).

(although Zig comptime code isn’t debuggable either, but that’s something that really should be fixed longterm - no idea how though)

1 Like

An idea sort of hit me while I was thinking about the discussion in this thread. What if there were a special C define that changes the behavior of translate-c such that it’d embed meta information into the function name? Say we have the following in a header file:

void foo(const char* bytes, size_t bytes_len);

If we import the header in this manner:

const c = @cImport({
    @cDefine("__ZIG_NAME_MANGLING", {});
    @cInclude("foo.h");
});

Then translate-c would give us something like this:

pub const @"foo:\"void\" bytes:\"const char*\" bytes_len:\"size_t\"" = @extern(
    *const fn (bytes: ?[*]const u8, bytes_len: usize) callconv(.c) void,
    .{ .name = "foo" },
);

That would give us, at comptime, the missing information required for automated function transformation. If we have the names of the types and the names of the arguments, then we can establish naming conventions that we can reliably act upon. For instance, if a size_t argument ends in “[name]_len” and the preceding argument is “[name]”, then these two arguments should be merged into a slice. Or if a pointer type’s name ends in “_maybe”, then it should be handled as an optional.
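The “[name]_len” convention can be sketched as a small pass over the argument metadata that the mangled name would expose (function name and the exact merged representation are invented for illustration):

```python
# Merge a pointer argument "<x>" followed by a size_t argument "<x>_len"
# into a single slice parameter, per the naming convention above.
def merge_len_args(args: list) -> list:
    merged, i = [], 0
    while i < len(args):
        name, c_type = args[i]
        nxt = args[i + 1] if i + 1 < len(args) else None
        if nxt and nxt[1] == "size_t" and nxt[0] == name + "_len":
            merged.append((name, "slice"))  # pointer + length -> one slice
            i += 2  # consume both arguments
        else:
            merged.append((name, c_type))
            i += 1
    return merged

print(merge_len_args([("bytes", "const char*"), ("bytes_len", "size_t")]))
```

In Zig this pass would of course run at comptime over the parsed mangled names, but the matching logic is the same.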

This is similar to C++ name-mangling, except that we wouldn’t be mangling the actual names as they exist in the .so file. The metadata would come from the header file.

I think this can open up a lot of possibilities all without any change to the language itself. All we’re doing is making translate-c behave differently when a special constant is defined.

1 Like