Converting a C API to Zig with the help of comptime

13 Likes

Cool write-up. I’ve said it before: the alternative to implementing declarations as part of type reification is, in fact, codegen.

So if the purpose of comptime is to replace macros, it hasn’t yet succeeded.

4 Likes

I think the idea is that it is better to have 90% of code using just comptime and the remaining 10% using explicit manual codegen, as an alternative to macros.

Because with macros, you end up with macros peppered throughout all of your code and library code, often used for trivial things that don’t really require codegen or macros at all. With the comptime-plus-codegen split, you at least get bigger pieces of simpler code.

4 Likes

Go tried that. Eventually, they lost the argument.

1 Like

In terms of effort, though, I would say the ratio is reversed: 10% on comptime, 90% on codegen. A lot of what the compiler does has to be replicated.

3 Likes

Go tried to be puritanical and avoid all metaprogramming except code generation. When every simple generic has to be code-generated, that is clearly a far more extreme standpoint than what Zig is doing, where lots of things can be done with comptime alone.

2 Likes

Re ‘codegen as the build-time version of comptime’, I did that for my Z80 emulator, and thought it would be really neat if the stdlib had a module which simplifies comptime build time(!) code generation, and goes beyond ‘write text to stdout’.

E.g. one idea would be an AST-builder API in the stdlib which allows you to programmatically build an AST and then emit the resulting Zig code, and which makes it easy to transform declarative data (e.g. a description of the Z80 ISA like this: chips/codegen/z80_desc.yml at master · floooh/chips · GitHub) into Zig code (e.g. the Z80 instruction decoder switch statement, like this: chipz/src/chips/z80.zig at 58a31ec5acebeaf910fe29ecff445be09f703b3c · floooh/chipz · GitHub) by building an AST programmatically instead of directly formatting text.

…on the other hand, such a codegen module should be flexible enough to just yolo it here and there and allow injecting freeform text into the output if ‘string building’ actually turns out to be more convenient than building an AST programmatically in specific situations.

(PS: if it isn’t clear, I see build-time code generation as the natural extension of Zig’s comptime, and also saw it as the replacement for C’s missing generic features, e.g. Zig’s comptime doesn’t need to match macro systems of other languages if it provides a streamlined way to generate code in build.zig)
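To make the ‘declarative data → generated Zig code’ step concrete, here is a deliberately tiny sketch in Python. The three-instruction table and the emitted switch body are invented for illustration; the real z80_desc.yml and decoder are far larger:

```python
# Hypothetical miniature of the 'declarative data -> Zig code' step.
# The instruction table stands in for something like z80_desc.yml.
INSTRUCTIONS = [
    (0x00, "NOP",   "/* no operation */"),
    (0x3C, "INC A", "self.a +%= 1;"),
    (0x76, "HALT",  "self.halted = true;"),
]

def emit_decoder() -> str:
    """Render the instruction table as the body of a Zig switch statement."""
    lines = ["switch (opcode) {"]
    for opcode, mnemonic, body in INSTRUCTIONS:
        lines.append(f"    0x{opcode:02X} => {{ {body} }}, // {mnemonic}")
    lines.append("    else => unreachable,")
    lines.append("}")
    return "\n".join(lines)

print(emit_decoder())
```

An AST-builder API would replace the string formatting here with node construction plus a final render step, but the data-to-code shape of the generator stays the same.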

3 Likes

Well I don’t want to grandstand on the subject too much.

All I know is, reading that document, there’s a point where all the fun stuff has to go out the window and @chung-leong is (as he mentioned) going to 10x the effort to get to the destination.

I see in what he’s done an application which Zig should fully support. Which it does not. It’s clear what it would take to do the whole job using comptime; the tools are not available, and they could be.

I see that as both suboptimal and unstable.

1 Like

I didn’t explain why I resorted to parsing the output from translate-c. There were two show-stoppers: (a) I couldn’t get the argument names of functions, and (b) different C enums all get translated as c_uint.

In theory, (a) could be fixed with a builtin that gives us the name of a function (so we no longer have to resort to a stupid hack) along with the names of its arguments.

Dealing with (b) would require support for custom types or type annotations, a feature that has frequently been requested for other reasons (e.g. distinguishing strings from arrays of numbers between 0 and 255).

5 Likes

I was more responding to this:

While comptime allows us to “translate” a function from C to Zig, it is not enough when we want to translate a whole API.

Which I view as actually mission-critical to Zig reaching the point it wants to go to.

The immediate problem is that Zig currently does not allow us to attach decls to a struct type through @Type(). We cannot create a new namespace with callable functions.

Right. Exactly that. It also means no one can write a library which generates CPython binding code from the export interface of a Zig module, or a library which fault-injects errors into the use of a Zig type’s functions. The list is unbounded; codegen is difficult, brittle, and scales poorly.

And even if we could, such a solution would be too difficult to use since this new namespace would essentially be a mystery box. We need actual code that tells us what functions are available, code from which we can generate autodoc.

I believe this can be overcome with the right implementation. Doc comments could be a part of the .decl types, for example, which are currently just a name.

In any case, I need to take the time to make a long-form case on this subject. It can’t go on the issue board, and it won’t be persuasive from the comment section.

1 Like

An alternative to parsing translate-c output is to make use of Zig’s builtin Clang to dump the AST.

Disclaimer: the above is very hacky code :face_without_mouth:

1 Like

clang ast-dumping is what I do for bindings generation of the sokol-headers.

Clang-ast-dump produces an extremely verbose JSON which I then reduce to a simplified JSON which has just the information needed for creating language bindings, all of this is in this python script:

…and then output-language specific scripts read this simplified JSON and produce the language bindings (for Zig such a script looks like this:

…since I control the C APIs (e.g. I’m not trying to create a bindings generator that works for all C APIs) I can do some shortcuts to simplify the language bindings generation. For instance I don’t allow C features like this in the public C APIs:

  • C unions are generally not allowed
  • all public symbols must have a common API-specific prefix (e.g. sg_), which is used to find the actually relevant symbols for the bindings generation (because the raw AST dump will contain everything that’s been included by the header)
  • global constants must be defined as an unnamed enum, not as a #define
  • nested anonymous structs are not allowed (e.g. all nested structs must have an explicit struct declaration outside their ‘container struct’)
  • parsing of function args and return values is hardcoded to a couple of cases and is extended on an ‘as-needed basis’
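As a minimal sketch of that reduce step (the JSON shape below mimics the node layout of `clang -Xclang -ast-dump=json`; the real sokol scripts handle far more node kinds and attributes than this):

```python
# Reduce a clang JSON AST dump to just the bindings-relevant metadata.
# The input here is a hand-written stand-in for a real dump.
import json

PREFIX = "sg_"  # only symbols with the common API prefix are relevant

def reduce_ast(ast: dict) -> list:
    decls = []
    for node in ast.get("inner", []):
        name = node.get("name", "")
        if not name.startswith(PREFIX):
            continue  # skip everything pulled in from other headers
        if node.get("kind") == "RecordDecl":
            decls.append({"kind": "struct", "name": name})
        elif node.get("kind") == "FunctionDecl":
            decls.append({"kind": "func", "name": name})
    return decls

dump = json.loads("""{"inner": [
  {"kind": "RecordDecl",   "name": "sg_desc"},
  {"kind": "FunctionDecl", "name": "sg_setup"},
  {"kind": "FunctionDecl", "name": "printf"}
]}""")
print(reduce_ast(dump))
```

The language-specific scripts then only ever see the small reduced JSON, never the raw dump.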
1 Like

I think this is really cool and I have had some success with it. If I can get it working it could be a fantastic way to maintain bindings that can be updated automatically.

A few issues I ran into:

  • There appears to be no way to generate an error set from an unnamed enum. When each enum value has a prefix, it should be possible.
  • Fails if a root struct is a typedef void
  • Please support using zigft as a build.zig.zon dependency. Perhaps I’m wrong, but I feel like “copy this file to your src/ folder” is not the Zig way.
1 Like

Can you point me to the API in question? That’s potentially a difficult case to deal with. If the C enum is unnamed, then its values would just be a bunch of c_uint constants floating around.

Can you clarify what you mean by that? void is not a container in C or Zig.

Sure. GitHub - Mindwerks/wildmidi: WildMIDI is a simple software midi player which has a core softsynth library that can be used with other applications.

You’re right that it’s not a struct but a handle. I would usually write a wrapper struct that stores a reference to the handle.

1 Like

I see what you mean now. Basically, you want the handle to be treated as an opaque, so you can do handle.method(...) instead of func(handle, ...). Sounds like a perfectly reasonable use case. So on detecting that c_root_struct refers to an int type, the generator should define a new packed struct backed by that int type.

P.S. And if it’s void*, a new opaque type should be defined.
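A hypothetical sketch of that generator decision (the function name and the C type strings are invented for illustration, not zigft’s actual internals):

```python
# Choose a Zig wrapper declaration for a C handle type:
# int-backed handles become packed structs, void* handles become opaques.
def gen_handle_wrapper(name: str, c_type: str) -> str:
    if c_type == "void*":
        # opaque pointer handle -> Zig opaque type
        return f"pub const {name} = opaque {{}};"
    int_map = {"unsigned int": "c_uint", "unsigned long": "c_ulong"}
    if c_type in int_map:
        # integer handle -> packed struct backed by the same int type
        zig_int = int_map[c_type]
        return f"pub const {name} = packed struct({zig_int}) {{ value: {zig_int} }};"
    raise ValueError(f"unhandled handle type: {c_type}")

print(gen_handle_wrapper("Midi", "void*"))
print(gen_handle_wrapper("Voice", "unsigned int"))
```

Either way the wrapper gives the handle its own namespace, so methods can hang off it.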

2 Likes

Why output to .zig files at all, and waste compute cycles re-parsing the generated AST? The build system could hypothetically just pass the AST directly to the compiler.

And at that point, wouldn’t it just be equivalent to Rust’s procedural macros?

At some point you want to debug the generated code in a regular debugger.

And at that point, wouldn’t it just be equivalent to Rust’s procedural macros?

AFAIK the output of Rust proc macros is not debuggable either (I might be wrong though).

(although Zig comptime code isn’t debuggable either, but that’s something that really should be fixed longterm - no idea how though)

1 Like

An idea sort of hit me while I was thinking about the discussion in this thread. What if there were a special C define that changes the behavior of translate-c such that it’d embed meta information into the function name? Say we have the following in a header file:

void foo(const char* bytes, size_t bytes_len);

If we import the header in this manner:

const c = @cImport({
    @cDefine("__ZIG_NAME_MANGLING", {});
    @cInclude("foo.h");
});

Then translate-c would give us something like this:

pub const @"foo:\"void\" bytes:\"const char*\" bytes_len:\"size_t\"" = @extern(
    *const fn (bytes: ?[*]const u8, bytes_len: usize) callconv(.c) void,
    .{ .name = "foo" },
);

That would give us, at comptime, the missing information required for automated function transformation. If we have the names of the types and the names of the arguments, then we can establish naming conventions that we can reliably act upon. For instance, if a size_t argument ends in “[name]_len” and the preceding argument is “[name]”, then these two arguments should be merged into a slice. Or if a pointer type’s name ends in “_maybe”, then it should be handled as an optional.
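The “[name]_len” convention can be sketched as a small pass over the argument metadata that the mangled name would expose (function name and the exact merged representation are invented for illustration):

```python
# Merge a pointer argument "<x>" followed by a size_t argument "<x>_len"
# into a single slice parameter, per the naming convention above.
def merge_len_args(args: list) -> list:
    merged, i = [], 0
    while i < len(args):
        name, c_type = args[i]
        nxt = args[i + 1] if i + 1 < len(args) else None
        if nxt and nxt[1] == "size_t" and nxt[0] == name + "_len":
            merged.append((name, "slice"))  # pointer + length -> one slice
            i += 2  # consume both arguments
        else:
            merged.append((name, c_type))
            i += 1
    return merged

print(merge_len_args([("bytes", "const char*"), ("bytes_len", "size_t")]))
```

In Zig this pass would of course run at comptime over the parsed mangled names, but the matching logic is the same.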

This is similar to C++ name-mangling, except that we wouldn’t be mangling the actual names as they exist in the .so file. The metadata would come from the header file.

I think this can open up a lot of possibilities all without any change to the language itself. All we’re doing is making translate-c behave differently when a special constant is defined.

1 Like