Cool write-up. I've said it before: the alternative to implementing declarations as part of type reification is, in fact, codegen.
So if the purpose of comptime is to replace macros, it hasn't yet succeeded.
I think the idea is that it is better to have 90% of code using just comptime and the remaining 10% using explicit manual codegen, as an alternative to macros.
Because with macros you end up having macros peppered throughout all of your code and library code, often used for silly things that don't really require codegen or macros at all, whereas with the comptime-plus-codegen split you at least get bigger pieces of simpler code.
Go tried that. Eventually, they lost the argument.
In terms of effort, though, I would say the ratio is reversed: 10% on comptime, 90% on codegen. A lot of what the compiler does has to be replicated.
Go tried to be puritanical and avoid all metaprogramming except code generation. When every simple generic needs to be code-generated, that is clearly a far more extreme standpoint than what Zig is doing, where lots of things can be done with comptime alone.
Re "codegen as the build-time version of comptime": I did that for my Z80 emulator, and thought it would be really neat if the stdlib had a module which simplifies build-time(!) code generation and goes beyond "write text to stdout".
E.g. one idea would be an AST-builder API in the stdlib which allows you to programmatically build an AST and then emit the resulting Zig code, and which makes it easy to transform declarative data (e.g. a description of the Z80 ISA like this: chips/codegen/z80_desc.yml at master · floooh/chips · GitHub) into Zig code (e.g. the Z80 instruction decoder switch statement, like this: chipz/src/chips/z80.zig at 58a31ec5acebeaf910fe29ecff445be09f703b3c · floooh/chipz · GitHub) by building an AST programmatically instead of directly formatting text.
…on the other hand, such a codegen module should be flexible enough to just yolo it here and there and allow injecting freeform text into the output, if "string building" actually turns out to be more convenient than building an AST programmatically in specific situations.
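For contrast, the string-building path is already workable today with nothing but the stdlib; a minimal sketch (the instruction table, handler names, and output file name are made-up stand-ins, and the std APIs are the 0.14-era ones):

const std = @import("std");

// Build-time codegen the "string building" way: format Zig source as
// text and write it to a file that build.zig then adds as a module.
pub fn main() !void {
    var buf = std.ArrayList(u8).init(std.heap.page_allocator);
    defer buf.deinit();
    const w = buf.writer();

    // Stand-in for declarative instruction data (e.g. parsed from YAML).
    const ops = [_]struct { code: u8, name: []const u8 }{
        .{ .code = 0x00, .name = "nop" },
        .{ .code = 0x76, .name = "halt" },
    };

    try w.writeAll("pub fn decode(op: u8) void {\n    switch (op) {\n");
    for (ops) |op| {
        try w.print("        0x{x:0>2} => {s}(),\n", .{ op.code, op.name });
    }
    try w.writeAll("        else => {},\n    }\n}\n");

    try std.fs.cwd().writeFile(.{ .sub_path = "z80_decode.zig", .data = buf.items });
}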
(PS: if it isn't clear, I see build-time code generation as the natural extension of Zig's comptime, and I also saw it as the replacement for C's missing generic features; e.g. Zig's comptime doesn't need to match the macro systems of other languages if it provides a streamlined way to generate code in build.zig.)
Well, I don't want to grandstand on the subject too much.
All I know is, reading that document, there's a point where all the fun stuff has to go out the window and @chung-leong is (as he mentioned) going to 10x the effort to get to the destination.
I see in what he's done an application which Zig should fully support. Which it does not. It's clear what it would take to do the whole job using comptime; the tools are not available, and they could be.
I see that as both suboptimal and unstable.
I didn't explain why I resorted to parsing the output from translate-c. There were two show stoppers: (a) I couldn't get the argument names of functions, and (b) different C enums all get translated as c_uint.
In theory, (a) could be fixed with a builtin that gives us the name of a function (so we no longer have to resort to stupid hacks) along with the names of its arguments.
Dealing with (b) would require support for custom types or type annotations, a feature that has frequently been requested for other reasons (e.g. distinguishing strings from arrays of numbers between 0 and 255).
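To illustrate (b), this is roughly what translate-c produces for two distinct C enums today (the names are made up):

// C input:
//   typedef enum { COLOR_RED, COLOR_GREEN } color_t;
//   typedef enum { STATUS_OK, STATUS_FAIL } status_t;
//
// translate-c output (roughly): both enum types collapse into c_uint,
// so at comptime there is no way to tell which set of constants a
// given parameter expects.
pub const COLOR_RED: c_uint = 0;
pub const COLOR_GREEN: c_uint = 1;
pub const color_t = c_uint;
pub const STATUS_OK: c_uint = 0;
pub const STATUS_FAIL: c_uint = 1;
pub const status_t = c_uint;
pub extern fn set_color(color: color_t) void; // parameter type is just c_uint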
I was more responding to this:
While comptime allows us to "translate" a function from C to Zig, it is not enough when we want to translate a whole API.
Which I view as actually mission-critical to Zig reaching the point it wants to go to.
The immediate problem is that Zig currently does not allow us to attach decls to a struct type through @Type(). We cannot create a new namespace with callable functions.
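The restriction is easy to demonstrate (a minimal repro, using the 0.14-era std.builtin.Type field names):

// Reifying a struct via @Type only works with an empty `decls` slice;
// anything else is rejected with "reified structs must have no decls",
// so the new type can never carry callable functions.
const Reified = @Type(.{
    .@"struct" = .{
        .layout = .auto,
        .fields = &.{},
        .decls = &.{}, // must stay empty today
        .is_tuple = false,
    },
});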
Right. Exactly that. It also means no one can write a library which generates CPython binding code from the export interface of a Zig module, or a library which fault-injects errors into the use of a Zig type's functions. This list is not bounded; codegen is difficult, brittle, and scales poorly.
And even if we could, such a solution would be too difficult to use since this new namespace would essentially be a mystery box. We need actual code that tells us what functions are available, code from which we can generate autodoc.
I believe this can be overcome with the right implementation. Doc comments could be a part of the .decl types, for example, which are currently just a name.
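A hypothetical version of what that extension might look like (today's std.builtin.Type.Declaration really does hold only a name; the doc_comment field is my invention):

// Hypothetical extension of std.builtin.Type.Declaration.
pub const Declaration = struct {
    name: [:0]const u8,
    doc_comment: ?[:0]const u8 = null, // hypothetical addition
};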
In any case, I need to take the time to make a long-form case on this subject. It can't go on the issue board, and it won't be persuasive from the comment section.
An alternative to parsing translate-c output is to make use of Zig's builtin Clang to dump the AST.
Disclaimer: the above is very hacky code.
Clang AST-dumping is what I do for the bindings generation of the sokol headers.
Clang's AST dump produces an extremely verbose JSON which I then reduce to a simplified JSON that has just the information needed for creating language bindings; all of this happens in this Python script:
…and then output-language-specific scripts read this simplified JSON and produce the language bindings (for Zig such a script looks like this:
…since I control the C APIs (i.e. I'm not trying to create a bindings generator that works for all C APIs), I can take some shortcuts to simplify the language bindings generation. For instance, I don't allow C features like these in the public C APIs:
- C unions are generally not allowed
- all public symbols must have a common API-specific prefix (e.g. sg_), which is used to find the actually relevant symbols for the bindings generation (because the raw AST dump will contain everything that's been included by the header)
- global constants must be defined as an unnamed enum, not as a #define
- nested anonymous structs are not allowed (e.g. all nested structs must have an explicit struct declaration outside their "container struct")
- parsing of function args and return values is hardcoded to a couple of cases and is extended on an "as-needed" basis
I think this is really cool and I have had some success with it. If I can get it working, it could be a fantastic way to maintain bindings that can be updated automatically.
A few issues I ran into:
- There appears to be no way to generate an error set from an unnamed enum. When each enum value has a prefix, it should be possible (see the sketch after this list).
- Fails if a root struct is a typedef void
- Please support using zigft as a build.zig.zon dependency. Perhaps I'm wrong, but I feel like "copy this file to your src/ folder" is not the Zig way.
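On the first issue: generating the error set itself should be expressible with reification once the constants sit in a namespace; a sketch of the idea (the MYLIB_ERR_ prefix is hypothetical, the field names are the 0.14-era ones, and mapping C return values onto the set is the part a generator would still have to supply):

const std = @import("std");

// Collect every declaration starting with `prefix` from a namespace
// (e.g. an @cImport result) and reify an error set from the stripped
// names.
fn ErrorSetFromPrefix(comptime ns: type, comptime prefix: []const u8) type {
    comptime var errors: []const std.builtin.Type.Error = &.{};
    inline for (@typeInfo(ns).@"struct".decls) |decl| {
        if (std.mem.startsWith(u8, decl.name, prefix)) {
            errors = errors ++ &[_]std.builtin.Type.Error{
                .{ .name = decl.name[prefix.len..] },
            };
        }
    }
    return @Type(.{ .error_set = errors });
}

// Usage: const LibError = ErrorSetFromPrefix(c, "MYLIB_ERR_");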
Can you point me to the API in question? That's potentially a difficult case to deal with. If the C enum is unnamed, then its values would just be a bunch of c_uint constants floating around.
Can you clarify what you mean by that? void is not a container in C or Zig.
You're right that it's not a struct but a handle. I would usually write a wrapper struct that stores a reference to the handle.
I see what you mean now. Basically, you want the handle to be treated as an opaque, so you can do handle.method(...) instead of func(handle, ...). Sounds like a perfectly reasonable use case. So on detecting that c_root_struct refers to an int type, the generator should define a new packed struct backed by that int type.
P.S. And if it's void*, a new opaque type should be defined.
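A hand-written version of what the generator could emit for the int-backed case (my_image and my_destroy_image are hypothetical names):

// C side (hypothetical): typedef uint32_t my_image;
//                        void my_destroy_image(my_image img);
extern fn my_destroy_image(img: u32) void;

// Generated wrapper: a packed struct backed by the handle's integer
// type, so it stays ABI-compatible with the C API while Zig code gets
// handle.method(...) syntax.
pub const Image = packed struct(u32) {
    id: u32,

    pub fn destroy(self: Image) void {
        my_destroy_image(self.id);
    }
};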
Why output to .zig files at all, and waste compute cycles re-parsing the generated code? The build system could hypothetically just pass the AST directly to the compiler.
And at that point, wouldn't it just be equivalent to Rust's procedural macros?
At some point you want to debug the generated code in a regular debugger.
And at that point, wouldn't it just be equivalent to Rust's procedural macros?
AFAIK the output of Rust proc macros is not debuggable either (I might be wrong though).
(although Zig comptime code isn't debuggable either, but that's something that really should be fixed long-term - no idea how, though)
An idea sort of hit me while I was thinking about the discussion in this thread. What if there's a special C define that changes the behavior of translate-c such that it'd embed meta information into the function name? Say we have the following in a header file:
void foo(const char* bytes, size_t bytes_len);
If we import the header in this manner:
const c = @cImport({
    @cDefine("__ZIG_NAME_MANGLING", {});
    @cInclude("foo.h");
});
Then translate-c would give us something like this:
pub const @"foo:\"void\" bytes:\"const char*\" bytes_len:\"size_t\"" = @extern(
    *const fn (bytes: ?[*]const u8, bytes_len: usize) callconv(.c) void,
    .{ .name = "foo" },
);
That would give us the missing information at comptime required for automated function transforms. If we have the names of the types and the names of the arguments, then we can establish naming conventions that we can reliably act upon. For instance, if a size_t argument ends in "[name]_len" and the preceding argument is "[name]", then these two arguments should be merged into a slice. Or if a pointer type's name ends in "_maybe", then it should be handled as an optional.
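Concretely, the slice convention applied to the foo declaration above would produce a wrapper equivalent to this hand-written one (a sketch of the intended result; the mangled decl name is elided and plain c.foo is called for readability):

// The bytes/bytes_len pair from the C signature collapses into a
// single Zig slice parameter.
pub fn foo(bytes: []const u8) void {
    c.foo(bytes.ptr, bytes.len);
}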
This is similar to C++ name mangling, except that we wouldn't be mangling the actual names as they exist in the .so file. The metadata would come from the header file.
I think this can open up a lot of possibilities, all without any change to the language itself. All we're doing is making translate-c behave differently when a special constant is defined.