Experimental tool to generate idiomatic Zig bindings from C++

lassade · June 25, 2023, 5:25pm

Hello!

Over the past couple of days I wrote a small tool called c2z that can create idiomatic zig bindings from c++ code. It uses the clang -ast-dump command to create the c++ AST, which my tool then traverses, so no more c glue code. The best part is that it can transpile c++ code, including functions and even template classes! It is intended to work with code written in the "c with classes" coding style, though, and it will always require some user tweaks.

The project is in its early stages, but please try it out and let me know if it works for you.

link to c2z

AndrewCodeDev · June 25, 2023, 7:49pm

I think this is a really great idea frankly. I like how you decided to go from the AST using existing tools because in some sense, this becomes a syntax tree problem instead of trying to tackle the C++ tool chain or build another parser. I think using existing tools basically lets the existing C++ community do the heavy lifting for you.

Help me understand your example code in your library a bit more (albeit, this is in the early phases) because I see that you have written some “use_cases” and I looked at your vector file and I’m not sure I’m following what is being accomplished here. Can you explain a bit more about what’s going on there?

lassade · June 25, 2023, 9:36pm

Thanks! And the best part is that everything is bundled with zig, so you only need zig installed.

Here's what is going on: the “use_cases” folder contains many gamedev libs that I want to create bindings for, which is what prompted me to start this project in the first place. The “std” and “common_cases” folders don’t follow this pattern.

The “std” folder contains code that I use to understand the memory layout and the inner workings of common c++ std classes so that I can manually create my own zig compatible implementations. “use_cases/std/vector.cpp” explores std::vector. Take a look at “src/cpp.zig” to see my zig impl.
The “common_cases” folder is made of small snippets of c++ code taken from the other libs so I can more easily analyze the clang AST. I also write tests out of them.
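As a rough illustration of the kind of layout probing described above, here is a minimal C++ sketch that only uses the public std::vector API. The names VectorInfo, inspect and is_contiguous are hypothetical and are not taken from c2z. Mainstream standard libraries typically store three pointers (begin, end, end of capacity), but that is an implementation detail, so the sketch does not rely on it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from c2z): what the public API reveals
// about a vector's storage.
struct VectorInfo {
    const int* data;      // start of the contiguous buffer
    std::size_t size;     // number of elements in use
    std::size_t capacity; // elements the buffer can hold before reallocating
};

VectorInfo inspect(const std::vector<int>& v) {
    assert(v.size() <= v.capacity()); // always true for a valid vector
    return VectorInfo{v.data(), v.size(), v.capacity()};
}

// Elements are stored contiguously, so element i lives at data() + i.
bool is_contiguous(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (&v[i] != v.data() + i) return false;
    }
    return true;
}
```

A binding that mirrors the vector on the zig side needs at least these three pieces of information (pointer, length, capacity) to agree with the C++ object.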

AndrewCodeDev · June 25, 2023, 11:00pm

First, my condolences to anyone reading through standard library implementations… _M_Impl type variables and the jungle of inline namespaces and macro controls are truly eye-watering. I’ve done a significant amount of reading into the STL so I’m always happy to talk about it but it’s… eh… god…

So at what point does your implementation come into play? If I am understanding correctly, you are creating implementation classes that mirror (to a high-degree) existing structures in STL. Is your intention to swap the implementations when you come across them via your translation techniques? So like… std::something → zig.something?

This is interesting to me for several reasons, and I’m very curious about how you want to approach this problem (library design is a big passion of mine). So excuse the brain dump, but here’s some thoughts…

First, I think several functions in the STL are really bad ideas on the whole, so you may want a better mechanism, since you don’t have to support legacy code. So… for instance… push_back is a fantastic example because it’s essentially subsumed by emplace_back. If push_back takes a const& object as its argument, emplace_back can deduce the same thing, because the forwarded argument binds as const when it reaches the copy constructor of the destination object. Thus, you really don’t need push_back in the general case, and there are a lot of best-practice guides that explicitly say to favor emplace_back due to its variadic universal forwarding.

So in that case, you could easily get away with a single function (if such a thing is doable… probably is with tuples in zig) that simply maps both push_back and emplace_back to the same function. So you could get away with condensing quite a few things if you analyze the edge-cases carefully.
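To make the push_back/emplace_back point concrete, here is a small C++ sketch under stated assumptions: Item and append are hypothetical names invented for this example, and append is simply a thin wrapper over emplace_back. It shows one variadic entry point covering the copy, move, and construct-in-place cases.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type that records how it was built.
struct Item {
    std::string name;
    int built_by; // 0 = direct constructor, 1 = copy, 2 = move

    explicit Item(std::string n) : name(std::move(n)), built_by(0) {}
    Item(const Item& o) : name(o.name), built_by(1) {}
    Item(Item&& o) noexcept : name(std::move(o.name)), built_by(2) {}
};

// A single variadic entry point that covers both push_back and emplace_back:
// the arguments are perfectly forwarded to whichever constructor matches.
template <typename T, typename... Args>
T& append(std::vector<T>& v, Args&&... args) {
    v.emplace_back(std::forward<Args>(args)...);
    assert(!v.empty());
    return v.back();
}
```

Passing an lvalue Item selects the copy constructor, passing std::move(item) selects the move constructor, and passing a std::string builds the element in place.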

The second major issue I forecast is dealing with allocators. C++ (as I’m sure you know) is very happy to give you a default allocator that just calls new and delete, whereas in zig that’s more difficult. If you haven’t seen the proposals on how pinned memory could solve some of the pain points around things like allocators, I strongly suggest you do, because that feature could save you an immense amount of time and strange design decisions.

I could keep going, but I think that is good for one post - curious to hear your thoughts.

Durobot · June 26, 2023, 9:11am

Whoa, this is really interesting.
However, I am confused:

“Avoid glue C code” - how does it achieve its target then?
I have tried building the project (git clone https://github.com/lassade/c2z.git, cd c2z, zig build), got this error:

zig build-exe zbg Debug native: error: the following command failed with 1 compilation errors:
/home/archie/apps/zig/zig build-exe /home/archie/projects/c2z/src/main.zig --cache-dir /home/archie/projects/c2z/zig-cache --global-cache-dir /home/archie/.cache/zig --name zbg --mod clap::/home/archie/projects/c2z/libs/zig-clap/clap.zig --deps clap --listen=- 
Build Summary: 0/3 steps succeeded; 1 failed (disable with --summary none)
install transitive failure
 install zbg transitive failure
    zig build-exe zbg Debug native 1 errors
src/main.zig:94:22: error: invalid builtin function: '@intToFloat'
            (100.0 * @intToFloat(f64, transpiler.nodes_visited) / @intToFloat(f64, transpiler.nodes_count)),
                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(zig version 0.11.0-dev.3803+7ad104227)
Am I even supposed to build it?
I must admit I am too dumb to understand how to use your tool. I have a very vague understanding of compilers’ inner workings…

Edit: either I’ve gone bananas, or they have changed @intToFloat to @floatFromInt: https://ziglang.org/documentation/master/#floatFromInt

Edit 2: https://ziggit.dev/t/floattoint-inttofloat-gone/914

lassade · June 26, 2023, 11:59am

Yes and it already does this!

We will end up having both a zig and a c++ impl for any transpiled function, and this includes std::vector, so I can do the translation from “std::vector::push_back” to “cpp.Vector.append” on the zig side. But the easiest way to accomplish this right now is to just provide a “push_back” implementation where it is needed; we will see when we get to it.

But I want to keep this as simple and dumb as possible; any work beyond the bare minimum will be too much.

About the second point:

There is a “cpp.VectorAlloc” that supports custom allocators and mimics the c++ code, so conceptually we could pass our zig “std.mem.Allocator” to it, and on the c++ side there would be a “std::vector<MyType, zig::Allocator>”. This will require small modifications on the library side. As a general rule of thumb, if the c++ lib allows you to use a custom allocator, you can pass the zig allocator to it one way or another; if not, you will be stuck with free/malloc/realloc.
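For readers unfamiliar with what the C++ half of such an arrangement might look like, here is a minimal sketch of a standard-conforming allocator that forwards to C-style function pointers. Everything here is an assumption made for illustration: RawAllocator, ForwardingAllocator and CountingBackend are hypothetical names, not the zig::Allocator from the post and not c2z code. In a real binding the function pointers would be filled in from the zig side; here a malloc-backed stand-in is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical C-style allocator interface (assumption for this sketch).
struct RawAllocator {
    void* (*alloc_fn)(void* ctx, std::size_t bytes);
    void (*free_fn)(void* ctx, void* ptr);
    void* ctx;
};

// Minimal allocator that forwards every request to a RawAllocator.
template <typename T>
struct ForwardingAllocator {
    using value_type = T;
    RawAllocator* raw;

    explicit ForwardingAllocator(RawAllocator* r) noexcept : raw(r) {
        assert(r != nullptr);
    }
    template <typename U>
    ForwardingAllocator(const ForwardingAllocator<U>& o) noexcept : raw(o.raw) {}

    T* allocate(std::size_t n) {
        void* p = raw->alloc_fn(raw->ctx, n * sizeof(T));
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { raw->free_fn(raw->ctx, p); }
};

template <typename T, typename U>
bool operator==(const ForwardingAllocator<T>& a, const ForwardingAllocator<U>& b) {
    return a.raw == b.raw;
}
template <typename T, typename U>
bool operator!=(const ForwardingAllocator<T>& a, const ForwardingAllocator<U>& b) {
    return !(a == b);
}

// Stand-in backend: counts live allocations and uses malloc/free.
struct CountingBackend { int live = 0; };

void* counting_alloc(void* ctx, std::size_t bytes) {
    ++static_cast<CountingBackend*>(ctx)->live;
    return std::malloc(bytes);
}
void counting_free(void* ctx, void* ptr) {
    --static_cast<CountingBackend*>(ctx)->live;
    std::free(ptr);
}
```

A std::vector<int, ForwardingAllocator<int>> constructed with such an allocator routes every buffer allocation and release through the two function pointers, which is the hook a foreign allocator needs.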

lassade · June 26, 2023, 12:05pm

@Durobot yep, they did. I wrote this on 0.11.0-dev.3220+447a30299. I guess I'll never learn to use a stable version; I should have done it in 0.10.

It’s very easy to use it:

you need zig in your PATH
go to the project directory
zig build run -- path\to\my\includeFile.h
use --cargs to pass arguments to clang, which is useful for include paths: zig build run -- --cargs "-I.\use_cases\msdfgen\include\" .\use_cases\msdfgen\include\core\generator-config.h
modify the generated bindings until they work; you might need to import cpp.zig from the src folder

I updated the readme with these instructions. Let me know if you still need some help using it.

AndrewCodeDev · June 26, 2023, 11:02pm

On your point about allocators - I’m glad you are aware of that… and allocators became much more “friendly” since C++17 so you may be able to make some useful linkages to the std::pmr namespace to great effect.
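As a sketch of why std::pmr (C++17) is a convenient hook, the example below defines a memory resource that counts requests and forwards them upstream. CountingResource is a hypothetical name made up for this example. A resource backed by a foreign allocator would override the same three virtual functions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical resource: counts allocation requests and forwards upstream.
class CountingResource : public std::pmr::memory_resource {
public:
    explicit CountingResource(std::pmr::memory_resource* upstream)
        : upstream_(upstream) {
        assert(upstream != nullptr);
    }
    std::size_t allocations() const { return allocations_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++allocations_;
        return upstream_->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        upstream_->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::size_t allocations_ = 0;
};
```

Because std::pmr::vector<T> takes the resource as a runtime pointer, the container type does not change when the allocation strategy does, unlike the template-parameter allocator model.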

Also, I appreciate your point about making it as simple as possible right now - I suppose my point in that direction was meant to communicate how the C++ standard library already over-complicates itself by default (lol). That said, from a project standpoint, you will probably have to do less mental algebra if you can make a 1:1 implementation.

So that said, I’m curious now about what your take is on certain features like “noexcept” and move semantics. When a vector reallocates, it will not use the elements’ move constructors unless they are marked noexcept or the type cannot be copied (a common mistake in C++ implementations), so there are certain “guarantees” that the STL makes that people using them would expect to see. I’m pointing to the general problem of Hyrum’s law here. So, would you make a mapping between something marked as noexcept and a non-error-union return type?
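The reallocation behaviour can be observed directly. In this sketch, SafeMove, RiskyMove, Counters and relocate_three are hypothetical names for illustration. The two element types are identical except for the noexcept on the move constructor, and the helper counts what a forced reallocation does to the existing elements.

```cpp
#include <cassert>
#include <vector>

struct Counters { int copies = 0; int moves = 0; };

// Move constructor is declared noexcept.
struct SafeMove {
    Counters* c;
    explicit SafeMove(Counters* counters) : c(counters) {}
    SafeMove(const SafeMove& o) : c(o.c) { ++c->copies; }
    SafeMove(SafeMove&& o) noexcept : c(o.c) { ++c->moves; }
};

// Identical, except the move constructor is not noexcept.
struct RiskyMove {
    Counters* c;
    explicit RiskyMove(Counters* counters) : c(counters) {}
    RiskyMove(const RiskyMove& o) : c(o.c) { ++c->copies; }
    RiskyMove(RiskyMove&& o) : c(o.c) { ++c->moves; }
};

// Fills a vector with three elements, forces one reallocation, and
// reports how the existing elements were transferred to the new buffer.
template <typename T>
Counters relocate_three() {
    Counters counters;
    std::vector<T> v;
    v.reserve(3);
    for (int i = 0; i < 3; ++i) v.emplace_back(&counters);
    assert(v.size() == 3);
    counters = Counters{};       // ignore anything before the reallocation
    v.reserve(v.capacity() + 1); // forces the elements into a new buffer
    return counters;
}
```

With SafeMove the three elements are moved; with RiskyMove they are copied, because the vector has to preserve its strong exception guarantee.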

If so, there’s a lot of work there to be done in the case of exceptions because functions that do not mark themselves as noexcept may throw or not. The madness of bad-defaults is one of the primary things that is driving me away from C++ in general.

lassade · June 27, 2023, 10:39am

I wasn’t even thinking about it. I was expecting people to compile their code with -fno-exceptions; most of the code that I want to transpile is designed not to throw.

I’m only familiar with 2 patterns of error checking:

if (do_stuff() != 0) { /* handle error */ }

do_stuff();
if (have_error()) { /* handle error */  }

How can exceptions be handled on the zig side?

AndrewCodeDev · June 27, 2023, 2:49pm

It’s a very tenuous mapping because basically Zig makes the correct choice of returning errors by value (sort of like what you pointed out with your return codes).
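One common way to bridge the two models, sketched here under stated assumptions, is to catch every exception at the C++ boundary and hand back a status code by value, which the zig side could then translate into an error. The names BindingError, checked_divide and checked_divide_c are hypothetical, and this is not something c2z is known to generate.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>

// Hypothetical status codes shared with the caller.
enum BindingError : int {
    BINDING_OK = 0,
    BINDING_OUT_OF_MEMORY = 1,
    BINDING_INVALID_ARGUMENT = 2,
    BINDING_UNKNOWN = 3,
};

// Hypothetical library function that reports failure by throwing.
int checked_divide(int a, int b) {
    if (b == 0) throw std::invalid_argument("division by zero");
    return a / b;
}

// Boundary wrapper: no exception escapes. The result goes through an
// out-parameter and the status is returned by value.
extern "C" int checked_divide_c(int a, int b, int* out) noexcept {
    assert(out != nullptr);
    try {
        *out = checked_divide(a, b);
        return BINDING_OK;
    } catch (const std::bad_alloc&) {
        return BINDING_OUT_OF_MEMORY;
    } catch (const std::invalid_argument&) {
        return BINDING_INVALID_ARGUMENT;
    } catch (...) {
        return BINDING_UNKNOWN;
    }
}
```

This is exactly the kind of glue the tool tries to avoid, so it would only make sense for libraries that cannot be built with -fno-exceptions.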

I will think about it a little bit more, but there are some handy devices here that may get you started (apologies if you already know all this).

https://en.cppreference.com/w/cpp/types/is_move_constructible

Specifically, there is a type trait there called std::is_nothrow_move_constructible.

So that thing basically contains a static bool and operates through template resolution. A good place to see how this is used is in std::tuple, because it has to propagate things like “explicitly constructible” types throughout the tuple. So basically, if a single type in a tuple requires explicit construction, the whole tuple will too. These traits are used (in the example above) to see if something is marked as noexcept or not.
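A minimal sketch of the trait in use follows. Quiet, Loud, Holder and moves_without_throwing are hypothetical names for illustration. Holder stands in for the propagation idea: its implicitly generated move constructor is noexcept only if every member's move constructor is.

```cpp
#include <type_traits>

// Move constructor marked noexcept.
struct Quiet {
    Quiet() = default;
    Quiet(Quiet&&) noexcept {}
};

// Move constructor that is allowed to throw.
struct Loud {
    Loud() = default;
    Loud(Loud&&) {}
};

// Wrapper with an implicit move constructor; its noexcept-ness is
// derived from its members.
template <typename T>
struct Holder {
    int id;
    T payload;
};

// The trait exposes its answer as a static bool named `value`.
template <typename T>
constexpr bool moves_without_throwing() {
    return std::is_nothrow_move_constructible<T>::value;
}

static_assert(moves_without_throwing<Quiet>(), "noexcept move is detected");
static_assert(std::is_move_constructible<Loud>::value, "Loud can still be moved");
static_assert(!moves_without_throwing<Loud>(), "but its move may throw");
static_assert(moves_without_throwing<Holder<Quiet>>(), "propagates to the wrapper");
static_assert(!moves_without_throwing<Holder<Loud>>(), "one throwing member taints it");
```

Since everything here is resolved at compile time, a binding generator would more likely read the equivalent information from the AST than evaluate the trait itself.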

I’ll have to think on the mapping issue a bit more between the two languages. Exceptions are going to be bad news though because of how awkward they make the return paths.