What is 'zig cc' actually doing?

Just out of curiosity I tried

strace -f zig cc some.c 2>log

Below are some excerpts from the log:

[pid 25482] execve("/opt/zig-0.11/cc", ["/opt/zig-0.11/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7fa8079e3370 /* 52 vars */) = -1 ENOENT
[pid 25482] execve("/home/user/bin/cc", ["/home/zed/bin/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7fa8079e3370 /* 52 vars */) = -1 ENOENT
[pid 25482] execve("/usr/local/sbin/cc", ["/usr/local/sbin/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7fa8079e3370 /* 52 vars */) = -1 ENOENT
[pid 25482] execve("/usr/local/bin/cc", ["/usr/local/bin/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7fa8079e3370 /* 52 vars */) = -1 ENOENT
[pid 25482] execve("/usr/sbin/cc", ["/usr/sbin/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7fa8079e3370 /* 52 vars */) = -1 ENOENT
[pid 25482] execve("/usr/bin/cc", ["/usr/bin/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7fa8079e3370 /* 52 vars */) = 0

As we can see, Zig tries to find a C compiler named cc in several places (apparently each directory on PATH, in order), finally finds it, and runs it with -E.

Then again:

[pid 25483] execve("/usr/lib/gcc/x86_64-linux-gnu/11/cc1", ["/usr/lib/gcc/x86_64-linux-gnu/11"..., "-E", "-quiet", "-imultiarch", "x86_64-linux-gnu", "-v", "/dev/null", "-mtune=generic", "-march=x86-64", "-fasynchronous-unwind-tables", "-fstack-protector-strong", "-Wformat", "-Wformat-security", "-fstack-clash-protection", "-fcf-protection", "-dumpbase", "null"], 0x201e620 /* 56 vars */ <unfinished ...>

And again:

[pid 25484] execve("/opt/zig-0.11/cc", ["/opt/zig-0.11/cc", "-print-file-name=crt1.o"], 0x7fa8079e6518 /* 52 vars */) = -1 ENOENT (No such file or directory)
[pid 25484] execve("/home/user/bin/cc", ["/home/zed/bin/cc", "-print-file-name=crt1.o"], 0x7fa8079e6518 /* 52 vars */) = -1 ENOENT (No such file or directory)
[pid 25484] execve("/usr/local/sbin/cc", ["/usr/local/sbin/cc", "-print-file-name=crt1.o"], 0x7fa8079e6518 /* 52 vars */) = -1 ENOENT
[pid 25484] execve("/usr/local/bin/cc", ["/usr/local/bin/cc", "-print-file-name=crt1.o"], 0x7fa8079e6518 /* 52 vars */) = -1 ENOENT
[pid 25484] execve("/usr/sbin/cc", ["/usr/sbin/cc", "-print-file-name=crt1.o"], 0x7fa8079e6518 /* 52 vars */) = -1 ENOENT
[pid 25484] execve("/usr/bin/cc", ["/usr/bin/cc", "-print-file-name=crt1.o"], 0x7fa8079e6518 /* 52 vars */) = 0

Then

[pid 25485] execve("/opt/zig-0.11/zig", ["/opt/zig-0.11/zig", "ld.lld", "--error-limit=0", "-O0", "-z", "stack-size=16777216", "--gc-sections", "--eh-frame-hdr", "-znow", "-m", "elf_x86_64", "-o", "a.out", "/usr/lib/gcc/x86_64-linux-gnu/11"..., "/usr/lib/gcc/x86_64-linux-gnu/11"..., "-L", "/usr/local/lib64", "-L", "/usr/local/lib", "-L", "/usr/lib/x86_64-linux-gnu", "-L", "/lib64", "-L", "/lib", "-L", "/usr/lib64", "-L", "/usr/lib", "-L", "/lib/x86_64-linux-gnu", "-L", ...], 0x7ffd5672cbd8 /* 51 vars */) = 0

Looks like cheating.
OK, I renamed /usr/bin/cc to some other name and…

[pid 35898] execve("/opt/zig-0.11/cc", ["/opt/zig-0.11/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7f22633ea370 /* 52 vars */) = -1 ENOENT
[pid 35898] execve("/home/user/bin/cc", ["/home/zed/bin/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7f22633ea370 /* 52 vars */) = -1 ENOENT
[pid 35898] execve("/usr/local/sbin/cc", ["/usr/local/sbin/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7f22633ea370 /* 52 vars */) = -1 ENOENT
[pid 35898] execve("/usr/local/bin/cc", ["/usr/local/bin/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7f22633ea370 /* 52 vars */) = -1 ENOENT
[pid 35898] execve("/usr/sbin/cc", ["/usr/sbin/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7f22633ea370 /* 52 vars */) = -1 ENOENT
[pid 35898] execve("/usr/bin/cc", ["/usr/bin/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7f22633ea370 /* 52 vars */) = -1 ENOENT
[pid 35898] execve("/sbin/cc", ["/sbin/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7f22633ea370 /* 52 vars */) = -1 ENOENT
[pid 35898] execve("/bin/cc", ["/bin/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7f22633ea370 /* 52 vars */) = -1 ENOENT
[pid 35898] execve("/usr/games/cc", ["/usr/games/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7f22633ea370 /* 52 vars */) = -1 ENOENT
[pid 35898] execve("/usr/local/games/cc", ["/usr/local/games/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7f22633ea370 /* 52 vars */) = -1 ENOENT
[pid 35898] execve("/snap/bin/cc", ["/snap/bin/cc", "-E", "-Wp,-v", "-xc", "/dev/null"], 0x7f22633ea370 /* 52 vars */) = -1 ENOENT

and then

[pid 35899] execve("/opt/zig-0.11/zig", ["/opt/zig-0.11/zig", "ld.lld", "--error-limit=0", "-O0", "-z", "stack-size=16777216", "--gc-sections", "--eh-frame-hdr", "-znow", "-m", "elf_x86_64", "-o", "a.out", "/home/zed/.cache/zig/o/f5cbeaf1e"..., "/home/zed/.cache/zig/o/b021b9896"..., "-L", "/usr/local/lib64", "-L", "/usr/local/lib", "-L", "/usr/lib/x86_64-linux-gnu", "-L", "/lib64", "-L", "/lib", "-L", "/usr/lib64", "-L", "/usr/lib", "-L", "/lib/x86_64-linux-gnu", "-dynamic-linker", ...], 0x7fff7ca7a4b8 /* 51 vars */) = 0

It seems that

  • if Zig finds some C compiler it uses it
  • otherwise it uses its own C compiler.

Is that true?


:face_with_open_eyes_and_hand_over_mouth:
This works correctly, but I was not aware that it uses the system CC.

env PATH="" /usr/bin/strace -f ~/zig/zig cc some.c 2>log

It’s discovering information about your system. Running the command on my machine gives this output:

 ⚡ cc -E "-Wp,-v" -xc /dev/null
clang -cc1 version 14.0.3 (clang-1403.0.22.14.1) default target arm64-apple-darwin23.0.0
ignoring nonexistent directory "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/local/include"
ignoring nonexistent directory "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/Library/Frameworks"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/14.0.3/include
 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include
 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include
 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks (framework directory)
End of search list.
# 1 "/dev/null"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 414 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "/dev/null" 2

@kristoff you are right, it’s configure-style detection.
It runs:

  1. cc -E -Wp,-v -xc /dev/null
  2. cc1 -E -quiet -imultiarch x86_64-linux-gnu -v /dev/null -mtune=generic -march=x86-64 -fasynchronous-unwind-tables -dumpbase null
  3. cc -print-file-name=crt1.o
  4. the last exec call is to itself: zig ld.lld ...

Ok.
There is another small observation.

  1. Running zig cc with gcc present
  • a.out.1 size is 5168 bytes
  • nm a.out.1
000000000020028c r __abi_tag
0000000000201500 t _dl_relocate_static_pie
00000000002025d0 d _DYNAMIC
0000000000201594 t _fini
0000000000202750 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
0000000000201578 t _init
                 U __libc_start_main
0000000000201510 T main
                 U printf
00000000002014d0 T _start
  2. Running zig cc with the /usr/bin/cc symbolic link renamed to /usr/bin/cc__, so that Zig cannot find it.
  • a.out.2 size is 8512 bytes
  • nm a.out.2
0000000000202670 d _DYNAMIC
0000000000201640 t _fini
                 w __gmon_start__
0000000000201628 t _init
0000000000201500 t __init_array_end
0000000000201500 t __init_array_start
0000000000201620 T __libc_csu_fini
00000000002015a0 T __libc_csu_init
                 U __libc_start_main
0000000000201530 T main
                 U printf
0000000000201500 T _start

Why are the resulting executables different depending on whether gcc (in my case) is present on the system or not?

Run ldd on each output executable. My guess is that with gcc present you will get a dynamic executable, while without gcc you will get a static executable.

If that’s the case, then Zig is using the presence of gcc as a way of detecting if your system uses glibc or something else by default.

That’s because by not setting a -target you’re asking for a native build and this is work that Zig is doing to understand what “native” means for your specific system.

No, both are dynamically linked:

$ ls -l a.out.* | awk '{print $5" "$9}'
5168 a.out.1 // with gcc
8512 a.out.2 // without gcc
$ file a.out.1
a.out.1: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2,
for GNU/Linux 3.2.0, with debug_info, not stripped

$ file a.out.2
a.out.2: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2,
for GNU/Linux 2.0.0, with debug_info, not stripped

ldd says this:

$ ldd a.out.1
	linux-vdso.so.1 (0x00007ffe88dfb000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8a6ddf5000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f8a6e03b000)

$ ldd a.out.2
	linux-vdso.so.1 (0x00007ffd6cbb7000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3bf999d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f3bf9be3000)

hah, interesting, then I don’t know the reason behind the difference!

Maybe this is the difference?

for GNU/Linux 3.2.0, with debug_info, not stripped
for GNU/Linux 2.0.0, with debug_info, not stripped

My guess would be that the second version assumes much less about the system and creates way less optimized code to run on very old systems?

I also noticed this, just did not write about it.

I do not know, but the way Zig collects information about the system (i.e. by running the system C compiler) looks a bit strange to me. It is also not clear what kind of information Zig gets that way and why all this affects the resulting executable.

A long while ago I saw a video where Andrew demonstrated (or had freshly worked out) that behavior; sadly, I don’t quite remember which video, stream, or talk it was.

This is vaguely what I remember:
I think one of the reasons for this searching behavior was to avoid having to configure, provide, or rely upon external tools that may or may not be there. Instead of having to point zig at something like autotools, zig would just search for and find something usable, overcoming the bootstrapping problem without needing an external toolchain. (I don’t know the exact details about this.)

This was before zig became self-hosted, so you needed some external compiler to get the ball rolling: compile a new zig version from source, which could then use clang to build the next stage/version.

Basically, if you don’t specify what you want, zig cc tries to find something that works. That way you can compile and run stage1, overcoming the bootstrapping problem, and then use that to create further stages.

If you take a look at Andrew Kelley’s blog post “`zig cc`: a Powerful Drop-In Replacement for GCC/Clang”, there is, for example, this section:

When no explicit glibc version is requested, and the target OS is the native (host) OS, Zig detects the native glibc version by inspecting the Zig executable’s own dynamically linked libraries, looking for glibc, and checking the version. It turns out you can look for libc.so.6 and then readlink on that, and it will look something like libc-2.27.so. When this strategy does not work, Zig looks at /usr/bin/env, looking for the same thing. Since this file path is hard-coded into countless shebang lines, it’s a pretty safe bet to find out the dynamic linker path and glibc version (if any) of the native system!

zig cc currently does not provide a way to choose a specific glibc version (because C compilers do not provide a way), and so Zig chooses the native version for compiling natively, and the default (2.17) for cross-compiling.

This blog post mentions that back then release 0.6 was upcoming, so I took a look at the release notes from back then: 0.6.0 Release Notes · The Zig Programming Language

In summary, since Zig links against libclang, Zig has the ability to act as a C compiler. And since Zig ships with libc, it has the ability to act as a cross-compiling C compiler. This feature has been available since 0.4.0, however, what’s new is the sub-command, zig cc, which has the ability to parse C compiler flags.

In this release, zig cc has full compatibility with Clang’s command line options. Clang is not invoked directly; some components are replaced with Zig’s own. For example, Zig provides all the include paths for libc, and acts as the linker driver. Zig translates the semantics of the arguments to its own internal build logic. Clang options that Zig is not aware of are forwarded to Clang directly. Some parameters are handled specially.

Thanks to this new ability, Zig is now able to bootstrap itself. Bootstrapping is not to be confused with self-hosting.

Under the self-hosting section you can see a bunch of detection features listed as self-hosted. I think if you research these in detail you could find out more about what is selected automatically and how.

I am not sure whether there is some easy way to just print what was detected.
Looking at zig cc --help I found these that look interesting:

  • -### Print (but do not run) the commands to run for this compilation
    I imagine this could be used to compare between the two and see what different options are being used
  • -print-effective-triple: is this the triple that ends up being used?
  • -print-supported-cpus
  • -serialize-diagnostics: this seems to be for debugging zig cc, so it might be a way to gain a lot of insight

So I guess this was me researching a bit, trying to find explanations; hopefully others can point out if anything I said is incorrect or misleading, or add further information. I will stop here for now and get back to working on my project, but poking around and trying to get some historical sense of what was going on was definitely interesting.
