Tail calls on x86_64-linux-gnu

Hi,
I'm trying to learn Zig, so after a few simple algorithmic exercises I created for myself, I intend to play around with C library bindings. I'm trying to build a simple application that renders through OpenGL: just a hello triangle, possibly with SDL3 to deal with context creation, maybe input etc. (and I don't want to go the mach-glfw route yet). I'm using zig-opengl to generate the OpenGL bindings. That was a success and I got a nice API out of it. However, the API uses tail calls. It puzzled me for a while, because I ran into this error:

src/gles3v2.zig:1274:12: error: unable to perform tail call: compiler backend 'stage2_x86_64' does not support tail calls on target architecture 'x86_64' with the selected CPU feature flags
    return @call(.always_tail, function_pointers.glGenBuffers, .{_n, _buffers});
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From what I remember from uni, tail-call optimization is something the optimizer applies to tail recursion to avoid stack overflows (roughly speaking, a recursion that immediately returns the result of its recursive call gets turned into a while loop).
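To make that concrete, here is a minimal sketch of the two forms. The names sumTo and sumToLoop are made up for illustration, and whether a given backend actually rewrites the first form into the second is an assumption, not something the language guarantees for a plain call:

```zig
// Tail-recursive form: the recursive call is the last thing the function
// does, so its stack frame is not needed once the call has been made.
fn sumTo(n: u64, acc: u64) u64 {
    if (n == 0) return acc;
    return sumTo(n - 1, acc + n);
}

// Loop form: what an optimizer may turn the recursion above into.
fn sumToLoop(n: u64) u64 {
    var acc: u64 = 0;
    var i = n;
    while (i != 0) : (i -= 1) {
        acc += i;
    }
    return acc;
}
```

Both compute the same sum; the loop form simply cannot overflow the stack, whatever the compiler decides to do.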
I could not find anything about this on the internet, until it finally hit me to try building for aarch64-linux-gnu and x86_64-windows, and boom, the error went away in both cases.
So is there anything I'm missing about x86_64-linux-gnu (the default target on my local PC)? This is the output from build-exe --show-builtin, if that's of any help (the distro is Arch Linux, the zig package is from the official repo, not the AUR, and zen3 means a 5800X):

const std = @import("std");
/// Zig version. When writing code that supports multiple versions of Zig, prefer
/// feature detection (i.e. with `@hasDecl` or `@hasField`) over version checks.
pub const zig_version = std.SemanticVersion.parse(zig_version_string) catch unreachable;
pub const zig_version_string = "0.16.0";
pub const zig_backend = std.builtin.CompilerBackend.stage2_x86_64;

pub const output_mode: std.builtin.OutputMode = .Exe;
pub const link_mode: std.builtin.LinkMode = .static;
pub const unwind_tables: std.builtin.UnwindTables = .async;
pub const is_test = false;
pub const single_threaded = false;
pub const abi: std.Target.Abi = .gnu;
pub const cpu: std.Target.Cpu = .{
    .arch = .x86_64,
    .model = &std.Target.x86.cpu.znver3,
    .features = std.Target.x86.featureSet(&.{
        .@"64bit",
        .adx,
        .aes,
        .allow_light_256_bit,
        .avx,
        .avx2,
        .bmi,
        .bmi2,
        .branchfusion,
        .clflushopt,
        .clwb,
        .clzero,
        .cmov,
        .crc32,
        .cx16,
        .cx8,
        .f16c,
        .fast_15bytenop,
        .fast_bextr,
        .fast_imm16,
        .fast_lzcnt,
        .fast_movbe,
        .fast_scalar_fsqrt,
        .fast_scalar_shift_masks,
        .fast_variable_perlane_shuffle,
        .fast_vector_fsqrt,
        .fma,
        .fsgsbase,
        .fsrm,
        .fxsr,
        .idivq_to_divl,
        .invpcid,
        .lzcnt,
        .macrofusion,
        .mmx,
        .movbe,
        .mwaitx,
        .nopl,
        .pclmul,
        .pku,
        .popcnt,
        .prfchw,
        .rdpid,
        .rdpru,
        .rdrnd,
        .rdseed,
        .sahf,
        .sbb_dep_breaking,
        .sha,
        .shstk,
        .smap,
        .smep,
        .sse,
        .sse2,
        .sse3,
        .sse4_1,
        .sse4_2,
        .sse4a,
        .ssse3,
        .vaes,
        .vpclmulqdq,
        .vzeroupper,
        .wbnoinvd,
        .x87,
        .xsave,
        .xsavec,
        .xsaveopt,
        .xsaves,
    }),
};
pub const os: std.Target.Os = .{
    .tag = .linux,
    .version_range = .{ .linux = .{
        .range = .{
            .min = .{
                .major = 7,
                .minor = 0,
                .patch = 9,
            },
            .max = .{
                .major = 7,
                .minor = 0,
                .patch = 9,
            },
        },
        .glibc = .{
            .major = 2,
            .minor = 43,
            .patch = 0,
        },
        .android = 29,
    }},
};
pub const target: std.Target = .{
    .cpu = cpu,
    .os = os,
    .abi = abi,
    .ofmt = object_format,
    .dynamic_linker = .init("/lib64/ld-linux-x86-64.so.2"),
};
pub const object_format: std.Target.ObjectFormat = .elf;
pub const mode: std.builtin.OptimizeMode = .Debug;
pub const link_libc = false;
pub const link_libcpp = false;
pub const have_error_return_tracing = true;
pub const valgrind_support = true;
pub const sanitize_thread = false;
pub const fuzz = false;
pub const position_independent_code = false;
pub const position_independent_executable = false;
pub const strip_debug_info = false;
pub const code_model: std.builtin.CodeModel = .default;
pub const omit_frame_pointer = false;

Thanks for any explanation you can provide, because I care more about understanding the issue than just fixing it.

“Tail-calls” are also typically used to forward a function call to another function with identical signature with minimal overhead (e.g. just a jmp instruction like here: Compiler Explorer).

I guess exactly this is the intention of the explicit @call(.always_tail,... in the GL bindings wrapper.

Tbh, this is a typical case of ‘premature optimization’. I would simply let the compiler figure that out instead of hardwiring the tail-call requirement into the bindings layer (e.g. the compiler will most likely inline those binding functions anyway and call directly into the GL function instead of doing a ‘double-hop’ through that one-line shim).

PS: I think the error you are seeing might be a difference between the self-hosted and LLVM backends (e.g. IIRC the self-hosted backend was initially not enabled by default for x86_64-windows, though I'm not sure if that has changed in the meantime, and it is also not the default for ARM CPUs). This means that the problem is probably also fixed by forcing Zig to always use the LLVM backend.
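For a direct `zig build-exe` invocation, the `-fllvm` flag asks for the LLVM backend. With a build.zig, a rough sketch could look like the following. The executable name and source path are placeholders, and the option names are what I'd expect from recent Zig releases, so treat them as assumptions and check them against your version:

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "hello-triangle", // placeholder name
        .root_module = b.createModule(.{
            .root_source_file = b.path("src/main.zig"), // placeholder path
            .target = target,
            .optimize = optimize,
        }),
        // Request the LLVM backend even in Debug builds, where the
        // self-hosted backend would otherwise be picked on this target.
        .use_llvm = true,
    });
    b.installArtifact(exe);
}
```

The trade-off is slower Debug builds, since fast compilation is the main reason the self-hosted backend gets used there.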


Thanks for your answer! Forwarding makes total sense as well, because it is basically the same pattern instruction-wise.
Anyway, I've tried -Doptimize=ReleaseFast, which should use the LLVM backend as far as I understand, and the issue is gone. Since Debug is the default and uses the self-hosted backend, the error is to be expected, afaik.
Ok, I can adjust the generator to output more appropriate code. That means I have to study C calls more thoroughly so I don't mess it up, unless you would say it is “fool-proof”, but I should read the documentation either way.

EDIT: a direct call through the function pointer seems to do the trick, compilation-wise. I just need to make sure that's not illegal.
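Roughly what the adjusted wrapper looks like, as a self-contained sketch. fakeGenBuffers and glGenBuffersPtr are stand-ins I made up so the snippet compiles without a GL context; the real generated bindings load the pointer at runtime and may declare things differently:

```zig
// Hypothetical stand-in for a driver entry point: writes ids 1..n.
fn fakeGenBuffers(n: c_int, buffers: [*c]c_uint) callconv(.c) void {
    var i: usize = 0;
    while (i < @as(usize, @intCast(n))) : (i += 1) {
        buffers[i] = @intCast(i + 1);
    }
}

// Assumed shape of one entry in the loaded function pointer table.
var glGenBuffersPtr: *const fn (c_int, [*c]c_uint) callconv(.c) void = &fakeGenBuffers;

// Plain forwarding shim: an ordinary call through the pointer, with no
// tail-call requirement, so the backend is free to emit a call or a jump.
pub fn genBuffers(n: c_int, buffers: [*c]c_uint) void {
    glGenBuffersPtr(n, buffers);
}
```

Usage is the same as before, e.g. `var ids: [3]c_uint = undefined; genBuffers(3, &ids);`.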

Calls through function pointers are perfectly legal. They wouldn’t exist in the language otherwise 😉

Well… I come mainly from the C/C++ world, and also Unreal Engine (but definitely not exclusively). So something being legal for the compiler doesn't mean that it is well-defined. After all, Zig might have to generate a slightly different function call sequence to handle errors (I can see a couple of options), so I was not sure whether a function pointer to a C function is as valid as a function pointer to a Zig function. For comparison, look at how C arrays (pointers to arrays) work vs. Zig arrays and slices.
As I've said, I'm still learning, and that includes Zig internals. I can imagine what Zig is doing behind the code, but I can't be sure until I actually dive into it. If that makes sense.
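As far as I can tell so far, the calling convention and the error union are both part of the function pointer type, so the two kinds of pointer can't be mixed up silently. A small sketch, where all names are invented for illustration:

```zig
const std = @import("std");

// A function using the C calling convention, like a bound library function.
fn cStyleAdd(a: c_int, b: c_int) callconv(.c) c_int {
    return a + b;
}

// A regular Zig function that can fail; the error union is part of its type.
fn zigAdd(a: i32, b: i32) error{Overflow}!i32 {
    return std.math.add(i32, a, b);
}

const CAddFn = *const fn (c_int, c_int) callconv(.c) c_int;
const ZigAddFn = *const fn (i32, i32) error{Overflow}!i32;

// The two pointer types are distinct, so the compiler knows at every call
// site which convention and which return shape to use.
comptime {
    std.debug.assert(CAddFn != ZigAddFn);
}

pub fn callBoth() !i32 {
    const c_ptr: CAddFn = &cStyleAdd;
    const zig_ptr: ZigAddFn = &zigAdd;
    const from_c: i32 = c_ptr(2, 3); // plain value, nothing to unwrap
    const from_zig = try zig_ptr(2, 3); // error union must be handled
    return from_c + from_zig;
}
```

Assigning `&cStyleAdd` to a `ZigAddFn` would be a compile error rather than undefined behaviour, which is the part I was worried about.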