Trouble with imported fstatat() in macOS

I’m trying to debug a test case on macOS. The code uses fstatat(), imported from C, to get the size of a file:

const std = @import("std");

const c = @cImport({
    @cInclude("unistd.h");
    @cInclude("fcntl.h");
    @cInclude("sys/stat.h");
});

pub fn main() !void {
    const dirfd = c.openat(c.AT_FDCWD, ".", c.O_DIRECTORY | c.O_RDONLY);
    if (dirfd < 0) return error.UnableToOpenDirectory;
    var info: c.struct_stat = undefined;
    if (c.fstatat(dirfd, "test.zig", &info, 0) != 0) return error.UnableToGetStat;
    std.debug.print("size = {d}\n", .{info.st_size});
}
size = 0

The code basically stats itself, so zero is obviously the wrong result.

The equivalent code in C works correctly:

#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <stdio.h>

int main() {
    int dirfd = openat(AT_FDCWD, ".", O_DIRECTORY | O_RDONLY);
    if (dirfd < 0) goto error;
    struct stat info;
    if (fstatat(dirfd, "test.c", &info, 0) < 0) goto error;
    printf("size = %zu\n", info.st_size);
    return 0;
error:
    printf("Error\n");
    return 1;
}
size = 372

Print-out of the stat struct:

cimport.struct_stat{ 
    .st_dev = 16777222, 
    .st_mode = 1052, 
    .st_nlink = 13, 
    .st_ino = 2151778714020, 
    .st_uid = 20, 
    .st_gid = 0, 
    .st_rdev = 1757947205, 
    .st_atimespec = cimport.struct_timespec{ .tv_sec = 621266321, .tv_nsec = 1757947204 }, 
    .st_mtimespec = cimport.struct_timespec{ .tv_sec = 155724084, .tv_nsec = 1757947204 }, 
    .st_ctimespec = cimport.struct_timespec{ .tv_sec = 159462706, .tv_nsec = 505 },
    .st_birthtimespec = cimport.struct_timespec{ .tv_sec = 8, .tv_nsec = 4096 },
    .st_size = 0, 
    .st_blocks = 0,
    .st_blksize = 0,
    .st_flags = 0,
    .st_gen = 2863311530,
    .st_lspare = -1431655766,
    .st_qspare = { -6148914691236517206, -6148914691236517206 } 
}

The size is showing up in the tv_nsec field of st_ctimespec. -6148914691236517206 in hex is, as you might have guessed, 0xAAAA_AAAA_AAAA_AAAA. It seems I’m calling the 32-bit-inode version of the function with the 64-bit-inode struct. Is there a define that needs to be set or something?

The macOS version is Monterey :sadface:

From macOS man 2 stat (excerpt):

If fstatat() is passed the special value AT_FDCWD in the fd parameter, the current working directory is used and the behavior is identical to a call to stat() or lstat()
     respectively, depending on whether or not the AT_SYMLINK_NOFOLLOW bit is set in flag.

     The buf argument is a pointer to a stat structure as defined by <sys/stat.h> and into which information is placed concerning the file.  When the macro _DARWIN_FEATURE_64_BIT_INODE
     is not defined (see below for more information about this macro), the stat structure is defined as:

     struct stat { /* when _DARWIN_FEATURE_64_BIT_INODE is NOT defined */
         dev_t    st_dev;    /* device inode resides on */
         ino_t    st_ino;    /* inode's number */
         mode_t   st_mode;   /* inode protection mode */
         nlink_t  st_nlink;  /* number of hard links to the file */
         uid_t    st_uid;    /* user-id of owner */
         gid_t    st_gid;    /* group-id of owner */
         dev_t    st_rdev;   /* device type, for special file inode */
         struct timespec st_atimespec;  /* time of last access */
         struct timespec st_mtimespec;  /* time of last data modification */
         struct timespec st_ctimespec;  /* time of last file status change */
         off_t    st_size;   /* file size, in bytes */
         quad_t   st_blocks; /* blocks allocated for file */
         u_long   st_blksize;/* optimal file sys I/O ops blocksize */
         u_long   st_flags;  /* user defined flags for file */
         u_long   st_gen;    /* file generation number */
     };

     However, when the macro _DARWIN_FEATURE_64_BIT_INODE is defined, the stat structure will now be defined as:

     struct stat { /* when _DARWIN_FEATURE_64_BIT_INODE is defined */
         dev_t           st_dev;           /* ID of device containing file */
         mode_t          st_mode;          /* Mode of file (see below) */
         nlink_t         st_nlink;         /* Number of hard links */
         ino_t           st_ino;           /* File serial number */
         uid_t           st_uid;           /* User ID of the file */
         gid_t           st_gid;           /* Group ID of the file */
         dev_t           st_rdev;          /* Device ID */
         struct timespec st_atimespec;     /* time of last access */
         struct timespec st_mtimespec;     /* time of last data modification */
         struct timespec st_ctimespec;     /* time of last status change */
         struct timespec st_birthtimespec; /* time of file creation(birth) */
         off_t           st_size;          /* file size, in bytes */
         blkcnt_t        st_blocks;        /* blocks allocated for file */
         blksize_t       st_blksize;       /* optimal blocksize for I/O */
         uint32_t        st_flags;         /* user defined flags for file */
         uint32_t        st_gen;           /* file generation number */
         int32_t         st_lspare;        /* RESERVED: DO NOT USE! */
         int64_t         st_qspare[2];     /* RESERVED: DO NOT USE! */
     };

_DARWIN_FEATURE_64_BIT_INODE
     In order to accommodate advanced capabilities of newer file systems, the struct stat, struct statfs, and struct dirent data structures were updated in Mac OSX 10.5.

     The most obvious change is the increased size of ino_t from 32 bits to 64 bits.  As a consequence, storing an ino_t in an int is no longer safe, and file formats storing ino_t as
     32-bit values may need to be updated.  There are other changes as well, such as the widening of f_fstypename, f_mntonname, and f_mntfromname in struct statfs.  Please refer to
     dir(5) for more detail on the specific changes to the other affected data structures.

     On platforms that existed before these updates were available, ABI compatibility is achieved by providing two implementations for related functions: one using the legacy data
     structures and one using the updated data structures.  Variants which make use of the newer structures have their symbols suffixed with $INODE64.  These $INODE64 suffixes are
     automatically appended by the compiler tool-chain and should not be used directly.

     Platforms that were released after these updates only have the newer variants available to them.  These platforms have the macro _DARWIN_FEATURE_ONLY_64_BIT_INODE defined.

     The _DARWIN_FEATURE_64_BIT_INODE macro should not be set directly.  Instead, developers should make use of the _DARWIN_NO_64_BIT_INODE or _DARWIN_USE_64_BIT_INODE macros when the
     default variant is not desired.  The following table details the effects of defining these macros for different deployment targets.

                  _DARWIN_FEATURE_ONLY_64_BIT_INODE not defined
           =========================+===============================
                                    |       Deployment Target
                user defines:       |   < 10.5       10.5    > 10.5
           -------------------------+-------------------------------
                    (none)          |   32-bit      32-bit   64-bit
           _DARWIN_NO_64_BIT_INODE  |   32-bit      32-bit   32-bit
           _DARWIN_USE_64_BIT_INODE |   32-bit      64-bit   64-bit
           -------------------------+-------------------------------

                    _DARWIN_FEATURE_ONLY_64_BIT_INODE defined
           =========================+===============================
                user defines:       | Any Deployment Target
           -------------------------+-------------------------------
                    (none)          | 64-bit-only
           _DARWIN_NO_64_BIT_INODE  |   (error)
           _DARWIN_USE_64_BIT_INODE | 64-bit-only
           -------------------------+-------------------------------

           32-bit       32-bit inode values are enabled, and the legacy structures involving the ino_t type are in use.  The macro _DARWIN_FEATURE_64_BIT_INODE is not defined.

           64-bit       64-bit inode values are enabled, and the expanded structures involving the ino_t type are in use.  The macro _DARWIN_FEATURE_64_BIT_INODE is defined, and loader
                        symbols will contain the $INODE64 suffix.

           64-bit-only  Like 64-bit, except loader symbols do not have the $INODE64 suffix.

           (error)      A compile time error is generated.

     Due to the increased benefits of the larger structure, it is highly recommended that developers not define _DARWIN_NO_64_BIT_INODE and make use of _DARWIN_USE_64_BIT_INODE when
     targeting Mac OSX 10.5.

     In addition to the $INODE64 suffixed symbols, variants suffixed with 64 are also available for related functions.  These functions were provided as a way for developers to use the
     updated structures in code that also made use of the legacy structures.  The enlarged stat structures were also prefixed with 64 to distinguish them from their legacy variants.
     These functions have been deprecated and should be avoided.

This might be the issue. Try defining _DARWIN_NO_64_BIT_INODE; maybe Apple didn’t bother making fstatat work with the 64-bit struct variants on Monterey.

The web version (Mac OS X Manual Page For stat(2)) is very helpfully from 1994 and makes no mention of the macros.

It’s weird that the C code works, though. Maybe it sets the macro somewhere, or this is a different issue.

EDIT: both variants (Zig and C) behave correctly on my machine (M1 MacBook Pro, macOS 15), with or without _DARWIN_USE_64_BIT_INODE, but if I try to use _DARWIN_NO_64_BIT_INODE the compiler complains:

error "Can't define _DARWIN_NO_64_BIT_INODE when only 64-bit inodes are available."

(_DARWIN_FEATURE_ONLY_64_BIT_INODE seems to be defined)

The function is there. It works if I do this:

pub extern fn @"fstatat$INODE64"(c_int, [*c]const u8, [*c]c.struct_stat, c_int) c_int;

pub fn main() !void {
    const dirfd = c.openat(c.AT_FDCWD, ".", c.O_DIRECTORY | c.O_RDONLY);
    if (dirfd < 0) return error.UnableToOpenDirectory;
    var info: c.struct_stat = undefined;
    if (@"fstatat$INODE64"(dirfd, "test.zig", &info, 0) != 0) return error.UnableToGetStat;
    std.debug.print("size = {d}\n", .{info.st_size});
}
size = 564

The problem appears to be that translate-c doesn’t do anything with the __DARWIN_INODE64 macro. Here is the declaration in the header, followed by what translate-c emits for it:

int     fstatat(int, const char *, struct stat *, int) __DARWIN_INODE64(fstatat) __OSX_AVAILABLE_STARTING(__MAC_10_10, __IPHONE_8_0);
pub extern fn fstatat(c_int, [*c]const u8, [*c]struct_stat, c_int) c_int;

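On an Intel Mac, __DARWIN_ONLY_64_BIT_INO_T is 0, so the unsuffixed fstatat symbol is still the legacy variant that expects the 32-bit-inode struct. On ARM, 64-bit is the only game in town, so the unsuffixed symbol already takes the 64-bit struct and the code works.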

The fully expanded declaration is:

int fstatat(int, const char *, struct stat *, int) __asm("_" "fstatat" "$INODE64") __attribute__((availability(macosx,introduced=10.10)));

__asm("_" "fstatat" "$INODE64") is an asm label and means that the linker symbol name should be _fstatat$INODE64 instead of _fstatat. Both Clang and Aro recognize and parse asm labels, but translate-c currently ignores this information when rendering the Zig code.

I don’t think there are any open issues regarding this. I believe the main blocker (whether intentional or not) is that Zig doesn’t support specifying the linker symbol name on decls (see ziglang/zig issue #19999, "Proposal: Add linkname in addition to linksection"). Theoretically, translate-c could work around it by emitting something like this:

extern fn @"\x01_fstatat$INODE64"(c_int, [*c]const u8, [*c]struct_stat, c_int) c_int;
pub const fstatat = @"\x01_fstatat$INODE64";

(\x01 suppresses name mangling.)
