What is the difference between this C++ and Zig code?

markus · July 30, 2023, 11:31am

My friend and I compared Zig and C++ using this little code snippet. Mine is from a small program I made; his is an attempt at recreating what I did. The thing is, I can’t read C++ and he can’t read Zig, at least not well enough for either of us to figure out what caused the performance difference between our snippets.

Can anyone figure out why his was faster? If so, what were the differences?

My code:

// measuring how long it takes to load the file
    const file_loading_start_time = std.time.milliTimestamp();
    var files = std.ArrayList([]const u8).init(allocator);
    defer files.deinit();
    var dir = try fs.cwd().openDir(resource_path, .{});
    defer dir.close();

    var resource_dir = try fs.cwd().openIterableDir(resource_path, .{});
    defer resource_dir.close();
    var resource_iter = resource_dir.iterate();
    while (try resource_iter.next()) |ifile| {
        if (ifile.kind != .file) {
            continue;
        }
        const file = try dir.openFile(ifile.name, .{});
        defer file.close();
        const file_contents = try file.readToEndAlloc(allocator, math.maxInt(usize));
        errdefer allocator.free(file_contents);
        try files.append(file_contents);
        try stdout.print("[SYSTEM] Loaded {s: >32}\n", .{ifile.name});
    }
    if (files.items.len == 0) {
        try stdout.print("[ERROR] No files found", .{});
        os.exit(1);
    }

    defer for (files.items) |file|
        allocator.free(file);
    // measuring how long it takes to load the file
    try stdout.print("took {d}ms", .{std.time.milliTimestamp() - file_loading_start_time});

His code:

std::chrono::time_point Start = std::chrono::high_resolution_clock::now();

    // GetCurrentPath
    char buffer[MAX_PATH];
    GetModuleFileNameA(nullptr, buffer, MAX_PATH);
    std::string Path(buffer);
    size_t pos = Path.find_last_of("\\");
    if (pos != std::string::npos) {
        Path = Path.substr(0, pos);
    }
    Path += "\\resources";

    // Iterate over files
    std::vector<std::string> out;
    WIN32_FIND_DATAA findData;
    HANDLE findHandle = FindFirstFileA((Path + "\\*").c_str(), &findData);
    if (findHandle != INVALID_HANDLE_VALUE) {
        do {
            std::string fileName = findData.cFileName;
            if (fileName == "." || fileName == "..")
                continue;
            std::string fullPath = Path + "\\" + fileName;
            bool isFolder = (findData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0;
            out.push_back(fullPath);
        } while (FindNextFileA(findHandle, &findData) != 0);
        FindClose(findHandle);
    }
    
    std::string Data;

    // Memory map files
    for (const std::string& Path : out) {
        HANDLE FileHandle = CreateFileA(Path.c_str(), GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        LARGE_INTEGER FileSize;
        GetFileSizeEx(FileHandle, &FileSize);
        HANDLE MappingHandle = CreateFileMapping(FileHandle, NULL, PAGE_READWRITE, FileSize.HighPart, FileSize.LowPart, NULL);
        void* MappedPtr = MapViewOfFile(MappingHandle, FILE_MAP_WRITE, 0, 0, 0);

        Data += (char*)MappedPtr;

        UnmapViewOfFile(MappedPtr);
        CloseHandle(MappingHandle);
        CloseHandle(FileHandle);
    }

    std::chrono::time_point End = std::chrono::high_resolution_clock::now();
    float duration = std::chrono::duration_cast<std::chrono::milliseconds>(End - Start).count();
    std::cout << duration << "ms" << std::endl;

dee0xeed · July 30, 2023, 12:11pm

How did you compile your code?
Did you try build-exe with -O ReleaseFast option?

markus · July 30, 2023, 12:24pm

yea i did
why does this require me to make my post 10 chars long
wotever lol

neurocyte · July 30, 2023, 12:39pm

These are two very different programs. The Zig version reads the files and the C++ version maps them. Mapping is probably much faster than reading in most scenarios.

Btw, these two programs also do not do the same thing. The Zig version is safe and will work whatever the files contain. The C++ version is not safe and can crash if given bad files.

dude_the_builder · July 30, 2023, 1:16pm

This version should avoid allocating the file contents twice:

const std = @import("std");
const fs = std.fs;

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const allocator = arena.allocator();

    const file_loading_start_time = std.time.milliTimestamp();

    var files = std.ArrayList(u8).init(allocator);
    defer files.deinit();

    const resource_path = ".";
    var dir = try fs.cwd().openDir(resource_path, .{});
    defer dir.close();

    var resource_dir = try fs.cwd().openIterableDir(resource_path, .{});
    defer resource_dir.close();

    var resource_iter = resource_dir.iterate();
    while (try resource_iter.next()) |ifile| {
        if (ifile.kind != .file) continue;

        const file = try dir.openFile(ifile.name, .{});
        defer file.close();
        var buf_reader = std.io.bufferedReader(file.reader());
        const reader = buf_reader.reader();

        try reader.readAllArrayList(&files, std.math.maxInt(usize));
        std.debug.print("[SYSTEM] Loaded {s: >32}\n", .{ifile.name});
    }

    if (files.items.len == 0) {
        std.debug.print("[ERROR] No files found\n", .{});
    } else {
        // measuring how long it takes to load the file
        std.debug.print("took {d}ms\n", .{std.time.milliTimestamp() - file_loading_start_time});
    }
}

markus · July 30, 2023, 1:17pm

What’s memory mapping, how is it different from reading from a user perspective, and how could it be implemented in Zig?

Also, in which cases will the C++ version fail?

kristoff · July 30, 2023, 2:44pm

Also make sure to buffer your writes to stdout. See "How to Add Buffering to a Reader / Writer in Zig" on Zig NEWS.

markus · July 30, 2023, 3:12pm

I don’t see where mine allocates twice, though. I think our versions are de facto the same.

dude_the_builder · July 30, 2023, 4:34pm

Although file_contents is a slice pointing to the bytes read from the file and not really the actual bytes of the file, you are then allocating space for that slice when you append it to files. So you allocated for the bytes of the file with readToEndAlloc and then allocated for the resulting slice with append (note that files is an ArrayList of slices and not bytes in your code). Although append only re-allocates if it runs out of capacity, it will eventually re-allocate unless you ensure the total capacity up-front.
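For readers more at home on the C++ side of this thread, the "ensure the total capacity up-front" idea can be sketched roughly as below. This is an analogy, not a translation of the Zig code: concat_reserved is a hypothetical helper name, and it assumes the chunks are already in memory so their sizes are known before the single buffer is allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: join several chunks into one buffer while allocating
// the destination only once.
std::string concat_reserved(const std::vector<std::string>& chunks) {
    // First pass: add up the sizes so the final length is known.
    std::size_t total = 0;
    for (const std::string& chunk : chunks) total += chunk.size();

    std::string out;
    out.reserve(total);  // one up-front allocation instead of repeated growth

    // Second pass: append without triggering any further reallocation.
    for (const std::string& chunk : chunks) out += chunk;
    assert(out.size() == total);
    return out;
}
```

Without the reserve call the result is the same, but the buffer may be reallocated and copied several times as it grows.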

neurocyte · July 30, 2023, 4:38pm

Memory mapping makes the pages of a file available in memory via the OS & CPU virtual memory support. The OS loads the data on-demand as the memory addresses are accessed, or not at all if the memory is not actually read. You can call the same Win32 API functions as the C++ version does to do this in Zig. There is nothing C++ specific about it.
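As a rough illustration of what mapping a file looks like in code, here is a minimal sketch. It assumes a POSIX system and uses mmap rather than the Win32 calls from the original post, and read_file_mapped is a hypothetical helper name. The length comes from fstat, so nothing depends on the file contents being terminated in any way.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical helper: map `path` read-only and copy its bytes into a string.
std::string read_file_mapped(const std::string& path) {
    const int fd = open(path.c_str(), O_RDONLY);
    if (fd == -1) throw std::runtime_error("open failed: " + path);

    struct stat st {};
    if (fstat(fd, &st) == -1) {
        close(fd);
        throw std::runtime_error("fstat failed: " + path);
    }
    assert(st.st_size >= 0);
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    if (size == 0) {  // mmap rejects zero-length mappings
        close(fd);
        return {};
    }

    void* ptr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (ptr == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);

    // The OS faults pages in as the copy touches them. The explicit length
    // keeps the copy inside the mapped range.
    std::string contents(static_cast<const char*>(ptr), size);
    munmap(ptr, size);
    return contents;
}
```

Copying the whole mapping into a string gives up most of the lazy-loading benefit. It is done here only so the helper returns an owned value; a real program would more likely work on the mapped bytes directly.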

The C++ version assumes that the data in the files is null-terminated (the Data += call). If that is not the case, it will read past the end of the mapped file into memory that may not be mapped at all, most likely causing a segmentation fault. This is very likely to happen if the file’s length is a multiple of the OS memory page size (probably 4096 bytes). If the file size is not a multiple of 4096, the last mapped memory page is probably padded with zeros, which I guess is why this may seem to work most of the time.
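One way to avoid depending on a terminator is to pass the length explicitly. The sketch below is portable C++, and append_bytes is a hypothetical helper name. In the Win32 code from the original post, the size argument would presumably be FileSize.QuadPart, which GetFileSizeEx has already filled in.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append exactly `size` bytes from `data` to `out`
// without scanning for a terminating NUL.
void append_bytes(std::string& out, const void* data, std::size_t size) {
    if (size == 0) return;  // nothing to copy, so `data` is never touched
    assert(data != nullptr);
    out.append(static_cast<const char*>(data), size);
}
```

By contrast, Data += (char*)MappedPtr copies up to the first zero byte, wherever that happens to be. A file with a zero byte in the middle would be silently truncated, and a file with none would be over-read.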

markus · July 30, 2023, 5:11pm

Okay makes sense

markus · July 30, 2023, 5:51pm

Thank you for the explanation, I greatly appreciate it.

Does this mean that mmapping is less efficient the more of the memory you access?

Damjan94 · July 30, 2023, 6:38pm

I think it’s also absolutely necessary for mmapping hardware registers (think Arduino modules, for example), so you can control them… I could be wrong though.

neurocyte · July 30, 2023, 6:49pm

I don’t know if there is a good general answer to that. It depends on a lot of things. OS, hardware, access patterns in your app, etc.

squeek502 · July 30, 2023, 7:55pm

Something not mentioned yet is that the allocator used in the Zig code can matter (especially if the Zig code is using GeneralPurposeAllocator which is still slow in release modes currently). It usually makes sense to use std.heap.c_allocator when comparing against C/C++.