Program runs fine but running it with Valgrind results in a panic

I’m creating a program for Alpine Linux that counts the number of installed packages on the system. It does this by parsing the file /lib/apk/db/installed. I first did it with buffered I/O, but now I’m trying to do it with memory-mapped I/O to benchmark the two and see which one is faster. Here’s my main.zig file with both implementations for reference:

const std = @import("std");

const page_size_min = std.heap.page_size_min;
const posix = std.posix;

const kib = 1 << 10;

pub fn main() !void {
    std.debug.print("{}\n", .{try get_packages_mmap()});
}

fn get_packages_mmap() !u16 {
    const key = "C:Q";
    const cwd = std.fs.cwd();

    const file = cwd.openFile("/lib/apk/db/installed", .{}) catch return 0;
    const metadata = try file.metadata();
    const size = metadata.size();
    const map: posix.MAP = .{ .TYPE = .PRIVATE };

    const array_ptr: [*]align(page_size_min) u8 = undefined;

    var counter: u16 = 0;
    var offset: u64 = 0;

    while (true) {
        const str = try posix.mmap(
            array_ptr, page_size_min, posix.PROT.READ, map, file.handle, offset
        );

        var iter = std.mem.splitScalar(u8, str, '\n');

        while (iter.next()) |entry| {
            if (entry.len >= key.len
                and std.mem.eql(u8, entry[0..key.len], key))
            {
                counter += 1;
            }
        }

        offset += page_size_min;
        if (offset >= size) {
            break;
        }
    }

    return counter;
}

fn get_packages() !u16 {
    const key = "C:Q";

    const file = std.fs.cwd().openFile("/lib/apk/db/installed", .{}) catch {
       return 0;
    };

    var counter: u16 = 0;
    {
        var file_buf: [8 * kib]u8 = undefined;
        var offset: u64 = 0;

        while (true) {
            const bytes_read = try file.preadAll(&file_buf, offset);
            const str = file_buf[0..bytes_read];
            var iter = std.mem.splitScalar(u8, str, '\n');

            while (iter.next()) |entry| {
                if (entry.len >= key.len
                    and std.mem.eql(u8, entry[0..key.len], key))
                {
                    counter += 1;
                }
            }

            if (bytes_read != file_buf.len) {
                break;
            }

            offset += bytes_read - key.len + 1;
        }
    }

    return counter;
}

Compiling with zig build-exe -target x86_64-linux main.zig, the output on my machine when running ./main is 1042, which is the expected result. When I run it with Valgrind though, I get this output:

==13233== Memcheck, a memory error detector
==13233== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==13233== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==13233== Command: ./main
==13233== 
thread 13233 panic: reached unreachable code
/usr/local/lib/zig/std/posix.zig:4790:19: 0x103b1a1 in mmap (main)
        .INVAL => unreachable, // Invalid parameters to mmap()
                  ^
/home/loremayer/source/main.zig:27:35: 0x103a7f6 in get_packages_mmap (main)
        const str = try posix.mmap(
                                  ^
/home/loremayer/source/main.zig:9:52: 0x103b607 in main (main)
    std.debug.print("{}\n", .{try get_packages_mmap()});
                                                   ^
/usr/local/lib/zig/std/start.zig:656:37: 0x103a51a in posixCallMainAndExit (main)
            const result = root.main() catch |err| {
                                    ^
/usr/local/lib/zig/std/start.zig:271:5: 0x103a0cd in _start (main)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)
==13233== 
==13233== Process terminating with default action of signal 6 (SIGABRT)
==13233==    at 0x1069717: os.linux.x86_64.syscall4 (x86_64.zig:58)
==13233==    by 0x1077A4C: os.linux.sigprocmask (linux.zig:1724)
==13233==    by 0x106B0BB: posix.sigprocmask (posix.zig:5791)
==13233==    by 0x106B033: posix.raise (posix.zig:733)
==13233==    by 0x105890C: posix.abort (posix.zig:677)
==13233==    by 0x103EF0C: debug.defaultPanic (debug.zig:672)
==13233==    by 0x103D846: debug.FullPanic((function 'defaultPanic')).reachedUnreachable (debug.zig:59)
==13233==    by 0x103B1A1: posix.mmap (posix.zig:4790)
==13233==    by 0x103A7F6: main.get_packages_mmap (main.zig:27)
==13233==    by 0x103B607: main.main (main.zig:9)
==13233==    by 0x103A51A: callMain (start.zig:656)
==13233==    by 0x103A51A: callMainWithArgs (start.zig:616)
==13233==    by 0x103A51A: start.posixCallMainAndExit (start.zig:571)
==13233==    by 0x103A0CD: (below main) (start.zig:271)
==13233== 
==13233== HEAP SUMMARY:
==13233==     in use at exit: 0 bytes in 0 blocks
==13233==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==13233== 
==13233== All heap blocks were freed -- no leaks are possible
==13233== 
==13233== For lists of detected and suppressed errors, rerun with: -s
==13233== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted

Also, if in the main function I put a while loop that runs the function over and over again, I eventually get an OutOfMemory error.
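
The loop was roughly this (a sketch, not my exact benchmark code):

pub fn main() !void {
    // Running the function over and over eventually fails with error.OutOfMemory.
    while (true) {
        _ = try get_packages_mmap();
    }
}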

Could the file be empty?

man mmap gives the following reasons for .INVAL:

       EINVAL We don't like addr, length, or offset (e.g., they are too
              large, or not aligned on a page boundary).

       EINVAL (since Linux 2.6.12) length was 0.

       EINVAL flags contained none of MAP_PRIVATE, MAP_SHARED, or
              MAP_SHARED_VALIDATE.

Unfortunately that’s not it. Running wc -c < /lib/apk/db/installed, I get 8989235 bytes.

I think you should call defer posix.munmap to release the mapping at the end of each iteration, instead of leaking it.
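
For example, right after the existing mmap call in the loop, keeping the rest the same:

const str = try posix.mmap(
    array_ptr, page_size_min, posix.PROT.READ, map, file.handle, offset
);
// Unmap at the end of each loop iteration so mappings don't pile up
// across iterations and across repeated calls to the function.
defer posix.munmap(str);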

I think on the last iteration you may have a portion of the file remaining that is smaller than page_size_min, so the mapping tries to map past the end of the file; maybe that is what causes the error.

You can calculate the actual remaining size and pass that as the second parameter to mmap. One way would be @min(page_size_min, size - offset).
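
In code, that would be something like this (keeping everything else in the loop the same):

// Length of the chunk still backed by the file; shorter than a page on the last iteration.
const length = @min(page_size_min, size - offset);

and length would then be passed to mmap in place of page_size_min.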


Set the first argument of mmap to null. The code tries to use the address array_ptr, which is undefined (Zig might set all the bytes of the pointer to 0xaa or not, depending on the build mode and the presence of Valgrind).
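
That is, something like:

// Let the kernel choose where to place the mapping instead of passing an undefined pointer.
const str = try posix.mmap(
    null, page_size_min, posix.PROT.READ, map, file.handle, offset
);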


I think you should call defer posix.munmap to release the mapping at the end of each iteration, instead of leaking it.

That solved the issue where running the function in a loop would eventually give an OutOfMemory error. Unfortunately, the issue with Valgrind is still present.

I think on the last iteration you may have a portion of the file remaining that is smaller than page_size_min, so the mapping tries to map past the end of the file; maybe that is what causes the error.

I don’t think that’s necessary because in the loop I have if (offset >= size) break;. I still tried your suggestion, but sadly it didn’t fix the issue.

Have you tried my suggestion together with @dimdin’s to use null in the first argument?

I think it would be needed, because on the last iteration offset can be smaller than size while offset + page_size_min is bigger than size. This means that either the mmap call fails (if it refuses to map memory that isn’t backed by the file) or it returns a str slice whose memory only partially maps to the file.

If it is the latter, your string-processing code would then eventually get some kind of access violation, but I think it should actually be the former, because all mmap examples I have seen so far make sure not to map memory beyond the actual file size.

For example, they use file.setEndPos with a writable file to ensure the mapping works (when writing via a mapping): Mmap file - Zig cookbook
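
A rough sketch of that pattern (not the cookbook code verbatim; the file name and the four-page size are just placeholders):

const len = 4 * page_size_min; // hypothetical mapping size

const out = try std.fs.cwd().createFile("scratch.bin", .{ .read = true, .truncate = true });
defer out.close();

// Grow the file first so every byte of the writable mapping is backed by the file.
try out.setEndPos(len);

const data = try posix.mmap(
    null,
    len,
    posix.PROT.READ | posix.PROT.WRITE,
    .{ .TYPE = .SHARED },
    out.handle,
    0,
);
defer posix.munmap(data);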

Thank you, setting the first argument of mmap to null and using posix.munmap fixed the issue! Here’s my updated function:

fn get_packages_mmap() !u16 {
    const key = "C:Q";
    const cwd = std.fs.cwd();

    const file = cwd.openFile("/lib/apk/db/installed", .{}) catch return 0;
    const metadata = try file.metadata();
    const size = metadata.size();
    const map: posix.MAP = .{ .TYPE = .PRIVATE };

    var counter: u16 = 0;
    var offset: u64 = 0;

    while (true) {
        const str = try posix.mmap(
            null, page_size_min, posix.PROT.READ, map, file.handle, offset
        );
        defer posix.munmap(str);

        var iter = std.mem.splitScalar(u8, str, '\n');

        while (iter.next()) |entry| {
            if (entry.len >= key.len
                and std.mem.eql(u8, entry[0..key.len], key))
            {
                counter += 1;
            }
        }

        offset += page_size_min;
        if (offset >= size) {
            break;
        }
    }

    return counter;
}


You’re right, I forgot to set it to null. That fixed the Valgrind issue.


I think you’re right. I think on the last iteration, the part of the file remaining could be smaller than page_size_min, causing undefined behaviour. I will look into the link you sent me.

I think it can “work” without error if page_size_min corresponds to the actual page size the system call uses to map pages; in that case you would be able to read the bytes past the end of the file as zero bytes.
But I think it is better to use the correct size, so that str never contains bytes that don’t correspond to bytes from the file.

For more detail, see kernel - What is the behaviour of a file-backed memory map when reading from or writing to an address larger than the length of the file? - Unix & Linux Stack Exchange

I am not completely sure; some links say that reading past the end should be zero memory, and others say that it may cause SIGBUS. From the POSIX mmap specification:

The mmap() function can be used to map a region of memory that is larger than the current size of the object. Memory access within the mapping but beyond the current end of the underlying objects may result in SIGBUS signals being sent to the process. The reason for this is that the size of the object can be manipulated by other processes and can change at any moment. The implementation should tell the application that a memory reference is outside the object where this can be detected; otherwise, written data may be lost and read data may not reflect actual data in the object.

Note that references beyond the end of the object do not extend the object as the new end cannot be determined precisely by most virtual memory hardware. Instead, the size can be directly manipulated by ftruncate().


From the link you sent me, POSIX specifies that the end of the array will be filled with zeros if the number of bytes read from the file is less than the length of the array. I checked in all three release modes, and it is zero. Since it’s not undefined, I think it’s perfectly safe and fine to leave it as-is.

Side note: I just realized that the method you suggested of giving @min(page_size_min, size - offset) as the second parameter to mmap can cause double counting on the last iteration. It may count entries that were already counted in the second-to-last iteration, because the offset begins somewhere between the second-to-last iteration’s offset and the end of the file.

Another note: I just noticed your edit about SIGBUS. Hmm, I need to look into this further.

I don’t know what you mean. If page_size_min was 1024 and size is 1030, then offset would be 0 and then 1024, and size - offset would be 1030 and then 6.
size - offset would always be bigger than page_size_min except for the last iteration.

You’re right, I misread and thought the second parameter to mmap was the offset when it’s actually the length. Setting the second parameter to @min(page_size_min, size - offset) works as intended, and it will also prevent the process from getting SIGBUS. Thank you for your help!
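
With that change, the start of the loop body ends up looking roughly like this (the counting code after it is unchanged):

// Map at most one page, but never past the end of the file.
const length = @min(page_size_min, size - offset);
const str = try posix.mmap(
    null, length, posix.PROT.READ, map, file.handle, offset
);
defer posix.munmap(str);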


I just did a benchmark and my mmap version is about two times slower than my non-mmap version! I did not expect that, but it’s good that I at least tried it out.

Did you try different page sizes?
You could also try mapping many pages at once, so you would pass some min_mapping_size as the length parameter.

I think it may be faster to create mappings for bigger regions of memory, but if the non-mmap version is still faster, that is good too.
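
As a hedged sketch of what I mean (min_mapping_size, the 64-page window, and the loop shape are just an example):

// Hypothetical: map a bigger window (e.g. 64 pages) per iteration instead of a single page.
const min_mapping_size = 64 * page_size_min;

var offset: u64 = 0;
while (offset < size) {
    const length = @min(min_mapping_size, size - offset);
    const str = try posix.mmap(
        null, length, posix.PROT.READ, map, file.handle, offset
    );
    defer posix.munmap(str);

    // ... scan str for "C:Q" entries as before ...

    offset += length;
}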

I did some more testing, and here is what I found:

  • For the non-mmapped function, I found a local maximum for performance at a buffer length of four pages. Anything more or less than that results in worse performance.
  • For the mmapped function, increasing the length always improves performance, but with diminishing returns.
  • When I remove all the code that iterates over the slice and parses it (identical for both functions), the mmapped function performs way better, sometimes eight times as fast.
  • However, when I keep the code that parses the slice, the mmapped function performs worse than the non-mmapped function depending on the length. If I set the length to four pages, it performs two times worse than the other one. It’s only when I set the length to 64 pages (256 KiB) that the performance is about equal. This is really weird, because the code for iterating over the slice is identical!

Conclusion: Non-memory-mapped I/O is better for both performance and memory usage. It seems like the code that parses the slice is causing the mmapped function to be significantly slower, and I’m not sure why.


My guess would be that with the non-memory-mapped version the CPU predicts that the next batch of memory will be needed and prefetches it from memory to the CPU cache before the program has even asked for it.

I haven’t done any investigation to confirm that, but it’s the most plausible cause I can think of for a performance difference that big.
Maybe it would make sense to look into which parts are slow using strace and similar tools, but I haven’t used those a lot yet.

I mostly used GitHub - KDAB/hotspot: The Linux perf GUI for performance analysis, but I don’t know whether it would show helpful differences in this case.

Running strace on both programs gives me the same result, except that the non-mmapped one makes a bunch of pread64 system calls where the mmapped one makes mmap calls. I think your guess about the CPU cache is probably right. I don’t know that much about low-level stuff since I’m new to it, but I can’t think of another explanation.