Why is 0.16-dev a lot slower than 0.15.2?

I used the same logic from this showcase on two branches: main, which is on 0.15.2, and 0.16-mixed, which uses the same recursion that calls the processing functions. Benchmarking them gives this result:


The only logical difference in the code is this:
0.15.2:

    for (processed_args.items[0 .. processed_args.items.len - 1]) |arg| {
        var open = std.fs.cwd().openDir(
            arg,
            .{ .no_follow = true },
        ) catch |err| {
            std.log.err("{s}: {s}\n", .{ @errorName(err), arg });
            continue;
        };
        defer open.close();
        const src_path = open.stat() catch |err| {
            std.log.err("{s}: {s}\n", .{ @errorName(err), arg });
            continue;
        };

0.16-mixed:

    for (processed_args.items[0 .. processed_args.items.len - 1]) |arg| {
        var src_path = std.Io.Dir.cwd().statFile(
            io,
            arg,
            .{ .follow_symlinks = false },
        ) catch |err| {
            std.log.err("{s}: {s}\n", .{ @errorName(err), arg });
            continue;
        };

I know I should figure out how to stat files without going through symlinks on 0.15.2 instead, but does that really impact the performance this much?

…and all the changes from the standard library.

Looking at the times, it appears that the program execution is on the order of a couple hundred microseconds. At this scale, I expect that even things like the creation of an Io instance, or the code that runs before your main is even called (since you are using juicy main now), could make a measurable difference.

To figure out what exactly the issue is, I’d suggest running this in a full profiler (though I’m not sure I know of any that support such short-running programs; you may need to add a for loop around the main or something like that to get reasonable data).
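As a rough illustration of the "loop around the work" idea, here is a minimal sketch that repeats the open-then-stat sequence from the 0.15.2 snippet many times and reports the average cost per iteration, so that one-time startup work no longer dominates the measurement. It assumes the 0.15.2 standard library (`std.time.Timer`, `std.mem.doNotOptimizeAway`); the helper name `avgStatNs`, the path `"."`, and the iteration count are made up for the example. On the 0.16-mixed branch the loop body would presumably be swapped for the `statFile` call with `io` shown above.

```zig
const std = @import("std");

// Hypothetical helper: average time in nanoseconds of one open+stat on `path`.
// `iterations` must be greater than zero.
fn avgStatNs(path: []const u8, iterations: u64) !u64 {
    var timer = try std.time.Timer.start();
    var i: u64 = 0;
    while (i < iterations) : (i += 1) {
        // Same open-then-stat sequence as the 0.15.2 snippet.
        var dir = try std.fs.cwd().openDir(path, .{ .no_follow = true });
        defer dir.close(); // runs at the end of every iteration
        const st = try dir.stat();
        // Keep the result alive so the optimizer cannot drop the call.
        std.mem.doNotOptimizeAway(st);
    }
    return timer.read() / iterations;
}

pub fn main() !void {
    const ns = try avgStatNs(".", 100_000);
    std.debug.print("open+stat: {d} ns per iteration\n", .{ns});
}
```

Comparing this per-iteration figure between the two branches separates the cost of the stat call itself from whatever happens once at program start.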


Today I switched my SchmidlCox-Algo to 0.16-dev 2135 on the Raspberry Pi:


old: 460800 samples
sc time: 18 ms (0.040 µs/Sample)

new: 460800 samples
sc time: 14 ms (0.031 µs/Sample) 

Big improvement for me!