Why does Zig memset allocated memory even in ReleaseFast?

Today I was investigating the effects of the MAP_POPULATE mmap flag, and noticed that Zig’s stdlib allocator interface touches every byte of memory even when compiled in ReleaseFast.

This line:

is not eliminated by optimizations as of Zig 0.11.0.

Is this intentional? I would expect ReleaseFast not to incur an extra memory scan.

(not really related, but, for the curious, the benchmark for populate is here: https://github.com/matklad/benchmarks/tree/master/prefault)


Seems like a big regression, and pretty straightforward to fix. Likely this flew under the radar with the new compiler implementation. @memset to undefined is certainly supposed to be a no-op in ReleaseFast optimization mode. In theory this is LLVM’s fault since memset to undefined technically helps the optimizer more than a no-op does, but if it can’t handle it in practice then Zig should simply omit it in the LLVM backend.


I looked into this a bit today. I think you should open an issue for discussion, because at least in small examples I am observing status quo working as designed:

export fn foo(ptr: [*]u8, len: usize) void {
    @memset(ptr[0..len], undefined);
}

After optimizations I get:

define dso_local void @foo(ptr nocapture nonnull align 1 %0, i64 %1) local_unnamed_addr #0 {
  ret void
}

The relevant source code in the compiler:

            if (elem_val.isUndefDeep(mod)) {
                // Even if safety is disabled, we still emit a memset to undefined since it conveys
                // extra information to LLVM. However, safety makes the difference between using
                // 0xaa or actual undefined for the fill byte.
                const fill_byte = if (safety)
                    try o.builder.intValue(.i8, 0xaa)
                else
                    try o.builder.undefValue(.i8);

…and then it calls LLVM’s memset intrinsic.

I think if you found an issue here, it is an LLVM bug and should be reported there. I take back what I said about this being a straightforward fix.
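To see the safety-dependent fill byte from the compiler snippet above without reading LLVM IR, here is a minimal sketch, assuming Zig 0.11.0. Reading memory after setting it to undefined is not something real code should rely on; this is for illustration only. In Debug I would expect every byte to print as aa. In ReleaseFast the result is unspecified, and the zeros may well survive if the memset is dropped.

```zig
const std = @import("std");

pub fn main() void {
    // Start from known contents so the effect of the memset is visible.
    var buf = [_]u8{0} ** 4;

    // With safety on (Debug, ReleaseSafe) this is lowered to a fill with 0xaa.
    // With safety off the fill value is undefined and the optimizer may drop it.
    @memset(&buf, undefined);

    // Inspecting the bytes afterwards is for demonstration only.
    for (buf) |b| {
        std.debug.print("{x} ", .{b});
    }
    std.debug.print("\n", .{});
}
```

Building it once with `-O Debug` and once with `-O ReleaseFast` should show the difference between the two modes.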


Ok, upon double checking, it actually does work in 0.11.0 as expected, and only 0.10.1 is affected!


22:20:50|~/tmp
λ zig version && zig build-exe -O ReleaseFast ./main.zig && perf stat -- ./main
0.10.1

 Performance counter stats for './main':

          2,589.83 msec task-clock                       #    1.000 CPUs utilized          
                 7      context-switches                 #    2.703 /sec                   
                 0      cpu-migrations                   #    0.000 /sec                   
         2,621,447      page-faults                      #    1.012 M/sec                  
    12,124,763,418      cpu_core/cycles/                 #    4.682 G/sec                  
     <not counted>      cpu_atom/cycles/                                              (0.00%)
    16,540,599,726      cpu_core/instructions/           #    6.387 G/sec                  
     <not counted>      cpu_atom/instructions/                                        (0.00%)
     3,400,612,061      cpu_core/branches/               #    1.313 G/sec                  
     <not counted>      cpu_atom/branches/                                            (0.00%)
         1,824,870      cpu_core/branch-misses/          #  704.628 K/sec                  
     <not counted>      cpu_atom/branch-misses/                                       (0.00%)
    72,744,610,344      cpu_core/slots/                  #   28.089 G/sec                  
    19,398,562,758      cpu_core/topdown-retiring/       #     26.7% Retiring              
     1,996,910,872      cpu_core/topdown-bad-spec/       #      2.7% Bad Speculation       
    13,407,830,141      cpu_core/topdown-fe-bound/       #     18.4% Frontend Bound        
    37,941,306,571      cpu_core/topdown-be-bound/       #     52.2% Backend Bound         
     3,993,821,744      cpu_core/topdown-heavy-ops/      #      5.5% Heavy Operations       #     21.2% Light Operations      
       570,545,963      cpu_core/topdown-br-mispredict/  #      0.8% Branch Mispredict      #      2.0% Machine Clears        
     6,561,278,580      cpu_core/topdown-fetch-lat/      #      9.0% Fetch Latency          #      9.4% Fetch Bandwidth       
    26,815,660,283      cpu_core/topdown-mem-bound/      #     36.9% Memory Bound           #     15.3% Core Bound            

       2.590434744 seconds time elapsed

       0.268046000 seconds user
       2.322399000 seconds sys



22:21:22|~/tmp
λ nix shell nixpkgs#zig_0_11

22:21:25|~/tmp
λ zig version && zig build-exe -O ReleaseFast ./main.zig && perf stat -- ./main
0.11.0

 Performance counter stats for './main':

              0.06 msec task-clock                       #    0.132 CPUs utilized          
                 0      context-switches                 #    0.000 /sec                   
                 0      cpu-migrations                   #    0.000 /sec                   
                 9      page-faults                      #  151.174 K/sec                  
           225,243      cpu_core/cycles/                 #    3.783 G/sec                  
     <not counted>      cpu_atom/cycles/                                              (0.00%)
           246,533      cpu_core/instructions/           #    4.141 G/sec                  
     <not counted>      cpu_atom/instructions/                                        (0.00%)
            50,879      cpu_core/branches/               #  854.621 M/sec                  
     <not counted>      cpu_atom/branches/                                            (0.00%)
             1,466      cpu_core/branch-misses/          #   24.625 M/sec                  
     <not counted>      cpu_atom/branch-misses/                                       (0.00%)
         1,351,458      cpu_core/slots/                  #   22.701 G/sec                  
           259,691      cpu_core/topdown-retiring/       #     19.3% Retiring              
           127,196      cpu_core/topdown-bad-spec/       #      9.4% Bad Speculation       
           577,682      cpu_core/topdown-fe-bound/       #     42.9% Frontend Bound        
           381,588      cpu_core/topdown-be-bound/       #     28.3% Backend Bound         
            42,398      cpu_core/topdown-heavy-ops/      #      3.1% Heavy Operations       #     16.1% Light Operations      
           127,196      cpu_core/topdown-br-mispredict/  #      9.4% Branch Mispredict      #      0.0% Machine Clears        
           413,387      cpu_core/topdown-fetch-lat/      #     30.7% Fetch Latency          #     12.2% Fetch Bandwidth       
           222,593      cpu_core/topdown-mem-bound/      #     16.5% Memory Bound           #     11.8% Core Bound            

       0.000449748 seconds time elapsed

       0.000553000 seconds user
       0.000000000 seconds sys

22:21:30|~/tmp
λ bat main.zig
pub fn main() !void {
    _ = try @import("std").heap.page_allocator.alloc(u8, 10 * 1024 * 1024 * 1024);
}

I guess I messed up the ReleaseFast flags when originally testing with Zig 0.11.0.
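If perf is not available, the same check can be approximated from inside the program by comparing minor page-fault counts around the allocation. This is a rough sketch, assuming Zig 0.11.0 on Linux: `std.os.getrusage` and `std.mem.page_size` are where I believe these live in that release (newer releases have moved them), and `minorFaults` is a helper made up for this example. The size is 1 GiB rather than the 10 GiB used in main.zig, to keep it friendlier to small machines.

```zig
const std = @import("std");

// Hypothetical helper: minor page faults taken by this process so far.
fn minorFaults() isize {
    return std.os.getrusage(std.os.rusage.SELF).minflt;
}

pub fn main() !void {
    const size: usize = 1024 * 1024 * 1024; // 1 GiB

    const before = minorFaults();
    const buf = try std.heap.page_allocator.alloc(u8, size);
    defer std.heap.page_allocator.free(buf);
    const after_alloc = minorFaults();

    // Touch one byte per page through a volatile pointer so the stores
    // are not optimized away; this is what forces the kernel to fault pages in.
    var i: usize = 0;
    while (i < buf.len) : (i += std.mem.page_size) {
        const p: *volatile u8 = &buf[i];
        p.* = 1;
    }
    const after_touch = minorFaults();

    std.debug.print("faults during alloc: {d}\n", .{after_alloc - before});
    std.debug.print("faults during touch: {d}\n", .{after_touch - after_alloc});
}
```

If the allocator does not write to the memory, the first number should stay small and nearly all faults should show up in the second; a large first number would suggest the allocation itself is touching every page.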
