Strange program performance dependence

Here are two programs with a big (1_048_576 bytes) array.
They are almost identical; the only difference is:

  • fpaqi16-stack holds that big array on the stack
  • fpaqi16-heap holds that big array on the heap

Array on the stack:

  • compiling in Debug mode
$ zig build-exe fpaqi16-stack.zig -fsingle-threaded
$ ./fpaqi16-stack c ~/CC/obj1 z.z
obj1 (21504 bytes) -> z.z (13852 bytes) in 29684 msec !!!!!!!

As can be seen it is pathologically slow.

  • compiling in ReleaseFast mode
$ zig build-exe fpaqi16-stack.zig -fsingle-threaded -O ReleaseFast
$ ./fpaqi16-stack c ~/CC/obj1 z.z
obj1 (21504 bytes) -> z.z (13852 bytes) in 60 msec

60 msec (Fast) vs 30 sec (Debug)!!!

Array on the heap:

  • compiling in Debug mode
$ zig build-exe fpaqi16-heap.zig -fsingle-threaded
$ ./fpaqi16-heap c ~/CC/obj1 z.z
obj1 (21504 bytes) -> z.z (13852 bytes) in 79 msec

  • compiling in ReleaseFast mode
$ zig build-exe fpaqi16-heap.zig -fsingle-threaded -O ReleaseFast
$ ./fpaqi16-heap c ~/CC/obj1 z.z
obj1 (21504 bytes) -> z.z (13852 bytes) in 61 msec

No significant difference.

So the question is: why does the combination of "array on the stack" and "compiling in Debug mode" produce such a slow program?

One reason the heap version could be faster is that the OS keeps pages ready for use, so a single allocation finishes very quickly. Putting a large array on the stack could also hurt locality and CPU cache behavior. This is pure speculation and is not based on deep knowledge. I think a rule of thumb is not to use the stack for much more than a few KB of data.
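For scale, the 1_048_576-byte array can be compared with the stack limit the shell hands to new processes. This is only a small sketch, assuming a shell whose `ulimit -s` reports the soft limit in KiB (bash and dash do); `check_stack_fit` is a made-up helper name, not anything from the original programs.

```shell
# Hypothetical helper: compare a buffer size (in bytes) with the shell's
# soft stack limit. `ulimit -s` prints the limit in KiB, or "unlimited".
check_stack_fit() {
    bytes=$1
    limit_kib=$(ulimit -s)
    if [ "$limit_kib" = "unlimited" ]; then
        echo "stack limit: unlimited"
        return 0
    fi
    limit_bytes=$((limit_kib * 1024))
    echo "stack limit: $limit_bytes bytes, buffer: $bytes bytes"
    # Succeed only if the buffer is smaller than the limit.
    [ "$bytes" -lt "$limit_bytes" ]
}
```

Calling `check_stack_fit 1048576` prints the limit next to the array size. On many Linux setups the default limit is 8 MiB, so the array fits; this only rules out running out of stack and says nothing about why the Debug build is slow.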

Of course, but my confusion was mostly about the effect of ReleaseFast vs. Debug.
It seems that array bounds checking is extraordinarily slow when a big array is on the stack.
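One way to test the bounds-checking guess is to build the same file in ReleaseSafe as well, since that mode keeps Zig's runtime safety checks but turns optimization on. If the ReleaseSafe binary is about as fast as the ReleaseFast one, the checks alone are probably not the cause. A minimal sketch follows; the helper names `run` and `build_modes` and the `DRY_RUN` variable are made up for illustration.

```shell
# Hypothetical helper: echo a command, and run it unless DRY_RUN is set.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Hypothetical helper: build one Zig source file in three optimization modes,
# naming each binary after its mode (e.g. fpaqi16-stack-ReleaseSafe).
build_modes() {
    src=$1
    name=$(basename "$src" .zig)
    for mode in Debug ReleaseSafe ReleaseFast; do
        run zig build-exe "$src" -fsingle-threaded -O "$mode" --name "$name-$mode"
    done
}
```

After `build_modes fpaqi16-stack.zig`, each binary can be timed the same way as above, for example `./fpaqi16-stack-ReleaseSafe c ~/CC/obj1 z.z`. Setting `DRY_RUN=1` first only prints the three commands.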

Pretty stellar difference. Did you spot any smoking guns in the disassembly?

No, I did not. I mean I did not look into the asm code.