Performance "double" loop

ericlang · October 23, 2024, 11:05am

Quick question: would loop solution 1 be faster than loop solution 2?

const src_line: []T = src.data[src_idx..src_idx + copy_width];
const dst_line: []T = dst.data[dst_idx..dst_idx + copy_width];

// solution 1
for(src_line, dst_line) |s, *d|
{
    copy(s, &d.*);
}

// solution 2
for(0..copy_width) |i|
{
    copy(src_line[i], &dst_line[i]);
}

Sze · October 23, 2024, 11:56am

when in doubt: measure

sidenote:

You can write the slices like this instead:

const src_line: []T = src.data[src_idx..][0..copy_width];
const dst_line: []T = dst.data[dst_idx..][0..copy_width];
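A minimal sketch to check that the two slicing forms pick the same window. The helper name `sameWindow` and the test data are made up for illustration; they are not from the original code.

```zig
const std = @import("std");

// Hypothetical helper: true when both slicing forms yield the same window.
fn sameWindow(data: []const u8, idx: usize, width: usize) bool {
    const a = data[idx .. idx + width]; // start..end form
    const b = data[idx..][0..width]; // start first, then length
    return a.ptr == b.ptr and a.len == b.len;
}

test "both slicing forms select the same elements" {
    const data = [_]u8{ 0, 1, 2, 3, 4, 5, 6, 7 };
    try std.testing.expect(sameWindow(&data, 2, 3));
}
```

The second form states the start and the length separately, so the start index does not have to be repeated.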

chung-leong · October 23, 2024, 12:37pm

The two loops are identical.

LucasSantos91 · October 23, 2024, 12:38pm

Dereferencing and taking the address cancel each other out, so you could do this:

for(src_line, dst_line) |s, *d|
    copy(s, d);

Compare the disassembly of both options. It’s easier than measuring and a lot of times you can derive a conclusive answer without having to measure. I’m pretty sure both options will generate the same machine code.
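One way to set up that comparison, as a rough sketch: put each loop in its own exported function so both are easy to find in the assembly output. The file name `loops.zig` and the function names `loop1` and `loop2` are assumptions for illustration, and `copy` is fixed to `i64` here rather than generic.

```zig
// Hypothetical file name: loops.zig
// Emit assembly with: zig build-obj -O ReleaseFast -femit-asm loops.zig

fn copy(x: i64, y: *i64) void {
    if (x > 0) y.* = x;
}

// Solution 1: iterate both slices in lockstep.
export fn loop1(src: [*]const i64, dst: [*]i64, len: usize) void {
    for (src[0..len], dst[0..len]) |s, *d| {
        copy(s, d);
    }
}

// Solution 2: iterate over an index range.
export fn loop2(src: [*]const i64, dst: [*]i64, len: usize) void {
    const src_line = src[0..len];
    const dst_line = dst[0..len];
    for (0..len) |i| {
        copy(src_line[i], &dst_line[i]);
    }
}
```

Then compare the bodies of `loop1` and `loop2` in the generated assembly file. Build with an optimized mode, since Debug output keeps safety checks that obscure the comparison.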

ericlang · October 23, 2024, 1:54pm

That is interesting! And good to know.

"Dereferencing and taking the address cancel each other out, you could do this:"

Ok. Great. I was already dissatisfied with the look of &d.*

In my copy routine (which I adjusted to match your example), however, I still have to dereference y.

fn copy(x: i64, y: *i64) void
{
    if (x > 0)
    {
        y.* = x;
    }
}

(Note that my real version is generic over a comptime type.)

I was wondering whether the optimizer could use SIMD when the elements are u8s.

Validark · October 24, 2024, 2:19pm

Compile it and check the assembly, my friend. If you see SIMD instructions emitted for the platform(s) you care about, then it does SIMD.

If you are unfamiliar with assembly, you can look for vector registers being used. On x86 this would be any time you see xmm/ymm/zmm registers being used. On ARM it’s usually a register with a v in front but unfortunately they have aliases as well.
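For the u8 question, a small sketch you could compile and inspect. The file name `copy_u8.zig` and the function name `copyLine` are assumptions for illustration. Whether vector instructions show up depends on the target and the optimization mode, so check the output rather than assuming.

```zig
// Hypothetical file name: copy_u8.zig
// Emit assembly with: zig build-obj -O ReleaseFast -femit-asm copy_u8.zig

// Same shape as the copy routine in the thread, specialized to u8.
fn copy(x: u8, y: *u8) void {
    if (x > 0) y.* = x;
}

// `export` keeps the function in the object file under a predictable name.
export fn copyLine(src: [*]const u8, dst: [*]u8, len: usize) void {
    for (src[0..len], dst[0..len]) |s, *d| {
        copy(s, d);
    }
}
```

Search the generated assembly for `copyLine` and look at which registers its loop body uses.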

ericlang · October 24, 2024, 2:56pm

I am somewhat familiar with assembly, though no expert, but I cannot see the assembly output anywhere. I still have to find out some details.
Currently in VS Code I cannot build or debug. Just run…