Is it okay to use &array instead of array[0..]?

jasperdunn · December 2, 2023, 7:13am

Are they both describing the same thing?

std.mem.copyForwards(f32, &one_second_of_frames, one_second_of_frames[frame_count..]);

or

std.mem.copyForwards(f32, one_second_of_frames[0..], one_second_of_frames[frame_count..]);

My gut feeling is that array[0..] is more explicit. Are there any performance differences here?

AndrewCodeDev · December 2, 2023, 7:33am

If we take a look at the implementation, we can see that both arguments are slices. Either way you look at it, you’re setting up a slice.

/// Copy all of source into dest at position 0.
/// dest.len must be >= source.len.
/// If the slices overlap, dest.ptr must be <= src.ptr.
pub fn copyForwards(comptime T: type, dest: []T, source: []const T) void {
    for (dest[0..source.len], source) |*d, s| d.* = s;
}
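As a minimal sketch of why both spellings are accepted by a slice parameter: a pointer to an array coerces to a slice at the call site. The helper `total` below is hypothetical (it is not part of `std.mem`); it only mirrors the `[]const T` shape of the `source` parameter.

```zig
const std = @import("std");

// Hypothetical helper with the same parameter shape as copyForwards' `source`.
fn total(values: []const usize) usize {
    var sum: usize = 0;
    for (values) |v| sum += v;
    return sum;
}

test "a pointer to an array coerces to a slice at the call site" {
    const inp: [5]usize = .{ 1, 2, 3, 4, 5 };
    // `&inp` is a *const [5]usize; it is coerced to []const usize here.
    try std.testing.expectEqual(@as(usize, 15), total(&inp));
    // `inp[0..]` is accepted by the same parameter.
    try std.testing.expectEqual(@as(usize, 15), total(inp[0..]));
}
```

This should run with `zig test` on a single file.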

Here’s my test code for reference:

const mem = @import("std").mem;

const inp: [5]usize = .{ 1, 2, 3, 4, 5 };
var out: [5]usize = undefined;

export fn implicit_slice() void {
    mem.copyForwards(usize, &out, &inp);
}

export fn explicit_slice() void {
    mem.copyForwards(usize, out[0..], inp[0..]);
}

So here’s what ol’ godbolt has to say about it (no optimization flags and arrays are initialized first)…

implicit_slice:
        push    rbp
        mov     rbp, rsp
        movabs  rdi, offset example.out
        mov     ecx, 5
        movabs  rdx, offset example.inp
        mov     rsi, rcx
        call    mem.copyForwards__anon_1098
        pop     rbp
        ret

explicit_slice:
        push    rbp
        mov     rbp, rsp
        movabs  rdi, offset example.out
        mov     ecx, 5
        movabs  rdx, offset example.inp
        mov     rsi, rcx
        call    mem.copyForwards__anon_1098
        pop     rbp
        ret

Identical assembly. Now for something neat… let’s put on -O ReleaseFast…

explicit_slice:
        ret

implicit_slice:
        jmp     explicit_slice

You can see here that the implicit slice call actually is jumping to explicit slice. It’s decided they’re the same thing when all is said and done.
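The equivalence can also be checked at the behaviour level, using the overlapping copy from the original question. This is only a sketch: the six-element buffer and the `frame_count` of 2 are made-up stand-ins for `one_second_of_frames` and its real frame count.

```zig
const std = @import("std");

test "both spellings shift the buffer the same way" {
    // Made-up data standing in for one_second_of_frames.
    var a = [_]f32{ 1, 2, 3, 4, 5, 6 };
    var b = a;
    const frame_count: usize = 2;

    // dest starts before source, so the overlap rule of copyForwards holds.
    std.mem.copyForwards(f32, &a, a[frame_count..]);
    std.mem.copyForwards(f32, b[0..], b[frame_count..]);

    // Both buffers end up identical ...
    try std.testing.expectEqualSlices(f32, &a, &b);
    // ... with the tail moved to the front and the last two values untouched.
    try std.testing.expectEqualSlices(f32, &[_]f32{ 3, 4, 5, 6, 5, 6 }, &a);
}
```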

jasperdunn · December 2, 2023, 7:37am

Awesome!
Thanks for the quick response, I’ll have to take a look at godbolt.

AndrewCodeDev · December 2, 2023, 7:42am

As an addendum about readability… personally I prefer when things are consistent. My vote is if you go with one as a slice, do the other as well. And because slices are adjustable (you have to adjust one in your example), that immediately puts me in the camp of slice-ville (which is conveniently located right next to flavor-town, apparently).

jasperdunn · December 2, 2023, 7:44am

Yeah that makes sense, better to be explicit where you can, cool to see the identical output assembly.

androm3da · December 17, 2023, 4:52am

Okay – but hold on, it looks like the optimizer did two things here. Yes, the outliner decided to collapse these together. But only after it decided that it could optimize them to have no effect, because they only affect data that’s never read anywhere. So that might not necessarily be convincing evidence that they’re the same, if they both just happened to get elided.

If instead we allow out[n] to escape the TU (see Compiler Explorer), the optimizer cannot elide these functions anymore. Now we can see that the outliner indeed made the same decision here to reduce these funcs that had the same effect.

explicit_slice:
        vmovups ymm0, ymmword ptr [rip + example.inp]
        mov     qword ptr [rip + example.out+32], 5
        vmovups ymmword ptr [rip + example.out], ymm0
        vzeroupper
        ret

implicit_slice:
        jmp     explicit_slice
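One way to make `out` observable, as a sketch only: add an exported function that reads it. The name `read_out` is hypothetical and this is an assumption about the general idea, not necessarily what the linked Compiler Explorer example does.

```zig
const mem = @import("std").mem;

const inp: [5]usize = .{ 1, 2, 3, 4, 5 };
var out: [5]usize = undefined;

export fn implicit_slice() void {
    mem.copyForwards(usize, &out, &inp);
}

export fn explicit_slice() void {
    mem.copyForwards(usize, out[0..], inp[0..]);
}

// Hypothetical addition: because an exported function can read `out`,
// the stores above are observable and cannot simply be discarded.
export fn read_out(n: usize) usize {
    return out[n];
}
```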

AndrewCodeDev · December 17, 2023, 5:06am

Right - under higher optimization levels (the compiler explorer link you sent is on ReleaseFast), I think your example makes a more convincing argument because the whole thing itself isn’t elided. From the unoptimized build all the way up, even when it can’t be elided completely, it reduces to the same thing.

Good catch, and welcome to the forum @androm3da.

permutationlock · December 17, 2023, 5:17am

Using slicing syntax with a comptime-known length always produces a pointer to an array, not a slice. So they should be exactly the same thing before they even hit an optimizer.

const arr: [4]u32 = .{ 3, 5, 2, 12 };
const slice = arr[0..];
@compileLog(@TypeOf(slice));

Compile Log Output:
@as(type, *const [4]u32)
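The same check can be written as a test instead of a compile log. This is a sketch; the helper `runtimeTailIsSlice` is a hypothetical name, added only to contrast a start index that is not known at comptime.

```zig
const std = @import("std");

// Hypothetical helper: `start` is only known at runtime here, so slicing
// yields an ordinary slice rather than a pointer to an array.
fn runtimeTailIsSlice(arr: *const [4]u32, start: usize) bool {
    return @TypeOf(arr[start..]) == []const u32;
}

test "comptime-known bounds give a pointer to an array" {
    const arr: [4]u32 = .{ 3, 5, 2, 12 };

    // `&arr` and `arr[0..]` have exactly the same type.
    try std.testing.expect(@TypeOf(&arr) == *const [4]u32);
    try std.testing.expect(@TypeOf(arr[0..]) == @TypeOf(&arr));

    // A comptime-known non-zero start just shortens the array type.
    try std.testing.expect(@TypeOf(arr[1..]) == *const [3]u32);

    // With a runtime start the result is a slice.
    try std.testing.expect(runtimeTailIsSlice(&arr, 1));
}
```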

androm3da · December 17, 2023, 5:21am

Sure – dumping early gives yet better evidence still.