Below is a simple program that asks the user for 8 numbers, fills two 4-element vectors with them, and then adds the two vectors:
const std = @import("std");

pub fn main() !void {
    const stdin = std.io.getStdIn().reader();
    const stdout = std.io.getStdOut().writer();

    var a = @Vector(4, u32){ 0, 0, 0, 0 };
    var b = @Vector(4, u32){ 0, 0, 0, 0 };
    var buf: [16]u8 = undefined;

    for (0..4) |k| {
        try stdout.print("a[{}] = ", .{k});
        if (try stdin.readUntilDelimiterOrEof(buf[0..], '\n')) |inp| {
            a[k] = try std.fmt.parseInt(u32, inp, 10);
        }
    }

    for (0..4) |k| {
        try stdout.print("b[{}] = ", .{k});
        if (try stdin.readUntilDelimiterOrEof(buf[0..], '\n')) |inp| {
            b[k] = try std.fmt.parseInt(u32, inp, 10);
        }
    }

    const c = a + b;
    std.debug.print("{}\n", .{c});
}
The program works correctly, but I have a question about the generated code.
I compiled it with
zig build-exe v2.zig -O ReleaseSmall -femit-asm -fsingle-threaded
Then I inspected the assembly output for SIMD instructions.
I see some, for example movups xmmword ptr [rdi], xmm0 and xorps xmm0, xmm0,
but I do not see any add instructions, only moves and a couple of XORs.
Why did the compiler not generate add instructions? Does this mean that the vector addition is actually performed without SIMD instructions in this particular example?
According to lscpu, the CPU on this machine has the sse, sse2, ssse3, sse4_1 and sse4_2 flags.
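To make the addition easier to locate in the assembly, it may help to isolate it in a small exported function that gets its own label. The sketch below is a reduced test case, not part of the program above: the name addVectors and its pointer-based signature are assumptions chosen for illustration. Since the elements are u32, a vectorized add on an SSE2 target would be expected to show up as the packed integer instruction paddd rather than a floating-point one such as addps, so that may be the mnemonic to search for.

```zig
// Hypothetical reduced test case: the name and signature are illustrative only.
// Exporting the function keeps it as a separate symbol in the emitted assembly.
export fn addVectors(a: *const [4]u32, b: *const [4]u32, out: *[4]u32) void {
    // Arrays coerce to vectors of the same length and element type.
    const va: @Vector(4, u32) = a.*;
    const vb: @Vector(4, u32) = b.*;
    // Lane-wise add; the vector result coerces back to an array.
    out.* = va + vb;
}
```

Compiling this file with the same flags as above and looking under the addVectors label should show how the compiler lowers the addition by itself, without the surrounding I/O and formatting code.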