Below is a simple program that asks the user for 8 numbers, fills two 4-element vectors with them, and then adds the two vectors:
    const std = @import("std");

    pub fn main() !void {
        const stdin = std.io.getStdIn().reader();
        const stdout = std.io.getStdOut().writer();
        var a = @Vector(4, u32){ 0, 0, 0, 0 };
        var b = @Vector(4, u32){ 0, 0, 0, 0 };
        var buf: [16]u8 = undefined;
        for (0..4) |k| {
            try stdout.print("a[{}] = ", .{k});
            if (try stdin.readUntilDelimiterOrEof(buf[0..], '\n')) |inp| {
                a[k] = try std.fmt.parseInt(u32, inp, 10);
            }
        }
        for (0..4) |k| {
            try stdout.print("b[{}] = ", .{k});
            if (try stdin.readUntilDelimiterOrEof(buf[0..], '\n')) |inp| {
                b[k] = try std.fmt.parseInt(u32, inp, 10);
            }
        }
        const c = a + b;
        std.debug.print("{}\n", .{c});
    }
The program works correctly, but I have a question about the generated code.
I compiled it with:
    zig build-exe v2.zig -O ReleaseSmall -femit-asm -fsingle-threaded
Then I inspected the assembler output for SIMD instructions.
I see some, for example movups xmmword ptr [rdi], xmm0 and xorps xmm0, xmm0, but I do not see any add instructions, only moves and a couple of xors.
Why did the compiler not generate any add instructions? Does this mean that the vector addition is done without SIMD instructions in this particular example?
According to lscpu, the CPU on this machine has the sse, sse2, ssse3, sse4_1 and sse4_2 flags.
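To make the addition easier to locate in the assembly, it could be isolated in an exported function. This is only a sketch: addVec is a hypothetical helper that is not part of the program above, and it assumes that arrays coerce to and from @Vector on assignment.

```zig
// Hypothetical helper (not in the original program): isolates the vector
// addition behind an exported symbol so it is easy to find in the assembly.
export fn addVec(a: *const [4]u32, b: *const [4]u32, out: *[4]u32) void {
    // Assumption: a [4]u32 array coerces to @Vector(4, u32) on assignment.
    const va: @Vector(4, u32) = a.*;
    const vb: @Vector(4, u32) = b.*;
    // Element-wise addition; the result coerces back to an array on store.
    out.* = va + vb;
}
```

Adding this to the same file, compiling with the same flags, and searching the output for the addVec label should show whether a packed integer add (on x86-64 with SSE2 that would be paddd) is emitted for the vector +.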