Scatter, gather, reduce w/ @Vector

Hi, I've been exploring Zig's @Vector, which I initially thought would be a great fit for some algorithms I currently write in ISPC, but I immediately ran into operations that the existing support doesn't cover. As a simple example, according to godbolt, an ISPC kernel such as

uniform float sum_square_at(uniform float *num, varying int idx) {
    varying float num_idx = num[idx];
    return reduce_add(num_idx);
}

results in fairly compact assembly compared to an equivalent C++ snippet forced to use SIMD, and the intent is clearer IMO.

I was curious whether Zig's @Vector will get a similar set of operations, or whether inline assembly would be the required approach?

Hello @maedoc - good question.

I did some searching through the github issues to check. I found better search results when I looked up “SIMD” instead of “Vector” but you’ll probably find examples of both.

Primarily, we’re talking about proposals here. A big chunk of the Zig effort at the moment is centered around things like incremental compilation, but there are a few open proposals for vectorized operations. One, for instance, is:

If you don’t see what you’re looking for, then I’d say that there isn’t a plan at this moment unless there’s more internal conversation about it.

There’s a lot of inline assembly in the Thread module, so there’s definitely precedent for using inline assembly in standard utilities. An issue that people seem to be having in that particular link I posted is whether or not the proposal would qualify as a builtin function. Since @Vector deals with builtins quite extensively, I can see this as an ongoing point of contention.

What are your top utilities that you would like to see added? Also, have you considered doing C bindings for the functionality that you would like? I see that ISPC is a C extension, so I can’t speak to how easy that would be in this case.

Here are two examples of how that kernel could be written in Zig right now:

const std = @import("std");

const DIM = 4;
const Mat = [DIM]Vec;
const Vec = @Vector(DIM, f64);
const VecIdxComp = @Vector(DIM, i32);
const VecIdx = @Vector(DIM, VecIdxUint);
const VecIdxUint = std.simd.VectorIndex(Vec);

fn sumSquareAtComp(num: Mat, comptime idx: VecIdxComp) f64 {
    // @shuffle selects b[i] with ~i (not -i), so XOR the odd lanes with -1 to flip their bits.
    const mask = comptime idx ^ std.simd.repeat(DIM, [_]i32{ 0, -1 });
    const num_idx1 = @shuffle(f64, num[0], num[1], std.simd.extract(mask, 0, 2));
    const num_idx2 = @shuffle(f64, num[2], num[3], std.simd.extract(mask, 2, 2));
    const num_idx = std.simd.join(num_idx1, num_idx2);
    return @reduce(.Add, num_idx);
}

fn sumSquareAt(num: Mat, idx: VecIdx) f64 {
    var num_idx: Vec = undefined;
    for (num, 0..) |sub_num, i| {
        num_idx[i] = sub_num[idx[i]];
    }
    return @reduce(.Add, num_idx);
}

test "sumSquareAt & sumSquareAtComp" {
    const num = Mat{
        @splat(1.0),
        @splat(12.0),
        @splat(123.0),
        @splat(1234.0),
    };
    const idx = std.simd.iota(VecIdxUint, DIM);
    try std.testing.expectEqual(@as(f64, 1370.0), sumSquareAt(num, idx));
    try std.testing.expectEqual(@as(f64, 1370.0), sumSquareAtComp(num, idx));
}

The simpler sumSquareAt function already works for any number of dimensions, while sumSquareAtComp doesn’t scale as easily; it also requires idx (the input index vector) to consist of i32 values and to be known at compile time.
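For reference, here is a small sketch of the @shuffle mask convention that the compile-time version relies on: non-negative mask entries select from the first operand, and ~i (that is, -1 - i) selects element i of the second operand. The helper name pickOuterLanes is made up for illustration and is not part of std.

```zig
const std = @import("std");

/// Hypothetical helper (not in std): builds { a[0], b[0], a[3], b[3] }.
fn pickOuterLanes(a: @Vector(4, f64), b: @Vector(4, f64)) @Vector(4, f64) {
    // The mask must be comptime-known; ~i addresses lane i of `b`.
    const mask = @Vector(4, i32){ 0, ~@as(i32, 0), 3, ~@as(i32, 3) };
    return @shuffle(f64, a, b, mask);
}

test "shuffle mask convention" {
    const a = @Vector(4, f64){ 10, 11, 12, 13 };
    const b = @Vector(4, f64){ 20, 21, 22, 23 };
    const expected = @Vector(4, f64){ 10, 20, 13, 23 };
    // Compare lane by lane, then AND the results together.
    try std.testing.expect(@reduce(.And, pickOuterLanes(a, b) == expected));
}
```

Using distinct values per lane (rather than @splat) makes it visible which operand and which lane each mask entry actually picks.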

I should indeed have been looking through the proposals, and I somehow missed the std.simd module completely (how/where are the docs for that sort of thing?), so I will do some catching up…

That’s an interesting suggestion I hadn’t thought of. ISPC is a way to write CUDA/OpenCL-style code for the CPU without a runtime, all while generating very compact assembly (e.g. better than autovectorization in Intel’s commercial compilers). Similarly to Zig, it tries to make integration with C/C++ easy: once the ISPC code is compiled, it is immediately callable via the C ABI, so Zig could just call specific bits. I still like the idea of not needing a second compiler, if possible.

I’m not well versed in Zig and its history, but there may be some useful lessons to take from ISPC, since it has worked out a fairly complete and usable SIMD story built on LLVM.

Thanks for the examples, but there’s a compile error on godbolt with Zig v0.10; what version do I need to compile them? I agree that I completely missed @reduce(.Add, ...), but @shuffle doesn’t seem like a scalable approach for larger vector sizes. That’s why I was looking for an equivalent of num[idx], which would be a (performance-warning-worthy) gather op, for which ISPC issues vgatherqps (which neither GCC nor Clang will do).

Thanks for the replies and I will keep reading!

That’s a pain-point we have at the moment. The standard library is full of utilities (people often don’t realize how much is actually in there… it’s quite extensive). Let me grab a few threads for you here (expand the subsections to see links)…

There’s always the “std” documentation itself (experimental): Zig Documentation

I’m sure there’s more but I think that’s probably a good start to understand some of the thinking towards the standard library.


In the words of Professor Farnsworth “A man can dream… a man can dream…”

That said, you could probably add a build step for this - I haven’t done much of multi-stage building/linking yet (and heads up, we get more questions about that by an order of magnitude than anything else). At the very least, there’s a good amount of conversation regarding the build system and now some popular youtubers are starting to go in on tutorials so hopefully we’ll break even at some point.

This may be beneath your level, but here it is for the sake of argument:

I compiled those examples with Zig v0.11.0. I agree that @shuffle doesn’t cut it here; I included it just as a showcase. Currently Zig doesn’t have a gather function like the one ISPC uses. It would definitely be a worthy addition and a more appropriate function to use in that example.
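In the meantime, a gather can be approximated with a per-lane loop. The following is only a minimal sketch: gather and sumAt are hypothetical names (nothing like them exists in std as far as this thread establishes), the lane count of 8 and the u32 index type are arbitrary choices, and there is no guarantee that the backend turns the loop into a hardware gather instruction.

```zig
const std = @import("std");

/// Hypothetical helper (not part of std): loads src[idx[lane]] for every lane.
fn gather(comptime T: type, comptime N: comptime_int, src: []const T, idx: @Vector(N, u32)) @Vector(N, T) {
    const idx_arr: [N]u32 = idx; // a vector coerces to an array of the same length
    var out: [N]T = undefined;
    inline for (0..N) |lane| {
        out[lane] = src[idx_arr[lane]]; // bounds-checked in safe build modes
    }
    return out; // the array coerces back to a vector
}

/// Rough analogue of the ISPC kernel: gather 8 lanes, then reduce with addition.
fn sumAt(src: []const f32, idx: @Vector(8, u32)) f32 {
    return @reduce(.Add, gather(f32, 8, src, idx));
}

test "gather then reduce" {
    const data = [_]f32{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    const idx = @Vector(8, u32){ 0, 2, 4, 6, 8, 1, 3, 5 };
    // 1 + 3 + 5 + 7 + 9 + 2 + 4 + 6 = 37
    try std.testing.expectEqual(@as(f32, 37), sumAt(&data, idx));
}
```

Unlike the @shuffle version, the indices here can be runtime values and the source can be an ordinary slice, which is closer to what num[idx] expresses in ISPC.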

I really appreciate the effort to pull up all those links, I’ll definitely dig in.

:100:

I will definitely try this, since it would only be a handful of handy functions (maybe a slippery slope, let’s see).

Once I have a good read through open proposals I may take a stab at it.

I’d stick with the op names, or else it is a pain to find them in the docs and you always wonder why they changed the name - is it the same or different?