On Vector Syntax

The big question is why the arguments are ordered @Vector(num, type), as in @Vector(4, u32), which is backwards from essentially everything else in the language. @Vector(type, num) would be more consistent.

And more to the point, why isn’t @Vector(4, u32) really just v4u32?

These are the important questions.

2 Likes

The first time I saw @Vector(N, T) I thought the same, but then I realized since it’s highly interoperable with arrays, it kinda makes sense that it follows the array type syntax [N]T. I don’t know if that’s the official reason though. In terms of v4u32, I think following the lead of arrays once more would lead to a syntax involving some sort of enclosing character. Since [ { ( are already in use, maybe <N>T, so <4>u32 is a vector of 4 u32s?
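
(A runnable aside, not from the original post: the interoperability is real, since arrays and vectors of matching length and element type coerce both ways.)

    const std = @import("std");

    pub fn main() void {
        const arr: [4]u32 = .{ 1, 2, 3, 4 };
        const vec: @Vector(4, u32) = arr; // [4]u32 coerces to @Vector(4, u32)
        const back: [4]u32 = vec; // and the vector coerces back to an array
        std.debug.print("{any} {any}\n", .{ vec, back });
    }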

1 Like

I think if @Vector were made into syntax, I would prefer just reusing [N]T, maybe adding a prefix like vec[N]T.

1 Like

So it’s array notation ordering, but used like a function. I can see that, but I think that just makes it inconsistent with both other constructs. Yuck. Vectors need to pick a side: if it’s a function, use function argument ordering; if it’s an array, use array-like syntax.

I do like the idea of <N>u32 though.

They are a little unlike arrays in that multi-dimensional vectors don’t make as much sense (all interpretations have some weirdness involved), and you can’t have a vector of arbitrary element types, but even with those small differences, I like <N>T.

Or make arrays be a123type :slight_smile: Then the length really is encoded in the type. (edit: lol, you could have an agglutinative, German-like type naming system: a10a10v4u32 would be a 10x10 matrix where each element is a vector of 4 u32s)

It’s planned to give dedicated syntax to Vectors. [|N|]T was an option Andrew seemed to like.

2 Likes

Eww. Vectors shouldn’t have special syntax, because vectors shouldn’t exist as a distinct Zig concept.

Vectors are just arrays, with a special syntax (bad) so you can perform SIMD operations (good) with overloaded operators (bad!).

What I’d like to see is a distinct vectorized syntax for each operator. For +, it would be [+], and so on. These could be applied to any two arrays of the same length, or comptime-known slices, of an appropriate type (same restriction as vectors have now). Using vectorized operations on arrays of the wrong type would be a compile error, just like using scalar operations on scalars of the wrong type is now.

Currently, we have four ‘contiguous aggregate’ types: tuples, slices, arrays, and vectors. That’s one more than we need; arrays and vectors are both single-type aggregates of compile-time-known length.

What’s worse is that every operator has two meanings: it’s either vectorized or it isn’t, and you have to check the types of the operands to find out. That’s… operator overloading. If we’re going to have operator overloading in Zig, I want some! If we’re not (this is the consensus, and one I agree with), then let’s not have operator overloading.
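
(A runnable illustration of that double meaning: only the operand types distinguish a single scalar add from an element-wise O(n) one.)

    const std = @import("std");

    pub fn main() void {
        const x: u32 = 1;
        const y: u32 = 2;
        const s = x + y; // scalar +: one addition

        const a: @Vector(4, u32) = .{ 1, 2, 3, 4 };
        const b: @Vector(4, u32) = .{ 5, 6, 7, 8 };
        const v = a + b; // same token, now an element-wise O(n) operation

        std.debug.print("{d} {any}\n", .{ s, v });
    }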

I think [+], [*%], and all the rest, would provide maximum clarity. It combines the syntax for slices and arrays, with the operator. Hard to miss. If people simply hate that, we could take a page from Julia and use their broadcast syntax: .+, .*%, and so on. It’s more subtle and less obvious, but it’s lighter on the page. Expressions like v1[0..3] [*] v2[1..4] vs. v1[0..3] .* v2[1..4]? I still prefer the bracketed operators, but either way, a case could be made.

6 Likes

Almost there with you: Zig shouldn’t have array-like vector types like it does now. There’s too much overlap with arrays, especially considering auto-vectorization by LLVM.

Sometimes it’s difficult to get them to generate what you want, and some concepts are missing (like mask registers). I think it would be better to have vector register types: a type that represents a 256-bit register of packed u32s, for instance (in one thread I was bs’ing about calling it v256u32, for lack of a better name).

This would allow you to control the bit width of the codegen (which I haven’t found a way to do yet without casting too wide a net). Always generating 512-bit SIMD instructions just because the CPU supports them is horribly naive. There are two schedules of SIMD instructions, “heavy” and “light”, in each bit class. The heavy 256-bit instructions are 1 cycle more expensive than the light ones, but the heavy 512-bit instructions are painful, and the CPU delays going into SIMD mode, so the first few are even more expensive (you can’t just throw them in randomly; the CPU wants a long stream of them).

Combine that with the fact that some machines (e.g., certain AMD parts) might only have a 256-bit wide bus, so the 512-bit ops have no benefit; but that isn’t accounted for.
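
(For what it’s worth, a runnable sketch of the one width lever @Vector does give you: the lane count is part of the type, so pinning N pins the width the ops get legalized to. addAll is a hypothetical helper; std.simd.suggestVectorLength exists in recent std, though older versions called it suggestVectorSize.)

    const std = @import("std");

    // Hypothetical helper: element-wise add with an explicitly chosen lane
    // count N. 8 lanes of u32 is a 256-bit op, whatever wider the CPU could do.
    fn addAll(comptime N: usize, a: []const u32, b: []const u32, out: []u32) void {
        var i: usize = 0;
        while (i + N <= a.len) : (i += N) {
            const va: @Vector(N, u32) = a[i..][0..N].*;
            const vb: @Vector(N, u32) = b[i..][0..N].*;
            out[i..][0..N].* = va + vb;
        }
        while (i < a.len) : (i += 1) out[i] = a[i] + b[i]; // scalar tail
    }

    pub fn main() void {
        const a = [_]u32{1} ** 10;
        const b = [_]u32{2} ** 10;
        var out: [10]u32 = undefined;
        addAll(8, &a, &b, &out);
        // What the build target natively supports:
        const lanes = std.simd.suggestVectorLength(u32) orelse 1;
        std.debug.print("{any}, suggested lanes: {d}\n", .{ out, lanes });
    }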

At some level, great SIMD (anything less than great, and letting auto-vectorization do its thing is just a better trade-off) requires knowing your register sizes, your masks, and all the intrinsics that aren’t exposed in Zig. If you have the intrinsics, the register types help tremendously; otherwise all the SIMD routines (intrinsics and libraries) turn into a mess of anytype and comptime magic again.

And please, a lighter syntax than [*] (which overlaps with the many-item pointer syntax too much, I think; even if it doesn’t conflict or create weird parser corner cases, I’d prefer something less overlapping and less typing). [|v|] is an absolute lack of pragmatism and just a lot of characters to type (pipe is non-trivial to type on some keyboards too; I have a split QMK where I even gave it a special key down near the space bar, it is such a PITA). It’s sad ++ and ** are taken, or else those might have been nice. Backtick is hugely unused too; `+ might be good, or even ^+.

1 Like

Drat, that didn’t occur to me. The parser could handle it but that’s a strike against the syntax for sure. .* doesn’t suffer from the same problem (I hope!). I’m pretty indifferent to how it’s expressed, so long as array operations are syntactically separate from scalar ones. It isn’t a huge deal in the grand scheme of things, but IMHO it’s better for the language.

Edit: although I remember thinking that vecptr.* and vecptr .* othervec is asking for trouble…

1 Like

Definitely, too many dots. Time to switch over and start abusing another punctuation mark for a change. Time to give dot a break.

2 Likes

This whole conversation brings up an interesting point.

Who here thinks of “SIMD” more through the lens of types vs. processes? I suppose the same question could be asked of atomics.

I’m going to propose that it’s important to pick an interpretation and build towards that. The @Vector builtin encourages thinking of them as types.

1 Like

To get top SIMD performance you need the intrinsics; another way hasn’t been shown yet. Yes, you can get like 60% of the way without them, but to get the fast code, so far there is no other way. To make one would be a PhD project in itself.

I definitely think about them in terms of process, but then I have to convert that process into the actual SIMD language (which is what you’re referring to as types, I think).

I haven’t done something that low-level / performance specific, but I was under the impression that with Zig you essentially can write a library of “intrinsics” in user space by combining branching on the target with inline assembly wrapped in functions, to create functions that boil down to the appropriate instructions for the architecture? Do intrinsics have benefits beyond that?
(To me they seem like a C/C++ thing that could be solved in other ways.)
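
(For concreteness, a minimal sketch of that pattern; lzcnt32 is an illustrative name, not an established library function.)

    const std = @import("std");
    const builtin = @import("builtin");

    // A user-space "intrinsic": use a dedicated instruction where the
    // target supports it, falling back to the portable builtin elsewhere.
    fn lzcnt32(x: u32) u32 {
        if (comptime (builtin.cpu.arch == .x86_64 and
            std.Target.x86.featureSetHas(builtin.cpu.features, .lzcnt)))
        {
            return asm ("lzcnt %[x], %[ret]"
                : [ret] "=r" (-> u32),
                : [x] "r" (x),
            );
        }
        return @clz(x);
    }

    pub fn main() void {
        std.debug.print("{d}\n", .{lzcnt32(1)}); // prints 31
    }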

If that is the case, it might still be annoying having to do that work manually per architecture, but I guess somebody has to do that work anyway, so maybe a library built by a community of people who want to write code that low-level makes more sense, and is more flexible and adaptable to different needs, than forcing one specific way into something that is implemented by the language directly?

It seems sensible to me to limit the scope of the language, towards the more common cases, plus tools that allow you to fill the gaps, instead of putting everything into the language.

That said, I am more of a curious observer of these things, until I eventually have time for a project where I can invest the time to also get practical experience with them.

1 Like

asm is a poor replacement for intrinsics. It is just a black box for the compiler, which has a very rough time optimizing when the code involves asm blocks. They are best when you are tuning routines for a very specific architecture and can write blocks that already include the optimizations you’re trying to get into the code.

They are very bad for generic code, since each asm block is placed verbatim in the instruction stream. This is why you can make an empty asm block that takes in an argument to black-hole the value and prevent its optimization: the compiler doesn’t see into the block even when the block is literally empty. Intrinsics produce better code, especially the code surrounding the call versus the code around an asm block; the compiler knows more and has more freedom with intrinsics. They are universally preferred by every compiler I’ve ever worked with.
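
(A sketch of that black-hole trick; std.mem.doNotOptimizeAway in the standard library is built on essentially this.)

    // An empty asm block: the compiler must assume the block reads `value`,
    // so the value has to be materialized, yet the block emits nothing.
    fn blackhole(value: anytype) void {
        asm volatile (""
            :
            : [v] "r" (value),
        );
    }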

1 Like

I think of them in terms of operations, which is why I would like them to be exposed through distinct operators. <+> maybe? |+|?

But vector operations are something you do to an otherwise ordinary array. I’m not even entirely convinced they should be expressed in the syntax. Now that Zig has implicit (no index) for loops across two arrays of the same length, it isn’t exactly playing on hard mode to vectorize those loops. It can be done with index-based for loops in the C tradition, but there’s more to recognize and more cases to exclude; the Zig case makes it dead easy.

It’s probably too late in the game to make a major breaking change like introducing vectorized operators with a distinct syntax, especially since they’re ahem overloaded, so it isn’t feasible to provide a migration script.

But that doesn’t mean I like it. Operators have certain expected qualities: they don’t allocate, they work on numbers, and they’re O(1). Zig’s vectorized operators are O(n), and again, you have to check the types involved to know what you’re dealing with.

And it’s kind of a hard sell to the scientific-numerics types that Zig does have array ops, but only for arrays of one dimension. If you want to do matrix ops, welcome to the land of add(a, mul(b, c)). It’s not like matrix multiplication requires heap allocation, either, the size of the resultant is known.

You can’t exactly tell them that Zig wants operators to be predictable, because that ship has sailed. Is a + b one instruction, or are a and b 128K-element vectors, where the cost varies based on what width of SIMD is available?

Arguably, just remove them from the language. Instead of this:

    const a = @Vector(4, i32){ 1, 2, 3, 4 };
    const b = @Vector(4, i32){ 5, 6, 7, 8 };

    // Math operations take place element-wise.
    const c = a + b;

This:

    const arrayA = [4]i32{ 1, 2, 3, 4 };
    const arrayB = [4]i32{ 5, 6, 7, 8 };
    
    const arrayC = for (arrayA, arrayB) |a, b| a + b;

This is a compile error now, “error: value of type ‘i32’ ignored”, which is good, because it means that introducing that syntax would be backward-compatible.

You’ll note that this is trivial to vectorize; it calls for no analysis to speak of for the compiler to see that it’s a candidate.
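
(For comparison, a runnable sketch of the closest formulation that compiles today, writing through a destination array:)

    const arrayA = [4]i32{ 1, 2, 3, 4 };
    const arrayB = [4]i32{ 5, 6, 7, 8 };

    var arrayC: [4]i32 = undefined;
    for (&arrayC, arrayA, arrayB) |*c, a, b| {
        c.* = a + b; // element-wise, and trivially vectorizable
    }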

1 Like

Depends on what scientific types you’re selling to.

Many of them can’t even consider working on something run on a CPU as opposed to a GPU, TPU, etc… I’m not sure if Zig’s take on SIMD is make-or-break here. I’ve been working on CUDA stuff for a while now and I’d probably not use SIMD for most of the things I’m doing (it’s painfully slower… like 5-10 minutes compared to 10-15 seconds).

I get your point about the a + b example, but I’m not as extreme about what it can (or in this case, cannot) imply. In a hyper-minimal statement like that, there’s not much information to judge. However, when we reintroduce context, I’d say we gain most of our predictability back:

const a: usize = 2;
const b: usize = 3;
const c: usize = a + b;

The issue you’re talking about is whether a @Vector actually does something SIMD or not. It’d be great to have a compiler flag around that, or something that says “warning, this isn’t actually SIMD… it’s a loop…”

1 Like

LLVM and GCC both have switches to dump out a bunch of info about their optimization decisions, including the autovec tree. That would be sooo useful (while you’re at it, the gprof flag would be great too).

I mean the ones who want to do some matrix math and are told that you get one, and only one, dimension in Zig. It’s just a weird place to stop. Not supporting GPU programming isn’t an arbitrary limitation; this very much is.

If Zig is “no overloading”, great, it should be no overloading. If it’s “no user-defined overloading but we support vectorized operators”, don’t limit the dimensions, just finish the job and support n-dimensional array operations.

The issue isn’t SIMD or not SIMD for me, it’s having one and only one overload for the operators. My point with the loop construct was to show that it’s just as capable of vectorization as the overloaded operators are, and to demonstrate a way that the result could be inferred, like the documentation example.

This kind of conversation has a tendency to make people sound like they care more than they do, though. It’s not going to stop me using the language for sure. But the @Vector syntax predates the for loop, so it was probably a useful bootstrap for getting SIMD into the compiler without adding somewhat complex analysis of while loops to see if they might qualify.

So maybe it’s time to revisit that experiment. I don’t think the language would miss it.

Oh of course, we’re just talking about what we’d like to see happen. I just respectfully disagree on a few points here.

Overloading as far as types are concerned is a big part of the Zig ecosystem. Take + for instance: it works for u32, i32, and f32, and that’s just the scalar types, all under the common banner of +. I think any language that fundamentally mixes types over operators already has a precedent for operator overloading.

I’d rather have the best set of overloads that allows me to express my intent at a level I feel confident in (for instance, I don’t want a unique symbol for addition across i8 and i7), thus I take a softer stance on the consistency here. However, we do have an issue when we try to move to array like types.

I’ll have to think about n-dimensional array support. I’ve come across very few that I really like and I think standardizing the wrong thing can do a lot of damage (I think that’s how we got onto this subject via async).

2 Likes

@Vector creates a type, quote from the doc:

Vector types are created with the builtin function @Vector.

So you can redefine type names as you like:

const std = @import("std");
const log = std.debug.print;

const v4u32 = @Vector(4, u32);

pub fn main() void {
    const v = v4u32{1, 2, 3, 4};
    log("v = {any}\n", .{v});
}

that wasn’t a serious comment :slight_smile: