Plan to address "Attack of the killer features" parameter reference optimization aliasing issue just dropped!

kj4tmp · September 30, 2024, 5:30pm

eliminate hidden pass-by-reference footguns

opened 08:40PM - 01 Aug 20 UTC

proposal accepted

[Accepted Proposal](https://github.com/ziglang/zig/issues/5973#issuecomment-2380…332493)

-----

Zig 0.6.0 (not master).

This is related to, actually maybe a subset of, https://github.com/ziglang/zig/issues/4021 / https://github.com/ziglang/zig/issues/3696 (this issue doesn't involve result copy elision).

I understand that this was intended to be a feature of zig: args passed as values "may" be silently translated to pass-by-reference by the compiler. I think the intent was to stop the user from passing const pointers as an "optimization". But it's also a footgun, sort of like https://github.com/ziglang/zig/issues/2915.

The problem occurs when you have another non-const pointer aliasing the same memory as the argument value.

```zig
const std = @import("std");

const Thing = struct {
    value: u32,
};

const State = struct {
    thing: Thing,
};

fn inner(state: *State, thing: Thing) void {
    std.debug.warn("before: {}\n", .{thing.value}); // prints 10
    state.thing.value = 0;
    std.debug.warn("after: {}\n", .{thing.value}); // prints 0
}

pub fn main() void {
    var state: State = .{
        .thing = .{ .value = 10 },
    };
    inner(&state, state.thing);
}
```

The behavior here depends on the compiler implementation. It seems that right now, if `thing` is a struct value, it's passed by reference. But if it's a bare `u32`, it's passed by value. I don't know if it will always be this simple (I assume there are plans to pass "small" structs by value.)

The workaround for this situation is to make an explicit copy using a redundant-seeming optimization, probably accompanied by a comment explaining what's going on. Or else to restructure the code at a higher level, but then this footgun will still be lurking in the shadows.

I think that any optimistic "assume no aliases" optimization ought to be opt-in rather than opt-out. That would mean, either go back to the C way of things, or add a new syntax (some symbol that means "compiler can choose between value and const pointer").

Either way, a plain argument should always be passed by value. What do others think?
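For illustration, here is a minimal sketch of the explicit-copy workaround the issue mentions. The names `innerAfterWrite` and `snapshot` are hypothetical and not from the issue, and `std.debug.print` is used on the assumption of a newer Zig than the 0.6.0 the issue was written against. The function returns what it observed instead of printing it, so the result is easy to check.

```zig
const std = @import("std");

const Thing = struct {
    value: u32,
};

const State = struct {
    thing: Thing,
};

// Same shape as `inner` in the issue: writes through `state`, then reads `thing`.
// Hypothetical helper; it returns the observed value rather than printing it.
fn innerAfterWrite(state: *State, thing: Thing) u32 {
    state.thing.value = 0;
    return thing.value;
}

pub fn main() void {
    var state: State = .{
        .thing = .{ .value = 10 },
    };

    // Explicit copy: `snapshot` has its own storage, so even if the compiler
    // passes it by reference, that reference cannot alias `state.thing`.
    const snapshot: Thing = state.thing;
    const seen = innerAfterWrite(&state, snapshot);

    std.debug.print("seen: {d}, state: {d}\n", .{ seen, state.thing.value });
}
```

With the copy in place, the callee should see the value from before the write no matter how the argument is passed.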

To be clear, I’m not sure if this was discovered as part of “Attack of the Killer Features”, that is just what first brought the concept (PRO) to my attention.

kj4tmp · September 30, 2024, 7:00pm

"So, here’s our conclusion. PRO in its current form will cease to exist. The Zig compiler’s optimizer will gain the ability to notice that a function is pure, and promote parameters to references accordingly."

Does this have implications for incremental compilation?

dude_the_builder · September 30, 2024, 7:38pm

it will become desirable to use explicit *const T in cases where this was previously not idiomatic.

Yes, explicit is good.

pierrelgol · September 30, 2024, 8:22pm

I’m really glad they decided to go with this. I think it’s better to be explicit even if that leaves less room for the optimizer, because at the end of the day working code is better than that kind of footgun. I hope this gets implemented soon.

LucasSantos91 · September 30, 2024, 9:36pm

So we’re back to the C way of passing parameters. I’m glad that this is finally being addressed, but it’s sad that after all this time and all the talk about how awesome PRO was, we’re back exactly where we started.

kj4tmp · September 30, 2024, 10:06pm

I don’t think so? My interpretation is that very little has changed:

Using *const T (reads like pass by reference) has always restricted and will always restrict the compiler to passing by reference. (no change here)
Using T (reads like pass by value) will only automatically pass by reference if the function can be detected as pure (the change here is that the compiler is just doing less optimization).

I don’t see this as a closed door to more aggressive PRO happening in the future (this is not a semantic change to the language, only an optimization). And I don’t see this as a change that requires a lot of changes to people’s code. (Unless there are a lot of people relying on PRO?)

Existing implementations that are using T may see a performance hit, but no aliasing bug.

I think this is just growing pains in implementing PRO.

kj4tmp · September 30, 2024, 10:13pm

Maybe it’s a bigger change than I am thinking?

Will there be a lot of changes to the std lib to switch to *const T? Will we have two versions of many APIs? One for big data (*const T) and one for small data (T)?

LucasSantos91 · September 30, 2024, 10:29pm

PRO will only be applied to pure functions.
When writing our functions, we’ll have to, once again, start thinking if we are passing something big or small. Big things we want to pass by pointer and small things by value. That is the C way of passing parameters.
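A minimal sketch of that convention, assuming two hypothetical types (`Vec2` and `Frame`) chosen only to contrast a small and a large parameter. Nothing here comes from the proposal itself; it just shows what the two signatures look like side by side.

```zig
const std = @import("std");

// Hypothetical small type: two floats, cheap to copy.
const Vec2 = struct {
    x: f32,
    y: f32,
};

// Hypothetical large type: 4 KiB of pixel data.
const Frame = struct {
    pixels: [64 * 64]u8,
};

// Small value: take `T` and accept the copy.
fn lengthSquared(v: Vec2) f32 {
    return v.x * v.x + v.y * v.y;
}

// Large value: take `*const T`, making the by-reference choice explicit.
fn brightest(frame: *const Frame) u8 {
    var max: u8 = 0;
    for (frame.pixels) |p| {
        if (p > max) max = p;
    }
    return max;
}

pub fn main() void {
    var frame: Frame = .{ .pixels = [_]u8{0} ** (64 * 64) };
    frame.pixels[7] = 200;

    const v: Vec2 = .{ .x = 3, .y = 4 };
    std.debug.print("{d} {d}\n", .{ lengthSquared(v), brightest(&frame) });
}
```

The call site changes too: the large argument is passed as `&frame`, so a reader can see at a glance that no copy is made.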

dimdin · September 30, 2024, 10:42pm

I don’t see a solution description.

The first problem is how zig will define a function as “pure”.
A definition might be: when you call the function with the same arguments you get the same results. But can the compiler actually detect these functions?
Will an imported function from a library be defined as non-pure, or will zig add a way to specify pure functions?

LucasSantos91 · September 30, 2024, 10:57pm

Certainly, a function will only be classified as pure if the compiler can see its body. Probably the rule will be:
Any function is pure unless it:

Is an external function (including dynamic functions, library functions, functions that came from C or functions that came from object files)
Does a syscall
Modifies global variables
Receives a pointer as a parameter and modifies it (including pointers in fields)
Calls an impure function
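To make the guessed rules above concrete, here is a sketch with one function per case. These rules are this post's speculation, not the compiler's documented behavior, and all names (`Point`, `sum`, `counted`, `reset`, `countedTwice`, `call_count`) are hypothetical.

```zig
const std = @import("std");

const Point = struct { x: i32, y: i32 };

// Global state, present only so that `counted` has something to modify.
var call_count: u32 = 0;

// Pure under the guessed rules: reads its argument and touches nothing else.
fn sum(p: Point) i32 {
    return p.x + p.y;
}

// Impure: modifies a global variable.
fn counted(p: Point) i32 {
    call_count += 1;
    return p.x + p.y;
}

// Impure: receives a pointer parameter and writes through it.
fn reset(p: *Point) void {
    p.* = .{ .x = 0, .y = 0 };
}

// Impure by propagation: calls an impure function.
fn countedTwice(p: Point) i32 {
    return counted(p) + counted(p);
}

pub fn main() void {
    var p: Point = .{ .x = 1, .y = 2 };
    std.debug.print("{d} {d}\n", .{ sum(p), countedTwice(p) });
    reset(&p);
    std.debug.print("{d} calls: {d}\n", .{ sum(p), call_count });
}
```

Under these rules only `sum` would be a candidate for having its parameter promoted to a reference.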

andrewrk · September 30, 2024, 11:24pm

just want it on record that I personally never claimed it was awesome. I only said it solved certain problems while creating other problems, and that its future was uncertain.

lots of people out there have bad zig takes

chung-leong · September 30, 2024, 11:58pm

I wish the programming world had kept the distinction between “function” and “procedure”.

chung-leong · October 1, 2024, 1:04pm

In Zig, a function’s purity also affects its availability at comptime. As such, perhaps it’s sensible to make it a property that programmers have to declare explicitly.
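A small sketch of the comptime side of this, using a hypothetical `square` function. A function with no side effects can be forced to run at compile time, and its result can then be used where a comptime-known value is required, such as an array length.

```zig
const std = @import("std");

// Hypothetical side-effect-free function: usable at comptime and at runtime.
fn square(x: u32) u32 {
    return x * x;
}

pub fn main() void {
    // Forced compile-time evaluation of `square`.
    const n = comptime square(4);

    // `n` is comptime-known, so it can be used as an array length.
    const buf = [_]u8{0} ** n;

    // The same function called as an ordinary runtime call.
    const at_runtime = square(5);

    std.debug.print("{d} {d}\n", .{ buf.len, at_runtime });

    // A function that modifies a runtime global or performs a syscall would
    // be expected to fail to compile if called with `comptime` like this.
}
```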

FObersteiner · October 4, 2024, 9:44am

Also, will the programmer be able to know what the compiler detected as ‘pure’, without looking at the assembly?

markus · October 4, 2024, 9:53am

I love that they just decided to go the explicit route. Back when I watched the talk and read some discussion about it, there were a lot of solutions that seemed like more overhead and complexity than just thinking about the arg size. I’m so glad that this proposal was made.

Validark · October 4, 2024, 10:47am

Unfortunately, software borrows a lot of terms from math and then butchers them. For one, the term “vector”, which can mean:

A dynamically growable array in C++
A math concept of a point or magnitude and a direction.
A CPU concept of an extra-large register that supports a different instruction set, with most instructions allowing the register to be semantically divided into multiple pieces which are operated on in parallel with each other. Also called “SIMD vector”
A CPU concept similar to the previous one, except those multiple data elements are not operated on entirely in parallel but in consecutive time steps. So e.g. you might have a “vector processor” that allows you to operate on 512 byte vectors in one instruction, but it is probably going to sextuple-pump those instructions and pipeline starting them across the next 16 cycles. People familiar with this concept would refer to the previous concept as Array processing, NOT vector processing. Although I think “array processing” could be an even worse term to say to software developers.
And more. Vector - Wikipedia (And that list doesn’t even mention SIMD at the time of writing)

Within SIMD, we have shuffles and permutes. Based on the original definition you might think it implies that you can’t end with multiple copies of the same individual parts, but actually that’s perfectly acceptable. In x86 lingo, a shuffle means “intra-lane” and conditionally zeroing if the top bit is set, and permute means “cross-lane”. And in this case “lane” refers to 16-byte chunks of the vector, not the other definition of “lane” which refers to how big the pieces are (bytes, in this case). The term “swizzle” is more common in the GPU space for a similar concept, although I think it’s more of a compile-time decision, whereas arbitrary shuffles and permutes can be computed at runtime. The Broadcom Videocore IV uses the terminology “Lane rotate” instead of swizzle, which solves the problem of the word “rotate” not really being butchered that much by most software and hardware people.

Validark · October 4, 2024, 11:05am

Conceivably there could be a builtin function that allows you to query during compilation whether the compiler detected a function as pure or not, but one wonders whether that would lead people to obsess over purity when perhaps they shouldn’t.

It reminds me of a proposal that was floated for Zig where you could specify that a variable is constant beyond a certain point, or that it is not to be used after a certain point. Both of these are things you can do already with blocks and by making a new variable declared with const, but this feature would make it easier. Andrew Kelley rejected it on the basis that it would become “good practice” to always make sure you specify that a variable isn’t used anymore. Even though this kind of feels like a good idea, we have to ask ourselves whether it makes sense for devs to obsess over something which the compiler should be able to figure out pretty easily. Some people even argue that const falls in that category!
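For reference, a sketch of the existing idiom mentioned above: confine the mutable variable to a labeled block and bind the result to a new `const`. The function name `sumAll` and the variable names are hypothetical.

```zig
const std = @import("std");

// Hypothetical helper showing the block-plus-const idiom.
fn sumAll(items: []const u32) u32 {
    // `acc` is mutable only inside the labeled block.
    const total = blk: {
        var acc: u32 = 0;
        for (items) |n| acc += n;
        break :blk acc;
    };
    // From here on `acc` is out of scope and `total` cannot be reassigned.
    return total;
}

pub fn main() void {
    const items = [_]u32{ 1, 2, 3 };
    std.debug.print("{d}\n", .{sumAll(&items)});
}
```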

Likewise, should programmers try to eliminate all the non-pure functions, except on the boundaries of the application (like unsafe in Rust)? Personally, I don’t know that that makes sense or matters that much.

Sze · October 6, 2024, 2:19pm

12 posts were split to a new topic: Parameter passing

Sze · October 6, 2024, 2:21pm

A post was merged into an existing topic: Parameter passing

mnemnion · October 6, 2024, 4:54pm

I learned to program using Turbo Pascal, a language which preserved the distinction, and it wasn’t as nice as you might think.

A function was just a procedure which returns a value. You could pass by pointer and mutate stuff in a function just as easily as a proc, the difference was quite minor and I think we’re better off just having a void type rather than two names for essentially the same thing.

Whether that name is function or procedure is just stylistic: Odin abbreviates proc and Zig abbreviates fn but they end up being the same thing.

You could get a “real” function by declaring all parameters Const, but of course this is possible in Zig also.

Then you have languages in the ML family which have actual functions in the mathematical sense. But those don’t have procedures.

If there was a language which had procedures which can mutate state and functions which can’t, I never saw it. Could be pretty nice! But that would be a new thing.