Idiomatic Use of Self and Aliasing

so putting aside the nuances of aliasing, this thread raises a more fundamental question i have about struct arguments – in particular the idiomatic use of self

consider the following struct which exposes a getter/setter pair:

const Thing = struct {
    const Self = @This();
    _data: u8,
    pub fn getData(self: Self) u8 {
        return self._data;
    }
    pub fn setData(self: *Self, d: u8) void {
        self._data = d;
    }
};

my actual use-case is far more complex, needless to say… but i’ll sometimes find that i need to declare getData(self: *Self) to get things working; or i’ll sometimes go further and declare self: *const Self, etc… i compile everything for ReleaseSmall

if i understand the intent behind the compiler’s choice in passing a struct by value or reference, wouldn’t self: Self and self: *const Self mean the same thing??? even more subtle, would the compiler actually choose to pass a Thing instance to getData by value – even if i declared the parameter as *const Self???

said another way, is the choice to pass by copy or reference a two-way street in the case of getData???


Sometimes yes, but not always. Consider this:

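var store: *const usize = undefined;

fn useStored() void {
    const value = store.*;
    _ = value; // use value.
}

// Fine as long as the caller keeps the pointee alive.
fn doStore1(ptr: *const usize) void {
    store = ptr;
}

// Bug: `value` is a parameter, so its address is invalid once doStore2 returns.
fn doStore2(value: usize) void {
    store = &value;
}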

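In this snippet, we store a pointer to be used later. doStore1 does the correct thing, but doStore2 stores the address of its own parameter, which dangles once the function returns, so the later read in useStored is undefined behavior. Sometimes you actually need to point to something; in these cases, the pointer is mandatory.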
In the vast majority of cases, the value will be read and used immediately, so taking the parameter as either pointer or value would work. In such cases, Zig prefers that we pass by value, as it enables more optimizations, but it can result in the aliasing issue.
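To make the "either would work" case concrete, here is a minimal sketch based on the Thing struct from the question. The names getByValue and getByConstPtr are made up for illustration; both getters only read the field immediately, so the call sites look identical and return the same result.

```zig
const std = @import("std");

const Thing = struct {
    const Self = @This();
    _data: u8,

    // Declared by value; the compiler is free to pass a hidden reference instead.
    pub fn getByValue(self: Self) u8 {
        return self._data;
    }

    // Declared with an explicit read-only pointer.
    pub fn getByConstPtr(self: *const Self) u8 {
        return self._data;
    }
};

test "both getter signatures read the same value" {
    const t = Thing{ ._data = 7 };
    // Method-call syntax takes the address automatically for the pointer version.
    try std.testing.expectEqual(@as(u8, 7), t.getByValue());
    try std.testing.expectEqual(@as(u8, 7), t.getByConstPtr());
}
```

This only shows that the two declarations are interchangeable at the call site when nothing keeps the address; it says nothing about which calling convention the compiler actually picks.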

Theoretically, it could. As far as I know, parameter reference optimization is currently implemented in one direction only: it transforms values into references, but not the other way around. Someone could implement what you’re saying, but in order to do this transformation, the compiler would have to analyse the code and prove that the reference isn’t necessary, which would be quite expensive and would rarely pay off.


Context - moved from solved thread: Copy or reference?

@LucasSantos91 – so in my example with “getters” and “setters”, i should prefer self: Self for the former and self: *Self for the latter… this seems to align with the current design/implementation of the compiler…

but correct me if i’m wrong, a “small” struct that the compiler elects to pass by value can still contain a (readonly) pointer referencing (readwrite) memory…

my use-case involves “box-ing” scalar values (including pointers/slices) into single-field structs that are trivially copied via registers… methods on these “smart box structs” can often enforce stricter semantics than simply exposing the boxed value…
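a rough sketch of what such a box might look like, purely for illustration: ByteBox, get and set are hypothetical names, not anything from my real code. the struct is a single slice field, methods take self by value, and writes still land in the shared buffer because only the (readonly) slice header is copied.

```zig
const std = @import("std");

// Hypothetical box: wraps a mutable slice but only exposes bounds-checked access.
const ByteBox = struct {
    _buf: []u8,

    // Returns null instead of trapping on an out-of-range index.
    pub fn get(self: ByteBox, i: usize) ?u8 {
        if (i >= self._buf.len) return null;
        return self._buf[i];
    }

    // `self` is a by-value copy, yet the write goes to the memory the slice points at.
    pub fn set(self: ByteBox, i: usize, v: u8) bool {
        if (i >= self._buf.len) return false;
        self._buf[i] = v;
        return true;
    }
};

test "a copied box still writes through to the same memory" {
    var storage = [_]u8{ 1, 2, 3 };
    const a = ByteBox{ ._buf = &storage };
    const b = a; // shallow copy of the slice header only
    try std.testing.expect(b.set(0, 42));
    try std.testing.expectEqual(@as(u8, 42), a.get(0).?);
    try std.testing.expectEqual(@as(u8, 42), storage[0]);
}
```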


[I edited doStore2 to what I infer you meant by it].

I think it would be a good goal to make this specific thing a compiler error: any pointer to a reference parameter which escapes the function the parameter is passed into should be illegal. I don’t know how good the compiler’s escape analysis is, but it should be possible for cases which aren’t insane, like the old integer-and-back trick where you deserve whatever problems you’ve caused yourself.

I do see some problems with covering every case, because passing the value parameter down the stack, as a *const T, should be legal. Separately, a *const T parameter escaping the function should also be legal. Put those things together and congratulations, you have aliasing issues. Zig will always be a sharp object, though: each of those operations is valuable separately, and detecting all cases where they’re combined into UB might be hard (meaning, slow).
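For the legal half of that, a minimal sketch of passing a value parameter down the stack as a *const T. The names Big, sum, and process are invented for this example. The pointer is only used while the parameter is alive and nothing retains it, which is why this pattern should stay allowed.

```zig
const std = @import("std");

// Hypothetical struct, large enough that the compiler may pass it by reference.
const Big = struct { items: [64]u32 };

// Reads through the pointer and does not keep it anywhere.
fn sum(p: *const Big) u32 {
    var total: u32 = 0;
    for (p.items) |x| total += x;
    return total;
}

// Taking `&b` is fine here: the address never outlives this call.
fn process(b: Big) u32 {
    return sum(&b);
}

test "value parameter passed down the stack by pointer" {
    const big = Big{ .items = [_]u32{1} ** 64 };
    try std.testing.expectEqual(@as(u32, 64), process(big));
}
```

The problematic variant would be sum (or anything below it) stashing p in a global or returning it, which is exactly the combination that is hard to detect in general.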

Still, I see no reason to allow a pointer-to-reference-parameter to directly escape the scope or travel up the stack via return values. That code is a priori incorrect and there’s no reason it should compile (I’m open to counterexamples).

Yes, but that is a different aliasing than what we were discussing. What you’re referring to is the classic aliasing that happens in shallow copy vs deep copy.

That is proposed.


Similar enough that it’s probably not worth having a separate tracking issue, sure. Pointer-to-reference-parameter is a subtly different kind of wrong from pointer-to-stack-variable, but they have the same feature of being safe to pass down stack (assuming nothing keeps that address) but undefined to capture out-of-scope or to pass up the stack.

The same kind of analysis would be needed to identify either, so one issue is fine here I reckon.


This bit me a few times, and I tried hacking zig to enforce not copying when using the “a.foo()” syntax. My takeaway was that it probably would prevent a narrow class of bugs, but they weren’t as prevalent as I expected, and it makes certain idioms more awkward.

You can read my trip down the rabbit hole (I recommend reading the proposal and my final comment at the end):
