Having never programmed in C before, I am not always confident about the best approach here: should I use a reference (i.e. a pointer) or use values?
So my question is: what are the general recommendations or best practices for when to use each?
When a struct has a field that holds a reference, what are the gotchas or things I need to worry about?
When a function takes a reference, what are the gotchas or things I need to worry about?
If I have a struct that holds another struct as a value, is there ever a case where copying the outer struct could lead to undefined behavior? Or can such undefined behavior only occur when the struct holds a reference?
How does each approach affect performance and memory usage?
How do they behave when threads are involved?
Basically, I'm looking for the general knowledge that a neophyte who has never used a language with manual memory management needs when working with pointers and values.
If the size of the value is smaller than or equal to the size of the reference, always use values (e.g. a u8 is a single byte, which is smaller than a pointer on any system).
If the size of the value is up to twice the register size of the architecture, values are preferred (e.g. a slice passed by value is twice the size of a pointer: a pointer plus a length).
Otherwise, use references (e.g. for an 80-byte buffer).
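To put numbers on this rule of thumb, here is a minimal sketch (assuming Zig 0.11 or later) that prints the sizes being compared and applies the rule to three hypothetical functions: `isAscii` takes a byte by value, `countSpaces` takes a slice by value, and `firstByte` takes the 80-byte `Buffer` through a const pointer. All names are made up for illustration.

```zig
const std = @import("std");

// Pointer size on the current target (8 bytes on 64-bit, 4 on 32-bit).
const ptr_size = @sizeOf(*u8);

// Hypothetical "big" value: the 80-byte buffer from the guideline.
const Buffer = [80]u8;

// 1 byte, no bigger than a pointer: pass by value.
fn isAscii(c: u8) bool {
    return c < 128;
}

// A slice is a pointer plus a length (two words): still cheap by value.
fn countSpaces(s: []const u8) usize {
    var n: usize = 0;
    for (s) |c| {
        if (c == ' ') n += 1;
    }
    return n;
}

// 80 bytes, well over two words: by the rule of thumb, pass a pointer.
fn firstByte(buf: *const Buffer) u8 {
    return buf[0];
}

pub fn main() void {
    std.debug.print("u8: {d}, pointer: {d}, slice: {d}, buffer: {d} bytes\n", .{
        @sizeOf(u8), ptr_size, @sizeOf([]const u8), @sizeOf(Buffer),
    });

    const buf: Buffer = [_]u8{'x'} ** 80;
    std.debug.print("{} {d} {c}\n", .{ isAscii('a'), countSpaces("a b c"), firstByte(&buf) });
}
```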
The guideline says “the size of the reference” and not “the size of the pointer” on purpose: a reference is not always a pointer. It can also be a small integer that acts as an index into an array.
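A minimal sketch of an index acting as a reference, assuming Zig 0.11+: the nodes of a hypothetical linked list live in one array, and `next` stores a `u32` index rather than a pointer. The `Node`, `NodeIndex`, and `sumList` names are illustrative only.

```zig
const std = @import("std");

// Hypothetical "reference": a 4-byte index into a node array.
const NodeIndex = u32;

const Node = struct {
    value: i32,
    next: ?NodeIndex, // index of the next node, or null at the end
};

// Follows the index chain starting at `head` and sums the values.
fn sumList(nodes: []const Node, head: NodeIndex) i64 {
    var total: i64 = 0;
    var cur: ?NodeIndex = head;
    while (cur) |i| {
        total += nodes[i].value;
        cur = nodes[i].next;
    }
    return total;
}

pub fn main() void {
    const nodes = [_]Node{
        .{ .value = 10, .next = 2 },
        .{ .value = 99, .next = null }, // not reachable from node 0
        .{ .value = 5, .next = null },
    };
    std.debug.print("sum = {d}\n", .{sumList(&nodes, 0)});
}
```

One practical upside of this design: an index stays meaningful if the backing array is moved or reallocated, while a pointer into the old storage would dangle.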
Don’t restrict the meaning of “use values” to function arguments. “Use values” also means the functional style: pass the old value as a parameter and have the function return a new copy of it.
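A small sketch of that functional style, using a hypothetical `Vec2` type: `translated` takes the old value and returns a new one, leaving the caller's original untouched.

```zig
const std = @import("std");

// Hypothetical value type used for illustration.
const Vec2 = struct {
    x: f32,
    y: f32,

    // Functional style: old value in, new value out.
    // `self` is an immutable by-value parameter, so the original can't change.
    fn translated(self: Vec2, dx: f32, dy: f32) Vec2 {
        return .{ .x = self.x + dx, .y = self.y + dy };
    }
};

pub fn main() void {
    const a = Vec2{ .x = 1, .y = 2 };
    const b = a.translated(3, 4);
    std.debug.print("a = ({d}, {d}), b = ({d}, {d})\n", .{ a.x, a.y, b.x, b.y });
}
```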
Zig optimizes how function arguments and return values are passed: when you pass an argument by value, Zig can choose the best way to do it (by value or by reference).
When you need to update a big argument, you must pass it by reference, since parameters passed by value are constant.
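A minimal sketch of updating a big argument in place, assuming Zig 0.11+ (for the two-argument `@memset`); `Buffer` and `fillWith` are hypothetical names.

```zig
const std = @import("std");

// Hypothetical big value: an 80-byte buffer.
const Buffer = [80]u8;

// Parameters are immutable in Zig, so with a `buf: Buffer` parameter a write
// such as `buf[0] = byte;` would be a compile error. To update the caller's
// buffer in place, take a mutable pointer instead.
fn fillWith(buf: *Buffer, byte: u8) void {
    @memset(buf, byte); // sets every element of the pointed-to array
}

pub fn main() void {
    var buf: Buffer = undefined;
    fillWith(&buf, 'z');
    const head: []const u8 = buf[0..5];
    std.debug.print("{s}\n", .{head});
}
```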
Zig does not optimize value capturing. For example, in:
```zig
for (values) |value| {
    value.print();
}
```
`value` is always a copy! So if `value` is big, you must capture a reference instead, with `|*value|`.
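Here is a runnable version of that loop, assuming Zig 0.11+ (for-loops over `&array` with `|*value|` captures and the `0..` index range). The element type `Big` and its `print` method are hypothetical; capturing `|*value|` gives a pointer to each element instead of a copy, and `print` takes `*const Big` so the method call doesn't copy either.

```zig
const std = @import("std");

// Hypothetical large element: about 1 KiB per value.
const Big = struct {
    id: u32,
    payload: [1024]u8,

    // Takes a const pointer so calling it never copies the 1 KiB payload.
    fn print(self: *const Big) void {
        std.debug.print("id = {d}\n", .{self.id});
    }
};

pub fn main() void {
    var values: [3]Big = undefined;
    for (&values, 0..) |*v, i| {
        v.id = @intCast(i);
        @memset(&v.payload, 0);
    }

    // `|*value|` captures a pointer to each element, not a copy.
    for (&values) |*value| {
        value.print();
    }
}
```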
This guideline is for optimizing code in C. In Zig it doesn’t apply, and it also only addresses performance, not correctness.
In Zig, always pass by value unless the correctness of the code requires a pointer. This applies even to things smaller than a register, even a single byte.
If you need to modify something inside a function, pass a mutable pointer, even if it’s smaller than a register.
If you need to store an address that will be read or written later, pass a pointer: const if it will only be read, mutable if it will also be written.
For everything else, Zig wants you to pass by value and trust the compiler to optimize it (that’s the goal, at least; there’s the aliasing miscompilation, and sometimes the compiler misses optimizations…).
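A sketch of those three rules side by side, assuming Zig 0.11+; `bump`, `Watcher`, and `double` are hypothetical names chosen for illustration. `bump` mutates a single byte through a mutable pointer, `Watcher` stores a const pointer that is read later, and `double` is an ordinary by-value function.

```zig
const std = @import("std");

// Rule 1: modifying something inside a function needs a mutable pointer,
// even for a single byte.
fn bump(counter: *u8) void {
    counter.* +%= 1; // wrapping add: 255 wraps to 0 instead of panicking
}

// Rule 2: storing an address to read later needs a pointer; const is
// enough because the watcher only reads through it.
const Watcher = struct {
    target: *const u32,

    fn current(self: Watcher) u32 {
        return self.target.*;
    }
};

// Rule 3: everything else is passed by value.
fn double(x: u32) u32 {
    return x * 2;
}

pub fn main() void {
    var b: u8 = 41;
    bump(&b);

    var level: u32 = 1;
    const w = Watcher{ .target = &level };
    level = 7; // visible through `w` later, because it stored the address

    std.debug.print("b = {d}, watched = {d}, doubled = {d}\n", .{ b, w.current(), double(level) });
}
```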
For the gotchas, there’s always the risk of dangling pointers, especially if you store pointers to stack variables. You need to ensure that you never use a pointer after the data it points to has been freed. The easiest way to do that is to ensure that the pointed-to data outlives the struct holding the pointer, or that both die together, that is, you free the data in the struct’s deinit method.
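A minimal sketch of the “free the data in deinit” pattern, assuming Zig 0.11+ (for the two-argument `@memcpy`). `Names` is a hypothetical struct that owns a heap-allocated copy of a string; the stack-pointer mistake is shown only in a comment, because running it would be undefined behavior.

```zig
const std = @import("std");

// Hypothetical struct that owns the memory it points to. The data and the
// struct die together: deinit frees exactly what init allocated.
const Names = struct {
    allocator: std.mem.Allocator,
    buf: []u8,

    fn init(allocator: std.mem.Allocator, text: []const u8) !Names {
        const buf = try allocator.alloc(u8, text.len);
        @memcpy(buf, text);
        return .{ .allocator = allocator, .buf = buf };
    }

    fn deinit(self: *Names) void {
        self.allocator.free(self.buf);
        self.* = undefined; // poison the struct so use-after-deinit is easier to spot
    }
};

// Dangling pointer: the returned address points into this function's stack
// frame, which no longer exists after return. Do NOT do this:
//
//     fn bad() *u32 {
//         var x: u32 = 5;
//         return &x;
//     }

pub fn main() !void {
    var names = try Names.init(std.heap.page_allocator, "zig");
    defer names.deinit();
    std.debug.print("{s}\n", .{names.buf});
}
```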
If there are no pointers anywhere, that is, it’s just values nested inside values, you are always safe. The only risk I can think of is accidentally copying something and then mutating the copy instead of the value you meant to mutate, but const correctness makes that very unlikely. This is the preferable approach and should be the default: it improves both the performance and the safety of the code. Only store pointers when it is absolutely necessary.
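A small sketch of that pitfall with hypothetical `Inner`/`Outer` types: copying a nested field into a `var` and mutating it leaves the original alone, while taking a pointer to the field mutates in place. If `outer` were declared `const`, `&outer.inner` would be a `*const Inner` and the write through it would be rejected at compile time, which is the const-correctness safety net mentioned above.

```zig
const std = @import("std");

// Hypothetical nested value types: no pointers anywhere.
const Inner = struct { x: i32 };
const Outer = struct { inner: Inner };

pub fn main() void {
    var outer = Outer{ .inner = .{ .x = 1 } };

    // Pitfall: `copy` is an independent value; changing it leaves `outer` alone.
    var copy = outer.inner;
    copy.x = 100;

    // Intended: take a pointer to the field and mutate it in place.
    const field = &outer.inner;
    field.x = 2;

    std.debug.print("outer.inner.x = {d}, copy.x = {d}\n", .{ outer.inner.x, copy.x });
}
```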
Like I mentioned before, if you need to modify something inside a function, or you need to store an address now but only read it later, you need a pointer. Your program won’t be correct if you do anything else.
But I can’t think of any other use for pointers in Zig.
Passing by reference explicitly is bad practice, in my opinion. First, it builds the assumption of symmetrical memory access into your code: that the callee can read the memory in question as easily as the caller. That’s not always the case. For example, suppose you’ve compiled a Zig function to WASM for use in a web browser. The WebAssembly VM does not have access to JavaScript memory, so arguments have to be copied into WASM memory first. A call to a function that passes by value requires only one transfer (the argument), whereas a function that passes by reference needs two (the pointer and what it points to).
Another problem is the implicit assumption of data cohesion. Say we have a function that accepts the following struct as an argument:
```zig
const Point = struct {
    x: f64,
    y: f64,
    z: f64,
};
```
When you pass `.{ .x = 1, .y = 2, .z = 3 }` by value, the language guarantees that the function sees exactly those values for the entire duration of the call. You lose that guarantee when you pass by reference: while your function is reading `x`, another thread could change `y` to a different value.
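A sketch of the thread case, assuming Zig 0.11+ and a target with thread support. `sumWorker` receives the `Point` by value, so it works on a snapshot even though the caller changes its own `p` right after spawning. For data that really must be shared through a pointer, the usual approach is to guard it with a lock; `SharedPoint` here is a hypothetical wrapper pairing the `Point` with a `std.Thread.Mutex`.

```zig
const std = @import("std");

const Point = struct {
    x: f64,
    y: f64,
    z: f64,
};

// By value: the thread owns its copy of `p`, a consistent snapshot.
fn sumWorker(p: Point, out: *f64) void {
    out.* = p.x + p.y + p.z;
}

// By pointer: shared data, so every access goes through the mutex.
// Hypothetical wrapper for illustration.
const SharedPoint = struct {
    mutex: std.Thread.Mutex = .{},
    point: Point,

    fn sum(self: *SharedPoint) f64 {
        self.mutex.lock();
        defer self.mutex.unlock();
        return self.point.x + self.point.y + self.point.z;
    }
};

pub fn main() !void {
    var p = Point{ .x = 1, .y = 2, .z = 3 };
    var result: f64 = 0;

    // The tuple `.{ p, &result }` holds a copy of `p` taken at this call.
    const t = try std.Thread.spawn(.{}, sumWorker, .{ p, &result });
    p.y = 100; // changes the caller's `p`, not the thread's copy
    t.join(); // after join, the thread's write to `result` is visible here

    var shared = SharedPoint{ .point = p };
    std.debug.print("snapshot sum = {d}, shared sum = {d}\n", .{ result, shared.sum() });
}
```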