When to use a struct vs a struct pointer

Hi, I was wondering about the implications of using structs versus struct pointers when passing data around. Since Zig automatically dereferences struct pointers on field access, the code looks very similar in a lot of cases, and I want to better understand the trade-offs of each approach. I have two examples to illustrate what I mean:

  1. Nested structs
    When using a struct as the field of another struct, is there a significant difference between these two approaches?

    const A = struct {
        x: u32,
        y: u32,
    };
    const B = struct { a: A };
    const B_ptr = struct { a_ptr: *A };
    

    In particular, if A was a relatively large struct and I did something like this:

    var a = A{ .x = 1, .y = 2 };
    const b = B{ .a = a };
    const b_ptr = B_ptr{ .a_ptr = &a };
    std.debug.print("b: x = {d}, y = {d}\n", .{ b.a.x, b.a.y });
    std.debug.print(
        "b_ptr: x = {d}, y = {d}\n",
        .{ b_ptr.a_ptr.x, b_ptr.a_ptr.y },
    );
    

    Would b_ptr be smaller than b because it only needs to store a pointer to a,
    or is this optimized away by the compiler? Is there any other significant difference
    between how b and b_ptr are compiled?

  2. Struct methods
    When declaring a function inside a struct, the function can be called as a method if its first parameter has the type of the struct or of a pointer to the struct. So my question is: is there a difference between taking the struct and taking a struct pointer? The only difference I have noticed is that a method that modifies the instance must take a pointer, because function parameters are always immutable, e.g.,

    const C = struct {
        x: u32,
        pub fn method(self: C) void {
            // This fails because function parameters are always constant.
            self.x += 2;
        }
    
        pub fn method_ptr(self: *C) void {
            // This will succeed as long as the instance is a variable.
            self.x += 2;
        }
    };
    

    Beyond that, are there some other considerations that I should take into account when deciding between these two approaches?


More importantly, the function receives a copy of the value, i.e. a copy of the struct. So modifying it would have no effect outside of the function.
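To make the copy-versus-pointer behaviour concrete, here is a minimal sketch. The method names `bumped` and `bump` are made up for illustration; the by-value version works on a local copy and returns it, so the caller's instance is never touched, while the pointer version mutates in place.

```zig
const std = @import("std");

const C = struct {
    x: u32,

    // By-value receiver: `self` is immutable, so we work on a local copy
    // and return it. The caller's instance is left untouched.
    pub fn bumped(self: C) C {
        var copy = self;
        copy.x += 2;
        return copy;
    }

    // Pointer receiver: mutates the caller's instance in place.
    pub fn bump(self: *C) void {
        self.x += 2;
    }
};

pub fn main() void {
    var c = C{ .x = 1 };

    const d = c.bumped();
    std.debug.print("after bumped(): c.x = {d}, d.x = {d}\n", .{ c.x, d.x });

    c.bump(); // Zig takes the address of `c` automatically for the method call.
    std.debug.print("after bump():   c.x = {d}\n", .{c.x});
}
```

After `bumped()` the original `c.x` is still 1 and only the returned value is 3; after `bump()` the original itself is 3.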

What I’m unsure about is whether the compiler may optimize the copying or whether there is a performance advantage by explicitly passing a pointer in some cases (even when no mutation is needed).

The alignment is probably different too (it depends on the target’s pointer size). This becomes very visible if you add a u8 field to both structs: B grows by 32 bits, whereas B_ptr (assuming a 64-bit system) grows by 64 bits.

If A is larger than a pointer, then a pointer to A is smaller than A. Zig doesn’t optimise this away, because knowing the size of types can be very important; it leaves these decisions up to the programmer.
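The sizes are easy to check directly with `@sizeOf` and `@alignOf`. The sketch below reuses A, B and B_ptr from the question; the `_flag` variants and the `Big` types are hypothetical additions for illustration. Note that A from the question is only 8 bytes, so on a 64-bit target a pointer to it is no smaller than the struct itself. The exact numbers printed depend on the target.

```zig
const std = @import("std");

const A = struct { x: u32, y: u32 };
const B = struct { a: A };
const B_ptr = struct { a_ptr: *A };

// Hypothetical variants with one extra byte-sized field.
const B_flag = struct { a: A, flag: u8 };
const B_ptr_flag = struct { a_ptr: *A, flag: u8 };

// Hypothetical larger payload, to show where the pointer starts to pay off.
const Big = struct { data: [64]u32 };
const HoldsBig = struct { big: Big };
const HoldsBigPtr = struct { big: *Big };

pub fn main() void {
    // Print size and alignment of every type for the current target.
    inline for (.{ A, B, B_ptr, B_flag, B_ptr_flag, Big, HoldsBig, HoldsBigPtr }) |T| {
        std.debug.print("{s}: size = {d}, align = {d}\n", .{ @typeName(T), @sizeOf(T), @alignOf(T) });
    }
}
```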

If the struct is large, it will be faster to call with a pointer, as pointers can be passed in registers and this avoids making a copy of the struct.
The optimiser may do this for you. Zig used to force this optimisation, but it caused issues (I forget whether Zig still does).
The ABI can also require structs to be passed via pointers, which the compiler will respect.

Apparently these issues are discussed in #5973.


Doing some more web searching, I found this Reddit post with questions about aliasing. I don’t share any impatience that may be expressed there (I’d rather see things stabilize later than sooner), but it made me notice that it’s still unclear how some fundamental rules regarding pointers are going to turn out in the end. See also the open #1108 on that matter.

If I understand right, then we can currently pass a struct of type S to a function in a couple of ways:

  • x: S (copies the data, or, if safe, internally uses a *const S reference, invisible to the programmer, but you can’t rely on this optimization)
  • x: *S (passes a reference such that the function may modify the struct)
  • x: *const S (enforces passing by reference but doesn’t allow mutation)
  • noalias x: *S (passes a reference where the caller must ensure that during the runtime of the function no other reference to the struct is used)
  • noalias x: *const S (same as the previous, but disallowing mutation)

Did I get that right? noalias still seems somewhat underdocumented (and I’m not sure whether it will remain in the language, if I understand #1108 correctly).
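Written out as code, the five forms listed above look roughly like this. It is only a sketch: the struct `S` and the function names are invented for illustration, and the comments describe the caller's obligations as I understand them, not a guarantee about what the compiler does with them.

```zig
const std = @import("std");

const S = struct { value: u32 };

// x: S -- by value; the function cannot modify the caller's struct.
fn byValue(x: S) u32 {
    return x.value + 1;
}

// x: *S -- by reference; the function may modify the struct.
fn byPtr(x: *S) void {
    x.value += 1;
}

// x: *const S -- by reference, but no mutation through this pointer.
fn byConstPtr(x: *const S) u32 {
    return x.value + 1;
}

// noalias x: *S -- the caller promises the struct is not accessed
// through any other reference while the function runs.
fn byNoaliasPtr(noalias x: *S) void {
    x.value += 1;
}

// noalias x: *const S -- same promise, and no mutation through x.
fn byNoaliasConstPtr(noalias x: *const S) u32 {
    return x.value + 1;
}

pub fn main() void {
    var s = S{ .value = 1 };

    std.debug.print("byValue: {d}\n", .{byValue(s)});
    byPtr(&s);
    std.debug.print("byConstPtr: {d}\n", .{byConstPtr(&s)}); // *S coerces to *const S
    byNoaliasPtr(&s);
    std.debug.print("byNoaliasConstPtr: {d}\n", .{byNoaliasConstPtr(&s)});
}
```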

I’d be interested to know what’s the current most-probable future of aliasing in Zig. I understand that as of yet, as a programmer I have to expect breaking language changes in the future, and I don’t mind that. I like when things are done right, even if it takes time.

I’m fairly certain noalias applies only in relation to other parameters and global variables; aliases outside of those are still allowed, though I could be wrong.

That’s how it works in other languages at least; Zig, of course, provides no information.

As far as I understand, other aliases may exist, but must not be used while the function runs.

The optimiser may do this for you. Zig used to force this optimisation, but it caused issues (I forget whether Zig still does). The ABI can also require structs to be passed via pointers, which the compiler will respect.

If A is larger than a pointer, then a pointer to A is smaller than A. Zig doesn’t optimise this away, because knowing the size of types can be very important; it leaves these decisions up to the programmer.

So my impression is that when passing a struct to a function, I don’t need a pointer*, since the compiler can choose whether it’s better to pass the argument by reference. When using a struct as a field, though, the compiler won’t change how things are laid out, so I may want to think about it. My guess would be that when creating a lot of instances which each have sizeable fields, it’s probably better to use pointers for the fields, but when the fields are light and there are few instances, dereferencing the fields will be more work than just copying the structs.


* I’m talking here purely in terms of performance; I may still need a pointer for some of the function logic, e.g., modifying the underlying struct.

Yes, that’s how I understand it as well, at least up to noalias, which I was unaware of.
My question would then be: In which cases would one want to use option 3 x: *const S?

Passing by reference avoids copying the structure. If your structure is big, then even if the function isn’t gonna modify it, you can avoid making a copy of it during the call by passing a reference to it.
Marking it const ensures that the function isn’t gonna modify it.


I think what matters more here is memory layout and cache efficiency.

If you just count total memory, then adding the indirection via a pointer strictly adds to the memory needed, since you count the parent struct, the child struct, and the pointer. However, if the child is big and you rarely access it, the indirection can be beneficial: when most of the time you only access the parent struct, you can put many parents in an array, iterate through all of them quickly, and only reach a few of the children through the indirection.

On the other hand, if you always need to access the child struct/field, then adding that indirection may be detrimental, especially if the child structs are scattered randomly across many different memory pages.
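As a rough illustration of the parent-array case, the sketch below compares how many cache lines a scan over 1000 parent ids has to cover with an inline child versus a pointer to it. The `Child`, `ParentInline` and `ParentIndirect` types are hypothetical, the count is arbitrary, and this only counts bytes; it says nothing about the cost of following the pointers, which still has to be measured.

```zig
const std = @import("std");

// Hypothetical large child that is rarely touched.
const Child = struct { payload: [256]u8 };

// Child stored inline: every parent carries the full payload.
const ParentInline = struct { id: u32, child: Child };

// Child stored elsewhere: the parent only carries a pointer.
const ParentIndirect = struct { id: u32, child: *Child };

pub fn main() void {
    const count = 1000;
    const line = std.atomic.cache_line;

    inline for (.{ ParentInline, ParentIndirect }) |T| {
        const bytes = @sizeOf(T) * count;
        // Cache lines spanned by a contiguous array of `count` parents.
        const lines = (bytes + line - 1) / line;
        std.debug.print("{s}: {d} bytes each, about {d} cache lines to scan {d} ids\n", .{ @typeName(T), @sizeOf(T), lines, count });
    }
}
```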

I think overall we should get away from overly general philosophizing about performance and instead write concrete code and learn to apply data-oriented techniques to it. We should study and experiment with concrete examples instead of generalizing too much.

For example, find out how many cache lines your algorithm touches on a typical data set with one memory layout versus the other, then measure whether that improves the overall speed of your algorithm; measure cache hits versus misses, and so on.


Disclaimer: I’m still a newbie, so I’m not sure I’ve got everything right either.

I’m not sure if that’s true. I assume the compiler can’t always make the best decision and sometimes must refrain from passing it by reference (even if it would cause no issues).

So I believe that, strictly in terms of performance, manually passing a struct by reference may still be useful in some cases.

But honestly I’m not sure how to weigh the pros and cons in each case. Are there some rules of thumb? I’ve sometimes been confused about what to do. (And I try to remind myself that *const is yet another option not to forget.)

Exactly when you want to ensure that a big structure isn’t copied (because as of yet, the compiler otherwise can’t ensure that in every possible case for you), but you don’t want to allow modification of the struct within the function.

So x: S ensures that x doesn’t get modified at all by any operation during the runtime of the function (at the price of potential copying overhead). And x: *const S ensures that x doesn’t get modified through that pointer x (but could still be modified by other pointers that exist). (Edit/question: or does const also imply that the contents can’t be changed through other pointers?)

Still, I don’t have much practical experience with when to use which of those. (And maybe this will also change as design choices are being made in Zig.)

The old philosophy was to just pass by value by default, only changing the argument to a *const if you had a specific reason. But since the language shifted away from aggressive PRO (Parameter Reference Optimization), it seems that the by-value/by-ref convention is now basically the same as in C:

  • Choose a semi-arbitrary number of bytes (usually 16-64) as the max size of a pass-by-value parameter.
  • When first writing your function, use this byte limit convention. If you need to improve the performance of the function, decide on by-value vs by-ref based on benchmark(s).

If you are referring to const pointers, the pointed-to value can be changed by other pointers.

If you are referring to const values, the value itself can’t be changed by pointers. It would require @constCast to even get a non-const pointer to a const value, and writing to that pointer would be illegal.
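A small sketch of the first case, where the value behind a const pointer changes through another pointer. The function `readTwice` and its parameter names are made up for illustration; the call deliberately passes the same struct for both parameters.

```zig
const std = @import("std");

const S = struct { x: u32 };

// `view` is a const pointer, but nothing stops `other` from pointing at
// the same struct and mutating it.
fn readTwice(view: *const S, other: *S) [2]u32 {
    const before = view.x;
    other.x += 1; // legal: mutates through a different, non-const pointer
    const after = view.x;
    return .{ before, after };
}

pub fn main() void {
    var s = S{ .x = 1 };
    const seen = readTwice(&s, &s); // both parameters alias the same struct
    std.debug.print("before = {d}, after = {d}\n", .{ seen[0], seen[1] });

    // A const value, by contrast: `&frozen` has type *const S, and there is
    // no way to write to it without @constCast.
    const frozen = S{ .x = 1 };
    std.debug.print("frozen.x = {d}\n", .{frozen.x});
}
```

Here `before` is 1 and `after` is 2, even though `view` itself never allowed a write.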


This whole by-value vs by-ref thing has always been a slight sticking point for me with Zig, even though it usually isn’t that much of a practical barrier when writing code. I feel like it goes against the philosophy of “one right way to do things”. You could say that “benchmark it” is the ‘one right way’ in this case, but I feel like there still needs to be a widely accepted baseline that is used before optimization/benchmarking, rather than it just being your arbitrary choice of heuristic.

I also don’t think that agreeing on a common byte threshold for by-value/by-ref is a complete solution, as the threshold chosen could become less sensible as hardware changes, plus it would require you to sometimes implement duplicate by-value and by-ref versions of methods for generic structs and structs containing values like usize whose sizes change based on the target.


Then I would attempt to summarize as follows, regarding function parameters:

  • If you need mutation, use x: *S regardless of the struct size.
  • If you don’t need mutation then:
    • Use x: S for small structs or in cases where you need to ensure that x isn’t changed while the function runs (because, when in doubt, the compiler will create a copy for the function).
    • Use x: *const S for larger structs. But be aware that the value pointed to by x may change, e.g. if you call some function that modifies the memory pointed to by x in other ways (e.g. by changing a global variable or memory behind a different pointer).

Thanks for your perspective, I completely agree with this point:

I think overall we should get away from overly general philosophizing about performance and instead write concrete code and learn to apply data-oriented techniques to it. We should study and experiment with concrete examples instead of generalizing too much.

I completely agree that there is no real alternative to simply testing what the trade-offs are in a concrete system. What I was wondering in this thread was: when I have some code that uses structs and I don’t want to experimentally compare all the potential implementations because performance is not critical, how should I think about the problem to decide whether to use a pointer or a copy of the struct?

In that context, your examples about cache are super valuable.

This whole by-value vs by-ref thing has always been a slight sticking point for me with Zig, even though it usually isn’t that much of a practical barrier when writing code. I feel like it goes against the philosophy of “one right way to do things”. You could say that “benchmark it” is the ‘one right way’ in this case, but I feel like there still needs to be a widely accepted baseline that is used before optimization/benchmarking, rather than it just being your arbitrary choice of heuristic.

That’s exactly my feeling. I understand that there’s no general solution that works best for every case, but I don’t want to be benchmarking every use of a struct in my code either, so what I’d like are some guiding principles or rules of thumb I can use to reason about which option to use.

The answer is … by benchmarking the rules of thumb you’re considering.

The breakpoint in performance between the choices will be CPU-dependent, including factors such as register size and availability, cache-line size, etc. Building for an embedded CPU will differ a lot from building for old desktop/laptop CPUs, and again for modern CPUs. The other factor will be the pattern of usage of the struct fields. For large structs, possible performance issues will perhaps be solved by sharding the struct (DOD) rather than fiddling with pointer vs copy.
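One way to shard a struct in Zig is `std.MultiArrayList`, which stores each field in its own contiguous array. The following is only a sketch of that idea, assuming a hypothetical three-field `Monster` and using the page allocator for brevity; whether it helps depends entirely on the access pattern, so it still needs measuring.

```zig
const std = @import("std");

// Hypothetical struct used only for this illustration.
const Monster = struct {
    height: u32,
    weight: u32,
    rank: u16,
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // Struct-of-arrays storage: one contiguous array per field.
    var monsters: std.MultiArrayList(Monster) = .{};
    defer monsters.deinit(allocator);

    try monsters.append(allocator, .{ .height = 10, .weight = 70, .rank = 1 });
    try monsters.append(allocator, .{ .height = 30, .weight = 900, .rank = 2 });

    // Summing one field only walks the `weight` array; the other
    // fields are never pulled into cache by this loop.
    var total: u64 = 0;
    for (monsters.items(.weight)) |w| total += w;

    std.debug.print("total weight = {d}\n", .{total});
}
```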

I’ve created a little test script for you to work with; hopefully it helps. I wrote it up really quickly, so there may be a few mistakes in the calculations or grammar.

const std = @import("std");

fn getCacheLineSize() usize {
    return std.atomic.cache_line;
}

fn calculateCacheLines(comptime T: type, count: usize) struct { lines: usize, efficiency: f32 } {
    const cache_line_size = getCacheLineSize();
    const struct_size = @sizeOf(T);
    const total_bytes = struct_size * count;
    const cache_lines_used = (total_bytes + cache_line_size - 1) / cache_line_size;
    const efficiency = @as(f32, @floatFromInt(total_bytes)) / @as(f32, @floatFromInt(cache_lines_used * cache_line_size));

    return .{ .lines = cache_lines_used, .efficiency = efficiency * 100 };
}

// Monster's fields add up to 4 + 4 + 6 + 1 = 15 bytes. The struct alignment is 4,
// so 1 byte of padding is added to reach 16, the next multiple of 4.
const Monster = struct {
    height: u32, // 4 bytes
    weight: u32, // 4 bytes
    rider: Rider, // 6 bytes, alignment of 2
    attack_type: AttackEnum, // 1 byte
};

// The Rider struct is 6 bytes, and since 6 is a multiple of 2 no padding is needed.
const Rider = struct {
    name: [4]u8, // 4 bytes, alignment of 1
    skill_rank: u16, // 2 bytes, alignment of 2
};

// With a 4-byte pointer: 4 + 4 + 4 + 1 = 13, but since the alignment is still 4 we need to add 3 bytes, i.e. 13 + 3 = 16.
// We are now wasting 3 bytes of memory. (With an 8-byte pointer: 4 + 4 + 8 + 1 = 17, rounded up to 24.)
const MonsterWithPtr = struct {
    height: u32, // 4 bytes
    weight: u32, // 4 bytes
    rider: *Rider, // pointer size depends on the target (e.g. 32 or 64 bits); we assume 32 bits here, i.e. 4 bytes with an alignment of 4
    attack_type: AttackEnum, // 1 byte
};

const AttackEnum = enum { fire, water };
// Memory layout of Monster in declaration order (Zig is free to reorder the
// fields of a plain struct, so treat this as an illustration):
// [height----][weight----][rider------------][attack_type][padding]
//  0  1  2  3   4  5  6  7   8  9 10 11 12 13      14         15
//
// Total: 16 bytes (not 15!) because:
// - Struct alignment = max field alignment = 4 bytes
// - Size must be a multiple of the alignment
// - 15 rounds up to 16 (next multiple of 4)

const MonsterFire = struct {
    height: u32, // 4 bytes
    weight: u32, // 4 bytes
    rider: Rider, // 6 bytes, alignment of 2
};

const MonsterWater = struct {
    height: u32, // 4 bytes
    weight: u32, // 4 bytes
    rider: Rider, // 6 bytes, alignment of 2
};

fn analyzeStruct(comptime T: type) void {
    // Use the same cache-line size as calculateCacheLines so both reports agree.
    const cache_line_size = getCacheLineSize();
    const size = @sizeOf(T);
    const align_size = @alignOf(T);

    std.debug.print("Struct: {s}\n", .{@typeName(T)});
    std.debug.print("Size: {} bytes\n", .{size});
    std.debug.print("Alignment: {} bytes\n", .{align_size});
    std.debug.print("Structs per cache line: {}\n", .{cache_line_size / size});
    std.debug.print("Cache line efficiency per Struct: {d:.2}%\n", .{(@as(f32, @floatFromInt(size)) / @as(f32, @floatFromInt(cache_line_size))) * 100});
}

pub fn main() !void {
    analyzeStruct(Rider);
    var data = calculateCacheLines(Rider, 1024);
    std.debug.print("\nNumber of Cache lines: {}\nTotal Memory: {}\nTotal Cache Line Eff: {d:.2}%\n", .{ data.lines, data.lines * getCacheLineSize(), data.efficiency });
    std.debug.print("\n---------\n", .{});

    analyzeStruct(Monster);
    data = calculateCacheLines(Monster, 1024);
    std.debug.print("\nNumber of Cache lines: {}\nTotal Memory: {}\nTotal Cache Line Eff: {d:.2}%\n", .{ data.lines, data.lines * getCacheLineSize(), data.efficiency });
    std.debug.print("\n---------\n", .{});

    analyzeStruct(MonsterWithPtr);
    data = calculateCacheLines(MonsterWithPtr, 1024);
    std.debug.print("\nNumber of Cache lines: {}\nTotal Memory: {}\nTotal Cache Line Eff: {d:.2}%\n", .{ data.lines, data.lines * getCacheLineSize(), data.efficiency });
    std.debug.print("\n---------\n", .{});

    analyzeStruct(MonsterFire);
    data = calculateCacheLines(MonsterFire, 512);
    std.debug.print("\nNumber of Cache lines: {}\nTotal Memory: {}\nTotal Cache Line Eff: {d:.2}%\n", .{ data.lines, data.lines * getCacheLineSize(), data.efficiency });
    std.debug.print("\n---------\n", .{});

    analyzeStruct(MonsterWater);
    data = calculateCacheLines(MonsterWater, 512);
    std.debug.print("\nNumber of Cache lines: {}\nTotal Memory: {}\nTotal Cache Line Eff: {d:.2}%\n", .{ data.lines, data.lines * getCacheLineSize(), data.efficiency });
    std.debug.print("\n---------\n", .{});
}

Given the current situation, I code as though Parameter Reference Optimization does not exist. C coders have a simple rule of thumb: the major C compilers all optimize passing a two-word struct using registers, so if it’s a two-word struct or less, I pass by copy. Otherwise by *const. Sometimes three words, like an unmanaged ArrayList, I will often pass by reference. Nothing larger though.

I can’t guarantee this is optimal. What it is, is a simple rule, which I can almost mechanically follow when writing code.
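If you want that rule to be checked rather than remembered, it can be written down as a comptime predicate. This is a sketch under my own assumptions: `fitsTwoWords`, `Point` and `Matrix` are hypothetical names, and "word" is taken to mean `@sizeOf(usize)`.

```zig
const std = @import("std");

// Hypothetical helper: true when T fits the "two machine words or fewer"
// rule of thumb, taking a word to be @sizeOf(usize).
fn fitsTwoWords(comptime T: type) bool {
    return @sizeOf(T) <= 2 * @sizeOf(usize);
}

const Point = struct { x: usize, y: usize };
const Matrix = struct { m: [16]f32 };

// Small struct: pass by value.
fn lengthSquared(p: Point) usize {
    return p.x * p.x + p.y * p.y;
}

// Large struct: pass by const pointer.
fn trace(m: *const Matrix) f32 {
    return m.m[0] + m.m[5] + m.m[10] + m.m[15];
}

// Fail the build if a type drifts across the threshold later on.
comptime {
    std.debug.assert(fitsTwoWords(Point));
    std.debug.assert(!fitsTwoWords(Matrix));
}

pub fn main() void {
    const p = Point{ .x = 3, .y = 4 };
    const m = Matrix{ .m = [_]f32{1.0} ** 16 };
    std.debug.print("lengthSquared = {d}, trace = {d}\n", .{ lengthSquared(p), trace(&m) });
}
```

The comptime block costs nothing at runtime; it only documents which side of the line each type was on when the signature was chosen.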

Trying to determine what’s actually optimal while bringing it into existence is itself suboptimal. Once a) the code is correct and b) I’ve determined that I need some specific part to be faster, then I might start looking at calling conventions, at least if they’re in code which I know is hot.

Consider, for one important example: the compiler might inline the call, and then it literally can’t matter because you’re not actually passing anything.
