Calling methods of struct instances in MultiArrayList

Durobot · November 8, 2024, 1:11pm

Is it possible to call methods (member functions) of the structs stored in MultiArrayList?

My understanding is that MultiArrayList does not store struct instances intact. Instead, it takes them apart field by field and stores the values of each field in a separate contiguous block of memory (all the values of field foo in one block, all the values of field bar in another, all the values of field baz in a third, etc.). That is the whole point of MultiArrayList compared to ArrayList.
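A minimal sketch of that layout, assuming a 0.13/0.14-era `std.MultiArrayList` API. `makeList` is a hypothetical helper written only for this illustration; the point is that `items(.ham)` and `items(.eggs)` return two separate slices, one per field, rather than a slice of `Foo`:

```zig
const std = @import("std");

const Foo = struct {
    ham: u32 = 12,
    eggs: f32 = 3.14159265359,
};

// Hypothetical helper: builds a two-element list (the caller owns it).
fn makeList(allocator: std.mem.Allocator) !std.MultiArrayList(Foo) {
    var list: std.MultiArrayList(Foo) = .{};
    errdefer list.deinit(allocator);
    try list.append(allocator, .{});
    try list.append(allocator, .{ .ham = 1, .eggs = 0.001 });
    return list;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    var list = try makeList(allocator);
    defer list.deinit(allocator);

    // Each field has its own contiguous array: a []u32 for ham, a []f32 for eggs.
    const hams: []u32 = list.items(.ham);
    const eggs: []f32 = list.items(.eggs);
    std.debug.print("hams: {any}\neggs: {any}\n", .{ hams, eggs });
}
```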

How, then, do I call methods of the struct instances I have appended to the MultiArrayList?
All I could come up with is this:

const std = @import("std");

const Foo = struct
{
    ham: u32 = 12,
    eggs: f32 = 3.14159265359,

    pub fn print(self: Foo) void
    {
        std.debug.print("ham = {}, eggs = {d}\n", .{ self.ham, self.eggs });
    }
};

pub fn main() !void
{
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocr = gpa.allocator();

    var my_list = std.MultiArrayList(Foo) {};
    defer my_list.deinit(allocr);

    try my_list.append(allocr, .{});
    try my_list.append(allocr, .{ .ham = 1, .eggs = 0.001 });

    // Let's call `print` for each element of `my_list`
    for (my_list.items(.ham), my_list.items(.eggs)) |my_ham, my_eggs|
    {
        const my_foo = Foo { .ham = my_ham, .eggs = my_eggs }; // I'm making a new instance of Foo here
        my_foo.print(); // Just to call `print`
    }
    // What if I had to call a method that modifies `Foo`'s fields?
}

But if a method could (potentially) modify the fields of the Foo instance it is called on, I would have to manually copy the field values from my_foo back into the fields of the current my_list item, like this:

    for (my_list.items(.ham), my_list.items(.eggs)) |*my_ham, *my_eggs|
    {
        var my_foo = Foo { .ham = my_ham.*, .eggs = my_eggs.* };
        my_foo.modifyFields();
        my_ham.* = my_foo.ham; // Copy the modified values back to the fields of
        my_eggs.* = my_foo.eggs; // this item in `my_list`
    }
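For reference, here is the same copy, modify, write-back loop as a complete program. `modifyFields` is not defined in the post, so the version below (double `ham`, halve `eggs`) is purely hypothetical, as is the `modifyAll` wrapper:

```zig
const std = @import("std");

const Foo = struct {
    ham: u32 = 12,
    eggs: f32 = 3.14159265359,

    // Hypothetical mutating method, invented for this sketch.
    pub fn modifyFields(self: *Foo) void {
        self.ham *= 2;
        self.eggs /= 2;
    }
};

// Rebuilds each Foo from its fields, calls the method, then writes every field back.
fn modifyAll(list: *std.MultiArrayList(Foo)) void {
    for (list.items(.ham), list.items(.eggs)) |*ham, *eggs| {
        var foo = Foo{ .ham = ham.*, .eggs = eggs.* };
        foo.modifyFields();
        ham.* = foo.ham;
        eggs.* = foo.eggs;
    }
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    var list: std.MultiArrayList(Foo) = .{};
    defer list.deinit(allocator);
    try list.append(allocator, .{ .ham = 1, .eggs = 0.5 });
    modifyAll(&list);
    std.debug.print("ham = {d}, eggs = {d}\n", .{ list.items(.ham)[0], list.items(.eggs)[0] });
}
```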

Is this the only way?

swenninger · November 8, 2024, 1:34pm

https://ziglang.org/documentation/master/std/#std.MultiArrayList

MultiArrayList does provide get and set functions, so you can get an individual item, modify it, and set it back. But these are just generic implementations of the “copy T, modify T, set T” steps you implemented by hand.

If I understand correctly, there is no other way, because of how the memory is laid out.
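A minimal sketch of the get/set route, assuming the `get(index)` and `set(index, elem)` functions from the linked documentation. `modifyFields` (here it only increments `ham`) and `modifyAllViaGetSet` are hypothetical names for this illustration. Note that `set` writes every field back, including the ones the method did not change:

```zig
const std = @import("std");

const Foo = struct {
    ham: u32 = 12,
    eggs: f32 = 3.14159265359,

    // Hypothetical mutating method, invented for this sketch.
    pub fn modifyFields(self: *Foo) void {
        self.ham += 1;
    }
};

// get(i) gathers every field of element i into a Foo copy;
// set(i, foo) scatters every field of that copy back into the list.
fn modifyAllViaGetSet(list: *std.MultiArrayList(Foo)) void {
    for (0..list.len) |i| {
        var foo = list.get(i);
        foo.modifyFields();
        list.set(i, foo);
    }
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    var list: std.MultiArrayList(Foo) = .{};
    defer list.deinit(allocator);
    try list.append(allocator, .{});
    try list.append(allocator, .{ .ham = 1, .eggs = 0.001 });
    modifyAllViaGetSet(&list);
    for (list.items(.ham)) |ham| std.debug.print("ham = {d}\n", .{ham});
}
```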

andrewrk · November 8, 2024, 7:05pm

To add to this answer: you’ll get more benefit from storing your data this way if you avoid accessing fields you don’t need. For example, if your data transformation only reads and writes a subset of the fields, I recommend avoiding loading and storing the unused ones. That keeps the arrays of the unused fields out of the CPU cache entirely, which is one of the primary purposes of this data structure.
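As an illustration of touching only the fields you need (a sketch; `totalHam` is a hypothetical function), this loop reads the `ham` array and never reads `eggs`:

```zig
const std = @import("std");

const Foo = struct {
    ham: u32 = 12,
    eggs: f32 = 3.14159265359,
};

// Hypothetical transformation that needs only `ham`: the `eggs` array is never read.
fn totalHam(list: *const std.MultiArrayList(Foo)) u64 {
    var total: u64 = 0;
    for (list.items(.ham)) |ham| total += ham;
    return total;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    var list: std.MultiArrayList(Foo) = .{};
    defer list.deinit(allocator);
    try list.append(allocator, .{});
    try list.append(allocator, .{ .ham = 1, .eggs = 0.001 });
    std.debug.print("total ham = {d}\n", .{totalHam(&list)});
}
```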

If your data transformations involve all the fields of the struct, then you would perhaps be better off storing each struct in a regular array, so that the memory each transformation touches is stored together.
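For contrast, a sketch of that array-of-structs layout using a plain array (an `ArrayList` would behave the same way through its `items` slice). `modifyFields` and `modifyAll` are hypothetical; because each element is a whole `Foo`, a mutating method can be called in place through a pointer, with no copying:

```zig
const std = @import("std");

const Foo = struct {
    ham: u32 = 12,
    eggs: f32 = 3.14159265359,

    // Hypothetical method that reads and writes every field.
    pub fn modifyFields(self: *Foo) void {
        self.ham *= 2;
        self.eggs /= 2;
    }
};

// With whole structs stored side by side, methods run directly on each element.
fn modifyAll(foos: []Foo) void {
    for (foos) |*foo| foo.modifyFields();
}

pub fn main() void {
    var foos = [_]Foo{ .{}, .{ .ham = 1, .eggs = 0.5 } };
    modifyAll(&foos);
    for (foos) |foo| std.debug.print("ham = {d}, eggs = {d}\n", .{ foo.ham, foo.eggs });
}
```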

Durobot · November 9, 2024, 6:53am

Yes, this is exactly what I was thinking about: if I know for certain that fn Foo.modifyFields only modifies .ham and not .eggs, I could cheat a little and not write my_foo.eggs back into the element of my_list.

Likewise, if I’m sure fn Foo.modifyFields does not depend on the value of .ham, only using the value of .eggs, I could set my_foo.ham to an arbitrary value when initializing my_foo.

However, such optimizations could lead to hard-to-find bugs in the future if the behavior of fn Foo.modifyFields changes.

For example, a year from now somebody (not necessarily the original developer) changes fn Foo.modifyFields so that it starts to depend on the value of .ham, or writes to .eggs.

Cryptic bugs would start popping up in seemingly unrelated parts of the program. It is even worse if fn Foo.modifyFields lives in a library while the loops that call it, with optimizations like the ones above, live in an application that uses that library.

So this is a bit dangerous, and I’m not entirely sure I’d want to use such optimized code in production.

Using the get and set functions of MultiArrayList is less efficient, but seems to be bulletproof in this regard.

Sze · November 9, 2024, 5:21pm

I think if you want to use a MultiArrayList, you shouldn’t try to reuse the methods of the Foo type. Instead, you should have functions that operate directly on specific fields of the MultiArrayList without first constructing Foo instances.
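A sketch of what that can look like, assuming `Foo` is kept as plain data with no methods. `scaleAllEggs` is a hypothetical free function standing in for what might otherwise have been a `Foo` method; it works on the whole `eggs` array at once and leaves `ham` alone:

```zig
const std = @import("std");

// Plain data: no methods to be tempted into reconstructing.
const Foo = struct {
    ham: u32 = 12,
    eggs: f32 = 3.14159265359,
};

// Hypothetical free function operating on the whole `eggs` column at once.
fn scaleAllEggs(list: *std.MultiArrayList(Foo), factor: f32) void {
    for (list.items(.eggs)) |*eggs| eggs.* *= factor;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    var list: std.MultiArrayList(Foo) = .{};
    defer list.deinit(allocator);
    try list.append(allocator, .{ .ham = 1, .eggs = 0.5 });
    try list.append(allocator, .{ .ham = 2, .eggs = 4.0 });
    scaleAllEggs(&list, 2.0);
    for (list.items(.eggs)) |eggs| std.debug.print("eggs = {d}\n", .{eggs});
}
```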

If you aren’t willing to do that, then there isn’t really a reason to use MultiArrayList at all, because you are throwing away any performance gains it could give you; in that case, just use an ArrayList instead.
If you have only a few such calls that you aren’t willing to write functions for, then the get and set functions of MultiArrayList are the way to bridge that gap.

Most of the time, the types I use within MultiArrayList are just bundles of data without any methods. That is one way to avoid wanting to use methods of that type: by not having any.

MultiArrayList is for operating on specific fields of groups of instances all at once. It doesn’t make sense to reconstruct individual instances within the loop; that turns SoA (struct of arrays) back into AoS (array of structs), probably with worse performance. You need to drop the individual-element thinking and organize your code differently if you want to use SoA and get its benefits.

andrewrk · November 9, 2024, 8:26pm

I second everything said by @Sze and also here’s an exercise that I think could be enlightening:

For some codebase that you’re willing to experiment with, try changing the hot methods into functions that take as parameters exactly the set of fields they access. Giving up method call syntax for a while helps you see explicitly what the data dependencies are.

Example commit from the Zig compiler.

If you do this and think carefully, you can often realize a more efficient way to organize the code, since it’s more obvious how the data is flowing through the application.
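A small, hypothetical illustration of that exercise (it is not taken from the linked commit): a method `Foo.feed(self: *Foo)` rewritten as a free function `feed(ham: *u32, eggs: f32)`. The signature now states that it reads `eggs` and writes `ham`, and because `ham` points straight into the list's own `ham` array, nothing has to be copied back:

```zig
const std = @import("std");

const Foo = struct {
    ham: u32 = 12,
    eggs: f32 = 3.14159265359,
};

// Hypothetical former method `Foo.feed(self: *Foo)`, now taking exactly the
// fields it touches: it reads `eggs` and writes `ham`.
fn feed(ham: *u32, eggs: f32) void {
    if (eggs > 1.0) ham.* += 1;
}

fn feedAll(list: *std.MultiArrayList(Foo)) void {
    // `ham` points into the list's ham array, so the write lands in place.
    for (list.items(.ham), list.items(.eggs)) |*ham, eggs| feed(ham, eggs);
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    var list: std.MultiArrayList(Foo) = .{};
    defer list.deinit(allocator);
    try list.append(allocator, .{});
    try list.append(allocator, .{ .ham = 1, .eggs = 0.001 });
    feedAll(&list);
    for (list.items(.ham)) |ham| std.debug.print("ham = {d}\n", .{ham});
}
```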

Durobot · November 10, 2024, 6:11am

@Sze , @andrewrk

My main use case is iterating over the MultiArrayList once per frame. The member function I need to call once in a blue moon is deinit, and as you can imagine, I don’t care that much if deinitialization (releasing dynamically allocated strings) is slower.

I think for occasional operations using get and set is fine.
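For a rare path like this, calling a per-element `deinit` through `get` could look like the following sketch. The `name` field and the `deinitAll` helper are assumptions for illustration; `get(i)` rebuilds a temporary `Foo` just so its method can be called:

```zig
const std = @import("std");

const Foo = struct {
    ham: u32 = 12,
    name: []u8, // assumed: each element owns a heap-allocated string

    pub fn deinit(self: Foo, allocator: std.mem.Allocator) void {
        allocator.free(self.name);
    }
};

// Hypothetical helper: the rare path rebuilds each Foo with get() just to call deinit.
fn deinitAll(list: *std.MultiArrayList(Foo), allocator: std.mem.Allocator) void {
    for (0..list.len) |i| list.get(i).deinit(allocator);
    list.deinit(allocator);
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    var list: std.MultiArrayList(Foo) = .{};
    defer deinitAll(&list, allocator);

    const name = try allocator.dupe(u8, "spam");
    errdefer allocator.free(name); // only runs if append fails
    try list.append(allocator, .{ .name = name });
    std.debug.print("{d} element(s)\n", .{list.len});
}
```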

Changing the methods into functions that accept exactly the set of fields as parameters that they access may be a viable solution, but it has two limitations I can think of:

(1) The parameter list could get quite long, to the point of being unmanageable or (probably) hurting performance;

(2) You lose the ability to modify fields, unless you’re willing either (a) to pass pointers to the fields that are modified, then on the calling side copy the values back, or (b) return an instance of the struct from the function.

So, basically, if you store your structs in MultiArrayList, you’re then limited on the use of methods, I get it. That’s the tradeoff for faster iteration, and when you iterate 60 times every second, it’s totally worth it.

andrewrk · November 10, 2024, 7:17am

You probably don’t need each element to independently allocate resources. Better to have each element allocate resources out of a pool that is managed independently. Then you don’t need to deinit every element.
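One way to read that suggestion, sketched with `std.heap.ArenaAllocator` as the pool (an assumption; any independently managed allocator or pool would serve). `FooStore` is a hypothetical wrapper: elements only borrow their strings, and a single `arena.deinit()` releases them all, so no per-element `deinit` is needed:

```zig
const std = @import("std");

const Foo = struct {
    ham: u32 = 12,
    name: []const u8, // borrowed from the store's arena, not owned by the element
};

// Hypothetical wrapper: the list plus one arena that owns every element's string.
const FooStore = struct {
    list: std.MultiArrayList(Foo) = .{},
    arena: std.heap.ArenaAllocator,

    fn init(gpa: std.mem.Allocator) FooStore {
        return .{ .arena = std.heap.ArenaAllocator.init(gpa) };
    }

    fn add(self: *FooStore, gpa: std.mem.Allocator, ham: u32, name: []const u8) !void {
        // The copy lives in the arena; if append fails it is reclaimed by deinit.
        const owned = try self.arena.allocator().dupe(u8, name);
        try self.list.append(gpa, .{ .ham = ham, .name = owned });
    }

    // No loop over elements: freeing the arena releases every string at once.
    fn deinit(self: *FooStore, gpa: std.mem.Allocator) void {
        self.list.deinit(gpa);
        self.arena.deinit();
    }
};

pub fn main() !void {
    const gpa = std.heap.page_allocator;
    var store = FooStore.init(gpa);
    defer store.deinit(gpa);
    try store.add(gpa, 1, "spam");
    try store.add(gpa, 2, "eggs");
    for (store.list.items(.ham), store.list.items(.name)) |ham, name| {
        std.debug.print("{d}: {s}\n", .{ ham, name });
    }
}
```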

Durobot · November 10, 2024, 7:57am

This is something to consider.
Thanks!