How to manipulate arrays containing very large item sizes?

My usual preference is to use fixed-size arrays for everything, and allocate structs on the heap if they’re large. Recently I’ve been dealing with heap-allocated structs that themselves contain arrays, where the individual array type is extremely large (i.e. 800k).

This is causing issues where the normal BoundedArray functions for manipulating the array (like to remove an item) result in a segfault, as it assumes the item will fit on the stack.

I considered switching to something like ArrayList and let it allocate items as needed, but looking at the code it would likely have the same problem, as the manipulation of individual items still takes place on the stack.

Is there any facility in the standard library for manipulating arrays containing very large items?

The only alternative I can think of is to break down the large items and have their components allocated on the heap, but then I’m dealing with heap allocations within heap allocations, and the cleanup of that sounds messy…

I think I am understanding your problem correctly - so let’s say we look at the remove function in ArrayList:

pub fn swapRemove(self: *Self, i: usize) T {
    if (self.items.len - 1 == i) return self.pop();

    // Nairou, in your case, this line causes a problem 
    // because "old_item" doesn't fit on the stack.
    const old_item = self.items[i];

    self.items[i] = self.pop();
    return old_item;
}

First, I’d point out that the standard library is not really geared towards special cases so it doesn’t surprise me that you’ll have to fabricate some utilities here.

The first thing I’d try in your situation is to use “handles” to the data that you have and manipulate the handles instead of the direct structures themselves. For instance, if I need to remove a handle to something, I pop it off an array and then free the item directly through the handle… but I’d stay away from anything that copies 800k every time you want to pop an item off of a list. Also, you’re already outside the size of a normal cache with these things, so an indirection isn’t exactly your worst problem here. At that point (and maybe I’m wrong), I see little value in having the actual data-bodies in a contiguous array.

It sounds like you need to distinguish between data handling and storage here and another layer between the two could be a good place to start.

This may not suit your use-case, but I’ll pitch an idea here just to get the ball rolling.

const LargeHeavyThing = struct {
    array: [10_000]usize,
};

const Handle = struct {
    ID: usize, // or whatever, really...
    data: *LargeHeavyThing,
    // maybe more stuff, but keep it light...
};

And at that point, you can fill up a list with your handles:

var handles = std.ArrayList(Handle).init(allocator);

// later...

const thing = try allocator.create(LargeHeavyThing);

// initialize the thing...

try handles.append(.{ .ID = 42, .data = thing });

// removing...

const old_handle = handles.swapRemove(idx);

// or you could cache it with a free-list
allocator.destroy(old_handle.data);
1 Like

Thanks for the response! I was afraid that would be the answer, but it does make sense. :slight_smile:

In that case, I know what piece should be broken out, but now I’m wondering how to safely manage the deinit case.

Here are the structs I’m dealing with in a bit more detail:

pub const Message = union(enum) {
    // many structs representing message types
};

pub const MessageBlock = struct {
    id: u32,
    message: Message,
    // ...
};

pub const Connection = struct {
    // ...
    incomingMessageBuffer: [4096]MessageBlock = undefined,
    outgoingMessageBuffer: [4096]MessageBlock = undefined,
    // ....
};

pub const Server = struct {    // Very large struct, allocated on the heap
    // ...
    connections: std.BoundedArray( Connection, 16 ) = .{},
    // ...
};

The vast bulk of the size comes from the Connection.incomingMessageBuffer and Connection.outgoingMessageBuffer arrays. It would be perfectly reasonable to replace both with slices, and allocate them on the heap during Connection init.

Which means Connection will need init/deinit. Init is simple enough, but are there any best practices to prevent forgetting to manually deinit the struct every time I’m ready to remove a Connection from the BoundedArray?

The main reason I originally used a BoundedArray to begin with (back with Connection was a much smaller struct :slight_smile: ) was to prevent inner allocations that could easily get forgotten.

As always - it depends. In your case, you may be able to get away with the statically sized buffers if you go the route of creating handles.

I’ve seen a lot of tricks over the years for this sort of thing… people can go crazy with it.

This may not be stellar advice, but I always recommend starting with something obvious and then complexifying as is necessary. Instead of trying to create something crazy, try to be explicit upfront until the pain of being explicit starts to become noticeable.

In your case, you could run into issues with switches and unions. Thankfully, you can handle that quite directly:

switch (u) { // your union
    inline else => |*x| x.deinit(),
}

Then that should enforce that every member of the union implements a deinit function.

If you forget to implement a deinit for one of the union members, you’ll get a handy compiler error:

example.zig:20:38: error: no field or member function named 'deinit' in 'Foo'

Here’s a real simple sketch:

const A = struct {
    pub fn deinit(_: *A) void { }
};

const B = struct {
    pub fn deinit(_: *B) void { }
};

const U = union(enum) {
    a: A, 
    b: B,
    pub fn deinit(self: *U) void {
        switch (self.*) {
            inline else => |*ptr| ptr.deinit(), // pass by pointer or else it can segfault
        }
    }
};
1 Like