Modifying an ArrayList while iterating

Hi Folks,

I was just bitten by a bug in my code around adding to an ArrayList while iterating it. The add was a couple of layers down and I wrote the two parts at different times and there was refactoring involved. It was embarrassing nonetheless.

Coming from a java background, I am also used to a bit more handholding (Java will throw an exception if you tried that).

Is this just a case of git gud? or is there an intention, perhaps later down the road to provide some dev mode rails to help detect some of these “common mistakes?”

Perhaps there are other pattterns that I am not aware of instead of iterating through .items that are a bit safer?

You could write an iterator that has a length field equal to the length of the ArrayList? Then, with each iteration, it could assert that the length in the ArrayList didn’t change.

4 Likes

Adding to an array list while iterating over it is fine as long as items doesn’t get reallocated to a different chunk of memory.
To ensure this doesn’t happen you can use ensureTotalCapacity()/ensureUnusedCapacity() and appendAssumeCapacity() if you already know an upper bound on how many items you’re going to add or check if capacity > items.len before every append and cancel your iteration if it isn’t.

Something like this should work too:

var list = std.ArrayListUnmanaged(u8).empty;
// fill list…
var i: usize = 0;
while (i < list.items.len) : (i += 1) {
    std.debug.print("{d}\n", .{list.items[i]});
    if (cond) {
        try list.append(allocator, 0);
    }
}

because you’re refreshing the version of items you’re iterating over after every loop/accessing items directly.

5 Likes
5 Likes

That’s useful. I guess I could also conditionally compime away the check in prod builds so it doesn’t impact performance.

That’s what I ended up doing (ensureUnusedCapacity because I knew an upper bound of how many could be added) - once I found the issue.

I hadn’t thought about using an unmanaged ArrayList. If I use a while loop with a managed ArrayList, would the effect be the same? Or does the unmanaged list handle it a bit differently?

Also, @squeek502, thanks for those links - being able to (un)lock the arraylist would be a helpful guard :slight_smile:

The managed version is the same as the unmanaged one the only difference is that it stores an Allocator.
I do think that ArrayListUnmanaged has a more clear API, e.g. appendAssumeCapacity() doesn’t even take an Allocator param (in contrast to append()) so you can be sure that there will be no reallocs/pointer invalidations just from looking at the function signature.

3 Likes

That’s a good point, though I guess the balance is that caller then needs to hold or have acess to an allocator.

Thinking about it a bit more, I think there is another nuance here that could have caught me out. Long story short, within the loop I call a method on the item in the list.

list.items[i].doX()

This method can increment a field in the item, and can also add an item to the list.

I am guessing that as long I update the item before adding an item, that the update would work.

If I add to the array, then update the item, there is a possiblity that my item wouldn’t be preserved? In my mind I see the item that is executing the code being in “dead memory” as soon as the ArrayList that it is in is resized.

I wrote a minimal test case

const std = @import("std");

const Item = struct {
    count: u32 = 0,
    inv: *Inventory,

    fn countThenAdd(self: *Item) !void {
        self.count += 1;
        try self.inv.items.append(Item{
            .inv = self.inv,
        });
    }

    fn addThenCount(self: *Item) !void {
        try self.inv.items.append(Item{
            .inv = self.inv,
        });
        self.count += 1;
    }
};

const Inventory = struct {
    const ItemArray = std.ArrayList(Item);
    items: ItemArray,
};

test "count then add" {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    var inv = Inventory{
        .items = Inventory.ItemArray.init(allocator),
    };

    try inv.items.append(Item{
        .inv = &inv,
    });

    var i: usize = inv.items.items.len;
    while (i > 0) {
        i -= 1;
        try inv.items.items[i].countThenAdd();
    }

    try std.testing.expectEqual(2, inv.items.items.len);
    try std.testing.expectEqual(1, inv.items.items[0].count);
}

test "add, then count" {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    var inv = Inventory{
        .items = Inventory.ItemArray.init(allocator),
    };

    try inv.items.append(Item{
        .inv = &inv,
    });

    var i: usize = inv.items.items.len;
    while (i > 0) {
        i -= 1;
        try inv.items.items[i].addThenCount();
    }

    try std.testing.expectEqual(2, inv.items.items.len);
    try std.testing.expectEqual(1, inv.items.items[0].count);
}

When I tried this out, both test cases passed - but looking into the code for append, ensureTotalCapacityPrecise is called, which calls remap on the allocator.

remap will not always re-allocate memory, and I feel that the case of adding then updating will not retain the new value if memory is remapped.

Is my assessment correct? Java, go etc. would circumvent this since it’s always pointers that are stored. I could also work around this by using pointers - right?

Is it just a case of being a little more careful when using structs directly in collections?

Thank you for helping me understand how these things hang together :slight_smile:

Yes you are correct, if you accidentaly invalidate the memory with append() and then try to write to it you will get illegal behaviour. However you cannot work around this using pointers because once the memory is invalidated any pointers to it are invalid too.

I would suggest using indices to address your items and moving your functions to the owner of the memory (Inventory) like this:

const std = @import("std");

const Item = struct {
    count: u32 = 0,
};

const Inventory = struct {
    const ItemArray = std.ArrayList(Item);
    items: ItemArray,

    // both orders work

    pub fn countThenAdd(inv: *Inventory, item_idx: u32) !void {
        try inv.items.append(.{});
        inv.items.items[item_idx].count += 1;
    }

    pub fn addThenCount(inv: *Inventory, item_idx: u32) !void {
        inv.items.items[item_idx].count += 1;
        try inv.items.append(.{});
    }
};

This way you don’t have to worry about memory invalidations and you save some memory as well because the items don’t need to know about where they are stored.

1 Like

Back in the olden days when I still wrote C++ code my custom container classes used to have a Lock/Unlock method pair, and trying to mutate the container while it was locked would be a hard error.

You just had to think of locking/unlocking the container when looping over its content. And it was mostly useful when the container was mutated somewhere deep down the callstack inside a toplevel loop.

2 Likes

Yeah that’s probably a more straightforward solution. I guess it’s similiar to the planned pointer stability lock mentioned earlier:

2 Likes

didn’t see it mentioned and not sure if it fits the use case, but if you need to remove elements while iterating its safe to use swapRemove() and skip incrementing. something like this:

const std = @import("std");
test {
    var l = std.ArrayListUnmanaged(u8){};
    try l.appendSlice(std.testing.allocator, &.{ 0, 1, 2, 3 });
    var i: usize = 0;
    while (i < l.items.len) {
        if (l.items[i] > 1) {
            _ = l.swapRemove(i);
        } else i += 1;
    }
    try std.testing.expectEqualSlices(u8, &.{ 0, 1 }, l.items);
}
2 Likes

Or you can use linked-list, ie. with MemoryPool(T). The only thing you need to watch for is removing current node (which can be hard to spot if you allow recursion). Either you need to fail then, or add a flag (to_be_removed) and do cleanup when the traversal is complete. Any other insertions/removals should be fine (if you don’t mind that currently executed loops will see more/less there was at the beginning).

@Justus2308, thank your answering my questions.

I should have clarified that I meant storing pointers to items created on the heap.

This suggestion makes sense. In my actual code, it’s a queued craft which on completion can unlock a new recipe, which can be auomatically added to the crafting queue.

The counter is the number of items the craft has been completed. Important since unlocks only happen on the first craft.

Coming from an OOP background, it feels like this logic sits with with the craft rather than the Factories which has the loop.

Which leads me to an off-topic question. How much is the encapsulation of logic relevant in zig, compared to Java/Go etc?

This is a good idea and I am looking forward to (un)lock getting merged in.

Excellent point. I will need to do removals later - but I can do a post-loop removal as you suggest, which I won’t need on the array if I’m iterating backwards.

I don’t think I need index based lookups, so a list might be a good option - I shall ponder it

Thank you

I just did something like that recently, here’s the commit if you’re interested, and of course if anybody finds a mistake, thanks for reporting :slight_smile:

BTW: I don’t consider myself to be expert or anything.

1 Like

Well, you kind of can, although I wouldn’t solve this problem that way.

What I mean is, instead of copying the slice with array_list.items, you could take a pointer to the slice, &array_list.items, use a numeric for loop, and get each cell of the slice from the pointer.

Here, that’s probably a bad idea, but it’s worth bearing in mind: pointer invalidation won’t invalidate the address of the pointer, so an extra level of indirection can prevent it from being a problem.

Maybe I shouldn’t have brought this up at all. A pointer to a pointer pins memory, and that’s what a slice is, a fat pointer. It’s imperative to at least try talking yourself out of this kind of thing before deciding it’s the way forward.

Remember: every problem can be solved with another level of indirection, except for the problems caused by having another level of indirection.

3 Likes