Problem with insertSlice in ArrayList

LucasSantos91 · September 18, 2023, 7:26pm

I wrote about this a couple days ago, but I accidently cut out part of the explanation. Oops, my bad. This is my second shot at this.
There’s an inefficiency in insertSlice of std.ArrayList. This is the current implementation:

pub fn insertSlice(self: *Self, i: usize, items: []const T) Allocator.Error!void {
    try self.ensureUnusedCapacity(items.len);
    self.items.len += items.len;

    mem.copyBackwards(T, self.items[i + items.len .. self.items.len], self.items[i .. self.items.len - items.len]);
    @memcpy(self.items[i..][0..items.len], items);
}

The problem is that if ensureUnusedCapacity triggers a resize, it will copy all the elements to the new allocation. inserSlice will then copy a slice of these same items to the end of the allocation, to make space for the insertion. It’s better to copy everything to their final place at once. Here’s my implementation:

/// Insert slice `items` at index `i` by moving `list[i .. list.len]` to make room.
/// This operation is O(N).
/// Invalidates pointers if additional memory is needed.
pub fn insertSlice(
    self: *Self,
    index: usize,
    items: []const T,
) Allocator.Error!void {
    const dst = try self.addManyAtIndex(
        index,
        items.len,
    );
    @memcpy(dst, items);
}

/// Resize the array, adding `count` new elements at position `index`, which have `undefined` values.
/// The return value is a slice pointing to the newly allocated elements. The returned pointer
/// becomes invalid when the list is resized. Resizes list if self.capacity is not large enough.
pub fn addManyAtIndex(
    self: *Self,
    index: usize,
    count: usize,
) Allocator.Error![]T {
    const new_len = self.items.len + count;
    const to_move = self.items[index..];
    if (self.capacity >= new_len) {
        //There is enough space
        self.items.len = new_len;
        mem.copyBackwards(
            T,
            self.items[index + count ..],
            to_move,
        );
        return self.items[index..][0..count];
    } else {
        // Here we avoid copying allocated but unused bytes by
        // attempting a resize in place, and falling back to allocating
        // a new buffer and doing our own copy. With a realloc() call,
        // the allocator implementation would pointlessly copy our
        // extra capacity.
        const old_memory = self.allocatedSlice();
        if (self.allocator.resize(old_memory, new_len)) {
            self.capacity = new_len;
            self.items.len = new_len;
            mem.copyBackwards(
                T,
                self.items[index + count ..],
                to_move,
            );
            return self.items[index..][0..count];
        } else {
            // Need a new allocation. Calculate allocation size as
            // ensureTotalCapacity. We don't call ensureTotalCapacity because there
            // would be an unnecessary check if the capacity is enough (we already
            // know it's not).
            var better_capacity = new_len;
            while (true) {
                better_capacity +|= better_capacity / 2 + 8;
                if (better_capacity >= new_len) break;
            }
            const new_memory = try self.allocator.alignedAlloc(
                T,
                alignment,
                better_capacity,
            );
            @memcpy(
                new_memory[0..index],
                self.items[0..index],
            );
            @memcpy(
                new_memory[index + count ..][0..to_move.len],
                to_move,
            );
            self.allocator.free(old_memory);
            self.items.ptr = new_memory.ptr;
            self.items.len = new_len;
            self.capacity = new_memory.len;
            return new_memory[index..][0..count];
        }
    }
}

The comment about realloc was copied straight from ensureTotalCapacityPrecise. It’s curious that the problem they alluded there is the same problem in the current implementation of insertSlice.
Also, while implementing my own version, I noticed the tests for insertSlice don’t cover the case where the ArrayList needs to resize. If you comment out the entire else branch of the first “if” (“if (self.capacity >= new_len)”), all tests still pass.

IntegratedQuantum · September 18, 2023, 8:10pm

Pretty cool and better explained than last time ;D. I think this would be a useful improvement to the standard library.
Have you considered making a pull request?

Also a few things I spotted:

self.allocator.resize(old_memory, new_len)

This kind of goes against the resizing strategy of the ArrayList.
Check out ensureTotalCapacity: It tries to increase the capacity by at least 1.5×. This reduces the number of allocations when the array is often resized by a small amount. Imagine adding lots of small arrays in a row.

            self.items.len = new_len;
            mem.copyBackwards(
                T,
                self.items[index + count ..],
                to_move,
            );
            return self.items[index..][0..count];

This exact code appears twice in your function. You could probably combine their conditions by adding a resizeCapacity method or something like that that returns true if a resize was made or the capacity was already sufficient.

LucasSantos91 · September 18, 2023, 9:15pm

I considered making a pull request, but I didn’t know if Zig is taking pull requests from random people. For a large project like this, even looking at every code that gets submitted could be a daunting task, specially if half the explanation is missing ;D. But I could do that if there’s enough interest.

About resize(), I copied that part straight from ensureTotalCapacityPrecise. resize is much less expansive than realloc, because it can only expand the current allocation, it doesn’t move data around. If it cannot do that, it returns false. In that case, you want to request as little memory as possible, to increase the chances that the allocation can, indeed, be expanded in place.

kristoff · September 18, 2023, 9:29pm

I haven’t gone through the code so I’m just answering to this part here: feel free to open a PR, we do accept PRs from new people.

If you want some more technical feedback before (or after) submitting – which is in any case totally optional – the big Zig discord server has a channel dedicate to working on the stdlib.

squeek502 · September 18, 2023, 9:38pm

What @IntegratedQuantum was getting at is that the better_capacity calculation should be moved up before the resize. Something like:

        const old_memory = self.allocatedSlice();
        var better_capacity = new_len;
        while (true) {
            better_capacity +|= better_capacity / 2 + 8;
            if (better_capacity >= new_len) break;
        }
        if (self.allocator.resize(old_memory, better_capacity)) {
            self.capacity = better_capacity;
            self.items.len = new_len;

LucasSantos91 · September 18, 2023, 11:26pm

I get it, whenever we need to expand, and we have the freedom to choose the new capacity, we expand according to the formula. This would make the behavior of insertSlice match up with append.
I’m not entirely conviced that this approach leads to better performance, though. I understand amortization of allocation cost, but the resize here (and in append) is an important optimization. If the resize actually takes place (resize returns true), we avoid having to move all of the data. And the smallest the memory that we request, the greater the chance that the resize actually takes place. If we use the formula for better_capacity when requesting the resize, we would be requesting more memory than what we need at this moment, which increases the chance of resize returning false, which, in turn, would lead to the worst case branch being taken (full blown new allocation, followed by transfering everything over).
Ultimately, what we want, both in insertSlice() and all the variations of add* and append*, is to tell the allocator that we want as much as possible to keep the allocation in place, we need a minimum of “x” amount of memory, but the ideal would be to get “y” amount of memory. Something like resize, but with an extra parameter like:

/// Expands the allocation size to at least `minimum`, 
/// without moving the allocation. 
/// Tries to get the size of the allocation as close to
/// `ideal` as possible.
/// If the minimum size cannot be reached without 
/// moving the allocation, returns null.
fn resizeMin(
    allocator: Allocator, 
    currentAllocation: []T, 
    minimum: usize, 
    ideal: usize,
) ?[]T

This would be the best of both worlds, as we could call allocator.resizeMin(old_memory, new_len, better_capacity). We still preserve the optimization of resize, when it’s possible, but we still try to maintain as close as possible our memory expansion policy.
Anyways, in the abscence of this ideal allocation function, I think the best policy for memory expansion in ArrayList is to try to trigger resize as much as possible, by requesting the smallest allocation possible, and only use the formula for adjusting the allocation size when we are forced to create a whole new allocation.
But I get that that’s more complicated, as it requires rethinking the class as whole, and all of its use cases. If we just want the behaviour of insertSlice to match up with append, then your suggestion would indeed be the best option.

squeek502 · September 18, 2023, 11:44pm

I see what you’re saying but I think better_capacity is essentially trying to do the same thing from a different angle. Instead of making the resize less likely to return false, by using better_capacity it makes it more likely that self.capacity >= new_len is true on the next insertSlice call(s), which would avoid the resize call entirely. So, it’s a potentially larger cost when actually needing to resize, but that should be evened out by then having the capacity to avoid needing to resize in the future.

In any case, we’re getting into the territory where this would need to be thoroughly benchmarked to come to any real conclusions.

AndrewCodeDev · September 19, 2023, 2:04am

Benchmarking is a good idea at that point. There’s a long legacy of containers that are built with exactly what @squeek502 mentioned - “growth factor”. A well known (and mildly analogous) example of this is std::vector from the C++ standard.

That said, I think @LucasSantos91, my experience with Zig has been very community friendly towards PR’s. I can’t speak for github, but I think in general the standard library can use all the feedback it can get at this point

LucasSantos91 · September 19, 2023, 3:35am

Thanks everyone for your input. I’ll submit a pull request in a few days and update this thread.

LucasSantos91 · September 28, 2023, 12:42am

github.com/ziglang/zig

Fix inefficiency with ArrayList.insertSlice

ziglang:master ← LucasSantos91:master

opened 12:41AM - 28 Sep 23 UTC

LucasSantos91

+174 -30

There’s an inefficiency in insertSlice of std.ArrayList. This is the current imp…lementation: ```zig pub fn insertSlice(self: *Self, i: usize, items: []const T) Allocator.Error!void { try self.ensureUnusedCapacity(items.len); self.items.len += items.len; mem.copyBackwards(T, self.items[i + items.len .. self.items.len], self.items[i .. self.items.len - items.len]); @memcpy(self.items[i..][0..items.len], items); } ``` The problem is that if ensureUnusedCapacity triggers a resize, it will copy all the elements to the new allocation. insertSlice will then copy a slice of these same items to the end of the allocation, to make space for the insertion. It’s better to copy everything to their final place at once. This pull request introduces a new function to the public interface of ArrayList / ArrayListUnmanaged: ```zig /// Resize the array, adding `count` new elements at position `index`, which have `undefined` values. /// The return value is a slice pointing to the newly allocated elements. The returned pointer /// becomes invalid when the list is resized. Resizes list if self.capacity is not large enough. pub fn addManyAtIndex( self: *Self, allocator: Allocator, index: usize, count: usize, ) Allocator.Error![]T ``` This function makes space for a new slice of items at any position in the list. As other add* functions, the items are left unitialized. With this, it is trivial to reimplement insertSlice. addManyAtIndex solves the inefficiency mentioned earlier by adapting code from ensureTotalCapacityPrecise and the old insertSlice. It manually grows the memory, if necessary, and copy everything only once. In the process of implementing this function, I noticed none of the tests for insertSlice covered the case where the ArrayList needs to grow the capacity, so I wrote one.

It’s my first time ever writing a pull request. Let’s see what the maintainers have to say.

LucasSantos91 · September 29, 2023, 1:45am

MultiArrayList.insert has the same inefficiency. Here’s the implementation:

pub fn insert(self: *Self, gpa: Allocator, index: usize, elem: T) !void {
    try self.ensureUnusedCapacity(gpa, 1);
    self.insertAssumeCapacity(index, elem);
}

I might try tackling this as well, although the logic in MultiArrayList is a bit more involved than simple ArrayList.

kristoff · September 30, 2023, 1:47pm

github.com/ziglang/zig

Fix inefficiency with ArrayList.insertSlice

ziglang:master ← LucasSantos91:master

opened 12:41AM - 28 Sep 23 UTC

LucasSantos91

+174 -30

There’s an inefficiency in insertSlice of std.ArrayList. This is the current imp…lementation: ```zig pub fn insertSlice(self: *Self, i: usize, items: []const T) Allocator.Error!void { try self.ensureUnusedCapacity(items.len); self.items.len += items.len; mem.copyBackwards(T, self.items[i + items.len .. self.items.len], self.items[i .. self.items.len - items.len]); @memcpy(self.items[i..][0..items.len], items); } ``` The problem is that if ensureUnusedCapacity triggers a resize, it will copy all the elements to the new allocation. insertSlice will then copy a slice of these same items to the end of the allocation, to make space for the insertion. It’s better to copy everything to their final place at once. This pull request introduces a new function to the public interface of ArrayList / ArrayListUnmanaged: ```zig /// Resize the array, adding `count` new elements at position `index`, which have `undefined` values. /// The return value is a slice pointing to the newly allocated elements. The returned pointer /// becomes invalid when the list is resized. Resizes list if self.capacity is not large enough. pub fn addManyAtIndex( self: *Self, allocator: Allocator, index: usize, count: usize, ) Allocator.Error![]T ``` This function makes space for a new slice of items at any position in the list. As other add* functions, the items are left unitialized. With this, it is trivial to reimplement insertSlice. addManyAtIndex solves the inefficiency mentioned earlier by adapting code from ensureTotalCapacityPrecise and the old insertSlice. It manually grows the memory, if necessary, and copy everything only once. In the process of implementing this function, I noticed none of the tests for insertSlice covered the case where the ArrayList needs to grow the capacity, so I wrote one.

Congrats on the merge!

LucasSantos91 · September 30, 2023, 5:21pm

Thanks! After 15 years of programming alone, this was the first time anyone has ever looked at a line of code I’ve written. It was nice.

AndrewCodeDev · September 30, 2023, 10:35pm

Good work!