Problem with insertSlice in ArrayList

I wrote about this a couple days ago, but I accidently cut out part of the explanation. Oops, my bad. This is my second shot at this.
There’s an inefficiency in insertSlice of std.ArrayList. This is the current implementation:

pub fn insertSlice(self: *Self, i: usize, items: []const T) Allocator.Error!void {
    try self.ensureUnusedCapacity(items.len);
    self.items.len += items.len;

    mem.copyBackwards(T, self.items[i + items.len .. self.items.len], self.items[i .. self.items.len - items.len]);
    @memcpy(self.items[i..][0..items.len], items);
}

The problem is that if ensureUnusedCapacity triggers a resize, it will copy all the elements to the new allocation. inserSlice will then copy a slice of these same items to the end of the allocation, to make space for the insertion. It’s better to copy everything to their final place at once. Here’s my implementation:

/// Insert slice `items` at index `i` by moving `list[i .. list.len]` to make room.
/// This operation is O(N).
/// Invalidates pointers if additional memory is needed.
pub fn insertSlice(
    self: *Self,
    index: usize,
    items: []const T,
) Allocator.Error!void {
    const dst = try self.addManyAtIndex(
        index,
        items.len,
    );
    @memcpy(dst, items);
}

/// Resize the array, adding `count` new elements at position `index`, which have `undefined` values.
/// The return value is a slice pointing to the newly allocated elements. The returned pointer
/// becomes invalid when the list is resized. Resizes list if self.capacity is not large enough.
pub fn addManyAtIndex(
    self: *Self,
    index: usize,
    count: usize,
) Allocator.Error![]T {
    const new_len = self.items.len + count;
    const to_move = self.items[index..];
    if (self.capacity >= new_len) {
        //There is enough space
        self.items.len = new_len;
        mem.copyBackwards(
            T,
            self.items[index + count ..],
            to_move,
        );
        return self.items[index..][0..count];
    } else {
        // Here we avoid copying allocated but unused bytes by
        // attempting a resize in place, and falling back to allocating
        // a new buffer and doing our own copy. With a realloc() call,
        // the allocator implementation would pointlessly copy our
        // extra capacity.
        const old_memory = self.allocatedSlice();
        if (self.allocator.resize(old_memory, new_len)) {
            self.capacity = new_len;
            self.items.len = new_len;
            mem.copyBackwards(
                T,
                self.items[index + count ..],
                to_move,
            );
            return self.items[index..][0..count];
        } else {
            // Need a new allocation. Calculate allocation size as
            // ensureTotalCapacity. We don't call ensureTotalCapacity because there
            // would be an unnecessary check if the capacity is enough (we already
            // know it's not).
            var better_capacity = new_len;
            while (true) {
                better_capacity +|= better_capacity / 2 + 8;
                if (better_capacity >= new_len) break;
            }
            const new_memory = try self.allocator.alignedAlloc(
                T,
                alignment,
                better_capacity,
            );
            @memcpy(
                new_memory[0..index],
                self.items[0..index],
            );
            @memcpy(
                new_memory[index + count ..][0..to_move.len],
                to_move,
            );
            self.allocator.free(old_memory);
            self.items.ptr = new_memory.ptr;
            self.items.len = new_len;
            self.capacity = new_memory.len;
            return new_memory[index..][0..count];
        }
    }
}

The comment about realloc was copied straight from ensureTotalCapacityPrecise. It’s curious that the problem they alluded there is the same problem in the current implementation of insertSlice.
Also, while implementing my own version, I noticed the tests for insertSlice don’t cover the case where the ArrayList needs to resize. If you comment out the entire else branch of the first “if” (“if (self.capacity >= new_len)”), all tests still pass.

2 Likes

Pretty cool and better explained than last time ;D. I think this would be a useful improvement to the standard library.
Have you considered making a pull request?

Also a few things I spotted:

self.allocator.resize(old_memory, new_len)

This kind of goes against the resizing strategy of the ArrayList.
Check out ensureTotalCapacity: It tries to increase the capacity by at least 1.5×. This reduces the number of allocations when the array is often resized by a small amount. Imagine adding lots of small arrays in a row.

            self.items.len = new_len;
            mem.copyBackwards(
                T,
                self.items[index + count ..],
                to_move,
            );
            return self.items[index..][0..count];

This exact code appears twice in your function. You could probably combine their conditions by adding a resizeCapacity method or something like that that returns true if a resize was made or the capacity was already sufficient.

2 Likes

I considered making a pull request, but I didn’t know if Zig is taking pull requests from random people. For a large project like this, even looking at every code that gets submitted could be a daunting task, specially if half the explanation is missing ;D. But I could do that if there’s enough interest.

About resize(), I copied that part straight from ensureTotalCapacityPrecise. resize is much less expansive than realloc, because it can only expand the current allocation, it doesn’t move data around. If it cannot do that, it returns false. In that case, you want to request as little memory as possible, to increase the chances that the allocation can, indeed, be expanded in place.

I haven’t gone through the code so I’m just answering to this part here: feel free to open a PR, we do accept PRs from new people.

If you want some more technical feedback before (or after) submitting – which is in any case totally optional – the big Zig discord server has a channel dedicate to working on the stdlib.

What @IntegratedQuantum was getting at is that the better_capacity calculation should be moved up before the resize. Something like:

        const old_memory = self.allocatedSlice();
        var better_capacity = new_len;
        while (true) {
            better_capacity +|= better_capacity / 2 + 8;
            if (better_capacity >= new_len) break;
        }
        if (self.allocator.resize(old_memory, better_capacity)) {
            self.capacity = better_capacity;
            self.items.len = new_len;
1 Like

I get it, whenever we need to expand, and we have the freedom to choose the new capacity, we expand according to the formula. This would make the behavior of insertSlice match up with append.
I’m not entirely conviced that this approach leads to better performance, though. I understand amortization of allocation cost, but the resize here (and in append) is an important optimization. If the resize actually takes place (resize returns true), we avoid having to move all of the data. And the smallest the memory that we request, the greater the chance that the resize actually takes place. If we use the formula for better_capacity when requesting the resize, we would be requesting more memory than what we need at this moment, which increases the chance of resize returning false, which, in turn, would lead to the worst case branch being taken (full blown new allocation, followed by transfering everything over).
Ultimately, what we want, both in insertSlice() and all the variations of add* and append*, is to tell the allocator that we want as much as possible to keep the allocation in place, we need a minimum of “x” amount of memory, but the ideal would be to get “y” amount of memory. Something like resize, but with an extra parameter like:

/// Expands the allocation size to at least `minimum`, 
/// without moving the allocation. 
/// Tries to get the size of the allocation as close to
/// `ideal` as possible.
/// If the minimum size cannot be reached without 
/// moving the allocation, returns null.
fn resizeMin(
    allocator: Allocator, 
    currentAllocation: []T, 
    minimum: usize, 
    ideal: usize,
) ?[]T

This would be the best of both worlds, as we could call allocator.resizeMin(old_memory, new_len, better_capacity). We still preserve the optimization of resize, when it’s possible, but we still try to maintain as close as possible our memory expansion policy.
Anyways, in the abscence of this ideal allocation function, I think the best policy for memory expansion in ArrayList is to try to trigger resize as much as possible, by requesting the smallest allocation possible, and only use the formula for adjusting the allocation size when we are forced to create a whole new allocation.
But I get that that’s more complicated, as it requires rethinking the class as whole, and all of its use cases. If we just want the behaviour of insertSlice to match up with append, then your suggestion would indeed be the best option.

I see what you’re saying but I think better_capacity is essentially trying to do the same thing from a different angle. Instead of making the resize less likely to return false, by using better_capacity it makes it more likely that self.capacity >= new_len is true on the next insertSlice call(s), which would avoid the resize call entirely. So, it’s a potentially larger cost when actually needing to resize, but that should be evened out by then having the capacity to avoid needing to resize in the future.

In any case, we’re getting into the territory where this would need to be thoroughly benchmarked to come to any real conclusions.

3 Likes

Benchmarking is a good idea at that point. There’s a long legacy of containers that are built with exactly what @squeek502 mentioned - “growth factor”. A well known (and mildly analogous) example of this is std::vector from the C++ standard.

That said, I think @LucasSantos91, my experience with Zig has been very community friendly towards PR’s. I can’t speak for github, but I think in general the standard library can use all the feedback it can get at this point :slight_smile:

1 Like

Thanks everyone for your input. I’ll submit a pull request in a few days and update this thread.

1 Like

It’s my first time ever writing a pull request. Let’s see what the maintainers have to say.

3 Likes

MultiArrayList.insert has the same inefficiency. Here’s the implementation:

pub fn insert(self: *Self, gpa: Allocator, index: usize, elem: T) !void {
    try self.ensureUnusedCapacity(gpa, 1);
    self.insertAssumeCapacity(index, elem);
}

I might try tackling this as well, although the logic in MultiArrayList is a bit more involved than simple ArrayList.

Congrats on the merge!

5 Likes

Thanks! After 15 years of programming alone, this was the first time anyone has ever looked at a line of code I’ve written. It was nice.

8 Likes

Good work!