How does ArrayList.items work?

I’m trying to learn more about std.ArrayList. I have this test code.

test "stdArrayList" {
    var a = std.ArrayList(i32).init(std.testing.allocator);
    defer a.deinit();

    try a.append(1);
    std.debug.print("items.len: {d}\n", .{a.items.len});
    std.debug.print("capacity: {d}\n", .{a.capacity});

    for (a.items) |n| {
        std.debug.print("{d}\n", .{n});
    }
}

I was surprised when the program printed this. I thought it would print 1, then a bunch of garbage because there are 7 unused spaces in the array.

a.items.len: 1
capacity: 8
1

But, OK. I’m guessing the for loop stops based on items.len.

Then I tried doing a[1] = 20 and my code blew up with index out of bounds. Which ok. I guess that makes sense, the len is 1.

But, now I’m confused. Where is the extra capacity stored?

I decided to try to read ArrayList | Zig Documentation. But, it’s kinda long and I’m very new to Zig…

From what I could gather, it seems like ArrayList does some funny business with items that I don’t quite understand.

I think I found where the extra capacity is stored, but I don’t quite understand how it works…

/// Returns a slice of all the items plus the extra capacity, whose memory
/// contents are `undefined`.
pub fn allocatedSlice(self: Self) Slice {
    // `items.len` is the length, not the capacity.
    return self.items.ptr[0..self.capacity];
}

I think I need to learn more about slices… is it possible to mess with the slice internals?

You can think of a slice as

const Slice = struct {
    ptr: [*]T,
    len: usize,
};

It is possible and perfectly okay to mutate the len field separately from the ptr field (and vice versa). Mutating len can be useful when you know the exact length of the region of memory pointed to by ptr externally. For ArrayList, capacity is what stores the actual length of the backing buffer, and it is updated whenever the backing buffer is reallocated.

This is what makes self.items.ptr[0..self.capacity] safe: By accessing the many-item ptr field of the items slice, we bypass the bounds checks that would otherwise be inserted if we sliced items directly, and we also know from how ArrayList is implemented that the backing buffer is capacity elements in length.

You could imagine an opposite design for ArrayList, where items.len represented the capacity and some different used_len field represented the number of live elements, but this means that we would need to slice items like self.items[0..self.used_len] whenever we wanted the live portion of the list, and it would also be a prone to user error since items will contain uninitialized elements that the user might inadvertently access.

By only making items include the live part of the backing buffer, we get an API that is both nicer to use (just use and pass items directly!) and more safe (from the bounds checks).

5 Likes

The backing buffer gets dismembered between the ArrayList fields. Logically, you can think of it like this:

const buffer = try allocator.alloc(Element, capacity);
return ArrayList{ 
  .items = [0..elementCount], 
  .capacity = capacity
};
1 Like

Wow! OK! I didn’t realize we had access to the slice fields! :exploding_head:
Neat. Makes a lot more sense now.