Lazy allocation / reallocation in a loop - how would you do this?

Suppose you have to write a loop that reads data from a file, or receives data from an interface, be it serial, network, or whatever.
You want to capture certain parts of this stream, but you don’t know in advance the size of the data blocks you would be capturing, and whether they’re going to be there at all.

This is what I came up with:

var data_buf: ?[]u8 = null;
while (<more data to read>)
{
    if (<got a data block we want to keep>)
    {
        _ = data_buf orelse
        {
            data_buf = try alloc8r.alloc(u8, <data block length>);
            errdefer alloc8r.free(data_buf);
        };
        // Was the buffer allocated previously for a shorter data block?
        // Also, from this point on we're 100% certain `data_buf` is not null.
        if (data_buf.?.len < <data block length>)
        {
            // allocate twice the size we need
            data_buf = try alloc8r.realloc(data_buf.?, <data block length> << 1);
        }

        // Read the data into `data_buf`, etc.
    }
}
...
// Before we leave this function - we can't use `defer` because we allocate
// in an `if`, inside a `while` loop.
if (data_buf) |data_slice|
    alloc8r.free(data_slice);

I think this should work, but I wonder if there’s a better way.
For example, I use _ = data_buf orelse to check if data_buf is null. It looks better than if (data_buf) |_| {} else { <allocate data_buf> } but not by much.
What do you do when you have to do something if an optional is null, but nothing if it’s not?

Instead of using null, I would use an empty slice to signal no data. Zig’s allocator interface has a special case for freeing or reallocating empty slices, which makes this a lot simpler to use:

var data_buf: []u8 = &.{};
defer alloc8r.free(data_buf);
while (<more data to read>)
{
    if (<got a data block we want to keep>)
    {
        // Was the buffer allocated previously for a shorter data block?
        if (data_buf.len < <data block length>)
        {
            // allocate twice the size we need
            data_buf = try alloc8r.realloc(data_buf, <data block length> << 1);
        }

        // Read the data into `data_buf`, etc.
    }
}

Also note that for freeing an optional buffer you can use defer like this:

defer if (data_buf) |data_slice|
    alloc8r.free(data_slice);
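
To make the empty-slice version concrete, here is a minimal runnable sketch. It assumes Zig 0.11 or later (for the two-argument `@memcpy`); the function `captureBlocks` and its `blocks` parameter are hypothetical stand-ins for the read loop, and `allocator` plays the role of `alloc8r`.

```zig
const std = @import("std");

/// Hypothetical stand-in for the read loop: copies each block into one
/// reusable buffer and returns the final buffer length (for illustration).
fn captureBlocks(allocator: std.mem.Allocator, blocks: []const []const u8) !usize {
    // An empty slice stands in for "nothing allocated yet".
    var data_buf: []u8 = &.{};
    // Fine even if no block ever arrives: freeing a zero-length slice does nothing.
    defer allocator.free(data_buf);

    for (blocks) |block| {
        // Was the buffer allocated previously for a shorter data block?
        if (data_buf.len < block.len) {
            // Grow to twice the size currently needed.
            data_buf = try allocator.realloc(data_buf, block.len * 2);
        }
        // "Read" the data into the front of the buffer.
        @memcpy(data_buf[0..block.len], block);
    }
    return data_buf.len;
}
```

Running it under `std.testing.allocator` is a cheap way to confirm that the single `defer` really does release the buffer on every path, since that allocator reports leaks.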

There are a couple of things I don’t quite understand here -

  1. After var data_buf: []u8 = &.{};, what is data_buf.ptr going to point to? It can’t be null, can it? Would it then point to an empty tuple on the stack, or something?
  2. Are you saying you can use realloc() on data_buf that … has not been initialized? With its ptr pointing to null? Sorry, I’m not sure about the state of data_buf initialized with &.{}.

Also, it is my understanding that with potentially multiple reallocs, you still register one errdefer which frees the buffer, after the initial alloc(), but in your code there’s no line which does just this first allocation. So, I’m not sure where I would place the errdefer.

Another thing - in your code, if I understand it correctly, even the initial allocation would be twice the size of the first data block.
I have a reason to believe all the data blocks in my case will be of the same size (except for the last block, which can be smaller). This is why initial allocation allocates the size of the first data block exactly, and this is a feature I would like to keep.

It doesn’t matter what it points to. The important bit is that the length is 0. The &.{} here is just syntactic sugar for:

var data_buf: []u8 = undefined;
data_buf.len = 0;

realloc on a slice that has length zero never attempts to free the old memory; it just allocates a new block.

Similarly, free just returns if the len is zero.

Note that the initial errdefer in your code was wrong. Defers are scope-dependent, and because your errdefer was the last statement in its scope, it will never be called: no further errors can happen in that scope.

Also note that defer statements are called when the scope is left, no matter if this happened due to an error, or regularly with a return statement. So if you have a defer on a given resource you don’t need an errdefer.
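
The scoping rule is easy to check with a small sketch. Everything here is hypothetical and only for illustration: `failingStep` always fails, and `demo` records through two flags which errdefer handlers actually ran.

```zig
const std = @import("std");

/// Hypothetical step that always fails.
fn failingStep() !void {
    return error.Boom;
}

/// Sets `inner_ran` / `outer_ran` if the corresponding errdefer fires.
fn demo(inner_ran: *bool, outer_ran: *bool) !void {
    {
        // Last statement in this block: nothing after it in the block
        // can fail, so this handler never fires.
        errdefer inner_ran.* = true;
    }
    // Registered in the function's own scope, so it covers the `try` below.
    errdefer outer_ran.* = true;
    try failingStep();
}
```

After `demo` returns `error.Boom`, only the outer flag should be set, which is the same reason the errdefer inside the `orelse` block could not protect the buffer.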

That should be easy. You can just check if the previous length was 0:

var new_len = <data block length>;
if(data_buf.len != 0) new_len *= 2; // allocate twice the size we need
data_buf = try alloc8r.realloc(data_buf, new_len);
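
Wrapped up as a helper, that could look roughly like the sketch below. The name `ensureRoom` and its parameters are made up for this example; it also includes the "is the buffer already big enough?" check from the loop, so it can be called unconditionally.

```zig
const std = @import("std");

/// Hypothetical helper: returns a buffer that can hold at least `needed` bytes.
/// The first allocation is exact; later growth allocates twice what is needed.
fn ensureRoom(allocator: std.mem.Allocator, buf: []u8, needed: usize) ![]u8 {
    // Already big enough: keep the existing buffer.
    if (buf.len >= needed) return buf;
    // An empty slice means nothing has been allocated yet.
    const new_len = if (buf.len == 0) needed else needed * 2;
    return allocator.realloc(buf, new_len);
}
```

In the loop this would be used as `data_buf = try ensureRoom(alloc8r, data_buf, <data block length>);`, so equally sized blocks never trigger a second allocation.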

Thank you for your reply, it really gives me some food for thought. I have never considered zero-length slices and what they could be good for.

And you’re right, having errdefer inside of a block that ends on the next line is not too useful, if you think about it.


Not sure if you’re intentionally trying to avoid using ArrayList, but it seems pretty well-suited for this sort of thing.

var buf = ArrayList(u8).init(allocator);
defer buf.deinit();

while (<more data to read>) {
    if (<got a data block we want to keep>)
    {
        buf.clearRetainingCapacity();
        try buf.ensureTotalCapacity(<data block length>);

        // Read the data into `buf`. Could use `writer()`,
        // appendAssumeCapacity(), or write into the return of
        // unusedCapacitySlice() and set buf.items.len to
        // the number of bytes written (or call resize())
    }
}
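
For reference, a filled-in version of the same idea might look like this. It is only a sketch and assumes a Zig release where `std.ArrayList(T).init(allocator)` is available (0.14 and earlier; the ArrayList API changed in later releases). The function `lastBlockLen` and its `blocks` parameter are hypothetical stand-ins for the read loop.

```zig
const std = @import("std");

/// Hypothetical stand-in for the read loop: keeps only the most recent
/// block in `buf` and returns its length.
fn lastBlockLen(allocator: std.mem.Allocator, blocks: []const []const u8) !usize {
    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    for (blocks) |block| {
        // Drop the old contents but keep the allocation for reuse.
        buf.clearRetainingCapacity();
        // Grows only when the current capacity is too small.
        try buf.ensureTotalCapacity(block.len);
        // Capacity is guaranteed by the line above, so this cannot fail.
        buf.appendSliceAssumeCapacity(block);
    }
    return buf.items.len;
}
```

Note that `ensureTotalCapacity` may reserve more than requested; if the exact-size first allocation matters, `ensureTotalCapacityPrecise` is the variant to look at.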

Note also that the ‘empty slice’ stuff mentioned by @IntegratedQuantum is also used by the ArrayList implementation.


Well, not intentionally. But ArrayLists (in different languages, like Java, etc., and not always known by this exact name) and buffers I use for raw data have always lived in different sections of my mind.

Maybe this is because I (almost?) never dealt with raw data in languages that have ArrayList.