How to use toOwnedSlice() and alternatives

mscott9437 · December 14, 2023, 8:42pm

So recently I was looking into this function. I found it to be a way to say in the code that I’m done with the ArrayList by storing the items field in a slice while simultaneously freeing the ArrayList. This also cleans the code up pretty well and helps to organize the ArrayLists. So basically I have two questions following up on this. #1 What are some typical use cases for toOwnedSlice/fromOwnedSlice and #2 What would be a way to trivially accomplish this with Zig primitives?

dude_the_builder · December 14, 2023, 11:53pm

My experience with ArrayList and toOwnedSlice has been as a really convenient gathering mechanism within a function. For example, let’s say you have a lexer that produces tokens like an iterator by calling the next method. You want to gather up all the tokens in a []Token that you can easily pass around or store in a struct. To accomplish the same using just an allocator would be tedious and error-prone. The ArrayList makes the code easy to write and read.

fn lex(self: *Lexer, allocator: Allocator) ![]Token {
    var list = std.ArrayList(Token).init(allocator);
    defer list.deinit(); // just in case
    while (try self.next()) |token| try list.append(Token);
    return try list.toOwnedSlice();
}

Why the just in case comment on the call to deinit? I prefer to always setup a call to deinit even if at this point it’s a no-op. You never know if in the future ArrayList may have something more it needs to clean up on deinit (beyond just the slice of items), and it doesn’t do much harm to make this call anyway.

squeek502 · December 15, 2023, 12:06am

Note that you can use errdefer instead and it’ll give you the same behavior without the redundant deinit call on success.

(fun side note: there was a very brief time when defer was necessary: `ArrayList.toOwnedSlice`: Fix potential for leaks when using errdefer by squeek502 · Pull Request #13947 · ziglang/zig · GitHub)

squeek502 · December 15, 2023, 12:25am

As @dude_the_builder said, toOwnedSlice is mostly a convenience for ending up with a slice with a precise length without needing to know the precise length ahead-of-time. In terms of what the pattern would look like without ArrayList/toOwnedSlice, it’d be something like this:

// Since resizing is inconvenient without ArrayList, we start with a large
// initial size
var buf = allocator.alloc(T, 512);
errdefer allocator.free(buf);

var i: usize = 0;
// Same sort of thing as dude_the_builder's example
while (some_iterator.next()) |entry| {
    // ArrayList will auto-resize if there is more space needed, but
    // this implementation would need to handle that case manually
    // (which is not done here, this example would hit index-of-out-bounds
    // if there end up being more than 512 entries)
    buf[i] = entry;
    i += 1;
}

// Now we want to return a slice containing only the values that were actually
// populated, but if we just return buf[0..i], then when the caller
// frees it, the length of the slice won't match the originally allocated length
// (which is something that the allocator interface expects and e.g.
// the GeneralPurposeAllocator will complain about ["Allocation size does
// not match free size"]).
//
// So, we resize the allocation down to the populated slice and return that.
// This is what `toOwnedSlice` does for you (but it tries to resize
// in-place first and then only re-allocates if the resize isn't possible)
const populated_slice = try allocator.realloc(buf, i);
return populated_slice;

gonzo · December 15, 2023, 10:57am

A question, for my own sanity. It seems clear from the code (particularly from the redundant but very informative defer list.deinit(); line) that the ArrayList itself will be deallocated upon returning from the function. But what about the slice you are returning? Doesn’t that have to be freed by the caller at some point? If yes, what is the exact incantation for doing that?

dude_the_builder · December 15, 2023, 11:44am

Yes, the caller must free those returned bytes with the same allocator they passed in. This warrants a good doc comment explaining this for the function:

// Caller owns and must free the returned slice with `allocator`.
fn lex(self: *Lexer, allocator: Allocator) ![]Token {

Then the caller should

var tokens = try lexer.lex(allocator);
defer allocator.free(tokens);

mscott9437 · December 17, 2023, 12:09am

So based on everybody’s responses, I was able to go back and put together an example of where I was considering using toOwnedSlice.

There are 2 ArrayLists here. input which represents one line of input from a console, and bytes which represents all the lines concatenated into one list. So after every line of input the items field of input is added to bytes using appendSlice(). I found it easier to use 2 ArrayLists because you can track each length independently, and you don’t have to worry about calculating any relative lengths.

var input = std.ArrayList(u8).init(allocator);
var bytes = std.ArrayList(u8).init(allocator);

defer input.deinit(); // errdefer?
defer bytes.deinit();

while (reader.streamUntilDelimiter(input.writer(), '\n', 800) != error.EndOfStream) {
   try input.append('\n');

   try bytes.appendSlice(input.items); // appendslice(try input.toOwnedSlice)
   input.clearRetainingCapacity();
}

First I have one question which is would you use errdefer list.deinit() when you are not calling toOwnedSlice(), or or would it be good to use it here?

Next, I was considering using toOwnedSlice with the appendSlice, since it would also release the memory here. After that I wouldn’t need clearRetainingCapacity or defer deinit. But based on what I got out of everybody’s replies to my original post, toOwnedSlice in this case would be overkill, sine it’s being called from an inner block relative to where the ArrayList is defined. It would almost be like calling deinit over and over again on the ArrayList, when it’s just needed to be called once from the outer block. Also without the defer deinit on the outer block, it’s possible to break out of the loop before toOwnedSlice is called. In this case the memory would not be freed.

Also to mention from what I can tell from the lexer, toOwnedSlice seems to be more useful this kind of situation, maybe where you need to pass around some arbitrary data and have it behave as expected.

So after thinking about it some more and looking at some of the implementations used here, I decided it would be better to go with this pattern.

//defer deinit

while (reader.streamUntilDelimiter(input.writer(), '\n', 800) != error.EndOfStream) {
   try input.append('\n');

   try bytes.appendSlice(input.items);
   input.clearAndFree();
}

I think this is a good middle ground since clearAndFree does just a step more than clearRetainingCapacity, with no noticeable performance impact in this case. Every line of input is processed individually when pressing Enter, so there is opportunity there for some minimal cleanup.

Finally I had another idea along the way. Not sure if this has already been suggested, but would it be possible to have something like std.ArrayList(u8).initAndDefer(allocator)? Just curious

AndrewCodeDev · December 17, 2023, 12:29am

I definitely think you’re conceptually on track with what you’re proposing here. Here’s an example I made to help clarify what happens with defer and errdefer:

const err = error {
    err_test
};
pub fn makeError() !void {
    return err.err_test;
}
pub fn errDeferFunc() void {
    std.debug.print("\nI am an errdefer.\n", .{});
}
pub fn deferFunc() void {
    std.debug.print("\nI am a defer.\n", .{});
}

pub fn testDeferrals(make_error: bool) !void {

    // this will run regardless of creating an error
    defer deferFunc();

    // this will run only if there is an error
    errdefer errDeferFunc();

    // the end result is that defer will always run
    // and errdefer will only run if there's an error

    if (make_error) {
        try makeError();
    }
}

pub fn main() !void {    
    std.debug.print("\n\n-- Without an error -- \n\n", .{});
    testDeferrals(false) catch { };
    std.debug.print("\n\n-- With an error -- \n\n", .{});
    testDeferrals(true) catch { };
}

Essentially, errdefer simply means that it’s only going to fire if there’s an error. If you want something to be invoked regardless of whether or not there was an error at a given scope, use defer.

The problem with initAndDefer is where would the defer statement be placed? It’s not being placed after the function call, so it would need to be placed within the function call. So the defer statement would be called before the function itself exits.

More seriously though, I think what you’re looking for is something like a destructor. Zig very consciously chose to not have those, but it’s a common feature in other languages - it’s just a function that gets called when the object itself goes out of scope (like every object you create has it’s own deinit statement attached to it).

Help me understand your reasoning behind clearAndFree instead of clearRetainingCapacity? appendSlice copies it’s arguments:

        /// Append the slice of items to the list. Allocates more
        /// memory as necessary.
        /// Invalidates pointers if additional memory is needed.
        pub fn appendSlice(self: *Self, items: []const T) Allocator.Error!void {
            try self.ensureUnusedCapacity(items.len);
            self.appendSliceAssumeCapacity(items);
        }
        /// Append the slice of items to the list, asserting the capacity is already
        /// enough to store the new items. **Does not** invalidate pointers.
        pub fn appendSliceAssumeCapacity(self: *Self, items: []const T) void {
            const old_len = self.items.len;
            const new_len = old_len + items.len;
            assert(new_len <= self.capacity);
            self.items.len = new_len;
            @memcpy(self.items[old_len..][0..items.len], items);
        }

Since it’s copying the memory over, it really doesn’t matter what happens to input.items at that point. Why call free repeatedly when you can just reuse the same memory for each iteration of the while loop? I may not be understanding your intention here.

Edit: Ah, I see, you want to clean up after every iteration. I’d argue it’s best to just do that once when you’re done taking in input.

dude_the_builder · December 17, 2023, 1:17am

Thinking about what you want to do with the 2 array lists, I realized that you can achieve the same with just a single list:

while (true) {
    reader.streamUntilDelimiter(bytes.writer(), '\n', 1024) catch |err| {
        if (err == error.EndOfStream) break;
        return err;
    };
    try bytes.append('\n');
}

mscott9437 · December 17, 2023, 1:26am

The reason I brought in the second ArrayList was do deal specifically with multiline vs single line input. Here is the full code where I tried to re-implement multiline input. End the line with ‘\’ to combine the following line. An empty line will cause all the bytes stored to be output to the console and reset.

const std = @import("std");

pub fn main() !void {
   const allocator = std.heap.page_allocator;

   const reader = std.io.getStdIn().reader();
   const writer = std.io.getStdOut().writer();

   var input = std.ArrayList(u8).init(allocator);
   var bytes = std.ArrayList(u8).init(allocator);

   defer input.deinit();
   defer bytes.deinit();

   try writer.print("> ", .{ });

   while (reader.streamUntilDelimiter(input.writer(), '\n', 800) != error.EndOfStream) {
      if (input.items.len > 0 and input.items[input.items.len - 1] == '\r') {
         input.shrinkRetainingCapacity(input.items.len - 1);
      }

      if (input.items.len == 0) {
         try input.append('\n');

         try bytes.appendSlice(input.items);
         input.clearRetainingCapacity();

         for (bytes.items) |c| {
            try writer.print("{c}", .{ c });
         }

         bytes.clearRetainingCapacity();

         try writer.print("> ", .{ });
      }

      else if (input.items[input.items.len - 1] == '\\') {
         input.shrinkRetainingCapacity(input.items.len - 1);

         try bytes.appendSlice(input.items);
         input.clearRetainingCapacity();

         try writer.print("+ ", .{ });
      }

      else {
         try input.append('\n');

         try bytes.appendSlice(input.items);
         input.clearRetainingCapacity();

         try writer.print("> ", .{ });
      }
   }
}

Also when using one ArrayList you would have to compare the new length of the bytes array to the old one to get the length of the current line. With a second ArrayList you already have input.items.len

Maybe there is a way to do this mulltiline input with one ArrayList. I tried several times but did not get it right yet.

AndrewCodeDev · December 17, 2023, 1:52am

     for (bytes.items) |c| {
        try writer.print("{c}", .{ c });
     }

@dude_the_builder I believe you can just print the whole slice here as as string, no? I’m not at my main computer but it looks like that may be an option here. I know there’s a way to dump the whole thing in because it’s calling std.fmt.format under the hood.

Otherwise, @mscott9437, I think you’re example is looking pretty good. You’re managing memory more efficiently and it looks like clean code. If you need the second array (or it just serves your problem well and you’re happy with the performance), then I’d say you’re fundamentally understanding the toOwnedSlice vs alternatives issue.

dude_the_builder · December 17, 2023, 2:05am

You’re right @AndrewCodeDev , since bytes.items is a slice of u8, that’s a string in Zig!

try writer.print("{s}", bytes.items);

As a side note @mscott9437 , if you were to keep the byte by byte version, I’d recommend the {u} format specifier which will support Unicode versus {c} that only works with ASCII.

mscott9437 · December 17, 2023, 2:14am

I think everything was covered pretty well in all the responses. There’s still a question of whether I want to use bytes.appendSlice(input.items) or something else. So that can be for another thread. Also I might use a for loop to print each character individually, rather than printing the slice all at once. So I can operate on each character. But in this case yeah I was just printing the slice.

squeek502 · December 17, 2023, 2:22am

Wouldn’t it need to use Utf8Iterator for {u} to actually do something? Iterating byte-by-byte will only be able to represent codepoints 0...255 (and it would be misinterpreting bytes > 127 if the slice is UTF-8).

dude_the_builder · December 17, 2023, 2:41am

Yes, that’s correct . You would need to iterate over the code points and not the bytes.

dude_the_builder · December 17, 2023, 12:38pm

After looking at the source of toOwnedSlice and clearRetainingCapacity, it seems to me that your current code is much more efficient, given that toOwnedSlice does a whole lot more (including possible re-allocation) than clearRetainingCapacity.

mscott9437 · December 17, 2023, 1:57pm

Yeah basically I decided that toOwnedSlice was overkill for this case. I also tried swapping clearRretainingCapacity for clearAndFree, but as @AndrewCodeDev pointed out appendSlice already does a copy. So i am still kindof where I started, since my original idea was to see if there was anything else I could do besides simply appending the items field and calling clearRetainingCapacity. I noticed the implementation of toOwnedSlice calls allocatedSlice, which returns the slice plus the extra capacity by using the slice syntax, so it doesn’t copy anything.

        /// Returns a slice of all the items plus the extra capacity, whose memory
        /// contents are `undefined`.
        pub fn allocatedSlice(self: Self) Slice {
            // `items.len` is the length, not the capacity.

            return self.items.ptr[0..self.capacity];
        }

So there might be something I can do with that here. Also there is still the question of whether or not I really need 2 ArrayLists. But either way I am close enough. The main question I had was about toOwnedSlice() and what it’s good for, which has been cleared up pretty well for me now based on everybody’s initial responses.

Edit: There’s also a side issue which provided some motivation for the original post, which is using the pattern of calling defer deinit right after initializing the ArrayList. Originally I was experimenting with toOwnedSlice as a way to use an ArrayList without defer deinit(), which didn’t hold up. But that is one other thing I was looking into here.