SliceBuilder style data structure

Hello there,
I have been using Zig for over 2 years now and did quite a lot in it.
Love the language and most of the decisions that have been made so far.
In my time using the language, I have build some custom data-structures that were not supported by standard library. Most of them were usecase specific, but one was comming back all the time . . .

SliceBuilder

Its quite simple, you have possibly runtime known size of a slice you want to create and combine your data in. You want to do it in a performant and debug safety checked way. Without data-structure, the code for it is often littered with @memcpy() math and if you are adding more then few items, its easy to make a mistake in it.

There are currently to my knowledge BoundedArray and FixedBufferStream, but both serve a bit different purpose and have runtime checks.

The MAIN QUESTION is, does it make sense for something like this to be in standard library? I find it really common pattern, but would love to get opinion from someone else. Also for code improvements!

This is my really simple implementation (unmanaged version without specified aligment), just to get the idea:

const std = @import("std");

pub fn SliceBuilder(comptime T: type) type {
    return struct {
        data: []T,
        used: usize,

        pub fn init(allocator: std.mem.Allocator, size: usize) !@This() {
            return .{
                .data = try allocator.alloc(T, size),
                .used = 0,
            };
        }

        pub fn deinit(self: *@This(), allocator: std.mem.Allocator) void {
            allocator.free(self.data);
        }

        pub fn appendSingle(self: *@This(), value: T) void {
            std.debug.assert(self.used < self.data.len);
            self.data[self.used] = value;
            self.used += 1;
        }

        pub fn appendSlice(self: *@This(), value: []const T) void {
            std.debug.assert(self.used + value.len <= self.data.len);
            @memcpy(self.data[self.used..][0..value.len], value);
            self.used += value.len;
        }

        pub fn toSlice(self: @This()) []T {
            std.debug.assert(self.used == self.data.len);
            return self.data;
        }

        pub fn toSliceOwned(self: @This(), allocator: std.mem.Allocator) ![]T {
            std.debug.assert(self.used == self.data.len);
            const ret = try allocator.alloc(T, self.data.len);
            @memcpy(ret, self.data);
            return ret;
        }
    };
}

test "SliceBuilder init" {
    const StringBuilder = SliceBuilder(u8);

    const random_string_2 = "o World!";

    var builder = try StringBuilder.init(std.testing.allocator, random_string_2.len + 4);
    defer builder.deinit(std.testing.allocator);

    builder.appendSlice("Hell");
    builder.appendSlice(random_string_2);

    try std.testing.expectEqualStrings("Hello World!", builder.toSlice());
    try std.testing.expectEqual(12, builder.toSlice().len);

    const my_slice = try builder.toSliceOwned(std.testing.allocator);
    defer std.testing.allocator.free(my_slice);

    try std.testing.expectEqualStrings("Hello World!", my_slice);
    try std.testing.expectEqual(12, my_slice.len);
}

test "SliceBuilder init fail" {
    const StringBuilder = SliceBuilder(u8);

    try std.testing.expectError(
        std.mem.Allocator.Error.OutOfMemory,
        StringBuilder.init(std.testing.failing_allocator, 12),
    );
}

test "SliceBuilder appendSingle()" {
    const StringBuilder = SliceBuilder(u8);

    var builder = try StringBuilder.init(std.testing.allocator, 5);
    defer builder.deinit(std.testing.allocator);

    builder.appendSingle('H');
    builder.appendSingle('e');
    builder.appendSingle('l');
    builder.appendSingle('l');
    builder.appendSingle('o');

    try std.testing.expectEqualStrings("Hello", builder.toSlice());
    try std.testing.expectEqual(5, builder.toSlice().len);

    const my_slice = try builder.toSliceOwned(std.testing.allocator);
    defer std.testing.allocator.free(my_slice);

    try std.testing.expectEqualStrings("Hello", my_slice);
    try std.testing.expectEqual(5, my_slice.len);
}

test "SliceBuilder appendSlice()" {
    const StringBuilder = SliceBuilder(u8);

    var builder = try StringBuilder.init(std.testing.allocator, 12);
    defer builder.deinit(std.testing.allocator);

    builder.appendSlice("Hel");
    builder.appendSlice("lo");
    builder.appendSlice(" ");
    builder.appendSlice("Wo");
    builder.appendSlice("rld");
    builder.appendSlice("!");

    try std.testing.expectEqualStrings("Hello World!", builder.toSlice());
    try std.testing.expectEqual(12, builder.toSlice().len);

    const my_slice = try builder.toSliceOwned(std.testing.allocator);
    defer std.testing.allocator.free(my_slice);

    try std.testing.expectEqualStrings("Hello World!", my_slice);
    try std.testing.expectEqual(12, my_slice.len);
}

test "SliceBuilder append()" {
    const StringBuilder = SliceBuilder(u8);

    var builder = try StringBuilder.init(std.testing.allocator, 12);
    defer builder.deinit(std.testing.allocator);

    builder.appendSlice("Hel");
    builder.appendSlice("lo");
    builder.appendSingle(' ');
    builder.appendSlice("Wo");
    builder.appendSlice("rld");
    builder.appendSingle('!');

    try std.testing.expectEqualStrings("Hello World!", builder.toSlice());
    try std.testing.expectEqual(12, builder.toSlice().len);

    const my_slice = try builder.toSliceOwned(std.testing.allocator);
    defer std.testing.allocator.free(my_slice);

    try std.testing.expectEqualStrings("Hello World!", my_slice);
    try std.testing.expectEqual(12, my_slice.len);
}

I feel like this use case is already covered by ArrayListUnmanaged with initBuffer() in the std library. Although your API is definitely more clear/convenient for what you’re trying to do.

1 Like

Even without initBuffer, ArrayList can give you a slice back with toOwnedSlice
For example the first test with array list:

const StringBuilder = std.ArrayListUnmanaged(u8);

const random_string_2 = "o World!";

var builder = try StringBuilder.initCapacity(std.testing.allocator, random_string_2.len + 4);
defer builder.deinit(std.testing.allocator);

builder.appendSliceAssumeCapacity("Hell");
builder.appendSliceAssumeCapacity(random_string_2);

try std.testing.expectEqualStrings("Hello World!", builder.items);
try std.testing.expectEqual(12, builder.items.len);

const my_slice = try builder.toOwnedSlice(std.testing.allocator);
defer std.testing.allocator.free(my_slice);

try std.testing.expectEqualStrings("Hello World!", my_slice);
try std.testing.expectEqual(12, my_slice.len);
1 Like

You are correct, its really similar in function.
BUT
There is also an argument, that BoundedArray is also replaceable by ArrayListUnmanaged with overflow check returning an error.
Its just convinience of not having to do some checking before calls, similar to the way SliceBuilder takes common pattern, that appears often and makes it nicer.

I am not trying to push for its inclusion, i am just asking since its nice quality of life for this kind of operation, if it could be included in standard library.
It has nice checking that all space was filled before accessing it, since that is a programmer error in case its not.

Similar, if it had added for example writer interface (and other methods it might want), it could have some nice debug build checks there too, for example non-growable writer without any release checks. Since in ideal situation there is no way of it failing, the size should be known at initialization time, if its incorrect its again programmer error and should be handeled in safe builds, no?

As a bonus, it comunicates the intent precisely.

ArrayListUnmanaged with initBuffer() is such a horrible hack. It throws away invariants and type safety, and there’s nothing that tells you that the ArrayList is in this mode, other than just reading the code. It’s a loaded footgun.
ArrayList does 2 things, manage memory and manage lifetimes. initBuffer is an ad-hoc attempt of getting just the lifetime management, without the memory management. A proper solution would been creating something that manages lifetimes, another that manages memory, and then composing them.

1 Like

I agree, still I think having multiple data structures in the stdlib that do essentially the same thing is not a good solution either. There’s already like half a dozen ring buffers in there…
Reimplementing ArrayList as a composition and exposing its composites is definitely something I could get behind though.

1 Like

The point of bounded array is that it contains the array itself and thus can be copied around as a value.

Anyhow, you will probably like the BufferedWriter from the writer/reader refactor branch.