I have been playing around with file reading and found I like using a FixedBufferAllocator over a simple read into a buffer, since the returned slice carries its own length rather than depending on the buffer it is held in, so I don’t have to figure out where the input ends in my buffer manually. But is there a speed penalty for constantly allocating and freeing with the FBA?
Standard file read:
const file = try std.fs.cwd().openFile("example_file.txt", .{ .mode = .read_only });
defer file.close();
var buffer: [200]u8 = [_]u8{0} ** 200;
var buffered_reader = std.io.bufferedReader(file.reader());
const buf_reader = buffered_reader.reader();
while (try buf_reader.readUntilDelimiterOrEof(&buffer, '\n') != null) {
    // doing things here
    @memset(&buffer, 0); // clear the buffer: replace all values with 0
}
With FBA:
const file = try std.fs.cwd().openFile("example_file.txt", .{ .mode = .read_only });
defer file.close();
var buffer: [200]u8 = undefined;
var fba = std.heap.FixedBufferAllocator.init(&buffer);
const allocator = fba.allocator();
var buffered_reader = std.io.bufferedReader(file.reader());
const buf_reader = buffered_reader.reader();
var file_line: ?[]u8 = try buf_reader.readUntilDelimiterOrEofAlloc(allocator, '\n', 200);
while (file_line != null) {
    // do things here
    allocator.free(file_line.?);
    file_line = try buf_reader.readUntilDelimiterOrEofAlloc(allocator, '\n', 200);
}
I don’t think there is a meaningful penalty. Really, all an FBA does is return a pointer to some array of bytes and increase its internal index by that number of bytes plus alignment. As for the free, I’m not sure if it is a no-op or if it resets its index to the beginning of the array of bytes, but in any case you are looking at a very small penalty at worst. Depending on your application, I also think the FBA is more resilient in the sense that you can easily change your allocation strategy and add logging capabilities. So even if there is a small but noticeable impact (which I heavily doubt), I think an FBA is still the best approach.
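To make that concrete, here is a rough sketch of the bookkeeping as I understand it (the names and sizes are placeholders, and I’m going from memory of the std.heap API, so double-check the details):

const std = @import("std");

pub fn main() !void {
    var buffer: [200]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);
    const allocator = fba.allocator();

    // alloc just bumps fba.end_index forward by the (aligned) request size
    const bytes = try allocator.alloc(u8, 64);
    std.debug.print("end_index after alloc: {d}\n", .{fba.end_index});

    // freeing the most recent allocation moves the index back;
    // freeing anything older is effectively a no-op
    allocator.free(bytes);
    std.debug.print("end_index after free: {d}\n", .{fba.end_index});

    // reset() rewinds the whole buffer in one step
    fba.reset();
}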
Don’t you think the FBA is a more flexible approach? One thing I believe the FBA helps with is communicating intent clearly, but of course you are also right that it’s technically more complex. I can also see the flip side of your argument regarding the error: technically speaking, the “running out of memory” part can also be an upside if you want to write something safer, right?
So normally yes, that is the case and what I would do, but I am writing a FASTA file parser, and the standard format specifies a maximum line length of 120. (I started with 200 since I didn’t fully understand whether that was the most up-to-date standard.)
The reason for an FBA is that, with a standard read into a buffer, the name headers are longer than the lines of gene code, and when I don’t do a cleaning operation I get holdover from previous reads left in the buffer. So an output would look like
and so on.
Maybe using the capture removes this issue? I haven’t tried that yet, but the FBA was the immediate solution I thought of and was easy enough to implement.
Yes, the part of the buffer that is filled is returned by readUntilDelimiterOrEof as a slice and captured as |line|.
You are always going to get the exact contents of the line.
The buffer itself is bigger and contains the line followed by garbage, but the returned slice only covers the line.
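So for your loop, a sketch along these lines (reusing your file name and buffer size, untested) prints exactly the current line on each iteration with no clearing step:

const file = try std.fs.cwd().openFile("example_file.txt", .{ .mode = .read_only });
defer file.close();
var buffer: [200]u8 = undefined;
var buffered_reader = std.io.bufferedReader(file.reader());
const buf_reader = buffered_reader.reader();
while (try buf_reader.readUntilDelimiterOrEof(&buffer, '\n')) |line| {
    // `line` is a slice of `buffer` covering only this line,
    // so leftover bytes from a longer previous line are never visible
    std.debug.print("{s}\n", .{line});
}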
I see. I still don’t fully understand captures, so that is good to know. I thought they were just something used in for loops. Does that mean that in any conditional statement that returns a value, that value can be captured?
ex. (ignoring potential syntax errors)
fn is_odd(x: isize) bool {
    if (x % 2 == 0) return false;
    return true;
}

if (is_odd(x)) |output| {
    print("{d} is odd = {any}\n", .{ x, output });
}
Bookmarking that for future reference when dealing with this more.
So it looks like more reading for me, but I think I’m getting the use case for captures a bit better.
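If I’m reading the docs right, captures hang off optionals and error unions rather than plain bools, so my is_odd example above wouldn’t actually compile, but something like this (with a made-up findEven helper, purely for illustration) would:

const std = @import("std");

fn findEven(values: []const isize) ?isize {
    for (values) |v| {
        if (@rem(v, 2) == 0) return v;
    }
    return null;
}

pub fn main() !void {
    const values = [_]isize{ 3, 7, 8, 5 };

    // optional: the capture unwraps the non-null payload
    if (findEven(&values)) |even| {
        std.debug.print("first even value: {d}\n", .{even});
    }

    // error union: |value| on success, |err| in the else branch
    if (std.fmt.parseInt(isize, "42", 10)) |value| {
        std.debug.print("parsed {d}\n", .{value});
    } else |err| {
        std.debug.print("parse failed: {s}\n", .{@errorName(err)});
    }
}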