Loading a file to a string

Quick question on allocRemaining. In my 0.15.x code I did this to read a file to a string…

    // Read the entire file into a string
    const file_size = try file.getEndPos();
    if (file_size > std.math.maxInt(usize)) return error.FileTooLarge;
    const file_content = try file.readToEndAlloc(allocator, @intCast(file_size));

In porting to 0.16 I did…

    const file_size = try file.length(io);
    var file_reader = file.reader(io, &.{});
    const file_content = try file_reader.interface.allocRemaining(allocator, .limited(file_size));

but it seems that, due to the behaviour of appendRemainingAligned, this won’t work. The last read of the file will return error.StreamTooLong and the caller will fail, even though it had read the entire file and not exceeded the limit.

Just adding 1 to the file size or changing to .unlimited fixes the problem.


    const file_content = try file_reader.interface.allocRemaining(allocator, .limited(file_size + 1));
// OR 
    const file_content = try file_reader.interface.allocRemaining(allocator, .unlimited);

The plus one trick is fine because it gives the desired behaviour of avoiding multiple allocations as the buffer grows, in exchange for a system call to get the file size.

Is this surprising behaviour or is it just me? I.e. should I just deal with it, or is there a clearer way for me to express the intent than just adding one?


tldr: just use the +1 case, or preallocate the buffer since you know how much to read

allocRemaining tries to read from the current position (here, the start of the file) to EOF, so it will read the entire file, while the limit gives allocRemaining an upper bound, which in this case is exactly the length of the file. Now, your code says allocRemaining must read the entire file, but the limit is exhausted the moment the last byte is read, so the loop exits before it can ever observe EndOfStream and returns error.StreamTooLong even though nothing was actually truncated.

so, the case where +1 is used matches your desired effect
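For the preallocation route, here is a sketch. It assumes the 0.16-style calls quoted elsewhere in this thread (file.length, file.reader, and the interface’s readSliceShort); the helper name and error handling are made up:

    // Hypothetical helper: read a whole file into one exactly-sized buffer.
    fn readWholeFile(allocator: std.mem.Allocator, file: std.fs.File, io: std.Io) ![]u8 {
        const file_size = try file.length(io);
        var file_reader = file.reader(io, &.{});
        const buf = try allocator.alloc(u8, file_size);
        errdefer allocator.free(buf);
        // readSliceShort stops at EOF; a short count means the file shrank
        // after file.length() was sampled.
        const n = try file_reader.interface.readSliceShort(buf);
        return if (n == buf.len) buf else allocator.realloc(buf, n);
    }

One allocation in the common case, and no StreamTooLong to reason about.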


Thank you, that helps. It does what it says on the tin.

In this case you don’t want a limit, use .unlimited.
Not even for preallocating the buffer; the interface will do that on its own.

Won’t unlimited do a number of allocations rather than just one as the loading progresses?

If you want to limit allocations, why not allocate the buffer yourself and pass it as the buffer to a File.Reader?

There is the TOCTOU issue always to be aware of. On any multithreaded operating system, the size of a file can change between the time you request the size and the time you read, so it is not always possible to guarantee a single read to a single contiguous buffer.
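Under that caveat, the +1 trick at least turns the race into a detectable error rather than a silent truncation. A sketch, reusing the snippet from the first post (the error name is made up):

    const file_size = try file.length(io);
    var file_reader = file.reader(io, &.{});
    const file_content = file_reader.interface.allocRemaining(allocator, .limited(file_size + 1)) catch |err| switch (err) {
        // More than file_size bytes arrived: the file grew after length().
        error.StreamTooLong => return error.FileGrewDuringRead,
        else => |e| return e,
    };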


Nope! It will do a single allocation of the remaining file size!
Unless the file size can’t be obtained, but you already fetched it successfully, so that clearly works for your file.

If you follow the function calls:

    pub fn allocRemaining(r: *Reader, gpa: Allocator, limit: Limit) LimitedAllocError![]u8 {
        var buffer: ArrayList(u8) = .empty;
        defer buffer.deinit(gpa);
        try appendRemaining(r, gpa, &buffer, limit);
        return buffer.toOwnedSlice(gpa);
    }
    pub fn appendRemaining(
        r: *Reader,
        gpa: Allocator,
        list: *ArrayList(u8),
        limit: Limit,
    ) LimitedAllocError!void {
        return appendRemainingAligned(r, gpa, .of(u8), list, limit);
    }
    pub fn appendRemainingAligned(
        r: *Reader,
        gpa: Allocator,
        comptime alignment: std.mem.Alignment,
        list: *std.array_list.Aligned(u8, alignment),
        limit: Limit,
    ) LimitedAllocError!void {
        // THIS IS IMPORTANT
        var a = std.Io.Writer.Allocating.fromArrayListAligned(gpa, alignment, list);
        defer list.* = a.toArrayListAligned(alignment);

        var remaining = limit;
        while (remaining != .nothing) {
            const n = stream(r, &a.writer, remaining) catch |err| switch (err) {
                error.EndOfStream => return,
                error.WriteFailed => return error.OutOfMemory,
                error.ReadFailed => return error.ReadFailed,
            };
            remaining = remaining.subtract(n).?;
        }
        return error.StreamTooLong;
    }
    pub fn stream(r: *Reader, w: *Writer, limit: Limit) StreamError!usize {
        const buffer = limit.slice(r.buffer[r.seek..r.end]);
        if (buffer.len > 0) {
            @branchHint(.likely);
            const n = try w.write(buffer);
            r.seek += n;
            return n;
        }
        const n = try r.vtable.stream(r, w, limit);
        assert(n <= @intFromEnum(limit));
        return n;
    }

You can see that it just calls stream with the reader and a Writer.Allocating (made from an array list).
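To make the failure mode concrete, here is the appendRemainingAligned loop traced for a 10-byte file with .limited(10) (comments only, following the code above):

    // remaining = 10
    // stream() copies 10 bytes into the Allocating writer -> n = 10
    // remaining = 10 - 10 = .nothing -> loop exits
    // error.EndOfStream was never returned, so we fall through to
    // `return error.StreamTooLong;` even though the whole file was read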

Jumping to the File reader implementation:

    fn stream(io_reader: *Io.Reader, w: *Io.Writer, limit: Io.Limit) Io.Reader.StreamError!usize {
        const r: *Reader = @alignCast(@fieldParentPtr("interface", io_reader));
        return streamMode(r, w, limit, r.mode);
    }
    pub fn streamMode(r: *Reader, w: *Io.Writer, limit: Io.Limit, mode: Mode) Io.Reader.StreamError!usize {
        switch (mode) {
            .positional, .streaming => return w.sendFile(r, limit) catch |write_err| switch (write_err) {
                error.Unimplemented => {
                    r.mode = r.mode.toSimple();
                    return 0;
                },
                else => |e| return e,
            },
            .positional_simple => {
                const dest = limit.slice(try w.writableSliceGreedy(1));
                var data: [1][]u8 = .{dest};
                const n = try readVecPositional(r, &data);
                w.advance(n);
                return n;
            },
            .streaming_simple => {
                const dest = limit.slice(try w.writableSliceGreedy(1));
                var data: [1][]u8 = .{dest};
                const n = try readVecStreaming(r, &data);
                w.advance(n);
                return n;
            },
            .failure => return error.ReadFailed,
        }
    }

In positional and streaming modes, it calls sendFile on the writer. This gives the writer access to the File.Reader implementation!!!

Jumping to the Writer.Allocating implementation:

    // for completeness, `Writer.sendFile`
    pub fn sendFile(w: *Writer, file_reader: *File.Reader, limit: Limit) FileError!usize {
        return w.vtable.sendFile(w, file_reader, limit);
    }
    // now Allocating's implementation
    fn sendFile(w: *Writer, file_reader: *File.Reader, limit: Limit) FileError!usize {
        if (File.Handle == void) return error.Unimplemented;
        if (limit == .nothing) return 0;
        const a: *Allocating = @fieldParentPtr("writer", w);
        // gets the *remaining* size of the file, or default if error
        const pos = file_reader.logicalPos();
        const additional = if (file_reader.getSize()) |size| size - pos else |_| std.atomic.cache_line;
        if (additional == 0) return error.EndOfStream;
        // allocates the minimum between the limit and remaining size
        a.ensureUnusedCapacity(limit.minInt64(additional)) catch return error.WriteFailed;
        // reads directly into the allocated slice, accounting for limit
        const dest = limit.slice(a.writer.buffer[a.writer.end..]);
        const n = try file_reader.interface.readSliceShort(dest);
        if (n == 0) return error.EndOfStream;
        a.writer.end += n;
        return n;
    }

Nice! I was just tracing the same thing. Took me a while to find that file_reader.getSize().

In testing, the allocation is a bit bigger and then shrinks at the end; that looks like it’s due to array list growth, but it’s acceptable.

So .unlimited is the way to go thanks!

Yes, it’s caused by array list growth. But in sendFile it would be reasonable to allocate exactly when the size is known.


maybe, just `file_size + 1`? :)

i stepped into that a couple of times.

If you want to have an entire (text) file’s content as a C string inside your program, don’t forget what C strings are… :) they are ASCIIZ, hence the +1.
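For that C-string case, here is a sketch. allocSentinel is std.mem.Allocator API; the rest reuses the 0.16-style calls from earlier in the thread, and the error name is made up:

    // Read the file into a NUL-terminated buffer for C interop.
    const file_size = try file.length(io);
    var file_reader = file.reader(io, &.{});
    const buf = try allocator.allocSentinel(u8, file_size, 0); // buf[file_size] == 0
    errdefer allocator.free(buf);
    const n = try file_reader.interface.readSliceShort(buf);
    if (n != buf.len) return error.UnexpectedEndOfFile; // file shrank mid-read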

I don’t really need the +1 for the memory-allocation side of things; I return a slice from the function, not a C string. The problem stemmed from misunderstanding and misusing the allocRemaining API.