Managing file reading, closing + buffer allocation and deallocation

The title is awkward, but here goes…

Say, you want to write a function that (1) opens a file, (2) allocates a buffer large enough to read the whole file, (3) reads the file into the buffer (4) closes the file, (5) works with the buffer, and (6) frees the buffer.

If in the process an error is caught, the file has to be closed (if it’s open) and the buffer has to be freed (if it’s allocated).

Also, the file has to be closed as soon as possible, which is a good policy to follow.

How would you do this?
My code is below, but it has a bug -

fn readFile(fname: []const u8, alloc8r: std.mem.Allocator) !void
{
    const f = try std.fs.cwd().openFile(fname, .{});
    errdefer f.close(); // -- But what if an error is returned AFTER the file is closed?

    const f_len = try f.getEndPos();
    var buf = try alloc8r.alloc(u8, f_len);
    defer alloc8r.free(buf); // This is fine

    const read_bytes = try f.readAll(buf);
    f.close(); // We want to close the file here, but `errdefer f.close();` is going to fire if we return an error somewhere below this line
    std.debug.print("Read {} bytes\n", .{read_bytes});

    // Work with `buf`, but if we return an error, the file is going to be closed again, which means a core dump
}

The problem with the code above is that errdefer f.close(); can be triggered even after the file is closed.

Another idea is to use a block { } around the file handling part. This, however, would mean that buf would be deallocated at the end of this block:

fn readFile(fname: []const u8, alloc8r: std.mem.Allocator) !void
{
    {
        const f = try std.fs.cwd().openFile(fname, .{});
        defer f.close(); // The file will be closed when we leave the block, which is great

        const f_len = try f.getEndPos();
        var buf = try alloc8r.alloc(u8, f_len);
        defer alloc8r.free(buf); // -- `buf` will be freed when we leave the block :(

        const read_bytes = try f.readAll(buf);
        std.debug.print("Read {} bytes\n", .{read_bytes});
    }

    // Can't work with `buf`, it's not defined in this scope :(
}

So, to avoid this, defer alloc8r.free(buf); would have to be placed AFTER the block where file handling takes place.
However, if try f.readAll(buf); returns an error, buf is not going to be freed, so we have to add errdefer immediately after allocating buf:

fn readFile(fname: []const u8, alloc8r: std.mem.Allocator) !void
{
    var buf: []u8 = undefined; // Don't like this...
    {
        const f = try std.fs.cwd().openFile(fname, .{});
        defer f.close(); // The file will be closed when we leave the block, which is great

        const f_len = try f.getEndPos();
        buf = try alloc8r.alloc(u8, f_len);
        errdefer alloc8r.free(buf); // Will be active until the block ends

        const read_bytes = try f.readAll(buf);
        std.debug.print("Read {} bytes\n", .{read_bytes});
    }
    defer alloc8r.free(buf);

    // Work with `buf`, it will be freed when we leave the function, whether normally, or by returning an error
}

This could actually work, if I’m not missing anything, but… It’s far from simple and elegant.

I wonder if there are better ideas.

In other words, how to translate this standard C-pattern

int func(void) {

    int fd = open(...);
    if (-1 == fd) goto __failure;
    u8 *buf = malloc();
    if (NULL == buf) goto __failure;
    int r = read(fd...);
    if (-1 == r) goto __failure;

    // everyrhing ok, do what we wanted to do
    return OK;
 
  __failure:
    if (fd != -1)
        close(fd);
    if (buf)
        free(buf);
    return FAILED;
}

into Zig correctly and “idiomatically”?

1 Like

Yes, something like that.

BTW, at first I was looking for a way to check if my file is open or closed, so that I could implement something like if (fd != -1) close(fd); in my defer.
But I could not find it (did not look hard enough?)

How about this:

fn readFile(fname: []const u8, alloc8r: std.mem.Allocator) !void
{
    var buf: []u8 = blk: {
        const f = try std.fs.cwd().openFile(fname, .{});
        defer f.close(); // The file will be closed when we leave the block, which is great

        const f_len = try f.getEndPos(); 
        const _b =  try alloc8r.alloc(u8, f_len);
        errdefer alloc8r.free(_b); // Will be active until the block ends
        const read_bytes = try f.readAll(buf);
        std.debug.print("Read {} bytes\n", .{read_bytes});
        break :blk  _b;
    };
    defer alloc8r.free(buf);

    // Work with `buf`, it will be freed when we leave the function, whether normally, or by returning an error
}

Please note use of break to return from the labeled block.

5 Likes

I would just put it into a helper function:

fn readFileToBuffer(allocator: Allocator, fname: []const u8) ![]u8 {
    const f = try std.fs.cwd().openFile(fname, .{});
    defer f.close(); // The file closes before we exit the function which happens before we work with the buffer.

    const f_len = try f.getEndPos();
    var buf = try alloc8r.alloc(u8, f_len);
    errdefer alloc8r.free(buf); // In case an error happens while reading

    const read_bytes = try f.readAll(buf);
    std.debug.print("Read {} bytes\n", .{read_bytes});
    return buf;
}

fn readFileAndDoStuff(allocator: Allocator, fname: []const u8) !void {
    const buf = try readFileToBuffer(allocator, fname);
    defer allocator.free(buf);

    // Work with `buf`
}

5 Likes

A citation from docs:

// The errdefer keyword is similar to defer, but will only execute if the
// scope returns with an error.
//
// This is especially useful in allowing a function to clean up properly
// on error, and replaces goto error handling tactics as seen in c

But it’s a bit unclear (for me), how exactly try/defer/errdefer should be combined,
whereas “goto tactics” is pretty clear. What if I need two files and two buffers
(say, to compare them byte-by-byte)? With goto __failure technique
I will just add some code to initialization/cleanup parts of the function - and I
can do these two open()'s and malloc()'s in any order, over boots,
if something went wrong, just jump to cleanup section.

If I understand it correctly, both defer and errdefer statements are auto-inserted at the end of the scope in the LIFO order. When the function returns these statements are executed for all enclosed scopes with a caveat that errdefer are skipped on a normal (non-error) returns. Execution is in a strict LIFO order and inside-out for enclosed scopes. Hope it helps.

I believe you’ll close the files and free the buffers in any case, regardless of whether the operation succeds or fails, so you want to use defer:

const buffer1 = try allocator.alloc(u8, 512); //if this goes wrong, just return an error. No cleanup necessary.
defer allocator.free(buffer1);

const file1 = try openFile(...); //If this goes wrong, free buffer1 and return an error.
defer file1.close();

try file1.read(buffer1); //If this goes wrong, close file1, free buffer1 and return error.

const buffer2 = try allocator.alloc(u8, 512); // same as last comment.
defer allocator.free(buffer2);

const file2 = try openFile(...); //If this goes wrong, free both buffers, close file1 and return an error
defer file2.close();

try file2.read(buffer2); //If this goes wrong, close both files, free both buffers and return error.

try useTwoBuffers(buffer1, buffer2); //Same as last comment.

// We are done, close both files, free both buffers and return void.

If you want something to only happen in case of error, use errdefer. Suppose that in the previous example, if the operation succeds, you want to return the second buffer. You could rewrite it like this:

const buffer1 = try allocator.alloc(u8, 512); //if this goes wrong, just return an error. No cleanup necessary.
defer allocator.free(buffer1);

const file1 = try openFile(...); //If this goes wrong, free buffer1 and return an error.
defer file1.close();

try file1.read(buffer1); //If this goes wrong, close file1, free buffer1 and return error.

const buffer2 = try allocator.alloc(u8, 512); // same as last comment.
errdefer allocator.free(buffer2); // Now we only want to free buffer2 in case of an error.

const file2 = try openFile(...); //If this goes wrong, free both buffers, close file1 and return an error
defer file2.close();

try file2.read(buffer2); //If this goes wrong, close both files, free both buffers and return error.

try useTwoBuffers(buffer1, buffer2); //Same as last comment.

return buffer2; //no error happened, so buffer2 won't be freed and we safely return it. We still close both files and free buffer1.
1 Like

Of course!

So, the rule number one: if I just want to do some cleanup (close files/dirs, deallocate things and so on), I simply put defer right after each operation that needs cleanup given that I acquire a resource with try getSomeThing(). Did I get you right (regarding first part of the answer)?

1 Like

Because you want to close the file in two separate contexts (once you are done and when there was an error), you could also declare it as var file: ?File and, when closing, check if it is valid:

errdefer { if (file) |f| { f.close(); f = null; }
// do stuff
...
// eagerly close the file:
if (file) |f| { f.close(); f = null; }
// do more stuff, file is now closed
...
1 Like

Aha! It seems that the rule number two might look like “if you want to return something from a function, use errdefer after getting this something with try getSomeThing()”. Is it correct?..

Correct. For unconditional cleanup, use defer.
I usually use errdefer when I’m constructing something from pieces. If something goes wrong, I undo all the pieces that were already done. If nothing goes wrong, I want to return the object, so I don’t want to cleanup the pieces:

var result: Result = undefined;
result.piece1 = try makePiece1(...);
errdefer result.piece1.destroy();

result.piece2 = try makePiece2(...);
errdefer result.piece2.destroy();

result.piece3 = try makePiece3(...);
errdefer result.piece3.destroy(); 

// No error happened, so I don't want to do any cleanup, I just want to return the result.
return result; 

The last errdefer is unnecessary, but I usually write the last errdefer even when unnecessary. If I want to add more pieces later, I don’t need to remember to write it. Zig will find and remove unreachable errdefer statements.

2 Likes

Back to initial example. As we are not going to return anything, I guess we can write just:

fn readFile(fname: []const u8, alloc8r: std.mem.Allocator) !void
{
    const f = try std.fs.cwd().openFile(fname, .{});
    defer f.close();

    const f_len = try f.getEndPos();
    var buf = try alloc8r.alloc(u8, f_len);
    defer alloc8r.free(buf);

    const read_bytes = try f.readAll(buf);
    std.debug.print("Read {} bytes\n", .{read_bytes});

    // Work with `buf`
}

No errdefer is needed here at all.

1 Like

And to comply with OP’s #4 step (close the file when done with it as soon as possible), incorporating @gonzo 's code:

fn readFile(fname: []const u8, alloc8r: std.mem.Allocator) !void
{
    var maybe_file: ?File = try std.fs.cwd().openFile(fname, .{});
    errdefer {
        if (maybe_file) |file| file.close();
    }

    const f_len = try maybe_file.?.getEndPos();
    var buf = try alloc8r.alloc(u8, f_len);
    defer alloc8r.free(buf);

    const read_bytes = try maybe_file.?.readAll(buf);
    maybe_file.?.close(); // close when done.
    maybe_file = null; // errdefer now does nothing.
    std.debug.print("Read {} bytes\n", .{read_bytes});

    // Work with `buf`
}

The maybe_file.?... bits may not seem pretty, but I guess they’re the price to pay if you want to close as soon as possible versus letting defer do it at scope exit. You could also unwrap the payload and work with it directly given that you are guaranteed at that point to have a file and not a null.

This is an improvement, in that there’s no undefined variable.

@IntegratedQuantum - Yes, this is a further refinement.
I guess adding the type of buffer to be allocated and returned, instead of hardcoding u8 could be another improvement, not sure if useful though.

Thank you both!

2 Likes

This leaves the file open until the function returns. I was trying to close is as soon as possible.

@LucasSantos91, same thing in your code here -

@dude_the_builder

I wish the File struct would remember its state, that is if the file is open or closed. This way the file var would not have to be optional.

I was actually answering @dee0xeed question about errdefer vs defer, not actually trying to answer the topic. @IntegratedQuantum’s is the best answer. Their readFileToBuffer is actually in the std library as readToEndAlloc.
Also, @gonzo’s answer has the overhead of checking if the file is open, while @IntegratedQuantum’s answer doesn’t.

1 Like

Doing that would add overhead, as each call would need to check the state. However, if you really want it, it’s trivial to implement.

Good point, thank you!

1 Like