Allocation is not Initialization

dude_the_builder · February 4, 2024, 11:16pm

When defining a struct in Zig, you have the option of providing default values for its fields. This can be very convenient because it allows you to rapidly instantiate the struct with little or no extra data required. It is a powerful language feature when defining structs that collect default option values in configuration / initialization functions:

const ServerOptions = struct {
    ip_addr: []const u8 = "127.0.0.1",
    port: u16 = 8080,
    send_timeout_ms: usize = 10,
};

// The Server.init function takes a ServerOptions.
// Here we are only overriding the port field
var server = try Server.init(.{ .port = 8888 });
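
As a minimal, self-contained sketch of where those defaults come from: they are filled in by the struct literal expression (the `.{ ... }`), not by the type merely existing. The `chosenPort` function below is a hypothetical stand-in for `Server.init`, which is not shown in this post.

```zig
const std = @import("std");

const ServerOptions = struct {
    ip_addr: []const u8 = "127.0.0.1",
    port: u16 = 8080,
    send_timeout_ms: usize = 10,
};

// Hypothetical stand-in for Server.init: it only reports the port it was given.
fn chosenPort(options: ServerOptions) u16 {
    return options.port;
}

pub fn main() void {
    // An empty literal names no fields, so every default applies.
    const defaults: ServerOptions = .{};
    // Only the port is overridden; the other fields keep their defaults.
    const custom: ServerOptions = .{ .port = 8888 };

    std.debug.print("{s}:{d}\n", .{ defaults.ip_addr, chosenPort(defaults) });
    std.debug.print("{s}:{d}\n", .{ custom.ip_addr, chosenPort(custom) });
}
```

The point to keep in mind for the rest of this topic is that no literal means no defaults.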

The Footgun

A problem can arise when you combine memory allocation with a struct with default field values.

const User = struct {
    domain: []const u8 = "ziggit.dev",
    enabled: bool = false,
};

const user_ptr = try allocator.create(User);
defer allocator.destroy(user_ptr);

// Boom!
std.debug.print("enabled: {}\n", .{user_ptr.enabled});
std.debug.print("domain: {s}\n", .{user_ptr.domain});

Building and running in Debug or ReleaseSafe mode produces a panic when trying to print the domain field:

enabled: false
domain: thread 8619926 panic: reached unreachable code

In ReleaseFast or ReleaseSmall you get incorrect output:

enabled: false
domain:

The mistake here is thinking that allocating a User struct in memory fills in the default values automatically. This is even harder to detect when you observe that the bool field enabled seems to be working fine. The uninitialized memory merely happens to read as false here, which matches the default, so the error goes undetected. If you set the default value to true, you will likely notice the error when the value is still reported as false.

The Fundamental Problem

Going beyond the specific case of struct fields with default values and allocation, we must realize that any kind of allocation only produces uninitialized space in memory. So even for simple primitive types like usize, you can run into undefined behavior if you don’t initialize the newly allocated memory yourself:

const x = try allocator.create(usize);

// What is the value of y?
const y = x.*;

As you can see, allocation is fundamentally not initialization because it concerns itself only with reserving memory, not the value at that memory location.
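
A small sketch of the fix for the usize case: write a value through the pointer before anything reads it. The helper name `createCounter` is made up for this example, and `std.heap.page_allocator` is used only to keep the program short.

```zig
const std = @import("std");

// Hypothetical helper: allocate a usize and give it a value before anyone reads it.
fn createCounter(allocator: std.mem.Allocator, initial: usize) !*usize {
    const x = try allocator.create(usize);
    // Until this assignment runs, x.* is undefined.
    x.* = initial;
    return x;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const x = try createCounter(allocator, 0);
    defer allocator.destroy(x);

    // Reading is fine now, because the memory holds a real usize value.
    const y = x.*;
    std.debug.print("y = {d}\n", .{y});
}
```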

Possible Workarounds

Manually Set Field Values

You can directly set the field values after allocating the struct’s memory:

// Allocate uninitialized memory for User.
const user_ptr = try allocator.create(User);
defer allocator.destroy(user_ptr);

// Initialize the memory.
user_ptr.* = .{ .domain = "example.com", .enabled = true };
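
If what you actually want are the defaults declared on User, the same assignment works with an empty or partial struct literal. This is a sketch that assumes the User struct from the footgun example; the allocator choice is arbitrary.

```zig
const std = @import("std");

const User = struct {
    domain: []const u8 = "ziggit.dev",
    enabled: bool = false,
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const user_ptr = try allocator.create(User);
    defer allocator.destroy(user_ptr);

    // An empty literal names no fields, so both defaults are written.
    user_ptr.* = .{};
    std.debug.print("{s} {}\n", .{ user_ptr.domain, user_ptr.enabled });

    // Naming one field overrides it; domain still takes its default.
    user_ptr.* = .{ .enabled = true };
    std.debug.print("{s} {}\n", .{ user_ptr.domain, user_ptr.enabled });
}
```

Because the whole struct is assigned at once, a field without a default that you forget to name becomes a compile error rather than a silently undefined value.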

Use a “create” Function

You can define a create function that does the allocation and initializes the fields in one call:

const std = @import("std");
const Allocator = std.mem.Allocator;

const User = struct {
    domain: []const u8 = "ziggit.dev",
    enabled: bool = false,

    fn init(domain: []const u8, enabled: bool) User {
        return .{ .domain = domain, .enabled = enabled };
    }

    fn create(
        allocator: Allocator,
        domain: []const u8,
        enabled: bool,
    ) !*User {
        // Allocate uninitialized memory for the User.
        const user_ptr = try allocator.create(User);

        // Initialize the memory with the init function.
        user_ptr.* = User.init(domain, enabled);

        return user_ptr;
    }
};
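
For completeness, here is a sketch of the calling side. The pointer returned by create is owned by the caller, who has to destroy it with the same allocator. The struct is repeated so the example stands alone, and `std.heap.page_allocator` is just a convenient choice for a short program.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

const User = struct {
    domain: []const u8 = "ziggit.dev",
    enabled: bool = false,

    fn init(domain: []const u8, enabled: bool) User {
        return .{ .domain = domain, .enabled = enabled };
    }

    fn create(allocator: Allocator, domain: []const u8, enabled: bool) !*User {
        // Allocate uninitialized memory, then initialize it before returning.
        const user_ptr = try allocator.create(User);
        user_ptr.* = User.init(domain, enabled);
        return user_ptr;
    }
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const user_ptr = try User.create(allocator, "example.com", true);
    // The caller owns the pointer and frees it with the allocator that made it.
    defer allocator.destroy(user_ptr);

    std.debug.print("domain: {s}, enabled: {}\n", .{ user_ptr.domain, user_ptr.enabled });
}
```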

In Summary

When combining structs with default field values and memory allocation, you must be aware that the allocation process does not fill in the default values for the struct’s fields. Once allocated, it’s your responsibility to initialize the struct’s fields in order to avoid undefined behavior. The same applies to any type: when you allocate memory for it, you have to initialize that memory with a value of the type before you can use it.

dee0xeed · February 8, 2024, 8:22pm

I think it’s even better to have both init and create “constructors”, like this:

const std = @import("std");
const Allocator = std.mem.Allocator;

const User = struct {
    domain: []const u8 = "ziggit.dev",
    enabled: bool = false,

    fn init(d: []const u8, e: bool) User {
        return .{ .domain = d, .enabled = e };
    }

    fn create(a: Allocator, d: []const u8, e: bool) !*User {
        // The pointer itself is never reassigned, so it can be const.
        const u = try a.create(User);
        u.* = init(d, e);
        return u;
    }
};

pub fn main() !void {
    const a = std.heap.c_allocator;
    const u_on_stack = User.init("dom1.org", true);
    const u_on_heap = try User.create(a, "dom2.org", true);
    defer a.destroy(u_on_heap);

    std.debug.print(
        "user on stack : d = '{s}', e = {} ({*})\n",
        .{u_on_stack.domain, u_on_stack.enabled, &u_on_stack}
    );

    std.debug.print(
        "user on heap  : d = '{s}', e = {} ({*})\n",
        .{u_on_heap.domain, u_on_heap.enabled, u_on_heap}
    );
}

Note: compile with zig build-exe aini.zig -lc

This way we can conveniently use either stack allocated objects or heap allocated objects, depending on whatever is needed at the moment.

Output of the program:

$ ./aini 
user on stack : d = 'dom1.org', e = true (aini.User@7ffc6442dff0)
user on heap  : d = 'dom2.org', e = true (aini.User@4892a0)

dude_the_builder · February 8, 2024, 8:58pm

Yes this would be a great solution for a complete program or library indeed. But since the topic is specifically about memory allocation on the heap using an allocator, I tried to keep the example as focused on that as possible.

alp · February 9, 2024, 8:20pm

You can do the following (although I think this should be part of the allocator interface):

fn createDefault(comptime T: type, allocator: std.mem.Allocator) !*T {
    const result = try allocator.create(T);
    inline for (std.meta.fields(T)) |field| {
        if (field.default_value) |dv| {
            @field(result, field.name) = @as(*const field.type, @ptrCast(@alignCast(dv))).*;
        }
    }
    return result;
}

AndrewCodeDev · February 9, 2024, 8:49pm

That’s an interesting take, but this problem goes further than struct fields and I think we need to add that to this Doc.

You can allocate items that do not have default values and fundamentally end up with the same problem:

const x = try allocator.create(usize);

// what is the value of y?
const y = x.*;

Allocation is fundamentally not initialization because it concerns itself with reserving memory, not the value at that memory location.

Now, that said, I think you have a cool idea, but the issue here is we need to expand the doc beyond struct field defaults.

dude_the_builder · February 9, 2024, 11:23pm

@alp, @AndrewCodeDev: I added a new section to address the topic of memory allocation in general and also added the comptime field initialization example. @dee0xeed, I added your init function to the create example.

Thanks all for the great input!

Sze · February 9, 2024, 11:36pm

I think I would prefer this:

user_ptr.* = .{
    .domain = domain,
    .enabled = enabled,
};

This is the same, except that it doesn’t set the defaults, just to override them anyway.

And instead of:

This:

user_ptr.* = .{ .domain = "example.com" };

Is there a reason to separate them into multiple steps?
I find it relatively rare that I need to do that…

dee0xeed · February 10, 2024, 9:00am

And this is absolutely obvious for C programmers, since there are no default field values at all.

I am wondering how it is possible to think this way. Is it this very feature (defaults for struct fields) that could make people think that heap-allocated structures will be filled automagically?

FObersteiner · February 10, 2024, 9:43am

I’m potentially guilty of that. Even though, from a semantic point of view, I would not assume that allocation and initialization are the same, I guess that if you’re coming from “higher” languages, you might be used to some magic happening in the background that does this stuff for you?

dude_the_builder · February 10, 2024, 10:08am

Exactly! This is the primary reason for me writing this up in the first place. Especially programmers coming from Go, where everything always has a default value. Go made it one of their language design decisions to always set the default value if no explicit value is assigned. So if you’re coming from a language like that, it’s easy to think that when you allocate memory for a struct that has default values for its fields, those fields would be initialized with those default values.

This is exactly why I wanted to focus this topic on structs with default field values and not the general case of any allocation. The “default values” part of the language is what may cause the confusion and thus the footgun. Putting it another way, if there were no default field values feature, there would be less ground for confusion.

plaukiu · February 10, 2024, 3:31pm

ah, the RAINI technique. never forget your umbrella!

tauoverpi · March 25, 2024, 1:23pm

The last example in “Use Some Comptime Awesomeness” ends up with a half-initialized struct, which is also rather bug-prone. Using the result.* = .{}; syntax is much better in that it forces you to decide explicitly which fields remain undefined (if their default is not undefined). Thus I’d say the last example leads beginners down the wrong path. There are cases for partial initialization, but using comptime as shown here is not a good way to go about it.

It’s also a rather complex example that is better covered in comptime metaprogramming as one should not write this kind of code on a regular basis.
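
A minimal sketch of the result.* = .{} approach as a generic helper. The name `createWithDefaults` is hypothetical and not part of std; it assumes T is a struct whose fields all have defaults.

```zig
const std = @import("std");

const User = struct {
    domain: []const u8 = "ziggit.dev",
    enabled: bool = false,
};

// Hypothetical helper, not part of std: allocate a T and write T's defaults into it.
fn createWithDefaults(comptime T: type, allocator: std.mem.Allocator) !*T {
    const result = try allocator.create(T);
    // If any field of T lacks a default, this line fails to compile,
    // so no field can be left undefined by accident.
    result.* = .{};
    return result;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const user_ptr = try createWithDefaults(User, allocator);
    defer allocator.destroy(user_ptr);

    std.debug.print("{s} {}\n", .{ user_ptr.domain, user_ptr.enabled });
}
```

Compared with looping over the fields at comptime, the whole-struct assignment leaves the checking to the compiler.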

dude_the_builder · March 26, 2024, 11:16am

Thanks for pointing this out. I removed that section.

marler8997 · March 27, 2024, 4:08am

I find that in practice this kind of error is rarely a source of bugs. Usually resource allocation/initialization is in a code path that is almost always executed deterministically and during testing so any errors will be caught/fixed immediately before the changes are finished. It’s when you get non-deterministic code paths coupled with code that isn’t exercised as much that your bugs will linger. Cleanup/error handling are big ones.

AndrewCodeDev · March 27, 2024, 4:11am

Probably true for people with a low-level background, but we get a lot of beginners who have issues with this (for some people, Zig is their first low-level language).

marler8997 · March 27, 2024, 4:56am

Yes I agree people can/will make this mistake. My point wasn’t to say that people don’t make this mistake but rather to share a realization I’ve had recently which is:

not all runtime mistakes/errors are created equal

You’ve got compile errors on one side of the spectrum and on the other end you’ve got runtime errors that are very difficult to root cause and fix (i.e. race conditions). Compile errors are better than runtime errors because you find them immediately. Similarly, certain kinds of runtime errors are akin to compile errors in that you almost always trigger them as soon as you run the program. I find allocation/initialization to usually fall into this category.

However, I’ve noticed there seems to be a tendency for API and library authors to spend more time documenting and designing their types/functions to ensure that resources are created/initialized correctly, and less time/effort on the parts that are much more important to get right, such as ownership/lifetime/cleanup and error handling. I can make some guesses as to why: maybe it’s because people like to focus on the first part of their APIs, or maybe it’s because those parts are easier. When I had this realization, my library’s APIs started becoming a little looser/simpler when it comes to initialization, which has freed up some “complexity budget” for other concerns.

Anyway, just discussing recent thoughts/ideas. The point isn’t to say nothing should be done about making allocation/initialization easier, just providing perspective on it.

AndrewCodeDev · March 27, 2024, 4:59am

For sure, and it’s a great point you bring up. Attention is a limited resource - allocating it correctly to what matters is actually crucial to being productive in a meaningful way. I’ve faced this issue myself with software in the form of “what do I do and what does the user do?” and making sure the roles are kept straight.

I just bring it up because we all get the curse of knowledge sometimes (hard to remember what it’s like to be a beginner again).

tauoverpi · April 18, 2024, 10:04am

I don’t think a create function should be mentioned without also mentioning other strategies for managing memory. This feels like an incomplete topic that misleads the user into making many small allocations. In the real world one usually deals with many objects rather than a single one, so a container such as ArrayList or HashMap is more likely to be used. Consider the example of a User: the service has many users, so allocating a single User only to place it in a container that supports lookup of many suggests that a User existing in isolation is not the common case.

This topic should really be merged into a more general “strategies for managing memory” as it’s easier to talk about such in the broader topic than one focusing on a rather small source of errors.
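
As a rough sketch of the "many objects" case: a single allocator.alloc call reserves memory for all users at once, and the same rule applies, since the elements are undefined until each one is assigned. The helper `allocUsers` is hypothetical, and a plain slice stands in here for whatever container a real program would use.

```zig
const std = @import("std");

const User = struct {
    domain: []const u8 = "ziggit.dev",
    enabled: bool = false,
};

// Hypothetical helper: one allocation for `count` users, each set to the defaults.
fn allocUsers(allocator: std.mem.Allocator, count: usize) ![]User {
    const users = try allocator.alloc(User, count);
    // alloc, like create, hands back undefined elements.
    for (users) |*user| user.* = .{};
    return users;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const users = try allocUsers(allocator, 3);
    defer allocator.free(users);

    users[1].enabled = true;
    for (users, 0..) |user, i| {
        std.debug.print("{d}: {s} {}\n", .{ i, user.domain, user.enabled });
    }
}
```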

Sze · April 18, 2024, 10:44am

I think it is good to have documents focused on specific problems and I think making this one too general would make it less useful / understandable.

When somebody writes this other more general part, it can link to this doc for certain problems, and this doc can be adapted to call out that creating many single object allocations isn’t a good pattern and should be avoided if possible, this could then link to an explanation in that more general document.

tauoverpi · April 18, 2024, 1:50pm

Yes, having docs for specific issues is fine, but when suggesting a solution beyond the trivial = .{} one should include a discussion about design, or link to a document concerned with it, rather than suggesting .create() as the typical solution. .create() is often not what a user should do, so in this case I would consider it incomplete (or bad) advice: it doesn’t discuss where you should use .create() over other options, nor does it even mention any alternative.

Thus, again as with the other doc article it would be better to give a full example with discussion as to why or just not include it until such a document is written. Leading users down the wrong path is not what ziggit documentation should do and here it does so by giving a “solution” without any consideration as to what the actual domain looks like (and the example itself is wrong given users do not exist in isolation).

Also consider that this causes more work at least in the matrix room where one then has to explain that “no, ziggit is wrong here” rather than using ziggit articles as a short intro to the topic. This is why it’s currently not on the suggested resources list.