What's the idiomatic way of initializing a heap-allocated struct field that's also a struct?

I usually follow the init() and create(allocator) idioms for creating structs, but this time I’m confused about how to initialize a field of a heap-allocated struct that is itself a struct.

I came up with a setup() function that initializes the struct in place, because I’m not sure whether the value would first be built on the stack and then copied to its final place on the heap; for larger buffers that would cause a stack overflow.

But this feels wrong (?) in that, if I ever forget to call it, I’d be invoking undefined behavior in the functions that access the field.

const LOGGER_BUF_SIZE = 1024 * 8;

const LoggerContext = struct {
    writer: Writer,
    instance: Logger,
    buffer: [LOGGER_BUF_SIZE]u8,

    fn setup(self: *LoggerContext) void {
        self.writer = std.fs.File.stdout().writer(&self.buffer);
        self.instance = .init(self.writer.interface, .Debug);
    }
};

pub const Application = struct {
    const Self = @This();

    allocator: Allocator,
    logger: LoggerContext,

    // TODO: accept different sinks as argument
    pub fn create(allocator: Allocator) !*Self {
        const ptr = try allocator.create(Self);

        ptr.* = .{
            .allocator = allocator,
            .logger = .{ .buffer = undefined, .writer = undefined, .instance = undefined },
        };

        ptr.logger.setup();

        return ptr;
    }

    pub fn deinit(self: *Self) void {
        self.allocator.destroy(self);
    }
};

As you say, you have no allocations in LoggerContext, so I’m not sure why you would lean away from an init().

const LoggerContext = struct {
    // ...
    fn init() LoggerContext {
        return LoggerContext{
            .buffer = undefined,
            .writer = std.fs.File.stdout().writer(&self.buffer),
            .instance = .init(self.writer.interface, .Debug),
        };
    }
};

Then, I think your Application doesn’t need @This() because it’s not generic. Your .logger would just init:

    .logger = .init(),

But I’m not sure of the motivation for setting up your Application create() with the allocator the way you do; I could see that done in a few different ways. Presumably you really want the caller to get a pointer to an Application allocated (presumably on the heap, depending on the allocator). Nevertheless, couldn’t LoggerContext be “normal”? I think the result would live on the heap, per the allocator.create(Self), IF the allocator is actually heap-allocating (e.g., is not a FixedBufferAllocator).

Having a create fn says this type must be dynamically allocated. There are very, very few cases where that is actually true, and your Application does not appear to be one of them; you could just have Application also initialise in-place, or take a secret third option I will mention below.

Initialising in-place often just uses the function name init. There is also the option of taking the buffer as a parameter; that way you don’t need to initialise in-place at all.

If you are setting all the fields to undefined, you can do it more succinctly: .logger = undefined


I just noticed instance is also self-referential; at least I assume you meant to pass a reference to the writer interface, and if not, you have illegal behaviour!!
You likely can remove that self-reference as well, but since you haven’t shown the source of Logger I can’t tell you how.

Also FYI, std.log is a global logging API that works across (zig) dependencies (assuming they use the api). You can override how it works, see its docs for more info.
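For instance, a root-file override might look like this (myLogFn is a hypothetical name I chose; the std_options declaration must live in your root source file for std.log to pick it up):

```zig
const std = @import("std");

// Overrides std.log's defaults for the whole program.
pub const std_options: std.Options = .{
    .log_level = .debug,
    .logFn = myLogFn,
};

fn myLogFn(
    comptime level: std.log.Level,
    comptime scope: @TypeOf(.enum_literal),
    comptime format: []const u8,
    args: anytype,
) void {
    // Everything is routed through std.debug.print here just as a sketch; a
    // real override could write to a file, a ring buffer, another thread, etc.
    std.debug.print("[" ++ @tagName(level) ++ "] (" ++ @tagName(scope) ++ ") " ++ format ++ "\n", args);
}

pub fn main() void {
    std.log.info("hello from std.log", .{});
}
```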


The motivation is not knowing the exact behavior at runtime. I’m new to Zig and in the process of reshaping my way of thinking (coming from higher-level languages). I wasn’t sure whether LoggerContext would first be stack-allocated and then copied to the heap, or whether the compiler would emit instructions to build it directly in the heap-allocated struct.

I was assuming the latter, but I’m still trying to figure out how I can verify that, probably reading the generated assembly.

Application is meant to hold the application state. I started by heap-allocating it because it’ll grow into a big structure later; I’m going step by step to learn the language, which is why it looks so small for now. I’m also considering static allocation for it, but I’m still working through the tradeoffs.

Thank you for the information about the logger. To be honest, I didn’t dig too deep into Zig’s logging patterns; I was attempting to implement a buffered out-of-thread logger while also trying to understand how Zig handles async IO. I understand why the code might look confusing, sorry for that!

Ok, if that’s all there is to it, and armed with @vulpesx’s confirmation of my suspicions, I think what you really want is

pub const Application = struct {
   //... (just fields here; no need for Self = @This(), in this case)
   pub fn init() Application {
      return Application {
         .logger = .init(), // or replace with std logger use, more likely
      }; // and consider NOT holding onto an instance of Allocator....
   }
};

then, significantly, when you want your application, and you want it on the heap:

// somewhere in main.zig, presumably...
var dba = std.heap.DebugAllocator(.{}){}; // or choose your poison
const gpa = dba.allocator();
var app = try gpa.create(Application);
defer gpa.destroy(app);
app.* = Application.init();

Sometimes one will prefer an .initAlloc() or .create() (member) function, to wrap this into one call (and then that member function would do the allocator.create() and you’d return a *Application, like you started with). I think this is often done out of fear that somebody will create the instance and fail to initialize, and we wouldn’t want to promote that, right? But, part “trust your callers” and part “keep it simple” discourage me from overworking it in that way.

Default values on struct fields could also help with the “uninitialized fields” worry, and it’s often good to have a decl like .empty or .default that is a “zero” version. But, if you were really wanting your calling code to create your App on the heap, then this would just give you a different second step instead of an .init() call, and you already don’t have any initialization arguments, so there’s no strong indication that one way is better than the other… though I’d tend toward a .empty decl if there are truly no initialization args to pass in.
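A tiny sketch of that .empty/.default decl pattern, using a hypothetical Settings struct (recent Zig, where decl literals resolve against the expected type):

```zig
const std = @import("std");

// Hypothetical struct: a named constant "zero value" instead of an
// argument-less init().
const Settings = struct {
    verbose: bool,
    retries: u32,

    pub const default: Settings = .{ .verbose = false, .retries = 3 };
};

test "decl literal" {
    // `.default` is a decl literal: it resolves to Settings.default here.
    const s: Settings = .default;
    try std.testing.expectEqual(@as(u32, 3), s.retries);
    try std.testing.expect(!s.verbose);
}
```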


I think that’s a very interesting question for newbies like us, and it might be worth looking into “result location semantics”. Surely, if you instantiate an instance on the stack explicitly (and can “use” it thus), and then copy it over to a heap-allocated instance, then you’d have “a version on the stack” first. But I’m guessing that the geniuses making this compiler take pains not to generate needless cruft, so, if set up idiomatically, I think you can expect no giant (temporary) stack blobs. E.g., I think

   giant_app.* = Application.init();

is safe from the possibility of Application.init() consuming a bunch of stack space since the result location is giant_app.*, on the heap.

But somebody with more experience than I can tell us if I’m wrong about that.

I had an extra pinch of time this morning, and was recently encouraged to try godbolt to see “stuff”; I made a simple struct and, in one variation, just init’ed it on the stack, and, in a second variation, heap-allocated more-or-less according to the pattern I presented above. Honestly, I don’t yet know how to interpret the results properly to answer your question, but perhaps somebody does. Here’s the side-by-side godbolt (see edit below); hopefully it “shows” for you the way I set it up, with the two versions of source top and bottom left, and the assembly and compiler outputs center and right (right corresponds to bottom). The results aren’t quite the apples-to-apples comparison I hoped for, so, again, somebody smarter will have to weigh in. Indeed, even if everybody and their aunt said what I’m saying, that you’re “stack safe” here, it’s nice to think that this is verifiable, with the right skills and tools.

EDIT: OOPS, I think that godbolt was faulty in that: since nothing meaningful was ever “done” in my stack-based version, the entire main() was elided away! Here I’ve added a debug.print() (probably not the best choice, actually) to the bottoms of each, to make sure the compiler deems it worthwhile to keep the code! This makes comparison a little easier, and I think I might see useful intel now… though, again, I’m going to defer.

First thank you for taking the time to give such a detailed response!

I ended up using Zig’s standard library log in the end; I’ll follow the standard patterns for now while learning more of the language and building fun things. I was curious when you mentioned not keeping the allocator instance as a field; I thought that was also common but didn’t understand why, and after some research it seems I was wrong about that as well.

I’m experimenting with Godbolt as well as using objdump to learn to read the assembly and better answer those questions myself; as you pointed out, it’d be great to have someone more experienced give some pointers.

It occurred to me that yours might be one of those cases where “containing” the allocator makes sense: if the idea is that this Application struct is a kind of singleton, then the allocator may want to live in it. However, you’d probably just make it a member field and instantiate it yourself within, rather than have a caller “pass” it in, if the caller is just a one-line main that makes this Application and fires it off (because then main() wouldn’t have any other use for the allocator). So… anyway… not knowing more about what you want there…

I consider this an overstatement. It’s fine to have a create/destroy pairing if, in practice, something is going to be heap allocated. The existence of create, and the nonexistence of setup or init, is a pretty strong hint that the heap is mandatory.

Even then, reading into negatives is a recipe for misunderstanding. The best way to ensure that something which must be heap allocated actually is? Documentation.

Welcome to the forum @gbyte26!

I think it’s best if generic-named (so init, create, and setup[1]) initializers don’t do implicit things like fetching stdout. They should receive everything they need, and have the type of return value implied by the name: *T for create, T for init, and void for setup.

An inner struct shouldn’t be any different than the outer one, in terms of what names mean and how they work: the outer struct initializers need to receive what the inner struct initializers expect, so they can be passed in accordingly.

So giving Application another initializer, called something like setupDefault or whatever, that pulls in the implicit defaults and then calls setup with them, would be nicer.

Particularly for something like a file handle, where it seems very likely to me that you’ll eventually want to stub in /dev/null or a tmp file at some point.


  1. TigerBeetle is influential enough that I consider this one an idiom, even though the stdlib doesn’t use it at all AFAIK. ↩︎


🙂 if that thought seems to contradict what @mnemnion just said, then

  1. Favor his experience over mine, for sure, and
  2. I believe he’s referring to “normal”, where I was referring to the possibility that you could have a very abnormal (singleton) kind of design in mind, which might motivate doing things a little differently than the normal idioms. I’m not a fan of such, at least in zig, but I didn’t want to imply it’s out of the question. I certainly concur with @mnemnion that “they should receive everything they need” (in the arguments of the init() or create() or whatever, that is)… normally.

No, it cannot be done this way. LoggerContext is a “self-referencing” structure: its writer member references its own buffer member. In-place construction here is obviously not feasible because it uses a parameter self that doesn’t exist in init(). And the OP’s hypothetical approach of constructing on the stack and then copying is also not feasible, as the copy would leave dangling references into the stack (a use-after-free).

I once thought it was necessary to use a reference-to-result-location pointer to achieve in-place construction of such structures. But now I have changed my mind: the current situation is for the best, because such “self-referencing structures” are very dangerous, and ownership of them absolutely cannot be transferred by copying. The way they must be constructed helps users notice this: structures that can be constructed in-place can transfer ownership through copying, while structures that contain self-references, and cannot be constructed in-place, cannot.
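To make the copying hazard concrete, here is a minimal sketch with a hypothetical SelfRef type; the assertions show that a plain copy leaves the pointer field aimed at the original struct’s buffer:

```zig
const std = @import("std");

// Hypothetical self-referencing struct: `slice` is meant to point into
// this instance's own `buffer` field.
const SelfRef = struct {
    buffer: [8]u8,
    slice: []u8,

    fn setup(self: *SelfRef) void {
        self.buffer = [_]u8{0} ** 8;
        self.slice = &self.buffer;
    }
};

test "a copy leaves the pointer aimed at the original" {
    var a: SelfRef = undefined;
    a.setup();
    const b = a; // field-by-field copy: b.slice still points into a.buffer
    try std.testing.expect(@intFromPtr(b.slice.ptr) == @intFromPtr(&a.buffer));
    try std.testing.expect(@intFromPtr(b.slice.ptr) != @intFromPtr(&b.buffer));
}
```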

There is a proposal for Pinned structs (Proposal: Pinned Structs · Issue #7769 · ziglang/zig · GitHub) that is meant to address these sort of issues with self referential structs.

Just take a pointer to a writer interface as a parameter, instead of storing and creating it as a field, then you don’t even need to store the buffer either, since the writer has a pointer to it.

Seriously!! There are very few cases where a self-referential type needs to store the data that is causing it to be self-referential in the first place!!! It doesn’t need to be self-referential.
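A sketch of that shape, with a placeholder Logger standing in for the OP’s unshown type (I’m assuming its init takes a *std.Io.Writer and a level, per the thread’s discussion):

```zig
const std = @import("std");

// Placeholder for the OP's Logger (not shown in the thread).
const Logger = struct {
    out: *std.Io.Writer,

    fn init(out: *std.Io.Writer, level: std.log.Level) Logger {
        _ = level;
        return .{ .out = out };
    }
};

// No writer field, no buffer field: nothing self-referential, so a plain
// by-value init works and the result can be copied or heap-placed freely.
const LoggerContext = struct {
    instance: Logger,

    fn init(out: *std.Io.Writer) LoggerContext {
        return .{ .instance = Logger.init(out, .debug) };
    }
};

pub fn main() void {
    // The caller owns the buffer and the concrete writer:
    var buffer: [8 * 1024]u8 = undefined;
    var stdout_writer = std.fs.File.stdout().writer(&buffer);
    const ctx = LoggerContext.init(&stdout_writer.interface);
    _ = ctx;
}
```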


That proposal was un-accepted as the main purpose (async frames) is no longer in the language, and it’s not difficult to make a runtime abstraction to safety check “pinned” data.


Thanks for the welcomes @mnemnion!!

Getting stdout in this code fragment was temporary; ideally a writer would come as an argument. But this led me to think about other structs in my code base. One of them is the compositor, where the init() function returns the Compositor struct after successfully connecting to the display and getting a registry (Wayland stuff).

I can understand how this is considered implicit behavior, though the two fields in my type would then need either separate initializer functions or to be received as arguments. Is that how you’d approach it?

I’ll add the code here for context.

pub const Compositor = struct {
    display: *wayland.wl_display,
    registry: *wayland.wl_registry,

    pub fn init() CompositorError!Compositor {
        const display: *wayland.wl_display = try connectToDisplay();
        errdefer disconnectFromDisplay(display);

        var compositor = Compositor{ .display = display, .registry = undefined };
        try compositor.setupDisplayRegistry();

        logger.info("Compositor successfully initialized", .{});
        return compositor;
    }

    pub fn deinit(self: *Compositor) void {
        disconnectFromDisplay(self.display);
    }

    inline fn connectToDisplay() WaylandError!*wayland.wl_display {
        const display: *wayland.wl_display = wayland.wl_display_connect(null) orelse {
            logger.err("No display available for connection", .{});
            return CompositorError.DisplayUnavailable;
        };

        logger.debug("Successfully connected to the display", .{});
        return display;
    }

    inline fn disconnectFromDisplay(display: *wayland.wl_display) void {
        wayland.wl_display_disconnect(display);
        logger.debug("Display disconnected", .{});
    }

    fn setupDisplayRegistry(self: *Compositor) WaylandError!void {
        self.registry = wayland.wl_display_get_registry(self.display) orelse {
            logger.err("Failed to get registry from display", .{});
            return WaylandError.GetRegistryFailed;
        };

        logger.debug("Display registry acquired successfully", .{});

        const listener_return_code = wayland.wl_registry_add_listener(self.registry, &registry_listener, self);
        if (listener_return_code == 0) {
            logger.debug("Registry listener successfully added", .{});
        } else {
            logger.warn("Attempted to add a duplicated registry listener, this listener will be ignored", .{});
        }
    }
};

EDIT: just for better wording of things, English is not my native language


A competing approach: the Writer does not store the buffer slice internally; instead, each API call takes the buffer slice as a parameter.

This is similar to not storing the Allocator interface within a structure: it avoids having pointers inside the structure by passing them through parameters, reducing the potential self-referencing issues when composing multiple structures.

I think the various solutions are similar: change a stored pointer into a passed pointer. Since Writer itself stores a pointer, it could be regarded as something that should only be passed rather than stored. Alternatively, considering that a writer retains state, its current storage of the buffer slice could be changed to passing it in.

std.Io.Writer is part of the std lib; we (at least I) are talking about changes to the OP’s code, not std.

The reader and writer APIs are agnostic to the buffering, and if the buffer is always going to be passed around with them, why not make it a field of the interface?

They have no operation that does not depend on the buffer in some way. That is not the case for dynamic collections, which only need an allocator where there is potential to grow; it’s entirely valid for an API not to take the allocator, to prevent it from doing any allocations with the dynamic collection.
But a reader/writer will always need its buffer.

Being transparent about buffering is kind of the antithesis of those interfaces, they are supposed to handle it all for you.
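To illustrate (assuming the Zig 0.15-era std.Io.Writer API that the OP’s snippet already uses): the concrete writer owns the buffer, and callers go through `.interface` without ever seeing it.

```zig
const std = @import("std");

pub fn main() !void {
    var buffer: [256]u8 = undefined;
    // The concrete File.Writer holds the buffer...
    var stdout_writer = std.fs.File.stdout().writer(&buffer);
    // ...and callers only touch the generic interface.
    const out: *std.Io.Writer = &stdout_writer.interface;

    try out.print("buffered: {d}\n", .{42}); // staged in `buffer` first
    try out.flush(); // explicitly drain the buffer to the file
}
```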


Indeed! Sorry, I obviously free-handed and didn’t compile or analyze thoughtfully. I agree that “There are very few cases where a self-referential type needs to store the data that is causing it to be self-referential in the first place” (@vulpesx), and felt funny about some of both struct’s members (including the allocator), but didn’t take the time and care to notice that self-reference.

I agree that passing in the writer, already set up with the buffer by the caller, and NOT passing in the buffer, NOR storing the buffer within your LoggerContext, would be more likable. (And I’d probably put LOGGER_BUF_SIZE immediately above that call site, rather than at the top of your file, unless you have a bunch of consts like that and a workflow that involves manually twiddling them a lot.) Often, one can do some field setup prior to the return LoggerContext { block, but you can’t do that in this case (e.g., you can’t create a local Writer and then assign .writer = writer), so you really have to change the interface altogether and get rid of the self-references. I see now that your setup() approach basically “solved” this, but I agree that it’s not the best solution, for several reasons.

Yes, my read on this: your Compositor init()/deinit() reads much more naturally. I always get nervous around managing pointers like *wayland.foo. It looks like they come from elsewhere, though; there’s presumably a handler that deals with the *wayland.wl_display’s ownership, but does that other code clean up the *wayland.wl_registry somewhere, assuming it’s heap-created by that other code (even though no allocator is provided for it)? (I’m guessing the memory management is done down in C code or something.) If you have intel like this, because you know how that wayland lib works, it might be worth a quick comment reminding yourself and others that there really is nothing else to do with those two pointers at deinit() time. Great job remembering the errdefer in init(). You could put an errdefer comptime unreachable; in after your self.registry assignment at the top of setupDisplayRegistry, if you want to clarify that nothing after that point in the function can return an error. There’s not a lot that happens after it, but it’s an option, and a habit worth getting into if you front-load error returns like that.
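One pattern worth knowing here: errdefer comptime unreachable; turns any later error return in the same scope into a compile error. Sketched against the OP’s setupDisplayRegistry (wayland identifiers as in the original; not compiled here, since the wayland bindings aren’t shown):

```zig
fn setupDisplayRegistry(self: *Compositor) WaylandError!void {
    self.registry = wayland.wl_display_get_registry(self.display) orelse {
        logger.err("Failed to get registry from display", .{});
        return WaylandError.GetRegistryFailed;
    };
    // From here on, returning an error is a compile error, which documents
    // (and enforces) that all fallible work is front-loaded above.
    errdefer comptime unreachable;

    logger.debug("Display registry acquired successfully", .{});

    const rc = wayland.wl_registry_add_listener(self.registry, &registry_listener, self);
    if (rc == 0) {
        logger.debug("Registry listener successfully added", .{});
    } else {
        logger.warn("Attempted to add a duplicate registry listener; it will be ignored", .{});
    }
}
```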