init(), create(), and ~~new()~~ make() (?)

Can experienced zigsters confirm that this is the convention for structs in lib and app code?

  • Provide init() function for callers to initialize a stack-allocated instance.
  • Provide create(al: Allocator) function for callers to heap-allocate* an instance.
  • Provide new(al: Allocator, ...) function for callers to heap-allocate-and-initialize an instance.

Not every struct design needs all three, but here is the significant part of my question: a create() function, for instance, should NOT take args for initialization, and should not be named create() if it does initialization; rather, it should be called new() (??)

I’m not sure I’ve seen enough Zig in the wild to draw conclusions like this. Documentation and such makes me fairly confident about init() and create(), but I’m less sure about new(), and about whether initialization inside a create() function is conventionally forbidden.

Edit: * = or, I should say, “allocator-allocate”; I realize that if, for instance, a FixedBufferAllocator is the allocator, then such a create() function wouldn’t in fact be using the heap, but the implementation of create() would be blind to that.

init is the default name for creating a new instance of a struct. Usually it is used for creating an object on the stack, but this is not strictly necessary. Sometimes you need an instance of an object before initializing it (because the object needs stable pointers), so you see this pattern from time to time:

var object: Stable = undefined;
object.init();

I don’t think create has a specific meaning, the std uses it for different purposes. Like Allocator.create allocates memory for an object of some type and returns the pointer to that memory but does not do any more initialization. However, if you look at the std.Build struct, create is used for both allocating and initializing an object before returning a pointer to that object.

I have never seen new used.
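To make the Allocator.create behavior concrete, here is a minimal sketch (my own example, not from std docs) showing that it allocates but performs no further initialization:

```zig
const std = @import("std");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    // Allocator.create allocates memory for one u32 and returns a pointer,
    // but the memory is undefined until the caller initializes it.
    const p = try gpa.create(u32);
    defer gpa.destroy(p);

    p.* = 123; // the caller is responsible for initialization
}
```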

5 Likes

This is actually encouraging, if it’s a passive vote against it… I don’t love that “new” is not an imperative verb like “create” and “init”.

My idea might be a bit quirky, but I think there’s no fundamental difference between initializing an entity allocated on the heap and initializing an entity on the stack, or rather, essentially we are still manipulating the heap by operating on an entity (pointer) on the stack.

E.g., the following two versions of the Foo struct seem like one is initialized on the heap and the other on the stack, but I don’t think there is any fundamental difference between them.

const Foo = struct {
    fields: Fields,
    pub fn init(allocator: std.mem.Allocator) !*Foo {
        const foo: *Foo = try allocator.create(Foo);
        errdefer allocator.destroy(foo);
        ...
        return foo;
    }
    pub fn deinit(self: *Foo, allocator: std.mem.Allocator) void {
        self.* = undefined;
        allocator.destroy(self);
    }
};
const Foo = struct {
    impl: *Impl,
    const Impl = struct {
        fields: Fields,
    };
    pub fn init(allocator: std.mem.Allocator) !Foo {
        const impl: *Impl = try allocator.create(Impl);
        errdefer allocator.destroy(impl);
        ...
        return .{ .impl = impl };
    }
    pub fn deinit(self: Foo, allocator: std.mem.Allocator) void {
        self.impl.* = undefined;
        allocator.destroy(self.impl);
    }
};

I only use init and deinit, except that deinit is optional (only present if needed).

1 Like

I more or less like (and use) create for structs that are explicitly allocated on the heap.

/// Engine is created on the heap.
pub fn create() !*Engine {
}
1 Like

The convention is to use init/deinit if the function returns a T or !T, and create/destroy if it returns a !*T.

Fundamental or not, this is a useful distinction to convey, and that is, by convention, how we do that.

You don’t have to follow convention, of course. You should, but you don’t have to.
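A minimal sketch of that convention (Widget is a hypothetical type I’m using for illustration): init/deinit when the function hands back a T by value, create/destroy when it hands back a !*T.

```zig
const std = @import("std");

pub const Widget = struct {
    value: u32,

    // Returns a Widget by value; the caller owns the storage
    // (stack, global, or embedded in another struct).
    pub fn init(value: u32) Widget {
        return .{ .value = value };
    }

    pub fn deinit(self: *Widget) void {
        self.* = undefined;
    }

    // Storage comes from the allocator; returns a pointer.
    pub fn create(gpa: std.mem.Allocator, value: u32) !*Widget {
        const w = try gpa.create(Widget);
        w.* = Widget.init(value);
        return w;
    }

    pub fn destroy(self: *Widget, gpa: std.mem.Allocator) void {
        gpa.destroy(self);
    }
};
```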

9 Likes

I like this convention.

I used to get confused about why some functions create things with init and sometimes create, along with new when the library is a binding to a C++ library. Your convention is easy to remember, and it clearly distinguishes returning a struct from returning a pointer after creating the struct.

1 Like

If I am wrong, people will correct me; this is what I usually do. Sorry if I got a bit carried away with being too wordy.


// 1. normal instances
// Use Case:
// - @sizeOf(T) < 8 bytes [size of a ptr]
// - or very few instances alive at a time

var instance = T{};   // creation
instance = undefined; // false destruction (reader aid only)
                      // (optional but never pair with defer)
                      // Actual instance is only erased at scope end.
                      // Personally, I have never seen it used.

// 2. instances initialized via values, function calls, or system calls.
// Use Case:
// - Same as (1.) + if you need to init with internal checks and logic
// - There is no standard method name for this.

// eg: yours truly, std module

var timer = try std.time.Timer.start();
var thread = try std.Thread.spawn(.{}, someFn, .{}); // someFn: your thread function
// note: the names come from the type's domain, not a generic construction verb.

// heap instances:-

// 1. user-created ptr
// Use When:
// - @sizeOf(T) > 8 bytes and multiple instances alive at a time.
// - to conserve stack storage

const inst_ptr = try allocator.create(T);
defer allocator.destroy(inst_ptr);


// 2. Class/NameSpace created ptrs
// Use When: 
// - An internal check or register in place (eg: for a singleton class)
// - All the instances are on a slice on the heap or stack. 
// (can be required if some methods require iteration over all the instances)

const inst_ptr = try T.create(); // internal check: is there enough space in the slice?
defer T.destroy(inst_ptr); // OR: defer inst_ptr.destroy(); I prefer the first, though


// Handle class (an integer index that acts as a ptr):-

// Use When: 
// - too many instances are alive
//  AND total possible instances are less than std.math.maxInt(u64);

// (eg: sprites in a scene)
const sprite_hdl = try Sprite.create(); // sprite_hdl is a u32 index into an internal slice
defer Sprite.destroy(sprite_hdl); // OR sprite_hdl.destroy();

// sprite_hdl.sprite() is a method of the SpriteHandle class,
// which has a single u32 field: handle_index.
// .sprite() would be implemented something like
// { return &internal_slice[self.handle_index]; }
sprite_hdl.sprite().someWorkMethod();


// when to use init() and deinit():-

// 1 - for initializing a class/namespace:
//      - when it has an internal slice on heap/stack
//      - some function or system calls to prepare the ground for instantiation

// eg: a windowing library that needs to acquire system resources,
try glfw.init();
defer glfw.deinit();

//eg: initializing the Sprite's internal buffer size;
try Sprite.init(allocator, max_sprites);
defer Sprite.deinit(allocator);

// 2 - for internal allocations in a sitting instance

var my_pack: StorageBox = undefined;
try my_pack.init(allocator, 1024 * 1024); // allocates 1 MiB for the internal data field
defer my_pack.deinit(allocator);

// as you can see: avoid init/deinit for plain instantiation altogether.


// zen of zig: transparency over elaboration
// - a class is better appreciated for its simplicity and lack of hidden fields
// - if a user can achieve instantiation through one line of allocator code, avoid 
// using create() or named methods.
// - if there is no defer destroy() or deinit() required for init() or create() then
// these methods are likely not required in the first place
// - if an init() makes no sys calls or allocations, it's likely not required

// Note - just because I say a method is not needed doesn't mean it is
// useless; only that you should consider naming it something else, like validate() or reset()

2 Likes

And this coincides well with, e.g., init() being used in various std code like allocators… (more later)

Ah! I’ve been doing that the other way ’round (destroy(self); THEN self.* = undefined). Is this order preferred?

Ok, votes from @npc1054657282 and @jumpnbrownweasel for “init() for all”, and votes from @ericlang and @mnemnion and @Accipiter-Nova for “create() for heap” - or, well, using @ericlang’s words. I might suggest that what we actually mean is “create() for allocator-created”, which is probably why @mnemnion mentioned “create/destroy if it returns a !*T”. I feel like that’s my leaning too, though there’s plenty of nuance, even in std. For instance, even ArrayList .empty needs defer a.deinit(al), and this is an example of deinit(al). @mnemnion’s convention still holds up, though, because init/deinit isn’t explicitly about having/not-having an allocation in the initialization function; it’s about returning !T vs. !*T.
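For reference, the ArrayList case looks like this (assuming Zig 0.15+, where std.ArrayList is unmanaged): no allocation happens at initialization, yet cleanup is still deinit(al), because the list is held by value (T), not behind a !*T.

```zig
const std = @import("std");

pub fn main() !void {
    const al = std.heap.page_allocator;

    var list: std.ArrayList(u8) = .empty; // no allocation yet
    defer list.deinit(al); // frees whatever was allocated later

    try list.append(al, 42);
}
```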

The benefit of setting memory to undefined is to catch use-after-free. For a stack variable it doesn’t make much semantic sense, but I can see situations where you’d want to; FYI, there is an accepted proposal to fill the function’s stack with undefined after it returns, which would make that redundant.

I don’t think I would do that unless I had repeated issues with some complex logic that I fear might return a stack pointer. Most of the time this isn’t an issue.

While it does break Zig convention, it is done to conform to a more popular convention, which Zig specifically suggests you should do when applicable; Zig only provides vague conventions for the language and does not cater to the terminology of specific domains.

Timer.start mirrors how you think of timers as a human.
Thread.spawn adheres to common OS/POSIX terminology.

The uses of handles have little to do with the number of instances; they provide a myriad of much more important benefits.

The point about total possible instances being less than std.math.maxInt(u64) makes little sense, as you have the same (technically smaller) limit with pointers on 64-bit systems.

Reaching that limit would be quite astonishing in the vast majority of programs, which is why the number of instances is not really a reason to use handles.

It’s not about syscalls or allocations, it’s about initialisation logic, which syscalls/allocations fall under.

If it’s just immediately assigning parameters to fields, then it likely isn’t needed even if there is some slight processing, though I make exceptions if the processing is unintuitive.


Other than those small things (it looks bigger because I was trying to explain thoroughly), you are just following Zig convention.

The big difference is lifetime / ownership management, but also flexibility for the struct user. IMHO allocation should always be separated from initialization, sort of like the old ObjC x = [[Bla alloc] init], and (ideally) not happen implicitly inside an init/create/new function (even with the concept of allocators). For instance, an integrated allocation doesn’t help when you need a whole array of those structs, or when the struct is nested inside another struct.
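A sketch of that point (Thing is a hypothetical type): when init only initializes and never allocates, the very same init works for a single stack value, for a heap allocation, and for every element of an allocated slice.

```zig
const std = @import("std");

// Hypothetical Thing whose init does no allocation.
const Thing = struct {
    n: u32,
    pub fn init(n: u32) Thing {
        return .{ .n = n };
    }
};

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    // Single stack value.
    const one = Thing.init(1);
    _ = one;

    // A whole array: allocation happens separately, init applies per element.
    const many = try gpa.alloc(Thing, 16);
    defer gpa.free(many);
    for (many, 0..) |*t, i| t.* = Thing.init(@intCast(i));
}
```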

3 Likes

I feel the general force of this, and think I mostly agree, but it seems there are a lot of “simple cases” where combining seems to do no harm. I think it’s important that you KNOW that’s going on (when you call the create(), e.g.), but that’s normally clear if there are initialization arguments passed in (along with the allocator, in this case). And, as mentioned, if initialization is an expensive operation, it should be communicated pretty clearly to the caller that that’s the case, and probably, in that case, be separated out into a followup function. Then you just have to concern yourself a little with the “transient state of an invalid/useless object that hasn’t yet been initialized” - but that shouldn’t weigh much.

Totally agree that there are OTHER cases, like this, where combined initialization doesn’t make any sense, and may not even be possible.

2 Likes

Your order is a use after free!

I thought the allocator interface did that for you, but I checked, and it only does it when allocating, to help catch if you missed initialising data.

I think it’d be nice for it to do so on free/destroy too, though of course it couldn’t prevent the implementation from overwriting that for its own purposes.


@floooh thanks for accidentally including me in that conversation lol.

I mostly agree, but want to highlight dynamic collections as exceptions; but not really, since their state, at least the part you interact with directly, is rarely allocated.

So it’s more of a clarification: allocating other data is fine, but not allocating the struct itself.

Not on its own, sure, but when you’re used to it from all the “simple cases”, you will probably do the same with a not-so-“simple case”; that is where the harm is.

I want to point out that the general advice we give (as a community) already pushes people (indirectly) in this direction. Not that it isn’t worth talking about, as that doesn’t mean they have arrived here yet.

4 Likes

One thing to keep in mind when naming things is decl literals, i.e. what will the call-site look like?

When they were introduced, I converted a project to use them heavily and stumbled into this gem:

var compiler: Compiler = try .init(try .init(gpa), try .init(gpa), try .init(gpa));

Not very readable. It used to be readable when it was type-qualified, such as try ConstantPool.init(gpa).

I have ended up either giving initializers more descriptive names, using extra consts which I pass in, or just sticking to good ol’ type-qualified initializers when that’s more readable.
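The "extra consts" variant might look like this (Pool and Compiler are hypothetical stand-ins for the types in my example above; assumes a Zig version with decl literals, 0.14+):

```zig
const std = @import("std");

// Hypothetical stand-ins for ConstantPool/Compiler.
const Pool = struct {
    pub fn init(gpa: std.mem.Allocator) !Pool {
        _ = gpa; // a real pool would allocate here
        return .{};
    }
};

const Compiler = struct {
    pools: [3]Pool,
    pub fn init(a: Pool, b: Pool, c: Pool) !Compiler {
        return .{ .pools = .{ a, b, c } };
    }
};

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    // Instead of: var compiler: Compiler = try .init(try .init(gpa), ...)
    // name each intermediate value so the call site stays readable.
    const constants: Pool = try .init(gpa);
    const strings: Pool = try .init(gpa);
    const symbols: Pool = try .init(gpa);
    const compiler: Compiler = try .init(constants, strings, symbols);
    _ = compiler;
}
```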

2 Likes

Using ZLS inlay hints that show the parameter names also kind of works, but only for people who have them activated.

Yeah, cool idea, though I had to turn that off as it was annoying when editing. Perhaps other editors are better at it. Even then you’re going to read diffs, so keeping things readable seems important.

1 Like

I have <leader>th for ā€œtoggle hintsā€ (Neovim).

Good thing too because if it weren’t that easy I wouldn’t use them. They’re either extremely distracting or very helpful, no middle ground, and which is which is constantly changing.

1 Like

Sorry, brainfart: not “destroy(self)”, but “free stuff”, then self.* = undefined; like ArrayList.deinit() and such. I don’t know if I’ve ever called destroy(self) internally. So, ignore my brainfart.

Ok, I’d like to hash this out a little more.

@floooh suggested the example of “a whole array of structs” being alloc’d and initialized (with some data) simultaneously, which, to me, seems ludicrous indeed; I agree. Your comment, “… when you’re used to it… you will probably do the same with not-so-simple cases, and that is where the harm is”, I want to respect, but it doesn’t “feel” like I would. So, if I have a “simple” Zoo, and it has members that really don’t want default values, e.g., because data invariants of the struct could be violated by omitting a field during initialization (see here), you might provide:

const Zoo = struct {
   a: u8,
   b: u8,
   const default: Zoo = .{
      .a = 3,
      .b = 4,
   };
   fn init(a: u8, b: u8) Zoo {
      return Zoo {
         .a = a,
         .b = b,
      };
   }
};

Now, if you were creating a zoo with an allocator, and create() should really “do no initialization”, it would perhaps at least(?):

...
   fn create(al: Allocator) !*Zoo {
      const r = try al.create(Zoo);
      r.* = default; //?
      return r;
   }

But would it be most conventional to require the caller to:

const myzoo = try Zoo.create(al);
myzoo.* = .{ .a = 1, .b = 2 };

Or would it be “unconventional”? (inappropriate by convention?) to provide:

...
   fn make(al: Allocator, a: u8, b: u8) !*Zoo {
      const r = try al.create(Zoo);
      r.* = .{ .a = a, .b = b };
      return r;
   }

So that the user can one-liner instead:

const myzoo = try Zoo.make(al, 1, 2);

And, finally, if such a “default” (as provided above) didn’t really make sense, and neither does an “uninitialized Zoo”, then would it be appropriate to ONLY provide that make() function? If so, would it be better to call it something like make(), indeed, instead of create(), in order to avoid conveying that it was a standard create() that did NO initialization? (Though, of course, the (…, a, b) parameters would make it pretty clear that it was going to initialize values.)

I don’t mean to nit-pick, but I’d like to develop routines that make sense and coincide with the conventions the community has evolved… and I still haven’t spent enough time in various people’s libraries to feel confident drawing zeitgeist conclusions of my own (especially as the language and std are still evolving, and I do see a lot of “not so right-looking” code in the wild, too).

(Sorry, I usually compile code before posting, like this, but I’m growing confident enough to try freehanding now, so, forgive any flubs.)

EDIT: of course, for such a simple struct, Zoo wouldn’t really need a create() fn; the caller could just do al.create(Zoo) themselves, but for a more interesting struct, providing that creation helper makes sense, even if it’s ONLY doing allocation. By this token, a make() function would perhaps first call its sister create(), then do the initialization (perhaps calling sister init()), such that make() might MERELY be a convenience: r = create(); r.* = init(); … but I didn’t think of packaging it that way when I wrote the above. The bottom-line question still remains: is this make() idea an anti-pattern that should be avoided? Is it “fine” to do this even in a function named create(), with args for initialization, and maybe even to ONLY provide such a create(al, args…) if, in your application, it makes no sense to allocate an uninitialized version of the object? Or is it best to make sure, regardless, that create() only allocs, and never initializes with provided values (at the very least, so that callers don’t have to worry about whether expensive stuff is being done inside of create())? Sorry that’s so many questions wrapped in one, but I’ll be very satisfied to have the guidance.
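For concreteness, the composition described above could be sketched like this (my own sketch, with Zoo as in the earlier posts): make() is merely create() followed by init().

```zig
const std = @import("std");

const Zoo = struct {
    a: u8,
    b: u8,

    // Pure initialization, no allocation.
    fn init(a: u8, b: u8) Zoo {
        return .{ .a = a, .b = b };
    }

    // Pure allocation, no initialization values.
    fn create(al: std.mem.Allocator) !*Zoo {
        return try al.create(Zoo);
    }

    // Convenience: allocate, then initialize.
    fn make(al: std.mem.Allocator, a: u8, b: u8) !*Zoo {
        const r = try Zoo.create(al);
        r.* = Zoo.init(a, b);
        return r;
    }

    fn destroy(self: *Zoo, al: std.mem.Allocator) void {
        al.destroy(self);
    }
};
```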