Simple function chaining

I tried to thin this to the bare bones; please help me with what I’m missing. Given (the somewhat ludicrous, now):

pub const Foo = struct {
   const Self = @This();
   pub fn bar(self: *Self) Foo { _ = self; return .{}; }
   pub fn bam(self: *Self) Foo { _ = self; return .{}; }
};

I can do this:

   var foo = Foo {};
   var f1 = foo.bar();
   var f2 = f1.bam();
   var f3 = f2.bar();
   _ = &f3;

But I can’t do this:

   var foo = Foo {};
   _ = foo.bar().bam().bar();

I get:

error: expected type '*main.Foo', found '*const main.Foo'
note: cast discards const qualifier

(0.16 master, if it matters)

Hi! This is completely intended.

Your functions take a *Self, that is, a pointer to a mutable value. When you chain calls, each return value is used directly as a temporary, and a temporary is immutable: its address can only be taken as *const Self, hence the error.

For example, the following would not work:

const foo: Foo = .{};
var f1 = foo.bar();

// equivalent to
var f1 = (Foo {}).bar();

In your case, you would need to modify your functions to take a const pointer or a value:

pub const Foo = struct {
   const Self = @This();
   pub fn bar(self: Self) Foo { _ = self; return .{}; }
   pub fn bam(self: Self) Foo { _ = self; return .{}; }
};

Parameters are constants by default in Zig.
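As a minimal sketch of why by-value receivers chain cleanly (Counter, add and double are hypothetical names, not from the code above): each call returns a fresh value, and the next call accepts that temporary because it never needs a mutable pointer to it.

```zig
const std = @import("std");

// Hypothetical example type for illustration only.
const Counter = struct {
    value: u32 = 0,

    // `self` is passed by value, so a temporary is a valid receiver.
    pub fn add(self: Counter, n: u32) Counter {
        return .{ .value = self.value + n };
    }

    pub fn double(self: Counter) Counter {
        return .{ .value = self.value * 2 };
    }
};

pub fn main() void {
    const start: Counter = .{};
    // Every step yields a new Counter; `start` itself is never modified.
    const result = start.add(1).double().add(3);
    std.debug.print("{d}\n", .{result.value});
}
```

The trade-off is that nothing is mutated in place: the chain only makes sense if each step returns the updated copy.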

However, if you still need to mutate data while chaining, you would need, for example, to hold a pointer to mutable data internally, but that is not the Zig way :smiley:

To clarify with an example: for something like a Java-style factory, the “Zig way” would be:

pub const Options = struct {
  allow_a: bool = true,
  allow_b: bool = false,
};

pub const Foo = struct {
  has_a: bool,
  pub fn init(options: Options) @This() {
    // init code
    return .{
      .has_a = options.allow_a,
    };
  }
};

If you still need function chaining, do:

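const std = @import("std");

pub const Data = struct {
  allow_a: bool = false,
  allow_b: bool = false,
};

pub const Foo = struct {
  data: *Data,

  // Allocation can fail, so init returns an error union.
  pub fn init(allocator: std.mem.Allocator) !@This() {
    const data = try allocator.create(Data);
    data.* = .{}; // create() returns uninitialized memory
    return .{ .data = data };
  }

  pub fn deinit(self: @This(), allocator: std.mem.Allocator) void {
    allocator.destroy(self.data);
  }

  // self is passed by value (and is const), but the Data it points to is mutable.
  // Returning self is what makes the call chainable.
  pub fn allowA(self: @This()) @This() {
    self.data.allow_a = true;
    return self;
  }
};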

Ah, right, of course. My mistake - the * (mutability) derived from the “real” code, but the purpose is actually avoidable (mutability is not necessary).

For completeness, I guess it’s worth pointing out that another “solution” exists. IF mutability is not the aim, but, rather, avoiding expensive copy* is, then

 pub fn bar(self: *const Self) Foo { _ = self; return .{}; }

*- I appreciate that it’s the compiler’s job to decide whether to copy or just (const) reference, and it’s supposed to do that well, so I shouldn’t have to feel the burden of using const* to avoid an expensive copy. At least, I think that’s the right attitude I’m to take.
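For what it's worth, a small sketch of the *const variant (the id field and next are hypothetical, added only so there is something to observe): a const pointer receiver can bind to a temporary, so the chain compiles without any intermediate var.

```zig
const std = @import("std");

// Hypothetical example type for illustration only.
const Foo = struct {
    id: u32 = 0,

    // A *const receiver accepts both named values and temporaries.
    pub fn next(self: *const Foo) Foo {
        return .{ .id = self.id + 1 };
    }
};

pub fn main() void {
    const foo: Foo = .{};
    // Each intermediate Foo is a temporary whose address is taken as *const Foo.
    const last = foo.next().next().next();
    std.debug.print("{d}\n", .{last.id});
}
```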

Yeah, definitely! The compiler will decide by itself whether the value should be copied, moved, or passed by reference (it depends on register size, value size, calling convention, etc.).

You should only use *const Self if you specifically need the address of the value, for example to do pointer arithmetic on it with @fieldParentPtr.

This also exposes another danger in return .{}; - I think this “works” here because(?) this function will be inlined; if this was potentially a “return of a local, which goes out of scope”, that would be bad. The original code (from which I derived this) was explicitly inline and ONLY consisted of return .{stuff}, but I see that my simplified example code looks more suspect. Oops.

Returning a local is never a problem if:

  • it doesn’t contain a pointer to a local
  • it is not a pointer to something local

i.e.

const local: usize = 42;

return local; // ok
return .{.value = local}; // ok
return .{.value = try allocator.dupe(u8, bytes) }; // ok: heap copy (bytes is a placeholder slice)
return &local; // UB, in master, returns undefined
return .{.value = &local}; // UB
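A compilable sketch of the two safe cases, under the assumption of a hypothetical Boxed wrapper and helper names byValue and onHeap (none of these are from the thread): returning by value copies the data out, and returning a heap pointer is fine because the memory outlives the call.

```zig
const std = @import("std");

// Hypothetical wrapper type for illustration only.
const Boxed = struct { value: usize };

// Fine: the integer is copied into the returned struct.
fn byValue() Boxed {
    const local: usize = 42;
    return .{ .value = local };
}

// Fine: the pointer refers to heap memory, not to this function's stack frame.
// The caller owns the memory and must call allocator.destroy() on it.
fn onHeap(allocator: std.mem.Allocator) !*usize {
    const ptr = try allocator.create(usize);
    ptr.* = 42;
    return ptr;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const ptr = try onHeap(allocator);
    defer allocator.destroy(ptr);
    std.debug.print("{d} {d}\n", .{ byValue().value, ptr.* });
}
```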

Only partially related, but as an aside: I don’t think you need @This() in the OP’s example, since you don’t have an anonymous struct. You could just use

pub const Foo = struct {
   pub fn bar(self: *Foo) Foo { ...

This doesn’t affect the mutability problem of the OP.

Or you can return the struct as a mutable pointer, *Foo in this case.

That works, but it doesn’t mesh with the classic “builder pattern” because you cannot simply create something and start chaining it, you need a var to take a pointer of first.
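A minimal sketch of that pointer-returning style, assuming a hypothetical Config type with setWidth and setHeight (names invented for illustration): every method takes and returns *Config, so the chain has to start from a var.

```zig
const std = @import("std");

// Hypothetical builder-like type for illustration only.
const Config = struct {
    width: u32 = 0,
    height: u32 = 0,

    // Mutates in place and hands the same pointer back for the next call.
    pub fn setWidth(self: *Config, w: u32) *Config {
        self.width = w;
        return self;
    }

    pub fn setHeight(self: *Config, h: u32) *Config {
        self.height = h;
        return self;
    }
};

pub fn main() void {
    // Must be `var`: a const or a temporary would only give a *const Config.
    var config: Config = .{};
    _ = config.setWidth(800).setHeight(600);
    std.debug.print("{d}x{d}\n", .{ config.width, config.height });
}
```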

If it’s a heap pointer, you can get away with this (awkward) approach:

const my_foo: *Foo = (try allocator.create(Foo)).build1(fee, fie).build2(foe, fum);

Which I do not endorse or recommend, but it works.

Zig’s conventions just don’t favor “chain as much as possible”. Which doesn’t suit everyone, but suits me just fine.

Oops, I was conflating my trimmed posted code with my original code again, and, even then, I was confusing unrelated things; sorry for the noise, but your post is a good reminder, of course.

Ah, great! Actually, my “real” code is @This-less, but I hadn’t worked out the real value of @This, yet (for anonymous structs), and I thought I’d make my posted code look more like I’ve seen other code with @This - must’ve been all anon structs but I missed the pattern. So you answered a question I should have asked. I see the standard doc does say “This can be useful for an anonymous struct that needs to refer to itself“ right near the top; just missed it. Lots of good things learned in this thread, thank you all.

This is the case that I was thinking about in my follow-up about returning references/pointers to locals that will go out of scope. I know you can do what you say, here, but remind me again where the gotcha is? When do you have to be careful about NOT doing this, lest it turn into a UB case? Or am I forgetting that wrong?

Your (try foo).build1(… illustration brings to the surface another niggle in what I’m trying to do: I have a case where the chained function calls would need to return !Object instead of Object, but chaining trys is ugly. I noticed that long ago (2020) trychain was proposed. It looks like it was booed, eventually even by the OP, though I didn’t see much detail; the issue was closed (as completed?!), but I don’t think anything was ever attempted, as “trychain” doesn’t exist. I can think of some reasons to downvote it, too, but found it curious as I considered my interest. It may just be that, if my interest really depends on chaining, especially if the chained functions need to do allocations and might (for that or other reasons) fail, then it’s just not a good fit for Zig, or I’ll need to find another way to express that structure building; something other than function calls.

You must not return a pointer to stack memory from a function, because that pointer is no longer valid. So if an initializer returns a *Foo, it had better take an Allocator so that the pointer lives in heap. By convention we call those create btw.

It’s fine to return a Foo, though, because that ends up in the result location: conceptually, the struct is built inside the function call and copied back to the variable (or ‘place’ at least) where the result goes. The optimizer might elide the copy but it’s better to assume it won’t.

A Foo all by itself can only coerce to a *const Foo, not a mutable *Foo:

const my_foo: Foo = Foo.init(stuff).doThing(more_stuff);

This will only work if doThing has the receiver types Foo or *const Foo. Any parameter type which is not a pointer is immutable, and this is consistent with that.

So you’d have to do this:

var my_foo: Foo = .init(stuff);
_ = my_foo.doThing(more_stuff);

Not so convenient! Zig does not encourage or reward chain-heavy code. I think of that as more of a consequence than a design goal.

It’s always best to work with the language. If you want to write something as chained code, but the process is cumbersome, ugly, hard to understand: maybe don’t do it that way.

var foo: Foo = .init;
try foo.preheat(allocator, param1);
try foo.doTheNextThing(allocator, more, stuff);
if (foo.readyForAction()) {
   // ...
}

If you absolutely, positively must write this kind of code as a series of method chains, you’ll need to use another language. That’s just how it is. But I guarantee you: the CPU? Doesn’t care. Users? Don’t care. Do you care?

I’m sorry, this was too obvious. I misinterpreted; of course returning a *Foo will need an allocated Foo, rather than returning a reference to a local-stack Foo. But it’s good to spell that out for me anyway, perhaps. However, what I was trying to get ahold of was an “inline” case… I can’t put my finger on the message that highlighted this for me a week ago… but something like: you can(?) return a reference to a local-scope stack var IF the function is inline, because(?) then the var isn’t actually scope-bound? … but it’s not a good idea anyway and may be UB and may eventually be compile-time (or run-time?) detected and disallowed….?

But MAYBE it was what you mentioned here about the result location, and how a struct copy might be elided if the compiler can do so, “but it’s better to assume it won’t” – is it possible that inlining the function guarantees or increases the chance that that copy is elided?

This all seems like pretty basic stuff, in a way, so thank you all for your patience; coming back from python-only-land for many years requires reflection on old basics.

In my case, it’s not that chaining is absolutely required, but that it models the higher-level domain better; this would be a case of: (lib) users DO care, because they’d find the ergonomics familiar. But there’s no need for me to force, ultimately; the exercise is good for the brain, and the “not a fit” result is a fine conclusion, or I might be inspired to come up with a better creative way to appease the ergonomic cause without over-bending zig to a bad fit.

There was some discussion about whether inlining should be considered a case of stranding memory. A good argument can be made for both sides. As of recently, the Zig compiler tags that as a memory violation, so that’s the answer to the question. Before that change, in practice, one could get away with it.

I’m a bit conflicted, but it’s probably the better choice. It means there is no need to provide caveats about inlining when explaining that you can’t return function-local memory from a function. It also means no one will strand a pointer by removing the inline keyword, which is all to the good.

Amortizing for familiarity, Zig library users will expect the ergonomics of a Zig library, and function chaining is not especially ergonomic in Zig.

It’s nice in other languages, but it isn’t like Zig is deliberately making it not-as-nice. Rather, other decisions Zig has made about how to model memory, errors, and so on, plus precepts like “one obvious way to do things”, just have that effect. Changes which only have the effect of making function chaining more pleasant are unlikely to happen.

Right, though I guess removal of the inline could still result in the memory violation flag, as currently happens (as you point out). But it is nice when the world has fewer exceptions that aren’t strictly necessary.

Right, of course; makes perfect sense. Doesn’t trouble me, in the end. I think there’s a good chance I’d like zig less (in certain ways) if this particular pattern were “easy”, but it made others harder; the things zig makes nice are pretty nice, so it’s not hard to let go of some patterns that aren’t natural.

A couple of years ago there was a topic, just in case.

Yes, thank you for referencing that - I’d read it, and just re-read it. There are some important meatballs in that dialog. There’s also the fluent showcase and conversation that makes foundational use of chaining. In many cases, chaining seems to work well for “operational” ordered-step functionality, like matrix manipulations. As @mnemnion points out, such could be done in order-step by foo.a(); foo.b(); foo.c(), as well, and, in the case of zig, which doesn’t do exceptions, this could be the ONLY way to go if you have error potential in any of the operations because try-chaining is not pretty. Additionally, if any of the operations requires allocations, since an allocator needs to be passed, and can fail (resulting in a return error again), chaining doesn’t look like a nice option. The use case I was considering DID potentially involve allocations along the way and DID (also) involve error-potential along the way, so chaining was not a good option. My case was more of a “builder” pattern, though - not quite akin to matrix manipulation chains. It’s good to feel out the right “fit” of a language and a given pattern, and to not force fits, and this was a good fitting-room exercise.

With enough effort, if you care about it, you can make it possible to write code like this:

const e_thing = Thing.create(allocator)
   .build1(stuff, more_stuff)
   .keepBuilding(even, more, stuff2);
const thing = try e_thing.ok();

The implementation is left as an exercise for the curious reader :slight_smile: . This is called the “railroad pattern” if you want to search around.
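One possible shape for it, as a rough sketch rather than the intended solution: ThingResult, BuildError, start and add are all hypothetical names, and this version skips the allocator for brevity. The wrapper carries either a Thing or the first error, each step becomes a no-op once an error is recorded, and ok() is the only place where the caller needs a try.

```zig
const std = @import("std");

// Hypothetical error set and payload type for illustration only.
const BuildError = error{TooBig};

const Thing = struct {
    total: u32 = 0,
};

// Hypothetical "railroad" wrapper: holds either a Thing or the first error seen.
const ThingResult = struct {
    state: BuildError!Thing,

    pub fn start() ThingResult {
        return .{ .state = Thing{} };
    }

    // A fallible step: on failure it records the error instead of returning it,
    // so the method's return type stays chainable.
    pub fn add(self: ThingResult, n: u32) ThingResult {
        // Once on the error track, stay there and skip the work.
        const thing = self.state catch return self;
        if (n > 100) return .{ .state = error.TooBig };
        return .{ .state = Thing{ .total = thing.total + n } };
    }

    // Leaves the railroad: hands back the ordinary error union.
    pub fn ok(self: ThingResult) BuildError!Thing {
        return self.state;
    }
};

pub fn main() !void {
    const thing = try ThingResult.start().add(1).add(2).ok();
    std.debug.print("{d}\n", .{thing.total});
}
```

In this sketch the steps never return !Thing themselves; a step that calls something fallible internally (an allocation, say) would store that error in state in the same way, which is what keeps the chain unbroken.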

Well, I might have abandoned the idea, for better approaches, but I’ll take a look… can you comment quickly on whether this falls apart if build1() and keepBuilding() have to be able to return !Thing?