Thoughts on "Go statement considered harmful"

lisael · April 23, 2024, 3:12pm

Notes on structured concurrency, or: Go statement considered harmful is an article introducing the design of trio, an async library for Python. The main talking point, as the title suggests, is that Go’s go statement (and every other async I/O construct around, for that matter, e.g. async/await), just like goto, breaks abstraction, local reasoning, RAII (and every other kind of resource cleanup, like defer/errdefer), and error handling.

trio solves this using a so-called nursery. A nursery is the only way to create an async task, and such a task must terminate within the lifetime (the context, in Python terms) of its parent nursery. The nursery can be passed around, even to its own child tasks, so that they may spawn tasks of their own. Nurseries can also be nested. They also deal with exceptions: by default, an exception in one task cancels all the other tasks, but the user may provide their own implementation of a nursery to customize this (and other) behaviours.

I find this concept quite aligned with Zig’s philosophy: local reasoning and explicit abstraction. The nursery is passed around explicitly, and it is an abstract interface that may be implemented by the user, while one or more sensible implementations exist in the standard lib, just like allocators. It’s an elegant solution to the cancellation problem. Python uses the with construct to bind the lifetime of the nursery; this can be emulated in Zig using a block and a defer nursery.join() that would wait for all the tasks to terminate and check for errors. The stdlib could also provide a testing nursery that fails the test if it’s not properly destroyed, mimicking the behaviour of std.testing.allocator.
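Since Zig has no such API today, here is a rough sketch of the shape in Python instead. Nursery, LeakCheckingNursery, start_soon, join and check are all hypothetical names; try/finally stands in for Zig's defer, and cancelling siblings on error is deliberately left out.

```python
import asyncio


class Nursery:
    """Hypothetical minimal nursery: remembers the tasks it starts until join()."""

    def __init__(self):
        self._tasks = []
        self.joined = False

    def start_soon(self, coro):
        task = asyncio.create_task(coro)
        self._tasks.append(task)
        return task

    async def join(self):
        # Loop, because running tasks may start more tasks in this nursery.
        while self._tasks:
            pending, self._tasks = self._tasks, []
            # gather() re-raises the first error raised by a child, if any.
            await asyncio.gather(*pending)
        self.joined = True


class LeakCheckingNursery(Nursery):
    """Hypothetical test variant, loosely in the spirit of std.testing.allocator."""

    def check(self):
        if not self.joined:
            raise AssertionError("nursery was never joined")


async def work(results, value):
    await asyncio.sleep(0.01)
    results.append(value)


async def main():
    results = []
    nursery = LeakCheckingNursery()
    try:
        nursery.start_soon(work(results, 1))
        nursery.start_soon(work(results, 2))
    finally:
        # Stand-in for `defer nursery.join()` at the end of a Zig block.
        await nursery.join()
    nursery.check()
    return sorted(results)


result = asyncio.run(main())
print(result)
```

A test harness would call check() on teardown, so a nursery that was never joined fails the test instead of silently leaking tasks.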

What do you think?

FObersteiner · April 23, 2024, 4:56pm

If I put on the hat of a Go programmer, I find a lot of “click-bait” in that article. Setting goto equal to go? Hm… Anyway, I’m not sure I’m getting the point here. Keeping a context of some concurrent tasks seems fine, but from my experience with Go, the problem isn’t waiting for stuff to finish (we have channels, wait groups, …), it’s the stuff that doesn’t finish properly. Or: how do you kill a goroutine? How do you avoid leaking goroutines? How does a lifetime/context help here, unless you can terminate it somehow?

gonzo · April 23, 2024, 4:58pm

This sounds a lot like structured concurrency, about which I read this article a few years ago: 250bpm.

dimdin · April 23, 2024, 5:57pm

Hello @lisael, welcome to ziggit

I failed to understand what the author is talking about.

Sze · April 23, 2024, 6:52pm

From my understanding of the article, you would raise an exception within one of those “nurseries”, and that would cause it to shut down itself and all its subtasks, bubbling the exception up until it gets caught somewhere. So I guess in a Zig context an unhandled error would cause the nursery and its subtasks to shut down.

It seems to me like this could be a nice way to structure things, if it is well integrated with the language. In Lisp lingo, I think the idea could be described as context objects that constrain the dynamic extent of asynchronous subtasks to a lexical scope. It reminds me a bit of Racket’s custodians and also, as the article mentions, Erlang’s nested supervisors (although I think they might be quite different on a detailed level).

I was wondering a bit whether the tree structure would put bad restrictions on the programs you can write, but considering that you can add new tasks by essentially passing around a reference to the nursery, then reading from a queue and creating a subtask per item if wanted, it seems you can do pretty dynamic, data-dependent things within it.

To me it seems like it could be a useful way to organize high-level control, supervision and possibly restart behavior, or maybe even redundancy. In that sense it seems similar to how Erlang works (I mostly have some anecdotal / theoretical knowledge of Erlang). It could also be useful if your problem happens to be processed in a tree structure.

I wonder whether certain graph problems would make this organization pattern useless (not helpful in the context of the algorithm, or an impedance mismatch) until the graph part is solved by a different set of mechanisms for coordinating and synchronizing on progress.

It seems that graph-like things would either have to use other ways to synchronize, or map things in weird ways to add an indirection, or be expressed in terms of one specific way to walk the graph (with caching of which nodes have already been visited), which would basically flatten the graph into a tree.

some more thoughts

I think having some level of nested restart-ability could be interesting in many programs
I think certain sub problems may still benefit from other ways to synchronize internally, until the sub-problem is solved
this reminds me a bit of Enter The Arena - Ryan Fleury, which uses tree-like lifetimes for nested arenas
if you could use this to express (memory, cpu, file-system, network, etc.) usage constraints it would be especially nice
this seems somewhat similar to the plans of making io operations require an interface similar to allocator
with the added detail that, this nursery interface allows you to group a bunch of sub tasks in a standardized way
this could potentially allow you to implement specialized nurseries that could do different things, instead of everyone manually having to join/shutdown/inspect/resource-limit/etc. sub tasks in ad-hoc ways in their programs
I imagine a standardized way to limit resource usage / access to things, would be good for webservers and similar programs that potentially communicate with a lot of different clients, or libraries that are intended to be used in these situations
if you can start all the things through the io interface, but there isn’t a way to potentially stop them, that would seem incomplete, making the case for having something that is also able to shut things down
that said, I also could see the argument for saying that Zig operates on a lower level, where you have to handle these things yourself and if you want higher level, that is an application level protocol that needs to be followed

FObersteiner · April 23, 2024, 7:05pm

to keep in mind: trio is a Python library. Python does magic. Based on C

Sze · April 23, 2024, 7:42pm

I don’t quite understand, are you jesting that Python is basically a C application-level protocol with a bit of a custom syntax frontend?

I meant, I am not sure what exactly Zig sees as part of the scope the language and/or the standard library should have opinions about. It seems to me that Zig wants to stay minimal to some degree and stay agnostic to how things are structured, unless there is some significant benefit to making it part of the language design.

You always can make things higher level by adding more abstractions, but ideally the stack stays very shallow to avoid introducing unnecessary things. At the same time there is benefit if multiple libraries and projects can agree on some common abstractions.

FObersteiner · April 23, 2024, 7:56pm

I just had a very similar thought, so I felt like making that stupid comment

I wouldn’t put it that drastically, but after all, Python hides a lot of complexity and details from you. I use Python on a regular basis, and I enjoy the “high level”. What bugs me (still, a bit) is that the article tries to sell a Python library by arguing against goto and the go keyword. It can be a good discussion starter anyway. I’m excited to see where concurrency goes in Zig. To be clear, my experience with Zig and low-level programming in general is way too small to argue for or against adding abstractions in that direction. But from what I have learned so far, it’s definitely not the Python level that Zig is aiming for.

Sze · April 23, 2024, 8:35pm

I agree. I think we need to leave behind the idea of “there is one correct way to do things”, I think there are many different useful computation models and their usefulness changes based on the problem you are trying to apply them to.
Picking a computation model that fits the problem can drastically simplify or complicate how the program is structured.

But picking a very high level model like for example providing a description to a constraint solver, that may be very terse and descriptive, may not necessarily lead to an implementation that can be executed efficiently. Ideally an ecosystem allows you to approach problems from many different perspectives that can be combined in different ways, or even lead you down a path that provides helpful tooling, to go from a high level description to a low level efficient implementation.

AndrewCodeDev · April 24, 2024, 1:52am

I’m also trying to stay out of commenting on this at some level because of the reasons you stated… but here’s the thing… in 5 years, we may start seeing articles like “nurseries: considered harmful”. It wouldn’t surprise me.

The only other thing I’ll say here is that I’m not convinced anyone has the “right idea” when it comes to asynchronous programming. There’s better and worse ideas, better and worse abstractions, etc… but it strikes me that this issue is more fundamental than opting in for a new abstraction. Maybe I’m wrong and this will prove me wrong (I’m happy to be wrong here).

The thing is, I’ve seen C++ go down this road for years but in other ways. There’s some fundamental language problem x and they invent library object y to solve it - sometimes it worked, sometimes it didn’t. Often we ended up with a new library object that has its own issues.

LucasSantos91 · April 24, 2024, 12:16pm

The article is 6 years old. I think the reason we are not seeing articles like you describe is because the idea the author proposed didn’t catch on.
I don’t see what’s so magical about what he is proposing, anyways. “Nursery” is just a weird name for a wait group. It just enforces that when the wait group goes out of scope, it joins all the tasks.

lisael · April 24, 2024, 1:02pm

It just enforces that when the wait group goes out of scope, it joins all the tasks.

That’s why it’s a different beast, and is more useful than wait groups. Wait groups are just a synchronization tool. The article argues that nurseries are more of a control flow primitive.

Where it shines is that, in a nursery-based async language, you can call any function from any library and be sure that, as long as you didn’t pass it a nursery, all async operations that the function started are finished when it returns. If you did pass it a nursery, then they may not be finished, but you know, locally, when they will be: when the nursery you passed goes out of scope. The same goes for resources you passed to the function. An open file can be safely closed after the function returns, because you know it’s not borrowed by any living async function, and so on. You get local reasoning about all those things, which is nice.

lisael · April 24, 2024, 1:07pm

trio is still around, has attention from the Python core team (Guido van Rossum often comments and brainstorms on trio’s GitHub), and many libs try to be trio/asyncio compatible. It’s far from being an obscure study subject.

matklad · April 24, 2024, 1:33pm

Note that the idea here is “structured concurrency”, not “nurseries”.

The following Rust code is perfectly concurrently structured, without any nurseries in sight:

async fn write_to_disk(message: &Message) { ... }
async fn send_to_peer_replica(message: &Message) -> String { ... }

async fn persist_and_replicate(message: &Message) {
    let fut_db = write_to_disk(message);
    let fut_net = send_to_peer_replica(message);
    // join! awaits both futures concurrently, so it takes no trailing .await
    tokio::join!(fut_db, fut_net);
}

I don’t know what Zig’s intended async design is, but it looks like it’ll be similar, in the sense that the lexical scoping of async frames would correspond to the dynamic scope of the corresponding concurrent processes at runtime.

nyc · April 24, 2024, 6:27pm

That’s what I was thinking. Sounds like a wait group with more steps. If it were lexically scoped and enforced by the compiler, that would just be a retread of structured concurrency. Dynamic grouping just seems like a wait group that calls group.cancel_all() all the way up the error return.

Most concurrency models are pretty flat, so I’ve never really seen it used much beyond what a wait group can do either.

FObersteiner · April 25, 2024, 4:12am

Emphasizing this: I think Martin Sústrik’s blog posts on the topic are a nice read (https://250bpm.com/, section “Structured Concurrency”). It turns out he also wrote a C library, libdill (GitHub: sustrik/libdill, “Structured concurrency in C”). He calls it a hobby project somewhere, and it hasn’t been updated in years, but it should be worth a look anyway. He calls “nurseries” just “bundles”, by the way, but says that both names are horrible. From what I understand, it’s not just wait groups; it’s more like wait groups with a kill switch, or “graceful shutdown”.

Independent of technical details, the question to me is: does this abstraction make my life easier in some way, after some getting used to? I might just be stubborn here; my first programming experience was hacking GOTO into my so-called “graphical” calculator… maybe this is why I found Go’s concept of concurrency so intuitive. Paraphrasing Martin Sústrik: if an abstraction makes a good story, it tends to help us humans handle complex problems (hello, OOP).

Calder-Ty · April 25, 2024, 5:21am

Funny enough, this article is on the same issue thread about the cancellation problem, linked to in the 0.12 release notes.

FObersteiner · April 25, 2024, 6:05am

so is this really considered a (short term) goal for a language feature? I mean, I can see the importance, but why not have a 3rd party library implementation first, then integrate that into the language? For comparison, I’m thinking about PR #19549, which is a much less complex topic in my eyes, but AndrewK’s position is clearly “3rd party implementation first”.

lisael · April 25, 2024, 9:21am

The huge difference from a wait group is that you must explicitly hold a reference to a living nursery to create a concurrent task. This restores local reasoning and flow control, and allows strong assumptions about the functions you call. Traditional async I/O systems broke abstraction and black-box reasoning about your code. In this snippet, I know everything I need to fully understand the flow of the code and the use of resources:

// pseudo-code
{
    // initialize a resource
    var f = File.open("/tmp/example");

    var n = Nursery.new();

    // in a real world scenario, we would probably use defer, taking advantage of the
    // execution order (or using scopes/blocks):
    // defer f.close();
    // defer n.join();

    // the syntax here is arbitrary.
    var t1 = n.start_soon(third_party.foo(f));
    const result1 = await t1;

    // At this point, we know for sure that we could safely close the file,
    // because no nursery can leak outside of third_party.foo(), and therefore
    // any potential concurrent task in foo() has reached termination.

    const result2 = await n.start_soon(third_party.bar(n, f));

    // here, we shouldn't close the file, because we passed bar() a nursery.
    // Some concurrent tasks may still run, that need the file.

    n.join();
    // now, we're safe;
    f.close();
}

Wait groups won’t give this level of guarantee, because they are optional, not a fundamental, required primitive of concurrent programming.

That said, Zig is not Rust, and the guarantees are not as strong, but this pattern mimics allocators. A well-behaved lib shouldn’t allocate memory on its own; it should accept an allocator where one is needed. Likewise, a well-behaved lib should always clean up the nurseries it creates, and accept a nursery if it needs to leak concurrent tasks.

The next good part is that nurseries are the place where the system keeps track of concurrent tasks, and that this is fully exposed to the user. A nursery may even be user-defined, to customize things like cancellation, error handling, timeouts, debugging…

Sze · April 25, 2024, 9:58am

I think being able to shut things down and to switch the behind-the-scenes implementation are the interesting parts, similar to how switching allocators allows you to do interesting things.

However, I think if a language uses or supports these, I would also want to be able to enforce CPU, memory and other usage limits through these nurseries. (Personally, I think Racket’s name for a similar concept, “custodians”, is better.)
Maybe these limits can all be enforced completely in userspace, but if not it would be good to have some language support.

Without these limits one task can simply spin forever leaking / collecting more and more resources.