What is the status of async with Zig?

Sorry, off topic:

the function coloring problem

I didn’t know this one yet. Nice one :grin: Made me think more about constness though. Especially in C++ :sweat_smile:

1 Like

Implement something with reference counting and suddenly you will understand what RAII is and why it’s so useful. Take a look at the Linux kernel and how much it uses reference counting and the bugs. Doing reference counted stuff in Zig or C is PAIN.

In my opinion, RAII is THE dividing line between small systems programming languages and big ones. If you omit RAII, you have C and Zig. If you keep RAII, you have C++ and Rust.

A larger question is whether RAII is even a good idea nowadays. RAII results in things scattered across the heap and lots of fragmentation. This is death to performance on modern CPUs. If you let your RAII become non-deterministic, you’re pretty much back to a garbage collector.

Perhaps someone very clever in the Zig community will figure out how to do reference counted stuff better without RAII. We’ll have to see.

(Personally, I’d rather see efforts to support state machines more directly in some way. State machines are way more important in my opinion than RAII.)

5 Likes

I do know what RAII is. Maybe I used the wrong words. I meant that I don’t like / feel uncomfortable when a program does something implicitly (frees memory, closes files, etc. at the end of a scope, in the case of RAII). Well, consider doing everything explicitly as my (or someone else’s) personal preference.

I only remember file open counts (I had some experience writing device drivers quite a long time ago): the driver’s release method is invoked only on the last close call, something like that. I am not sure whether memory allocation/de-allocation (kmalloc/kfree) uses any ARC-style machinery.

:+1:
Wow. I really like this definition!

1 Like

Don’t you think that tagged unions or using something like the typestate pattern as used in Rust (How To Use The Typestate Pattern In Rust | Zero To Mastery) is sufficient?

I think RAII has a problem similar to Java’s: it causes too many verbs (functions) to become nouns (classes).

Instead of having a database handle, calling const conn = db.connect(...) on it, and pairing that with defer db.disconnect(conn), it now wants you to create a Connection object that can do essentially the same thing, but binds it to a scope where the end of the scope causes the disconnect.

This means that if you have something with many sequential steps, you now need to create many of those nested scopes to follow that philosophy. I find that annoying. I think these ideas work and are “beneficial” only until you get tired of the unnecessary work of transforming your straightforward program into one that is approved by RAII ideology.

In the end I think it is an ideology, and one I don’t find particularly appealing, because it overemphasizes a false sense of programmer “safety” and “correctness” over writing code that actually results in good memory layout, not wasting cache lines, etc.

It hides details, forces your program into an arbitrary structure with questionable benefits and thus makes it more difficult to optimize things that actually matter, later. It forces a way of thinking and structuring the code, so that people can avoid thinking about the things they should actually think about. Things like:

  • How much memory does this take?
  • How many instances of this thing do I have?
  • Why are they scattered all over the place, instead of collected in a single array?
  • Why are there many objects of different types collected in lists instead of having multiple arrays of a single type?

Sometimes it’s really best to just have things mixed in a list, but I think that with the RAII way of doing things it is more often the default outcome, instead of a deliberate choice.

I think we should optimize for doing things in a way that brings us to consider and pick a lot of meaningful deliberate choices until we are done with the program.

With RAII I find it difficult to see what the code is actually doing; more and more abstraction is piled up until it is hard to tell what is going on. I think non-RAII languages have a tendency to keep things simpler and less abstracted, putting more responsibility on the programmer, but also not creating a false/fake sense of security (in situations where things were made unnecessarily complicated just so the code philosophy could be followed).

I also think all this wrapping is a distraction that makes people think about “code architecture” instead of just writing the code and then seeing, from that, which pieces are worth abstracting out / what repeats and can be formalized.

In the end it is probably a subjective choice.
But I would rather have the responsibility of avoiding shooting/stabbing myself in the foot than have to wrap every tool in a foot-shoot/stab-prevention wrapper all the time.

Data-oriented ideas seem much more practical to me, because they actually care about the hardware the program eventually runs on, rather than abstract claims of RAII being useful (without considering where it isn’t useful and what it makes more annoying).

6 Likes

Do you mind expanding a bit on FSMs and concurrent execution? I’m familiar with both, but not with how they relate.

Take a look at this. It is an alternative to async/await (coroutines) that requires neither special support on the compiler side nor coding in asm.

Admiral Javascript, don’t be too proud of this technological terror you’ve constructed. The ability to async is insignificant next to the power of fork w/IPC. :imp:

re: evolution

  1. We are fundamentally, very, very, very lazy. To wit, we haven’t elevated beyond the interface laid out by teletypewriters (1902).

  2. CPU and RAM are there to be used. A well-designed system should drive to 100% CPU and 100% RAM consumption because idle resources are wasted (non-deterministic events are a tax that suppresses resource utilization).

  3. Turing taught us we can do anything with anything, but #1 says to wear the right underwear.

  • For graceful, non-blocking mvvm, Swift.
  • For “just do it for the masses”, python.
  • For code that must last another 40 years, C (for now).
  • For those nights of shame, c++

Enter Zig. async in Zig appeals to my sense of laziness: a concept I don’t have to know but can easily use, because hey, pthread was a thing and async is way easier to type.

However, is that a good thing, that I don’t know nor care to learn? No. In architectural terms, async is akin to the brutalist style of large concrete buildings: one-size-fits-all concrete for the masses. async is inherently problematic.

We, and thus the world, are better off with elegant, bespoke designs and patterns that solve a specific problem - a Sistine Chapel solution for the problem…a library that does precisely what is needed, with a thread / signal / IO paradigm designed around that problem. To me, this is the Zig use case.

Great thread!

2 Likes

Funny, I’ve got more or less similar thoughts/feelings about RAII. Externally it looks just like this: a lot of hard mental work done by language/compiler designers, but for what? Just to let lazy/capricious/beginner/forgetful programmers omit cleanup code? Is it really THAAAT hard to write cleanup code explicitly? In C I’m quite happy with the goto __cleanup way; in Zig we have defer/errdefer. The latter is a bit harder to grasp imo, but we have some sensible rules.

2 Likes

Continuations are the functional expression of the GOTO

This is not really correct. A (classical, or undelimited) continuation represents “the rest of the program”. This may be a good link, scroll towards the bottom, if you are not familiar with it. Memory consumption in undelimited continuations is an obvious issue. See Oleg Kiselyov’s page on why call/cc (with undelimited continuations) is bad, for a variety of reasons, including some that may resonate with Zig’s philosophy.

The point of (delimited) continuations is that they give a very explicit and granular expression of control flow. That would align with Zig’s “make intent explicit” and otherwise “no hidden control flow”. They also have favorable memory characteristics, although I haven’t dug super deep into the literature on, say, affine or linear typing for continuation passing in this style (that way you could also work towards “no hidden memory allocations”).

3 Likes

Yes, very joyful text, thanks.

Here’s the secret: it’s setjmp/longjmp

I used these risky guys only once, and it was more than a decade ago.
Specifically, the scenario was as follows.
Suppose you are using some DLL, and you are afraid the DLL may segfault, but you do not want to terminate, because it’s not the fault of the main program; it’s the bad DLL.
Ok, do the following.

  • set a flag, say, bad_dll to false
  • set a handler for SIGSEGV
  • prepare the jump with setjmp
  • check the flag; if it is true, say bad words about the DLL
  • otherwise attempt to call a function from the DLL
  • restore original SIGSEGV handler

In the SIGSEGV handler:

  • set bad_dll to true
  • longjmp

What I want to say… all that is kinda clever and cool, but it is also a very nice way to confuse a reader of the source code, since flow control with such tricks is a bit weird imho.

1 Like

Welcome to Ziggit @alcuin!

Although I’m a delimited continuation respecter, there are marked and unsolved problems with introducing them as a control-flow primitive in Zig. Canonically, they’re stackful (they capture a series of stack frames, not just one) and resumable, and that introduces a much harder version of the cancel/await problem, which is the #1 reason Zig async hasn’t returned.

Zig is low level enough that it would be possible to write a library for delimited continuations, with some amount of assembler (clearly this takes the rare skill of being a polyglot assembly expert, but it isn’t that different from coroutines, which could form a basis). The big downside there is that assembly blocks are ‘optimization blind’, but as a way of exploring how those problems could be solved, and also just to have them, it’s tractable.

@mlugg (yay!) posted on Reddit (booo!) about other factors in reincorporating async, and everything listed there is as severe or more so for delimited continuations.

Last but not least, I don’t think ‘colorless’ delimited continuations are possible, and, while Zig’s OG async wasn’t truly colorless, it got pretty close, and that was one of the best things about it. Delimited continuations are even more exotic than coroutines, so adding them as a core primitive would create an entire dialect of the language which users can’t ignore (function coloring problem) and won’t recognize.

Not to be a downer about it. It would be worthwhile to see how many of these problems could be solved, because the technique is an elegant one for certain problems of interest.

Sorry for the backward question. :expressionless:

Do I understand correctly that this (setjmp()/longjmp()-based) implementation is “stackless coroutines” (they don’t have individual stacks; they just reserve some memory on the common process stack), and this one ({get/set/make/swap}context()-based) is “stackful coroutines” (each one allocates space for its stack on the heap)?

Both setjmp/longjmp and getcontext/setcontext save CPU registers.

  • setjmp saves a smaller set; the exact set differs per CPU. The registers are usually the instruction pointer, the stack pointer, the stack frame pointer, and the register(s) that store C return values.
  • getcontext saves the entire CPU state: all the general-purpose registers plus the vector and floating-point registers.

None of them copies the stack.

Since both support getting and setting the stack pointer, they can be used to change or restore stack contents.
Also note that a sigaction handler (installed with SA_SIGINFO) receives a pointer to the CPU context (the same context structure that getcontext fills) captured when the signal interrupted the CPU. You can change these values and run something else after the signal handler returns.

2 Likes

wait… I am not talking about {set,long}jmp vs {get,set}context. I am talking about those two specific implementations, regardless of what magic they use.

In the first one coroutines do not have “personal” stacks.
And longjmp does not switch stacks just because there is only one.
Hence I thought this is an implementation of “stackless coroutines”.

In the second one there are many stacks, one per coroutine,
they are allocated via ctx.uc_stack.ss_sp = calloc(1, MINSIGSTKSZ);
And swapcontext do switch stacks.

So what kind of coroutines do these examples implement?
Both “stackless”? Or the first one is “stackless”, and the second one is “stackful”?

These are both stackful coroutine implementations. They allocate a certain amount of space for a program stack, jump execution to it, and from there you have an ordinary down-growing program stack, with as much room as was allocated.

Any time you want to yield from that stack back to the main stack (really the calling stack), you can; the two have slightly different ways of holding onto the stack context, but not that different.

A stackless coroutine gives you one stack frame, corresponding to the body of one function call:

fn oneStack(...) void {
    // a stackless coro can yield anywhere in here
    // ...
    // but not in here
    _ = pushNextStack(...);
}

A stackless coroutine can call as many functions as it would like, but it can’t yield in those functions. Only in its own function body.

2 Likes

Actually, uc_stack is a stack_t that holds where the stack is and its size.
The stack pointer itself is part of uc_mcontext, which is of type mcontext_t.

Since it is up to you to manipulate the stack and the instruction pointer you can have any kind of continuations and any kind of coroutines.

2 Likes

Aha! It seems I’ve already got it.
That local array in cogo,

char n[STACKDIR (tos - (char*)&arg)];

in fact is the personal stack for a coroutine instance, right?
So the difference between those two implementations is where they hold the per-coroutine stacks: in the first one they are “chunks” of the common process/thread stack, and in the second one they are on the heap.

1 Like

Yes, function polychromatism hell, I remember :slight_smile: