Is it possible to implement Stack Stitching with Hot Split and just work with Zig?

cuzzo · January 12, 2026, 6:46pm

I planned to implement a Go-like runtime (v1) in Zig - with Stack Stitching w/ Hot Split.

I got this working for the happy path, but I want to be able to use Zig’s defer stack to have fiber-fault isolation.

I transpile a language to Zig, so I can make sure that any locks, heap allocations, resources, etc. are put on the defer / errdefer stack, which in theory makes cleaning up after a fiber overflow easy.
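To make the defer / errdefer idea concrete, here is a minimal sketch of what one transpiled function body might look like. FakeLock, transpiledBody and error.FiberFault are hypothetical names invented for illustration; they are not the actual transpiler output.

```zig
const std = @import("std");

// Hypothetical stand-in for a lock owned by the fiber runtime.
const FakeLock = struct {
    held: bool = false,

    fn lock(self: *FakeLock) void {
        self.held = true;
    }

    fn unlock(self: *FakeLock) void {
        self.held = false;
    }
};

// Hypothetical shape of one transpiled function body.
fn transpiledBody(allocator: std.mem.Allocator, lock: *FakeLock, fail: bool) ![]u8 {
    lock.lock();
    defer lock.unlock(); // runs on every exit, success or error

    const buf = try allocator.alloc(u8, 64);
    errdefer allocator.free(buf); // runs only when an error is returned

    if (fail) return error.FiberFault;
    return buf; // ownership passes to the caller
}

pub fn main() !void {
    var lock = FakeLock{};
    const allocator = std.heap.page_allocator;

    const ok = try transpiledBody(allocator, &lock, false);
    defer allocator.free(ok);
    std.debug.print("success path: lock held = {}\n", .{lock.held});

    if (transpiledBody(allocator, &lock, true)) |buf| {
        allocator.free(buf);
    } else |err| {
        std.debug.print("error path: {s}, lock held = {}\n", .{ @errorName(err), lock.held });
    }
}
```

On both paths the lock is released before the function returns, and on the error path the buffer is freed as well, so the caller never sees a half-cleaned-up state.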

But is there any way to get this working with Stack Stitching w/ Hot Split?

Essentially, I have an LLVM Machine Pass that injects asm at the very top of every function (transpiled) like so:

entry:
  # START MY CUSTOM PROLOGUE BEFORE ANYTHING ELSE
  jmp resume if (rsp - frame_size > LIMIT)
  # morestack creates a larger stack and switches to it, then hijacks the
  # return so that it CALLs lessstack. lessstack moves the stack pointer
  # back to the old stack, then RETs whatever would normally be returned.
  call morestack
  #  morestack JUMPs to here (not RET)

resume:
  # END MY PROLOGUE
  # the existing function …
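The comparison in that prologue can be written out as plain Zig to check the arithmetic. This is only a sketch of the check itself, not of the injected asm; fitsOnCurrentStack and the sample addresses are made up for illustration, and the saturating subtraction is my assumption about how to handle a frame larger than the remaining address space.

```zig
const std = @import("std");

/// Returns true when the new frame fits above the limit, i.e. when the
/// prologue could jump straight to `resume` without calling morestack.
fn fitsOnCurrentStack(sp: usize, frame_size: usize, limit: usize) bool {
    // Saturating subtraction (-|) clamps to 0 instead of wrapping
    // around when frame_size is larger than sp.
    return (sp -| frame_size) > limit;
}

pub fn main() void {
    const limit: usize = 0x1000; // hypothetical low-water mark of the segment

    // Plenty of room, nearly out of room, and a frame larger than sp.
    const cases = [_][2]usize{
        .{ 0x2000, 0x100 },
        .{ 0x1080, 0x100 },
        .{ 0x10, 0x100 },
    };
    for (cases) |c| {
        std.debug.print("sp=0x{x} frame=0x{x} fits={}\n", .{
            c[0], c[1], fitsOnCurrentStack(c[0], c[1], limit),
        });
    }
}
```

With a plain wrapping subtraction, the third case would produce a huge value and wrongly pass the check, which is why the sketch saturates.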

This just works if the fiber doesn’t error. But a fiber never erroring for any reason is completely unrealistic. I’m lost as to how I can prove this will just work with the defer and errdefer stack, unwinding, backtracing, etc.

For one, I’m in way over my head. I can’t find out definitively what I must guarantee to ensure Zig’s unwinding would just work. Under the hood, I thought errors are just returned, not unwound, so this shouldn’t really be a problem: if it works with normal returns, it should just work with errors. But I can’t make assumptions. I’m not sure how much testing it would take for me to feel confident it actually works, rather than coincidentally passing whatever tests I write. Lastly, Zig is in flux, so I’m hoping there could be a solution that will be future-proof.
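A small sketch of the "errors are just returned" assumption at the language level: an error union is an ordinary return value, `try` is shorthand for returning it early, and defer / errdefer run on that same return path. The names leaf, caller and error.Fault are hypothetical, and this says nothing about what any particular backend emits in machine code.

```zig
const std = @import("std");

var defers_run: u32 = 0;
var errdefers_run: u32 = 0;

fn leaf(fail: bool) error{Fault}!u32 {
    defer defers_run += 1; // runs on both the success and the error return
    errdefer errdefers_run += 1; // runs only on the error return
    if (fail) return error.Fault;
    return 7;
}

fn caller(fail: bool) error{Fault}!u32 {
    errdefer errdefers_run += 1;
    // `try x` is shorthand for `x catch |e| return e`: an early return.
    const v = try leaf(fail);
    return v + 1;
}

pub fn main() void {
    for ([_]bool{ false, true }) |fail| {
        if (caller(fail)) |v| {
            std.debug.print("ok: {d}\n", .{v});
        } else |err| {
            std.debug.print("error: {s}\n", .{@errorName(err)});
        }
    }
    std.debug.print("defers={d} errdefers={d}\n", .{ defers_run, errdefers_run });
}
```

A test along these lines, run through the split-stack prologue with frames placed on both sides of a split, would at least exercise the error return path across a stack switch.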

Naively, I assumed this would work. Now I’m feeling the pain of my stupidity /=

Stack Splitting would effectively be useless if it only works with fibers that don’t ever error.

pachde · January 12, 2026, 9:42pm

For one thing, there’s no guarantee that the backend is LLVM. When Zig reaches its goal of removing the LLVM dependency and gains parity with LLVM optimized code generation, your solution will have more limited value.

cuzzo · January 13, 2026, 1:20am

Theoretically, this is not a hard problem to solve even without LLVM, no?

As long as Zig can output asm, I can pretty easily insert a prologue at the top of functions.

I’m less concerned with that, and more concerned that I don’t know anything about how Zig actually stores errors and error defers on the stack, and if something I’m doing is not compatible in some weird edge case.

I can test that at least some errors do propagate through what I’m doing, but I don’t know enough to tell whether this actually works or just passes the one test case that I can think of.

LucasSantos91 · January 13, 2026, 3:27am

I believe this is correct. I don’t know anything in Zig that unwinds the stack (that is, recovers the state of a previous frame). It is mentioned in the std library, but I don’t know in which circumstance Zig itself does it. Of course, if you call into C++ code, then it will do exceptions, but they are not initiated by Zig.
There is, however, stack tracing, that is, recording which functions were called, in order. Zig doesn’t do anything special; it walks the stack in the platform-prescribed way. For Windows, that information is in the unwind tables, which live in the .pdata section, even though you don’t necessarily need to use them for unwinding. Zig and debuggers will use this information to find where the return address is within the stack. I believe that to get your code to work with stack traces, you just need to write an appropriate entry in this table. However, there is no mechanism for writing to the table after the program is loaded, so you need to write it at build time.
For Linux, all I know is that this information is called call frame information (CFI), and it lives in the DWARF data (typically the .eh_frame or .debug_frame section).
Other than messed up stack traces, I think your code will work, regardless of the function returning an error.
If you call into external functions on Windows, you have to be careful with the TIB, otherwise Windows will think you caused a stack overflow. I talked about it here.
With all that said, there’s a reason why Go abandoned this approach. It’s not really performant. It’s much better to just know the stack size you need for your function, and allocate it all up front. I also talked about this in the linked discussion. It’s ridiculous that in 2026 we still don’t know the stack usage of our functions. The compiler needs to know this, except for the obvious corner cases, like recursive functions, so why doesn’t it just give it to us? It’s great that Zig is tackling this, but this should have been standard practice long before Zig even showed up.
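As a rough sketch of the "allocate it all up front" alternative, assuming the maximum stack usage of a fiber were somehow known: round it up to whole pages and add one guard page. The 4 KiB page size, the guard page, and the name stackAllocSize are assumptions for illustration only.

```zig
const std = @import("std");

const page_size: usize = 4096; // assumption: 4 KiB pages

/// Bytes to reserve for a fiber whose maximum stack usage is known:
/// the usage rounded up to whole pages, plus one guard page.
fn stackAllocSize(max_usage: usize) usize {
    const rounded = (max_usage + page_size - 1) / page_size * page_size;
    return rounded + page_size;
}

pub fn main() void {
    const usage: usize = 10_000; // hypothetical figure reported for one fiber
    std.debug.print("reserve {d} bytes for {d} bytes of usage\n", .{
        stackAllocSize(usage), usage,
    });
}
```

With a fixed size like this there is no per-call prologue check at all, which is the performance argument being made above.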

cuzzo · January 13, 2026, 4:11am

Thank you!

This is about as far as I could get.

I can verify that my setup is correct for libunwind, but it does not work with @panic() and I’m kind of lost as to why. That said, it doesn’t seem critically important for v0.0.1.

I’m more worried that Zig may have some expectation of the stack that I may somehow violate, but - theoretically, if I inject into the VERY beginning of the function, and on any/all returns - it seems naively like it should work (and it does in tests).

But maybe Zig does weird jumps instead of returns for certain error handling or something that I don’t know about.

I started with an assumption that I could do what Go did, but then realized I’m lowering to Zig, so that’s a bold assumption. I don’t control the stack completely.

pzittlau · January 13, 2026, 7:05am

I’ve never seen Zig generate such assembly, but I’m also not knowledgeable about the internals. There is something about the error implementation in the language reference (unfortunately, at least for me, the links to the headings don’t work correctly and ziggit replaces the %20 in the URL with literal spaces, so search for “To analyze performance cost”, for instance).
