Setting ASM function prologue

I’m building a Go-style green fiber runtime in Zig for a custom language that lowers to Zig.

Something that’s important is to check that I don’t overflow the fiber stacks.

I want to do Go’s runtime.morestack in Zig, and I’m trying to figure out the easiest way to do that.

The assembly is straightforward - but I’m at a loss for how to insert it before every Zig function.

Does anyone know how to do this?

Thanks!

You can refer to the design of zio, where the lowest-address page of each stack is mapped with posix.PROT.NONE, so touching it raises posix.SIG.SEGV.

In the signal handler, determine whether the fault was caused by a stack overflow.
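A minimal sketch of that guard-page approach in C, assuming a POSIX system. This is not zio's code: the `fiber_stack` type, `g_current`, and every function name below are hypothetical, and a real runtime would do something smarter in the handler than print and exit.

```c
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical fiber stack descriptor (illustrative names). */
typedef struct {
    uint8_t *base;   /* lowest mapped address = start of the guard page */
    size_t   guard;  /* guard size in bytes (one page) */
    size_t   usable; /* read/write bytes above the guard */
} fiber_stack;

/* Assumption: the scheduler stores the running fiber's stack here. */
static fiber_stack *volatile g_current;

/* Map guard + usable bytes, then make the lowest page inaccessible. */
static bool fiber_stack_alloc(fiber_stack *s, size_t usable)
{
    assert(s != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return false;
    size_t pg = (size_t)page;
    usable = (usable + pg - 1) / pg * pg; /* round up to whole pages */
    void *p = mmap(NULL, pg + usable, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    if (mprotect(p, pg, PROT_NONE) != 0) {
        munmap(p, pg + usable);
        return false;
    }
    s->base = p;
    s->guard = pg;
    s->usable = usable;
    return true;
}

/* A fault is treated as an overflow if it lands inside the guard page. */
static bool fiber_stack_is_overflow(const fiber_stack *s, const void *addr)
{
    uintptr_t a = (uintptr_t)addr, lo = (uintptr_t)s->base;
    return a >= lo && a < lo + s->guard;
}

/* Runs on the alternate stack, because the fiber stack is exhausted. */
static void on_segv(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    static const char msg[] = "fiber stack overflow\n";
    fiber_stack *s = g_current;
    if (s && fiber_stack_is_overflow(s, info->si_addr)) {
        if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) { /* ignore */ }
    }
    _exit(1); /* sketch only: exits on any SIGSEGV */
}

/* Register the handler with its own stack; sigaltstack is per-thread. */
static bool install_overflow_handler(void)
{
    static uint8_t alt[64 * 1024];
    stack_t ss = { .ss_sp = alt, .ss_size = sizeof alt, .ss_flags = 0 };
    if (sigaltstack(&ss, NULL) != 0) return false;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL) == 0;
}
```

The handler classifies the fault purely by address (`si_addr`), which is why it needs to know which fiber stack is currently running.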

Hello @cuzzo,
Welcome to ziggit 🙂

If stack_check is enabled for the Zig module, the Zig compiler passes __zig_probe_stack as the LLVM probe-stack function attribute.
You can provide your own definition of it to implement stack probing that extends the stack.

The stack probe, by default, is only called when a function’s stack frame is larger than 4 KiB. It wouldn’t be reliable for this purpose. There is a way to make LLVM more liberal with stack probes, but I don’t think Zig exposes it.

I was really pleasantly surprised by how nicely a SIGSEGV handler works for this use case. It seems that even Java uses this for its memory management. With sigaltstack for the handler, it’s a very clean solution for extending the stack. Obviously, if you want Go-level control over the stack, you will need a more complex architecture, but I think the signal handler is a good base for that as well.

Thank you all for the responses!!

Unfortunately, I don’t believe probe_stack is an option. My transpiler “guarantees” that an individual frame is never more than 288 bytes, but 1) this is a new language and I’m sure it’s not perfect, so relying on that guarantee now would be hope, which is a bad strategy, and 2) even if it were perfect, I don’t think probing at a 288-byte threshold would be enough, because you could allocate 3 small (100-byte) frames in a row and overflow undetected, correct?
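To make the "3 small frames" concern concrete, here is a small C simulation (not real stack manipulation). It assumes a simplified model in which a threshold-based probe runs only for frames larger than 4 KiB, as described earlier in the thread, while a Go-style prologue compares the remaining space against the frame size on every call. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_THRESHOLD 4096u /* assumed default: smaller frames skip the probe */

/* Go-style prologue test: is there room for `frame` bytes above `limit`? */
static bool would_overflow(uintptr_t sp, uintptr_t limit, size_t frame)
{
    return sp < limit || sp - limit < frame;
}

/*
 * Simulate a chain of nested calls with the given frame sizes.
 * Returns the index of the first call at which the overflow is detected,
 * or -1 if no check ever fires. With per_function == false, only frames
 * above PROBE_THRESHOLD are checked.
 */
static int first_detected(uintptr_t sp, uintptr_t limit,
                          const size_t *frames, size_t n, bool per_function)
{
    for (size_t i = 0; i < n; i++) {
        bool checked = per_function || frames[i] > PROBE_THRESHOLD;
        if (checked && would_overflow(sp, limit, frames[i]))
            return (int)i;
        sp -= frames[i]; /* the callee's frame is pushed */
    }
    return -1;
}
```

With 256 bytes left and three 100-byte frames, the per-function check fires on the third call (index 2), while the threshold-only model returns -1 because no frame is ever large enough to be probed.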

mmap is a possible option - but 1) it’s slow (1-10 μs), 2) the tail latency is too unpredictable, 3) it blocks an entire core for that time, and 4) it cannot effectively be done up front unless you want to always reserve space for your MAX predicted concurrency (not the worst trade, but not favorable).

I am hoping to go the Go route - and simply add a function prologue. I thought I could accomplish this with an LLVM plugin (the plugin is relatively simple, the hard part is getting Zig to accept it).

Theoretically I can do something like:
zig test <my_test>.zig --library c -femit-llvm-bc=<my_test>.bc -fno-emit-bin

Then run the LLVM pass:

opt -load-pass-plugin=<my_plugin>.so -passes="<my_pass>" <my_test>.bc -o <my_test>-instrumented.bc

Then build:

zig build-exe <my_test>-instrumented.bc --library c --name <my_test>-runner

Yes?

I’m curious, how do you plan to manage the stack? Does your language have a garbage collector that is capable of updating all pointers to stack?

Also, given that you have a language that presumably targets Zig, what stops you injecting your own probing into the generated intermediate code?

I’m curious, how do you plan to manage the stack? Does your language have a garbage collector that is capable of updating all pointers to stack

The language can detect the worst-case stack size you need at runtime (unpredictable for any recursion or callbacks → assume worst case).

Then, a control plane monitors your tasks, and automatically downsizes them if safe.

If a stack overflow happens now → I just crash.

In the future → I plan to do stack hysteresis (abandoned by Go and Rust) EXACTLY once (unless multiple overflows are in flight at the exact same time) to finish the overflown task, and then all future tasks are spawned at a larger size to avoid potentially slow hysteresis.

Though → The user can configure at the control plane level how to handle this. Maybe they’re aware that 1 in 1M tasks overflows. Rather than burning memory so all tasks for sure have enough space → They could be fine with 1 in 1M having worse latency than Go’s worst-case stack growth via hysteresis.

Overall, the system should be much more predictable than Go. No GC. No write barriers. No stack growth (unless you want to trade non-determinism to save memory).

Then I’d go with exactly the virtual memory reservation and signal handler. The solution is essentially free if you don’t run into a stack overflow; having the signal handler configured costs you nothing. And one copy of the estimated stack in reserved, non-committed virtual memory is really not problematic. Probing at every function has a cost that’s only worth it if you can resize the stack as smartly as Go does, because Go can start with tiny stacks.
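A rough C sketch of the "reserve now, commit later" idea, assuming a POSIX system with `mmap` and `mprotect`. The `vm_stack` type and function names are hypothetical; `MAP_NORESERVE` is used where available and defined away otherwise.

```c
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_NORESERVE
#define MAP_NORESERVE 0 /* not available everywhere; fall back to a no-op */
#endif

/* Hypothetical descriptor: a reserved range, committed from the top down. */
typedef struct {
    uint8_t *lo;        /* lowest reserved address */
    size_t   page;      /* system page size */
    size_t   reserved;  /* total reserved bytes */
    size_t   committed; /* accessible bytes at the high end */
} vm_stack;

static size_t round_up(size_t n, size_t pg) { return (n + pg - 1) / pg * pg; }

/* Make `extra` more bytes (rounded up to pages) accessible below the
 * committed region. Fails if the reservation is exhausted. */
static bool vm_stack_grow(vm_stack *s, size_t extra)
{
    extra = round_up(extra, s->page);
    if (extra == 0) return true;
    if (extra > s->reserved - s->committed) return false;
    uint8_t *new_lo = s->lo + s->reserved - s->committed - extra;
    if (mprotect(new_lo, extra, PROT_READ | PROT_WRITE) != 0) return false;
    s->committed += extra;
    return true;
}

/* Reserve address space with no access rights, then commit `initial` bytes
 * at the top, since the stack grows downward. */
static bool vm_stack_reserve(vm_stack *s, size_t reserve, size_t initial)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return false;
    s->page = (size_t)page;
    reserve = round_up(reserve, s->page);
    void *p = mmap(NULL, reserve, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) return false;
    s->lo = p;
    s->reserved = reserve;
    s->committed = 0;
    if (!vm_stack_grow(s, initial)) {
        munmap(p, reserve);
        return false;
    }
    return true;
}
```

In this sketch the uncommitted part of the range acts as the guard region: touching it faults, and a fault handler or an explicit check could call `vm_stack_grow` before resuming the fiber.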

Isn’t the problem with VM:

  1. There’s not enough of it for 1M connections,
  2. Even if there were, presumably you must page fault and handle that every 4 KiB, which is 100x more costly than Go’s resize,
  3. The page fault blocks the entire thread while it’s happening,
  4. Like mmap, the tail latency is non-deterministic, and
  5. You’re only saving a prologue tax at the top of every function - which is pretty small and deterministic

The problem is - I want a much more deterministic Go. This seems like a less deterministic Go.

I’m sorry, I probably got confused. I thought you said you have no GC in the language and you want no stack growth. In any case, the question still stands, if you have a custom language, that you are compiling, what prevents you from injecting your own probing?

if you have a custom language, that you are compiling, what prevents you from injecting your own probing?

Nothing! But ideally, I want users to be able to safely run linked Zig code → so that you don’t need to switch to G0 to run Zig functions like you do in Go to run C functions. You’ll only need to switch to run C functions.

Thank you all for the input!

If anyone is curious - this does seem to work: Stack Overflow Test v2 - actually works. · cuzzo/easy-vm@45058fd · GitHub