I’m tinkering with a bytecode interpreter for a toy language, implemented in Zig.
The state of a thread or fiber basically consists of a value stack, a call frame stack, and an instruction pointer.
If I have a struct which contains this state, each thread/fiber needs a pointer to this state struct for almost everything.
I wonder what’s more efficient:
I could store this pointer (or the struct itself) in a thread-local variable. Every function that needs to access or modify the state then has to go through this variable.
Let’s assume that I can somehow get the corresponding pointer for the current thread when the thread switches. (I don’t know if that’s possible for threads, but it’s trivial for several fibers running in the same thread.) Then I can, and must, pass this pointer to all the internal functions as an argument.
So I have either the overhead of accessing a thread-local variable in many places, or the overhead of an extra argument for most function calls.
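To make the two options concrete, here is a minimal sketch of what I mean. The names `State`, `current`, `pushTls` and `pushArg` are made up for illustration, and the call frame stack is left out to keep it short.

```zig
const std = @import("std");

// Hypothetical, stripped-down interpreter state: a fixed-size value stack
// with its top index, plus an instruction pointer.
const State = struct {
    values: [256]i64 = undefined,
    sp: usize = 0,
    ip: usize = 0,
};

// Option 1: a thread-local pointer that every internal function reads.
threadlocal var current: ?*State = null;

fn pushTls(v: i64) void {
    const s = current.?; // thread-local load on every call
    s.values[s.sp] = v;
    s.sp += 1;
}

// Option 2: the state pointer is an explicit extra argument.
fn pushArg(s: *State, v: i64) void {
    s.values[s.sp] = v;
    s.sp += 1;
}

pub fn main() void {
    var a = State{};
    var b = State{};

    current = &a; // would have to be updated on every thread/fiber switch
    pushTls(1);
    pushTls(2);

    pushArg(&b, 3);

    std.debug.print("a.sp={d} b.sp={d}\n", .{ a.sp, b.sp });
}
```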
Assuming TLS holds the pointers to the stacks and the IP, you’re storing three values in thread-local storage, and accessing them is a direct memory access. If you instead pass a pointer to a struct holding the stack pointers and the IP into the function that executes the code, accessing the stack pointers and the IP requires one extra indirection.
There’s another way: have a struct that holds the stacks and the IP, and make the functions that execute the code methods of that struct. Access is then one indirection through the self pointer.
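A rough sketch of the struct-with-methods layout, assuming a hypothetical `Vm` type and a made-up `add` instruction (neither comes from the original question):

```zig
const std = @import("std");

// Hypothetical VM struct: the state and the functions that operate on it
// live together, and every method reaches the state through `self`.
const Vm = struct {
    values: [256]i64 = undefined,
    sp: usize = 0,
    ip: usize = 0,

    fn push(self: *Vm, v: i64) void {
        self.values[self.sp] = v;
        self.sp += 1;
    }

    fn pop(self: *Vm) i64 {
        self.sp -= 1;
        return self.values[self.sp];
    }

    // Executes one illustrative "add" instruction and advances the IP.
    fn add(self: *Vm) void {
        const rhs = self.pop();
        const lhs = self.pop();
        self.push(lhs + rhs);
        self.ip += 1;
    }
};

pub fn main() void {
    var vm = Vm{};
    vm.push(2);
    vm.push(3);
    vm.add();
    std.debug.print("top={d} ip={d}\n", .{ vm.values[vm.sp - 1], vm.ip });
}
```

In Zig, `vm.push(2)` is method-call syntax for `Vm.push(&vm, 2)`, so `self` is still an ordinary pointer argument under the hood.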
I would imagine the performance difference is negligible. You need to run some benchmarks to decide. It’s a matter of taste and code organization. It’s really up to you.
That’s practically identical to passing it as a parameter. The only difference is potentially bundling more data into a single parameter, which can affect performance in either direction, but minimally enough that you shouldn’t care until you’ve solved most other inefficiencies.
Pass a struct to your functions and use its methods to reach the contextual variables. That way, you can easily switch between different approaches. To use TLS, just define the struct to be zero-byte.
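Something along these lines, as a sketch only. `PtrCtx`, `TlsCtx`, `tls_state` and `push` are hypothetical names, and `push` takes `anytype` here just so both variants can be shown in one program.

```zig
const std = @import("std");

// Hypothetical interpreter state (value stack only, for brevity).
const State = struct {
    values: [256]i64 = undefined,
    sp: usize = 0,
};

// Used only by the TLS variant below.
threadlocal var tls_state: State = .{};

// Variant A: the context carries a real pointer to the state.
const PtrCtx = struct {
    ptr: *State,

    fn state(self: PtrCtx) *State {
        return self.ptr;
    }
};

// Variant B: zero-sized context; the state is looked up in TLS instead.
const TlsCtx = struct {
    fn state(_: TlsCtx) *State {
        return &tls_state;
    }
};

// Internal functions only ever call ctx.state(), so they do not care
// which variant they are given.
fn push(ctx: anytype, v: i64) void {
    const s = ctx.state();
    s.values[s.sp] = v;
    s.sp += 1;
}

pub fn main() void {
    var local = State{};
    const a = PtrCtx{ .ptr = &local };
    const b = TlsCtx{};

    push(a, 1);
    push(b, 2);
    push(b, 3);

    std.debug.print("sizeof PtrCtx={d} TlsCtx={d} local.sp={d} tls.sp={d}\n", .{
        @sizeOf(PtrCtx), @sizeOf(TlsCtx), local.sp, tls_state.sp,
    });
}
```

In a real interpreter you would more likely pick one variant with a single alias (for example `const Ctx = TlsCtx;`) and use `Ctx` as the parameter type, so switching approaches is a one-line change.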
It’s going to be mostly a learning exercise, of course. Interpreters are inherently inefficient on modern CPUs. You’re not going to see a difference one way or another.
I’m sorry, I really don’t understand what you were trying to get at. The three approaches are obviously quite different in terms of code organization, scoping, and ease of understanding. Yes, their performance is probably similar in the scheme of things, but performance is not the only factor in picking one over the other.
You’ve been saying that performance is not the only factor in deciding on an approach, which is true of course. But the OP is asking which approach has the best performance. So I think they are trying to focus the discussion on what the OP is asking.