I have a lot of routines for encoding and decoding chars and strings, as well as for input and output.
Depending on my (non-comptime) field contains_unicode, I choose the matching function.
For example:
pub fn encode_string(self: *const Mapping, str: []const u8) !FixedMachineString {
    return if (!self.contains_unicode)
        self.encode_ascii_string(str)
    else
        self.encode_unicode_string(str);
}
Would it be a good idea to assign an appropriate function during initialization?
self.encode_string_function = ...
and then do it like this? That would save a lot of ifs at runtime.
pub fn encode_string(self: *const Mapping, str: []const u8) !FixedMachineString {
    return self.encode_string_function(str);
}
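To make it concrete, this is roughly what I have in mind (the field name, init, and the FixedMachineString placeholder are just illustrative; since Zig has no bound methods, self would have to be passed through the pointer explicitly):

const FixedMachineString = [32]u8; // placeholder type, just for the sketch

const Mapping = struct {
    contains_unicode: bool,
    // hypothetical field; Zig has no bound methods, so self is passed explicitly
    encode_string_function: *const fn (*const Mapping, []const u8) anyerror!FixedMachineString,

    pub fn init(contains_unicode: bool) Mapping {
        return .{
            .contains_unicode = contains_unicode,
            .encode_string_function = if (contains_unicode)
                &encode_unicode_string
            else
                &encode_ascii_string,
        };
    }

    pub fn encode_string(self: *const Mapping, str: []const u8) !FixedMachineString {
        return self.encode_string_function(self, str);
    }

    fn encode_ascii_string(self: *const Mapping, str: []const u8) anyerror!FixedMachineString {
        _ = self;
        if (str.len > 32) return error.StringTooLong;
        return [_]u8{0} ** 32; // stub
    }

    fn encode_unicode_string(self: *const Mapping, str: []const u8) anyerror!FixedMachineString {
        _ = self;
        if (str.len > 32) return error.StringTooLong;
        return [_]u8{0} ** 32; // stub
    }
};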
I don't think that solution is the most beautiful, so I was wondering if there is something less hacky.
It is quite a standard solution.
There are basically three ways to implement this, each with its own tradeoffs.
- Function pointers, like you mentioned. A major downside is that the compiler doesn’t know which function will be called, so it can’t inline the call or optimize across it. Even the code around it is affected, because the compiler can’t move code across the call. On top of that, you have to carry the pointer around, which costs cycles when copying.
- Branches, like your current code. The downside is that every call comes with at least one branch that has to be predicted, or more than one if your selection code is more complicated. If you never change the boolean after setting it, the branches will be correctly predicted, which minimizes this cost. Since the compiler can see both functions, it can optimize better, even with the branch.
As far as I’m aware, branches tend to do better overall, unless your selection logic is very complicated.
A third option which I’ve used, but I haven’t seen anyone talk about, is to make the selection at compile time at a higher level. Suppose all your functions took a comptime parameter like so:
pub fn runProgram(comptime ascii: bool) void
pub fn encode_string(self: *const Mapping, str: []const u8, comptime ascii: bool) !FixedMachineString
Now in your main function, you do this:
if (is_ascii)
    runProgram(true)
else
    runProgram(false);
In essence, after your selection logic, you branch into a version of your program that has complete knowledge about which functions to use.
This solves all the performance problems mentioned before. There are no pointers and no branches after this first one. It’s the optimal choice for speed. The major downside is that you have basically packed two different programs inside your binary, one program made for ascii and one made for unicode. Although you can reuse most source code, a lot of the binary cannot be shared between these two versions, so your binary size will increase a lot.
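A minimal sketch of how this could look for your encoding example (all names and the FixedMachineString placeholder are assumptions, not your actual code):

const FixedMachineString = [32]u8; // placeholder type, just for the sketch

const Mapping = struct {
    pub fn encode_string(
        self: *const Mapping,
        str: []const u8,
        comptime ascii: bool,
    ) !FixedMachineString {
        // Resolved at compile time: each specialization keeps only one call.
        return if (ascii)
            self.encode_ascii_string(str)
        else
            self.encode_unicode_string(str);
    }

    fn encode_ascii_string(self: *const Mapping, str: []const u8) !FixedMachineString {
        _ = self;
        if (str.len > 32) return error.StringTooLong;
        return [_]u8{0} ** 32; // stub
    }

    fn encode_unicode_string(self: *const Mapping, str: []const u8) !FixedMachineString {
        _ = self;
        if (str.len > 32) return error.StringTooLong;
        return [_]u8{0} ** 32; // stub
    }
};

fn runProgram(comptime ascii: bool) !void {
    const mapping = Mapping{};
    _ = try mapping.encode_string("hello", ascii);
}

pub fn main() !void {
    // Pretend this flag comes from runtime input, e.g. the loaded wordlist.
    const is_ascii = true;
    if (is_ascii)
        try runProgram(true)
    else
        try runProgram(false);
}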
I think I like this.
I will share the increase in exe size later, once I have tried it.
(How the branch prediction behaves here I don’t know. The boolean is set when loading a wordlist for a language, and everything else operates on that.)
Now that is interesting. I am curious how this will be implemented.
Update: nvm, I missed the ‘runtime’ part. I still think the syntax below would make sense for comptime selection, though.
Hm, the syntax looks a bit arbitrary tbh; I’d rather have a general selection on a comptime value:
const myFunc = switch (target) {
    .avx2 => fn (args: Args) Result { ... },
    .sse4 => fn (args: Args) Result { ... },
    .neon => fn (args: Args) Result { ... },
};
…assuming that there would also be a new function-expression syntax like the following (which IMHO makes a lot of sense too; I seem to remember having seen a proposal for something similar):
const myFunc = fn (args: Args) Result { ... }
Edit: fixed the result syntax, somehow the extra : from TypeScript always sneaks into my Zig code lol
PS: one could also group already existing explicit functions under a common name, selected by a comptime value:
const myFuncAVX2 = fn (args: Args) Result { ... };
const myFuncSSE4 = fn (args: Args) Result { ... };
const myFuncNeon = fn (args: Args) Result { ... };
const myFunc = switch (target) {
    .avx2 => myFuncAVX2,
    .sse4 => myFuncSSE4,
    .neon => myFuncNeon,
};
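(FWIW, that last form already seems to work in today’s Zig, as long as the functions are declared with the normal fn syntax and target is comptime-known; a rough sketch with made-up names:)

const Isa = enum { avx2, sse4, neon };
const target: Isa = .avx2; // comptime-known selection, hard-coded for the sketch

fn myFuncAVX2(x: u32) u32 {
    return x + 2;
}
fn myFuncSSE4(x: u32) u32 {
    return x + 4;
}
fn myFuncNeon(x: u32) u32 {
    return x + 1;
}

// Container-level initializers are evaluated at comptime,
// so this switch picks exactly one function at compile time.
const myFunc = switch (target) {
    .avx2 => myFuncAVX2,
    .sse4 => myFuncSSE4,
    .neon => myFuncNeon,
};

pub fn main() void {
    _ = myFunc(40);
}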
The point is to select a function based on runtime CPU detection.
Oh whoops, I missed the runtime part.
Not sure how I feel about that tbh; I would probably prefer to stamp out different executables, but I can also see that runtime selection makes sense in some scenarios.
OTOH: where do you draw the line? Should it allow selecting by CPU arch and stamping out different functions for x86 vs ARM vs WASM? I can’t think of a scenario where that would make sense.
…on further thought, how would that even work with inlining? Small math functions only make sense performance-wise when they can be inlined, so you’d probably want to specialize very large execution blocks at once (more like a complete GPU shader) - but I guess the target specialization would also include all code that’s called from a specialized top-level function.
…on further further thought: maybe that feature would actually be nice for embedding actual ‘foreign ISA functions’ into regular Zig code (like GPU shaders).
That sounds awesome to me, if you can pull it off. Having a single binary that the user can put on a flash drive and run on whatever computer they find in front of them would be super convenient.
I have some functions that optimize really well for specific targets, and it would be great if I could include them in the binary while keeping it compatible with the baseline CPUs as well.