I have a lot of routines for encoding and decoding chars and strings, as well as for input and output.
Depending on my (non-comptime) field contains_unicode, I choose the matching function.
For example:
pub fn encode_string(self: *const Mapping, str: []const u8) !FixedMachineString {
    return if (!self.contains_unicode)
        self.encode_ascii_string(str)
    else
        self.encode_unicode_string(str);
}
Would it be a good idea to assign an appropriate function during initialization?
self.encode_string_function = ...
and then do it like this? That would save a lot of ifs at runtime.
pub fn encode_string(self: *const Mapping, str: []const u8) !FixedMachineString {
    return self.encode_string_function(str);
}
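To make it concrete, this is roughly what I have in mind (the field name, init, and the FixedMachineString placeholder are just illustrative; since Zig has no bound methods, self would have to be passed through the pointer explicitly):

const FixedMachineString = [32]u8; // placeholder type, just for the sketch

const Mapping = struct {
    contains_unicode: bool,
    // hypothetical field; Zig has no bound methods, so self is passed explicitly
    encode_string_function: *const fn (*const Mapping, []const u8) anyerror!FixedMachineString,

    pub fn init(contains_unicode: bool) Mapping {
        return .{
            .contains_unicode = contains_unicode,
            .encode_string_function = if (contains_unicode)
                &encode_unicode_string
            else
                &encode_ascii_string,
        };
    }

    pub fn encode_string(self: *const Mapping, str: []const u8) !FixedMachineString {
        return self.encode_string_function(self, str);
    }

    fn encode_ascii_string(self: *const Mapping, str: []const u8) anyerror!FixedMachineString {
        _ = self;
        if (str.len > 32) return error.StringTooLong;
        return [_]u8{0} ** 32; // stub
    }

    fn encode_unicode_string(self: *const Mapping, str: []const u8) anyerror!FixedMachineString {
        _ = self;
        if (str.len > 32) return error.StringTooLong;
        return [_]u8{0} ** 32; // stub
    }
};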
I don't think that solution is the most beautiful, so I was wondering if there is something less hacky.
It is quite a standard solution.
There are basically three ways to implement this, each with its own tradeoffs.
- Function pointers, like you mentioned. A major downside is that the compiler doesn’t know which function will be called, so it can’t inline the call or optimize across it. Even the code around it is affected, because the compiler can’t move code across the call. On top of that, you have to carry the pointer around, which costs cycles when copying.
- Branches, like your current code. The downside is that every call comes with at least one branch that has to be predicted, or more than one if your selection code is more complicated. If you never change the boolean after setting it, the branches will be correctly predicted, which minimizes this cost. Since the compiler can see both functions, it can optimize better, even with the branch.
As far as I’m aware, branches tend to do better overall, unless your selection logic is very complicated.
A third option which I’ve used, but I haven’t seen anyone talk about, is to make the selection at compile time at a higher level. Suppose all your functions took a comptime parameter like so:
pub fn runProgram(comptime ascii: bool) void
pub fn encode_string(self: *const Mapping, str: []const u8, comptime ascii: bool) !FixedMachineString
Now in your main function, you do this:
if (is_ascii)
    runProgram(true)
else
    runProgram(false);
In essence, after your selection logic, you branch into a version of your program that has complete knowledge about which functions to use.
This solves all the performance problems mentioned before. There are no pointers and no branches after this first one. It’s the optimal choice for speed. The major downside is that you have basically packed two different programs inside your binary, one program made for ascii and one made for unicode. Although you can reuse most source code, a lot of the binary cannot be shared between these two versions, so your binary size will increase a lot.
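A minimal sketch of how this could look for your encoding example (all names and the FixedMachineString placeholder are assumptions, not your actual code):

const FixedMachineString = [32]u8; // placeholder type, just for the sketch

const Mapping = struct {
    pub fn encode_string(
        self: *const Mapping,
        str: []const u8,
        comptime ascii: bool,
    ) !FixedMachineString {
        // Resolved at compile time: each specialization keeps only one call.
        return if (ascii)
            self.encode_ascii_string(str)
        else
            self.encode_unicode_string(str);
    }

    fn encode_ascii_string(self: *const Mapping, str: []const u8) !FixedMachineString {
        _ = self;
        if (str.len > 32) return error.StringTooLong;
        return [_]u8{0} ** 32; // stub
    }

    fn encode_unicode_string(self: *const Mapping, str: []const u8) !FixedMachineString {
        _ = self;
        if (str.len > 32) return error.StringTooLong;
        return [_]u8{0} ** 32; // stub
    }
};

fn runProgram(comptime ascii: bool) !void {
    const mapping = Mapping{};
    _ = try mapping.encode_string("hello", ascii);
}

pub fn main() !void {
    // Pretend this flag comes from runtime input, e.g. the loaded wordlist.
    const is_ascii = true;
    if (is_ascii)
        try runProgram(true)
    else
        try runProgram(false);
}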
I think I like this.
I will share the increase in exe size later, once I have tried it.
(How the branch prediction behaves here I don’t know. The boolean is set when loading a wordlist for a language, and everything else operates on that.)
Now that is interesting. I am curious how this will be implemented.
Update: nvm, I missed the ‘runtime’ part. I still think the syntax below would make sense for comptime selection, though.
Hm, the syntax looks a bit arbitrary tbh; I’d rather have a general selection on a comptime value:
const myFunc = switch (target) {
    .avx2 => fn (args: Args) Result { ... },
    .sse4 => fn (args: Args) Result { ... },
    .neon => fn (args: Args) Result { ... },
};
…assuming that there would also be a new function-expression syntax like the following (which IMHO makes a lot of sense too; I seem to remember having seen a proposal for something similar):
const myFunc = fn (args: Args) Result { ... }
Edit: fixed the result syntax, somehow the extra : from TypeScript always sneaks into my Zig code lol
PS: one could also group already existing explicit functions under a common name, selected by a comptime value:
const myFuncAVX2 = fn (args: Args) Result { ... };
const myFuncSSE4 = fn (args: Args) Result { ... };
const myFuncNeon = fn (args: Args) Result { ... };
const myFunc = switch (target) {
    .avx2 => myFuncAVX2,
    .sse4 => myFuncSSE4,
    .neon => myFuncNeon,
};
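(FWIW, that last form already seems to work in today’s Zig, as long as the functions are declared with the normal fn syntax and target is comptime-known; a rough sketch with made-up names:)

const Isa = enum { avx2, sse4, neon };
const target: Isa = .avx2; // comptime-known selection, hard-coded for the sketch

fn myFuncAVX2(x: u32) u32 {
    return x + 2;
}
fn myFuncSSE4(x: u32) u32 {
    return x + 4;
}
fn myFuncNeon(x: u32) u32 {
    return x + 1;
}

// Container-level initializers are evaluated at comptime,
// so this switch picks exactly one function at compile time.
const myFunc = switch (target) {
    .avx2 => myFuncAVX2,
    .sse4 => myFuncSSE4,
    .neon => myFuncNeon,
};

pub fn main() void {
    _ = myFunc(40);
}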
The point is to select a function based on runtime CPU detection.
Oh whoops, I missed the runtime part.
Not sure how I feel about that tbh; I would probably prefer to stamp out different executables, but I can also see that runtime selection makes sense in some scenarios.
OTOH: where do you draw the line? Should it allow selecting by CPU arch and stamping out different functions for x86 vs ARM vs WASM? I can’t think of a scenario where that would make sense.
…on further thought, how would that even work with inlining? Small math functions only make sense performance-wise when they can be inlined, so you’d probably want to specialize very large execution blocks at once (more like a complete GPU shader) - but I guess the target specialization would also include all code that’s called from a specialized top-level function.
…on further further thought: maybe that feature would actually be nice for embedding actual ‘foreign ISA functions’ into regular Zig code (like GPU shaders).
That sounds awesome to me, if you can pull it off. Having a single binary that the user can put on a flash drive and run on whatever computer they find in front of them would be super convenient.
I have some functions that optimize really well for specific targets, and it would be great if I could include them in the binary while keeping it compatible with the baseline CPUs as well.