Consider bitwise approach instead of branching for std.math.absInt

I was reading through the std.math library, looked at the implementation of absInt, and noticed that it uses the following pattern:

return if (x < 0) -x else x;

This introduces a branch condition. Granted, a good optimizer can recognize this pattern and remove the branch, but there exist alternatives that can perform better than branching.

Is there a reason it doesn't use something more like the following:

const mask = x >> (@bitSizeOf(@TypeOf(x)) - 1);

return (x + mask) ^ mask;

Also, the overflow case of the maximum negative value is already present in the current version and would apply here too.
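For anyone who wants to play with the trick outside Zig, here is a minimal C sketch of the same mask-based approach (my own illustration, not from std.math); the intermediate arithmetic is done on unsigned values to sidestep C's signed-overflow rules:

```c
#include <stdint.h>

/* Branchless |x| via the sign mask: the arithmetic right shift
   yields all 1s (-1) for negative x and 0 otherwise, so
   (x + mask) ^ mask negates x exactly when x is negative.
   Unsigned casts avoid signed-overflow UB in the intermediate sum. */
static int64_t abs_branchless(int64_t x) {
    int64_t mask = x >> 63; /* assumes arithmetic shift of negatives, as on common compilers */
    return (int64_t)(((uint64_t)x + (uint64_t)mask) ^ (uint64_t)mask);
}
```

As with the Zig version, the maximum negative value still overflows, since its absolute value is not representable.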

Thoughts?

I am not sure what you mean by "This introduces a branch condition."

It doesn’t introduce a branch condition unless there is a branch in the assembly. If you look at the assembly output for both versions, you get the same thing on x86_64 and arm64.

        mov     rax, rdi
        neg     rax
        cmovs   rax, rdi
        ret

The only incongruity between this assembly and the original Zig code return if (x < 0) -x else x; is that the -x operation is technically executed speculatively (software speculation). So code that is more accurate to what you want to do in assembly would be:

const y = -x;
return if (x < 0) y else x;

However, this does not add any information we need or care about. Your version is, however, more representative of the assembly produced for WASM. Both the simple version and your version compile to the following:

absInt1:
        local.get       0
        local.get       0
        i64.const       63
        i64.shr_s
        local.tee       1
        i64.xor 
        local.get       1
        i64.sub 
        end_function

I am not quite sure how to read WASM but I don’t even see your addition? Anyway, in the absence of actual benefit in terms of emitted assembly, I think the simple code is clearer, so I would prefer it over the bit manipulation technique.
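If it helps to decode the WASM: it computes the xor-first variant (x ^ mask) - mask rather than the add-first (x + mask) ^ mask, which is why there is no add instruction; the two forms are equivalent. A quick C sketch (my own, for illustration) of the variant the WASM encodes:

```c
#include <stdint.h>

/* The xor-first form of the trick, matching the WASM above:
   mask = x >> 63, then (x ^ mask) - mask.
   For negative x this is ~x - (-1) = ~x + 1 = -x; for x >= 0 it is x.
   Unsigned casts avoid signed-overflow UB in the intermediate steps. */
static int64_t abs_xor_sub(int64_t x) {
    int64_t mask = x >> 63; /* assumes arithmetic shift of negatives */
    return (int64_t)(((uint64_t)x ^ (uint64_t)mask) - (uint64_t)mask);
}
```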

4 Likes

Thanks for the response - my intention was to make it closer to what appears in assembly but point taken that the current version is clearer for people who are reading it.

And by writing a branch in the source, we're relying on the optimizer to remove it for us - that's the only gripe I had. Otherwise, totally fine with it.

I’m not necessarily in agreement that we semantically defined a branch. Simple ternary conditionals are often expected to compile to a CMOV on modern hardware, and a single arithmetic negation is a very small amount of work to speculatively execute (software speculation). arm64 actually has cneg, so it avoids the moves altogether! I am sure some compilers out there let you be more precise about when you want a branch and when you don’t, but not any that I’ve ever used. Whether there is an actual branch in hardware is more of an implementation detail, albeit an important one that you should understand.

One could imagine that code like yours, compiled for hardware before barrel shifters were a thing, might have a better trick for finding the absolute value on its particular hardware. If you did have a particular hardware platform where the compiler genuinely does not know the best absolute value implementation, then I would support your bit trick, but in the absence of any real benefit I think readability is better.

I’m typically not one to argue for readability, by the way, but in this case there is no difference in assembly and therefore no actual difference on the real platform, which is the hardware. I would also expect the simple code to be more recognizable by other compilers once there are multiple Zig compilers, and I would not always expect a compiler to be nice and recognize that your bit trick is unnecessary and replace it with a CMOV.
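One way to convince yourself the two styles agree (my own quick check in C, not from the thread) is to write both side by side and compare them on a few values, excluding the maximum negative value where both overflow:

```c
#include <stdint.h>

/* The "as anyone would write it" ternary form; like the Zig original,
   it is undefined for INT64_MIN, where -x overflows. Optimizing
   compilers typically lower it to cmov/neg rather than a branch. */
static int64_t abs_simple(int64_t x) { return x < 0 ? -x : x; }

/* The mask-based trick, with unsigned intermediates to avoid UB. */
static int64_t abs_masked(int64_t x) {
    int64_t m = x >> 63;
    return (int64_t)(((uint64_t)x + (uint64_t)m) ^ (uint64_t)m);
}
```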

6 Likes

I think that’s a great response and I wasn’t aware of that for ternary operations - great to know!

Also, I had no idea, but Compiler Explorer actually supports Zig - thought I’d mention it here because your post made me interested.

1 Like

As a non-expert in compilers, but a user of quite a few compiled languages over the years, I think it’s a great and impressive feat that a compiler can produce the optimal assembly when given the “as anyone would write it” version. What I mean is that in so many languages the docs, books, and examples everywhere show you the idiomatic way to do things, but then blog posts and sites like StackOverflow are full of examples of how to do things for optimal performance that turn out to be very different code from the idiomatic way. If you can write it the idiomatic way and obtain the same results as doing it the hard way, that’s a great compiler.

3 Likes