Zmath library not using builtins?

While looking at the source code of zig-gamedev/zmath (zmath/src/root.zig at main · zig-gamedev/zmath · GitHub), I got very confused about why a simple dot product of two vectors is not just @reduce(.Add, v1 * v2). Also, why did they implement their own sin and cos functions when we have the @sin and @cos builtins? Did they find the builtins to be slower and choose software approximations instead?

Here are some of the functions referenced above:

fn sin32(v: f32) f32 {
    var y = v - math.tau * @round(v * 1.0 / math.tau);

    if (y > 0.5 * math.pi) {
        y = math.pi - y;
    } else if (y < -math.pi * 0.5) {
        y = -math.pi - y;
    }
    const y2 = y * y;

    // 11-degree minimax approximation
    var sinv = mulAdd(@as(f32, -2.3889859e-08), y2, 2.7525562e-06);
    sinv = mulAdd(sinv, y2, -0.00019840874);
    sinv = mulAdd(sinv, y2, 0.0083333310);
    sinv = mulAdd(sinv, y2, -0.16666667);
    return y * mulAdd(sinv, y2, 1.0);
}

pub inline fn dot2(v0: Vec, v1: Vec) F32x4 {
    var xmm0 = v0 * v1; // | x0*x1 | y0*y1 | -- | -- |
    const xmm1 = swizzle(xmm0, .y, .x, .x, .x); // | y0*y1 | -- | -- | -- |
    xmm0 = f32x4(xmm0[0] + xmm1[0], xmm0[1], xmm0[2], xmm0[3]); // | x0*x1 + y0*y1 | -- | -- | -- |
    return swizzle(xmm0, .x, .x, .x, .x);
}
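
For comparison, here is a minimal sketch of what a @reduce-based version could look like. As quoted, dot2 only multiplies and adds the x and y lanes and returns the sum broadcast to all four lanes, so a plain @reduce(.Add, v0 * v1) over a 4-lane vector would also add z and w and would return a scalar. The name dot2Reduce is hypothetical and not part of zmath, and the single-argument @splat assumes Zig 0.11 or later.

```zig
const std = @import("std");

const F32x4 = @Vector(4, f32);

// Hypothetical @reduce-based counterpart of dot2 (not zmath code).
// Only the x and y lanes take part in the sum; the scalar result is
// then broadcast to all four lanes, matching dot2's return shape.
fn dot2Reduce(v0: F32x4, v1: F32x4) F32x4 {
    const prod = v0 * v1; // | x0*x1 | y0*y1 | z0*z1 | w0*w1 |
    const xy = @Vector(2, f32){ prod[0], prod[1] }; // drop z and w
    const sum: f32 = @reduce(.Add, xy); // x0*x1 + y0*y1
    return @splat(sum); // result type F32x4 is inferred from the return type
}
```

Whether this compiles to better or worse machine code than the swizzle-based version depends on the target and compiler version, so it is worth checking the generated assembly before drawing conclusions.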

Thanks in advance!

Unlike elementary arithmetic like addition and multiplication, transcendental functions like sin or cos are expensive for computers to evaluate accurately. Accurate implementations tend to be slow because they rely on lots of branches and table lookups that prevent vectorization. Have a look at the C source code for a correctly rounded sin implementation for 32-bit floats and you’ll understand why it’s slow (the 64-bit version is even crazier).

zmath’s sin32 uses a polynomial approximation which is extremely fast and can be vectorized (see sin32xN), but is also considerably less accurate. How much this matters depends on what you’re using it for; since zmath is commonly used for graphics and matrix transformations rather than scientific computations, these inaccuracies most likely won’t be noticeable in practice, but it’s important to be aware of the difference.
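
If you want to see how large the difference is for the inputs you care about, one option is to measure it. The sketch below copies the polynomial from the sin32 quoted in the question, swaps zmath's mulAdd helper for the @mulAdd builtin (an assumption made to keep it self-contained), and reports the largest absolute difference from @sin over a sampled range. The names sin32Approx and maxAbsDiff are hypothetical, and @floatFromInt assumes Zig 0.11 or later.

```zig
const std = @import("std");
const math = std.math;

// Same range reduction and coefficients as the sin32 quoted in the question;
// zmath's mulAdd helper is replaced by the @mulAdd builtin (assumption).
fn sin32Approx(v: f32) f32 {
    var y = v - math.tau * @round(v * 1.0 / math.tau);
    if (y > 0.5 * math.pi) {
        y = math.pi - y;
    } else if (y < -math.pi * 0.5) {
        y = -math.pi - y;
    }
    const y2 = y * y;

    var sinv = @mulAdd(f32, -2.3889859e-08, y2, 2.7525562e-06);
    sinv = @mulAdd(f32, sinv, y2, -0.00019840874);
    sinv = @mulAdd(f32, sinv, y2, 0.0083333310);
    sinv = @mulAdd(f32, sinv, y2, -0.16666667);
    return y * @mulAdd(f32, sinv, y2, 1.0);
}

// Largest absolute difference between sin32Approx and @sin over
// steps + 1 evenly spaced samples in [lo, hi].
fn maxAbsDiff(lo: f32, hi: f32, steps: u32) f32 {
    var worst: f32 = 0.0;
    var i: u32 = 0;
    while (i <= steps) : (i += 1) {
        const t = @as(f32, @floatFromInt(i)) / @as(f32, @floatFromInt(steps));
        const x = lo + (hi - lo) * t;
        const d = sin32Approx(x) - @sin(x);
        const ad = if (d < 0.0) -d else d;
        if (ad > worst) worst = ad;
    }
    return worst;
}

pub fn main() void {
    std.debug.print("max abs diff on [-pi, pi]: {e}\n", .{maxAbsDiff(-math.pi, math.pi, 10000)});
}
```

Keep in mind that this compares against whatever @sin resolves to on your target, which is not necessarily correctly rounded itself. It is also worth sampling large-magnitude inputs, since the f32 range reduction in the first line is likely to lose precision there.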

Another upside of zmath’s custom sin32 implementation is that it produces the same results on all targets (provided you build zmath with the enable_cross_platform_determinism option). Zig’s @sin builtin does not provide the same guarantees (or any guarantees, for that matter); it might use Zig’s compiler_rt sin implementation (which is mostly accurate but not correctly rounded), the legacy x87 FSIN hardware instruction (which is both slow and inaccurate), or some other target-specific intrinsic.
