Is there a way to enable SIMD in WebAssembly in ReleaseSmall mode?

Hi! I am building a small WebAssembly library, and would like to make use of the SIMD instructions that LLVM generates automatically.

I have manually added the .simd128 CPU feature, and I can see the compiler generates SIMD instructions by running wasm2wat (stuff like v128.store offset=2120 align=8).

However, this only happens if I set the optimize mode to .ReleaseFast, which results in a fairly large binary (~800 KiB).
I would like to have these instructions generated when I compile with .ReleaseSmall (~50 KiB).

Is there a flag or setting I should change to achieve this?

Here’s the build file I am using (with the latest nightly):

const std = @import("std");
const builtin = @import("builtin");
const CpuFeature = std.Target.Cpu.Feature;
const WasmFeature = std.Target.wasm.Feature;
const featureSet = CpuFeature.feature_set_fns(WasmFeature).featureSet;


pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const lib = b.addSharedLibrary(.{
        .name = "lib",
        .root_source_file = .{ .path = "src/main.zig" },
        .target = target,
        .optimize = optimize,
        .use_llvm = true,
        .use_lld = true,
    });
    lib.rdynamic = true;

    if (lib.target.cpu_arch) |arch| {
        if (std.Target.Cpu.Arch.isWasm(arch)) {
            lib.optimize = .ReleaseFast;  // setting this to .ReleaseSmall does not make use of SIMD.
            // lib.target.cpu_features_add = std.Target.wasm.cpu.bleeding_edge.features;
            lib.target.cpu_features_add = featureSet(&[_]WasmFeature{
                .mutable_globals,  // from generic.features
                .sign_ext,         // from generic.features
                .simd128,          // from bleeding_edge.features
            });
        }
    }
}

I think the problem is that compiler-generated SIMD code is often quite complex and long because it needs to deal with all sorts of alignment problems.
Check out this simple copy function on godbolt. As you can see, in ReleaseFast it requires 8 times more assembly code, and I think most of that extra code is needed to ensure alignment for the SIMD portion.

I think if you want it to use SIMD, it’s probably best to write the critical sections yourself using Zig’s builtin vector types (@Vector).
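To illustrate what I mean, here is a minimal sketch of a hand-vectorized loop. The function name addSlices and the lane count of 4 are just my choices for the example (4 x f32 fits one 128-bit WebAssembly vector); this is not taken from your code, so adapt it to whatever your hot loop actually does.

```zig
const std = @import("std");

// Hypothetical lane count: 4 x f32 = 128 bits, the width of a wasm v128.
const lanes = 4;
const Vec = @Vector(lanes, f32);

/// Element-wise dst = a + b, processing `lanes` elements per iteration.
/// All three slices are assumed to have the same length.
pub fn addSlices(dst: []f32, a: []const f32, b: []const f32) void {
    std.debug.assert(dst.len == a.len and a.len == b.len);

    var i: usize = 0;
    // Vector body: load fixed-size chunks as arrays and coerce them to vectors.
    while (i + lanes <= dst.len) : (i += lanes) {
        const va: Vec = a[i..][0..lanes].*;
        const vb: Vec = b[i..][0..lanes].*;
        // The vector sum coerces back to a [lanes]f32 array on store.
        dst[i..][0..lanes].* = va + vb;
    }
    // Scalar tail for the leftover elements.
    while (i < dst.len) : (i += 1) {
        dst[i] = a[i] + b[i];
    }
}
```

With .simd128 enabled, operations on explicit vectors like this should map to v128 instructions without relying on the auto-vectorizer, but I have not checked the ReleaseSmall output for your target, so it is worth confirming with wasm2wat.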

(Also note that there is an accepted proposal that would allow you to set the release mode to ReleaseFast on a function level: replace @setRuntimeSafety with @optimizeFor · Issue #978 · ziglang/zig · GitHub)

Thank you for the insights, it makes a lot of sense, I’ll try that.