Optimized runtime known array indexing

I’m attempting to optimize the following function.

/// Swap each byte in data based on the passed lookup table.
pub fn translate(data: []u8, lut: [256]u8) void {
    assert(data.len > 0);
    assert(data.len % 256 == 0);
    for (data) |*px| {
        px.* = lut[px.*];
    }
}`

However I’m having trouble figuring out a way to index into the lut array
efficiently? @shuffle can be used only when the mask is known at compile
time and this would be a lot faster than the implementation above.

/// Swap each byte in data based on the passed lookup table.
pub fn translate(data: []u8, lut: [256]u8) void {
    assert(data.len > 0);
    assert(data.len % 256 == 0);
    const lut_vec: @Vector(256, u8) = lut;
    var cursor = data[0..];
    while (true) {
        const chunk: @Vector(256, u8) = cursor[0..256].*;
        // this will fail with: `note: shuffle mask must be comptime-known`
        const result: [256]u8 = @shuffle(u8, lut, undefined, chunk);
        cursor[0..256].* = result;
        cursor = cursor[256..];
        if (cursor.len == 0) break;
    }
}`

There is an accepted proposal for this: Indexing arrays with vectors (gather) · Issue #12815 · ziglang/zig · GitHub
Andrew also posted a workaround there (which might not be that efficient though)

What assembly are you getting with the naive version?
With PRO slotted for removal, you should probably pass the lut by const pointer, and mark it as noalias. See what assembly you get out of this and work from that.

1 Like