Zig’s pbkdf2 is significantly slower on WSL

Zig’s pbkdf2 function runs a lot slower on WSL than on Windows, where it was quick.

While writing this post, I tried again after compiling with the -OReleaseSafe flag, and it was much better. It’s still odd that it was fast on Windows even in debug mode.

const std = @import("std");

const salt_length = 32;
const key_length = 32;

const rounds = 100_000;
const Prf = std.crypto.auth.hmac.sha2.HmacSha256;

pub fn main() !void {
    const passphrase = "qweqwweqwe";

    // The salt is never mutated, so declare it const (recent Zig versions reject an unmutated var).
    const salt: [salt_length]u8 = .{
        53,  24,  168, 223,
        200, 253, 14,  108,
        197, 188, 12,  72,
        30,  157, 106, 216,
        16,  110, 217, 240,
        222, 194, 23,  82,
        112, 162, 164, 149,
        235, 129, 74,  125,
    };

    var key: [key_length]u8 = undefined;

    try std.crypto.pwhash.pbkdf2(&key, passphrase, &salt, rounds, Prf);

    std.debug.print("{any}\n", .{key});
}

The C version was much faster on the same platform, even when compiled without any optimization flags:

#include <gnutls/crypto.h>
#include <gnutls/gnutls.h>
#include <stdio.h>

#define salt_length 32
#define key_length 32

#define rounds 100000

int
main (void)
{
  gnutls_datum_t passphrase = {
    /* gnutls_datum_t.data is unsigned char *, so cast the string literal. */
    .data = (unsigned char *) "qweqwweqwe",
    .size = 10,
  };

  gnutls_datum_t salt = {
    .data = (unsigned char[salt_length]){
        53,  24,  168, 223,
        200, 253, 14,  108,
        197, 188, 12,  72,
        30,  157, 106, 216,
        16,  110, 217, 240,
        222, 194, 23,  82,
        112, 162, 164, 149,
        235, 129, 74,  125,
    },
    .size = salt_length,
  };

  gnutls_datum_t key = {
    .data = (unsigned char[key_length]){ 0 },
    .size = key_length,
  };

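  /* gnutls_pbkdf2 returns a negative error code on failure. */
  int ret = gnutls_pbkdf2 (GNUTLS_MAC_SHA256, &passphrase, &salt, rounds,
                           key.data, key.size);
  if (ret < 0)
    {
      fprintf (stderr, "gnutls_pbkdf2: %s\n", gnutls_strerror (ret));
      return 1;
    }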

  for (size_t i = 0; i < key.size; i++)
    {
      printf ("%hhu ", key.data[i]);
    }

  putchar ('\n');

  return 0;
}

By the way, I’m not sure whether these are the same algorithms, but the keys they generated were the same.

PBKDF2, HMAC and SHA-256 are pure computations. Since they run on the same CPU, they should take roughly the same time.

How do you measure it?
Also note that the WSL clock can be unreliable, and a WSL process may run on a different, less performant CPU core or at a lower frequency.


I didn’t really measure it. It just took about 5 seconds, while the C version was instant. It’s much better in release mode though, so there’s nothing to complain about.

Another reason for the performance difference between Windows (native target) and WSL (cross-compile target) might be -mcpu=native vs -mcpu=baseline.
When you don’t pass the -target flag, the default CPU is native, which means every feature of your CPU is enabled (e.g. SIMD); otherwise it is baseline, which assumes a fairly old CPU.

Sorry for the late reply. The reason I posted this is that the C version was much faster even on WSL. Now I think it’s probably because the gnutls library is already built with optimizations, while the Zig version was built in debug mode.