So I stumbled upon this recent paper that explains how they found a way to make hashmaps much more performant than was previously thought possible. If the AI I asked explained it correctly, the main revelation is that when a collision happens you should not linearly probe the following slots. Instead you should use a secondary hash function (or a different hash seed) to hash the key and derive a step from that hash. You then probe the slots that are step, 2*step, 3*step, ... slots away from the first slot, modulo the capacity:
// h2 determines the probe step. It must be odd to be coprime
// with the power-of-two capacity, ensuring we visit every slot.
const h2 = std.hash.Wyhash.hash(self.hash_seed + 1, std.mem.asBytes(&key));
const step = @as(usize, @intCast((h2 & (self.capacity - 1)) | 1));
while (slotNotFound) {
    ...
    // instead of idx = (idx + 1) & (self.capacity - 1);
    idx = (idx + step) & (self.capacity - 1);
}
Does anyone have a good real-world use case they can benchmark this solution on vs the built-in solution that uses idx + 1?
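In case it helps with experimenting, here is a minimal self-contained sketch of the same probe idea (just the step-from-a-second-hash trick, not the paper's full algorithm; the Table type, the use of Wyhash for both hashes, and the tiny fixed-size u64-key table are all my own invention):

const std = @import("std");

// Toy fixed-capacity open-addressed table of u64 keys: no deletion,
// no resize, no duplicate check. Capacity must stay a power of two.
const Table = struct {
    slots: [16]?u64 = [_]?u64{null} ** 16,

    // Second hash with a different seed; forcing the low bit keeps the
    // step odd, hence coprime with the power-of-two capacity.
    fn probeStep(key: u64, mask: usize) usize {
        const h2 = std.hash.Wyhash.hash(1, std.mem.asBytes(&key));
        return (@as(usize, @truncate(h2)) & mask) | 1;
    }

    fn insert(self: *Table, key: u64) bool {
        const mask = self.slots.len - 1;
        const h1 = std.hash.Wyhash.hash(0, std.mem.asBytes(&key));
        var idx = @as(usize, @truncate(h1)) & mask;
        const step = probeStep(key, mask);
        var probes: usize = 0;
        while (probes < self.slots.len) : (probes += 1) {
            if (self.slots[idx] == null) {
                self.slots[idx] = key;
                return true;
            }
            idx = (idx + step) & mask; // the double-hashed jump
        }
        return false; // table is full
    }
};

test "double-hashed probing finds free slots" {
    var t = Table{};
    var k: u64 = 1;
    while (k <= 12) : (k += 1) {
        try std.testing.expect(t.insert(k));
    }
}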
Note that secondary hashing isn't novel; see for example Cuckoo hashing (the paper uses a non-reordering approach, though). Their contribution is more about lowering the worst-case bounds using funnel hashing, i.e. you get decent performance even near capacity. If I understand it correctly, funnel hashing works by adding increasingly smaller extra arrays, forming a funnel. They presumably still need to resize and reindex at some point.
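To make that description concrete, here is a toy sketch of the funnel idea as I understand it (not the paper's actual construction; the FunnelSketch type, the fixed probe budget per level, and the level sizes are all made up):

const std = @import("std");

// Illustrative only: levels shrink geometrically, forming the "funnel".
// An insert probes a bounded number of slots in one level and, on failure,
// falls through to the next (smaller) level.
const FunnelSketch = struct {
    levels: [][]?u64, // each level is a small open-addressed array of keys
    probes_per_level: usize,

    fn insert(self: *FunnelSketch, key: u64) bool {
        for (self.levels) |level| {
            const mask = level.len - 1; // level sizes assumed to be powers of two
            const h = std.hash.Wyhash.hash(0, std.mem.asBytes(&key));
            var idx = @as(usize, @truncate(h)) & mask;
            var attempts: usize = 0;
            while (attempts < self.probes_per_level) : (attempts += 1) {
                if (level[idx] == null) {
                    level[idx] = key;
                    return true;
                }
                idx = (idx + 1) & mask;
            }
            // this level looks full around this key: fall through to the next one
        }
        return false; // all levels exhausted; a real table would resize/reindex here
    }
};

test "keys fall through to a smaller level when the first fills up" {
    var big = [_]?u64{null} ** 8;
    var small = [_]?u64{null} ** 4;
    var levels = [_][]?u64{ &big, &small };
    var table = FunnelSketch{ .levels = &levels, .probes_per_level = 8 };
    var k: u64 = 1;
    while (k <= 10) : (k += 1) {
        try std.testing.expect(table.insert(k));
    }
}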
Having had a rough look at the paper, it seems to talk exclusively about theoretical complexity and not at all about implementing it on any kind of actual hardware.
I think it is quite likely that different probing strategies have a different impact based on the hardware they run on (where linear is likely better than jumping around in memory; see the access-pattern sketch below). If somebody can take this paper, design an implementation based on it, and demonstrate that it runs faster on actual hardware, then I am more interested / impressed.
Until then, it is likely something that matters more for people who have to deal with much bigger scaling issues than I am dealing with, or who are into theoretical research for its own sake (which is fine and eventually useful, just not that interesting to me).
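To make the memory-access point concrete, here is a toy measurement sketch, not a hash table benchmark: it only compares touching slots sequentially (the linear-probing pattern) with hopping around using a large odd step (the double-hashing pattern). The array size and the step constant are arbitrary choices of mine:

const std = @import("std");

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const capacity: usize = 1 << 24; // 16M u64 slots, ~128 MiB
    const mask = capacity - 1;

    const slots = try allocator.alloc(u64, capacity);
    defer allocator.free(slots);
    @memset(slots, 1);

    var sum: u64 = 0;

    // Linear probing pattern: consecutive slots, prefetch-friendly.
    var timer = try std.time.Timer.start();
    var idx: usize = 0;
    for (0..capacity) |_| {
        sum +%= slots[idx];
        idx = (idx + 1) & mask;
    }
    const linear_ns = timer.lap();

    // Double-hashing pattern: a large odd step, so successive probes
    // land on different cache lines (and often different pages).
    const step: usize = 0x9E3779B1 | 1; // arbitrary odd constant
    idx = 0;
    for (0..capacity) |_| {
        sum +%= slots[idx];
        idx = (idx + step) & mask;
    }
    const strided_ns = timer.lap();

    std.debug.print("linear: {d} ns, strided: {d} ns (checksum {d})\n", .{ linear_ns, strided_ns, sum });
}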