Proposal: Move pointer stability to an Allocator

Background

Currently, all std.hash_map and std.array_hash_map variants implement pointer stability via a pointer_stability: std.debug.SafetyLock field to catch unwanted pointer modification in safe build modes. The same mechanism will likely be used in the future for std.ArrayList* (Issue #19326, PR #22992) and std.MultiArrayList (Issue #19327).

The implementation will probably always look the same:

  1. Add a pointer_stability field to the container
  2. Add this to every function that might modify allocations:
self.pointer_stability.lock();
defer self.pointer_stability.unlock();
  1. Profit

Proposal

Why not move this functionality to an Allocator? The implementation would be very straightforward, basically just an allocator that calls pointer_stability.lock() and pointer_stability.unlock() before/after every alloc/remap/free:

std.debug.LockableAllocator:

child_allocator: std.mem.Allocator,
pointer_stability: std.debug.SafetyLock = .{},

const Self = @This();

fn alloc(...) ?[*]u8 {
    const self: *Self = @ptrCast(@alignCast(ptr));
    self.pointer_stability.lock();
    defer self.pointer_stability.unlock();
    return self.child_allocator.rawAlloc(...);
}

fn resize(...) bool {
    const self: *Self = @ptrCast(@alignCast(ptr));
    return self.child_allocator.rawResize(...);
}

fn remap(...) ?[*]u8 {
    const self: *Self = @ptrCast(@alignCast(ptr));
    self.pointer_stability.lock();
    defer self.pointer_stability.unlock();
    return self.child_allocator.rawRemap(...);
}

fn free(...) void {
    const self: *Self = @ptrCast(@alignCast(ptr));
    self.pointer_stability.lock();
    defer self.pointer_stability.unlock();
    return self.child_allocator.rawFree(...);
}

This would make it very easy to add pointer stability checks to existing container types, including ones from external dependencies.

Considerations

There are two main problems I see with this:

  • Pointer stability assertions in move functions
  • Accessing the pointer_stability field from functions that take the protected container as a parameter

About the first problem:
Right now there is a self.pointer_stability.assertUnlocked() in the hash map move() functions. I would argue that this is misplaced anyway, since the actual memory doesn’t get changed in these functions, it’s just the owner of the memory that changes. So I don’t really see the point in asserting pointer stability here, please correct me if I’m wrong.
The only other use of .assertUnlocked() is in the deinit() functions which free memory anyways, so locking + unlocking the SafetyLock in free provides an even stronger guarantee here.

About the second problem:
This is obviously a bit of an inconveniance. Currently you can just pass a *std.HashMapUnmanaged to any function and then access .pointer_stability directly to check whether it’s (un)locked. With this proposal you would either have to do this explicitly as a separate parameter or build a wrapper around the protected container that contains a reference to the pointer stability lock.

But I think it’s still worth it for the added flexibility and ease of implementation. Also not every function needs to check for pointer stability. What if you just want to look up whether a hash map contains an entry? In that case you don’t care about where this entry is stored anyways.
IMO this change would promote explicitness and make the programmer think even harder about whether a function cares about pointer stability or not :slight_smile:

I just thought of this while showering and definitely not long/hard enough to open a GitHub issue/PR, please let me know your opinion on the proposal and whether I’m missing any obvious drawbacks to this approach!

1 Like

I like this idea.

I don’t think this is a problem. Why would access .pointer_stability? The point of having this flag is so that you can set it once and then completely forget about it, calling your normal functions with the assumption that pointers won’t change. If this assumption is wrong, you’ll trip an assertion.
We could just go all in and remove the concept of pointer stability from the map itself. You don’t even need a new allocator for it. In the parts of the code where you don’t want pointers to change, you just pass failing_allocator.

Instead of:

map.lockPointers();
// foo should not fail at this point
foo(map, allocator) catch unreachable;

You have:

foo(map, failing_allocator) catch unreachable;

Or, like you mentioned, you could trip the assertion inside the allocator itself.

1 Like

It’s because a *HashMap is a pointer to a pointer (to a struct which has a pointer, rather). But two *HashMap can point to the same HashMap, and try to mutate that pointer. move changes the target of that pointer, and that can get you rekt.

1 Like

Ah, that makes sense of course. It’s a shame that move() doesn’t take an allocator…

1 Like

That could change, given that you can’t really operate a HashMap† without its allocator nearby.

I don’t think the change is likely, because any responsibility deferred to the Allocator interface has to be implemented by every backing allocator. A general purpose data structure like a HashMap can’t be made to rely on a specialized implementation either.

Have a safety lock in safe modes can prevent a lot of hassle, and it’s pretty cheap if there isn’t contention for the resource. It might make more sense to provide a HashMapOptions which can explicitly set the safety level for the type, although there are issues there because now your safe HashMap is a different type from your unsafe HashMap, if they happen to mix, which they inevitably will.

But you can get that now, if you’re willing to go through some hoops for it: define a module to hold your living-dangerously types, set it to ReleaseFast in build.zig, and import the types from there.

This is the kind of thing which really only makes sense after you’ve run a profiler on your benchmarks, you’ve seen that some HashMap operation is taking excessive time, and you’ve seen that there’s big savings if you switch from ReleaseSafe to ReleaseFast. At that point doing a little bit of data shuffling is no big deal.

† I’m going to keep future-proofing these discussions by saying HashMap for HashMapUnmanaged and so on.

1 Like

That’s true, especially because clearAndFree() already takes an allocator and from the perspective of a HashMap it does the same thing as move().
I guess in situations where you want to move the contents of a ‘mutable’ hash map to an ‘immutable’ one which doesn’t require an allocator you could just transfer ownership of the entire hash map instead.

This speaks against that change of course. Maybe move() could be changed to truncate the containers capacity and there could be a separate moveRaw() that doesn’t take an allocator and behaves like the current move() but without the pointer stability check, though this would still be kind of in service of this specific implementation.

I’m not that concerned about the performance impact of a safety lock, if you’re willing to build in ReleaseSafe mode you’re already opting into taking a hit on that front IMO. It’s more about ease of implementation and interoperability with third party data structures.
Being able to assert pointer stability by passing a special allocator to existing data structures without this feature would just be very convenient.
Of course you wouldn’t get pointer stability checks ‘for free’ with stdlib hash maps etc. anymore but I think this is in line with Zig’s general explicitness.

1 Like