Why get a different virtual address on re-allocation?

dude_the_builder · April 3, 2024, 6:39pm

This is not about Zig, it’s about memory management at the hardware and OS level. Let’s consider this common scenario:

I allocate 2 bytes and get back addresses 0x0 and 0x1 from the OS.
I take pointers to those two addresses: ptr_0 → 0x0, ptr_1 → 0x1.
I need to grow the allocation to 3 bytes. The OS re-allocates, returning the addresses 0x7, 0x8, and 0x9. Now ptr_0 and ptr_1 are dangling pointers.

If the OS deals only with virtual addresses and not the actual physical addresses, why can’t it just re-assign 0x0 and 0x1 to the newly allocated memory region and avoid the dangling pointer problem altogether?
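For illustration, here is a minimal C sketch of the scenario above, using the standard realloc. The helper name grow_and_report is made up for this example. Whether the block actually moves depends on the allocator and on the sizes involved, so the function only reports what happened.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: grows *buf to new_len bytes with realloc.
 * Returns 1 if the block moved, 0 if it grew in place, -1 on failure.
 * The old address is saved as an integer first, because the old pointer
 * value must not be used once realloc has moved the block. */
int grow_and_report(unsigned char **buf, size_t new_len) {
    assert(buf != NULL && *buf != NULL);
    uintptr_t old_addr = (uintptr_t)*buf;
    unsigned char *grown = realloc(*buf, new_len);
    if (grown == NULL) {
        return -1; /* the original block is still valid and unchanged */
    }
    *buf = grown;
    return (uintptr_t)grown != old_addr;
}
```

If this returns 1, any pointer taken into the old block (ptr_0 and ptr_1 in the scenario) is dangling, even though the bytes themselves were copied to the new location. A common workaround is to store offsets into the buffer instead of raw pointers.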

dimdin · April 3, 2024, 7:09pm

It is not an OS limitation; it is a hardware limitation.

You can preallocate some amount of virtual memory using mmap/VirtualAlloc. This memory is not committed, which means that you have a virtual address space allocation (a specific address and size) without a physical mapping. Each time you actually need memory, you can commit pages (a page is usually 4KiB). Each committed page is mapped to some physical memory.
That means a Zig allocator can be implemented that does exactly what you want.
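A rough sketch of the reserve-then-commit idea in C, assuming a Linux/POSIX system. The function names are invented for this example, and using PROT_NONE plus mprotect is one common way to emulate "reserve" and "commit" with mmap; VirtualAlloc on Windows has explicit MEM_RESERVE and MEM_COMMIT flags instead.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Size of one page on this system (often 4 KiB, but not always). */
size_t page_size(void) {
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* Reserve `size` bytes of address space with no access rights.
 * Touching the range before committing it would fault.
 * Returns NULL on failure. */
void *reserve_region(size_t size) {
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "Commit" the first `bytes` bytes by making them readable and writable.
 * `base` must be page-aligned, which mmap results are.
 * Returns 0 on success, -1 on failure. */
int commit_prefix(void *base, size_t bytes) {
    assert(base != NULL);
    return mprotect(base, bytes, PROT_READ | PROT_WRITE);
}
```

The base address returned by reserve_region never changes, so growing the usable part with another commit_prefix call leaves existing pointers valid.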

Now let's try to implement a generic allocator that way. We need to specify the size of the uncommitted virtual allocation, and each time we allocate we must call mmap/VirtualAlloc with that size. Let's assume that we are not going to exceed 64KiB: that means we need 16 pages (64KiB / 4KiB) and 16 virtual-to-physical mapping descriptors (or something similar, depending on the memory management hardware). But these mapping descriptors are not free; they are limited resources.

That limitation is why every C/C++/… allocator packs small objects into some bigger OS allocation. So when you try to resize your small object and there is no free space right after it, the allocator must move the object elsewhere.
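To make the "small objects inside a bigger block" point concrete, here is a toy bump allocator in C. It is only a sketch: the Bump type and function names are made up, alignment and freeing are ignored, and real allocators are far more elaborate. It shows that an object can grow in place only when the bytes right after it are free.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy allocator: hands out consecutive slices of one big block. */
typedef struct {
    unsigned char *block; /* the bigger allocation, obtained elsewhere */
    size_t cap;           /* total bytes in block */
    size_t used;          /* bytes handed out so far */
} Bump;

void *bump_alloc(Bump *b, size_t n) {
    if (n > b->cap - b->used) return NULL;
    void *p = b->block + b->used;
    b->used += n;
    return p;
}

/* Grow an object from old_n to new_n bytes. If it is the most recent
 * allocation and there is room, extend it in place; otherwise allocate
 * a new slice and copy, which changes the object's address. */
void *bump_grow(Bump *b, void *p, size_t old_n, size_t new_n) {
    assert(new_n >= old_n);
    unsigned char *obj = p;
    int is_last = (obj + old_n == b->block + b->used);
    if (is_last && new_n - old_n <= b->cap - b->used) {
        b->used += new_n - old_n;
        return p;
    }
    void *moved = bump_alloc(b, new_n);
    if (moved != NULL) memcpy(moved, p, old_n);
    return moved;
}
```

With two 2-byte objects allocated back to back, growing the first one has to move it, because the second one sits directly behind it. Growing the last object can happen in place.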

dude_the_builder · April 3, 2024, 7:18pm

That’s very interesting. After writing the topic, reading your reply, and thinking it through a little more, I realize that even if it’s possible to implement an allocator that behaves the way I described, it would get really messy really fast. Following the very simple example: if, when re-allocating, the new sequence of virtual addresses would overlap with an existing allocation (say 0x2 was already in use), then I would have to keep track of the fact that the new sequence is 0x0, 0x1, 0x9. This adds overhead and also prevents contiguous regions of memory, which hurts cache locality. So I think this is definitely not a good idea.

Sze · April 3, 2024, 11:14pm

I think that is correct if you have a lot of very tiny allocations that stay very tiny.
However, if you have something that behaves more like an arena that is steadily growing, because you need to keep a lot of things around over time, then pre-allocating a big virtual address space and only backing it with physical pages on demand really could make sense.

That way you get pointer stability without having to pre-allocate all the memory upfront, and you can also treat it as one contiguous memory space from the point of view of the application. With Zig’s ArenaAllocator you don’t get a contiguous (virtual) memory space.

It can be a good idea, but you have to think about whether your application really can benefit from it. I guess you could argue that the virtual pre-allocation method isn’t really re-allocation: it pre-allocates a big virtual memory space and then commits pages on demand, whereas re-allocation allocates exactly what is needed (plus whatever overhead your allocator hides from you) and copies the data to a new location if needed.

So I wouldn’t call it a re-allocation strategy, because it seems more like pre-allocation. The fact that you can do the pre-allocation virtually makes it a lot more practical, because you have a lot more virtual address space that you could choose to “sacrifice” than you have physical memory.
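As a sketch of what such a growing arena could look like, here is a small C version for Linux/POSIX. The VArena type and its functions are hypothetical names, alignment is ignored, and PROT_NONE plus mprotect again stands in for "reserve" and "commit". Treat it as an illustration of the idea rather than a finished allocator.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    unsigned char *base; /* start of the reserved range; never changes */
    size_t reserved;     /* bytes of address space reserved */
    size_t committed;    /* bytes currently readable and writable */
    size_t used;         /* bytes handed out so far */
} VArena;

static size_t round_up(size_t n, size_t to) {
    return (n + to - 1) / to * to;
}

/* Reserve address space only; nothing is usable yet. 0 on success. */
int varena_init(VArena *a, size_t reserve_bytes) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    reserve_bytes = round_up(reserve_bytes, page);
    void *p = mmap(NULL, reserve_bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    a->base = p;
    a->reserved = reserve_bytes;
    a->committed = 0;
    a->used = 0;
    return 0;
}

/* Hand out n bytes, committing more pages first if needed.
 * Earlier allocations never move. Returns NULL when out of space. */
void *varena_alloc(VArena *a, size_t n) {
    if (n > a->reserved - a->used) return NULL;
    size_t end = a->used + n;
    if (end > a->committed) {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size_t target = round_up(end, page);
        if (mprotect(a->base + a->committed, target - a->committed,
                     PROT_READ | PROT_WRITE) != 0) {
            return NULL;
        }
        a->committed = target;
    }
    void *p = a->base + a->used;
    a->used = end;
    return p;
}

void varena_release(VArena *a) {
    munmap(a->base, a->reserved);
    a->base = NULL;
    a->reserved = a->committed = a->used = 0;
}
```

The reserved size is the hard upper limit here: once it is used up, varena_alloc fails instead of moving anything, which is the price paid for pointer stability.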

This is a good video about some of the fun things that can be done with page mapping:

The video also mentions GetWriteWatch (Windows). It seems that on Linux there isn’t an equivalent user space API; the closest thing I found is this, but I don’t know if it is fully userspace: c++ - Can the dirtiness of pages of a mmap be found from userspace? - Stack Overflow
But the second answer seems like a useful compromise: set the page to read-only, catch the signal on the first write, and then make the page writable. I'm not digging deeper into it at the moment, but it could be a fun thing to explore in the future.
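A rough C sketch of that read-only-then-catch-the-signal idea, assuming Linux/POSIX. All names here are made up for the example, it tracks a single region through global state, and it calls mprotect inside a signal handler, which works on Linux in practice but is not on the POSIX list of async-signal-safe functions. It is meant as a starting point for experiments only.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

enum { MAX_TRACKED_PAGES = 64 };

static unsigned char *g_base;   /* start of the watched region */
static size_t g_page, g_pages;  /* page size and number of pages */
static volatile sig_atomic_t g_dirty[MAX_TRACKED_PAGES];

/* Runs on the first write to a read-only page: mark the page dirty and
 * make it writable, so the faulting write succeeds when it is retried. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t lo = (uintptr_t)g_base;
    if (addr < lo || addr >= lo + g_pages * g_page) {
        signal(sig, SIG_DFL); /* not our region: let the real crash happen */
        return;
    }
    size_t idx = (addr - lo) / g_page;
    g_dirty[idx] = 1;
    mprotect(g_base + idx * g_page, g_page, PROT_READ | PROT_WRITE);
}

/* Map `pages` read-only pages and start watching them for writes. */
volatile unsigned char *watch_begin(size_t pages) {
    if (pages == 0 || pages > MAX_TRACKED_PAGES) return NULL;
    g_page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, pages * g_page, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;
    g_base = p;
    g_pages = pages;
    for (size_t i = 0; i < pages; i++) g_dirty[i] = 0;

    struct sigaction sa = {0};
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* Some systems report protection faults as SIGBUS, so handle both. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
        sigaction(SIGBUS, &sa, NULL) != 0) {
        munmap(p, pages * g_page);
        return NULL;
    }
    return g_base;
}

int page_is_dirty(size_t idx) {
    return idx < g_pages && g_dirty[idx] != 0;
}
```

Reads never fault, so only pages that were actually written end up marked. To start a new tracking round, the pages would have to be set back to read-only and the flags cleared.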

dude_the_builder · April 3, 2024, 11:57pm

Awesome video! Thanks for sharing this. And yes, it's definitely a tempting area to explore and have some fun playing with virtual memory mapping and page tables.