I think sometimes remove just isn’t needed; in those cases this organization makes things very easy.
If you do need remove, there are a bunch of different options that aren’t necessarily much harder.
One is to rewrite all the data, without the holes, to a new instance: basically a copy that rebuilds the data structure without the removed entries, after which you free the old one. (This is essentially data-structure-specific garbage collection, without having to track things in a complicated, general way.)
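A minimal sketch of that rebuild, assuming a hypothetical `Disease` struct of `off`+`len` and `u32` location values (all names here are made up):

```zig
const std = @import("std");

const Index = u32;
const Disease = struct { off: Index, len: Index };

/// Copy only the live slices into a fresh buffer and patch each disease's
/// offset; the caller frees the old locations buffer afterwards.
fn rebuildWithoutHoles(
    gpa: std.mem.Allocator,
    diseases: []Disease,
    old_locations: []const u32,
) ![]u32 {
    // First pass: how much is still alive?
    var total: usize = 0;
    for (diseases) |d| total += d.len;

    const new_locations = try gpa.alloc(u32, total);
    var write: Index = 0;
    for (diseases) |*d| {
        @memcpy(new_locations[write .. write + d.len], old_locations[d.off .. d.off + d.len]);
        d.off = write; // patch the disease to point at its new position
        write += d.len;
    }
    return new_locations;
}
```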
For the diseases example you can also swap-remove a disease (if order / index stability isn’t needed), and when adding a new one you could use the ArrayList in a way where the unused capacity slice contains valid off+len ranges, so your free-list becomes the unused capacity slice. (Entries that don’t correspond to removed locations would have len 0.)
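One way this could look, as a rough sketch (the wrapper type and its names are made up, and appending a new disease would additionally have to move the free entry sitting at `items.len` out of the way first):

```zig
const std = @import("std");

const Index = u32;
const Disease = struct { off: Index, len: Index };

/// Live diseases sit in items[0..len]; the freed off+len ranges live in the
/// unused capacity right behind them, so the free-list needs no extra storage.
const Diseases = struct {
    list: std.ArrayListUnmanaged(Disease) = .{},
    free_count: usize = 0, // how many capacity slots hold valid free ranges

    /// Swap-remove: the last disease moves into slot `i`, so external
    /// indices are not stable across removals.
    fn remove(self: *Diseases, i: usize) void {
        const freed = self.list.swapRemove(i);
        // swapRemove shrank items.len by one, which opens up the slot right
        // at the start of the unused capacity; stash the freed range there.
        self.list.items.ptr[self.list.items.len] = freed;
        self.free_count += 1;
    }

    /// View of the free-list living in the unused capacity slice.
    fn freeRanges(self: *Diseases) []Disease {
        const start = self.list.items.len;
        return self.list.items.ptr[start .. start + self.free_count];
    }
};
```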
Because the diseases array indexes into the locations array, and the former is likely indexed externally while the latter probably is not, you have the full information available and the freedom to reorganize internally: you could write an in-place compaction method that moves locations forward to fill the gaps created by removed diseases and their locations.
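A sketch of such a compaction, keeping external disease indices stable by sorting a separate index array instead of the diseases themselves (again with made-up names):

```zig
const std = @import("std");

const Index = u32;
const Disease = struct { off: Index, len: Index };

/// Slide every live slice toward the front of `locations`, closing the gaps.
/// Visiting the slices in ascending `off` order guarantees a forward copy
/// never overwrites a slice that hasn't been moved yet.
fn compactInPlace(gpa: std.mem.Allocator, diseases: []Disease, locations: []u32) !Index {
    const order = try gpa.alloc(usize, diseases.len);
    defer gpa.free(order);
    for (order, 0..) |*o, i| o.* = i;
    std.mem.sort(usize, order, diseases, struct {
        fn lessThan(ds: []Disease, a: usize, b: usize) bool {
            return ds[a].off < ds[b].off;
        }
    }.lessThan);

    var write: Index = 0;
    for (order) |i| {
        const d = &diseases[i];
        std.mem.copyForwards(u32, locations[write .. write + d.len], locations[d.off .. d.off + d.len]);
        d.off = write;
        write += d.len;
    }
    return write; // new length of the live prefix of `locations`
}
```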
If you need index stability for the diseases, you could mark removed diseases with a zero length and add a new free-list entry, basically leaving the disease value in place as a tombstone. You can then reuse the offset field to index the next tombstone. That way you just need a sentinel value that means end-of-list (for example maxInt(Index)) and a first_tombstone index, and you have a list of slots that can be reused for newly added diseases.
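As a sketch (the store type is made up, the scheme is as described above):

```zig
const std = @import("std");

const Index = u32;
const no_tombstone: Index = std.math.maxInt(Index); // sentinel: end of list

const Disease = struct { off: Index, len: Index };

/// Index-stable store: removed diseases stay in place as tombstones
/// (len == 0) whose `off` field links to the next tombstone.
const DiseaseStore = struct {
    diseases: []Disease,
    first_tombstone: Index = no_tombstone,

    fn remove(self: *DiseaseStore, i: Index) void {
        self.diseases[i] = .{ .off = self.first_tombstone, .len = 0 };
        self.first_tombstone = i;
    }

    /// Reuse a tombstone slot for a new disease, if one is available.
    fn reuseSlot(self: *DiseaseStore, d: Disease) ?Index {
        if (self.first_tombstone == no_tombstone) return null;
        const i = self.first_tombstone;
        self.first_tombstone = self.diseases[i].off; // pop the list head
        self.diseases[i] = d;
        return i;
    }
};
```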
For removing locations you can compact the disease’s slice in place, shrink its len, and create a new disease entry that describes the now-unused memory, adding it to the unused capacity slice that serves as the free-list.
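For example, removing one location from a disease’s slice could look like this (for brevity the free blocks go into a plain list here instead of the unused capacity slice):

```zig
const std = @import("std");

const Index = u32;
const Disease = struct { off: Index, len: Index };

/// Remove the location at `loc_index` (an index into `locations`, inside
/// d's slice): slide the tail forward, shrink len, and describe the freed
/// slot as a new free block.
fn removeLocation(
    gpa: std.mem.Allocator,
    d: *Disease,
    locations: []u32,
    loc_index: Index,
    free: *std.ArrayListUnmanaged(Disease),
) !void {
    const end = d.off + d.len;
    std.debug.assert(loc_index >= d.off and loc_index < end);
    std.mem.copyForwards(u32, locations[loc_index .. end - 1], locations[loc_index + 1 .. end]);
    d.len -= 1;
    // The last slot of the old slice is unused now; hand it to the free-list.
    try free.append(gpa, .{ .off = d.off + d.len, .len = 1 });
}
```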
For growing the locations of a disease you can go through the free-list (which you could sort periodically), find a block that is big enough, copy the data over, add the new data, and swap the off+len ranges of the disease and the free-list entry, potentially applying the removal step to the unused remainder (creating a free-list entry for it).
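Growing by one location might then look roughly like this; the off+len swap at the end is the part that keeps every range accounted for:

```zig
const std = @import("std");

const Index = u32;
const Disease = struct { off: Index, len: Index };

/// Grow a disease by one location: find a free block that is big enough,
/// copy the existing slice over, append the new element, then swap the
/// off+len ranges of the disease and the free entry.
fn growLocations(
    gpa: std.mem.Allocator,
    d: *Disease,
    locations: []u32,
    new_location: u32,
    free: *std.ArrayListUnmanaged(Disease),
) !bool {
    const needed = d.len + 1;
    for (free.items) |*block| {
        if (block.len < needed) continue;
        // Free blocks never overlap live slices, so a plain copy is fine.
        @memcpy(locations[block.off .. block.off + d.len], locations[d.off .. d.off + d.len]);
        locations[block.off + d.len] = new_location;

        const old = d.*;
        const remainder = Disease{ .off = block.off + needed, .len = block.len - needed };
        d.* = .{ .off = block.off, .len = needed };
        block.* = old; // the free entry now describes the disease's old range
        // Whatever is left of the block becomes its own free entry.
        if (remainder.len > 0) try free.append(gpa, remainder);
        return true;
    }
    return false; // nothing big enough; the caller has to grow `locations`
}
```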
I agree that it has some similarity to allocators; more specifically, it is like a memory pool for variable-sized slices that also lets you relocate the slices, because you know where the metadata is stored and you can distinguish between data that is identified externally by an id and data that is managed internally (data that can be moved around, because the data structure holds all the valid indices into the locations array).
I think another approach that can be useful (especially if only a limited number of slice sizes is expected) is to instead store every size of slice as its own group, which turns the index into a handle: one part selects the right size group and the other is the index within that group.
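A rough sketch of the handle idea (the field widths, size bound, and names are all assumptions):

```zig
const std = @import("std");

/// Which size group a slice lives in, plus its slot within that group.
const Handle = packed struct {
    group: u8, // the slice length (or a size class)
    index: u24, // slot within that group's block
};

const max_len = 8; // assumed upper bound on slice length

/// One contiguous block per slice length; slice `h` occupies the elements
/// [h.index * h.group ..][0..h.group] of its group's block.
const SizeGroups = struct {
    groups: [max_len + 1]std.ArrayListUnmanaged(u32) =
        [_]std.ArrayListUnmanaged(u32){.{}} ** (max_len + 1),

    fn get(self: *SizeGroups, h: Handle) []u32 {
        const n: usize = h.group;
        return self.groups[n].items[h.index * n ..][0..n];
    }

    fn add(self: *SizeGroups, gpa: std.mem.Allocator, slice: []const u32) !Handle {
        const n = slice.len;
        std.debug.assert(n > 0 and n <= max_len);
        const index: u24 = @intCast(self.groups[n].items.len / n);
        try self.groups[n].appendSlice(gpa, slice);
        return .{ .group = @intCast(n), .index = index };
    }
};
```

Removal within a group then becomes a fixed-size swap-remove: copy the group’s last slice into the hole and shrink the block by one slice, with no variable-sized compaction needed.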
I used something similar in my HAMT and StencilArrays implementations. For data structures with a nested, tree-like structure, it can make it easier to replace and reuse nodes, because all nodes that share a size are part of the same memory block, which makes reusing memory easier (without copying other memory around for compaction purposes).
I think it could cut down on the amount of wasted memory and needed reorganization if you do a lot of adding and removing.
But I haven’t investigated that in detail, and it would vary based on how the data structure is used.