The topic of the post morphed into the use of multiple allocators with the various methods of ArrayListUnmanaged, and I wanted to follow up on this separately without contributing additional noise to the original topic. Specifically, I wanted to use something @ericlang said as a jumping-off point:
Theoretically, appending to the list isn’t a problem in and of itself. Since each allocator instance maintains a list of its own allocations (I think), allocating is not an issue.
However, potential issues arise when memory is freed. Depending on which combination of allocators is used, the behavior could differ. Maybe some combinations will cause segfaults while others cause memory leaks. Or maybe everything works just fine for now™.
I can’t help but wonder whether the “unmanaged” flavor of containers will cause a lot of confusion and subtle, hard-to-track-down bugs, especially since an Allocator interface is being passed around, which makes it almost impossible to know what the concrete implementation is. Given these issues, it seems a bit unfortunate that the “managed” containers appear to be on the path to deprecation.
One counter-argument for this would be to just use a single allocator, but then that kind of defeats the purpose of the Allocator interface (and allocators in Zig as a whole).
I’m sure there’s an aspect of the “unmanaged” containers that I’m not considering that y’all will let me know about. And to be clear, I’m not saying that the “unmanaged” containers are objectively bad - there are certainly cases for them. I think I’m just struggling to understand why the “average” person should use them and be responsible for ensuring they’re using the right allocators over using a “managed” container, which pretty much makes this a non-issue.
You can’t expect an allocator to keep an explicit list of allocations. Basically the only requirement is that if an alloc succeeded, the corresponding free will work too (as long as it goes through the allocator that was used to allocate).
This makes it possible to use arena/bump allocators, which mostly just allocate big chunks of memory and then hand out smaller pieces of those chunks to callers of alloc, while either ignoring free calls completely or honoring only the most recent allocation (because that one can be freed without introducing fragmentation or holes into the bigger chunks). Because these allocators have no way to track individual pieces of memory, they can’t free individual blocks; instead they retain the memory until someone calls .reset or .deinit on the allocator.
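To make the arena behavior concrete, here is a minimal sketch, assuming the Zig 0.14-era standard library (`std.ArrayListUnmanaged` with an `.empty` declaration and allocator-taking methods); names may differ in other Zig versions, and `squares` is a hypothetical helper. The list is grown through an arena and never freed on its own; the arena's `deinit` returns everything to the child allocator at once. Because the child here is `std.testing.allocator`, the test would report a leak if that did not happen.

```zig
const std = @import("std");

/// Hypothetical helper: collects i*i for i in 0..n into a list grown with
/// the given arena. The list is deliberately never deinit-ed on its own.
fn squares(arena: std.mem.Allocator, n: u32) ![]u32 {
    var list: std.ArrayListUnmanaged(u32) = .empty;
    var i: u32 = 0;
    while (i < n) : (i += 1) {
        try list.append(arena, i * i);
    }
    return list.items;
}

test "arena releases everything at once" {
    // testing.allocator reports leaks, so this test also checks that
    // arena_state.deinit() hands every chunk back to the child allocator.
    var arena_state = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer arena_state.deinit();
    const arena = arena_state.allocator();

    const result = try squares(arena, 5);
    try std.testing.expectEqualSlices(u32, &.{ 0, 1, 4, 9, 16 }, result);

    // free() through an arena is legal, but the memory is not handed back
    // to the child allocator until reset/deinit.
    const buf = try arena.alloc(u8, 64);
    arena.free(buf);
}
```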
Basically, every allocator can choose its own strategy for handling allocation and deallocation.
The benefit of the unmanaged variants is that when you create something that uses multiple data structures, for example several ArrayLists and HashMaps, it becomes possible to share the allocator between all of them in a single field. That cuts down on unnecessary redundancy and basically restores the managed style, except that it is now managed across a group of data structures, in a way that makes sense for your application code. Things that are managed via specific structs can also share one or more allocators (either via shared fields or function parameters).
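A sketch of that "managed across a group" style, under the same Zig 0.14-era assumptions (unmanaged containers with `.empty`); `Inventory` and its methods are hypothetical names. One `gpa` field serves two unmanaged containers, and `deinit` frees both with it.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

/// Hypothetical type: two unmanaged containers share the single
/// allocator stored in `gpa`.
const Inventory = struct {
    gpa: Allocator,
    names: std.ArrayListUnmanaged([]const u8) = .empty,
    counts: std.StringHashMapUnmanaged(u32) = .empty,

    fn init(gpa: Allocator) Inventory {
        return .{ .gpa = gpa };
    }

    fn deinit(self: *Inventory) void {
        // Everything was allocated with self.gpa, so everything is freed with it.
        self.names.deinit(self.gpa);
        self.counts.deinit(self.gpa);
    }

    /// Records one more occurrence of `name` (the slice is borrowed, not copied).
    fn add(self: *Inventory, name: []const u8) !void {
        // Reserve list space first so a failed allocation cannot leave the
        // map and the list out of sync.
        try self.names.ensureUnusedCapacity(self.gpa, 1);
        const entry = try self.counts.getOrPut(self.gpa, name);
        if (entry.found_existing) {
            entry.value_ptr.* += 1;
        } else {
            entry.value_ptr.* = 1;
            self.names.appendAssumeCapacity(name);
        }
    }
};

test "Inventory shares one allocator" {
    var inv = Inventory.init(std.testing.allocator);
    defer inv.deinit();
    try inv.add("apple");
    try inv.add("pear");
    try inv.add("apple");
    try std.testing.expectEqual(@as(usize, 2), inv.names.items.len);
    try std.testing.expectEqual(@as(?u32, 2), inv.counts.get("apple"));
}
```

Since keys are borrowed rather than duplicated, callers must keep the name strings alive for as long as the `Inventory` is in use.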
You also gain the option of not storing the allocator in a field at all, and writing the code so that you only require the allocator once you actually allocate something. This means that initializing your data structure can become simpler: a complex data structure composed from simpler ones can be initialized via an .empty decl literal.
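A sketch of the allocator-free variant, again assuming a Zig 0.14-era std; `Graph` and its methods are hypothetical. The type stores no allocator, exposes its own `empty` declaration built from its fields' `.empty`, and only the methods that may allocate take an allocator parameter.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

/// Hypothetical composite type with no allocator field.
const Graph = struct {
    nodes: std.ArrayListUnmanaged([]const u8),
    edges: std.ArrayListUnmanaged([2]usize),

    /// Composite decl literal built from the fields' own `.empty`.
    pub const empty: Graph = .{ .nodes = .empty, .edges = .empty };

    /// May allocate, so it takes an allocator.
    pub fn addNode(self: *Graph, gpa: Allocator, name: []const u8) !usize {
        try self.nodes.append(gpa, name);
        return self.nodes.items.len - 1;
    }

    /// May allocate, so it takes an allocator.
    pub fn addEdge(self: *Graph, gpa: Allocator, a: usize, b: usize) !void {
        try self.edges.append(gpa, .{ a, b });
    }

    /// Read-only query: no allocator needed.
    pub fn edgeCount(self: Graph) usize {
        return self.edges.items.len;
    }

    /// Must receive the same allocator that was passed to addNode/addEdge.
    pub fn deinit(self: *Graph, gpa: Allocator) void {
        self.nodes.deinit(gpa);
        self.edges.deinit(gpa);
    }
};

test "Graph starts from .empty" {
    const gpa = std.testing.allocator;
    var g: Graph = .empty;
    defer g.deinit(gpa);
    const a = try g.addNode(gpa, "a");
    const b = try g.addNode(gpa, "b");
    try g.addEdge(gpa, a, b);
    try std.testing.expectEqual(@as(usize, 1), g.edgeCount());
}
```

Declaring `var g: Graph = .empty;` needs no allocator at all until the first call that actually grows something.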
That was exactly what I was thinking. There are hardly any conceivable cases in which you would want multiple allocators for one list. That’s why I was kind of flabbergasted when reading about the deprecation of ArrayList.
But the people who decided that are probably much smarter than me. I should also take a closer look at the unmanaged list code.
I hope we do not have to have multiple allocators for one byte in the future.
Just my two cents, but I often find that the argument “against” generally boils down to some theoretical confusion of which allocator was used to allocate a list, and the programmer being confused as to which allocator to use when freeing it, and/or this somehow being antithetical to using multiple allocators.
My issue with these arguments is that they always seem entirely theoretical; I can never envision, nor have I ever seen, a real-world example where such a scenario would manifest. The ability to pass in the incorrect allocator is not somehow unique to the containers defined in the standard library. Literally any struct that doesn’t carry an allocator around with it is subject to the same “confusion”, but we never talk about those situations because in the real world this is not an actual issue, and multiple allocators are rarely used at a level of granularity where it could even become one.
I am not against the use of “managed” lists, but I do think that “unmanaged” should be the default (i.e. ArrayList and ArrayListManaged), as it adheres much more closely to the Zig practice of being explicit. If I am passing in an allocator, I should expect that this function can/will allocate, otherwise not. Very simple, no surprises, and no need to surmise as much from the error union, if the function even chooses to return the OutOfMemory error.
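The convention can be illustrated with two hypothetical functions, assuming a Zig 0.14-era std: `countWords` takes no allocator and by convention does not allocate, while `splitWords` announces through its allocator parameter and `Allocator.Error` return type that it may allocate, and hands ownership of the result to the caller.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

/// No allocator parameter: by convention this function does not allocate.
fn countWords(text: []const u8) usize {
    var it = std.mem.tokenizeScalar(u8, text, ' ');
    var n: usize = 0;
    while (it.next()) |_| n += 1;
    return n;
}

/// Allocator parameter plus Allocator.Error: the signature says "may allocate".
/// The caller owns the returned slice and frees it with the same allocator.
fn splitWords(gpa: Allocator, text: []const u8) Allocator.Error![]const []const u8 {
    var words: std.ArrayListUnmanaged([]const u8) = .empty;
    errdefer words.deinit(gpa);
    var it = std.mem.tokenizeScalar(u8, text, ' ');
    while (it.next()) |word| try words.append(gpa, word);
    return try words.toOwnedSlice(gpa);
}

test "signatures document allocation" {
    try std.testing.expectEqual(@as(usize, 3), countWords("a bb  ccc"));
    const words = try splitWords(std.testing.allocator, "a bb  ccc");
    defer std.testing.allocator.free(words);
    try std.testing.expectEqualStrings("ccc", words[2]);
}
```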
To be clear, I wasn’t necessarily arguing “against” any particular thing here. I really just wanted to get some insight into the benefits of using unmanaged containers from folks who are better at writing Zig and understanding the ecosystem than I am, but I can see how my ignorance came across that way.
Admittedly, this post was theoretical, but the theory is far from impossible. You can absolutely have multiple allocators and free memory with the wrong one, even if it’s bad code.
I do appreciate the semantics of explicitly passing an allocator to a function that allocates memory. In that regard, the change seems better suited for Zig’s idea of avoiding hidden allocations.
I am not against either; I fully support the inclusion of both in the standard library, and I don’t really feel strongly either way. I do think that explicitly requiring an allocator for functions that can allocate is more in harmony with the rest of the standard library, and with how people actually write Zig in the wild.
In my opinion, a struct carrying around an allocator is perfectly fine, but typically I would confine this to more complicated types, whereas lists and whatnot feel like one tier above language primitives (and often are in other languages). More often than not, a container is being used within the scope of a more advanced type, which may have multiple fields requiring an allocator, so it makes more sense to have this parent type carry the allocator, and pass it down to its children where needed. This is why I mentioned that in my opinion, “unmanaged” should be the default, and “managed” be the type with the qualified name.
All of this is obvious bike-shedding on my part; I won’t be upset if they choose to keep the status quo, but I do support the change, as it personally makes sense to me. I found myself exclusively using the “unmanaged” variants after a couple of months of using Zig, with the exception of a few one-offs and short-lived locals.
std.ArrayHashMap is now deprecated and aliased to std.ArrayHashMapWithAllocator.
To upgrade, switch to ArrayHashMapUnmanaged which will entail updating callsites to pass an allocator to methods that need one. After Zig 0.14.0 is released, std.ArrayHashMapWithAllocator will be removed and std.ArrayHashMapUnmanaged will be a deprecated alias of ArrayHashMap. After Zig 0.15.0 is released, the deprecated alias ArrayHashMapUnmanaged will be removed.
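A sketch of what that upgrade looks like at a callsite, assuming a Zig 0.14-era std: instead of `std.AutoArrayHashMap(u32, u32).init(gpa)` followed by `put(key, value)`, the unmanaged map starts as `.empty` and every call that may allocate, including `deinit`, receives the allocator. `countInOrder` is a hypothetical helper; array hash maps preserve insertion order, which `keys()` exposes.

```zig
const std = @import("std");

/// Hypothetical helper: counts occurrences while keeping first-seen order.
/// The caller owns the returned map and must deinit it with the same allocator.
fn countInOrder(gpa: std.mem.Allocator, items: []const u32) !std.AutoArrayHashMapUnmanaged(u32, u32) {
    var map: std.AutoArrayHashMapUnmanaged(u32, u32) = .empty;
    errdefer map.deinit(gpa);
    for (items) |item| {
        // The allocator is passed explicitly because getOrPut may grow the map.
        const gop = try map.getOrPut(gpa, item);
        if (gop.found_existing) {
            gop.value_ptr.* += 1;
        } else {
            gop.value_ptr.* = 1;
        }
    }
    return map;
}

test "unmanaged array hash map" {
    const gpa = std.testing.allocator;
    var map = try countInOrder(gpa, &.{ 3, 1, 3, 2 });
    defer map.deinit(gpa);
    try std.testing.expectEqualSlices(u32, &.{ 3, 1, 2 }, map.keys());
    try std.testing.expectEqual(@as(?u32, 2), map.get(3));
}
```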
So the unmanaged variants will end up with the simple name and the managed ones will be removed.
I think this is what I needed to hear. It’s essentially what @Sze said…
…but I feel like your explanation resonated more with me - at least, made me better understand what they were saying, and why the unmanaged containers should be used (not that their reply was poor or anything - quite the opposite).
I played around with the unmanaged ArrayLists during the current Advent of Code (AoC), and I find them quite annoying. Yes, I can wrap them and share an allocator, but I think this is a lot of boilerplate and complexity loaded onto the programmer that could be handled by the stdlib. I am fine with unmanaged being the default, but I would highly appreciate it if the stdlib provided a managed variant. I don’t know why we can’t have both.
Thanks for your answer.
I know that the managed variants still exist, but I would like to keep using them in the future. I think the additional maintenance burden is still worth it. Otherwise there is a much higher maintenance burden for the plethora of programmers who choose to wrap the unmanaged variants for easier use.
If you want to keep it, you can literally copy it from std. Possibly in the future, when Zig and its std lib are nearing 1.0, it might be brought back, as they’d likely have more resources for maintenance.
The managed variants allow you to be more ignorant of allocation; if you don’t want to think about allocation, then you will find managed containers more convenient.
Zig sees that as a problem: it is objectively redundant and less efficient, and it can hide code smells such as allocating in a loop.
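With an unmanaged list, allocation points stay visible at the call site. A minimal sketch, assuming a Zig 0.14-era std (`evens` is a hypothetical function): capacity is reserved once before the loop, and `appendAssumeCapacity`, which takes no allocator, is used inside it, so the loop body cannot allocate.

```zig
const std = @import("std");

/// Hypothetical function: returns 0, 2, 4, ... (n values).
/// The caller owns the returned slice.
fn evens(gpa: std.mem.Allocator, n: usize) ![]usize {
    var list: std.ArrayListUnmanaged(usize) = .empty;
    errdefer list.deinit(gpa);
    // The only allocation happens here, outside the loop.
    try list.ensureTotalCapacityPrecise(gpa, n);
    // No allocator in sight: this loop cannot allocate.
    for (0..n) |i| list.appendAssumeCapacity(i * 2);
    return list.toOwnedSlice(gpa);
}

test "allocate once, fill in a loop" {
    const out = try evens(std.testing.allocator, 4);
    defer std.testing.allocator.free(out);
    try std.testing.expectEqualSlices(usize, &.{ 0, 2, 4, 6 }, out);
}
```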
Zig wants these things to be clear, even if you choose to do them anyway, so that if you decide you do care about efficiency in the future, you can much more easily see where your issues are.
Zig is a language and std lib that caters to efficient and robust software. AoC is a fun activity for the holidays, and while it is a decent way to get acquainted with a language, it is not the target application for Zig, so you shouldn’t judge Zig based solely on your experience with AoC.
After thinking about it, I think if std were to have managed containers I would prefer it to be just this:
The benefit of managed is the association with the allocator, and the benefit of unmanaged is the explicitness of operations that need allocation.
This has both of those, without the undesirable part of managed (see above).
Based on the info in the release notes, I think a good decision was made to include only unmanaged collections in the std library. But I’d like to suggest that the reasons above are not necessarily good rules for creating libraries in general. If someone else published a library of collections that took the managed approach, would this be considered un-ziggy? I think there is nothing wrong with making a decision such as: this is so commonly needed that we’re going to provide it in a library. The source code is still there, and it is possible to name types and functions so that it is clear what they do.
I definitely think there is “information leakage” when you have to pass in the allocator to use, since you can crash if you pass the wrong one and the container does a realloc. So the owner needs to know which allocator is implicitly associated with a specific container.
Which reminds us of something alluded to, but not given a lot of visibility: it’s valuable to be able to use one allocator for, e.g., growing a list (so you’d pass that allocator to append()), but another for allocating data structures that may live within those nodes. A “second-tier” container might actually hold on to two (or more) allocators: one for appending (and removing), and one for building some content out of args provided by a user of that (second-tier) container.
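A sketch of such a second-tier container, assuming a Zig 0.14-era std; `Log` and its methods are hypothetical. `list_gpa` is used only to grow the list, while an arena owns the strings formatted from caller arguments, so `deinit` releases the two kinds of memory separately.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

/// Hypothetical second-tier container holding two allocators.
const Log = struct {
    /// Grows (and frees) the backing list only.
    list_gpa: Allocator,
    /// Owns the formatted line contents.
    content_arena: std.heap.ArenaAllocator,
    lines: std.ArrayListUnmanaged([]const u8) = .empty,

    fn init(list_gpa: Allocator, content_child: Allocator) Log {
        return .{
            .list_gpa = list_gpa,
            .content_arena = std.heap.ArenaAllocator.init(content_child),
        };
    }

    fn deinit(self: *Log) void {
        self.lines.deinit(self.list_gpa); // list storage: list allocator
        self.content_arena.deinit(); // every line's text at once
    }

    /// Builds the line text with the content arena, then appends the
    /// resulting slice using the list allocator.
    fn add(self: *Log, comptime fmt: []const u8, args: anytype) !void {
        const text = try std.fmt.allocPrint(self.content_arena.allocator(), fmt, args);
        try self.lines.append(self.list_gpa, text);
    }
};

test "two allocators, two jobs" {
    var my_log = Log.init(std.testing.allocator, std.testing.allocator);
    defer my_log.deinit();
    try my_log.add("x = {d}", .{42});
    try my_log.add("{s}!", .{"done"});
    try std.testing.expectEqualStrings("x = 42", my_log.lines.items[0]);
}
```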
Perhaps a wide audience will want what zig has to offer, but will be “happy” with certain exceptions to the zig “norm” of explicitness; perhaps this (and some long-lived Io operations, with Io objects in the interface of all functions) will be such acceptable exceptions. If the pattern is to always name a thing X, and make an XManaged version, and set up the Managed like @vulpesx suggests:
…in order to keep maintenance low, I would think that could satisfy everybody.
I can imagine many starting a project with an XManaged… then, later, deciding they don’t want that, after all, and revert to X! (But I suppose the opposite could be true, too.)
EDIT: I realize, on closer inspection, that @vulpesx’s suggestion is even simpler: rather than an XManaged variant for every X, the proposal seems to be a single Managed container type that does nothing but hold the unmanaged container together with an allocator; you’d then use it for any array, list, etc. that you want. There are patterns within Zig already (like Io.File.Reader’s interface to Io.Reader) that this could adhere to.
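One way to read that suggestion, sketched under the same Zig 0.14-era assumptions; this is a hypothetical illustration, not the code @vulpesx posted. A generic `Managed` type only pairs an unmanaged container with its allocator. Calls still pass the allocator explicitly (`m.unmanaged.append(m.allocator, x)`), so allocation stays visible, while the association between container and allocator lives in one place.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

/// Hypothetical wrapper: works for any unmanaged container that has an
/// `.empty` declaration and a `deinit(allocator)` method.
fn Managed(comptime Unmanaged: type) type {
    return struct {
        unmanaged: Unmanaged,
        allocator: Allocator,

        const Self = @This();

        pub fn init(allocator: Allocator) Self {
            return .{ .unmanaged = .empty, .allocator = allocator };
        }

        /// The remembered allocator guarantees the matching free.
        pub fn deinit(self: *Self) void {
            self.unmanaged.deinit(self.allocator);
        }
    };
}

test "one wrapper for any unmanaged container" {
    var list = Managed(std.ArrayListUnmanaged(u8)).init(std.testing.allocator);
    defer list.deinit();
    try list.unmanaged.appendSlice(list.allocator, "hi");
    try std.testing.expectEqualStrings("hi", list.unmanaged.items);

    var map = Managed(std.AutoHashMapUnmanaged(u32, u32)).init(std.testing.allocator);
    defer map.deinit();
    try map.unmanaged.put(map.allocator, 1, 10);
    try std.testing.expectEqual(@as(?u32, 10), map.unmanaged.get(1));
}
```

The design keeps a single wrapper to maintain instead of one managed twin per container.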
I think all such problems could be caught by a check that only exists when ‘runtime_safety’ is enabled, stored in a ‘safety’ field like the one below (although I don’t think the Zig standard library has any intention of including this implementation).
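```zig
// Excerpt from inside the ArrayListUnmanaged type function. `T`, `Slice`,
// `Allocator`, `builtin` and `addOne` come from the surrounding std code.
return struct {
    const Self = @This();
    /// Contents of the list. This field is intended to be accessed
    /// directly.
    ///
    /// Pointers to elements in this slice are invalidated by various
    /// functions of this ArrayList in accordance with the respective
    /// documentation. In all cases, "invalidated" means that the memory
    /// has been passed to an allocator's resize or free function.
    items: Slice = &[_]T{},
    /// How many T values this list can hold without allocating
    /// additional memory.
    capacity: usize = 0,
    /// Debug-only record of the allocator this list belongs to. The default
    /// keeps existing initialization working; in release builds the field
    /// is `void` and costs nothing.
    safety: Safety = if (runtime_safety) .{} else {},

    const Safety = if (runtime_safety) struct {
        expected_allocator: ?std.mem.Allocator = null,

        /// Remembers the first allocator seen, then asserts that every later
        /// call passes the same one.
        fn validateAllocator(self: *@This(), allocator: std.mem.Allocator) void {
            if (self.expected_allocator) |expected| {
                // std.mem.Allocator is a struct (ptr + vtable) and cannot be
                // compared with `==`, so compare its fields instead.
                std.debug.assert(allocator.ptr == expected.ptr and
                    allocator.vtable == expected.vtable);
            } else {
                self.expected_allocator = allocator;
            }
        }
    } else void;

    const runtime_safety = switch (builtin.mode) {
        .Debug, .ReleaseSafe => true,
        .ReleaseFast, .ReleaseSmall => false,
    };

    // ... (remaining declarations elided)

    /// Extend the list by 1 element. Allocates more memory as necessary.
    /// Invalidates element pointers if additional memory is needed.
    pub fn append(self: *Self, gpa: Allocator, item: T) Allocator.Error!void {
        if (runtime_safety) self.safety.validateAllocator(gpa);
        const new_item_ptr = try self.addOne(gpa);
        new_item_ptr.* = item;
    }
};
```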
Yes, that’s true and is one way to look at it. To me it doesn’t seem worthwhile, since if all it does is put the allocator in the same struct as the list, it only helps in the specific situation where you need nothing but the allocator and the list in a struct. The more general case is where you need the allocator, the list, and other related things, or perhaps two allocators, as someone else said: one for the list and one for its allocated items.