One-shot programs
I definitely see the value of using `catch @panic("Out of Memory")` in short-running programs that have a single well-defined task: do that task and then exit. For example build scripts, code/asset generation, and utility tools.
I think one of the questions you need to ask yourself is whether your application truly is such a short-running “one-shot” program; if it is, crashing on OOM is probably a good strategy.
If it processes multiple data sets, allocating memory, freeing a lot of it, and then allocating other things, then I don’t think crashing is a good choice.
Basically, if your memory usage is monotonically increasing and then you finish, crashing (exiting) is the best choice, because it is faster than freeing everything in reverse allocation order piece by piece just so that you can exit. But technically, in that situation you don’t need to crash; you can just call `std.os.exit`.
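As a minimal sketch of that style (the allocator choice and the buffer are just for illustration), a one-shot tool might look like:

```zig
const std = @import("std");

pub fn main() void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    // In a short-lived tool, treating OOM as fatal keeps the code simple;
    // the OS reclaims all memory on exit anyway.
    const buffer = allocator.alloc(u8, 4096) catch @panic("Out of Memory");
    defer allocator.free(buffer);

    // ... do the single task, then return / exit ...
}
```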
Multi-phase programs / bundles of different tasks
I think if you have a bunch of different phases where your memory usage grows because you are working on something, and then shrinks because you are distilling the results, you have two options:
1. Use arenas and think in terms of batches: failing in batches and recovering in batches (related: Enter The Arena - Ryan Fleury).
This allows you to just say: “for these things, if anything fails we abort the whole section, reset the entire arena, and pretend we never started”.
However, it is important to make sure you don’t keep some database connection, or something like it, half open, where you lose the handle and thus don’t know how to close it. Basically: don’t lose/leak external resources in the swept-away pile of trash memory.
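A rough sketch of that arena pattern (the `runPhase` function and its work are made up for illustration):

```zig
const std = @import("std");

fn runPhase(allocator: std.mem.Allocator) !void {
    // All allocations for this phase come from the arena.
    const scratch = try allocator.alloc(u8, 1024 * 1024);
    _ = scratch;
    // ... work that may fail with error.OutOfMemory ...
}

pub fn main() void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    runPhase(arena.allocator()) catch {
        // Abort the whole section: reset the arena and pretend we never started.
        _ = arena.reset(.retain_capacity);
    };
}
```

Note that the arena only frees memory; any external resources (file handles, sockets, database connections) opened during the phase still have to be closed explicitly before the reset.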
2. Do everything “one piece at a time”:
- create one object
- errdefer destroy it
- add it to a list
- errdefer pop it from the list
- add its index in some other data structure
- errdefer remove the index from the other data structure
- do something else that can fail
- if everything worked, return the object
Using this you get methods that can roll back to the state before the function was called, because if an error happens the errdefers undo the function’s partial successes. These functions hide partial success by undoing it, giving you complete success or complete failure.
These functions are nice because they let your program back out of a failure, converting a fine-grained error into a coarser-grained one and leaving you with less complexity at the call site.
At the call site you have “success or failure”, not “success, or it failed and now I need to check whether the half-created object is still in some list”.
With strategy 1 I don’t have a lot of experience yet; I used it a tiny bit with some old GUI code, but not enough to really test it to its limits.
With strategy 2, here is an example:
```zig
pub fn createArchetype(self: *Self, atype: AType, new_id: u32) !*Archetype {
    const archetype = try self.allocator.create(Archetype);
    errdefer self.allocator.destroy(archetype);
    archetype.* = try Archetype.init(self.allocator, &self.component_infos, new_id, atype);
    errdefer archetype.deinit();
    try archetype.calculateSizes();
    return archetype;
}
```
This is from a work-in-progress ECS, which brings me to another point: by using such an ECS (which uses archetypes, basically batches of memory containing similar objects) I am kind of using both strategies:
- strategy 2 to manage the batches themselves
- strategy 1 by using these batches in a somewhat similar way, allowing me to manage memory as batches of higher granularity
Because I have the batches, there are code paths where it is already clear that the memory is allocated, so I don’t have to deal with allocation at the instance level anymore.
Instead I only get an error when I try to add a new instance and the batch has to grow but doesn’t have enough memory. (If I dealt with individual objects, I might have to handle the error for each one.)
Batching / assumeCapacity
But it is often possible to, for example, check whether the school bus is big enough for the whole class instead of requesting a seat for each pupil individually, which allows you to move (allocate) the whole bus instead of one pupil at a time.
Which is the `assumeCapacity` case @AndrewCodeDev mentioned.
The logging Andrew mentioned is an interesting case; it might be worth exploring the idea of a logging library that uses something like `assumeCapacity` to either log the entire message or none of it, by pre-computing and reserving a big enough chunk of memory. But I am not sure how that would turn out; I think it would have to be tried as an actual experiment, to see whether there is some benefit to it.
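The school-bus idea maps onto the `ensureTotalCapacity` / `appendAssumeCapacity` pair on `std.ArrayList`; a sketch (class size and element type are made up):

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    var pupils = std.ArrayList(u32).init(allocator);
    defer pupils.deinit();

    const class_size = 30;
    // One up-front check: is the bus big enough for the whole class?
    // This is the only call that can fail with error.OutOfMemory.
    try pupils.ensureTotalCapacity(class_size);

    // After that, seating each pupil can no longer fail.
    var i: u32 = 0;
    while (i < class_size) : (i += 1) {
        pupils.appendAssumeCapacity(i);
    }
}
```

The nice property is that the error handling collapses to a single point, instead of every `append` being a potential failure site.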
Asserts
I also agree that asserts are great, but they are more for cases where some invariant wasn’t upheld; I think they are for situations where a programmer tries to use something in an unintended way, to prevent that from being possible.
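For instance (the function and its invariant are made up), `std.debug.assert` can document and enforce a caller contract:

```zig
const std = @import("std");

// Caller contract: denominator must not be zero.
// Violating this is a programmer error, not a runtime condition
// to recover from, so we assert instead of returning an error.
fn ratio(numerator: f64, denominator: f64) f64 {
    std.debug.assert(denominator != 0);
    return numerator / denominator;
}
```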
Garbage collection
I also think there is a third strategy: you build something that in some way starts looking like (or is) a garbage collector.
You have some heuristic that triggers “memory is getting too full, we need to clean up”; then you walk a data structure, figure out what is garbage, and remove it, possibly reorganizing the remaining things, and then you continue running the program.
Typically garbage collection gets triggered before you hit out of memory, but you could also use an out-of-memory error as the trigger.
- It could back away from the error with strategy 1 or 2, garbage collect, and retry.
- If the error happens again, maybe even move work that could be done later out to disk and retry.
- If it happens yet again, finally crash.
It is just that most Zig programs can probably avoid the need to invent or use their own garbage collector.
I guess if the OS has swapping configured, that could delay the point where you hit out of memory; but because disk/storage is so much slower, it might be better for the program to hit out of memory than to be slowed down by OS-based swapping, since the program has more information and could decide to just kill some less important subtask.
You could also do some kind of distributed-programming swapping: move part of your working memory/problem to another machine. But that is even slower, and it just isn’t practical if you want to stay on one machine.