Interesting. I'll admit the Ghostty example made me question my understanding of defer and errdefer interaction.
I like the idea of removing the growing append
Zig applications should consider aborting on OOM.
Perhaps - but reusable packages should not. And isn't it nice to generally program with the same style, and be able to extract code from your application into a reusable package?
I like the idea of renaming appendAssumeCapacity to append and encouraging its use as the default, but instead of removing growing append, rename it to e.g. appendGrow. There are many cases where this distinction is not important, and expanding the amount of code needed to do basic operations on a data structure would only be encumbering.
I have a question. How would you handle a function that collects items in an ArrayList, but returns an owned slice? toOwnedSlice does return an error, but the only place for it that makes sense is the last statement in the function. Unless it is something like this, which feels like a bad idea:
return ret.toOwnedSlice(allocator) catch ret.items;
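For concreteness, a hedged sketch of the shape of such a function (hypothetical example, assuming the allocator-per-call "unmanaged-style" ArrayList API), where toOwnedSlice is the only reason the final return can fail:

const std = @import("std");

// Hypothetical example of the pattern: collect into an ArrayList, return an
// owned slice. All the interesting work is already done by the time
// toOwnedSlice runs, yet it is the statement that forces the error return.
fn collectEvens(allocator: std.mem.Allocator, input: []const u32) ![]u32 {
    var ret: std.ArrayList(u32) = .empty;
    errdefer ret.deinit(allocator);
    for (input) |x| {
        if (x % 2 == 0) try ret.append(allocator, x);
    }
    return ret.toOwnedSlice(allocator); // last statement, still fallible
}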
If you can calculate the exact capacity needed, then ret.items should be the whole allocation.
Otherwise, the best solution would be to return the array list and let the caller deal with it :3.
If you can't/don't want to do that, then you have to handle possible failure.
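A hedged sketch of the exact-capacity case (hypothetical function): count in a first pass, allocate exactly once, then fill without any further failure points, so the returned slice is the whole allocation and can be freed directly by the caller:

const std = @import("std");

fn collectEvensExact(allocator: std.mem.Allocator, input: []const u32) ![]u32 {
    // First pass: compute the exact length.
    var n: usize = 0;
    for (input) |x| {
        if (x % 2 == 0) n += 1;
    }
    // The only fallible step: one allocation of exactly the right size.
    const out = try allocator.alloc(u32, n);
    // Second pass: fill; nothing here can fail.
    var i: usize = 0;
    for (input) |x| {
        if (x % 2 == 0) {
            out[i] = x;
            i += 1;
        }
    }
    return out;
}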
I don't usually know the exact capacity, and I specifically don't want to return an ArrayList. I want to return a nicely packaged slice, which makes for a nicer API (and clearer data flow).
You could manually remap/resize the allocation, which should reduce the possibility of failure, and roughly halve the maximum needed/assumed memory if you are doing that.
Yeah, my secret plan is to make people angry at the suggestion, and incentivize finding better ways to avoid these kinds of problems. Notably, std.testing.checkAllAllocationFailures doesn't help here, as you need to continue using the data structure after allocation failure to hit the issues.
What would help is throwing in allocation errors to the mix in Swarm Testing Data Structures.
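For illustration, a hedged sketch of mixing allocation failures into such tests (assuming std.testing.FailingAllocator with its config-style init and the allocator-per-call ArrayList API): inject an OOM partway through and keep using the structure afterwards.

const std = @import("std");

test "data structure keeps working after an induced OOM (sketch)" {
    // Fail the second allocation the list makes, then keep using it anyway.
    var failing = std.testing.FailingAllocator.init(std.testing.allocator, .{ .fail_index = 1 });
    const gpa = failing.allocator();

    var list: std.ArrayList(u32) = .empty;
    defer list.deinit(gpa);

    var pushed: usize = 0;
    for (0..100) |i| {
        list.append(gpa, @intCast(i)) catch continue; // swallow OOM, keep going
        pushed += 1;
    }
    // The interesting property: whatever did get appended is still intact,
    // even though some appends failed along the way.
    try std.testing.expectEqual(pushed, list.items.len);
}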
I feel like the next step in this chain of thought is to not dynamically grow-allocate at all but use a fully pre-allocated array with a max capacity. It could be called a "BoundedArray"
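A minimal sketch of that concept (not the old std.BoundedArray API, just the shape): appending can only assert against the comptime capacity, never fail.

const std = @import("std");

fn BoundedArray(comptime T: type, comptime capacity: usize) type {
    return struct {
        buffer: [capacity]T = undefined,
        len: usize = 0,

        pub fn append(self: *@This(), item: T) void {
            std.debug.assert(self.len < capacity); // the bound is the caller's problem
            self.buffer[self.len] = item;
            self.len += 1;
        }

        pub fn constSlice(self: *const @This()) []const T {
            return self.buffer[0..self.len];
        }
    };
}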
I think the end goal should be to eventually remove all memory allocation from the language. Even the stack.
Eliminating recursion? Let's take it a step further and eliminate function calls altogether (all structured programming is now some abuse of a labelled switch). Returning a pointer to a stack variable? No longer a problem.
Good luck to Rust in competing with that level of memory safety.
I realize you're joking but tbh you can get surprisingly far without any dynamic memory allocation at all, e.g. in this Pacman clone all game state is in a single upfront defined global:
…the same in the C version:
…and my Zig emulator project is the same. The entire emulator state is in a single struct which doesn't have any references to data outside that single nested struct (doesn't look quite as impressive because the struct is declared elsewhere):
In the C version of those emulators I use this for snapshotting. I simply dump the entire emulator state struct via what's essentially a memcpy into a file or the web browser's IndexedDB. This works because there are no references to outside data in the struct (there's a handful of pointers inside the struct pointing to other parts of the struct, but those can be easily patched on save/load by replacing the pointers with offsets from the start of the struct before saving, and restoring the offsets to pointers after loading).
…the only dynamic allocations happen in the sokol headers (used for rendering, audio, input etc…), but only once at startup, and only a small number of allocations (you can configure a couple of pool sizes in the init calls which are then pre-allocated). And then of course there are more dynamic allocations happening down in the operating system which I unfortunately don't have any control over…
E.g. in "Zig terms" it might be a totally valid strategy to pass slices to pre-allocated memory into libraries instead of allocators, or even go a step further and make the "max capacities" build-time parameters which are baked into a custom-built executable on the user's machine (which is much more feasible with Zig's build.zig and build.zig.zon and easy-to-set-up Zig toolchain compared to the C/C++ world).
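A hedged sketch of the build-time-parameter idea with a hypothetical max_sprites option (the std.Build API details shift between Zig versions; this is roughly the 0.12-0.14 shape):

const std = @import("std");

pub fn build(b: *std.Build) void {
    // User-tunable upper bound, baked into the binary at build time.
    const max_sprites = b.option(usize, "max_sprites", "Upper bound baked into the binary") orelse 256;

    const exe = b.addExecutable(.{
        .name = "game",
        .root_source_file = b.path("src/main.zig"),
        .target = b.standardTargetOptions(.{}),
        .optimize = b.standardOptimizeOption(.{}),
    });

    const opts = b.addOptions();
    opts.addOption(usize, "max_sprites", max_sprites);
    exe.root_module.addOptions("build_options", opts);

    b.installArtifact(exe);
}

// In src/main.zig the bound is then a comptime constant, so storage can be a
// plain fixed-size global with no allocator involved:
//   const max_sprites = @import("build_options").max_sprites;
//   var sprites: [max_sprites]Sprite = undefined;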
On a similar note, I once read an article about a lesson: limit the output for human consumption (i.e., humans are bad at too much information), and limit the input for system resources. Use both of those to derive some reasonable upper bounds for buffers and allocations, so that your software doesn't use an unbounded amount of memory and runs predictably.
The author is, uhh… checks notes, matklad.
You said it much better than I did, will steal this phrase, thanks!
I've recently started doing something similar too (when I parse, I do two passes over the tokens, and during the first I perform some validations and compute how much memory I will need). It's probably less efficient that way but I never worry too much about that.
I've also made a small helper container for this purpose. It only uses debug.assert() and it never fails. I find it easier to use for such cases.
Also interesting: it can be used in comptime. It's not effortless, you still need to init differently based on @inComptime(), but the rest can be kept the same, including the .finish(), which will also copy the slice for you if you are in comptime.
BTW: I've just noticed the BoundedArray was removed… I initially wanted to use it but the main limitation was that it was not operating on an externally provided buffer. I could also have used ArrayListUnmanaged but I wanted to have .len as a top-level field.
BTW2: In the regex.zig I'm also using two Buf(Ops)s pointing into a single slice of memory, and the count is therefore a sum of both upper bounds.
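Roughly this shape, as a hypothetical sketch (not the actual Buf from regex.zig): assert-only append into caller-provided storage, with len as a top-level field, so two instances can carve up one backing slice:

const std = @import("std");

fn Buf(comptime T: type) type {
    return struct {
        items: []T, // externally provided storage
        len: usize = 0,

        pub fn init(storage: []T) @This() {
            return .{ .items = storage };
        }

        pub fn push(self: *@This(), item: T) void {
            std.debug.assert(self.len < self.items.len); // bound computed up front
            self.items[self.len] = item;
            self.len += 1;
        }

        pub fn finish(self: *@This()) []T {
            return self.items[0..self.len];
        }
    };
}

// Two bufs sharing one slice whose length is the sum of both upper bounds
// (hypothetical Op/max_a/max_b names):
//   var storage: [max_a + max_b]Op = undefined;
//   var a = Buf(Op).init(storage[0..max_a]);
//   var b = Buf(Op).init(storage[max_a..]);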
Isn't "validate, allocate, transform" a fairly standard pattern in C?
I've written a lot of code that looks something like this (error checking omitted for brevity):
int size = parse(NULL); // returns negative numbers for errors
char *data = malloc(size);
parse(data); // can't fail given a correctly sized buffer and the earlier call succeeded
I never did any C before, so it totally might be, and I think you are right. I think I saw it in some C codebase(s) before (pango? and I think llama.cpp does that too), but back then I was rather puzzled about why they were doing the work twice.
My background is Delphi → PHP → Java → JS → Rust → Zig, so Zig is really the first (serious) low-level encounter.
EDIT: I didn't mean that I use the exact same function twice, just that I do some of the work again, so that I can save some work and checking later.
This style of first querying some required size by calling a function with some special null-pointer arg and then calling that same function with a valid pointer again is used a lot in Win32, but as a C programmer I can't say that I'm a fan of such "double-use" functions…
It's better to have a separate `get_required_size()` function.
Not saying it's good, but especially for larger C APIs I think the double-call can be the least worst option in some scenarios.
Think of something like Vulkan. Your library is big and complicated, and your main consumer of the library isn't people coding in C, but people writing bindings for other higher-level languages.
Would you really gain anything by replacing
result = vkEnumerateInstanceExtensionProperties( "name", &count, NULL )
// check result, allocate some memory
result = vkEnumerateInstanceExtensionProperties( "name", &count, &ptr )
// check result
with
result = vkGetInstanceExtensionPropertyCount( "name", &count )
// check result, allocate some memory
result = vkEnumerateInstanceExtensionProperties( "name", count, &ptr )
// check result
across a dozen or so functions?