Reserve First

20 Likes

Interesting. I’ll admit the Ghostty example made me question my understanding of defer and errdefer interaction.

I like the idea of removing the growing append

Zig applications should consider aborting on OOM.

Perhaps - but reusable packages should not. And isn’t it nice to generally program with the same style, and be able to extract code from your application into a reusable package?

6 Likes

I like the idea of renaming appendAssumeCapacity to append and encouraging its use as default, but instead of removing growing append rename it to e.g. appendGrow. There are many cases where this distinction is not important, and expanding the amount of code needed to do basic operations on a data structure would only be encumbering.

5 Likes

I have a question. How would you handle a function that collects items in an ArrayList, but returns an owned slice? toOwnedSlice does return an error, but the only place for it that makes sense is the last statement in the function.

Unless it is something like this, which feels like a bad idea:

return ret.toOwnedSlice(allocator) catch ret.items;
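
For context, the shape I have in mind is roughly this sketch (assuming the unmanaged ArrayList API, where append takes the allocator): the whole thing is a plain collect-then-package, and that final toOwnedSlice is the one fallible step left at the very end.

const std = @import("std");

fn collectWords(allocator: std.mem.Allocator, text: []const u8) ![][]const u8 {
    var ret: std.ArrayListUnmanaged([]const u8) = .empty;
    errdefer ret.deinit(allocator);
    var it = std.mem.tokenizeScalar(u8, text, ' ');
    while (it.next()) |word| try ret.append(allocator, word);
    // The awkward part: everything above is just collecting, yet this last
    // packaging step is still an allocation that can fail.
    return ret.toOwnedSlice(allocator);
}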

If you can calculate the exact capacity needed, then ret.items should be the whole allocation.

Otherwise, the best solution would be to return the array list and let the caller deal with it :3.

If you can’t/don’t want to do that, then you have to handle possible failure.
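
For the first option, a sketch with a made-up example where the exact count is known up front: allocate precisely once and return that buffer directly as the owned slice; there is no ArrayList and no trailing toOwnedSlice left to fail.

const std = @import("std");

fn squares(allocator: std.mem.Allocator, n: usize) ![]usize {
    const out = try allocator.alloc(usize, n); // the only fallible step
    for (out, 0..) |*slot, i| slot.* = i * i;
    return out; // already a nicely packaged owned slice
}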

I don’t usually know the exact capacity, and I specifically don’t want to return an ArrayList. I want to return a nicely packaged slice, which makes for a nicer API (and clearer data flow).

You could manually remap/resize the allocation, which should reduce the possibility of failure, and reduce the maximum needed/assumed memory by half if you are doing that.
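
Something like this sketch (made-up example, assuming a known upper bound): allocate the worst case once, then try to hand the unused tail back with Allocator.resize, which reports failure as a bool rather than an error, so only the copying fallback can still fail.

const std = @import("std");

fn collectEvens(allocator: std.mem.Allocator, max: u32) ![]u32 {
    const buf = try allocator.alloc(u32, max); // worst case up front
    errdefer allocator.free(buf);
    var len: usize = 0;
    var i: u32 = 0;
    while (i < max) : (i += 1) {
        if (i % 2 == 0) { // stand-in for "final count unknown in advance"
            buf[len] = i;
            len += 1;
        }
    }
    if (allocator.resize(buf, len)) return buf[0..len]; // shrank in place, no error path
    return allocator.realloc(buf, len); // copying shrink; can still fail in theory
}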

Yeah, my secret plan is to make people angry at the suggestion, and incentivize finding better ways to avoid these kinds of problems. Notably, std.testing.checkAllAllocationFailures doesn’t help here, as you need to continue using the data structure after allocation failure to hit the issues.

What would help is throwing allocation errors into the mix in Swarm Testing Data Structures.
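
For reference, this is roughly how that helper gets used (made-up test): it re-runs the function with each allocation site failing in turn and checks that the error propagates without leaks; the test function returns at the first induced failure, which is exactly why it never keeps using the data structure after an OOM.

const std = @import("std");

fn buildList(allocator: std.mem.Allocator, count: usize) !void {
    var list: std.ArrayListUnmanaged(u32) = .empty;
    defer list.deinit(allocator);
    for (0..count) |i| try list.append(allocator, @intCast(i));
}

test "buildList survives induced allocation failures" {
    try std.testing.checkAllAllocationFailures(std.testing.allocator, buildList, .{8});
}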

I feel like the next step in this chain of thought is to not dynamically grow-allocate at all but use a fully pre-allocated array with a max capacity. It could be called a ‘BoundedArray’ :stuck_out_tongue_winking_eye:

5 Likes

I think the end goal should be to eventually remove all memory allocation from the language. Even the stack.

Eliminating recursion? Let’s take it a step further and eliminate function calls altogether (all structured programming is now some abuse of a labelled switch). Returning a pointer to a stack variable? No longer a problem.

Good luck to Rust in competing with that level of memory safety.

1 Like

I realize you’re joking, but tbh you can get surprisingly far without any dynamic memory allocation at all, e.g. in this Pacman clone all game state lives in a single, upfront-defined global:

…the same in the C version:

…and my Zig emulator project is the same. The entire emulator state is in a single struct which doesn’t have any references to data outside that single nested struct (it doesn’t look quite as impressive because the struct is declared elsewhere):

In the C version of those emulators I use this for snapshotting. I simply dump the entire emulator state struct via what’s essentially a memcpy into a file or the web browser’s IndexedDB. This works because there are no references to outside data in the struct (there’s a handful of pointers inside the struct pointing to other parts of the struct, but those can be easily patched on save/load by replacing the pointers with offsets from the start of the struct before saving, and restoring the offsets to pointers after loading).
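
In Zig terms, the pointer-to-offset patching boils down to something like this sketch (made-up State struct, not the actual emulator code):

const std = @import("std");

const State = struct {
    ram: [4096]u8 = undefined,
    stack_top: *u8 = undefined, // points somewhere into `ram` while running
};

// Before saving: turn the internal pointer into an offset from the struct's
// base address, so the raw bytes contain no absolute addresses.
fn pointerToOffset(state: *const State) usize {
    return @intFromPtr(state.stack_top) - @intFromPtr(state);
}

// After loading: rebuild the pointer from the stored offset.
fn offsetToPointer(state: *State, offset: usize) void {
    state.stack_top = @as(*u8, @ptrFromInt(@intFromPtr(state) + offset));
}

// The snapshot itself is then just the struct's raw bytes (std.mem.asBytes).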

…the only dynamic allocations happen in the sokol headers (used for rendering, audio, input etc…), but only once at startup, and only a small number of allocations (you can configure a couple of pool sizes in the init calls which are then pre-allocated). And then of course there are more dynamic allocations happening down in the operating system which I unfortunately don’t have any control over…

E.g. in ‘Zig terms’ it might be a totally valid strategy to pass slices of pre-allocated memory into libraries instead of allocators, or even go a step further and make the ‘max capacities’ build-time parameters which are baked into a custom-built executable on the user’s machine (which is much more feasible with Zig’s build.zig and build.zig.zon and easy-to-set-up toolchain compared to the C/C++ world).
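
As a (made-up) sketch of that idea: reserve a fixed budget up front, wrap it in a FixedBufferAllocator, and let exceeding the budget fail fast instead of growing without bound; the constant could just as well come from a build option in build.zig.

const std = @import("std");

const max_scratch_bytes = 64 * 1024; // could be wired up as a build option instead

var scratch: [max_scratch_bytes]u8 = undefined;

pub fn main() !void {
    var fba = std.heap.FixedBufferAllocator.init(&scratch);
    const allocator = fba.allocator();

    // Everything allocated here lives inside `scratch`; going over the
    // budget returns error.OutOfMemory instead of silently growing.
    const names = try allocator.alloc([]const u8, 16);
    _ = names;
}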

1 Like

On a similar note, I once read an article that talks about a lesson: limit the output for human consumption (i.e., humans are bad at handling too much information), and limit the input for system resources. Use both of those to derive some reasonable upper bounds for buffers and allocations, so that your software doesn’t needlessly use an unbounded amount of memory and runs predictably.

The author is, uhh… checks notes… matklad

6 Likes

You said it much better than I did, will steal this phrase, thanks!

1 Like

I’ve recently started doing something similar too (when I parse, I do two passes over the tokens, and during the first I perform some validations and compute how much memory I will need). It’s probably less efficient that way, but I never worry too much about that.

I’ve also made a small helper container for this purpose. It only uses debug.assert() and it never fails. I find it easier to use for such cases.

Also interesting - it can be used in comptime. It’s not effortless, you still need to init differently based on @inComptime(), but the rest can be kept the same, including the .finish(), which will also copy the slice for you if you are in comptime.

BTW: I’ve just noticed the BoundedArray was removed… I initially wanted to use it, but the main limitation was that it did not operate on an externally provided buffer. I could also have used ArrayListUnmanaged, but I wanted to have .len as a top-level field.

BTW2: In regex.zig I’m also using two Buf(Ops)s pointing into a single slice of memory, and the count is therefore the sum of both upper bounds.
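
If it helps, the core of such a helper is tiny; here is a stripped-down sketch in the same spirit (this is not the actual Buf from regex.zig, and it leaves out the comptime handling and the copying .finish()):

const std = @import("std");

pub fn Buf(comptime T: type) type {
    return struct {
        items: []T, // externally provided storage
        len: usize = 0,

        const Self = @This();

        pub fn init(storage: []T) Self {
            return .{ .items = storage };
        }

        pub fn push(self: *Self, value: T) void {
            // The upper bound was computed beforehand, so this can assert
            // instead of returning an error.
            std.debug.assert(self.len < self.items.len);
            self.items[self.len] = value;
            self.len += 1;
        }

        pub fn finish(self: Self) []T {
            return self.items[0..self.len];
        }
    };
}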

1 Like

Isn’t “validate, allocate, transform” a fairly standard pattern in C?

I’ve written a lot of code that looks something like this (error checking omitted for brevity):

int size = parse(NULL); // returns negative numbers for errors
char *data = malloc(size);
parse(data); // can't fail given a correctly sized buffer and the earlier call succeeded

I never did any C before, so it totally might be, and I think you are right. I think I saw it in some C codebase(s) before (pango? and I think llama.cpp does that too), but back then I was rather puzzled why they were doing the work twice :smiley:

My background is Delphi → PHP → Java → JS → Rust → Zig, so Zig is really my first (serious) low-level encounter.

EDIT: I didn’t mean that I use the exact same function twice, just that I do some of the work again, so that I can save some work and checking later.

This style of first querying some required size by calling a function with some special null-pointer-arg and then calling that same function with a valid pointer again is used a lot in Win32, but as a C programmer I can’t say that I’m a fan of such ‘double-use’ functions…

It’s better to have a separate get_required_size() function.

2 Likes

It’s better to have a separate get_required_size() function.

Not saying it’s good, but especially for larger C APIs I think the double-call can be the least worst option in some scenarios.

Think of something like Vulkan. Your library is big and complicated, and your main consumer of the library isn’t people coding in C, but people writing bindings for other higher-level languages.

Would you really gain anything by replacing

result = vkEnumerateInstanceExtensionProperties( "name", &count, NULL )
// check result, allocate some memory
result = vkEnumerateInstanceExtensionProperties( "name", &count, ptr )
// check result

with

result = vkGetInstanceExtensionPropertyCount( "name", &count )
// check result, allocate some memory
result = vkEnumerateInstanceExtensionProperties( "name", count, ptr )
// check result

across a dozen or so functions?