std.BoundedArray simple drop-in replacement

For the other 5 people who are sad/pissed that the single most useful container type in std (I’m talking about std.BoundedArray of course) has been removed and replaced with a more verbose alternative (via ArrayListUnmanaged): I copy-pasted the parts that I used (just a couple of append…() methods and slice()) into a drop-in replacement:
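The snippet itself isn’t quoted here, but a minimal sketch of what such a drop-in might look like (the method set and names are assumed, not the author’s exact code):

```zig
const std = @import("std");

/// Minimal stand-in for the removed std.BoundedArray: a fixed-capacity
/// array plus a length, copyable by value. This is an illustrative
/// sketch, not the original std implementation.
pub fn BoundedArray(comptime T: type, comptime capacity: usize) type {
    return struct {
        const Self = @This();

        buffer: [capacity]T = undefined,
        len: usize = 0,

        pub fn append(self: *Self, item: T) error{Overflow}!void {
            if (self.len >= capacity) return error.Overflow;
            self.buffer[self.len] = item;
            self.len += 1;
        }

        pub fn appendSlice(self: *Self, items: []const T) error{Overflow}!void {
            if (self.len + items.len > capacity) return error.Overflow;
            @memcpy(self.buffer[self.len..][0..items.len], items);
            self.len += items.len;
        }

        pub fn slice(self: *Self) []T {
            return self.buffer[0..self.len];
        }
    };
}

test "basic usage" {
    var arr = BoundedArray(u8, 4){};
    try arr.append(1);
    try arr.appendSlice(&.{ 2, 3 });
    try std.testing.expectEqualSlices(u8, &.{ 1, 2, 3 }, arr.slice());
}
```

Note that the whole struct (buffer included) lives by value, so copying it copies the items too - which is exactly the property discussed below.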

Tbh I’m not a fan of this new all-in-one design philosophy, since ArrayListUnmanaged now has so many methods that it’s becoming the equivalent of a ‘fat base class’.

BoundedArray on the other hand had one focused use case and an API specialized for that use case.

IMHO a better design would be to have a universal, flexible but minimal ‘ItemBuffer’ type, which is not meant to be used directly, but instead as the base for all sorts of higher-level containers (stacks, queues, array-lists, ring buffers, …). Those high-level container types would also be part of the stdlib but mostly just thin specialization/convenience wrappers around the ItemBuffer type.
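One possible shape for that idea - all names here are hypothetical, this is a sketch of the proposal, not an existing std API: a minimal core that works on an externally provided capacity-sized slice, with higher-level containers as thin wrappers over it.

```zig
const std = @import("std");

// Hypothetical 'ItemBuffer' core: generic over the item type, operating
// on external storage. Wrappers (stack, queue, array-list, ...) would
// own or borrow the storage and forward to this.
pub fn ItemBuffer(comptime T: type) type {
    return struct {
        const Self = @This();

        items: []T, // capacity-sized backing slice, provided by the wrapper
        len: usize = 0,

        pub fn push(self: *Self, item: T) error{Overflow}!void {
            if (self.len == self.items.len) return error.Overflow;
            self.items[self.len] = item;
            self.len += 1;
        }

        pub fn pop(self: *Self) ?T {
            if (self.len == 0) return null;
            self.len -= 1;
            return self.items[self.len];
        }

        pub fn slice(self: Self) []T {
            return self.items[0..self.len];
        }
    };
}

test "ItemBuffer over external storage" {
    var storage: [2]u8 = undefined;
    var buf = ItemBuffer(u8){ .items = &storage };
    try buf.push(1);
    try buf.push(2);
    try std.testing.expectError(error.Overflow, buf.push(3));
    try std.testing.expectEqual(@as(?u8, 2), buf.pop());
}
```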

Back in the olden days when I still wrote C++ code I used a similar approach, and I still think the basic idea makes sense:

I would probably still treat something like BoundedArray as a special case because it’s so embarrassingly simple - and it has the advantage that it can be passed around by value without having to care about the lifetime of some external chunk of memory.

I first started to implement this BoundedArray replacement using an internal ArrayListUnmanaged, then noticed that this is total overkill (and I also had trouble creating a backward compatible version of BoundedArray), so I just did the sensible thing and copy-pasted a subset of the old implementation.

PS: unrelated, but zig nightly currently seems to be broken when fetching packages from git+https? This almost looks like that mysterious EndOfStream error that I also see when viewing locally generated Zig docs in the browser:

/home/runner/work/chipz/chipz/build.zig.zon:7:20: error: unable to unpack git files: EndOfStream
            .url = "git+https://github.com/floooh/sokol-zig.git#6ed171f4b0fd637cb80aa70d9dcc5f208c2e8035",
12 Likes

Admittedly not a BoundedArray user, but I had this exact thought yesterday. And you never know. Maybe down the road, ArrayListUnmanaged gets broken up again with this exact idea of a “core” structure. Whether it’s a bounded array or dynamic array will simply be different types that define methods on the same core structure (talking about the existing ArrayListUnmanaged members items and capacity and maybe some other core methods).

The bottom line to me is that inferring the intended usage from a type at compile-time is preferable to methods with specific names that invoke illegal behavior if used incorrectly. Seems to align more with Zig’s zen.

2 Likes

Doubly linked list comes close but is too verbose.

Perhaps a generic type to complement the pointers would make it more useful.

Comes close to what?

I don’t know how doubly linked lists have anything to do with the topic; for me they are useful as an organizational data structure in a limited number of cases (mainly when you only need iterative or local access/modification and your elements are big enough to offset the cost of the pointers), but they are quite different from a bounded array.

I don’t know what you mean with that.

1 Like

For those who still want the full BoundedArray implementation:

1 Like

Ouch, I still use 0.14. And 90% of my philosophy is built on BoundedArray.
Is there some crazy explanation why it was removed?

https://github.com/ziglang/zig/pull/24699#issuecomment-3156805668

If those replacement strategies don’t work for you, then it’s trivial to copy BoundedArray into your own project or use jedisct1’s package as linked above.

You can read the comments and commit messages in the PR that removed them:

But the gist is:

  • Machine code bloat: each BoundedArray(T, len) with a unique length resulted in independent chunks of machine code that the compiler might not always be able to deduplicate.
  • BoundedArray was safely copyable by value, but this could be a performance pitfall. Copying a BoundedArray with a large capacity but only a few actually used slots is a waste of processing time since all the undefined elements past the used region will also be copied even though their values are insignificant. Users should consider different strategies for usage patterns like these.
  • If you were using BoundedArray as a way to set an upper bound for the maximum number of items that can be created/processed in a function, you were restricting the caller unnecessarily with an arbitrary limit. Taking a buffer []T as an input or an allocator gives the caller full control over bounds, which is the preferred and more friendly approach to designing APIs.
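The last point can be sketched like this - a function that writes into a caller-provided buffer instead of returning a capacity-baked container (`collectEven` is a hypothetical example function, not a std API):

```zig
const std = @import("std");

// Preferred API shape: the caller supplies the buffer and therefore
// controls the bound, instead of the function baking in a maximum count.
fn collectEven(out: []u32, input: []const u32) []u32 {
    var n: usize = 0;
    for (input) |x| {
        if (x % 2 != 0) continue;
        if (n == out.len) break; // caller-chosen bound reached
        out[n] = x;
        n += 1;
    }
    return out[0..n];
}

test "caller controls the bound" {
    var buf: [8]u32 = undefined;
    const evens = collectEven(&buf, &.{ 1, 2, 3, 4, 5, 6 });
    try std.testing.expectEqualSlices(u32, &.{ 2, 4, 6 }, evens);
}
```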
7 Likes

Hmm, not very strong arguments IMHO. I’d think that Zig users who specifically picked BoundedArray over the alternatives are aware of those downsides.

I’m sympathetic about removing redundant APIs and streamlining the stdlib, but currently BoundedArray feels like a victim of that idea taken too far, because it wasn’t redundant with other containers (mainly because it doesn’t have references to external data dangling off its body).

It would actually be nice to be able to pick this “embedded vs external” buffer strategy for all container types.

And yes, it stamps out specialized code for all item types and capacities, but that’s true of all generic programming. The capacity could probably be stored in a runtime value so that only the init function is specialized by the capacity (though as long as the capacity is a generic arg on the whole type, this would still rely on deduplication), but OTOH all that code is most likely inlined anyway, so I don’t really buy the code-bloat argument even if the compiler doesn’t deduplicate.

The ‘max capacity baked into the code is bad’ argument also isn’t all that useful as general advice IMHO. For many types of libraries I agree: a baked-in capacity is often not a good thing, and any max capacities should at least be configurable by the library user at initialization. Up in the application layer it’s a different story though; there’s nothing wrong with estimating max capacities for nearly anything - at least it makes sure that a memory leak cannot run haywire and eat up all system memory.

But there are also plenty of exceptions. For instance, such upper bounds are often simply defined by the environment. 3D APIs have upper limits for things like the number of vertex attributes, the number of resource bindings (of a specific type on a specific shader stage), etc… Making those limits fully dynamic in a wrapper library simply doesn’t make sense, since it’s the underlying hardware which defines the limits - and writing code for hardware that doesn’t exist also doesn’t make a lot of sense.

1 Like

OK, these sound like valid reasons… I’ll live with it and figure out what to do. Maybe I’ll just use a copy or find another solution.
Is 0.15 already there?

No it’s still ‘dev’, AFAIK it’s pretty close though (a couple of weeks?).

I am quite surprised by this move. Such a useful data structure, which I also currently make use of.

Pushing useful basic stuff like this into risky 3rd-party packages feels like the start of the journey towards a ‘left-pad’ incident.

And the reasons for removal only seem to be based on a new push towards dependency injection everywhere. I really don’t get the philosophy of having a collection that might occasionally need an allocator require passing (hopefully the same) allocator into multiple methods. Dependency injection to the max :frowning: It’s like having a thread-safe collection and passing in the thread-safety mechanism on every method.

I guess I am just old, remembering the bliss of the Java 1.2 collections framework. Btw, the book ‘Java Concurrency in Practice’ by Brian Goetz is well worth a read regarding collections API design, especially anything to do with concurrency, if that wasn’t obvious from the name :slight_smile: It might be Java-centric, but the principles are universal, and a really good set of concurrent/thread-safe building blocks are described.

the reasons for removal are bad code gen, nothing to do with dependency injection.

ArrayListUnmanaged replaces it, not a 3rd-party package - though I agree that a single data structure that can either manage its allocations or not is not a good design decision.

It adds a new memory management footgun though.

If I pass around the ArrayListUnmanaged struct by value I now have multiple ArrayLists stomping on each other’s feet because they point to the same backing buffer - but passing things around by value often makes sense to keep code simple and not worry about lifetime issues.

That problem didn’t exist with BoundedArray, which is safe to copy (problems like potentially bad codegen or copying more data than needed should be pointed out in the docs, but IMHO in the end the decision of whether to use a container with an embedded or external buffer should be up to the user - such tradeoffs are completely normal and exist everywhere, especially when whipping out generics).
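The footgun can be demonstrated directly - a minimal illustrative example against the current ArrayListUnmanaged API (API names as found in recent std versions):

```zig
const std = @import("std");

// Two copies of an ArrayListUnmanaged share one backing buffer, so
// mutating one silently desynchronizes the other (and growing one can
// invalidate the other's pointer entirely).
test "copied lists alias the same buffer" {
    const gpa = std.testing.allocator;
    var a: std.ArrayListUnmanaged(u8) = .{};
    defer a.deinit(gpa);
    try a.ensureTotalCapacity(gpa, 8);

    var b = a; // copies the struct, NOT the buffer
    b.appendAssumeCapacity(42);

    // `a` and `b` now disagree about the same memory: the shared buffer
    // contains an item that `a`'s length doesn't account for.
    try std.testing.expectEqual(@as(usize, 0), a.items.len);
    try std.testing.expectEqual(@as(usize, 1), b.items.len);
}
```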

I was simply responding to them saying that they are forced to use 3rd-party packages to get the same functionality.

One of the first mistakes you learn from with low-level code is being aware of what your data structures reference.

I agree it’s a foot gun, but I don’t think it’s a compelling reason for the existence of BoundedArray.

My preference would be a BoundedArray that is an ArrayListUnmanaged without the allocating API; it encourages allocating upfront, and you don’t have to deal with a type that may or may not be managing dynamic allocations without indicating which one it is.

I do agree with you that you should also get the choice of an embedded array if that suits your needs better.

something something a FixedArrayList and EmbeddedArrayList would be nice.
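That idea is already close to what `initBuffer` plus the non-allocating methods give you today - a sketch, with `initBuffer`/`appendAssumeCapacity` being real std APIs at the time of writing:

```zig
const std = @import("std");

// Fixed-capacity use of ArrayListUnmanaged: wrap a caller-provided
// buffer and use only the non-allocating methods. No deinit needed
// since the list never owns heap memory.
test "fixed-capacity use of ArrayListUnmanaged" {
    var storage: [4]u8 = undefined;
    var list = std.ArrayListUnmanaged(u8).initBuffer(&storage);
    list.appendAssumeCapacity(1);
    list.appendAssumeCapacity(2);
    try std.testing.expectEqualSlices(u8, &.{ 1, 2 }, list.items);
}
```

A dedicated FixedArrayList type would essentially be this pattern with the allocating methods statically removed.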

1 Like

Yeah, something in that direction… I still think some sort of low-level, internal “ItemBuffer” type which is generic over the item type and works on an externally provided ‘capacity-sized slice’ would be the way to go. Then on top of that, variants which either have a fixed-size embedded item buffer or a growable external item buffer. And finally on top of that, various specialized containers like ArrayList, Stack, RingBuffer, Deque, Queue, Map, etc… each in two variants: one with a growable external buffer, one with a fixed-capacity embedded buffer (or maybe the two can even be merged via a generics parameter).

What I really don’t like is to differentiate between growable and fixed-capacity via different methods on the same container type (e.g. append() vs appendBounded() - that should always be append()).

4 Likes

Your main focus seems to be on code reusability, which I think you are taking too far.

the problem with this is that the embedded-buffer variant would have the footprint of buffer + pointer + len + capacity when it could be buffer + len. It also adds more layers to the API, which can harm optimisations.
std needs to balance supporting more use cases against the cost in performance/memory; developers shouldn’t have to pay for use cases they don’t need, and they shouldn’t have to reimplement std because of that.
I’m not saying std shouldn’t offer such things, just not in the way you described.

rephrase that to non-growing functions: being able to ensure no chance of allocation failure or pointer instability, and removing the extra bookkeeping of growing the backing memory, is invaluable for robust and performant code.

I also think it’s worth considering the impact on API ergonomics/friction that adding more layers to the API has.

But that’s exactly what will happen (see C++) if the stdlib containers don’t have simple and straightforward APIs.

We’re both probably on opposite design-philosophy ends, but I think the stdlib should most importantly be convenient to use, even at the cost of efficiency. If performance isn’t good enough in some cases then it’s fine to drop down to custom implementations outside the stdlib.

It doesn’t make much sense IMHO when the stdlib just provides low-level building blocks which then must be wrapped in custom types to be easy to use (but that’s what ArrayListUnmanaged is starting to look like: a low-level building block for more convenient wrapper types implemented in user code - like a BoundedArray made of a fixed-size array, an ArrayListUnmanaged, and a smaller specialized API).

Now… at the current stage of Zig’s development cycle it’s probably okay to focus on such low-level core types, but for me at least, much of the lure from C to Zig is the ‘rich stdlib’.

There are basically two types of code I use Zig for. One is really low-level code, which often doesn’t really need the stdlib except for operating system services like filesystem access. For this, a low-level “raw” stdlib is totally fine - in this type of code, “Zig - the language” is more important than “Zig - the stdlib”.

The other type of Zig code for me are quick’n’dirty command line tools where I’m processing some input files into output files, code generation etc… E.g. what I traditionally used Python for but where Zig really makes more sense in a Zig project (since everything snaps nicely together via the build system).

For this type of tools I don’t care about performance (even 100x slower Python would be fine), I also don’t care much about memory leaks since the code will run once and then exit anyway.

What I do care about in this type of tool is that I don’t need to play code-puzzle with too-low-level stdlib types, or pull in too many 3rd-party dependencies. I want some convenient solution to parse cmdline args, to open/read/write/close files, and often to have string processing features, and for this a convenient (but not necessarily high-performance) stdlib is key.

11 Likes

I think eventually it would make sense for Zig to have a core library which is essentially the current standard library and an extended standard library which may include things beyond what is immediately needed to implement Zig itself and caters more towards what users of Zig may need. (Which seems to be planned with the standard library audit before 1.0)

Personally I would prefer if those stayed clearly separated in some manner, I don’t really care whether that happens through different import names or clearly documented namespacing, etc.
But I would dislike if you eventually couldn’t tell anymore if something in the standard library is actually needed by the language implementation or not.

If such an extended standard library future comes, I think it would make sense to either make the wider standard library a packaged library (that may even come with the default download), or give people some way to get a more minimal distribution that only includes zig + core.

This sounds to me more like a community project that creates a sloppy-lib that includes the whole kitchen sink. Or a Zig scripting language.

I don’t think code that has horrible performance and makes writing code super convenient at the cost of encouraging production of slop that nobody wants to touch, read, or maintain, belongs in the standard library, whether it is core or extended.

Adding such stuff also has implications on the kind of programmers you attract, so I think doing that would encourage a downwards spiral of code quality. Basically you would remove all friction that encouraged people to write their code with a little bit more care and instead encourage them to not care at all.

I think if the community creates a package that is essentially seen as something nobody should depend on in any published package, because it is only meant for quick-and-dirty, throwaway tools, then that is fine; but including it in the default distribution of Zig seems like adopting the opposite of “Favor reading code over writing code.” At that point you might as well adopt some terse Perl or APL syntax and just write some hieroglyphs that result in some program that you can’t read a week later.

Also, while you may not care about some bad but convenient code being added to the standard library, there will certainly be others who either don’t want to leave that code in its bad state or want to remove it. Wanting others to maintain your convenient sloppy code, but not clean it up or improve it “because that is unnecessary”, seems like an unrealistic premise to me; so if it were allowed to be added, that would basically mean that others then have to fix that mess, maintain it, and change it so that it follows the core values of the community - at which point it may become less convenient again.

Yes and because of that you can use a package named convenient-slop instead of adding that to Zig itself.

The main benefit for the user of the package, if it is part of Zig, seems to be that others then have to maintain it.
So it seems like an unfair exploitation of core community contributors: delivering a by-definition bad product so that passing-by users can create crappy software, which will eventually leak into the public and drag down the perception of the code quality produced with the language itself.

Personally I would rather have less adoption of the language than future tutorials being based on (or LLMs being trained on) that “convenient” code and spreading it.
I also don’t want to have to tell countless newcomers how to turn their “easy” code into something of high quality by peeling back layers and layers of easy bad code.

So I think what you describe is short sighted and doesn’t consider the long term costs (for others and the community).

My opinion is: if you want that, then maintain it yourself, so that if people ask how to upgrade their code into a real program that has good performance and handles memory errors etc., I can point them to your repo to create an issue, and you can spend the time to explain how to transition out of the mess.

The trouble with this approach (making things too easy) isn’t the expert user that knows when to do what, but to avoid sending beginners in a direction that leads down a path of quick success followed by a long road of disappointment and disillusionment, until they are ultimately completely turned off by the language/package/framework/community that sold them on a lie of things being easy.

3 Likes

So what I’m getting from that is that the purpose of the Zig stdlib is to build the Zig compiler? Isn’t that a bit narrow-minded since most people using Zig are not working on the Zig compiler?

I don’t think code that has horrible performance and makes writing code super convenient at the cost of encouraging production of slop that nobody wants to touch, read, or maintain, belongs in the standard library, whether it is core or extended.

I don’t think I mentioned ‘horrible’ performance anywhere, and trading around 10% performance for better usability is far from horrible.

Extremist thinking like this is what gave us software disasters like Vulkan, a 3D API which, whenever there was a decision between making the API easier to use or being more ‘explicit’, went for ‘more explicit’, with the result that nobody is using it (at least Khronos is starting to realize their mistake, only about a decade too late: https://www.youtube.com/watch?v=NM-SzTHAKGo).

Also while you may not care about some bad but convenient code

I don’t know where you get the impression that convenience means ‘bad code’. Performance is just one aspect of code quality and should always be balanced against other aspects like robustness (Zig has runtime range, null and overflow checks after all, and all of those eat into performance, but not badly enough to matter in most situations). This balancing is what software development is all about - and balancing towards ease-of-use is the same: yes, it might cost some performance, but not badly enough to matter in most situations, just like those runtime safety checks.

Personally I would rather have less adoption of the language instead of future tutorials being based on or LLMs being trained on that “convenient” code and spreading it.

…I also don’t know where suddenly LLMs are coming in…

My opinion is: if you want that, then maintain it yourself, so that if people ask how to upgrade their code into a real program that has good performance and handles memory errors etc., I can point them to your repo to create an issue, and you can spend the time to explain how to transition out of the mess.

The thing is, I’ve been there, done that back in my C++ days - because the C++ stdlib is crap. What you get in return is a fragmented ecosystem of a myriad of stdlib replacements that all don’t work with each other, and with that you get a fragmented community - basically a tower-of-babel situation. A language needs a stdlib that’s useful for writing application and tools code (and more importantly, as a common vocabulary between developers), and not a stdlib that’s essentially just a compiler construction kit - because I’d wager that most Zig users won’t build their own compiler (even if it’s getting more and more tempting).

The trouble with this approach (making things too easy) isn’t the expert user that knows when to do what, but to avoid sending beginners in a direction that leads down a path of quick success followed by a long road of disappointment and disillusionment

Now you sound a lot like that PS3 dude :wink: I’ll just leave that here:

“We don’t provide the ‘easy to program for’ console that [developers] want, because ‘easy to program for’ means that anybody will be able to take advantage of pretty much what the hardware can do, so then the question is what do you do for the rest of the nine-and-a-half years?” (Kaz Hirai, 2009)

…the result was that this was the only time in history where the Xbox 360 ate the PlayStation’s lunch, because the Xbox was ‘easy to program for’.

6 Likes