std.BoundedArray simple dropin replacement

This has been a difficult breaking change to make, because so many coders whose opinions I respect have decided that they like the API.

While I remain fully convicted that alternatives are superior, I am open-minded to continue to examine use cases.

Does anyone want to link to some relevant code, that they feel is a good use case for BoundedArray, that will suffer from avoiding that data structure?

Flipping the script - I just upgraded some Aro code to avoid BoundedArray and both cases I found were bad uses. The first one should have been a hash map (it does a O(N^2) string matching loop, and hash maps already have a linear_scan_max), and the second one should have appended to an array list parameter, or more simply inlined the function logic. source

7 Likes

I don’t have strong feelings one way or the other about BoundedArray, but I think it’s worth noting that the BoundedArray version of possibleProgramNames was impossible to misuse, while the new ArrayListUnmanaged version will hit illegal behavior if you pass in an array list with 0 capacity.

My general sense is that this sort of thing is a tradeoff you think is fine/worth it, and doc comments are good enough for communicating these sorts of requirements, but (as is the case here) those requirements can be left out of the documentation quite easily.

(this API at least hints towards possible_names needing some amount of capacity naturally since the function doesn’t take an allocator, but if the function took an allocator for unrelated reasons this would become more ambiguous and make it much easier to footgun yourself)

(edit: there’s also no way to change the API in the future to be able to append more than 2 items without introducing a possible panic/IB at existing usage sites)

Here is a use-case that I am interested to hear from others on best path forward:

/// Encode the value to MessagePack bytes in a bounded array.
/// Unbounded size types (slices) are not supported.
/// No errors though!
pub fn encodeBounded(value: anytype, comptime options: EncodeOptions(@TypeOf(value))) std.BoundedArray(u8, largestEncodedSize(@TypeOf(value), options.format)) {
    var res = std.BoundedArray(u8, largestEncodedSize(@TypeOf(value), options.format)){};
    var fbs = std.io.fixedBufferStream(res.buffer[0..]);
    encodeAny(value, fbs.writer(), options.format) catch unreachable;
    res.len = fbs.getWritten().len;
    return res;
}

The function accepts a value (like an ?u8) and returns a bounded array of the value encoded in the message pack format. It only works for types with comptime-known maximum size. For ?u8, It returns something like a bounded array with 2 bytes, because it can be known at compile time what the maximum size of that type will be when encoded. When encoded, null is one byte, u8 is two bytes. So we return max(1, 2), which is 2.

This has the advantage of “error-free encoding”. My function signature has no possibility of errors. The user cannot provide an array that is too short. The API is very simple.

I am using the unique property of bounded array that it is a variable length data structure that can cross the return boundary (it can be created and returned on the stack without corruption).

With bounded array being gone, I see the following ways I could proceed:

  1. Return a simpler bounded array that only has the slice() method. I don’t need all the append etc helper methods.
  2. Stop providing this error-free API and only provide my existing method that accepts a mutable slice for output buffer.

I will likely update it to provide the reader / writer interface, which may change my mind on everything. I haven’t tried the new writer interface yet. I initially avoided it because I despised the anytype and inferred errors sets I was semi-forced to use.

Some existing criticisms that apply to my code:

  1. I am not perfect about largestEncodedSize(). I lazily wrote some of it so it overestimates for some of the types. I could definetly be exact though if I put some more switch cases in there for more integer bit sizes. If I accepted a writer or output buffer, the user would just get exact encoded size. (I do have a separate API that accepts buffer).
  2. User gets oversized bounded arrays because I have to return the maximum size. In the ?u8 scenario, if the user encodes lots of nulls, they will be wasting 50% of their encoding memory.
  3. If I just accept buffer, my API becomes smaller and easier to maintain.

PS: Nobody uses my code so I am just guessing what people like. I personally like not having any errors to handle.

pub fn encodeX(value: X, options: XOptions, w: *Writer) Writer.Error!void {
    // write to w
}

pub fn encodeY(value: Y, options: YOptions, w: *Writer) Writer.Error!void {
    // write to w
}
  • fully concrete, reusable code
    • the caller can use a fixed buffer and catch unreachable if they want it
  • easier to use API since you don’t have to guess which types are allowed
  • customizable per-type options (perhaps Y doesn’t even need options)
  • does not impose possibly wrong constraints/limits
  • does not pointlessly copy unused memory when returning

Good riddance to BoundedArray here.

6 Likes

No I am not saying there only should be compiler things in the eventually realized standard library, I just would want some way to get the set of things that are actually used by the compiler and things that aren’t used by the compiler.

I think it would be useful from a perspective of making it easy to get into reading the compiler or creating minimal distributions of Zig, when it is clear what parts are just there for the sake of users of Zig and which parts are used by the implementation a lot. Especially if the extended library becomes bigger.

I read that:

as not caring about performance at all (for that library and the tools it is intended to write), as long as it is convenient. My response was that I think there can be more than a core library (which would be only concerned about compiler construction), where the extended library still can be convenient to some degree, just still with a lot higher standards than super slow python code.

I don’t think that is extreme. Am I misunderstanding what you meant with 100x slower python code? From your response now, it sounds like you basically also just want an extended standard library that makes some things a little bit more convenient, while that example made it sound as if you would sacrifice a lot of things.

Well thanks, for comparing me with some person I don’t know anything about and I don’t agree with at all.

Taking advantage of what the hardware can do is exactly what I want and I don’t want different peoples notions of what they think is a good way to structure programs to get in the way of that (more than necessary).

I just meant that I would much rather lead someone down a path that is a bit more manual and a bit more work, where success may take some more time, but then is reliably found by the person, because they understand the tradeoffs and can work with them. (I would rather take that, over something that makes it super easy but hides the tradeoffs)

Too often easy just means that someone opted in to a bunch of tradeoffs for you, which then aren’t so easy to opt-out of again. It would be great if you can explicitly opt-in to certain trade-offs and get convenient things out of it or if you can’t choose, at least have them clearly documented, so that you can make a decision whether you can accept them. What sucks is if you don’t get a choice on which tradeoffs you have to accept, or are surprised by what you have opted-into without realizing.

Regarding vulkan I haven’t really used it yet, so I can’t really comment on it.
I also haven’t reverse engineered graphics apis / drivers and graphics cards, so I can’t really tell what is going on with them under the hood, but those apis seem like they are exactly made to keep us from interfacing with the hardware directly, I think a world where an open source compiler would directly output to specific families of graphics cards would be a better place, than having to invent languages that can compile to different apis that then again contain their own different languages that are internally compiled to yet another output.

Basically I hate magic, I want to be able to directly see whats happening and those apis don’t make that easier and it seems that tools like renderdoc have to do too much work to regain enough control over those apis to extract the needed information.

Guys, don’t forget you have a lot more in common than you have differences to bicker about. Keep it classy, my friends.

Regarding Vulkan, I don’t have nearly as much experience as @floooh but I do have some. So far, I have not felt the need for a BoundedArray API, but I will keep my mind open.

11 Likes

My 2ct, using this thread as an example:

Zig aims at being a general-purpose language.

At this point in time, Zig (the language) is fine, but the std lib sucks, not technically, but it needs much better documentation.

Sorry I have to say this:
The Zig community is sometimes over-focused on runtime performance.
And I say this, being “the performance guy” in our company…

Many if not most programs need a lot of I/O and many of these are written single-threaded for simplicity. The performance of these programs is I/O bound and not at all CPU depending.

The overall “cost” of those programs thus is X*(development time).

Python clearly wins here, because it is easy to write and read and comes with “batteries included”.

The quality of the std lib should be measured in three aspects with equal weights:
Code quality, performance, understandability and ease of use.

The last two points definitely need improvement, if the Zig wants to be general purpose, or it should give up this goal.

7 Likes

This actually lines up very well with other choices in the zig compiler: moving stuff that is not strictly needed to build the compiler to separate packages, but still support that stuff (for example, LLVM, or translateC). This is desirable from another point of view: easily distributing the compiler, making its bootstrap process simpler and more universal. In my mind, this is a worthwhile goal to go after.

2 Likes

I totally agree, but the key here is “supported”, e.g. such packages should at least be reviewed and “blessed” (ideally: maintained) by the core team so that everybody knows that these are official packages. Otherwise the risk of fragmenting the library ecosystem (and with it the community) is real.

easily distributing the compiler, making its bootstrap process simpler and more universal

…technically the whole standard library could be distributed as packages and don’t need to be part of the Zig distribution. That would also make sense for when there’s a time when stdlib versions will be needed.

1 Like

Aha. Interesting that is.
I noticed in my code, which writes to stdout, that I need - for example - to output a chess-move.

fn to_string(self: Move) std.BoundedArray(u8, 5) {
    // outputs strings like "e2e4" or "e7e8q" in case of a promotion.
}

So first I convert a move to a string and then write it to stdout, while it can be done in 1 step.

I did not have a look yet at 0.15dev (I just use the latest releases) but reading this thread it seems that ArrayListUnmanaged gets a stack-based version as well? Or am I wrong?

to_string sounds like format.

/// outputs strings like "e2e4" or "e7e8q" in case of a promotion.
pub fn format(move: Move, w: *Writer) Writer.Error!void {

}

This thread is a good example IMHO:

If a max capacity of some item collection can be estimated upfront and isn’t too big, than a nested array + length is often much easier (and safer) to work with than a slice.

Now the old nested BoundedArray as it is also wouldn’t help all that much because it can’t be directly assigned from a Zig array.

Another situation where I used BoundedArray is to gather cmdline args in build.zigs, I have replaced this now with ArrayListUnmanaged which isn’t a big deal, but it would be nicer if a replacement would require less rather than more code (and I also instinctively don’t like that there’s some separate data blob dangling off the ‘main struct’ and I also don’t like sprinkling my code with ‘undefined’).

New:

Old:

TL;DR: for situations where I just need to throw a low number of items into an array (up to a few dozen maybe), but at the same time need to track the number of occupied slots in the array I was usually reaching for BoundedArray (since it’s just that: a fixed-size array and a length).

1 Like

just an aside, for defines i find the b.addConfigHeader and the return types api, to be quite nice.

look at addValueInner for what types do what, but the important gist is:
null doesn’t define, everything else supported does.
enums and enum literals refer to identifiers.

2 Likes

This may require a “second-party” organization to maintain an “official endorsement, community-driven” ecosystem. There is no technical difference between using second-party packages and third-party packages, but the core developers guarantee the quality.

One can implement bounded array using ArrayListUnmanaged which lessens the code bloat. I personally think BoundedArray is okay for small temporary buffers you need to fill or small bounded buffers in structs that need to be copied.

What worries me about ArrayListUnmanaged that it has quite bit of method creep with implicit state you need to keep in mind, having higher level types built on top of it might not be a bad idea.

4 Likes

I see the move to only unmanaged containers in std as a positive thing because it will result in less churn in user code as std evolves.

BoundedArray was tangential to managed-vs-unmanaged, since the array data is directly embedded into the BoundedArray struct (it’s just an array and a length in the same struct), there are no allocators involved.

PS: In hindsight, the naming wasn’t great though. IMHO ‘EmbeddedArrayList’ might have been a better choice. Theoretically, all container types could have such an embedded variant, and all of those wouldn’t require allocators.

2 Likes

I would say ArrayListEmbedded for consistency, but I definitely agree with the premise.

1 Like

From before BoundedArray was a thing:

(original commit from 6 years ago)

I remember seeing in the wild at least 4-5 other independent implementations of the same concept over time :^)

4 Likes

Using it now :slight_smile: Thx.