Zig 0.14.0 released

Most likely you would create your own task-specific container wrappers that store the appropriate allocator inside them, to avoid silly errors like allocating with one allocator and freeing with another. This friction and extra work could actually lead to a better application design in the end.
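
A minimal sketch of such a wrapper, assuming Zig 0.14's std.ArrayListUnmanaged. The hypothetical ScoreList type stores its allocator once, so every allocation and the final free go through the same allocator. ScoreList, add and total are illustrative names, not std APIs.

```zig
const std = @import("std");

/// Hypothetical task-specific wrapper: the allocator is stored once, so every
/// allocation and the final free are guaranteed to use the same allocator.
const ScoreList = struct {
    allocator: std.mem.Allocator,
    list: std.ArrayListUnmanaged(u32) = .empty,

    pub fn init(allocator: std.mem.Allocator) ScoreList {
        return .{ .allocator = allocator };
    }

    /// Frees the storage with the same allocator that created it.
    pub fn deinit(self: *ScoreList) void {
        self.list.deinit(self.allocator);
        self.* = undefined;
    }

    pub fn add(self: *ScoreList, score: u32) !void {
        try self.list.append(self.allocator, score);
    }

    pub fn total(self: ScoreList) u64 {
        var sum: u64 = 0;
        for (self.list.items) |score| sum += score;
        return sum;
    }
};
```

Callers write `var scores = ScoreList.init(allocator); defer scores.deinit();` and never have to name the allocator again.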

Yes. If you use std.ArrayList for lists in an interpreter, then the same sort of logic would apply.

That makes sense, thanks!

I think that from a data-oriented standpoint you should avoid individual objects anyway, and instead group them by their set of keys: every unique set of keys acts like an archetype that forms a table, where the keys are the columns and the rows are the former individual instances.

Any value shared across an entire column can also conveniently be stored once (in the archetype), as long as you are fine with moving an instance to a different archetype once that value needs to change.

I don’t think the overhead of many managed instances is a very strong argument. A better argument is that you shouldn’t have that many container instances in the first place. Once you have fewer of them, and your code is more data-oriented and deals with groups of instances, you naturally have code that groups things, and that code already knows which allocator is used for, say, the 3 containers that together store the data for a batch of instances. So why store the same allocator 3 times in different containers that are always used together with the same allocator?

Also, being able to initialize the unmanaged variants with just the .empty decl literal is pretty nice. So I agree that it is better to see where and when allocations are actually happening.
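
A rough sketch of that idea, assuming Zig 0.14. A hypothetical Particles archetype keeps three columns that are always used together, stores the allocator once for all of them, stores a column-wide shared value (color) once, and initializes each column with the .empty decl literal. The type and field names are made up for illustration. The standard library's own struct-of-arrays container, std.MultiArrayList, may be a better fit in practice.

```zig
const std = @import("std");

/// Hypothetical archetype table: every row has the same keys, so each key is a
/// column. The columns are always used together, so the allocator is stored
/// once here instead of once per column.
const Particles = struct {
    allocator: std.mem.Allocator,
    /// A value shared by every row is stored once for the whole table.
    color: u32,
    x: std.ArrayListUnmanaged(f32) = .empty,
    y: std.ArrayListUnmanaged(f32) = .empty,
    mass: std.ArrayListUnmanaged(f32) = .empty,

    pub fn init(allocator: std.mem.Allocator, color: u32) Particles {
        return .{ .allocator = allocator, .color = color };
    }

    pub fn deinit(self: *Particles) void {
        self.x.deinit(self.allocator);
        self.y.deinit(self.allocator);
        self.mass.deinit(self.allocator);
        self.* = undefined;
    }

    /// Appends one row and returns its index. Capacity is reserved in every
    /// column first, so an allocation failure cannot leave the columns with
    /// different lengths.
    pub fn add(self: *Particles, x: f32, y: f32, mass: f32) !usize {
        try self.x.ensureUnusedCapacity(self.allocator, 1);
        try self.y.ensureUnusedCapacity(self.allocator, 1);
        try self.mass.ensureUnusedCapacity(self.allocator, 1);
        self.x.appendAssumeCapacity(x);
        self.y.appendAssumeCapacity(y);
        self.mass.appendAssumeCapacity(mass);
        return self.x.items.len - 1;
    }
};
```

Here one Allocator value serves three lists, instead of each managed list carrying its own copy.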

This is very similar to the way std.ArrayList was implemented. Just copy that code if you want to keep using it.

I will disagree; here it’s not negligible. I’ve had first-hand experience with that recently. I was the first in my group of friends to learn Zig, and I follow it very closely. My friend, on the other hand, has less than a year of practice with the language. Recently he was working on optimizing his Scrabble solver, which computes all the possible solutions for a given board configuration based on the letters and wildcards you have. I was helping him, looking through his code and benchmarking with poop, and I noticed that he had aliased std.ArrayList(u8) to String. Changing it to the unmanaged variant significantly improved performance: peak RSS went down by roughly 128 MB, cache references and cache misses went down by 20%, and branch prediction was much better.

So while it probably doesn’t matter at small scale, it’s actually a very big deal at a larger scale. Also, the inconvenience is fairly small, and as always you are free to create a wrapper around it. Like everyone, I guess, I started out using the managed versions, and the more I use the language, the more I lean towards the unmanaged variants.

This very much depends on your design. If you have lots of small array lists and hash maps, you will be spending lots of memory and CPU cycles managing them. The modern trend in Zig programming, advocated by Andrew himself, is data-oriented design, where instead of a slice of containers you use a container of slices. Please search for his talks for the details. Data-oriented design makes your RSS smaller and memory access more cache-friendly. Also, having fewer but larger array lists or hash maps makes the overhead negligible.

For the record, I am not against unmanaged containers and use them all the time but not because they save me an allocator object (this saving is nice but not crucial).
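
To make “a container of slices instead of a slice of containers” concrete, here is a minimal sketch assuming Zig 0.14. Rather than one std.ArrayList(u8) per string, a hypothetical StringPool keeps every string’s bytes in one shared buffer plus a list of (start, len) spans, and the caller passes the allocator explicitly. The names are illustrative only; this is not code from Andrew’s talks.

```zig
const std = @import("std");

/// Location of one string inside the shared byte buffer.
const Span = struct { start: u32, len: u32 };

/// Hypothetical string pool: all bytes live in one buffer ("container of
/// slices") instead of one growable list per string ("slice of containers").
/// Only two list headers exist no matter how many strings are stored.
const StringPool = struct {
    bytes: std.ArrayListUnmanaged(u8) = .empty,
    spans: std.ArrayListUnmanaged(Span) = .empty,

    pub fn deinit(self: *StringPool, gpa: std.mem.Allocator) void {
        self.bytes.deinit(gpa);
        self.spans.deinit(gpa);
    }

    /// Copies `s` into the pool and returns its index.
    pub fn add(self: *StringPool, gpa: std.mem.Allocator, s: []const u8) !u32 {
        const start: u32 = @intCast(self.bytes.items.len);
        try self.bytes.appendSlice(gpa, s);
        // Roll back the bytes if recording the span fails.
        errdefer self.bytes.shrinkRetainingCapacity(start);
        try self.spans.append(gpa, .{ .start = start, .len = @intCast(s.len) });
        return @intCast(self.spans.items.len - 1);
    }

    /// Returns a view into the pool; it is invalidated by the next `add`.
    pub fn get(self: StringPool, index: u32) []const u8 {
        const span = self.spans.items[index];
        return self.bytes.items[span.start..][0..span.len];
    }
};
```

Strings added one after another sit back to back in `bytes`, which is what makes iterating over them cache-friendly.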

I remember a discussion on this topic from a while back. The problem with stashing an allocator somewhere is that you’re assuming that (a) it’ll remain valid beyond the function call and (b) it’s the same interface that must be used for deallocation. Both assumptions have to hold, and neither is guaranteed.

Imagine we have a generic function that lets you limit the amount of allocation:

fn callWithLimit(func: anytype, args: anytype, limit: usize) @typeInfo(@TypeOf(func)).@"fn".return_type.? {
    // ...
}

How would we go about implementing that? We would loop through args, look for std.mem.Allocator, and replace it with a different one:

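    var new_args: @TypeOf(args) = undefined;
    inline for (args, 0..) |arg, index| {
        new_args[index] = switch (@TypeOf(arg)) {
            std.mem.Allocator => init: {
                // AllocatorWithLimit: a limiting wrapper assumed for this example (not a std type).
                // awl is a local of this stack frame, and the Allocator returned by
                // awl.allocator() stores a pointer to it.
                var awl = AllocatorWithLimit.init(arg, limit);
                break :init awl.allocator();
            },
            else => arg,
        };
    }
    // Forward the (possibly replaced) arguments to the wrapped function.
    return @call(.auto, func, new_args);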

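The allocator received by func would have a ptr that points to awl, which lives on the stack. As soon as callWithLimit() returns, it’s gone, so if func stashed that allocator somewhere (say, in a managed container it returns), the stashed copy would dangle.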

I do agree 100%. I don’t think his design was really good, but it still has under 1 ms response time. It would be cool to refactor his implementation, but I only briefly helped him. I would have implemented a DOD design myself, but I’m pretty sure most people won’t, and in most cases that will be fine. In my opinion, having only the unmanaged variant puts you in exactly the mindset you are describing: it adds just the right amount of friction. Passing allocators everywhere just isn’t a whole lot of fun, so it will probably make you realize that your design is a local maximum that could be represented better, and more conveniently, with some DOD patterns. So I still think there is some value in removing all the managed variants.

This really isn’t relevant, because the change is to remove managed ArrayLists from the standard library; nothing prevented your use case from using unmanaged ArrayLists before or after the change. On the other hand, those who want managed ArrayLists for whatever reason after the change can (and must) roll their own wrapper.

(I’m not arguing for or against the change … especially as it is a done deal and thus moot. As the release notes state, the change was based on a preponderance of feedback in favor of it.)

Were these submissions removed from Hacker News? I can open them from your link, but I can’t find them on the site itself.
They are also not listed on the alternative interface I use. Weird.

They never made it to the Hacker News front page and remained suppressed, yet discoverable through Algolia search. It is very strange indeed. Normally, Zig news catches HN’s attention.

Tbf, HN’s front page is hardly about interesting software projects anymore. A couple of years ago the Lobsters and HN front pages were nearly copies of each other; Lobsters still has that quality, but HN doesn’t.

Decl literals are actually kind of a game-changer for the language.

I really like how that and .zon files fit with the “struct as key-value pairs” pattern that is decently common in Zig.

If I’m understanding both right, we have a situation where calling an API and defining a configuration file for software using that API can reasonably share syntax without the complexity of either exploding.

I.e., if the API is defined as accepting an “options” struct, and that struct has (possibly nested structs with) “enum-like” decl literals like .default, then anything up to and including the top level can be replaced with those “enum-likes” in a configuration .zon file as well.

Now developer-users who are familiar with configuring your software can crack open the API and find everything immediately familiar. For systems software like Curl or OpenSSH, which have both users and developers working with them, I think that is a super neat feature.
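
A small sketch of the Zig side of this pattern, assuming Zig 0.14 decl literals. The hypothetical Options struct exposes default as a declaration, and its nested Log type exposes quiet and loud, so call sites can write .default or .loud wherever the result type is known. All names are made up, and whether a given .zon parser resolves such names the same way is not shown here.

```zig
const std = @import("std");

/// Hypothetical options struct for some API.
const Options = struct {
    retries: u8,
    timeout_ms: u32,
    log: Log,

    pub const Log = struct {
        level: u2,

        pub const quiet: Log = .{ .level = 0 };
        pub const loud: Log = .{ .level = 3 };
    };

    /// Decl literal: `.default` resolves to `Options.default` wherever the
    /// expected type is `Options`. The nested `.quiet` resolves to `Log.quiet`.
    pub const default: Options = .{ .retries = 3, .timeout_ms = 1000, .log = .quiet };
};

/// Hypothetical API entry point taking an options struct.
fn totalWaitMs(options: Options) u32 {
    return options.timeout_ms * (@as(u32, options.retries) + 1);
}
```

A caller can pass `totalWaitMs(.default)`, or spell out `.{ .retries = 0, .timeout_ms = 50, .log = .loud }` and still use a decl literal for the nested field.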

This is just my opinion (defaulting to unmanaged :thinking: , and planning to get rid of managed :anxious_face_with_sweat: :scream:)
The only good reason is the reduction in SIZE.
The first one mentioned by @Retro_Dev doesn’t push us to get rid of the current behavior. Currently, if someone wants that, they can use the unmanaged version, which is clear from its name. Zig aims to be good (the best) at low-level programming, but its usage is very wide. That change may alienate a considerable portion of its users.

  • Someone already mentioned the risk of errors. I think it’s very high. When doing “clever” things, getting it right may be much harder.
  • Same for PROTOTYPING and other usages. It’s true the std should mostly contain the base (it’s a good one already), but it should also be convenient to work with. One should be able to COMPOSE common functions, types,… QUICKLY.
  • (Simple) code may be harder to read and write, and not much easier to understand.

As it seems, the BEST course of action is to KEEP the managed versions, even if they are renamed. One shouldn’t have to reinvent…

Isn’t that 16 + N * x vs. 16 * N + N * x, with N being the number of instances and x their average size?
For that use case unmanaged is the best option, but 16 * N can still be “negligible” if x is big enough.
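
The 16 in that formula is the size of the stored std.mem.Allocator: a type-erased pointer plus a vtable pointer, 16 bytes on a 64-bit target. A quick check, assuming Zig 0.14, where std.ArrayList is still the managed variant. headerBytes is a hypothetical helper that counts only list headers, not the N * x payload.

```zig
const std = @import("std");

// The managed list is the unmanaged list plus one stored Allocator
// (a type-erased pointer and a vtable pointer).
comptime {
    std.debug.assert(@sizeOf(std.mem.Allocator) == 2 * @sizeOf(usize));
    std.debug.assert(@sizeOf(std.ArrayList(u8)) ==
        @sizeOf(std.ArrayListUnmanaged(u8)) + @sizeOf(std.mem.Allocator));
}

/// Header bytes for `n` lists: one Allocator per list (managed) versus a
/// single Allocator shared by all of them (unmanaged). Payload is excluded.
fn headerBytes(n: usize, managed: bool) usize {
    const list = @sizeOf(std.ArrayListUnmanaged(u8));
    const alloc = @sizeOf(std.mem.Allocator);
    return if (managed) n * (list + alloc) else n * list + alloc;
}
```

On a 64-bit target, 100 lists cost 4000 header bytes managed versus 2416 with one shared allocator; the difference is exactly 16 * (N - 1).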

The unmanaged API is more explicit and versatile; those are also good reasons.

Zig values your ability to do what you want/need over preventing mistakes, though the latter is still valuable.
I find it rare to have more than 2 allocators in scope; that’s easy to keep track of, and it should be one of the first things you check when you have weird memory issues, regardless of whether the container is managed or not.

Favoring reading code over writing code is one of Zig’s philosophies; regardless, typing code is usually not the bottleneck of development.

If you find the unmanaged API harder to understand, that indicates a lack of understanding of how dynamic collections fundamentally work.

The argument about size is less about total memory usage (though that can be important sometimes) and more about efficient cache usage, which can have a massive effect on performance.

At the end of the day, your opinion is subjective; if you want to convince the Zig team, you’re going to have to find arguments that haven’t already been addressed.

I would like to mention that there is also a reduction in complexity.
Every data structure in the standard library has two variants, so now you have to decide which one to use. Is this a case where I need to save the extra bytes? Are my lists significantly larger than the 40 bytes of management data? And note that even at 100 u8’s, the reduction in size from going unmanaged is over 10% (140 vs. 124 bytes on a 64-bit target, since the unmanaged header is 24 bytes).
Zig tries to make that decision for us here; to quote the Zen of Zig: “Only one obvious way to do things.”

Furthermore, the choice to have both means that both versions have to be maintained and updated together, and every new data structure needs to be created twice. And if the standard library makes the choice to give us both versions, then third-party libraries will be held to the same standard. Would you rather have library authors spend time maintaining two versions of the same data structure, or spend time fixing bugs and adding more features?

Now, I agree that using the managed variants is more convenient, but most of the time you know the allocator, so is it really that hard to pass it to a few more append calls, or, even better, to just reserve everything up front?
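
A minimal sketch of the “reserve everything up front” style, assuming Zig 0.14’s std.ArrayListUnmanaged. The allocator appears once for the reservation, and the loop itself needs neither an allocator nor error handling. The squares function is a made-up example.

```zig
const std = @import("std");

/// Hypothetical example: builds the squares of 0..n-1. The allocator is used
/// once to reserve exact capacity; the loop then cannot fail or reallocate.
fn squares(gpa: std.mem.Allocator, n: u32) !std.ArrayListUnmanaged(u64) {
    var list: std.ArrayListUnmanaged(u64) = .empty;
    try list.ensureTotalCapacityPrecise(gpa, n);
    var i: u64 = 0;
    while (i < n) : (i += 1) list.appendAssumeCapacity(i * i);
    return list;
}
```

The caller still owns the result and frees it with the same allocator: `defer list.deinit(gpa);`.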

I wasn’t talking about the collections themselves, but about the source code. The current std.ArrayList, despite its flaws, has very convenient semantics:

  1. I will create a span of memory for you, and give you handles over it (append, free_retain_capacity, …).
  2. I may move the span if necessary (only trust .items and an index…).
  3. When you are done tell me, and I will release the span.
  • Most of the time you can assume the allocator stays the same (it’s rare to see list.allocator = new_allocator, and that is definitely more error-prone than passing it around). This is not the case when passing it around.
  • The allocator doesn’t only decide how to allocate but also where. For example, I often use std.heap.FixedBufferAllocator to store a file path ([1024]u8, where a zero marks the end, with functions like getSlice or getSliceSentinel → [:0]const u8). For more complex use cases, you can pass it to a temporary std.ArrayList during init to get a more featureful interface (“handles”) than a plain allocator, always referring to the same span.
  • Suppose you want to pass the *std.ArrayList to a subroutine (a writer is not as versatile, and if there is no space the ArrayList will error anyway); you will also need to pass the allocator. Which again raises the questions: Is it the same one that was first used? Is it correct if they differ?

They are not exactly the same. As previously stated, even when allocating up front, the unmanaged semantics are “Each time you want me to do something, tell me how (and, for a FixedBufferAllocator, where).” This is GREAT, but still, some newcomers often want a faster, more typed, lower-level JavaScript/Python, or a friction-less Rust. In those, an array list is far more trivial…

Anyway, I understand the change. I’m just thinking the managed behavior is still nice to have (maybe with a way to share allocators, e.g. .init(shared_allocator: *std.mem.Allocator)).
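
A sketch of the FixedBufferAllocator case from the bullets above, assuming Zig 0.14. The hypothetical joinPath builds a path inside a caller-provided buffer. With the unmanaged list, the allocator (and therefore where the bytes live) is named at every call that may allocate. Note that list growth over-reserves capacity, so the buffer needs headroom beyond the final path length.

```zig
const std = @import("std");

/// Hypothetical path builder: the list's bytes live in `buf`, because the
/// FixedBufferAllocator decides *where* memory comes from.
fn joinPath(buf: []u8, dir: []const u8, name: []const u8) ![]const u8 {
    var fba = std.heap.FixedBufferAllocator.init(buf);
    const gpa = fba.allocator();
    var path: std.ArrayListUnmanaged(u8) = .empty;
    // The same allocator must be passed on every call that may (re)allocate.
    try path.appendSlice(gpa, dir);
    try path.append(gpa, '/');
    try path.appendSlice(gpa, name);
    // Valid after return: the bytes are inside the caller's buffer.
    return path.items;
}
```

If the path does not fit, the appends fail with error.OutOfMemory instead of touching the heap.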

this is an incoherent statement. We are clearly talking about managed vs unmanaged containers.

these apply to both managed and unmanaged.

Regardless of whether the allocator implementations allow such a thing, this is extremely sketchy. The correct thing to do is to make a new list, copy the contents, and deinit the old one.
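
A minimal sketch of that approach with std.ArrayListUnmanaged (Zig 0.14): clone copies the items into memory from the new allocator, and only then is the old buffer freed with the allocator that created it. The moveToAllocator name is hypothetical.

```zig
const std = @import("std");

/// Hypothetical helper: moves a list's contents to another allocator by
/// copying them into a new list and freeing the old one, instead of
/// reassigning an allocator on a live list.
fn moveToAllocator(
    comptime T: type,
    list: *std.ArrayListUnmanaged(T),
    old_gpa: std.mem.Allocator,
    new_gpa: std.mem.Allocator,
) !void {
    // Copy first: if this fails, the original list is left untouched.
    const copy = try list.clone(new_gpa);
    list.deinit(old_gpa);
    list.* = copy;
}
```

After the call, the list must be freed with `new_gpa`, so each allocator only ever frees memory it handed out.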

again, applies to both managed and unmanaged.

Yes, you have to use the same allocator.
The writer interface can’t be as versatile as the array list (unmanaged) API, since the writer doesn’t know what it’s writing to, whereas the array list does.

you are repeating points we already answered.

What do you mean by this? You understand you don’t have to make an allocator for each allocation, right?
