This is intended to be modeled after the components in the standard library and to offer the idioms and behavior you would expect from a well-performing container type.
I am standing on the shoulders of giants, however, as this is built on top of AutoHashMap, with the clear intention of providing all the common set operations you would expect in other languages.
I welcome any and all constructive feedback! Contributions are also welcome and encouraged!
Very, very nice! I was looking at the code and see that many methods require iterating over the elements. I recalled that the doc comments for std.HashMap state:
/// If iterating over the table entries is a strong usecase
/// and needs to be fast, prefer the alternative
/// `std.ArrayHashMap`.
So I wonder if std.ArrayHashMap would work and even provide better performance in this case?
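To illustrate the idea (this is just a sketch against the stdlib, not the library's own API, and it assumes a Zig version that still has the managed `AutoArrayHashMap`): the array-backed map keeps its keys in a contiguous slice, so iterating it walks dense memory instead of probing a sparse table.

```zig
const std = @import("std");

// Sketch: AutoArrayHashMap stores keys in a contiguous array,
// so iterating a set backed by it is just a slice traversal.
test "iterate an ArrayHashMap-backed set" {
    var set = std.AutoArrayHashMap(u32, void).init(std.testing.allocator);
    defer set.deinit();

    try set.put(1, {});
    try set.put(2, {});
    try set.put(3, {});

    // keys() is a plain slice in insertion order -- cheap to iterate.
    var sum: u32 = 0;
    for (set.keys()) |k| sum += k;
    try std.testing.expectEqual(@as(u32, 6), sum);
}
```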
If I have the right person, I’ve literally watched probably around 20 of your Zig videos on YouTube, and even some a few times over. What a fantastic series you have.
You bring up a valid point about considering ArrayHashMap as the backend. In fact, I need to investigate the viability of having the user of the ZiglangSet select which backend they want to use depending on their needs.
One thing that is usually true of set data structures is that workloads tend to favor membership checks, which implies lookups, which is why I started out with HashMap as the backend.
It’s true that a handful of methods do need to iterate, which usually happens during the set-composition stages of your code. But after that stage it’s somewhat expected that you reach stability, where your sets are now mostly static.
From there you would basically do membership checks going forward. But, I will say this may not be true for every use case out there. Perhaps some use cases favor iteration in their workload. In that case the ArrayHashMap might be the way to go so I’ll investigate this.
And seriously thank you for the encouraging words!
Hey, there’s only one Dude the Builder. lol Thank you for the positive feedback on the videos. I’m currently starting a new series on Zig that will hopefully try to keep up with the latest changes in the language. Let’s see how it goes.
I think being able to switch the backend hash map type is an excellent idea. Making the ZiglangSet adaptable to specific use cases is powerful indeed. And given how those data structures are designed in the standard library, their APIs are very similar, so it could be a drop-in replacement in many cases.
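Since the two map types expose nearly identical methods, one way this could look (a hypothetical sketch, not the actual ZiglangSet code, assuming a Zig version with the managed map types) is a comptime backend parameter, with the set code written once against the shared API:

```zig
const std = @import("std");

// Hypothetical sketch: the backend map constructor is a comptime
// parameter, so the same set code compiles against either
// std.AutoHashMap or std.AutoArrayHashMap.
fn Set(comptime E: type, comptime Backend: anytype) type {
    return struct {
        map: Backend(E, void),

        const Self = @This();

        pub fn init(allocator: std.mem.Allocator) Self {
            return .{ .map = Backend(E, void).init(allocator) };
        }

        pub fn deinit(self: *Self) void {
            self.map.deinit();
        }

        pub fn add(self: *Self, item: E) !void {
            try self.map.put(item, {});
        }

        pub fn contains(self: Self, item: E) bool {
            return self.map.contains(item);
        }
    };
}

test "same set code over either backend" {
    var a = Set(u32, std.AutoHashMap).init(std.testing.allocator);
    defer a.deinit();
    var b = Set(u32, std.AutoArrayHashMap).init(std.testing.allocator);
    defer b.deinit();

    try a.add(42);
    try b.add(42);
    try std.testing.expect(a.contains(42));
    try std.testing.expect(b.contains(42));
}
```

This works because `put` and `contains` have the same shape on both managed map types; only the iteration-heavy methods would need backend-aware code paths.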
I hate those with a passion. I think they are a huge mistake. That is a single branch that is going to be properly predicted every time it matters (and if it isn’t, you are already going to allocate, so it doesn’t matter in that case).
It shows a poorly designed API and a kitchen-sink approach.
It does not help performance over just writing better code, and it doubles the API size.
They don’t help performance one bit. The single capacity test you are bypassing is going to be predicted properly every time it matters, so it only costs the instruction slot. The times it isn’t predicted (when you do need to grow), the cost of growing the structure will swamp it 10,000-fold.
It’s optimizing for the wrong thing and papers over API problems. I work on ultra-high-performance code daily and we never do that stuff (nanoseconds matter). Spend time on better cache usage, bulk operations, almost anything else, and you will get a better return.
I added this for consistency with the APIs in the Zig stdlib, because I’m erring on the side of consistency.
But I see this function as a micro-optimization for when you’ve reserved a large capacity up front and are then, perhaps in a tight loop, adding items on each iteration. In that case you get rid of capacity checks entirely.
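For reference, the pattern described above looks like this with the stdlib map directly (names from std, not from ZiglangSet, assuming a Zig version with the managed `AutoHashMap`): reserve once, then use the `AssumeCapacity` variant inside the hot loop so no per-iteration capacity check or allocation can occur.

```zig
const std = @import("std");

// Sketch: reserve capacity up front, then use the AssumeCapacity
// variant in the loop so insertion cannot fail or reallocate.
test "reserve up front, then putAssumeCapacity in the loop" {
    var map = std.AutoHashMap(u32, void).init(std.testing.allocator);
    defer map.deinit();

    const n: u32 = 1000;
    try map.ensureTotalCapacity(n);

    var i: u32 = 0;
    while (i < n) : (i += 1) {
        // Cannot fail: capacity was guaranteed above.
        map.putAssumeCapacity(i, {});
    }

    try std.testing.expectEqual(n, map.count());
}
```

The trade-off both sides are debating is exactly this: the loop body loses a branch and an error path, at the cost of a second, unchecked entry point in the API.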
But a few more points on this.
If people really care they should benchmark and prove out the performance for their workload.
One could argue that any hash-based data structure is not super cache-friendly to begin with. But hash maps are one of the best data structures ever invented, and like everything else in life, it’s all about trade-offs.