What are the tradeoffs of a Treap? (/ why is ``std.Treap`` the way it is?)

markus · June 20, 2024, 11:03am

In a post here I came across someone mentioning std.Treap. I looked it up, did some research, and found it very interesting. I am, however, wondering why the Treap specifically made it into the stdlib and what its tradeoffs / advantages are compared to something like a normal AVL tree.

mnemnion · June 20, 2024, 12:59pm

The treap is used in the GeneralPurposeAllocator to store buckets. Why a treap specifically? Don’t know. It might be that OS pages have a tendency to be ordered, and ordered inputs force maximum rotation on AVLs. So using a random priority would avoid that.

But that’s just a guess.

markus · June 20, 2024, 1:10pm

Thank you already though, that's useful for my understanding of why the treap is in there.

dimdin · June 20, 2024, 3:44pm

The Treap in the standard library is used in three places:

- GPA. From the commit comment that added the Treap:

  > Before this commit, GeneralPurposeAllocator could run into incredibly degraded performance in scenarios where the bucket count for a particular size class grew to be large.
  > …
  > std.Treap is used instead of a doubly linked list for the lists of buckets. This takes the time complexity of searchBucket [used in resize and free] from O(n) to O(log n), but increases the time complexity of insert from O(1) to O(log n)

- Futex. It seems related to the fairness of wake, but I am not sure.
- resinator (Windows resource compiler). Keeps source mapping for #line handling.
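
To make the O(n) versus O(log n) search claim from the GPA commit comment concrete, here is a rough Python sketch that counts the steps needed to find one bucket address. It is only an illustration: the function names and the fake addresses are made up, and a binary search over a sorted Python list stands in for the tree descent (it models the search cost only, not the O(log n) insert a treap provides).

```python
def linear_search_steps(addresses, target):
    """Count the elements visited when scanning a list front to back."""
    steps = 0
    for addr in addresses:
        steps += 1
        if addr == target:
            return steps
    return steps


def binary_search_steps(sorted_addresses, target):
    """Count the probes of a binary search, a stand-in for descending a balanced tree."""
    lo, hi = 0, len(sorted_addresses)
    steps = 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_addresses[mid] == target:
            return steps
        if sorted_addresses[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps


# Hypothetical bucket addresses, one per 4 KiB page.
bucket_addresses = [0x1000 * i for i in range(1024)]
worst_case_target = bucket_addresses[-1]

linear_steps = linear_search_steps(bucket_addresses, worst_case_target)
binary_steps = binary_search_steps(bucket_addresses, worst_case_target)
print(linear_steps, binary_steps)
```

With 1024 buckets the linear scan has to visit every entry to reach the last one, while the halving search needs only about log2(1024) probes.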

markus · June 20, 2024, 7:29pm

Why is a treap superior to other trees here though? Or why else was it used?

AndrewCodeDev · June 20, 2024, 7:31pm

I think your question is more towards tree-heaps in general instead of std.Treap. You can find more information about tree-heap data structures through some direct searches:

markus · June 20, 2024, 7:34pm

Yep, that's true. I didn't know where else to ask, though. Thank you for the link!

squeek502 · June 20, 2024, 8:46pm

Note that std.Treap technically predates all of these usages, but the motivation was for std.Thread.Futex (and for a while Futex was the only usage of Treap within the Zig codebase). The pull request that introduced std.Treap has a nice writeup that contains the motivation:

github.com/ziglang/zig: Introduce std.Treap (ziglang:master ← kprotty:treap, opened 10:01PM - 15 Apr 22 UTC by kprotty, +400 -0)

A [Treap](https://en.wikipedia.org/wiki/Treap) is a randomized binary search tree. The "randomized" part means that, after every update, the shape of the tree is dependent on the RNG's distribution but the height is highly probable to be `log n` where n = unique keys/nodes. The reason for adding a Treap in the standard library over more common self-balancing binary search trees like [AVL](https://en.wikipedia.org/wiki/AVL_tree) and [RedBlack](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree) is that the implementation is iterative, requires similar Node overhead, and (most importantly) is relatively much simpler. The API supports getting the min/max nodes and getting "Entry"s by key lookup or from an existing/inserted node. Inserts, updates, and deletions are all done through an Entry via its method `fn set(self: Entry, new_node: ?*Node)`. Having them all as one function simplifies the API surface. Having it all in the Entry encourages lookup amortization. The primary use case for this is intrusive data structures that require fast (`O(log n)`) association. My plan is to use it for implementing thread wait queues in the userland `std.Thread.Futex` fallback and the timer queue for `std.event.Loop.sleep`; both of which currently employ `O(n)` worst case linked-list traversal out of simplicity.

The reason for adding a Treap in the standard library over more common self-balancing binary search trees like AVL and RedBlack is that the implementation is iterative, requires similar Node overhead, and (most importantly) is relatively much simpler.

I used it for the GeneralPurposeAllocator because it existed in the standard library and had the properties I was looking for:

Any data structure with O(log n) or better search/insert/delete would also work for this use-case.

I initially used a skip list implementation that I wrote for this because I wasn’t aware of std.Treap, but std.Treap slightly outperformed it in my benchmarks and provides all the same benefits.
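
For readers who have not seen a treap before, the idea described in the PR writeup (binary search tree order on keys, heap order on random priorities) can be sketched in a few lines. This is a Python illustration under stated assumptions, not a port of std.Treap: it is recursive where std.Treap is iterative, it is not intrusive, it has no Entry API, and every name in it is made up for the example.

```python
import random


class Node:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority  # random; smaller priority sits closer to the root
        self.left = None
        self.right = None


def rotate_right(node):
    """Lift node.left above node while preserving key order."""
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot


def rotate_left(node):
    """Lift node.right above node while preserving key order."""
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot


def insert(node, key, rng):
    """Insert key as in a plain BST, then rotate to restore the min-heap on priorities."""
    if node is None:
        return Node(key, rng.random())
    if key < node.key:
        node.left = insert(node.left, key, rng)
        if node.left.priority < node.priority:
            node = rotate_right(node)
    elif key > node.key:
        node.right = insert(node.right, key, rng)
        if node.right.priority < node.priority:
            node = rotate_left(node)
    return node  # duplicate keys are ignored


def find(node, key):
    """Ordinary iterative BST lookup; priorities play no role here."""
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node


def height(node):
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def satisfies_invariants(node, lo=None, hi=None):
    """Check BST order on keys and min-heap order on priorities."""
    if node is None:
        return True
    if lo is not None and node.key <= lo:
        return False
    if hi is not None and node.key >= hi:
        return False
    for child in (node.left, node.right):
        if child is not None and child.priority < node.priority:
            return False
    return (satisfies_invariants(node.left, lo, node.key)
            and satisfies_invariants(node.right, node.key, hi))


def build_sorted(n, seed=0):
    """Insert keys 0..n-1 in ascending order, the worst case for an unbalanced BST."""
    rng = random.Random(seed)
    root = None
    for key in range(n):
        root = insert(root, key, rng)
    return root


root = build_sorted(1000)
print("height:", height(root), "valid:", satisfies_invariants(root))
```

Inserting 1000 keys in sorted order into a plain unbalanced BST would produce a chain of height 1000. In the treap the shape is decided by the random priorities rather than the insertion order, so the height is expected to stay within a small multiple of log2(n), and there is no per-node balance bookkeeping beyond the priority itself.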

markus · June 20, 2024, 9:01pm

OH YES this was the answer I was hoping for, case solved.

The first quoted paragraph is it!

squeek502 · June 20, 2024, 9:24pm

Even more context for those curious:

std had a red-black tree implementation at one point but it was deleted and moved to the std-lib-orphanage:

And as somewhat of a proof of the "[Treap] is relatively much simpler" claim, std.rb had bugs. From this PR that wanted to reinstate it for use with an Allocator implementation:

Note that this was marked as a draft due to (quite major) bugs found in std.rb - a rewrite of it is underway.

(that rewrite never materialized)

markus · June 21, 2024, 6:27am

Interesting, where is the stdlib orphanage though?

Cloudef · June 21, 2024, 7:36am