Zroaring — roaring bitmaps for zig

https://codeberg.org/archaistvolts/zroaring

a zig port of CRoaring

roaring bitmaps are a data structure for storing and operating on large sets of 32-bit integers. they’re used in major systems because of great compression and fast set operations.

Documentation

this project is quite early with lots to be done. its a little messy still but i think i’m ok with the current design except some regret that i’ve over engineered my WordBitset with comptime options.

just wanted to share incase others are interested in this. human contributions wanted!

3 Likes

i managed to create a workflow on github so that this project’s docs get auto built and deployed. first time i’ve done that with a zig project. was pretty easy to do really. mirroring from codeberg to github for hosting docs and ci doesn’t seem too bad so far.

1 Like

@awesomo4000

Thanks for your message. I saw your project was inspired by it to try myself. I’m following your validation strategy. Seems like you’ve covered a good bit of ground in your project. So far I’m somewhere between just porting the c code and trying for something simpler.

Some of my project’s c style memory management has been a bit difficult. I’m currently storing each container type as ptr and cardinality, capacity u32s with to/fromArrayList methods. and that works well.

Also like your project, I’m storing Conatiners as tagged pointers. Since each container has equal size, 128 bits, each is align 16 so the first 4 bits of every pointer will be clear. Though I’m only using 2 bits for 4 tags.

I want to add fuzzing and property based tests soon. I think roaring bitmaps are quite interesting and am enjoying learning about them.

Anyway, I’m curious to know how writing rawr went, what you learned and if you have any advice.

More passing validations today. I guess I’ll move on to frozen bitmaps. Hope to pass all validations soon.

Just added frozen_view() to complete an initial frozen de/serialization cycle :slight_smile:.

I think i’m happy now with a single shared allocation which gets cleaned up in deinit(). Better than my first pass with an out pointer the caller needed to free. Here’s how it looks now. And here is deinit.

I’ve struggled before with constructing a MultiArrayList from a buffer. Figured out something that works, seems ok along with MAL.capacityInBytes(). Curious to hear if anyone else has done similar. I’m hoping this will be fast - need start benchmarking soon.

1 Like

I’ve started over with a more from scratch implementation after realizing previous memory management was a bit of a mess. I was struggling to get a testing.checkAllAllocationFailures() test passing. Now its passing :slight_smile:.

I deleted most of my old code. I think I’m happy with this new direction. I’ll still be trying to follow the CRoaring API, but I don’t think porting it was as correct, productive and fun as I’d hoped.

I’m going for more of a PWP style. Right now a Bitmap lives in just 2 allocations, one for a Header - with slices contained in the Header allocated right after the Header bytes. And all container data lives in a single ArrayList(Block).

Much happier with the current deinit(). No more pointer chasing and only 2 free()s. :light_bulb:

1 Like

Still working on this. Ended up unhappy with the previous approach of storing everything in an array list. That also ended up too complex for me.

Now I’m using flexible-struct adapted from https://codeberg.org/ziglang/zig/pulls/30823 and everything lives in 1 allocation (instead of 2). Behold, another happy deinit() with no pointer chasing.

Currently I have most validations working again except for the ‘frozen’ ones. Maybe I’ll finally have some benchmarks vs CRoaring to share soon. :high_voltage: :factory_worker: