Typed BitSet

ericlang · June 7, 2025, 2:52pm

I needed an internal “typed” bitset. So I created this thing, which is basically a naked version of std_bitset.IntegerBitSet. Max int that can be used = u64.

I “supertyped” it:

CountInt is only used for the popcount
BitInt is exactly the right size.

Any comments? (except my braces on new line, that’s personal)
Too bad I have to truncate tthe result of @ctz

pub fn BitSet(comptime size: u7) type
{
    comptime if (size == 0) @compileError("BitSet size 0 not supported");
    comptime if (size > 64)  @compileError("BitSet size > 64 not supported");

    return packed struct
    {

        const Self = @This();
        pub const zeroes: Self = .{};

        /// The integer type used to represent a mask in this bit set. For BitSet(64) this will be a u64.
        pub const MaskInt = std.meta.Int(.unsigned, size);

        /// The integer type used to represent a bit number in this bit set. For BitSet(u64) this will be a u6.
        pub const BitInt = std.math.Log2Int(MaskInt);

        const one_bigger = std.meta.Int(.unsigned, size + 1);

        /// The integer type used to represent counting of bits. For BitSet(64) this will be a u7.
        pub const CountInt = std.math.Log2Int(one_bigger);

        // /// The bit mask, as a single integer
        mask: MaskInt = 0,

        pub fn init(mask: MaskInt) Self
        {
            return .{ .mask = mask };
        }

        pub fn is_set(self: Self, bit: BitInt) bool
        {
            return (self.mask & mask_bit(bit)) != 0;
        }

        pub fn set(self: *Self, bit: BitInt) void
        {
            self.mask |= mask_bit(bit);
        }

        pub fn unset(self: *Self, bit: BitInt) void
        {
            self.mask &= ~mask_bit(bit);
        }

        fn mask_bit(bit: BitInt) MaskInt
        {
            return @as(MaskInt, 1) << @as(BitInt, @intCast(bit));
        }

        pub fn count(self: Self) CountInt
        {
            return @popCount(self.mask);
        }

        pub fn iterator(self: Self) Iterator()
        {
            return .{ .bits_remaining = self.mask };
        }

        fn Iterator() type
        {
            return struct
            {
                const IterSelf = @This();

                /// all bits which have not yet been iterated over
                bits_remaining: MaskInt,

                /// Returns tee next set bit.
                pub fn next(self: *IterSelf) ?BitInt
                {
                    if (self.bits_remaining == 0) return null;
                    const next_index = @ctz(self.bits_remaining);
                    self.bits_remaining &= (self.bits_remaining - 1);
                    return @truncate(next_index);
                }
            };
        }
    };
}

Sze · June 7, 2025, 5:00pm

This can be simplified:

        fn mask_bit(bit: BitInt) MaskInt
        {
            return @as(MaskInt, 1) << bit;
        }

ericlang · June 7, 2025, 5:35pm

omg you are right! (again)

swenninger · June 7, 2025, 8:50pm

Is there a specific reason you implemented the iterator as a type function instead of a regular struct?

ericlang · June 7, 2025, 9:38pm

I think you are right. In this case it could be a struct, using the types of that BitSet parent. I will check…

ericlang · June 7, 2025, 9:42pm

Yes you were right.

I can just declare the substruct.

        const Iterator = struct
        {
            const IterSelf = @This();

            /// all bits which have not yet been iterated over
            bits_remaining: MaskInt,

            /// Returns the next set bit.
            pub fn next(self: *IterSelf) ?BitInt
            {
                if (self.bits_remaining == 0) return null;
                const next_index = @ctz(self.bits_remaining);
                self.bits_remaining &= (self.bits_remaining - 1);
                return @truncate(next_index);
            }
        };

Edit: the reason was I copy pasted some stuff from std.bit_set.
And forgot to undress this one

ericlang · June 10, 2025, 2:19pm

Now I run into troubles when replacing a “raw” bitmask (u15) inside a packed struct with this BitSet(15).
When calling a function the compiler complains about alignment.
Is there a way to avoid having to do a @ptrCast(@alignCast(&self.bitset))

I thought the bit twiddling would become easier

const PackedStruct = packed struct {
    // stuff
    bitset: BitSet(15),
    // more stuff
    fn does_not_compile(self: *PackedStruct, pos: u4) void {
        self.bitset.set(pos); // pointer alignment problem
    }
}

Sze · June 10, 2025, 2:28pm

Your PackedStruct is a normal struct.
This is confusing, what do you actually want to do?

I think this would be easier to answer if you showed the actual error message and what stuff and more stuff is.

ericlang · June 10, 2025, 2:36pm

I edited it to packed struct…
stuff and more stuff are just random fields to mess up the alignment…
I’ll try and produce a fully working code snippet

Sze · June 10, 2025, 2:44pm

I think you need to layout the bitset field so that it starts at a byte address that has the correct alignment.

ericlang · June 10, 2025, 2:59pm

It is not possible to do any aligning or ordering in packed structs… It is done by the compiler.
Minimal example. In the 2nd method I am doing something terribly wrong also…

fn BitSet(size: u7) type {
    return packed struct {
        const Self = @This();
        pub const zeroes: Self = .{};
        pub const MaskInt = std.meta.Int(.unsigned, size);
        pub const BitInt = std.math.Log2Int(MaskInt);
        mask: MaskInt = 0,

        fn set(self: *Self, pos: BitInt) void {
            const one = @as(MaskInt, 1);
            self.mask |= (one << pos);
        }
    };
}

const PackedStruct = packed struct {
    some_stuff: u23 = 0,
    bits: BitSet(15) = .zeroes,

    pub fn does_not_compile(self: *PackedStruct, pos: u4) void {
        self.bits.set(pos); 
        // expected type '*main.BitSet(15)', found '*align(8:23:5) main.BitSet(15)'
        // note: pointer host size '5' cannot cast into pointer host size '0'
        // note: pointer bit offset '23' cannot cast into pointer bit offset '0'
    }

    pub fn does_compile_but_does_not_do_anything(self: *PackedStruct, pos: u4) void {
        const p: *BitSet(15) = @ptrCast(@alignCast(&self.bits));
        p.set(pos);
    }
};

main

    var p: PackedStruct = .{};
    p.does_compile_but_does_not_do_anything(3);
    std.debug.print("{}\n", .{p.bits.mask}); // zero!

Sze · June 10, 2025, 3:04pm

Nope, Language Reference: Packed Struct:

… Packed structs have well-defined memory layout - exactly the same ABI as their backing integer.

Each field of a packed struct is interpreted as a logical sequence of bits, arranged from least to most significant. …

Sze · June 10, 2025, 3:27pm

I think the easiest two options would be to either flatten the PackedStruct so that it contains only primitive types (integers and booleans) instead of nested packed structs that also need to be addressable.
Or make the BitSet 16 bits wide for easier alignment.

With the 15 Bits and nesting I don’t know how to get it aligned properly, with the things I tried.
I think you will either have to give up on those or give up the nesting and instead write it more flat.

Not entirely sure, because I haven’t actually tried this sort of nesting with packed structs that aren’t a multiple of 8 bits, except just now. Maybe there is some way I am not aware of.

ericlang · June 10, 2025, 3:41pm

Yeah it is a complicated case. I need all kinds of sizes in the packed struct and also need it to be small. I only have 6 bits spare.
So I will stick to the raw u15.

Eisenhauer · June 10, 2025, 6:32pm

Two suggestions, if you prefer to keep BitSet:

Write the set-function in an immutable Style, returning a new but modified BitSet, like:

fn set(self: Self, pos: BitInt) Self {
    const one = @as(MaskInt, 1);
    return .{.mask = self.mask | (one << pos),};
}

This makes the calling code a little less ergonomic with self.bits = self.bits.set(pos); instead of just `self.bits.set(pos);. Full Example on Godbolt.

Add variants of your packed structs with the usual (not packed) memory layout. These variants would have all the functions you want. Add additional conversions between the packed and non-packed variants. Your code would change to a “load from packed - work with the not packed variant - store back to packed”-style.