# Floating Point Data XOR and Integer Delta Compression

There are two algorithms I am interested in experimenting with: integer delta compression and floating point XOR compression. Partly to understand the algorithms at a low level, but also to learn more about Zig!
I was reading the Facebook Gorilla paper, which outlines integer delta compression (4.1.1) and XOR floating point compression (4.1.2) and references another paper,
“Fast Lossless Compression of Scientific Floating-Point Data”. This brings me to two initial questions:

1. How do I convert a u64 to an array of bits (`[]const u1`)?
2. How do I XOR two floats (`f1 ^ f2`)?

I have been searching for a function that takes a runtime integer (hoping it was in std.mem) and returns the bits, but haven’t had any luck.

``````zig
fn i64XorDelta(first: u64, second: u64) []const u1 {
    return std.mem.something([]const u1, first - second);
}
``````

Second, I noticed XOR only works on integer types. Is there something obvious I am missing on how to XOR a float?
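For reference, the integer delta half (4.1.1) on its own seems simple enough; here is my rough understanding of just the differencing step, ignoring Gorilla’s variable-length encoding of the deltas (`deltaEncode` is just a name I made up):

``````zig
const std = @import("std");

// Rough sketch of the delta step from Gorilla 4.1.1: keep the first
// timestamp and then store only the differences between neighbors.
// Assumes the timestamps fit in an i64 after conversion.
fn deltaEncode(values: []const u64, out: []i64) void {
    var prev: u64 = 0;
    for (values) |v, i| {
        out[i] = @intCast(i64, v) - @intCast(i64, prev);
        prev = v;
    }
}
``````

It is the bit-level part of both algorithms that I am stuck on.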

Thanks!


1. I’m not sure what the best option is, and it probably depends on the specifics of what you want to do, but take a look at the types in `std.bit_set` and `std.io.BitReader`. That said, the most straightforward option is to just use masking on the original value, maybe by creating some helper functions to make things clearer.

2. You first need to `@bitCast` them to an integer value of the appropriate size, do the XOR, and then bit cast back to a float.

``````zig
const std = @import("std");

fn xorf32(f1: *f32, f2: *f32) f32 {
    var a: u32 = 0;
    var b: u32 = 0;

    var uptr: *u32 = @ptrCast(*u32, f1);
    a = uptr.*;
    uptr = @ptrCast(*u32, f2);
    b = uptr.*;
    var c: u32 = a ^ b;
    var fptr: *f32 = @ptrCast(*f32, &c);
    return fptr.*;
}

pub fn main() void {
    var f1: f32 = 0.5;
    var f2: f32 = 0.5;
    std.debug.print("{}\n", .{xorf32(&f1, &f2)}); // prints 0.0e+00
}
``````
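And for the first question, a small masking helper is often all you need; something along these lines (`bitAt` is a made-up name, not anything in std):

``````zig
const std = @import("std");

// Read a single bit of a u64 by shifting it down and truncating.
// The u6 index type covers exactly the valid shift amounts 0..63.
fn bitAt(value: u64, index: u6) u1 {
    return @truncate(u1, value >> index);
}
``````

So `bitAt(12, 2)` is 1 and `bitAt(12, 1)` is 0, with no array or allocation involved.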
1 Like

Yes, `@bitCast()` gives more compact (source) code; I just did not know about it. My example was C-style bit casting.

…and with a little `comptime` sugar to make it generic (at least for some floats):

``````zig
const std = @import("std");

fn floatXor(comptime F: type, a: F, b: F) F {
    const I = switch (F) {
        f16 => u16,
        f32 => u32,
        f64 => u64,
        else => @compileError("Unsupported input type."),
    };

    return @bitCast(F, @bitCast(I, a) ^ @bitCast(I, b));
}

pub fn main() void {
    inline for ([_]type{ f16, f32, f64 }) |T| {
        std.log.info("{}", .{floatXor(T, @as(T, 0.5), @as(T, 0.5))});
    }
}
``````
5 Likes

Thanks, this is great. I am not used to using comptime yet; I definitely need to practice it.
Next I am trying to figure out how to pack the bits. I think I should have a switch statement for each integer bit range so the bits can be efficiently packed into an array of data.

Pseudocode (I think I want `u1` instead of `u8`):

``````zig
const std = @import("std");

fn floatXor(comptime F: type, first: F, second: F) F {
    const I = switch (F) {
        f16 => u16,
        f32 => u32,
        f64 => u64,
        else => @compileError("Unsupported input type."),
    };
    return @bitCast(F, @bitCast(I, first) ^ @bitCast(I, second));
}

test "test XOR f64" {
    try std.testing.expectEqual(floatXor(f64, 1.0, 1.0), 0.0);
    try std.testing.expectEqual(floatXor(f64, 0.0, 1.0), 1.0);
}

fn bitify(comptime N: type) []const u8 {
    const I = switch (N) {
        u8 => u8,
        u16 => u16,
        u32 => u32,
        u64 => u64,
        f16 => u16,
        f32 => u32,
        f64 => u64,
        else => @compileError("Unsupported input type."),
    };
    // Minimize the size of the integer
    return switch (I) {
        0...std.math.pow(usize, 8, 2) => std.bit_set.IntegerBitSet(@truncate(u8, I)),
        std.math.pow(usize, 8, 2) + 1...std.math.pow(usize, 16, 2) => std.bit_set.IntegerBitSet(@truncate(u16, I)),
        else => std.bit_set.IntegerBitSet(@truncate(u32, I)),
    };
}

test "bitify number" {
    try std.testing.expectEqual(bitify(0), []u8{0});
}
``````

Wondering if I am on the right track or should try something totally different?

`std.bit_set` has several types of bit sets that cater to different needs, but I think for most cases, the static non-allocating bit sets it offers are really good solutions. They let you think about the bit set at a more abstract level in terms of number of elements max and which ones are set, unset, and the set count. For example:

``````zig
const size = @bitSizeOf(N);

var bits = std.bit_set.StaticBitSet(size).initEmpty();

// From then on, it's just a matter of setting, clearing, and counting.
std.log.info("bits set: {}", .{bits.count()});
bits.set(10);
std.log.info("bits set: {}", .{bits.count()});
std.log.info("bit 10?: {}", .{bits.isSet(10)});
// There's `unset`, `setValue`, `toggle` methods etc.
``````

So at that level, you wouldn’t have to do all the low-level bit flipping yourself.
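For the Gorilla XOR step (4.1.2) specifically, the `@clz` and `@ctz` builtins do most of the heavy lifting; a minimal sketch (`meaningfulBits` is a made-up helper, and the paper’s control-bit encoding is left out):

``````zig
const std = @import("std");

// Gorilla 4.1.2 idea: XOR consecutive values, then record only the
// "meaningful" bits between the leading and trailing zeros.
fn meaningfulBits(prev: f64, curr: f64) struct { leading: u7, trailing: u7 } {
    const x = @bitCast(u64, prev) ^ @bitCast(u64, curr);
    if (x == 0) return .{ .leading = 64, .trailing = 0 }; // identical values
    return .{ .leading = @clz(u64, x), .trailing = @ctz(u64, x) };
}
``````

For 1.0 and 1.5 the XOR differs only in bit 51, so you get 12 leading and 51 trailing zeros, and only the single meaningful bit needs storing.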

I would mention that in situations where small (integer) values are more probable various codes (Elias, Golomb, Rice…) are also more or less frequently used. One example is flac codec which uses Rice coding for LPC output.
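To illustrate, Rice coding with parameter k is only a few lines (a minimal sketch; `riceEncode` is a made-up name, and real codecs choose k adaptively and handle signed residuals):

``````zig
const std = @import("std");

// Rice code: write n >> k in unary (q ones then a zero), followed by
// the low k bits of n in binary, most significant bit first.
fn riceEncode(comptime k: u4, n: u64, out: []u1) usize {
    const q = n >> k;
    var pos: usize = 0;
    while (pos < q) : (pos += 1) out[pos] = 1; // unary quotient
    out[pos] = 0; // terminator
    pos += 1;
    var i: usize = 0;
    while (i < k) : (i += 1) { // k-bit remainder
        out[pos] = @truncate(u1, n >> @intCast(u6, k - 1 - i));
        pos += 1;
    }
    return pos; // number of bits written
}
``````

For example, with k = 2 the value 9 encodes as `1 1 0 0 1`: quotient 2 in unary, then the remainder 01.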

I seem to be getting a SIGSEGV error. Is there a good way to debug this?

``````
zig test xor.zig
fish: Job 1, 'zig test xor.zig' terminated by signal SIGSEGV (Address boundary error)
``````

Based on this code:

``````zig
const std = @import("std");

fn floatXor(comptime F: type, first: F, second: F) F {
    const I = switch (F) {
        f16 => u16,
        f32 => u32,
        f64 => u64,
        else => @compileError("Unsupported input type."),
    };
    return @bitCast(F, @bitCast(I, first) ^ @bitCast(I, second));
}

test "test XOR f64" {
    try std.testing.expectEqual(floatXor(f64, 1.0, 1.0), 0.0);
    try std.testing.expectEqual(floatXor(f64, 0.0, 1.0), 1.0);
}

fn bitify(comptime N: type) []const u8 {
    const size = @bitSizeOf(N);
    var bits = std.bit_set.StaticBitSet(size).initEmpty();
    bits.set(N);
}

test "bitify number" {
    //try std.testing.expectEqual(bitify(0), []u8{});
}
``````

Wondering if it has to do with the build cache or something.

With regards to turning an unsigned int into an array of bits, here’s a possible solution:

``````zig
fn bitlify(n: u8) [8]u1 {
    var out = [_]u1{0} ** 8;

    var i: usize = 0;
    while (i < 8) : (i += 1) {
        if (n & @as(u8, 1) << @intCast(u3, i) != 0) out[i] = 1;
    }

    return out;
}

test "bitlify number" {
    try std.testing.expectEqual(bitlify(12), [_]u1{ 0, 0, 1, 1, 0, 0, 0, 0 });
}
``````

Making it fully generic would be a bit more work, mainly determining the size of the input type, which in turn determines the types of the left- and right-hand sides of the shift operation and the size of the output array.
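A sketch of what that generic version might look like (`bitlifyAny` is a made-up name), with `std.math.Log2Int` deriving the shift amount type and `@bitSizeOf` sizing the output array:

``````zig
const std = @import("std");

// Generic variant of bitlify: works for any unsigned integer type by
// computing the bit count and shift type from the input type.
fn bitlifyAny(comptime T: type, n: T) [@bitSizeOf(T)]u1 {
    const size = @bitSizeOf(T);
    var out = [_]u1{0} ** size;
    var i: usize = 0;
    while (i < size) : (i += 1) {
        if (n & @as(T, 1) << @intCast(std.math.Log2Int(T), i) != 0) out[i] = 1;
    }
    return out;
}
``````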

1 Like

You are returning a pointer (a slice to be precise) to a local variable in `bitify` that becomes garbage as soon as the function returns.

I have been poking around trying to code this up and learn a bit of Zig along the way. I seem to be getting different results based on a print statement: line 21 of the code below (the debug print marked XXX) seems to affect the test results, which is surprising. Is there something I am missing?

``````zig
const std = @import("std");
const maxInt = std.math.maxInt;

pub fn bitify(comptime I: type, n: I) []const u1 {
    var zeroOffset: usize = 0;
    if (n == 0) {
        const b = [_]u1{0};
        return b[zeroOffset..1];
    } else if (n == 1) {
        const b = [_]u1{1};
        return b[zeroOffset..1];
    }
    const nsize = @bitSizeOf(I);
    var bts = [_]u1{0} ** nsize;
    var i: usize = 0;

    var num = n;
    while (i < nsize) : (i += 1) {
        const bit = @intCast(u1, num % 2);
        // XXX: Comment out this print statement and interesting things happen
        std.debug.print("Casting {d} num is: {d} bit is: {d}\n", .{ i, num, bit });
        bts[nsize - (i + 1)] = bit;
        num = num / 2;
        if (bit == 1) {
            zeroOffset = nsize - (i + 1);
        }
    }
    std.debug.print("Value {d} set bit: {d} bits are: {d}\n", .{ n, zeroOffset, bts });
    // Drop leading 0's
    var respBits: []const u1 = bts[zeroOffset..bts.len];
    std.debug.print("Response bits are: {d}\n", .{respBits});
    return respBits;
}

fn u1SliceEqual(name: []const u8, first: []const u1, second: []const u1) bool {
    if (first.len != second.len) {
        std.debug.print("Test {s} bit lengths do not match {d} != {d}\n", .{ name, first.len, second.len });
        return false;
    }
    for (first) |b, i| {
        if (second[i] != b) {
            std.debug.print("For test {s}: index {d} first {d} second {d} not equal first {d} second {d}\n", .{ name, i, first, second, b, second[i] });
            return false;
        }
    }
    return true;
}

pub fn bits(comptime I: type, n: I) []const u1 {
    const b = switch (I) {
        u8, u16, u32, u64, u128 => {
            if (n <= maxInt(u8)) {
                return bitify(u8, @intCast(u8, n));
            } else if (n > maxInt(u8) and n <= maxInt(u16)) {
                return bitify(u16, @intCast(u16, n));
            } else if (n > maxInt(u16) and n <= maxInt(u32)) {
                return bitify(u32, @intCast(u32, n));
            } else if (n > maxInt(u32) and n <= maxInt(u64)) {
                return bitify(u64, @intCast(u64, n));
            } else {
                return bitify(u128, @intCast(u128, n));
            }
        },
        else => @compileError("Not a number type"),
    };
    return b;
}

test "test 2^n bits" {
    var allocator = std.testing.allocator;
    var i: usize = 1;
    while (i < 128) {
        var array = std.ArrayList(u1).init(allocator); //.initCapacity(allocator, i + 1);

        var j: usize = 0;
        while (j < i) {
            try array.append(1);
            j += 1;
        }
        var v: u128 = @intCast(u128, std.math.pow(u128, 2, i) - 1);

        var bitArray = bits(u128, v);
        std.debug.print("\nValue {d} bit array {d} length {d} slice {d}\n", .{ v, bitArray, i, array.items });
        try std.testing.expect(u1SliceEqual("u128 v", bitArray, array.items));
        i += 1;
        array.deinit();
    }
}

test "bitify number" {
    const array12 = [_]u1{ 1, 1, 0, 0, 0, 0 };
    var zero: usize = 0;
    const slice12 = array12[zero..array12.len];

    try std.testing.expect(u1SliceEqual("u8 12", bitify(u8, 12), slice12));
}
``````

I have been wondering if it has something to do with not allocating the bit arrays correctly?

As @kristoff mentioned, you are returning a slice to stack-allocated memory, which becomes garbage later on when you call another function (like `debug.print`). Unlike garbage-collected languages such as Go that let you return slices without thinking about where the bytes live, in Zig you always have to ask yourself "Where are the bytes?".

A function that returns a slice / pointer should be a red flag. It should either receive the memory as a parameter, slice from there, and return that, or allocate on the heap and return that slice.

In your functions, you are creating arrays on the function's temporary stack frame and returning slices of that memory. That produces inconsistent behavior: when you later dereference the pointer (a slice is a pointer), you may get lucky and the original bytes are still there, or another function call may already have clobbered them, so you get weird values back. In a language like Go, escape analysis detects when you're returning stack memory and automatically heap-allocates, producing a valid slice / pointer to that memory.
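A sketch of the first option (`bitifyInto` is a made-up name): the caller owns the buffer, so the returned slice stays valid after the function returns:

``````zig
const std = @import("std");

// The caller passes the buffer in, so no stack memory escapes.
// Bits are stored most significant first, matching the earlier examples.
fn bitifyInto(n: u64, buf: *[64]u1) []const u1 {
    var i: usize = 0;
    while (i < 64) : (i += 1) {
        buf[63 - i] = @truncate(u1, n >> @intCast(u6, i));
    }
    return buf[0..];
}
``````

Called as `var buf: [64]u1 = undefined; const myBits = bitifyInto(12, &buf);`, the slice is backed by the caller's stack frame, which outlives the call.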

1 Like

Thanks @dude_the_builder and @kristoff. A bit of learning about `defer` and `errdefer` was helpful in sorting out the memory issues. Next I am going to work on accumulating the `[]u1` values and serializing them to disk.
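In case it helps with the serialization step, `std.io.bitWriter` packs individual bits into bytes for you; a minimal sketch (`packBits` is a made-up helper, and big-endian bit order is an assumption here):

``````zig
const std = @import("std");

// Stream a slice of bits through a BitWriter; the final partial byte
// is zero-padded by flushBits.
fn packBits(bits: []const u1, writer: anytype) !void {
    var bw = std.io.bitWriter(.Big, writer);
    for (bits) |b| try bw.writeBits(b, 1);
    try bw.flushBits();
}
``````

Writing `{ 1, 0, 1 }` this way produces the single byte `0b10100000`, which can go straight to a file writer.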

1 Like