# Floating Point Data XOR and Integer Delta Compression

There are two algorithms I am interested in experimenting with: integer delta compression and floating point XOR compression. Partly to understand the algorithms at a low level, but also to learn more about Zig!
I was reading the Facebook Gorilla paper, which outlines integer delta compression (4.1.1) and XOR floating point compression (4.1.2) and references another paper,
“Fast Lossless Compression of Scientific Floating-Point Data”. This brings me to two initial questions:

1. How do I convert a u64 to an array of bits (`[]const u1`)?
2. How do I XOR two floats (`f1 ^ f2`)?

I have been searching for a function that takes a runtime integer (hoping it was in std.mem) and returns the bits, but haven’t had any luck.

``````zig
fn i64XorDelta(first: u64, second: u64) []const u1 {
    return std.mem.something([]const u1, first - second);
}
``````

Second, I noticed XOR only works on integer types. Is there something obvious I am missing on how to XOR a float?
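For reference, the integer delta half (4.1.1) on its own seems simple enough; here is my rough understanding of just the differencing step, ignoring Gorilla’s variable-length encoding of the deltas (`deltaEncode` is just a name I made up):

``````zig
const std = @import("std");

// Rough sketch of the delta step from Gorilla 4.1.1: keep the first
// timestamp and then store only the differences between neighbors.
// Assumes the timestamps fit in an i64 after conversion.
fn deltaEncode(values: []const u64, out: []i64) void {
    var prev: u64 = 0;
    for (values) |v, i| {
        out[i] = @intCast(i64, v) - @intCast(i64, prev);
        prev = v;
    }
}
``````

It is the bit-level part of both algorithms that I am stuck on.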

Thanks!


1. I’m not sure what the best option is, and it probably depends on the specifics of what you want to do, but take a look at the types in `std.bit_set` and `std.io.BitReader`. That said, the most straightforward option is to just use masking on the original value, maybe by creating some helper functions to make things clearer.

2. You first need to `@bitCast` them to an integer value of the appropriate size, do the XOR, and then bit cast back to a float.

``````zig
const std = @import("std");

fn xorf32(f1: *f32, f2: *f32) f32 {
    var a: u32 = 0;
    var b: u32 = 0;

    var uptr: *u32 = @ptrCast(*u32, f1);
    a = uptr.*;
    uptr = @ptrCast(*u32, f2);
    b = uptr.*;
    var c: u32 = a ^ b;
    var fptr: *f32 = @ptrCast(*f32, &c);
    return fptr.*;
}

pub fn main() void {
    var f1: f32 = 0.5;
    var f2: f32 = 0.5;
    std.debug.print("{}\n", .{xorf32(&f1, &f2)}); // prints 0.0e+00
}
``````
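And for the first question, a small masking helper is often all you need; something along these lines (`bitAt` is a made-up name, not anything in std):

``````zig
const std = @import("std");

// Read a single bit of a u64 by shifting it down and truncating.
// The u6 index type covers exactly the valid shift amounts 0..63.
fn bitAt(value: u64, index: u6) u1 {
    return @truncate(u1, value >> index);
}
``````

So `bitAt(12, 2)` is 1 and `bitAt(12, 1)` is 0, with no array or allocation involved.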
1 Like

Yes, `@bitCast()` gives more compact (source) code; I just did not know about it. My example was C-style bit casting.

…and with a little `comptime` sugar to make it generic (at least for some floats):

``````zig
const std = @import("std");

fn floatXor(comptime F: type, a: F, b: F) F {
    const I = switch (F) {
        f16 => u16,
        f32 => u32,
        f64 => u64,
        else => @compileError("Unsupported input type."),
    };

    return @bitCast(F, @bitCast(I, a) ^ @bitCast(I, b));
}

pub fn main() void {
    inline for ([_]type{ f16, f32, f64 }) |T| {
        std.log.info("{}", .{floatXor(T, @as(T, 0.5), @as(T, 0.5))});
    }
}
``````
5 Likes

Thanks, this is great. I am not used to using comptime yet; I definitely need to practice it.
Next I am trying to figure out how to pack the bits. I think I should have a switch statement for each integer bit range so the bits can be efficiently packed into an array of data.

Pseudocode (I think I want `u1` instead of `u8`):

``````zig
const std = @import("std");

fn floatXor(comptime F: type, first: F, second: F) F {
    const I = switch (F) {
        f16 => u16,
        f32 => u32,
        f64 => u64,
        else => @compileError("Unsupported input type."),
    };
    return @bitCast(F, @bitCast(I, first) ^ @bitCast(I, second));
}

test "test XOR f64" {
    try std.testing.expectEqual(floatXor(f64, 1.0, 1.0), 0.0);
    try std.testing.expectEqual(floatXor(f64, 0.0, 1.0), 1.0);
}

fn bitify(comptime N: type) []const u8 {
    const I = switch (N) {
        u8 => u8,
        u16 => u16,
        u32 => u32,
        u64 => u64,
        f16 => u16,
        f32 => u32,
        f64 => u64,
        else => @compileError("Unsupported input type."),
    };
    // Minimize the size of the integer
    return switch (I) {
        0...std.math.pow(usize, 8, 2) => std.bit_set.IntegerBitSet(@truncate(u8, I)),
        std.math.pow(usize, 8, 2) + 1...std.math.pow(usize, 16, 2) => std.bit_set.IntegerBitSet(@truncate(u16, I)),
        else => std.bit_set.IntegerBitSet(@truncate(u32, I)),
    };
}

test "bitify number" {
    try std.testing.expectEqual(bitify(0), []u8{0});
}
``````

Wondering if I am on the right track or should try something totally different?

`std.bit_set` has several types of bit sets that cater to different needs, but I think for most cases, the static non-allocating bit sets it offers are really good solutions. They let you think about the bit set at a more abstract level in terms of number of elements max and which ones are set, unset, and the set count. For example:

``````zig
const size = @bitSizeOf(N);

var bits = std.bit_set.StaticBitSet(size).initEmpty();

// From then on, it's just a matter of setting, clearing, and counting.
std.log.info("bits set: {}", .{bits.count()});
bits.set(10);
std.log.info("bits set: {}", .{bits.count()});
std.log.info("bit 10?: {}", .{bits.isSet(10)});
// There's `unset`, `setValue`, `toggle` methods etc.
``````

So at that level, you wouldn’t have to do all the low-level bit flipping yourself.
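For the Gorilla XOR step (4.1.2) specifically, the `@clz` and `@ctz` builtins do most of the heavy lifting; a minimal sketch (`meaningfulBits` is a made-up helper, and the paper’s control-bit encoding is left out):

``````zig
const std = @import("std");

// Gorilla 4.1.2 idea: XOR consecutive values, then record only the
// "meaningful" bits between the leading and trailing zeros.
fn meaningfulBits(prev: f64, curr: f64) struct { leading: u7, trailing: u7 } {
    const x = @bitCast(u64, prev) ^ @bitCast(u64, curr);
    if (x == 0) return .{ .leading = 64, .trailing = 0 }; // identical values
    return .{ .leading = @clz(u64, x), .trailing = @ctz(u64, x) };
}
``````

For 1.0 and 1.5 the XOR differs only in bit 51, so you get 12 leading and 51 trailing zeros, and only the single meaningful bit needs storing.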

I would mention that in situations where small (integer) values are more probable various codes (Elias, Golomb, Rice…) are also more or less frequently used. One example is flac codec which uses Rice coding for LPC output.
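To illustrate, Rice coding with parameter k is only a few lines (a minimal sketch; `riceEncode` is a made-up name, and real codecs choose k adaptively and handle signed residuals):

``````zig
const std = @import("std");

// Rice code: write n >> k in unary (q ones then a zero), followed by
// the low k bits of n in binary, most significant bit first.
fn riceEncode(comptime k: u4, n: u64, out: []u1) usize {
    const q = n >> k;
    var pos: usize = 0;
    while (pos < q) : (pos += 1) out[pos] = 1; // unary quotient
    out[pos] = 0; // terminator
    pos += 1;
    var i: usize = 0;
    while (i < k) : (i += 1) { // k-bit remainder
        out[pos] = @truncate(u1, n >> @intCast(u6, k - 1 - i));
        pos += 1;
    }
    return pos; // number of bits written
}
``````

For example, with k = 2 the value 9 encodes as `1 1 0 0 1`: quotient 2 in unary, then the remainder 01.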

I seem to be getting a SIGSEGV error. Is there a good way to debug this?

``````
zig test xor.zig
fish: Job 1, 'zig test xor.zig' terminated by signal SIGSEGV (Address boundary error)
``````

Based on this code:

``````zig
const std = @import("std");

fn floatXor(comptime F: type, first: F, second: F) F {
    const I = switch (F) {
        f16 => u16,
        f32 => u32,
        f64 => u64,
        else => @compileError("Unsupported input type."),
    };
    return @bitCast(F, @bitCast(I, first) ^ @bitCast(I, second));
}

test "test XOR f64" {
    try std.testing.expectEqual(floatXor(f64, 1.0, 1.0), 0.0);
    try std.testing.expectEqual(floatXor(f64, 0.0, 1.0), 1.0);
}

fn bitify(comptime N: type) []const u8 {
    const size = @bitSizeOf(N);
    var bits = std.bit_set.StaticBitSet(size).initEmpty();
    bits.set(N);
}

test "bitify number" {
    //try std.testing.expectEqual(bitify(0), []u8{});
}
``````

Wondering if it has to do with the build cache or something.

With regards to turning an unsigned int into an array of bits, here’s a possible solution:

``````zig
fn bitlify(n: u8) [8]u1 {
    var out = [_]u1{0} ** 8;

    var i: usize = 0;
    while (i < 8) : (i += 1) {
        if (n & @as(u8, 1) << @intCast(u3, i) != 0) out[i] = 1;
    }

    return out;
}

test "bitlify number" {
    try std.testing.expectEqual(bitlify(12), [_]u1{ 0, 0, 1, 1, 0, 0, 0, 0 });
}
``````

Making it fully generic would be a bit more work, mainly determining the size of the input type, which in turn determines the types of the left- and right-hand sides of the shift operation and the size of the output array.
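A sketch of what that generic version might look like (`bitlifyAny` is a made-up name), with `std.math.Log2Int` deriving the shift amount type and `@bitSizeOf` sizing the output array:

``````zig
const std = @import("std");

// Generic variant of bitlify: works for any unsigned integer type by
// computing the bit count and shift type from the input type.
fn bitlifyAny(comptime T: type, n: T) [@bitSizeOf(T)]u1 {
    const size = @bitSizeOf(T);
    var out = [_]u1{0} ** size;
    var i: usize = 0;
    while (i < size) : (i += 1) {
        if (n & @as(T, 1) << @intCast(std.math.Log2Int(T), i) != 0) out[i] = 1;
    }
    return out;
}
``````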

1 Like

You are returning a pointer (a slice to be precise) to a local variable in `bitify` that becomes garbage as soon as the function returns.

I have been poking around trying to code this up and learn a bit of Zig along the way. I seem to be getting different results based on a print statement: line 21 of the code below (the debug print marked XXX) seems to affect the test results, which is surprising. Is there something I am missing?

``````zig
const std = @import("std");
const maxInt = std.math.maxInt;

pub fn bitify(comptime I: type, n: I) []const u1 {
    var zeroOffset: usize = 0;
    if (n == 0) {
        const b = [_]u1{0};
        return b[zeroOffset..1];
    } else if (n == 1) {
        const b = [_]u1{1};
        return b[zeroOffset..1];
    }
    const nsize = @bitSizeOf(I);
    var bts = [_]u1{0} ** nsize;
    var i: usize = 0;

    var num = n;
    while (i < nsize) : (i += 1) {
        const bit = @intCast(u1, num % 2);
        // XXX: Comment out this print statement and interesting things happen
        std.debug.print("Casting {d} num is: {d} bit is: {d}\n", .{ i, num, bit });
        bts[nsize - (i + 1)] = bit;
        num = num / 2;
        if (bit == 1) {
            zeroOffset = nsize - (i + 1);
        }
    }
    std.debug.print("Value {d} set bit: {d} bits are: {d}\n", .{ n, zeroOffset, bts });
    // Drop leading 0's
    var respBits: []const u1 = bts[zeroOffset..bts.len];
    std.debug.print("Response bits are: {d}\n", .{respBits});
    return respBits;
}

fn u1SliceEqual(name: []const u8, first: []const u1, second: []const u1) bool {
    if (first.len != second.len) {
        std.debug.print("Test {s} bit lengths do not match {d} != {d}\n", .{ name, first.len, second.len });
        return false;
    }
    for (first) |b, i| {
        if (second[i] != b) {
            std.debug.print("For test {s}: index {d} first {d} second {d} not equal first {d} second {d}\n", .{ name, i, first, second, b, second[i] });
            return false;
        }
    }
    return true;
}

pub fn bits(comptime I: type, n: I) []const u1 {
    const b = switch (I) {
        u8, u16, u32, u64, u128 => {
            if (n <= maxInt(u8)) {
                return bitify(u8, @intCast(u8, n));
            } else if (n > maxInt(u8) and n <= maxInt(u16)) {
                return bitify(u16, @intCast(u16, n));
            } else if (n > maxInt(u16) and n <= maxInt(u32)) {
                return bitify(u32, @intCast(u32, n));
            } else if (n > maxInt(u32) and n <= maxInt(u64)) {
                return bitify(u64, @intCast(u64, n));
            } else {
                return bitify(u128, @intCast(u128, n));
            }
        },
        else => @compileError("Not a number type"),
    };
    return b;
}

test "test 2^n bits" {
    var allocator = std.testing.allocator;
    var i: usize = 1;
    while (i < 128) {
        var array = std.ArrayList(u1).init(allocator); //.initCapacity(allocator, i + 1);

        var j: usize = 0;
        while (j < i) {
            try array.append(1);
            j += 1;
        }
        var v: u128 = @intCast(u128, std.math.pow(u128, 2, i) - 1);

        var bitArray = bits(u128, v);
        std.debug.print("\nValue {d} bit array {d} length {d} slice {d}\n", .{ v, bitArray, i, array.items });
        try std.testing.expect(u1SliceEqual("u128 v", bitArray, array.items));
        i += 1;
        array.deinit();
    }
}

test "bitify number" {
    const array12 = [_]u1{ 1, 1, 0, 0, 0, 0 };
    var zero: usize = 0;
    const slice12 = array12[zero..array12.len];

    try std.testing.expect(u1SliceEqual("u8 12", bitify(u8, 12), slice12));
}
``````

I have been wondering if it has something to do with not allocating the bit arrays correctly?

As @kristoff mentioned, you are returning a slice to stack-allocated memory, which becomes garbage later on when you call another function (like `debug.print`). Unlike garbage-collected languages such as Go that let you return slices without thinking about where the bytes live, in Zig you always have to ask yourself "Where are the bytes?".

A function that returns a slice / pointer should be a red flag. It should either receive the memory as a parameter, slice from there, and return that, or allocate on the heap and return that slice.

In your functions, you are creating arrays on the function's temporary stack frame and returning slices of that memory. That produces inconsistent behavior: when you later dereference the pointer (a slice is a pointer), you may get lucky and the original bytes are still there, or another function call may already have clobbered them, so you get weird values back. In a language like Go, escape analysis detects when you're returning stack memory and automatically heap-allocates, producing a valid slice / pointer to that memory.
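A sketch of the first option (`bitifyInto` is a made-up name): the caller owns the buffer, so the returned slice stays valid after the function returns:

``````zig
const std = @import("std");

// The caller passes the buffer in, so no stack memory escapes.
// Bits are stored most significant first, matching the earlier examples.
fn bitifyInto(n: u64, buf: *[64]u1) []const u1 {
    var i: usize = 0;
    while (i < 64) : (i += 1) {
        buf[63 - i] = @truncate(u1, n >> @intCast(u6, i));
    }
    return buf[0..];
}
``````

Called as `var buf: [64]u1 = undefined; const myBits = bitifyInto(12, &buf);`, the slice is backed by the caller's stack frame, which outlives the call.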

1 Like

Thanks @dude_the_builder and @kristoff. A bit of learning about `defer` and `errdefer` was helpful in sorting out the memory issues. Next I am going to work on accumulating the `[]u1` values and serializing them to disk.
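In case it helps with the serialization step, `std.io.bitWriter` packs individual bits into bytes for you; a minimal sketch (`packBits` is a made-up helper, and big-endian bit order is an assumption here):

``````zig
const std = @import("std");

// Stream a slice of bits through a BitWriter; the final partial byte
// is zero-padded by flushBits.
fn packBits(bits: []const u1, writer: anytype) !void {
    var bw = std.io.bitWriter(.Big, writer);
    for (bits) |b| try bw.writeBits(b, 1);
    try bw.flushBits();
}
``````

Writing `{ 1, 0, 1 }` this way produces the single byte `0b10100000`, which can go straight to a file writer.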

1 Like