Statistical Distributions Library with an Intrusive Interface

Hello everyone!

I feel the intersection between statisticians and systems-level programmers is quite narrow, but I happen to be in it! I’ve implemented a small distributions library!

I had two objectives in mind: 1) refactor this out of another project where I need random number generation following some distributions (the ones that are actually implemented here), and 2) finally understand how intrusive interfaces actually work (or so I hope). Let me explain.

I realized that a statistical distribution is a great problem for teaching a newbie (like myself) who knows some statistics how this pattern works, and how it relates to a more traditional interface (like in Java or Go). You have several functions that every distribution must implement (pdf, ppf, cdf, sample), and the interface can provide functions that the implementers do not need to write themselves (such as sampleBuffer, which just calls the child’s sample in a loop).
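For concreteness, here is a minimal sketch of how I understand the intrusive flavor of this pattern in Zig: the interface struct with the function pointer lives *inside* the implementing type, and default methods like sampleBuffer are written once against the interface. The names here are hypothetical and not necessarily my library’s exact layout.

```zig
const std = @import("std");

pub fn Distribution(comptime T: type) type {
    return struct {
        const Self = @This();

        // each concrete distribution fills this in
        sampleFn: *const fn (*Self, std.Random) T,

        pub fn sample(self: *Self, rng: std.Random) T {
            return self.sampleFn(self, rng);
        }

        // default method: implementers get it for free
        pub fn sampleBuffer(self: *Self, buf: []T, rng: std.Random) void {
            for (buf) |*x| x.* = self.sample(rng);
        }
    };
}

pub fn Exponential(comptime T: type) type {
    return struct {
        lambda: T,
        interface: Distribution(T), // the interface is embedded, hence "intrusive"

        pub fn init(lambda: T) @This() {
            return .{ .lambda = lambda, .interface = .{ .sampleFn = sampleImpl } };
        }

        fn sampleImpl(iface: *Distribution(T), rng: std.Random) T {
            // recover the outer struct from the embedded field
            const self: *@This() = @fieldParentPtr("interface", iface);
            // inverse-transform sampling: -ln(1 - U) / lambda
            return -@log(1.0 - rng.float(T)) / self.lambda;
        }
    };
}
```

The key move is `@fieldParentPtr`, which goes from the pointer to the embedded interface back to the concrete distribution, so no separate vtable-plus-opaque-pointer pair is needed.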

I just learnt about intrusive interfaces from the new Io implementation, so if someone could verify whether what I did actually qualifies as one, I would be very glad :))

Regarding features, it literally has two distributions: you can generate random samples from an Exponential and a Uniform, with single or double precision. I also don’t intend to work on this a lot, so if you need a specific distribution or feature, just open an issue or send me a mail!

const std = @import("std");
const stats = @import("stats"); // module name depends on your build.zig

const Unif = stats.Uniform(f32);
const Exp = stats.Exponential(f32);
const Dist = stats.Distribution(f32);

// seed the PRNG with entropy from the OS
const seed = blk: {
    var os_seed: u64 = undefined;
    std.crypto.random.bytes(std.mem.asBytes(&os_seed));
    break :blk os_seed;
};

var prng = std.Random.DefaultPrng.init(seed);
const rng = prng.random();

var exp: Exp = .init(2);
const dexp: *Dist = &exp.interface; // pointer to the embedded distribution interface
const e = dexp.sample(rng);

var esample: [40]f32 = undefined;
dexp.sampleBuffer(&esample, rng);

Have a nice day :))


Hello! This is now in a much more usable state. If any of you know about Discrete Event Simulations, you’ll recognize these haha

I’ve implemented several continuous distributions: Constant, Exponential, Uniform, Hyperexponential, Hypoexponential and Erlang. I’ve also added the Categorical (pick items from a vector, each with a different probability) and an Empirical Cumulative Distribution Function generator. All of those work with comptime generics.
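The Categorical is conceptually just an inverse CDF on a step function. A hedged sketch of weighted sampling via cumulative weights (hypothetical function, not necessarily how the library does it):

```zig
const std = @import("std");

// W must be f32 or f64; T is the element type of the data vector
fn sampleCategorical(
    comptime W: type,
    comptime T: type,
    weights: []const W,
    data: []const T,
    rng: std.Random,
) T {
    var total: W = 0;
    for (weights) |w| total += w;

    // draw a uniform in [0, total) and walk the weights until it is spent
    var u = rng.float(W) * total;
    for (weights, data) |w, d| {
        if (u < w) return d;
        u -= w;
    }
    return data[data.len - 1]; // guard against floating-point rounding
}
```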

Lastly, I had to create union aggregations of the distributions: one for the discrete ones and another for the continuous ones. They are for when you want to swap out a distribution from a config file, let’s say a JSON like this:

// to use both discrete and continuous distributions, we import both unions
const Continuous = distributions.ContinuousDistribution;
const Discrete = distributions.DiscreteDistribution;

// this struct defines which kind of distribution each field can hold;
// any of them can be instantiated with its own precision
const Config = struct {
    inter_arrival_time: Continuous(f32),
    wait_time: Continuous(f64),
    boarding_time: Continuous(f64),
    user_action: Discrete(f32, u8),
};

// a JSON document with parameter values for the chosen distributions
const content =
    \\{
    \\  "inter_arrival_time": { "exponential": { "lambda": 2.0 } },
    \\  "wait_time": { "constant": { "value": 5.0 } },
    \\  "boarding_time": { "uniform": { "a": 1, "b": 10 } },
    \\  "user_action": { "categorical": { "weights": [0.4, 0.6], "data": [0, 1] } }
    \\}
;

They are very useful, as now a different JSON will effectively change your program!
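Assuming the unions are plain tagged unions (std.json can parse the `{"variant": {...}}` shape into a tagged union out of the box, or you can provide a custom `jsonParse` override), loading the config might look roughly like this. The `sample` call on the union is a hypothetical method that dispatches on the active tag:

```zig
const std = @import("std");

// `content` is the JSON string from above, `Config` the struct from above
const parsed = try std.json.parseFromSlice(Config, allocator, content, .{});
defer parsed.deinit();

var config = parsed.value;
// a different JSON file selects a different union branch, no recompile needed
const dt = config.inter_arrival_time.sample(rng);
```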

I had lots of fun implementing this, and I’d really love to keep going if I had more time! If any of you need a specific distribution to sample from, lmk xD. Have a nice day!


Hello! To shave even more CPU cycles off my main project, I looked into the rand_distr crate to see how they implement the exponential. The algorithm is the Ziggurat, which is not complex in itself, but it uses a trick that, paired with comptime, makes the Zig implementation quite a bit nicer than Rust’s, IMO.

This is actually quite an improvement: in a Discrete Event Simulation you sample an Exponential on every iteration of the main loop, and the inverse-transform method involves evaluating a logarithm, which is quite slow. Ziggurat skips that about 99% of the time, so the overall speed increase is substantial :DD

Ziggurat needs two random numbers: an integer (for the table index) and a float. Instead of making two sampling calls (expensive), the algorithm generates a single u64, takes the low 8 bits as the table index, and shifts off the low 12 bits to use the top 52 as the mantissa of a float which, with the correct exponent bits OR’d in, lands in [1, 2) or [2, 4).

// inside a function with `comptime Precision: type` and `comptime symmetric: bool`
assert(Precision == f64 or Precision == f32);

const Uint = if (Precision == f64) u64 else u32;
const mantissa_shift = if (Precision == f64) 12 else 9;

// bit patterns whose exponents place the float in [1, 2) and [2, 4) respectively
const exp_0: Uint = if (Precision == f64) 0x3FF0000000000000 else 0x3F800000;
const exp_1: Uint = if (Precision == f64) 0x4000000000000000 else 0x40000000;
const offset = 1.0 - std.math.floatEps(Precision) / 2.0;

const bits = rng.int(Uint);
const i = @as(usize, bits) & 0xff; // table index between 0 and 255

const mantissa = bits >> mantissa_shift;

// u in [-1, 1) for the symmetric case, (0, 1) otherwise
const u = if (symmetric)
    @as(Precision, @bitCast(exp_1 | mantissa)) - 3
else
    @as(Precision, @bitCast(exp_0 | mantissa)) - offset;

Then it just looks up the appropriate numbers in a precomputed table. The Rust crate currently supports single precision by generating the random number in double precision and casting it down, and I don’t know Rust well enough to tell whether this implementation could be achieved in a similar way. What I do know, though, is that comptime saves me from duplicating the whole algorithm, and that this “two numbers for the price of one” trick is just delightful!
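For context, the surrounding accept/reject loop looks roughly like this. This is my paraphrase of the shape of rand_distr’s ziggurat helper for Exp(1), with hypothetical table names; the real tables (layer edges and pdf values) are precomputed, e.g. at comptime:

```zig
const std = @import("std");

fn sampleExpZiggurat(
    comptime Precision: type,
    rng: std.Random,
    x_tab: *const [257]Precision, // right edges of the ziggurat layers
    f_tab: *const [257]Precision, // pdf values at those edges
    r: Precision, // rightmost x-coordinate of the bottom layer
) Precision {
    while (true) {
        // in the real code, the index and u come from the same integer draw (see above);
        // a fresh uniform is used here for clarity
        const bits = rng.int(u64);
        const i: usize = @intCast(bits & 0xff);
        const u = rng.float(Precision);

        const x = u * x_tab[i];
        // fast path: the point is strictly inside layer i, no log/exp needed (~99%)
        if (x < x_tab[i + 1]) return x;
        // bottom layer: sample the tail beyond r analytically
        if (i == 0) return r - @log(rng.float(Precision));
        // edge region: compare against the actual density e^{-x}
        if (f_tab[i + 1] + (f_tab[i] - f_tab[i + 1]) * rng.float(Precision) < @exp(-x))
            return x;
    }
}
```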

This algorithm is also used to sample from the normal distribution, so that’s in the library now too! TBH, a sampling library that cannot sample a Gaussian distribution is pretty lame haha

Have a nice day!
