When is there too much abstraction (OOP) in an interface?

Hello everyone!

I came here to ask for your opinions on interfaces and their OOP flavor. Let me explain my case.

For my DES simulation I need to sample from specific statistical distributions; I already made a post to showcase that (I am not linking to the code because it’s absolutely broken rn xD). In it, I decided to implement this as an intrusive interface, which makes sense: you define the Distribution with the following VTable

pub fn VTable(comptime Precision: type) type {
    return struct {
        sample: *const fn (dist: *const Distribution(Precision), rng: Random) Precision,
        format: *const fn (dist: *const Distribution(Precision), writer: *Io.Writer) std.Io.Writer.Error!void,
    };
}

This matches my simulation’s need to specify “something that you are going to be sampling from”, so the following code is actually beautiful and valid:

const ContDist = Distribution(f32, f32);

const d: Exponential(f32) = .init(1);
// the following absolutely works too!
// const d: Normal(f32) = .init(0,1);
const X: *const ContDist = &d.interface;

fn myFn(X: *const ContDist, rng: Random) void {
    // do things
    while (t_clock < horizon) { _ = X.sample(rng); }
}
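For context, here is a minimal sketch of how `Distribution` could wrap that `VTable`. This is my assumption based on the usual intrusive pattern (the `interface` field name and the `@fieldParentPtr` recovery are guesses, not the actual code from the other post):

```zig
const std = @import("std");
const Random = std.Random;

// Sketch: the interface type just holds a vtable pointer and forwards calls.
pub fn Distribution(comptime Precision: type) type {
    return struct {
        const Self = @This();

        vtable: *const VTable(Precision),

        // Convenience wrapper so call sites can write `X.sample(rng)`.
        pub fn sample(self: *const Self, rng: Random) Precision {
            return self.vtable.sample(self, rng);
        }
    };
}

// A concrete distribution embeds the interface as a field named `interface`
// and recovers itself inside its vtable implementation:
fn sampleImpl(dist: *const Distribution(f32), rng: Random) f32 {
    const self: *const Exponential(f32) = @alignCast(@fieldParentPtr("interface", dist));
    // Inverse-transform sampling for Exponential(rate).
    return -@log(1 - rng.float(f32)) / self.rate;
}
```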

Now, with that context, my question is: when do you stop abstracting?

Now, let’s implement a Kolmogorov–Smirnov goodness-of-fit test to verify our implementation. The test just checks, for $\alpha = 0.999$, that
$D \sqrt{n} > 1.95$, where $D = \sup_{x} |F(x) - F_n(x)|$, $F_n(x)$ is the empirical cumulative distribution function of the sample of size $n$, and $F(x)$ is the theoretical cumulative distribution function.

So to test a distribution I need to have its CDF. As this is common to all distributions, I can just:

pub fn VTable(comptime Precision: type) type {
    return struct {
        sample: *const fn (dist: *const Distribution(Precision), rng: Random) Precision,
        cdf: *const fn(dist: *const Distribution(Precision), x: Sample) Precision,
        format: *const fn (dist: *const Distribution(Precision), writer: *Io.Writer) std.Io.Writer.Error!void,
    };
}

And now the test just becomes this signature: pub fn ksTest(sample: []Sample, dist: ContDist), instead of writing something like fn normalCdf(x: Sample, params: anytype) for every distribution and coding the test against that.
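For concreteness, here is a sketch of what that ksTest could look like. I am assuming the interface grows a `cdf` wrapper that forwards to the vtable (like `sample` would), taking `Sample` to be the same type as `Precision`, and reusing the 1.95 critical value from above:

```zig
const std = @import("std");

// Sketch: one-sample Kolmogorov–Smirnov test against the interface's CDF.
// Sorts the sample in place, then computes D = sup_x |F(x) - F_n(x)|.
pub fn ksTest(comptime Precision: type, sample: []Precision, dist: *const Distribution(Precision)) bool {
    std.mem.sort(Precision, sample, {}, std.sort.asc(Precision));

    const n: Precision = @floatFromInt(sample.len);
    var d: Precision = 0;
    for (sample, 0..) |x, i| {
        const f = dist.cdf(x); // theoretical CDF via the vtable
        // The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        const below = f - @as(Precision, @floatFromInt(i)) / n;
        const above = @as(Precision, @floatFromInt(i + 1)) / n - f;
        d = @max(d, @max(below, above));
    }
    // Accept the fit if sqrt(n) * D stays under the critical value.
    return d * @sqrt(n) <= 1.95;
}
```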

My point is, I could just keep going with everything a Distribution has:

pub fn VTable(comptime Precision: type) type {
    return struct {
        sample: *const fn (dist: *const Distribution(Precision), rng: Random) Precision,
        cdf: *const fn(dist: *const Distribution(Precision), x: Sample) Precision,
        pdf: *const fn(dist: *const Distribution(Precision), x: Sample) Precision,
        quantile: *const fn(dist: *const Distribution(Precision), x: Precision) Sample,
        quartile: *const fn(dist: *const Distribution(Precision), x: Precision) Sample,
        // expected value, theoretical variance, std, .... 
        format: *const fn (dist: *const Distribution(Precision), writer: *Io.Writer) std.Io.Writer.Error!void,
    };
}

But if you follow this path, you arrive at SciPy’s rv_continuous, which is a fully OOP class with ALL the methods implemented for all the distributions. Not that putting everything in one big-ass class is wrong per se, but does it make sense that if I want to know the CDF of the exponential I just do this:

// Code this in the library as a plain function
pub fn exponentialCdf(comptime Precision: type, rate: Precision, x: Precision) Precision {
    return 1 - @exp(-rate * x);
}

const e = exponentialCdf(f32, 1, 1);

rather than

const e: Exponential(f32) = .init(1);
const p = e.cdf(1);

TL;DR questions:

  1. What is your criterion for when to stop? My OOP side just wants to make it more and more complete, but the fact that something is possible does not mean it’s the correct design. Edit: my gut tells me that making the interface larger and with more features without a good reason is just not the move, but I am struggling to articulate why I feel this way!
  2. Even if you are maybe not statisticians, which of the two above do you prefer? There is an argument for the library focusing on sampling only vs. implementing a full distribution, but I don’t think it’s worth applying the intrusive interface to all of the methods.

Thank you and have a great day!

(PS: if this should be in help instead of explain, lmk!)

Don’t make your interface just the vtable. Your interface can contain fields and functions that call the lower-level vtable functions. You only want to put the things in the vtable whose implementations might vary.

I personally do prefer my math to work with pure functions and plain data when possible, though. That also makes it easier to SIMDify them if necessary, and it’s possible even with interfaces as long as you work with batches.
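To illustrate the batch point with a quick sketch of my own (not from any existing library): a pure function over plain data vectorizes with `@Vector` and no vtable indirection, since the float builtins operate elementwise on vectors:

```zig
// Sketch: exponential CDF over a whole batch at once.
// Pure data in, pure data out; the compiler can keep this in SIMD registers.
pub fn exponentialCdfBatch(comptime N: usize, rate: f32, xs: @Vector(N, f32)) @Vector(N, f32) {
    const ones: @Vector(N, f32) = @splat(1.0);
    const rates: @Vector(N, f32) = @splat(rate);
    return ones - @exp(-rates * xs);
}
```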


Yeah, that’s what my gut tells me too!

Following this example, I find R’s way of doing this much nicer than Python’s:

sample <- rnorm(n, mean = mu, sd = sigma)
quartile <- qnorm(p, mean = mu, sd = sigma)
cdf <- pnorm(x, mean = mu, sd = sigma)

so in my example you’d argue for something like this?

const dist = @import("distributions");

const norm_cdf = dist.Normal.cdf;
const n = norm_cdf(mu, sigma2, 3);

and have them in the same file, despite not implementing the interface?

Like, could a good criterion for not implementing the vtable be that the code does not need to be generic at runtime? I.e., the Kolmogorov–Smirnov test could take a pointer to a CDF along with its params?

const CdfFn = *const fn (params: anytype, x: Precision) Precision;
fn ksTest(sample: []Sample, cdf: CdfFn) Precision
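One wrinkle with that exact signature: `anytype` is not allowed inside a function-pointer type, so a runtime CDF pointer would need a type-erased context. If comptime genericity is acceptable, a sketch could instead look like this (all names here are mine, just for illustration; it assumes the sample is already sorted ascending):

```zig
const std = @import("std");

// Sketch: KS statistic over any CDF passed as a comptime-known function plus
// its parameter context, so no vtable or interface is required.
pub fn ksStatistic(
    comptime Precision: type,
    sample: []const Precision,
    context: anytype,
    comptime cdf: fn (@TypeOf(context), Precision) Precision,
) Precision {
    const n: Precision = @floatFromInt(sample.len);
    var d: Precision = 0;
    for (sample, 0..) |x, i| {
        const f = cdf(context, x);
        // The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        const below = f - @as(Precision, @floatFromInt(i)) / n;
        const above = @as(Precision, @floatFromInt(i + 1)) / n - f;
        d = @max(d, @max(below, above));
    }
    // Caller compares sqrt(n) * D against the chosen critical value.
    return d * @sqrt(n);
}
```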

Yeah, if you don’t have a good reason to be generic at runtime, I’d avoid it.
