Optimize runtime checks away with double compilation

Hello :))

I am building a discrete-event simulation of a social network (I cannot share the code yet), and this idea came to me. I use runtime dynamic dispatch driven by a JSON config file, both to schedule the time between events (see the distributions library, the JSON part of the README, for more context) and to let the user choose whether the traces are written to a file. Let’s focus on the file example for a minute. My code is filled with:

if (simconf.trace_to_file) { // write the trace }

which cannot be optimized away, because trace_to_file is read from a JSON file at runtime. I also cannot embed the config file into the build system, as I want this to be a single executable: the end user should be able to just pass the desired configuration and run the simulation. Making them run zig build is not an option.

Now, an idea crossed my mind… Could I make a Zig program that receives the configuration file, reads it ({"trace_to_file": false}) and, knowing that information, compiles the simulation with the if (simconf.trace_to_file) checks removed?

A silly concrete example mimicking the situation I am at:

const std = @import("std");
const Io = std.Io;

// json equivalent struct
const Conf = struct {
    write_to_file: bool,
};

pub fn main(init: std.process.Init) !void {
    const arena: std.mem.Allocator = init.arena.allocator();

    const args = try init.minimal.args.toSlice(arena);

    if (args.len <= 4) {
        std.debug.print("Usage: n a b config.json\n", .{});
        std.process.exit(1);
    }

    const n = try std.fmt.parseInt(u64, args[1], 10);
    const a = try std.fmt.parseInt(u64, args[2], 10);
    const b = try std.fmt.parseInt(u64, args[3], 10);

    // load json from the path passed as the 4th argument (see the usage message)
    const content = try std.Io.Dir.cwd().readFileAlloc(init.io, args[4], arena, .unlimited);
    defer arena.free(content);
    const options = std.json.ParseOptions{ .ignore_unknown_fields = true };
    const parsed_result = try std.json.parseFromSlice(Conf, arena, content, options);
    const conf = parsed_result.value;
    // In order to do I/O operations need an `Io` instance.
    const io = init.io;

    var stdout_buffer: [1024]u8 = undefined;
    var stdout_file_writer: Io.File.Writer = .init(.stdout(), io, &stdout_buffer);
    const stdout_writer = &stdout_file_writer.interface;

    try stdout_writer.print("Computing the {d}-th fibonacci number starting with ({d}, {d})\n", .{ n, a, b });
    try stdout_writer.flush();

    const nth = try fibonacci(init.io, a, b, n, conf);

    try stdout_writer.print("nth: {d}\n", .{nth});
    try stdout_writer.flush(); // Don't forget to flush!
}

/// Computes the n-th fibonacci number
fn fibonacci(io: std.Io, a: u64, b: u64, n: u64, conf: Conf) !u64 {
    var buffer: [64 * 1024]u8 = undefined;
    const file = try std.Io.Dir.cwd().createFile(io, "./fib.txt", .{ .read = false });
    defer file.close(io);
    var file_writer = file.writer(io, &buffer);
    const writer = &file_writer.interface;

    var f_i2: u64 = a;
    var f_i1: u64 = b;
    // AIMING TO REMOVE THIS
    if (conf.write_to_file) { 
        try writer.print("{d}\n", .{f_i2});
        try writer.print("{d}\n", .{f_i1});
    }
    for (2..n) |_| {
        const f_i = f_i2 + f_i1;
        // AIMING TO REMOVE THIS
        if (conf.write_to_file) {
            try writer.print("{d}\n", .{f_i});
        }

        f_i2 = f_i1;
        f_i1 = f_i;
    }
    // AIMING TO REMOVE THIS
    if (conf.write_to_file) {
        try writer.flush();
    }

    return f_i1;
}

The point is that, with this two-program setup, the if checks inside fibonacci could be optimized away.

Idk if I am just delulu, but I don’t even know where to start to test this, and that’s why I am asking here :smiley:

Have you actually benchmarked a difference, and is it enough to care?

To answer the actual question of how you would do this: compiling the program at runtime is a way to do this, but it would add considerable delay to the execution which probably defeats the entire point.

Instead, your functions can take a comptime log_enabled: bool, and higher up the call stack you branch once: if (runtime_log_enabled) foo(true, ...) else foo(false, ...). However, this could greatly increase binary size, since each function is compiled twice, though perhaps the optimiser is smart enough to deduplicate much of it.
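A minimal sketch of that pattern, applied to the fibonacci example from the question. The Trace struct and the fibonacciImpl name are hypothetical: an in-memory sink stands in for the file writer so the sketch does not depend on a particular std I/O API version.

```zig
const std = @import("std");

/// Hypothetical fixed-size trace sink standing in for the file writer.
const Trace = struct {
    values: [64]u64 = undefined,
    len: usize = 0,

    fn push(self: *Trace, v: u64) void {
        // Silently drops values once the buffer is full; enough for a sketch.
        if (self.len < self.values.len) {
            self.values[self.len] = v;
            self.len += 1;
        }
    }
};

/// `trace_enabled` is comptime-known, so each instantiation contains either
/// the tracing calls or no tracing code at all.
fn fibonacciImpl(comptime trace_enabled: bool, a: u64, b: u64, n: u64, trace: *Trace) u64 {
    var f_i2 = a;
    var f_i1 = b;
    if (trace_enabled) {
        trace.push(f_i2);
        trace.push(f_i1);
    }
    var i: u64 = 2;
    while (i < n) : (i += 1) {
        const f_i = f_i2 + f_i1;
        if (trace_enabled) trace.push(f_i);
        f_i2 = f_i1;
        f_i1 = f_i;
    }
    return f_i1;
}

/// The only runtime check: pick the instantiation once, outside the hot loop.
fn fibonacci(write_to_file: bool, a: u64, b: u64, n: u64, trace: *Trace) u64 {
    return if (write_to_file)
        fibonacciImpl(true, a, b, n, trace)
    else
        fibonacciImpl(false, a, b, n, trace);
}
```

The config is still read at runtime, but the branch on it is taken once per call to fibonacci instead of once per loop iteration.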

An alternative would be to make the logging faster. One way to do that would be to write to the file only at key points where the delay is acceptable, or perhaps from another thread, or something else entirely.

No! I just thought this could be a cool way to do it, idrc if it’s worth it or not :smiley:
Probably for just the trace writing it’s not worth it, but if you take into account all the sampling from distributions, which involves a vtable dereference (which I hope this method would optimize away), maybe? This is my master’s thesis, but it’s in statistics and OR, so they don’t expect me to do very serious performance benchmarking (and I don’t have time to, haha). Once I have handed it in I will benchmark it for sure, I am very curious.

I did not give enough context to accurately address the “defeating the point” part: the simulation has to be run thousands (or tens of thousands) of times, and a single run can take from a few milliseconds (very simple topologies and short timespans) to several hours with a lot of users. In the latter case any gain would be absolutely welcome. Again, this question was about whether this setup is possible and how (I have no idea where to even start, which is why I gave a very simplified version of one of the things to optimize, with the fibonacci example).

Hmmm yes, for just one boolean that would totally work. You mean that the binary would just contain two functions, one with the writes and another one without them, right?

Regarding this, yeah, I already thought about having one thread just writing to a file while the main thread keeps going and only copies the trace of the simulation! But, again, the thing I might gain the most from is the runtime dynamic dispatch in the distribution sampling.
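For the dispatch case, the same "branch once, then run a specialized copy" idea might be sketched as below. Everything here is an assumption for illustration: Constant, Uniform, Distribution, totalTime and run are hypothetical names, not the API of the distributions library. A tagged union holds the runtime choice, and an inline else prong instantiates the hot loop once per concrete distribution type.

```zig
const std = @import("std");

// Hypothetical stand-ins for distribution types.
const Constant = struct {
    value: f64,
    fn sample(self: Constant, rng: std.Random) f64 {
        _ = rng;
        return self.value;
    }
};

const Uniform = struct {
    lo: f64,
    hi: f64,
    fn sample(self: Uniform, rng: std.Random) f64 {
        return self.lo + (self.hi - self.lo) * rng.float(f64);
    }
};

/// Runtime choice, e.g. decoded from the JSON config.
const Distribution = union(enum) {
    constant: Constant,
    uniform: Uniform,
};

/// Hot loop, generic over the concrete distribution type: in each
/// instantiation `dist.sample` is a direct call resolved at compile time.
/// Note: std.Random is itself a type-erased interface; only the
/// distribution dispatch is made static in this sketch.
fn totalTime(dist: anytype, rng: std.Random, events: usize) f64 {
    var t: f64 = 0;
    for (0..events) |_| t += dist.sample(rng);
    return t;
}

/// Dispatch once; `inline else` generates one prong per union field, each
/// with a comptime-known payload type.
fn run(dist: Distribution, rng: std.Random, events: usize) f64 {
    return switch (dist) {
        inline else => |d| totalTime(d, rng, events),
    };
}
```

As with the boolean flag, the cost is one copy of the loop per distribution type in the binary, and whether it is measurably faster than the vtable version would need benchmarking.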

Thank you for your answer tho! If you feel more context is needed I can upload the whole code, but I would need some time to get it somewhat more presentable hahah
