This overengineered log function reduced my compile-time by 15%

IntegratedQuantum · August 5, 2023, 10:06am

Motivation

I noticed a while ago that formatting takes up a big chunk of my compile-time.
The reason is quite obvious: It duplicates a ton of code. When I call std.log.info("abc", .{}) and std.log.info("de", .{}) it will create 2 different instances of my logging functions.

Idea

We need to move the extra text from compile time to runtime.
Instead of calling std.log.info("Some text {}", .{someValue}) we should call std.log.info("{s}{}", .{"Some text ", someValue})

Maybe this can be done at the call-site?
What if I change my habbits and always call std.log.info("{s}", .{"abc"});?
That won’t work. "abc" has the type *[3:0]const u8
Now if I pass "de" which is of type *[2:0]const u8 it will create a new instance of the log function because it’s generic over the tuple type
How about std.log.info("{s}", .{@as([]const u8, "abc")};?
That looks already really clunky and doesn’t even work. @as([]const u8, "abc") is a comptime value. And this will create a comptime tuple field, whose value is passed at compile time

But the idea isn’t that bad. Instead of doing this work manually I could maybe do this inside the log function. This still means that the log function is duplicated, but maybe that’s good enough.

Implementation

So first of all for context, I would like to show you my old log function. Basically I’m printing to a file and the terminal. The text printed to terminal uses ANSI color codes to give me a color-coded output, whereas the file contains the log level as a text e.g. [err]: Log output.
I’m printing to an allocated buffer first to reduce the number of syscalls.

pub fn logFn(
	comptime level: std.log.Level,
	comptime _: @Type(.EnumLiteral),
	comptime format: []const u8,
	args: anytype,
) void {
	const color = comptime switch (level) {
		std.log.Level.err => "\x1b[31m",
		std.log.Level.info => "",
		std.log.Level.warn => "\x1b[33m",
		std.log.Level.debug => "\x1b[37;44m",
	};

	var stackFallbackAllocator: std.heap.StackFallbackAllocator(65536) = undefined;
	stackFallbackAllocator.fallback_allocator = threadAllocator;
	const allocator = stackFallbackAllocator.get();
	{
		const string = std.fmt.allocPrint(allocator, "[" ++ level.asText() ++ "]" ++ ": " ++ format ++ "\n", args) catch unreachable;
		defer allocator.free(string);
		logFile.writeAll(string) catch {};
	}
	{
		const string = std.fmt.allocPrint(allocator, color ++ format ++ "\x1b[0m\n", args) catch unreachable;
		defer allocator.free(string);
		nosuspend std.io.getStdErr().writeAll(string) catch {};
	}
}

Alright now for the actual log function. Be warned, this will get pretty long and complicated:

pub fn logFn(
	comptime level: std.log.Level,
	comptime _: @Type(.EnumLiteral),
	comptime format: []const u8,
	args: anytype,
) void {
	const color = comptime switch (level) {
		std.log.Level.err => "\x1b[31m",
		std.log.Level.info => "",
		std.log.Level.warn => "\x1b[33m",
		std.log.Level.debug => "\x1b[37;44m",
	};
	const colorReset = "\x1b[0m\n";
	const filePrefix = "[" ++ comptime level.asText() ++ "]" ++ ": ";
	const fileSuffix = "\n";
	// Parsing the format string and creating the improved format string:
	comptime var formatString: []const u8 = "";
	comptime var i: usize = 0;
	comptime var mode: usize = 0;
	comptime var sections: usize = 0;
	comptime var sectionString: []const u8 = "";
	comptime var sectionResults: []const []const u8 = &.{};
	comptime var sectionId: []const usize = &.{};
	inline while(i < format.len) : (i += 1) {
		if(mode == 0) {
			if(format[i] == '{') {
				if(format[i + 1] == '{') {
					sectionString = sectionString ++ "{{";
					i += 1;
					continue;
				} else {
					mode = 1;
					formatString = formatString ++ "{s}{";
					sectionResults = sectionResults ++ &[_][]const u8{sectionString};
					sectionString = "";
					sectionId = sectionId ++ &[_]usize {sections};
					sections += 1;
					continue;
				}
			} else {
				sectionString = sectionString ++ format[i..i+1];
			}
		} else {
			formatString = formatString ++ format[i..i+1];
			if(format[i] == '}') {
				sections += 1;
				mode = 0;
			}
		}
	}
	formatString = formatString ++ "{s}";
	sectionResults = sectionResults ++ &[_][]const u8{sectionString};
	sectionId = sectionId ++ &[_]usize {sections};
	sections += 1;
	formatString = comptime cacheString("{s}" ++ formatString ++ "{s}");

	// Extracting the tuple types
	comptime var types: []const type = &.{};
	comptime var i_1: usize = 0;
	comptime var i_2: usize = 0;
	inline while(types.len != sections) {
		if(i_2 < sectionResults.len) {
			if(types.len == sectionId[i_2]) {
				types = types ++ &[_]type{[]const u8};
				i_2 += 1;
				continue;
			}
		}
		const TI = @typeInfo(@TypeOf(args[i_1]));
		if(@TypeOf(args[i_1]) == comptime_int) {
			types = types ++ &[_]type{i64};
		} else if(@TypeOf(args[i_1]) == comptime_float) {
			types = types ++ &[_]type{f64};
		} else if(TI == .Pointer and TI.Pointer.size == .Slice and TI.Pointer.child == u8) {
			types = types ++ &[_]type{[]const u8};
		} else if(TI == .Int and TI.Int.bits <= 64) {
			if(TI.Int.signedness == .signed) {
				types = types ++ &[_]type{i64};
			} else {
				types = types ++ &[_]type{u64};
			}
		} else {
			types = types ++ &[_]type{@TypeOf(args[i_1])};
		}
		i_1 += 1;
	}
	types = &[_]type{[]const u8} ++ types ++ &[_]type{[]const u8};

	// Fill the comptime-known part of the tuple at compile-time to reduce the runtime code generation:
	comptime var comptimeTuple: std.meta.Tuple(types) = undefined;
	comptime std.debug.assert(std.meta.Tuple(types) == std.meta.Tuple(types));
	comptime var len: usize = 0;
	i_1 = 0;
	i_2 = 0;
	inline while(len != sections) : (len += 1) {
		if(i_2 < sectionResults.len) {
			if(len == sectionId[i_2]) {
				comptimeTuple[len+1] = sectionResults[i_2];
				i_2 += 1;
				continue;
			}
		}
		i_1 += 1;
	}
	comptimeTuple[0] = filePrefix;
	comptimeTuple[comptimeTuple.len - 1] = fileSuffix;
	// Here the runtime part begins:
	var resultArgs: std.meta.Tuple(types) = comptimeTuple;
	len = 0;
	i_1 = 0;
	i_2 = 0;
	inline while(len != sections) : (len += 1) {
		if(i_2 < sectionResults.len) {
			if(len == sectionId[i_2]) {
				i_2 += 1;
				continue;
			}
		}
		resultArgs[len+1] = args[i_1];
		i_1 += 1;
	}

	{
		logToFile(formatString, resultArgs); // I moved as much code to helper functions that are generic over the new formatting options.
	}
	{
		resultArgs[0] = color;
		resultArgs[resultArgs.len - 1] = colorReset;
		logToStdErr(formatString, resultArgs);
	}
}

Essentially there is a few key sections:

The format string is parsed, extracting all the text sections and replacing them with {s}.
The new tuple type is constructed. A few types are additionally coerced to broader types. Like [:0]const u8 → []const u8
ix → i64 for x ≤ 64
ux → u64 for x ≤ 64
All the parts of the format string which are comptime known are moved into the tuple at compile time. This reduces the amount of code that gets generated at runtime.
All the arguments from the original tuple are moved into the tuple at runtime.
I do the allocPrinting and write to file/stderr in a seperate function which is only generic in the new format string+tuple to further reduce duplicate code.

One more note: Strings are not deduplicated at compile time, so I have to that manually using comptime function caching, otherwise the whole thing wouldn’t work:

fn cacheStringImpl(comptime len: usize, comptime str: [len]u8) []const u8 {
	return str[0..len];
}

fn cacheString(comptime str: []const u8) []const u8 {
	return cacheStringImpl(str.len, str[0..].*);
}

Results

Overall this reduced the number of allocPrint instances from logging from 300ish to just 27.
The size of the debug binary has been reduced by 630 kB.
The debug build time has been reduced from 9.9s to 8.5s(15%).
The size of the release-fast binary has been reduced by 100 kB.
The release-fast build time has been reduced from 1m11s to 56s(21%)

Outlook

I’m really looking forward to the compile-time improvements coming that will be introduced by the x86 backend and hot code swapping/incremental compilation.

AndrewCodeDev · August 5, 2023, 10:34am

Aha! By using the tuple, you’re resolving to a broader type.

C++ has similar problems with NTTP (non-type template parameters) that cause binaries to explode. I had a similar problem with custom classes that enable taking strings as template parameters (they just become char arrays) and you get types like <{0, 1, 2, 3, 4, 5 …}> etc…

alp · August 8, 2023, 3:30pm

Amazing work! You should help work on the stdlib

IntegratedQuantum · August 8, 2023, 5:04pm

Thanks.

I was actually considering to make a PR with this. But I don’t know where to put it.
If I just change the default log function then that probably would only help a few people.
If I add it further up in the call stack (like directly in std.log.info) then it would duplicate code for different log levels and log scopes.
If I add it further down in the call stack, like in std.fmt.format then all the functions above it (allocPrint, logFn, log.info) would remain duplicated.

And additionally it would require a lot of modification to make it good enough for the standard library. For example with this log function you cannot log types std.log.info("{}", .{type})