Test parameterization

I’ve been working on a keyboard event parser/handler for the kitty keyboard protocol. One thing I’ve missed from other languages, as I’ve been testing, is the ability to parameterize my tests. In Python, JavaScript, and Rust there are packages that can generate individual tests from parameterized inputs and outputs.

I did attempt to use comptime to generate different test declarations, but unfortunately that is not supported (as far as I can tell). I have been able to use inline loops to get rid of boilerplate, but that doesn’t create an individual test for each item, which means any one failure fails the whole test. That can make it harder to track down which input actually failed, or whether there are others that would fail but were never tested.

Here is an example of what I’m thinking about:

// 'Standard' Table can be found here https://vt100.net/docs/vt100-ug/chapter3.html
const codes = [_]std.meta.Tuple(&.{ u8, KeyEvent }){
    .{ 0, KeyEvent{ .code = KeyCode{ .Char = ' ' }, .modifier = KeyModifier.control() } },
    .{ 1, KeyEvent{ .code = KeyCode{ .Char = 'a' }, .modifier = KeyModifier.control() } },
    // Lots of other Event mappings
    .{ 31, KeyEvent{ .code = KeyCode{ .Char = '?' }, .modifier = KeyModifier.control() } },
};
inline for (codes) |code| {
    test "parse c0 codes to standard representation" {
        const result = try parseEvent(&[_]u8{code.@"0"}, false);
        try testing.expect(std.meta.eql(code.@"1", result.?));
    }
}

Is this something that others would find beneficial?


One possible option is to create a test runner struct, since you can bind function parameters and names as string-accessible declarations. Then run through them using the @field builtin and call them all in a single test.
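A rough sketch of that @field-based runner might look like this. This is a hypothetical example (double, Cases, and the case names are made up for illustration), and it assumes a Zig version where `@typeInfo(...).Struct.decls` is spelled that way:

```zig
const std = @import("std");

fn double(x: i32) i32 {
    return x * 2;
}

// Each public function in this struct is one named "sub test".
const Cases = struct {
    pub fn doublesPositive() !void {
        try std.testing.expectEqual(@as(i32, 4), double(2));
    }
    pub fn doublesNegative() !void {
        try std.testing.expectEqual(@as(i32, -6), double(-3));
    }
};

test "run all named cases" {
    // Iterate the declarations and call each one by name via @field,
    // so a failure can be reported together with the case name.
    inline for (@typeInfo(Cases).Struct.decls) |decl| {
        @field(Cases, decl.name)() catch |err| {
            std.debug.print("case '{s}' failed\n", .{decl.name});
            return err;
        };
    }
}
```

It’s still one test as far as the runner is concerned, but each failure is reported with the name of the case that produced it.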

If you just want to run those tests in a loop, I don’t know why we couldn’t do that in a regular loop. I don’t know why you would need independent tests at this level. Maybe I am missing something?

I can think another way to do it, but first two hints:

  • You can construct tuples as unnamed structs:
const code = [_]struct { u8, KeyEvent }{
  • You can access tuple members as code[0].

You can have one test collect all the results and print all the failed results.

test "parse..." {
    const codes = [_]struct { u8, KeyEvent }{
        .{...},
    };
    var result: [codes.len]?KeyEvent = undefined;
    var failed = false;
    for (codes, 0..) |code, i| {
        result[i] = try parseEvent(&[_]u8{code[0]}, false);
        if (!failed and !std.meta.eql(code[1], result[i].?)) {
            failed = true;
        }
    }
    if (failed) {
        // print all the failures to stderr here
        return error.Expect;
    }
}

That’s much better for a test, imo. I’d add one more thing here… since we’re running a test, I’d add a little more machinery for handling the optional. You probably wrote it this way for brevity, but I’d make sure the test covers everything, including that. Even if we anticipate that it should never fail, a simple refactoring bug could violate that assumption, and it’d be nice to know. I’m particular about these kinds of things.

Something as simple as:

if (result[i] == null) {
    // handle this case here
}

Also, @dimdin, are you aware of any status updates for printing during tests? I know that debug printing had issues with tests, but I haven’t experimented with stderr.

I’d go at this from the opposite direction - if you can generate tests, you can probably generate what information you need inside a single test. I’d always try that first tbh.

The reason it is hard to track down which input failed is that you are using the generic testing.expect.

In my opinion the simplest fix is to write one single test that contains the loop and executes all the expectation checks, but use the more specific testing.expectEqual, because it actually shows what was expected and what it got instead, which makes clear which input failed.

In practice I don’t think this is a problem: either you just fix the first failure you get right on the spot, or, if there is some reason to ignore it for now, you temporarily comment out that test case and work on the next one.
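A self-contained sketch of that single-test-with-loop approach (using a made-up mulAdd function rather than the poster’s parser, so it compiles on its own):

```zig
const std = @import("std");

fn mulAdd(a: i32, b: i32, c: i32) i32 {
    return a * b + c;
}

test "expectEqual identifies the failing input" {
    const cases = [_]struct { a: i32, b: i32, c: i32, expected: i32 }{
        .{ .a = 1, .b = 2, .c = 3, .expected = 5 },
        .{ .a = 2, .b = 3, .c = 4, .expected = 10 },
    };
    for (cases) |case| {
        // On a mismatch, expectEqual prints "expected N, found M",
        // which usually pins down the offending row.
        try std.testing.expectEqual(case.expected, mulAdd(case.a, case.b, case.c));
    }
}
```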

Perhaps I’m misunderstanding, or being misunderstood. I know I can generate the asserts in a loop; that is the solution I have now. I want individual tests because I don’t like having multiple asserts in one test. If my function fails for input x and for input y, I want to know both. Not have it fail on x and then never test y.

Right, we’re on the same page here then.

I think what @dimdin suggested could help solve that issue for you, though. You can store your test results and, if something fails, mark it as a failure.

Then you can dump the failed test results back to the user before returning the error and closing the test.

That’s part of why I was asking if we’ve handled the issues for printing during tests. Going directly to stdio used to have issues but I haven’t checked up on that in a while and stderr may work fine. You may even try logging the output to file but I haven’t tried logging with the builtin test feature yet.

As an addendum to the above solution, instead of having a single bool called failed, you can have an array of them that is used to mark which ones failed so you can dump that info after.


As far as I know, printing to stdout is still a problem, and one that won’t be resolved. Yes, I suppose I could set up an array and report the failures, but it could end up with a lot of boilerplate. Perhaps a comptime function to generate the infrastructure could help.

It’s just that when I first came across the problem, comptime seemed like such a nice solution to it, but it can’t be used because test declarations can’t be generated by comptime syntax.

True, but I’d argue that the boilerplate for generating tests is the tradeoff here. That said, I don’t know if it’s that much boilerplate.

var checks: [Examples.len]bool = .{ true } ** Examples.len;

Then…

checks[i] = std.meta.eql(...);

I’ll play around with logging to see if we can get around the io issue, but if it works I’d probably go with that.

from memory:

  • testing.expect* calls std.debug.print (which prints to stderr) for all testing output.
  • a custom logger is installed by the test runner that counts the warn and err levels as errors.
  • stdout is used by the builder for communication between the zig build and the zig test processes.

Printing to stdout is problematic.
Logging at the info or debug level is invisible, and logging at warn or err counts as a test error.
Printing to stderr is the way to display error messages when failing.


So stderr has the same issue - I just confirmed it. The test runner eats anything after the last newline. Even so, std.log.err has an annoying formatting issue where, if you pass "\n" in your format, it prints it before the logger’s name, so the message gets put on a separate line and the name of the logger collides with the text above it. I dunno, not digging it for testing. Logging to a file has no problem, though.

Anyhow, I tested this out and it works pretty nicely:

const std = @import("std");

const builtin = @import("builtin");

fn ErrorFileLogger(comptime src: std.builtin.SourceLocation) type {
    return struct {
        const Self = @This();
        file: std.fs.File,
        used: bool = false,

        pub fn open() !Self {
            return Self {
                .file = try std.fs.cwd().createFile(src.fn_name ++ ".log", .{
                    .read = false, .truncate = true
                })
            };       
        }

        pub fn print(self: *Self, comptime format: []const u8, args: anytype) !void {
            try self.file.writer().print(format ++ "\n", args);
            self.used = true;            
        }

        pub fn close(self: *Self) void {
            self.file.close();

            if (!self.used) { // clean-up logging file
                // shouldn't fail here if we opened the same file.
                std.fs.cwd().deleteFile(src.fn_name ++ ".log") catch unreachable;
            }
        }
    };
}

test "logging_test" {

    var log = try ErrorFileLogger(@src()).open();

    defer log.close();

    const keys = [_]u8{ 'a', 'b', 'c', 'd' };

    const checks: [keys.len]bool = .{ false } ** keys.len;

    for (keys, checks) |key, check| {
        if (!check) try log.print("Key Failed: {}", .{ key });
    }
}

You can definitely make it fancier, but it writes out to a file of the same name as the test… so mine produces test.logging_test.log. I dunno, worth playing around with. You could definitely parameterize the path more to make it go to a dedicated logging directory.

This seems to be a simple Table Driven Test. It is the preferred testing method used in Go.

Since Zig does not support sub tests you need to do it manually:

  1. Add a name/key to each test (unless one field can be used as a key)
  2. Personally I would use a normal struct, instead of a tuple
  3. Use a normal loop, no need of comptime
  4. When a test fails, append the key and the error message to a slice or ArrayList and print them at the end.
  5. If the tests are CPU bound, call each test in a thread pool
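Steps 1–4 might be sketched like this (a hypothetical mulAdd example, not the poster’s parser; it assumes a Zig version with the allocator-managed std.ArrayList API):

```zig
const std = @import("std");

fn mulAdd(a: i32, b: i32, c: i32) i32 {
    return a * b + c;
}

test "mulAdd table" {
    // 1 & 2: a normal struct with an explicit key per case.
    const cases = [_]struct { key: []const u8, a: i32, b: i32, c: i32, expect: i32 }{
        .{ .key = "small", .a = 1, .b = 2, .c = 3, .expect = 5 },
        .{ .key = "large", .a = 10, .b = 20, .c = 30, .expect = 230 },
    };
    var failed = std.ArrayList([]const u8).init(std.testing.allocator);
    defer failed.deinit();
    // 3: a normal runtime loop; every case runs even if an earlier one fails.
    for (cases) |case| {
        if (mulAdd(case.a, case.b, case.c) != case.expect) {
            try failed.append(case.key);
        }
    }
    // 4: report all failing keys at the end, then fail the test once.
    if (failed.items.len != 0) {
        for (failed.items) |key| {
            std.debug.print("case '{s}' failed\n", .{key});
        }
        return error.TableTestFailed;
    }
}
```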

Just pointing out, we can’t print them directly, you’ll get truncated output. That’s why I’m suggesting some form of logging to file instead. The rest of what you said though is good :+1:

But, as an example, testing.expectEqualSlices can print a lot of data on stderr.

I looked at test_runner and Build/Step/RunStep and the stderr message is used as error message, with no truncation.

IMHO, the real issue is how to report the sub test name to the test runner.
The testing.expectXXX functions print the error message directly to stderr, so by the time the function returns it is no longer possible to report the sub test name.

Probably the only solution is to use a different file descriptor.
GnuPG uses this solution; see for example --status-fd n, --log-fd n, and friends.

Another solution is to add a TestConfig struct to all the testing.expectXXX functions, where you can set the sub test name.

You know… it’s funny, because you can prevent the truncation just by printing an extra newline. I opened a GitHub issue about this a year ago and was linked to another issue where it was being discussed. Anyhow…

File descriptors may help - I’d definitely be willing to look at a working example :slight_smile:

A simple implementation should be possible, but only when the build runner is in listen mode (zig build test). In this case only one test is run for each request.

When the build runner is in normal mode (zig test), then you need to associate the sub test name to a test_fn using a file or memory; I suspect that this is not possible to implement.

To summarize, the goal is to parameterize tests in such a way that if you have 5 test cases and the 2nd case fails, you still want cases 3, 4 and 5 to run, and you also want each case to be considered its own test for the purposes of logging and statistics.

The following approach using a type-returning function accomplishes this:

const std = @import("std");

pub fn mulAdd(a: i32, b: i32, c: i32) i32 {
    return a * b + c;
}

test mulAdd {
    _ = MulAddTestCase(5, 1, 2, 3);
    _ = MulAddTestCase(9, 2, 3, 4); // incorrect expectation
    _ = MulAddTestCase(17, 3, 4, 5);

    comptime { // we can even define cases using comptime metaprogramming
        for (.{
            "26,4,5,6",
            "38,5,6,7", // incorrect expectation
        }) |case| {
            var it = std.mem.splitScalar(u8, case, ',');
            _ = MulAddTestCase(
                try std.fmt.parseInt(i32, it.next().?, 10),
                try std.fmt.parseInt(i32, it.next().?, 10),
                try std.fmt.parseInt(i32, it.next().?, 10),
                try std.fmt.parseInt(i32, it.next().?, 10),
            );
        }
    }
}

fn MulAddTestCase(comptime expected: i32, comptime a: i32, comptime b: i32, comptime c: i32) type {
    return struct {
        test {
            std.testing.expectEqual(expected, mulAdd(a, b, c)) catch |err| {
                std.debug.print("{}\n", .{@This()});
                return err;
            };
        }
    };
}

This will log the following output:

1/6 decltest.mulAdd... OK
2/6 test_0... OK
3/6 test_0... expected 9, found 10
main.MulAddTestCase(9,2,3,4)
FAIL (TestExpectedEqual)
C:\zig\lib\std\testing.zig:93:17: 0xc89318 in expectEqualInner__anon_4779 (test.exe.obj)
                return error.TestExpectedEqual;
                ^
C:\temp\main.zig:33:17: 0xc89662 in test_0 (test.exe.obj)
                return err;
                ^
4/6 test_0... OK
5/6 test_0... OK
6/6 test_0... expected 38, found 37
main.MulAddTestCase(38,5,6,7)
FAIL (TestExpectedEqual)
C:\zig\lib\std\testing.zig:93:17: 0xc89318 in expectEqualInner__anon_4779 (test.exe.obj)
                return error.TestExpectedEqual;
                ^
C:\temp\main.zig:33:17: 0xc89cf2 in test_0 (test.exe.obj)
                return err;
                ^
4 passed; 0 skipped; 2 failed.
error: the following test command failed with exit code 1:
C:\temp\zig-cache\o\44d9392e612b2b642805d8157e949b11\test.exe

Which isn’t perfect but should at the very least let you see which cases failed in the log.

Note that there’s no way to give the test cases unique names; the grammar for test declarations is KEYWORD_test (STRINGLITERALSINGLE / IDENTIFIER)? Block, so the name must be a string literal, not a concatenation expression, the result of calling std.fmt.comptimePrint, etc.


I did not think of using a struct; thanks.

Here is my version:

const std = @import("std");

pub fn mulAdd(a: i32, b: i32, c: i32) i32 {
    return a * b + c;
}

const TestCase = struct {
    key: []const u8,
    a: i32,
    b: i32,
    c: i32,
    expect: i32,

    fn test_fn(self: TestCase) type {
        return struct {
            test {
                const actual = mulAdd(self.a, self.b, self.c);
                std.testing.expectEqual(self.expect, actual) catch |err| {
                    std.debug.print("mulAdd {s}\n", .{self.key});
                    return err;
                };
            }
        };
    }
};

test mulAdd {
    inline for ([_]TestCase{
        .{ .a = 26, .b = 4, .c = 5, .expect = 6, .key = "1" },
        .{ .a = 38, .b = 5, .c = 6, .expect = 7, .key = "2" }, // incorrect expectation
    }) |case| {
        _ = case.test_fn();
    }
}

But you still need to update the testing.expectXXX functions so that the test key can be printed as:

error: 'test_0' failed: 'mulAdd 1' expected 6, found 109

instead of

error: 'test_0' failed: expected 6, found 109
mulAdd 1

Update

The code works correctly; however, I’m afraid that with large tables the compilation will be very slow.
