I think std.testing.expectEqualSlices could have a bug

I have written a lexer for an assembly-like language in Zig. I have also implemented an .eql() function for my tokens to check that two tokens are the same.

This is my test file:

const std = @import("std");
const testing = std.testing;
const assert = testing.expect;
const allocator = testing.allocator;

const TokenKind = @import("token").TokenKind;
const Token = @import("token").Token;

const L = @import("lexer");
const Lexer = L.Lexer;

fn assert_eq(slice1: []const Token, slice2: []const Token) !void {
    if (slice1.len != slice2.len) return error.TestUnexpectedResult;

    // Iterate both slices in lockstep; the lengths were checked above.
    for (slice1, slice2) |a, b| {
        if (!a.eql(b)) return error.TestUnexpectedResult;
    }
}

test "test single character tokens" {
    const input: []const u8 = "[] () .,: #";

    var lexer = Lexer.init(allocator, input);
    defer lexer.deinit();

    const tests = [_]Token{
        .{ .kind = .LBracket, .line = 1 },
        .{ .kind = .RBracket, .line = 1 },
        .{ .kind = .LParen, .line = 1 },
        .{ .kind = .RParen, .line = 1 },
        .{ .kind = .Dot, .line = 1 },
        .{ .kind = .Comma, .line = 1 },
        .{ .kind = .Colon, .line = 1 },
        .{ .kind = .Hashtag, .line = 1 },
        .{ .kind = .Eof, .line = 1 },
    };

    try lexer.tokenize();

    try assert(lexer.errors.items.len == 0);

    try testing.expectEqualSlices(Token, &tests, lexer.tokens.items);
}

test "test whitespace and newlines" {
    const input: []const u8 =
        \\ADD
        \\
        \\HALT
    ;

    var lexer = Lexer.init(allocator, input);
    defer lexer.deinit();

    const tests = [_]Token{
        .{ .kind = .{ .Identifier = "ADD" }, .line = 1 },
        .{ .kind = .{ .Identifier = "HALT" }, .line = 3 },
        .{ .kind = .Eof, .line = 3 },
    };

    const tokens = try lexer.getTokens();

    try assert(lexer.errors.items.len == 0);

    try assert_eq(tests[0..], tokens);
}

I have checked before that the .eql function works between tokens. However, with this call in the second lexer test:

try std.testing.expectEqualSlices(Token, &tests, tokens);

I get this output:

============ expected this output: =============  len: 3 (0x3)

[0]: .{ .kind = .{ .Identifier = { 65, 68, 68 } }, .line = 1 }
[1]: .{ .kind = .{ .Identifier = { 72, 65, 76, 84 } }, .line = 3 }
[2]: .{ .kind = .{ .Eof = void }, .line = 3 }

============= instead found this: ==============  len: 3 (0x3)

[0]: .{ .kind = .{ .Identifier = { 65, 68, 68 } }, .line = 1 }
[1]: .{ .kind = .{ .Identifier = { 72, 65, 76, 84 } }, .line = 3 }
[2]: .{ .kind = .{ .Eof = void }, .line = 3 }

================================================

The output shows the two slices as identical, yet the call fails. Is this a fault on my end?

Could you share the definition of Token (and the definitions of any dependent types)?

I believe you also truncated the output. There should also be a line stating which index the difference was found at.

EDIT: My guess is that your eql function is not comparing .Identifier correctly; you’re likely comparing pointer values instead of using std.mem.eql.

Sure, here is my token definition:

const std = @import("std");

const Directive = enum { text, data };

pub fn strToDirective(str: []const u8) ?Directive {
    return std.meta.stringToEnum(Directive, str);
}

pub const TokenKind = union(enum) {
    Identifier: []const u8,
    Directive: Directive,
    Integer: i32,
    StringLiteral: []const u8,
    Register: u8,
    Dot,
    Colon,
    Comma,
    LBracket,
    RBracket,
    LParen,
    RParen,
    Hashtag,
    Eof,

    pub fn eql(self: TokenKind, other: TokenKind) bool {
        switch (self) {
            .Identifier => |ident| return std.mem.eql(u8, ident, other.Identifier),
            .Directive => |directive| return directive == other.Directive,
            .Integer => |num| return num == other.Integer,
            .StringLiteral => |literal| return std.mem.eql(u8, literal, other.StringLiteral),
            .Register => |byte| return byte == other.Register,

            .Colon, .Comma, .LBracket, .RBracket, .LParen, .RParen, .Hashtag, .Dot, .Eof => return true,
        }
    }
};

pub const Token = struct {
    kind: TokenKind,
    line: usize,

    pub fn eql(self: Token, other: Token) bool {
        if (self.line != other.line) return false;

        return self.kind.eql(other.kind);
    }
};

To clarify, the output states that the error occurred at index 0.


The problem is that std.testing.expectEqualSlices internally compares each pair of items with std.meta.eql, not with your custom eql function.

std.meta.eql only performs a shallow field-by-field comparison. For slices, that means it only compares pointer and length:

[...]
        .pointer => |info| {
            return switch (info.size) {
                .one, .many, .c => a == b,
                .slice => a.ptr == b.ptr and a.len == b.len,
            };
        },
[...]

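So @squeek502’s guess was kind of correct: the .Identifier slices in &tests and in tokens have the same contents but point to different memory, so the comparison returns false. The failure output still looks identical because it prints the slice contents, not their addresses.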
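The difference between the two comparisons can be checked in isolation. This is a minimal sketch, not code from the thread: it makes two heap copies of the same bytes (so the addresses are guaranteed to differ) and compares them with both std.meta.eql and std.mem.eql.

```zig
const std = @import("std");

test "std.meta.eql compares slices by pointer and length, not by content" {
    const allocator = std.testing.allocator;

    // Two heap copies of the same bytes: equal content, different addresses.
    const a = try allocator.dupe(u8, "ADD");
    defer allocator.free(a);
    const b = try allocator.dupe(u8, "ADD");
    defer allocator.free(b);

    // Shallow comparison: the pointers differ, so this is false.
    try std.testing.expect(!std.meta.eql(a, b));
    // Content comparison: the bytes match, so this is true.
    try std.testing.expect(std.mem.eql(u8, a, b));
    // A slice is shallow-equal to itself (same pointer, same length).
    try std.testing.expect(std.meta.eql(a, a));
}
```

Running this with zig test should pass, which is the same situation as a string literal in the expected array versus a slice produced by the lexer.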

std.testing.expectEqualDeep might work for you, which does compare slices by content recursively, or you could just loop over both slices and use your own eql function:

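// A multi-sequence for loop requires equal lengths, so check them first.
try std.testing.expectEqual(tests.len, tokens.len);
for (&tests, tokens) |expected, actual| {
    try std.testing.expect(expected.eql(actual));
}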
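For the std.testing.expectEqualDeep option mentioned above, here is a rough sketch of how it behaves on token-like data. The Token and TokenKind types below are trimmed-down, hypothetical stand-ins for the ones in this thread, and the identifier is copied to the heap only to make sure it does not share memory with the literal.

```zig
const std = @import("std");

// Hypothetical, trimmed-down stand-ins for the thread's types.
const TokenKind = union(enum) {
    Identifier: []const u8,
    Eof,
};

const Token = struct {
    kind: TokenKind,
    line: usize,
};

test "expectEqualDeep follows slices inside the tokens" {
    const allocator = std.testing.allocator;

    // Copy the name so it does not share memory with the literal below.
    const name = try allocator.dupe(u8, "ADD");
    defer allocator.free(name);

    const expected = [_]Token{
        .{ .kind = .{ .Identifier = "ADD" }, .line = 1 },
        .{ .kind = .Eof, .line = 1 },
    };
    const actual = [_]Token{
        .{ .kind = .{ .Identifier = name }, .line = 1 },
        .{ .kind = .Eof, .line = 1 },
    };

    // Coerce both arrays to slices of the same type before comparing.
    const expected_slice: []const Token = &expected;
    const actual_slice: []const Token = &actual;
    try std.testing.expectEqualDeep(expected_slice, actual_slice);
}
```

Unlike expectEqualSlices, this should pass even though the two identifier slices live at different addresses, because the comparison descends into slice contents.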

You could also write a custom testingEql function that uses std.testing.expectEqual instead of == and std.testing.expectEqualSlices instead of std.mem.eql if you want nicer error messages on test failure.
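One possible shape for such a test-only helper is sketched below. The names expectTokenEqual and expectTokensEqual are made up for illustration, and the types are trimmed-down stand-ins for the ones in this thread; only std.testing.expectEqual, std.testing.expectEqualStrings and std.meta.activeTag come from the standard library.

```zig
const std = @import("std");

// Hypothetical, trimmed-down stand-ins for the types in this thread.
const TokenKind = union(enum) {
    Identifier: []const u8,
    Integer: i32,
    Eof,
};

const Token = struct {
    kind: TokenKind,
    line: usize,
};

/// Test-only comparison: every check goes through std.testing, so a failure
/// reports the two values that differ instead of just returning false.
fn expectTokenEqual(expected: Token, actual: Token) !void {
    try std.testing.expectEqual(expected.line, actual.line);

    // Compare the active tags first so the payload accesses below are valid.
    try std.testing.expectEqual(
        std.meta.activeTag(expected.kind),
        std.meta.activeTag(actual.kind),
    );

    switch (expected.kind) {
        .Identifier => |name| try std.testing.expectEqualStrings(name, actual.kind.Identifier),
        .Integer => |n| try std.testing.expectEqual(n, actual.kind.Integer),
        .Eof => {},
    }
}

/// Compares two token slices element by element using the helper above.
fn expectTokensEqual(expected: []const Token, actual: []const Token) !void {
    try std.testing.expectEqual(expected.len, actual.len);
    for (expected, actual) |e, a| try expectTokenEqual(e, a);
}
```

In a test this would be called as try expectTokensEqual(&tests, tokens);. The design choice is to keep the bool-returning eql for production code and use the error-returning variant only in tests, where the extra diagnostics are useful.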


You said your second test fails in try std.testing.expectEqualSlices, but I don’t see a call to this function there (only in the first test)!?

As a side note, a lexer usually does not return the complete list of tokens. Instead, it hands out one token at a time to avoid unnecessary memory usage. But maybe there are specific reasons in your case.
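To illustrate the one-token-at-a-time design, here is a deliberately tiny sketch. It is not the lexer from this thread: the Lexer and Token types are hypothetical, and it only splits on whitespace, just to show the shape of a next() function that allocates nothing.

```zig
const std = @import("std");

// Hypothetical token type: just the matched text and its line number.
const Token = struct {
    text: []const u8,
    line: usize,
};

/// Hypothetical pull-style lexer that splits on whitespace only.
const Lexer = struct {
    input: []const u8,
    pos: usize = 0,
    line: usize = 1,

    /// Returns the next token, or null at end of input. Nothing is allocated;
    /// each token is a slice into the original input.
    fn next(self: *Lexer) ?Token {
        // Skip whitespace, counting newlines as we go.
        while (self.pos < self.input.len and std.ascii.isWhitespace(self.input[self.pos])) : (self.pos += 1) {
            if (self.input[self.pos] == '\n') self.line += 1;
        }
        if (self.pos == self.input.len) return null;

        // Consume everything up to the next whitespace character.
        const start = self.pos;
        while (self.pos < self.input.len and !std.ascii.isWhitespace(self.input[self.pos])) : (self.pos += 1) {}
        return .{ .text = self.input[start..self.pos], .line = self.line };
    }
};

test "tokens are produced one at a time" {
    var lexer = Lexer{ .input = "ADD\n\nHALT" };

    const first = lexer.next().?;
    try std.testing.expectEqualStrings("ADD", first.text);
    try std.testing.expectEqual(@as(usize, 1), first.line);

    const second = lexer.next().?;
    try std.testing.expectEqualStrings("HALT", second.text);
    try std.testing.expectEqual(@as(usize, 3), second.line);

    try std.testing.expect(lexer.next() == null);
}
```

A parser would call next() in a loop. Collecting all tokens up front is still possible on top of this by pushing each result into a list, so the two designs are not mutually exclusive.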

And the eql function for TokenKind is a bit sloppy, as it considers all void tags equal to each other (for example, .Comma and .Eof).
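A stricter version could compare the active tags before looking at any payload. The sketch below uses a trimmed-down, hypothetical TokenKind with only four variants. Besides fixing the void-tag issue, the tag check also guards the payload accesses: in the original, comparing an .Identifier with, say, a .Comma reads other.Identifier while a different field is active, which is safety-checked illegal behavior and panics in safe build modes.

```zig
const std = @import("std");

// Hypothetical, trimmed-down version of the thread's TokenKind.
const TokenKind = union(enum) {
    Identifier: []const u8,
    Integer: i32,
    Comma,
    Eof,

    pub fn eql(self: TokenKind, other: TokenKind) bool {
        // Different variants are never equal. Checking this first also makes
        // the payload accesses on `other` below safe.
        if (std.meta.activeTag(self) != std.meta.activeTag(other)) return false;

        return switch (self) {
            .Identifier => |ident| std.mem.eql(u8, ident, other.Identifier),
            .Integer => |num| num == other.Integer,
            // No payload: matching tags already means equal.
            .Comma, .Eof => true,
        };
    }
};

test "eql distinguishes variants" {
    const comma: TokenKind = .Comma;
    const eof: TokenKind = .Eof;
    const ident: TokenKind = .{ .Identifier = "ADD" };
    const number: TokenKind = .{ .Integer = 1 };

    try std.testing.expect(comma.eql(comma));
    try std.testing.expect(!comma.eql(eof));
    try std.testing.expect(!ident.eql(comma));
    try std.testing.expect(!ident.eql(number));
    try std.testing.expect(ident.eql(.{ .Identifier = "ADD" }));
}
```

Listing the payload-free variants explicitly, rather than using else, keeps the switch exhaustive, so adding a new variant later produces a compile error until eql handles it.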

Yeah, I’m quite new to the language, and the standard library is not very well documented, so I’ve been struggling a bit.

The reason the lexer pumps out all the tokens at once is that I wanted to make sure the program had no lexical errors before passing it to the parser, although changing the design wouldn’t be too difficult.

As for the call to expectEqualSlices in the second test, I had removed it temporarily to test my custom function and forgot to put it back :stuck_out_tongue:

Also, do you have tips on how I could improve my eql function for Token and TokenKind? I would very much appreciate it.

Okay, thank you!