std.testing.expectEqualSlices: I think it could have a bug

CipherPower · September 28, 2025, 8:34pm

I have written a lexer in Zig for an assembly-like language. I have also implemented an .eql() function for my tokens to check whether two tokens are the same.

This is my test file:

const std = @import("std");
const testing = std.testing;
const assert = testing.expect;
const allocator = testing.allocator;

const TokenKind = @import("token").TokenKind;
const Token = @import("token").Token;

const L = @import("lexer");
const Lexer = L.Lexer;

fn assert_eq(slice1: []const Token, slice2: []const Token) !void {
    if (slice1.len != slice2.len) return error.TestUnexpectedResult;

    for (slice1, 0..) |_, i| {
        if (slice1[i].eql(slice2[i])) {
            continue;
        } else {
            return error.TestUnexpectedResult;
        }
    }
}

test "test single character tokens" {
    const input: []const u8 = "[] () .,: #";

    var lexer = Lexer.init(allocator, input);
    defer lexer.deinit();

    const tests = [_]Token{
        .{ .kind = .LBracket, .line = 1 },
        .{ .kind = .RBracket, .line = 1 },
        .{ .kind = .LParen, .line = 1 },
        .{ .kind = .RParen, .line = 1 },
        .{ .kind = .Dot, .line = 1 },
        .{ .kind = .Comma, .line = 1 },
        .{ .kind = .Colon, .line = 1 },
        .{ .kind = .Hashtag, .line = 1 },
        .{ .kind = .Eof, .line = 1 },
    };

    try lexer.tokenize();

    try assert(lexer.errors.items.len == 0);

    try testing.expectEqualSlices(Token, &tests, lexer.tokens.items);
}

test "test whitespace and newlines" {
    const input: []const u8 =
        \\ADD
        \\
        \\HALT
    ;

    var lexer = Lexer.init(allocator, input);
    defer lexer.deinit();

    const tests = [_]Token{ .{ .kind = .{ .Identifier = "ADD" }, .line = 1 }, .{ .kind = .{ .Identifier = "HALT" }, .line = 3 }, .{ .kind = .Eof, .line = 3 } };

    const tokens = try lexer.getTokens();

    try assert(lexer.errors.items.len == 0);

    try assert_eq(tests[0..], tokens);
}

I have already checked that the .eql function works when comparing tokens. However, this call in the second lexer test

try std.testing.expectEqualSlices(Token, &tests, tokens);

produces this output:

============ expected this output: =============  len: 3 (0x3)

[0]: .{ .kind = .{ .Identifier = { 65, 68, 68 } }, .line = 1 }
[1]: .{ .kind = .{ .Identifier = { 72, 65, 76, 84 } }, .line = 3 }
[2]: .{ .kind = .{ .Eof = void }, .line = 3 }

============= instead found this: ==============  len: 3 (0x3)

[0]: .{ .kind = .{ .Identifier = { 65, 68, 68 } }, .line = 1 }
[1]: .{ .kind = .{ .Identifier = { 72, 65, 76, 84 } }, .line = 3 }
[2]: .{ .kind = .{ .Eof = void }, .line = 3 }

================================================

The output shows the two slices as identical, yet the assertion fails. Is this a fault on my end?

squeek502 · September 28, 2025, 9:08pm

Could you share the definition of Token (and the definitions of any dependent types)?

I believe you also truncated the output. There should also be a line stating which index the difference was found at.

EDIT: My guess is that your eql function is not comparing .Identifier correctly; you're likely comparing pointer values instead of using std.mem.eql.
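To make the distinction concrete, here is a small sketch (assuming Zig 0.14 or later) of two slices with the same bytes but different backing memory. A pointer comparison reports them as different, while std.mem.eql compares the contents:

```zig
const std = @import("std");

test "comparing slice pointers vs comparing slice contents" {
    const allocator = std.testing.allocator;

    // A string literal lives in the program's static memory...
    const literal: []const u8 = "ADD";
    // ...while dupe() copies the same bytes into a fresh heap allocation.
    const copy = try allocator.dupe(u8, literal);
    defer allocator.free(copy);

    // Same contents, different addresses: a pointer comparison says "not equal".
    try std.testing.expect(@intFromPtr(literal.ptr) != @intFromPtr(copy.ptr));

    // std.mem.eql compares the lengths and then every element.
    try std.testing.expect(std.mem.eql(u8, literal, copy));
}
```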

CipherPower · September 28, 2025, 9:30pm

Sure, here is my token definition:

const std = @import("std");

const Directive = enum { text, data };

pub fn strToDirective(str: []const u8) ?Directive {
    return std.meta.stringToEnum(Directive, str);
}

pub const TokenKind = union(enum) {
    Identifier: []const u8,
    Directive: Directive,
    Integer: i32,
    StringLiteral: []const u8,
    Register: u8,
    Dot,
    Colon,
    Comma,
    LBracket,
    RBracket,
    LParen,
    RParen,
    Hashtag,
    Eof,

    pub fn eql(self: TokenKind, other: TokenKind) bool {
        switch (self) {
            .Identifier => |ident| return std.mem.eql(u8, ident, other.Identifier),
            .Directive => |directive| return directive == other.Directive,
            .Integer => |num| return num == other.Integer,
            .StringLiteral => |literal| return std.mem.eql(u8, literal, other.StringLiteral),
            .Register => |byte| return byte == other.Register,

            .Colon, .Comma, .LBracket, .RBracket, .LParen, .RParen, .Hashtag, .Dot, .Eof => return true,
        }
    }
};

pub const Token = struct {
    kind: TokenKind,
    line: usize,

    pub fn eql(self: Token, other: Token) bool {
        if (self.line != other.line) return false;

        return self.kind.eql(other.kind);
    }
};

To clarify, the output states that the difference was found at index 0.

Justus2308 · September 28, 2025, 10:05pm

The problem is that, internally, std.testing.expectEqualSlices compares each pair of items from the two slices with std.meta.eql, not with your custom eql function.

std.meta.eql only performs a shallow, field-by-field comparison; for slices, that means it compares only the pointer and the length:

[...]
        .pointer => |info| {
            return switch (info.size) {
                .one, .many, .c => a == b,
                .slice => a.ptr == b.ptr and a.len == b.len,
            };
        },
[...]

So @squeek502's guess was essentially correct: the Identifier slices in &tests and in tokens point to different memory, so even though their contents match, std.meta.eql returns false.
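A minimal sketch of this behavior, assuming Zig 0.14 or later and a trimmed-down copy of the Token types from this thread (only Identifier and Eof kept). allocator.dupe stands in for the lexer, producing an identifier with the same bytes as the literal but at a different address:

```zig
const std = @import("std");

// Trimmed copies of the thread's types; only two tags are kept.
const TokenKind = union(enum) {
    Identifier: []const u8,
    Eof,
};

const Token = struct {
    kind: TokenKind,
    line: usize,
};

test "std.meta.eql compares slice payloads by pointer and length" {
    const allocator = std.testing.allocator;

    // Same bytes as the literal "ADD", but stored at a different address,
    // like an identifier produced by a lexer.
    const name = try allocator.dupe(u8, "ADD");
    defer allocator.free(name);

    const expected = Token{ .kind = .{ .Identifier = "ADD" }, .line = 1 };
    const actual = Token{ .kind = .{ .Identifier = name }, .line = 1 };

    // Equal contents, different pointers: the shallow comparison fails.
    try std.testing.expect(!std.meta.eql(expected, actual));

    // A content comparison of the payloads succeeds.
    try std.testing.expect(std.mem.eql(u8, expected.kind.Identifier, actual.kind.Identifier));

    // Tokens whose kind has no payload only compare tag and line.
    try std.testing.expect(std.meta.eql(
        Token{ .kind = .Eof, .line = 3 },
        Token{ .kind = .Eof, .line = 3 },
    ));
}
```

This also means expectEqualSlices behaves as expected for tokens whose kinds carry no slice payload, such as those in the first test.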

std.testing.expectEqualDeep might work for you, since it compares slices recursively by content. Alternatively, you could loop over both slices and use your own eql function:

try std.testing.expectEqual(tests.len, tokens.len);
for (&tests, tokens) |expected, actual| {
    try std.testing.expect(expected.eql(actual));
}
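For the expectEqualDeep route, a sketch under the same assumptions (Zig 0.14 or later, trimmed Token types, heap copies standing in for the lexer output). Both arguments are cast to []const Token so that the two sides share one type:

```zig
const std = @import("std");

// Trimmed copies of the thread's types; only two tags are kept.
const TokenKind = union(enum) {
    Identifier: []const u8,
    Eof,
};

const Token = struct {
    kind: TokenKind,
    line: usize,
};

test "expectEqualDeep compares slice payloads by content" {
    const allocator = std.testing.allocator;

    // Heap copies stand in for identifiers produced by the lexer.
    const add = try allocator.dupe(u8, "ADD");
    defer allocator.free(add);
    const halt = try allocator.dupe(u8, "HALT");
    defer allocator.free(halt);

    const expected = [_]Token{
        .{ .kind = .{ .Identifier = "ADD" }, .line = 1 },
        .{ .kind = .{ .Identifier = "HALT" }, .line = 3 },
        .{ .kind = .Eof, .line = 3 },
    };
    const actual = [_]Token{
        .{ .kind = .{ .Identifier = add }, .line = 1 },
        .{ .kind = .{ .Identifier = halt }, .line = 3 },
        .{ .kind = .Eof, .line = 3 },
    };

    // Recurses: slice -> struct fields -> tagged union -> []const u8 bytes.
    try std.testing.expectEqualDeep(@as([]const Token, &expected), @as([]const Token, &actual));
}
```

In the second test from the original post, the equivalent call would presumably be `try std.testing.expectEqualDeep(@as([]const Token, &tests), tokens);`, with no custom eql involved.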

You could also write a custom testingEql function that uses std.testing.expectEqual instead of == and std.testing.expectEqualSlices instead of std.mem.eql if you want nicer error messages on test failure.
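A sketch of such a helper, using the hypothetical names testingEql and testingEqlSlices and a trimmed TokenKind (assuming Zig 0.14 or later). Each comparison is a std.testing call, so a failing test reports which values differed:

```zig
const std = @import("std");

// Trimmed copy of the thread's types (only a few tags kept for brevity).
const TokenKind = union(enum) {
    Identifier: []const u8,
    Integer: i32,
    Comma,
    Eof,
};

const Token = struct {
    kind: TokenKind,
    line: usize,
};

/// Hypothetical helper: the same checks as an eql function, but each check is
/// a std.testing call, so a failure prints the values that differed.
fn testingEql(expected: Token, actual: Token) !void {
    try std.testing.expectEqual(expected.line, actual.line);
    // Compare tags first so an inactive union field is never read.
    try std.testing.expectEqual(std.meta.activeTag(expected.kind), std.meta.activeTag(actual.kind));
    switch (expected.kind) {
        .Identifier => |name| try std.testing.expectEqualSlices(u8, name, actual.kind.Identifier),
        .Integer => |num| try std.testing.expectEqual(num, actual.kind.Integer),
        .Comma, .Eof => {},
    }
}

/// Hypothetical slice version that also reports the failing index.
fn testingEqlSlices(expected: []const Token, actual: []const Token) !void {
    try std.testing.expectEqual(expected.len, actual.len);
    for (expected, actual, 0..) |e, a, i| {
        testingEql(e, a) catch |err| {
            std.debug.print("tokens differ at index {d}\n", .{i});
            return err;
        };
    }
}

test "testingEqlSlices accepts equal contents at different addresses" {
    const allocator = std.testing.allocator;
    const name = try allocator.dupe(u8, "ADD");
    defer allocator.free(name);

    const expected = [_]Token{
        .{ .kind = .{ .Identifier = "ADD" }, .line = 1 },
        .{ .kind = .Eof, .line = 1 },
    };
    const actual = [_]Token{
        .{ .kind = .{ .Identifier = name }, .line = 1 },
        .{ .kind = .Eof, .line = 1 },
    };
    try testingEqlSlices(&expected, &actual);
}
```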

hvbargen · September 28, 2025, 10:27pm

You said your second test fails in try std.testing.expectEqualSlices, but I don't see a call to that function there (only in the first test)!?

As a side note, a lexer usually does not return the complete list of tokens; instead, it hands out one token at a time, to avoid unnecessary memory usage. But maybe you have specific reasons for doing it this way.
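A rough sketch of that pull-style design, assuming a hypothetical Lexer with a next() method and a heavily trimmed token set (identifiers, commas, colons, Eof). Identifiers are slices of the input, so nothing is allocated:

```zig
const std = @import("std");

// Heavily trimmed token set, just enough to show the idea.
const TokenKind = union(enum) {
    Identifier: []const u8,
    Comma,
    Colon,
    Eof,
};

const Token = struct {
    kind: TokenKind,
    line: usize,
};

fn isIdentChar(c: u8) bool {
    return std.ascii.isAlphanumeric(c) or c == '_';
}

/// Hypothetical pull-style lexer: the parser calls next() whenever it needs
/// another token, so no token list is ever built or allocated.
const Lexer = struct {
    input: []const u8,
    pos: usize = 0,
    line: usize = 1,

    fn init(input: []const u8) Lexer {
        return .{ .input = input };
    }

    fn next(self: *Lexer) error{UnexpectedCharacter}!Token {
        // Skip whitespace, counting newlines for the line number.
        while (self.pos < self.input.len) : (self.pos += 1) {
            switch (self.input[self.pos]) {
                ' ', '\t', '\r' => {},
                '\n' => self.line += 1,
                else => break,
            }
        }
        // Once the input is exhausted, every further call returns Eof.
        if (self.pos >= self.input.len) return .{ .kind = .Eof, .line = self.line };

        const start = self.pos;
        self.pos += 1;
        switch (self.input[start]) {
            ',' => return .{ .kind = .Comma, .line = self.line },
            ':' => return .{ .kind = .Colon, .line = self.line },
            'a'...'z', 'A'...'Z', '_' => {
                while (self.pos < self.input.len and isIdentChar(self.input[self.pos])) : (self.pos += 1) {}
                // The identifier is a slice of the input; nothing is copied.
                return .{ .kind = .{ .Identifier = self.input[start..self.pos] }, .line = self.line };
            },
            else => return error.UnexpectedCharacter,
        }
    }
};

test "pulling tokens one at a time" {
    var lexer = Lexer.init("ADD\n\nHALT");

    const first = try lexer.next();
    try std.testing.expectEqualStrings("ADD", first.kind.Identifier);
    try std.testing.expectEqual(@as(usize, 1), first.line);

    const second = try lexer.next();
    try std.testing.expectEqualStrings("HALT", second.kind.Identifier);
    try std.testing.expectEqual(@as(usize, 3), second.line);

    const last = try lexer.next();
    try std.testing.expect(std.meta.activeTag(last.kind) == .Eof);
}
```

The trade-off is that lexical errors surface only when the parser reaches the offending token, not before parsing starts.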

And the eql function for TokenKind is a bit sloppy, as it considers all void tags equal to each other: comparing a .Dot token with an .Eof token, for example, returns true.
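One way to tighten it, sketched here under the assumption of Zig 0.14 or later: compare the active tags first with std.meta.activeTag, then compare payloads only when the tags match. This also avoids a subtler problem in the original, which reads other.Identifier (and so on) without checking other's tag; accessing an inactive union field is checked illegal behavior and panics in Debug and ReleaseSafe builds.

```zig
const std = @import("std");

const Directive = enum { text, data };

const TokenKind = union(enum) {
    Identifier: []const u8,
    Directive: Directive,
    Integer: i32,
    StringLiteral: []const u8,
    Register: u8,
    Dot,
    Colon,
    Comma,
    LBracket,
    RBracket,
    LParen,
    RParen,
    Hashtag,
    Eof,

    pub fn eql(self: TokenKind, other: TokenKind) bool {
        // Different tags are never equal. Checking this first also means the
        // switch below only reads fields of `other` that are actually active.
        if (std.meta.activeTag(self) != std.meta.activeTag(other)) return false;

        return switch (self) {
            .Identifier => |ident| std.mem.eql(u8, ident, other.Identifier),
            .StringLiteral => |literal| std.mem.eql(u8, literal, other.StringLiteral),
            .Directive => |directive| directive == other.Directive,
            .Integer => |num| num == other.Integer,
            .Register => |byte| byte == other.Register,
            // No payload: matching tags are all there is to compare.
            .Dot, .Colon, .Comma, .LBracket, .RBracket, .LParen, .RParen, .Hashtag, .Eof => true,
        };
    }
};

test "eql checks tags before payloads" {
    const dot: TokenKind = .Dot;
    try std.testing.expect(dot.eql(.Dot));
    try std.testing.expect(!dot.eql(.Eof));

    const add: TokenKind = .{ .Identifier = "ADD" };
    try std.testing.expect(add.eql(.{ .Identifier = "ADD" }));
    try std.testing.expect(!add.eql(.{ .Identifier = "HALT" }));
    // Different tag: returns false instead of reading an inactive field.
    try std.testing.expect(!add.eql(.Eof));
}
```

With this version, Token.eql from the thread can stay as it is, since it already checks the line and then delegates to TokenKind.eql.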

CipherPower · September 28, 2025, 10:36pm

Yeah, I'm quite new to the language, and the standard library is not very well documented, so I've been struggling a bit.

As for why the lexer produces all the tokens at once: I wanted to make sure the program had no lexical errors before it was passed to the parser. That said, changing the design wouldn't be too difficult.

Also, regarding the call to expectEqualSlices in the second test: I had temporarily removed it to test my custom function and forgot to put it back.

Also, do you have any tips on how I could improve my eql functions for Token and TokenKind? I would very much appreciate it.

CipherPower · September 28, 2025, 10:37pm

Okay, thank you!