What if std.mem.indexOfScalarPos can handle array of array?

Sometimes I want to know if a string is present in an array of strings.

For example:

const my_fruits = [_][]const u8{
    "apple",
    "banana",
    "avocado",
};

const i_have_banana: bool = indexOfScalar([]const u8, &my_fruits, "banana") == null;

Today, it fails because std.mem.indexOfScalarPos try to compare array:

pub fn indexOfScalarPos(comptime T: type, slice: []const T, start_index: usize, value: T) ?usize {
    if (start_index >= slice.len) return null;

    var i: usize = start_index;

    // some low level optimisations I don't care about this time

    for (slice[i..], i..) |c, j| {
        if (c == value) return j; // here, the fail
    }
    return null;
}

What if instead it was this:

pub fn indexOfScalarPos(comptime T: type, slice: []const T, start_index: usize, value: T) ?usize {
    if (start_index >= slice.len) return null;

    var i: usize = start_index;


    // some low level optimisations I don't care about this time

    for (slice[i..], i..) |c, j| {
        if (@typeInfo(T) == .pointer and @typeInfo(T).pointer.size == .slice) {
            if (std.mem.eql(@typeInfo(T).pointer.child, c, value)) return j;
        } else {
            if (c == value) return j;
        }
    }
    return null;
}

With this modification, the code mentioned above works.

Do you think I can consider opening an issue and a pull request on GitHub, or there are some downsides and traps I didn’t see ?

I think in Zig, scalar means just the builtin types or types that are transparently represented as builtin types (packed structs, enum with backing types defined, etc.). I think that’s why indexOfScalarPos only uses the == operator since only the types that are considered ā€œscalarā€ have the == operator defined.

Since Zig doesn’t have operator overloading, there is no semantically robust way to say that 2 objects of the same type are ā€œequalā€. Consider what you are suggesting, what should happen when T has no well defined semantic for equality? Just comparing their bits might not make any sense.

Maybe if Zig recognizes a special function eql for defining how a type should define its equality, then we can have the indexOfScalarPos function uses that to check for equality.

1 Like

The Zig stdlib generally tries to avoid this kind of ā€˜magic’ overloading of functions with an otherwise simple/consistent definition. indexOfScalarPos fulfills its purpose, and if different functionality is needed, a different function should be used.

I think that if you want this feature, the most reasonable proposal would be to add a new function which finds the first value that passes an arbitrary comparison check. This would be implemented like so:

pub fn indexOfMatch(comptime T: type, slice: []const T, ctx: anytype, comptime matchFn: fn (@TypeOf(ctx), T) bool) ?usize {
    return indexOfMatchPos(T, slice, 0, ctx, matchFn);
}

pub fn indexOfMatchPos(comptime T: type, slice: []const T, start_index: usize, ctx: anytype, comptime matchFn: fn (@TypeOf(ctx), T) bool) ?usize {
    if (start_index >= slice.len) return null;

    for (slice[start_index..], start_index..) |c, j| {
        if (matchFn(ctx, c)) return j;
    }

    return null;
}

fn stringsEql(a: []const u8, b: []const u8) bool {
    return std.mem.eql(u8, a, b);
}

pub fn main() !void {
    const strings: []const []const u8 = &.{
        "ABC",
        "XYZ",
        "123",
    };

    const strings_to_find: []const []const u8 = &.{
        "AAA",
        "123",
        "XYZ",
        "1234",
    };

    for (strings_to_find) |string_to_find| {
        std.debug.print("\"{s}\": ", .{string_to_find});
        if (indexOfMatch([]const u8, strings, string_to_find, stringsEql)) |match_idx| {
            std.debug.print("found at idx={}\n", .{match_idx});
        } else {
            std.debug.print("couldn't find...\n", .{});
        }
    }
}

Note that I’ve never proposed a stdlib feature myself, so I don’t know the exact requirements to ā€˜justify’ a new feature. There might not be a lot of appetite for simple utilities like this.

1 Like

Note that this particular thing is usually an example of using the wrong data structure. A hash map (or array hash map) would likely be more appropriate (O(1) search instead of O(n)).

However, if you think searching within an array of strings is okay for your use case, then I’d suggest considering just putting the loop directly in your code. This has a potential side-benefit of allowing you to tailor the solution better to your specific use case (you may have alternate early exit conditions, etc, and writing out the loop forces you to think about that kind of stuff).

(note: this advice to write out the loop doesn’t apply to usage of indexOfScalarPos generally because of the ā€œlow level optimisationsā€ part of it, see this PR for one example)

3 Likes

I’d also like to point out that std.StaticStringMap can help here. It’s very handy.

3 Likes

Bingo. I just wanted to quote this so people can read it again.

3 Likes

Thanks for all your replies.

Because I don’t need performance and I am looking for a list of approximately ten items, I finally just wrote a simple loop in my code.

But I was interesting to have external advices on this question.