Implementing Generic Concepts on Function Declarations

AndrewCodeDev · October 6, 2023, 12:15am

Right, it’s all just thinking out loud at this point.

Another thing the boolean composition allows is multiple traits in a single statement. I’d have to have that capability to be sold on an idea.

permutationlock · October 6, 2023, 12:16am

The library allows multiple traits in a single statement:

trait.implementsAll(.{ Trait1, Trait2, Trait3 }).assert(T);

or

trait.implements(Trait1).implements(Trait2).assert(T);

AndrewCodeDev · October 6, 2023, 12:17am

What about in the case of:

fn myGenericFunction(comptime T: type, comptime U: type) Contract(...
    Verify T, Verify U

permutationlock · October 6, 2023, 12:28am

For testing purposes I already had a check function that verifies a constraint and, instead of producing a compile error, returns an optional error string (?[]u8). I just tried this on my local copy and it seems to work:

pub fn Contract(comptime results: anytype, comptime T: type) type {
    for (results) |maybe_result| {
        if (maybe_result) |result| {
            @compileError(result);
        }
    }
    return T;
}

pub fn Returns(comptime T: type) type { return T; }

Then we could do

fn myGenericFunction(comptime T: type, comptime U: type) Contract(
    .{
        trait.implements(MyTrait).check(T),
        trait.implements(MyOtherTrait).check(U)
    },
    Returns(MyReturnType)
) {
    // function body
}

Still doesn’t look great, but I can see something here maybe. Definitely needs to be less verbose.
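To make the idea concrete, here is a minimal self-contained sketch of the same pattern. The check function checkHasLen and the types Pair and Triple are hypothetical stand-ins for trait.implements(...).check(T) and real user types; the sketch also iterates the results tuple with inline for, which is an assumption of this example rather than what the snippet above does.

```zig
const std = @import("std");

// Hypothetical stand-in for trait.implements(...).check(T): returns null when
// T declares `len`, otherwise an error message.
fn checkHasLen(comptime T: type) ?[]const u8 {
    if (@hasDecl(T, "len")) return null;
    return @typeName(T) ++ " must declare 'len'";
}

// Emits the first failing check as a compile error, otherwise yields T.
pub fn Contract(comptime results: anytype, comptime T: type) type {
    inline for (results) |maybe_message| {
        if (maybe_message) |message| @compileError(message);
    }
    return T;
}

pub fn Returns(comptime T: type) type {
    return T;
}

// Hypothetical types used only for illustration.
const Pair = struct {
    pub const len = 2;
};
const Triple = struct {
    pub const len = 3;
};

fn combinedLen(comptime T: type, comptime U: type) Contract(
    .{ checkHasLen(T), checkHasLen(U) },
    Returns(usize),
) {
    return T.len + U.len;
}

test "both checks pass" {
    try std.testing.expectEqual(@as(usize, 5), combinedLen(Pair, Triple));
}
```

Passing a struct without a len declaration as either argument should stop compilation with the message produced by checkHasLen, since the contract is evaluated as part of the return type.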

booniepepper · October 6, 2023, 6:42am

There are some interesting examples in the standard library. For example, std.mem.readIntNative(...) requires the bit count of a type to be a multiple of 8:

github.com/ziglang/zig/blob/e6590fea19e3eab94b35bfd3c36e29b53cefcaaf/lib/std/mem.zig#L1586

/// Reads an integer from memory with bit count specified by T.
/// The bit count of T must be evenly divisible by 8.
/// This function cannot fail and cannot cause undefined behavior.
/// Assumes the endianness of memory is native. This means the function can
/// simply pointer cast memory.
pub fn readIntNative(comptime T: type, bytes: *const [@divExact(@typeInfo(T).Int.bits, 8)]u8) T {
    return @as(*align(1) const T, @ptrCast(bytes)).*;
}

/// Reads an integer from memory with bit count specified by T.
/// The bit count of T must be evenly divisible by 8.
/// This function cannot fail and cannot cause undefined behavior.
/// Assumes the endianness of memory is foreign, so it must byte-swap.
pub fn readIntForeign(comptime T: type, bytes: *const [@divExact(@typeInfo(T).Int.bits, 8)]u8) T {
    return @byteSwap(readIntNative(T, bytes));
}

This gives me the idea that these constraints could also be implemented in the parameters, instead of only in the return position.


fn isU8(comptime T: type) type {
    if (T != u8) @compileError("hey it should be a U8 here!");
    return T;
}

pub fn wacky(comptime T: isU8(anytype)) void {
    _ = T; // nothing to do in this sketch; a void function cannot return true
}

test "let's get wackier" {
    wacky(u8); // I'm fine

    wacky(u32); // Compile error
}

The above is just a crazy sketch though. It’s not valid Zig right now. anytype is a keyword and not an actual type value that you can pass around.

There’s a small drawback that when it’s not in return position, it stops looking like a beautiful Dafny-esque ensures statement.
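For comparison, a variant that should compile today threads the type through an explicit comptime T parameter and computes a later parameter's type from it, much like readIntNative does with its array length. This is only a sketch: the name RequireU8 and the extra value parameter are assumptions added for illustration, not part of the proposal above.

```zig
const std = @import("std");

// Identity-style constraint: yields T unchanged, or fails compilation.
fn RequireU8(comptime T: type) type {
    if (T != u8) @compileError("expected u8, found " ++ @typeName(T));
    return T;
}

// The constraint runs when the type of `value` is resolved for a given T.
pub fn wacky(comptime T: type, value: RequireU8(T)) T {
    return value;
}

test "constraint in parameter position" {
    try std.testing.expectEqual(@as(u8, 42), wacky(u8, 42));
    // _ = wacky(u32, 42); // would fail to compile via RequireU8
}
```

The cost is the extra T parameter at every call site, which is exactly the verbosity the anytype-based sketch was trying to avoid.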

AndrewCodeDev · October 6, 2023, 8:51am

That’s a great example from the standard library.

I’ve thought about that syntax too. It would be really cool if a unary function that takes a comptime type and returns a type could be used as a constraint, with the type of the argument passed to it implicitly. Something like f(T) -> T, where f could be customized to raise compile errors (like your example). It would have to be an identity function, though: the parameter and the returned type would need to be the same. An f(T) -> U would simply not make sense, because T is the actual argument type. If this could be figured out, then it would hopefully become something like:

pub fn wacky(comptime T: isU8) void {
    _ = T; // nothing to do in this sketch; a void function cannot return true
}

Either way, this would bring it closer to the “terse syntax”, which ultimately has to be terse to avoid the boilerplate.

That said, I think your example more explicitly says “this needs to be deduced” so I like it better for that reason.

permutationlock · October 6, 2023, 6:20pm

booniepepper:

fn isU8(comptime T: type) type {
    if (T != u8) @compileError("hey it should be a U8 here!");
    return T;
}

pub fn wacky(comptime T: isU8(anytype)) void {
    _ = T; // nothing to do in this sketch; a void function cannot return true
}

From all the issue threads relating to concepts and improving anytype, it seems that changes very similar to this were proposed and pretty universally rejected by the Zig team. Unless you have a reason that this version would not raise the same concerns?

AndrewCodeDev · October 6, 2023, 7:30pm

At this moment in time, you’re probably correct - this idea won’t get accepted. As for the future? Who knows, but the two issues I posted about changing anytype or introducing infer have been open for a long time (the one suggesting to introduce infer has been open for 2 years now and was referenced 3 weeks ago).

booniepepper · October 6, 2023, 8:47pm

To be clear, I don’t want to propose that anytype becomes a type; I don’t even know how to begin thinking about the ramifications of that. I’m just brainstorming ways to do constrained types on parameters.

(Caveats: I think I might be getting away from “generic concepts” with this. Also: Putting these in the return position is relatively clean looking IMO, especially if the restrictions should be composable)

Here’s another sketch that looks kinda ok with parameters:

pub fn firstRow(comptime T: type, tensor: Tensor2D(T)) Inner(T) {
    return tensor[0];
}

/// Requires two-or-more dimensions.
pub fn Tensor2D(comptime T: type) type {
    return switch (@typeInfo(T)) {
        // a.child is a type, so inspect it with @typeInfo before switching on it
        .Array => |a| switch (@typeInfo(a.child)) {
            .Array => T,
            else => @compileError("Expected two-or-more dimensions, found only one"),
        },
        else => @compileError("Expected two-or-more dimensions, found scalar"),
    };
}

pub fn Inner(comptime T: type) type {
    return switch (@typeInfo(T)) {
        .Array => |a| a.child,
        else => @compileError("Expected two-or-more dimensions, found scalar"),
    };
}

permutationlock · November 23, 2023, 8:42pm

Resurrecting this to see what people think of the generic interface convention in my tiny Zimpl library. It achieves a very limited version of “generic concepts on function declarations.”

The idea is to provide clarity and avoid relying on duck typing by taking a separate parameter that contains all the necessary “member functions” and type data. Part of the std.io.Reader interface translated to this style is below as an example.

pub fn Reader(comptime Type: type) type {
    return struct {
        ReadError: type = error{},
        read: fn (reader_ctx: Type, buffer: []u8) anyerror!usize,
    };
}

pub inline fn read(
    reader_ctx: anytype,
    reader_impl: Reader(@TypeOf(reader_ctx)),
    buffer: []u8,
) reader_impl.ReadError!usize {
    return @errorCast(reader_impl.read(reader_ctx, buffer));
}

pub inline fn readAll(
    reader_ctx: anytype,
    reader_impl: Reader(@TypeOf(reader_ctx)),
    buffer: []u8,
) reader_impl.ReadError!usize {
    return readAtLeast(reader_ctx, reader_impl, buffer, buffer.len);
}

pub inline fn readAtLeast(
    reader_ctx: anytype,
    reader_impl: Reader(@TypeOf(reader_ctx)),
    buffer: []u8,
    len: usize,
) reader_impl.ReadError!usize {
    assert(len <= buffer.len);
    var index: usize = 0;
    while (index < len) {
        const amt = try read(reader_ctx, reader_impl, buffer[index..]);
        if (amt == 0) break;
        index += amt;
    }
    return index;
}

The issue now is that calling such functions is verbose and clunky, even in cases like Reader where the interface only has two fields.

test {
    var buffer: [19]u8 = undefined;
    var file = try std.fs.cwd().openFile("my_file.txt", .{});
    _ = try io.readAll( // the returned byte count must be discarded explicitly
        file,
        .{
            .read = std.fs.File.read,
            .ReadError = std.fs.File.ReadError,
        },
        &buffer
    );

    try std.testing.expectEqualStrings("Hello, I am a file!", &buffer);
}
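The explicit-impl calling convention can also be tried without a file on disk. The sketch below repeats the Reader struct and the read wrapper so that it stands alone; SliceReader and its readSome method are hypothetical and not part of Zimpl or std. It marks reader_impl as comptime, which is a conservative assumption of this sketch rather than what the code above does, and it relies on @errorCast accepting error unions as in the original snippet.

```zig
const std = @import("std");

// Interface struct in the style shown above.
pub fn Reader(comptime Type: type) type {
    return struct {
        ReadError: type = error{},
        read: fn (reader_ctx: Type, buffer: []u8) anyerror!usize,
    };
}

// Same shape as `read` above, but with reader_impl marked comptime
// (an assumption of this sketch, since the struct holds a type and a fn).
pub fn read(
    reader_ctx: anytype,
    comptime reader_impl: Reader(@TypeOf(reader_ctx)),
    buffer: []u8,
) reader_impl.ReadError!usize {
    return @errorCast(reader_impl.read(reader_ctx, buffer));
}

// Hypothetical in-memory reader used only for illustration.
const SliceReader = struct {
    data: []const u8,
    pos: usize = 0,
    closed: bool = false,

    pub const ReadError = error{Closed};

    // Copies as many unread bytes as fit into buffer and advances pos.
    pub fn readSome(self: *SliceReader, buffer: []u8) anyerror!usize {
        if (self.closed) return error.Closed;
        const remaining = self.data[self.pos..];
        const n = @min(buffer.len, remaining.len);
        @memcpy(buffer[0..n], remaining[0..n]);
        self.pos += n;
        return n;
    }
};

test "explicit impl with an in-memory reader" {
    var reader = SliceReader{ .data = "Hello, I am a slice!" };
    var buffer: [20]u8 = undefined;
    const n = try read(
        &reader,
        .{ .read = SliceReader.readSome, .ReadError = SliceReader.ReadError },
        &buffer,
    );
    try std.testing.expectEqualStrings("Hello, I am a slice!", buffer[0..n]);
}
```

Note that the function name in the impl struct does not have to match any declaration on the context type, which is the flexibility this style buys in exchange for the verbosity.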

My attempt at a solution is the Impl function: Impl(Type, Reader) is a struct with the same fields as Reader(Type) but with the default value of each field set to be the declaration of Type of the same name, if such a declaration exists [1].

Replacing Reader(@TypeOf(reader_ctx)) with Impl(@TypeOf(reader_ctx), Reader) everywhere in the above example lets us default construct the reader_impl parameter for std.fs.File.

test {
    var buffer: [19]u8 = undefined;
    var file = try std.fs.cwd().openFile("my_file.txt", .{});
    _ = try io.readAll(file, .{}, &buffer); // discard the returned byte count

    try std.testing.expectEqualStrings("Hello, I am a file!", &buffer);
}

I’ve been enjoying this style because it provides type requirements for anytype parameters in function signatures while remaining simple and feeling similar to other Zig patterns. It has very limited power and thus encourages simple uses of generics, which also feels in line with Zig.

[1] Technically, it “unwraps” Type first so that pointer/optional types will work too; see the readme.