How does the compiler know two instances of a generic type (a type created by a function) are the same?
```zig
const std = @import("std");

pub const MyStruct = struct {
    data: std.BoundedArray(u8, 100),

    pub fn init() MyStruct {
        return MyStruct{
            // How does the compiler know this is the same bounded array type as the struct field?
            .data = std.BoundedArray(u8, 100){},
        };
    }
};
```
If this is only based on memoizing the function signature and args, couldn’t I defeat this by using a type-generating function that depends on a random number generator?
The compiler caches the result of comptime functions based on the input parameters. This is an important optimization to avoid having many duplicates of e.g. ArrayList(u8).
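As a minimal sketch of what that guarantees in practice: naming the same generic with the same comptime arguments yields one and the same type, not two copies. The assertions here are illustrative, not an exhaustive specification.

```zig
const std = @import("std");

// Two calls with identical comptime arguments produce the exact same type.
const A = std.ArrayList(u8);
const B = std.ArrayList(u8);

comptime {
    std.debug.assert(A == B);
    // A different argument produces a different type.
    std.debug.assert(A != std.ArrayList(u16));
}
```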
I’ve thought about this, and related matters, quite a bit actually. Using memoization of comptime parameters has a lot of bang for the buck, but I’ve come to believe that a deeper process is needed. I haven’t finished writing all of that up, though, or even thinking it through completely.
Part of it is that I don’t have a solid enough grasp of how comptime caching in type-returning functions works. There are limits on how state can be kept around during comptime, but it isn’t clear to me that those limits are strict enough to guarantee that type-returning functions are deterministic in their inputs, or how the compiler might detect the situation where that isn’t the case, if it even does.
I would just call this intended behavior. Might be useful for more than Ziggit challenges, even. Can’t think of how offhand, though.
But it should work insofar as, if you make an opaque {} named A and assign it to B, then TypeFunction(A) and TypeFunction(B) should return the same type. So passing a fresh opaque {} on each call is just a special case of passing different types to a type-returning function and getting distinct types back in return.
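That behavior can be sketched directly. TypeFunction here is a hypothetical type-returning function standing in for the one discussed above; since opaque types have unknown size, the sketch stores a pointer rather than a value.

```zig
const std = @import("std");

// Hypothetical type-returning function for illustration.
fn TypeFunction(comptime T: type) type {
    // Opaque types can't be struct fields directly, so hold a pointer to T.
    return struct { ptr: *T };
}

const A = opaque {};
const B = A; // B is just another name for the same opaque type

comptime {
    // Same captured type, so the same result type comes back.
    std.debug.assert(TypeFunction(A) == TypeFunction(B));
    // A fresh opaque {} is a distinct type, so the result is distinct too.
    std.debug.assert(TypeFunction(A) != TypeFunction(opaque {}));
}
```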
In modern Zig, function memoization is purely an optimization: it does not impact language semantics and will not be a part of the language specification.
Instead, the compiler determines whether types are equivalent by comparing two things:
1. Where are the types declared in source code?
2. What values do the type definitions “capture” – meaning, what is the value of any identifier used within the type’s definition but declared outside of it?
Here’s a simple example:
```zig
fn Foo(comptime val: anytype) type {
    const T = @TypeOf(val);
    return struct { x: T };
}
```
The struct declaration in this function captures the value of the local variable T, so whether two types Foo(x) and Foo(y) are equivalent depends on what T is in each case. For instance, Foo("hello") == Foo("world"), and Foo(1) == Foo(2), but Foo(1) != Foo(1.0).
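Those equalities can be checked with comptime assertions; here is a small self-contained sketch that repeats the Foo above:

```zig
const std = @import("std");

fn Foo(comptime val: anytype) type {
    const T = @TypeOf(val);
    return struct { x: T };
}

comptime {
    std.debug.assert(Foo("hello") == Foo("world")); // T is *const [5:0]u8 in both
    std.debug.assert(Foo(1) == Foo(2)); // T is comptime_int in both
    std.debug.assert(Foo(1) != Foo(1.0)); // comptime_int vs. comptime_float
}
```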
Right, I think the question here was how the compiler knows when memoization on parameters is or isn’t a valid optimization to apply. It is valid if the function is a deterministic function of its inputs, but Zig isn’t Haskell, so that would need to be determined.