Types and the Zig Programming Language

So cool, thanks for this!

It might be quite useful for the langref to mention the nominal vs structural types distinction, maybe, in a new Type System section that would precede all the type-specific sections like struct, union and the rest.

On a different note, inferred error unions (anyerror!T or just !T) definitely stand out as a uniquely implicit feature of an otherwise explicit Zig type system.

I usually try and name all my error sets, but it requires a separate effort, which you’re always allowed to neglect. As a result, it gets difficult to continue explicitly specifying error unions in your codebase whenever you’re using someone else’s packaged code that either doesn’t expose its error sets or doesn’t use error sets at all, while still returning plenty of various custom error values. I wonder if this situation can be improved.
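
To make the trade-off concrete, here is a minimal sketch of a named error set next to an inferred one. The names `ParseError`, `parseDigit` and `parseDigitInferred` are made up for illustration and do not come from any real package.

```zig
const std = @import("std");

// Hypothetical named error set, for illustration only.
pub const ParseError = error{ Empty, NotADigit };

// Explicit: the signature alone tells callers which errors to expect.
pub fn parseDigit(text: []const u8) ParseError!u8 {
    if (text.len == 0) return error.Empty;
    const c = text[0];
    if (c < '0' or c > '9') return error.NotADigit;
    return c - '0';
}

// Inferred: the compiler derives the set from the body,
// but a reader has to open the body (and its callees) to learn it.
pub fn parseDigitInferred(text: []const u8) !u8 {
    return parseDigit(text);
}

pub fn main() void {
    // Fall back to 0 on any error; both functions can fail the same way.
    const digit = parseDigit("7") catch 0;
    std.debug.print("{d}\n", .{digit});
}
```

If a dependency only offers the second style, callers who want explicit signatures have to reconstruct the set by hand, which is the extra effort described above.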

Also, Voldemort types. “The ones that cannot be named”. Whoever came up with that is a genius.

In C++ we call them “Unutterable”

I don’t think that inferred sets are inherently implicit, as far as language analysis goes. Today, ErrorSet (in std.builtin.Type) is defined as

pub const ErrorSet = ?[]const Error;

and that means, during comptime evaluation, full error sets must be known. I think we could define it as

pub const ErrorSet = union(enum) {
    @"anyerror", // "dynamically typed" error
    someerror,   // some specific error set, which we don't know during Zir->Air conversion
    set: []const Error, // explicitly specified error set
};

During type-checking, we treat someerror more or less as anyerror, so that the semantics of a function cannot depend on the specific error set. After the main analysis, when all comptime is evaluated and all functions are instantiated, we do a separate pass to fill in the specific error sets and flag any incorrect switches.
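
As a toy model of that idea (not the compiler's actual data structures), the two stages could be sketched like this. `Error` here is a simplified stand-in for the real type, and `canCheckSwitchNow` and `resolve` are hypothetical helper names.

```zig
const std = @import("std");

// Simplified stand-in for std.builtin.Type.Error (assumption).
const Error = struct { name: []const u8 };

// The proposed three-state representation.
const ErrorSet = union(enum) {
    @"anyerror", // "dynamically typed" error
    someerror, // inferred set, not resolved yet
    set: []const Error, // explicitly specified or already resolved
};

// First pass: only a fully known set allows exhaustiveness checks.
fn canCheckSwitchNow(es: ErrorSet) bool {
    return switch (es) {
        .set => true,
        .@"anyerror", .someerror => false,
    };
}

// Later pass: replace someerror with the set found in the function body.
fn resolve(es: ErrorSet, from_body: []const Error) ErrorSet {
    return switch (es) {
        .someerror => ErrorSet{ .set = from_body },
        else => es,
    };
}

pub fn main() void {
    const pending: ErrorSet = .someerror;
    const body_errors = [_]Error{.{ .name = "Value" }};
    const resolved = resolve(pending, &body_errors);
    std.debug.print("{} -> {}\n", .{ canCheckSwitchNow(pending), canCheckSwitchNow(resolved) });
}
```

The point of the sketch is only that code analysed in the first stage never gets to look inside a `someerror` set, so it cannot branch on its contents.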

Yeah, I agree. I guess “implicit” is misleading here, so I should’ve said that anyerror is “compile-time inferred” instead of “implicit”. But, in essence, anyerror is simply defined as the most general error set, which means that it matches any returned error value.

There is an issue with it, though, whenever it’s used as part of the error union return type anyerror!T or !T, because what we actually want written there is the minimal error set. At the moment, we have to manually define a specific error set with all its values and then use it instead. I guess the solution could be that the compiler edits the source code and replaces !T and anyerror!T with the specific minimal error set, like so: error{Value}!T. I’m not quite sure, but I guess “editing source code and reporting compiler errors” is what you meant by “filling in specific error sets and flagging any incorrect switches” during that additional analysis pass, right?
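
For what it’s worth, the two spellings already behave differently today, which a small sketch can show. The function names below are hypothetical; the behaviour relied on is that a switch over an inferred set can be exhaustive, while a switch over anyerror needs an else prong.

```zig
const std = @import("std");

// Inferred error set: the compiler computes error{Value} from the body.
fn inferred(fail: bool) !u32 {
    if (fail) return error.Value;
    return 1;
}

// Global error set: any error value may come back.
fn global(fail: bool) anyerror!u32 {
    if (fail) return error.Value;
    return 1;
}

fn describeInferred(fail: bool) []const u8 {
    _ = inferred(fail) catch |err| switch (err) {
        // Exhaustive: the inferred set contains only error.Value.
        error.Value => return "value",
    };
    return "ok";
}

fn describeGlobal(fail: bool) []const u8 {
    _ = global(fail) catch |err| switch (err) {
        error.Value => return "value",
        // Required: anyerror can hold errors this code never lists.
        else => return "other",
    };
    return "ok";
}

pub fn main() void {
    std.debug.print("{s} {s}\n", .{ describeInferred(true), describeGlobal(false) });
}
```

So with !T the compiler already knows the minimal set; it just isn’t written down in the signature for a human reader.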

UPD: I guess what you meant was that the specific error sets that right now are explicitly specified as ?[]const Error would be resolved later from someerror to []const Error during that additional final pass. If so, I’m not sure I understand what that deferred inference solves.

Yeah, this is a subtle point which only bothers IDE folks. Let me give a concrete litmus test:

const std = @import("std");
const assert = std.debug.assert;

fn f() !void {
    // Mystery!
}

pub fn main() !void {
    f() catch |err| {
        comptime assert(@typeInfo(@TypeOf(err)).ErrorSet.?.len == 1);
    };
}

In today’s Zig, to analyze the main function, the compiler needs to analyze both the signature and the body of f. Depending on how many errors f can return, that comptime assert is either tripped or not.

So, when the compiler analyses main, it has to stop and first analyze f’s signature (which is very fast, as it is very short) and then f’s body (this part could be quite slow, as there might be a lot of code behind that Mystery!).

In the alternative world I am suggesting, @typeInfo(@TypeOf(err)).ErrorSet would return .someerror regardless of the body of f. In that world, when the compiler analyses main, it needs to stop only to check f’s signature, and then proceed with main. The actual body of f could be analysed separately, later, or even on a different thread.

Oh yeah, that’s cool!

More Sema parallelisation and micro-optimizations are definitely planned once all of the accepted proposals are implemented and the language stabilises a bit more. I guess such behind-the-scenes optimisations could even be explored post-1.0.