Comptime capture

dude_the_builder · June 24, 2024, 12:58pm

Playing around with a lexer, I have a helper function:

fn consumeIf(predicate: fn (u8) bool) ?u8 {
    if (peek()) |b| {
        if (predicate(b)) {
            _ = advance();
            return b;
        }
    }

    return null;
}

which allows this type of logic:

while (consumeIf(isIdentByte)) ...

But then I needed a predicate to match against a specific byte. I came up with this and was pleasantly surprised it worked.

fn is(b: u8) fn (u8) bool {
    return struct {
        fn predicate(c: u8) bool {
            return c == b;
        }
    }.predicate;
}

This only works with comptime known byte arguments like:

consumeIf(is('z'))

So from what I understand, this is possible precisely because b is comptime known and thus can be “captured” as part of the body of predicate in the anonymous struct. Is this correct?

If this is the case, I think the more accurate syntax would be:

fn is(comptime b: u8) ...

but the compiler isn’t requiring this at the moment.
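For reference, here is a self-contained sketch of the whole pattern with the explicit comptime annotation. The Lexer struct and its fields are hypothetical stand-ins for the peek/advance helpers, which aren't shown above, and consumeIf is written as a method whose predicate parameter is marked comptime:

```zig
const std = @import("std");

// Hypothetical minimal lexer; stands in for the peek/advance helpers
// that the snippets above assume.
const Lexer = struct {
    src: []const u8,
    pos: usize = 0,

    fn peek(self: *const Lexer) ?u8 {
        return if (self.pos < self.src.len) self.src[self.pos] else null;
    }

    fn advance(self: *Lexer) void {
        self.pos += 1;
    }

    // The predicate is a function body, so it is passed at comptime.
    fn consumeIf(self: *Lexer, comptime predicate: fn (u8) bool) ?u8 {
        if (self.peek()) |b| {
            if (predicate(b)) {
                self.advance();
                return b;
            }
        }
        return null;
    }
};

// b becomes part of the body of predicate, so it must be comptime-known.
fn is(comptime b: u8) fn (u8) bool {
    return struct {
        fn predicate(c: u8) bool {
            return c == b;
        }
    }.predicate;
}

test "consumeIf with a comptime-captured byte" {
    var lexer = Lexer{ .src = "zig" };
    // First byte is 'z', so it is consumed and returned.
    try std.testing.expectEqual(@as(?u8, 'z'), lexer.consumeIf(is('z')));
    // Next byte is 'i', so nothing is consumed.
    try std.testing.expectEqual(@as(?u8, null), lexer.consumeIf(is('z')));
    try std.testing.expectEqual(@as(usize, 1), lexer.pos);
}
```

Each distinct byte passed to is produces its own predicate function, which is what "capturing" amounts to here.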

Sze · June 24, 2024, 1:07pm

fn (u8) bool is a function body type. Function body types are comptime-only, so the is function is comptime-only too, because its return value can’t exist at run time.

I am not sure whether I would call it more accurate. I think it is more explicit and fails earlier, which could be helpful in this case (because the function can’t be used at run time anyway).
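To make the contrast concrete: if the byte were only known at run time, there would be nothing for a function body to close over, so the value would have to travel as ordinary data instead. A minimal sketch, where ByteMatcher is a hypothetical name and not part of the original lexer:

```zig
const std = @import("std");

// Hypothetical runtime counterpart to is(): the byte lives in a field
// rather than being baked into a function body at comptime.
const ByteMatcher = struct {
    target: u8,

    fn matches(self: ByteMatcher, c: u8) bool {
        return c == self.target;
    }
};

test "matching a byte stored as data" {
    const text = "xyz";
    // target could just as well come from input read at run time.
    const m = ByteMatcher{ .target = text[2] };
    try std.testing.expect(m.matches('z'));
    try std.testing.expect(!m.matches('a'));
}
```

The trade-off is that consumeIf would then need to accept the matcher value (or a byte) rather than a bare fn (u8) bool.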

mlugg · June 24, 2024, 11:56pm

[…] but the compiler isn’t requiring this at the moment.

It kind of can’t. Conditional execution and the design of the compiler mean that whether a comptime annotation is necessary has to be determined by a combination of simple AST analysis and semantic checks at function analysis time.

Technically, we could introduce a rule that requires a comptime annotation in this specific case, by saying that if a type syntactically closes over x, then x must be explicitly marked comptime where applicable. However, this rule has some weird consequences: if you consider a comptime-only function (e.g. returning type), this would require you to write comptime var instead of var everywhere, even though the code can semantically only be called at comptime. We could introduce another special case to defer that error until the function is determined to have a non-comptime return type, but there are probably more subtle exceptions, and at this point the rules start getting really, really complicated.

So, we take a step back and ask: what would be practically gained from such a rule? The answer, in short, is very little: you’re just substituting one compilation error for another. Nothing in Zig today stops you from marking that parameter as comptime, and indeed, you probably should, even if just for documentation. But not having the marker still works perfectly well. The idea here is somewhat analogous to that of duck typing: try to evaluate the function, and if you’ve done something wrong, you’ll wind up getting an error anyway.
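As an illustration of the comptime-only case, here is a small sketch of a function returning type that uses a plain var and an unannotated parameter. Buffer and its field are hypothetical names made up for this example; the point is only that nothing is marked comptime, yet every call is evaluated at comptime because of the return type:

```zig
const std = @import("std");

// Hypothetical example: the return type is `type`, so calls to Buffer
// are evaluated at comptime even without any comptime annotations.
fn Buffer(min: usize) type {
    var n: usize = 1; // plain var, not comptime var
    while (n < min) n *= 2;
    const size = n; // copy into a const so the struct can refer to it
    return struct {
        bytes: [size]u8,
    };
}

test "Buffer rounds the size up to a power of two" {
    try std.testing.expectEqual(@as(usize, 8), @sizeOf(Buffer(5)));
}
```

Calling Buffer with a value that is only known at run time would be rejected when the call is analyzed, which is the "you'll get an error anyway" behaviour described above.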