Clarification on comptime meta-programming

Hello,

I’ve got a few questions on comptime that I’m looking to gain a deeper/better understanding of:

  1. The Zig website says something like: “call any function at comptime”. Not to be pedantic, but I feel this is not actually true. For example, if you invoke a function at comptime that makes use of system calls, it won’t work correctly, right? My understanding is that comptime functions need to be largely pure and deterministic.

  2. It’s not entirely clear to me, but is it true that “anytype” is a comptime-evaluated marker? I was trying to create a struct that has a field that is a callback. The callback definition had one argument marked as “anytype” because I’m trying to support “a variadic style” argument. I was not able to do this because the compiler said something about failing to evaluate the code at comptime. I tracked it down to “anytype” being the culprit, but I can’t seem to track down docs for “anytype”.

  3. If it’s true that comptime functions must be pure, I still feel it’s reasonable, for example, to want to use a hash map (or another data structure) at comptime. The problem is that if the allocator you pass in internally uses syscalls to grab heap memory, that won’t work. Is there a way to still make this work? I saw something online about the possibility of a comptime allocator for such purposes. What I’d like to do is have a function that processes some data using a data structure and builds some kind of final, static result set at comptime. Then, at runtime, my program uses that final result set.

Thanks to anyone who’s willing to shed some light!

3 Likes

For #2, there’s this: What is anytype?

Of note is that it is indeed comptime, it uses duck typing logic by looking at how you use the type, and the compiler will generate specialized code.
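As a quick sketch of that duck typing (a toy example; `lenOf` is a made-up name, assuming a recent Zig version): any argument type works as long as the usage inside the function compiles for it, and each distinct type gets its own specialized instantiation.

```zig
const std = @import("std");

// `anytype` duck typing: `lenOf` compiles for any argument whose type
// has a `.len`, because the compiler checks usage per instantiation.
fn lenOf(container: anytype) usize {
    return container.len; // only requires `.len` to exist on the type
}

pub fn main() void {
    // Instantiates lenOf for *const [5:0]u8 and for *const [3]u32.
    std.debug.print("{} {}\n", .{ lenOf("hello"), lenOf(&[_]u32{ 1, 2, 3 }) });
}
```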

4 Likes

Yes, it’s technically not true. It needs to be information that the compiler can compute directly, because it’s actually building your program at the same time. If you look into how this stuff actually gets created (a good place is to peek at the intern pool), it will make more sense why it happens that way: zig/src/InternPool.zig at master · ziglang/zig · GitHub

Allocators at comptime is a weird subject. Let’s take a basic example and see why this is a strange idea (it’s a dumb example so forgive me):

const std = @import("std");

fn decode_array_size(comptime string: []const u8) usize {
    // toy decoder - just add one every time we see the letter 'a'
    comptime {
        var n: usize = 1;
        for (string) |char| {
            if (char == 'a') n += 1;
        }
        return n;
    }
}

// later...
const n = comptime decode_array_size("aaaaa");
// notice this line - how is this possible?
const array: [n]usize = undefined;

So here I am calculating an array size from some highly arbitrary process and then instantiating an array of that size. Note that this is a kind of allocation. I’d think of it as an “afforded” allocation. We’re not calling an allocation function (like alloc), but comptime affords that we can do this.

In Fluent, we create whole data structures that branch in the type system. We could have done them as data members, but essentially there’s no need to explicitly create an allocator interface, because the nested structures can be as large as we can deduce them to be: Fluent/fluent.zig at main · andrewCodeDev/Fluent · GitHub

Due to that, it’s a bit… redundant… to use an allocator interface at comptime. The ability to make arbitrarily sized things is already afforded. Please note that I’m not saying something like “there will never be a reason to do that” - I’m definitely not. I’m saying that many of the things you can think of that require allocation can actually be handled through comptime programming inherently.
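As a rough sketch of that idea (assuming a recent Zig version; `evens` is a made-up example), the common `++` idiom grows a comptime-known slice with no allocator in sight:

```zig
const std = @import("std");

// "Afforded" allocation: grow a comptime-known slice with `++` instead of
// calling an allocator. Each concatenation produces a new, larger array.
const evens = blk: {
    var result: []const u32 = &.{};
    for (0..10) |i| {
        if (i % 2 == 0) result = result ++ &[_]u32{@intCast(i)};
    }
    break :blk result;
};

comptime {
    // Checked entirely at compile time: {0, 2, 4, 6, 8}.
    std.debug.assert(evens.len == 5);
    std.debug.assert(evens[2] == 4);
}
```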

On the issue of purity… I see why you’re thinking that, but that’s not quite right. Purity is not a constraint… here’s an example:

const std = @import("std");

const Thing = struct {
    index: usize = 0,
    fn foo(comptime self: *Thing) void {
        if (comptime self.index < 3) {
            self.index += 1;
            return;
        }
        @compileError("Reached the end");
    }
};

pub fn main() void {
    comptime {
        var thing: Thing = .{};
        thing.foo(); // fine...
        thing.foo(); // fine...
        thing.foo(); // fine...
        thing.foo(); // error
    }
}

Here we can see that foo has side effects, does not have the same behavior per input, and is run at comptime.

5 Likes

Yes, pure functions only.

Yes, anytype is syntactic sugar for this:

fn f(arg: anytype) void => fn f(comptime Arg: type, arg: Arg) void

And when you call it, it does this:

f(arg) => f(@TypeOf(arg), arg)
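As a toy illustration of that desugaring (made-up names, not from the docs), these two functions should behave the same, and each call instantiates a specialized version:

```zig
const std = @import("std");

// The anytype form: the parameter's type is inferred per call site.
fn maxAny(a: anytype, b: @TypeOf(a)) @TypeOf(a) {
    return if (a > b) a else b;
}

// The desugared form: the type is passed explicitly as a comptime parameter.
fn maxExplicit(comptime T: type, a: T, b: T) T {
    return if (a > b) a else b;
}

comptime {
    std.debug.assert(maxAny(@as(u32, 3), 5) == 5);
    std.debug.assert(maxExplicit(u32, 3, 5) == 5);
}
```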

If you used an allocator that made syscalls, indeed, that would fail. Zig’s goal is that it should still be possible to call an allocator that doesn’t make syscalls, like FixedBufferAllocator. This would lower the barrier between runtime programming and comptime programming, avoiding code duplication. However, the Allocator interface currently uses @intFromPtr and @ptrFromInt, which are forbidden at comptime, so it will fail even in cases where the allocator implementation itself would be valid at comptime. The proposal you mentioned about a comptime allocator is trying to tackle this. There are many ways to go about it: maybe change the interface and move these forbidden functions into the implementation itself, so that the Allocator interface is amenable to comptime, or maybe create a second allocator interface, with functions then taking their allocators as anytype. Anyway, this is an open problem.

2 Likes

Hey @LucasSantos91, I’m curious about your definition of pure here. I’m checking my understanding of pure and I may be wrong here.

When you say pure, do you mean that it cannot modify things outside the function scope through pointers? We can certainly do that at comptime. It’s true that the compiler can see the changes between calls, so maybe that still technically qualifies as pure; I may be wrong about that.

For instance, a comptime var can be mutated from the function scope if it’s passed in via pointer, but it looks like some people don’t count things like memcpy as breaking purity.

1 Like

Ah, that was one of the questions I forgot to ask. I wrongly assumed that a FixedBufferAllocator would work at comptime, so it’s good to know that was a bad assumption.

Hopefully there is room for a comptime allocator that would allow using some of the more involved data structures at comptime.

In my mind, using something like a HashMap to dedup a list of N type objects is a perfectly valid use case. A working comptime allocator would unlock advanced use cases like this.
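That said, you can get surprisingly far without an allocator today. Here is a toy `dedupTypes` (a made-up name; a linear scan stands in for a real hash map, assuming a recent Zig version):

```zig
const std = @import("std");

// Dedup a comptime list of types without an allocator: a linear scan over
// a growing comptime slice replaces the hash map.
fn dedupTypes(comptime types: []const type) []const type {
    comptime {
        var unique: []const type = &.{};
        outer: for (types) |T| {
            for (unique) |U| {
                if (U == T) continue :outer; // already seen, skip
            }
            unique = unique ++ &[_]type{T};
        }
        return unique;
    }
}

comptime {
    const result = dedupTypes(&.{ u8, u32, u8, f64, u32 });
    std.debug.assert(result.len == 3);
    std.debug.assert(result[0] == u8 and result[1] == u32 and result[2] == f64);
}
```

Linear scan is O(n²) over the type list, but at comptime with small N that rarely matters.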

I see no reason why a non-standard-compliant stack allocator couldn’t be created, but then you run into the problem of communicating it to the standard data structures. It’s an interesting problem.

I’d suggest opening a brainstorming topic about that so we can play around with some options here.

3 Likes

You’re right, I hadn’t thought about it that deeply. By pure, I meant functions that have no side effects, but I forgot that modifying a comptime var through a pointer is legal. I don’t actually know if such a function could still be considered “pure”. One of the greatest advantages of pure functions is that they allow memoization, and I was under the impression that Zig always memoized functions called at comptime. But now that you mention it, I’m curious how Zig handles memoization of functions with pointer arguments at comptime.

3 Likes

That’s a really good explain topic candidate.

2 Likes

Hey all,

I appreciate the really thorough responses. I’ve been reading these comments over and over a few times and it’s helping solidify my knowledge and fill in some gaps.

The label “pure” functions may not be applicable here… there are other languages, like Starlark (Bazel’s configuration language, if I’m remembering correctly), that are designed to be hermetic.

Hermetic functions (or hermetic methods) are pieces of code that can modify only local state (including its arguments) to store the result of computation

They can’t make system calls, but they can call other hermetic functions. Any recursion must be finite, and they can’t call user-supplied callbacks.

Perhaps that’s what we’re talking about here, or at least a more accurate definition than claiming that comptime only allows pure/side-effect-free invocations, which does not seem to be the case.

3 Likes

That’s the thing that has been nagging at me ever since I saw this “call any function at comptime” claim.

  • if we really can do any (or at least some) syscalls at comptime… does it make sense at all?!? And it would be extremely dangerous - what if the compiler is running as root?..
  • if we cannot, then “call any function at comptime” is a lie, and I would add “(except system calls)” to that sentence so as not to confuse people
1 Like

I think the actual rule is: no syscalls, no extern function calls, and no inline asm, though I’ve never personally checked that last one. In other words, you can call any function written only in Zig that calls only other functions written in Zig.

So the statement on the homepage is mostly true, at least if you consider that the page is specifically talking about Zig. But I guess it could be made more accurate by specifying that this only applies to plain Zig and does not include any kind of FFI.

2 Likes

Version 0.12.0 introduced many restrictions on comptime pointers. Any structure that contains pointers to comptime vars is effectively no longer comptime known. Consider the following code:

const std = @import("std");

const TypeEntry = struct {
    T: type,
    id: comptime_int,
};
const TypeDatabase = struct {
    entries: []TypeEntry,
    count: comptime_int,

    fn init() @This() {
        comptime var entries: [128]TypeEntry = undefined;
        return .{
            .entries = &entries,
            .count = 0,
        };
    }

    fn getId(comptime self: @This(), comptime T: type) comptime_int {
        return inline for (0..self.count) |index| {
            if (self.entries[index].T == T) {
                break self.entries[index].id;
            }
        } else 0;
    }
};

fn a(comptime tdb: TypeDatabase) void {
    std.debug.print("{any}\n", .{tdb.getId(u32)});
}

fn b(comptime tdb: TypeDatabase) void {
    @compileLog(tdb.getId(u32));
}

pub fn main() void {
    const tdb = TypeDatabase.init();
    a(tdb);
}

This does not work any more in 0.12.0. You’ll get a frustrating “runtime value contains reference to comptime var” error. But why? tdb is clearly marked as comptime after all. The problem here is, of course, the pointer. Its presence means tdb might not be comptime invariant in the future, after incremental compilation has been implemented. So while the compiler definitely knows the value of tdb, it is not allowing you to use it in a context requiring a comptime known value.

In the example, the call to a() leads to the generation of runtime code, and the comptime argument has to be comptime known. Hence the code won’t compile in 0.12.0. b(), on the other hand, could be called, because it’s a comptime-only function.

So what you said about comptime functions needing to be pure is not entirely off. The restriction is applicable in situations where comptime variables would change the behavior of runtime code.

3 Likes

:slight_smile: What if entire OS is written in Zig?

1 Like

One way or the other, “any” means “any”. std.posix.write() (for instance) is also a function, but you cannot call it at comptime, simply because its return value cannot be known at comptime.

1 Like

I think instead of putting comptime vars within structures, you can push them towards the “edge” of the program (out of the data structures).
So I think you’re right that the structure can’t contain pointers to comptime vars; however, this particular example can be rewritten like this:

const std = @import("std");

const TypeEntry = struct {
    T: type,
    id: comptime_int,
};
const Invalid = opaque {};
const TypeDatabase = struct {
    const Amount = 128;
    entries: [Amount]TypeEntry = [1]TypeEntry{.{ .T = Invalid, .id = 0 }} ** Amount,
    count: comptime_int,

    fn init() @This() {
        return .{
            .count = 1,
        };
    }

    fn add(self: *TypeDatabase, comptime T: type) void {
        const id = self.count;
        self.count += 1;
        self.entries[id] = .{ .T = T, .id = id };
    }

    fn getId(self: TypeDatabase, comptime T: type) comptime_int {
        return inline for (0..self.count) |index| {
            if (self.entries[index].T == T) {
                break self.entries[index].id;
            }
        } else 0;
    }
};

fn a(comptime tdb: TypeDatabase) void {
    std.debug.print("{any}\n", .{tdb.getId(u32)});
    std.debug.print("{any}\n", .{tdb.getId(u16)});
}

fn b(comptime tdb: TypeDatabase) void {
    @compileLog(tdb.getId(u32));
    @compileLog(tdb.getId(u16));
}

pub fn main() void {
    comptime var tdb = TypeDatabase.init();
    comptime {
        tdb.add(u32);
        tdb.add(u16);
    }
    a(tdb);
}
1 Like

The main problem is not the language itself, it’s the information that is available at comptime. If the OS were written in Zig, but the interface still used dynamically loaded functions or functions with global state, we would have the same problems. Also, functions added through object files are forbidden at comptime, even though they are technically available during compilation. The problem with these is that they get sent directly to the backend, so Zig doesn’t actually look at them and can’t guarantee the comptime invariants.

This is exactly what I meant by my rhetorical question.

I’ve detailed how I dealt with the issue in this thread. The trick to using comptime pointers goes as follows:

  • Use them in comptime code only (no intermixing with runtime code)
  • Generate a pointer-free structure from your pointer-bearing structure when your comptime code is done (dynamically define a struct with a fixed-length array based on the length of a slice)
  • Duck-type the comptime argument in your runtime function (i.e. use anytype)
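A minimal sketch of the second bullet (a hypothetical `freeze` helper, assuming a recent Zig version): copy the slice into a fixed-length array value, which contains no pointers and can therefore cross over into runtime code.

```zig
const std = @import("std");

// Hypothetical helper: turn a slice (which may point into comptime-var
// memory) into a pointer-free, fixed-length array value. `slice.len` is
// comptime-known, so it can appear in the return type.
fn freeze(comptime slice: []const u32) [slice.len]u32 {
    return slice[0..slice.len].*;
}

pub fn main() void {
    const frozen = comptime blk: {
        var buf: [3]u32 = .{ 1, 2, 3 }; // comptime var, pointer-bearing
        break :blk freeze(&buf); // pointer-free array value escapes
    };
    // Safe to use at runtime: no "reference to comptime var" error.
    std.debug.print("{any}\n", .{frozen});
}
```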
2 Likes

The post-0.12.0 comptime system is just far too strict, in my opinion. After incremental compilation has been implemented the issue should be revisited. We should be able to declare certain comptime pointers as volatile. Any code that touches such a pointer would be tainted as comptime volatile and will always be recompiled. Guaranteed execution of comptime code paths would then make structures containing such pointers comptime known again.

1 Like