Can you select between a set of comptime types at runtime?

spicydll · June 1, 2024, 4:55pm

I’m writing an interpreter for BrainF**k that has support for combining multiple cells for certain operations.

For example if you wanted to increment the current cell as a byte, you would simply send “+”. However, if you wanted to combine 4 cells into a 32 bit integer, you can use a modifier like “4+”.

The way I have attempted to implement this is through comptime types. The idea is that I would simply switch between a set of types in my interpreter loop and pass them to the operation functions.

// NOTE: This code does not compile
memory: [30000]u8,
ptr: usize,

//...
//load_int() is similar
fn store_int(self: *Self, int_type: comptime_int, value: int_type) void {
    if (int_type == u8) {
        self.memory[self.ptr] = value;
    } else {
        const size = @sizeOf(int_type);
        var bytes: [size]u8 = undefined;
        std.mem.writeInt(int_type, &bytes, value, .little);
        for (0..size) |idx| {
            var offset = self.ptr + idx;
            if (offset >= self.memory.len) {
                offset -= self.memory.len;
            }
            self.memory[offset] = bytes[idx];
        }
    }
}

fn increment(self: *Self, int_type: comptime_int) void {
    var value: int_type = self.load_int(int_type);
    value = @addWithOverflow(value, 1)[0];
    self.store_int(int_type, value);
}

//...

pub fn interpret(self: *Self, code: []const u8) void {
    var int_type = u8; // Type selector (doesn't compile)
    var code_ptr: usize = 0;
    var mod_on = false;
    // ...
    while (code_ptr < code.len) : (code_ptr += 1) {
        // "Command" is an enum
        const cur_command = Command.from_char(code[code_ptr]);
        //...
        switch (cur_command) {
            //...
            .Mod2 => int_type = u16, // Handle modifiers
            .Mod4 => int_type = u32,
            .Mod8 => int_type = u64,
            .Increment => self.increment(int_type),
            //...
        }
        if (int_type != u8) {
            mod_on = !mod_on;
            if (!mod_on) {
                int_type = u8;
            }
        }
    }
}

However, this fails to compile with an error similar to this (compiler version 0.12.0):

src/root.zig:157:23: error: variable of type 'comptime_int' must be const or comptime
        var int_type: comptime_int = u8;
                      ^~~~~~~~~~~~
src/root.zig:157:23: note: to modify this variable at runtime, it must be given an explicit fixed-size number type

Upon seeing this, I realized I will probably have to refactor all my functions like increment and store_int to take something like a usize or an enum instead of comptime_int. However, I am still curious if there is a way to select between a set of comptime types at runtime like this.

In addition, if it isn’t currently possible, I’m curious if you think the language should support this. I would think the compiler could generate an enum to represent each type that’s used and use that to resolve the types at runtime. However, maybe I’m missing something that would make this a bad idea.

AndrewCodeDev · June 1, 2024, 5:02pm

Hey @spicydll, welcome to Ziggit.

I think there is a misunderstanding here that we should probably iron out first before continuing forward. This line from the compile error:

 var int_type: comptime_int = u8;

comptime_int is a type that holds a value like 0 or 42 - what you’re attempting to do here is assign something like u8 which is a type of integer, not an integer value.

In C, this would be like saying:

long x = int;

If you want to capture the type, you can do it like so:

fn increment(self: *Self, int_type: type) void { //...

The type is different from the value (I’m sure you know this, but I think we’re just reaching for the wrong tool here), and comptime_int carries a value (again, like 12 or something).
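To make the distinction concrete, here is a minimal sketch. The helper name wrappingIncrement is hypothetical (it is not from the original code); it only illustrates that a type is passed as a `comptime T: type` parameter, while comptime_int is what an integer literal such as 42 has.

```zig
const std = @import("std");

// `comptime_int` is the type of an integer literal: it holds a number.
const answer: comptime_int = 42;

// `type` is the type of types: it holds something like u8 or u32.
const Byte: type = u8;

// Hypothetical helper: the type arrives as a comptime parameter, and the
// type of `value` depends on it.
fn wrappingIncrement(comptime T: type, value: T) T {
    return value +% 1;
}

pub fn main() void {
    const x: Byte = 255;
    std.debug.print("{d} {d}\n", .{ answer, wrappingIncrement(Byte, x) }); // 42 0
    std.debug.print("{d}\n", .{wrappingIncrement(u16, 255)}); // 256
}
```

Note that every call site still names the type at compile time. Nothing here picks a type at runtime yet.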

I have other thoughts but I’ll hold off until we get this part straightened out

spicydll · June 1, 2024, 5:14pm

That makes a lot of sense. I refactored my code, hopeful that maybe this might work, but I still receive a compiler error. This one is much more helpful though:

src/root.zig:157:23: error: variable of type 'type' must be const or comptime
        var int_type: type = u8;
                      ^~~~
src/root.zig:157:23: note: types are not available at runtime

Thanks for the clarification. I’m ready for your other thoughts now

AndrewCodeDev · June 1, 2024, 5:22pm

Certainly, so there’s a few more concepts I’m going to encourage you to explore - using the comptime keyword for blocks and execution.

comptime var x: T = ... // variable at comptime

comptime {
   // comptime run scope...
}

const x = comptime foo(); // run foo at comptime...

And const can carry over to comptime quite easily. The issue here is that types need to be resolved at comptime, but in this case:

var int_type: whatever = ...

Is being run at runtime.
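As a rough illustration of those three forms in one place, here is a small sketch. The function square and the variable names are made up for the example.

```zig
const std = @import("std");

// Hypothetical function, only here so there is something to evaluate.
fn square(n: u32) u32 {
    return n * n;
}

pub fn main() void {
    // Forces the call to be evaluated by the compiler.
    const area = comptime square(12);

    // A `const` initialized with a type is comptime-known, so this is fine.
    const T = u16;

    // A comptime var may change, but only along comptime control flow.
    comptime var bits: u16 = 8;
    comptime {
        bits *= 2;
    }

    const x: T = 1;
    std.debug.print("{d} {d} {d}\n", .{ area, bits, x });
}
```

Writing `var T: type = u16;` in place of the const would produce the "must be const or comptime" error from earlier in the thread.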

I probably need to make a doc about this so we can have better reference material, but it looks like you’re still figuring out how to use comptime programming and we just need to take a step back and get the fundamentals first.

Edit: In fact, I’m going to make that doc right now and I’ll link it here when I’m done. Hopefully it will help!

AndrewCodeDev · June 1, 2024, 6:01pm

Alright, so I started to write a doc, but I decided to hold off because it’s probably best to start with the official docs first - here’s the link: Documentation - The Zig Programming Language

Take a look at that and tell me if there’s anything about comptime that’s still confusing so we can work on getting the picture straightened out. I’m trying to find a productive way to respond but I still think there are some missing pieces here.

andrewrk · June 1, 2024, 9:33pm

Apologies if this is not helpful; I’m responding to the title without reading the body.

You can do this with inline switch prongs.

Switch on a runtime value, then inline switch prongs turn that runtime value into a comptime value, which you can then use in a type expression. Be warned that this can generate a large amount of machine code, by generating a separate switch prong for each possible runtime value.
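A minimal sketch of that idea, assuming a hypothetical Width enum and IntFor helper (neither is from the original code): the switch operand is a runtime value, and inside each generated prong the captured tag is comptime-known, so it can be used to compute a type.

```zig
const std = @import("std");

// Hypothetical enum standing in for a runtime "width selector".
const Width = enum { w8, w16, w32, w64 };

// Comptime-only mapping from a tag to an integer type.
fn IntFor(comptime w: Width) type {
    return switch (w) {
        .w8 => u8,
        .w16 => u16,
        .w32 => u32,
        .w64 => u64,
    };
}

// `w` is a runtime value. `inline else` expands into one prong per tag,
// and inside each prong the capture `tag` is comptime-known.
fn maxValue(w: Width) u64 {
    return switch (w) {
        inline else => |tag| std.math.maxInt(IntFor(tag)),
    };
}

pub fn main() void {
    var w: Width = .w8;
    w = .w16; // stand-in for a value only known at runtime
    std.debug.print("{d}\n", .{maxValue(w)}); // 65535
}
```

The cost mentioned above is visible here: the body of the prong is compiled four times, once per enum tag.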

spicydll · June 2, 2024, 5:58pm

Ok so I’ve been cooking. I was able to figure most of this out using tagged unions and inline switch prongs (thanks @andrewrk). It definitely wasn’t immediately intuitive from the docs but eventually I pieced it together… mostly.

I’m gonna talk about what I did, the issues I still have, and my journey going through current documentation.

tl;dr: I didn’t know about the existence of tagged unions and how they can resolve runtime enums into comptime types with the help of inline switch prongs. I also still don’t know how to increment the value inside of a tagged union.

My discovered solution: Tagged Unions + inline switch prongs

I decided to create the following tagged union:

// Following Official docs here
// https://ziglang.org/documentation/0.12.0/#Tagged-union
// (test_tagged_union.zig)
const CellSize = enum { c1, c2, c4, c8 };
const Cell = union(CellSize) {
    c1: u8,
    c2: u16,
    c4: u32,
    c8: u64,

    // Custom helper functions
    fn get_type(comptime tag: CellSize) type {
        // Based on looking through LSP options (zls 0.12.0)
        return std.meta.TagPayload(Cell, tag);
    }

    // Based on official docs for Inline Switch Prongs
    // https://ziglang.org/documentation/0.12.0/#Inline-Switch-Prongs
    // (test_inline_else.zig)
    fn get_size(tag: CellSize) usize {
        return switch (tag) {
            inline else => |size| @sizeOf(std.meta.TagPayload(Cell, size)),
        };
    }
};

This allowed me to refactor the load_int function to the following without compile time errors:

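fn load_int(self: *Self, cell_size: CellSize) Cell {
    switch (cell_size) {
        // Inside an inline prong `tag` is comptime-known, so it can be
        // used for the array length and the integer type.
        inline else => |tag| {
            const T = Cell.get_type(tag);
            const size = @sizeOf(T);
            var bytes: [size]u8 = undefined;
            for (0..size) |idx| {
                var offset = self.ptr + idx;
                // Wrap around the end of the memory
                if (offset >= self.memory.len) {
                    offset -= self.memory.len;
                }
                bytes[idx] = self.memory[offset];
            }
            // Wrap the integer in the union field that matches `tag`
            return @unionInit(Cell, @tagName(tag), std.mem.readInt(T, &bytes, .little));
        },
    }
}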

I also refactored the interpret function to the following:

pub fn interpret(self: *Self, code: []const u8) void {
    var cell_size = CellSize.c1;
    var code_ptr: usize = 0;
    var mod_on = false;
    // ...
    while (code_ptr < code.len) : (code_ptr += 1) {
        // "Command" is an enum
        const cur_command = Command.from_char(code[code_ptr]);
        //...
        switch (cur_command) {
            //...
            .Mod2 => cell_size = CellSize.c2, // Handle modifiers
            .Mod4 => cell_size = CellSize.c4,
            .Mod8 => cell_size = CellSize.c8,
            .Increment => self.increment(cell_size),
            //...
        }
        if (cell_size != CellSize.c1) {
            mod_on = !mod_on;
            if (!mod_on) {
                cell_size = CellSize.c1;
            }
        }
    }
}

My remaining skill issue: Incrementing the value in a tagged union

I am struggling a bit with a casting issue in the increment function:

fn increment(self: *Self, cell_size: CellSize) void {
    var value: Cell = self.load_int(cell_size);
    switch (value) {
        // acc to docs, this doesn't need to be inline... right?
        // https://ziglang.org/documentation/0.12.0/#Tagged-union
        // (test_switch_modify_tagged_union.zig)
        // NOTE: Compile error: incompatible types *root.Cell and comptime_int
        else => |*val| val.* = @addWithOverflow(val, 1)[0],
    }
    self.store_int(value);
}

This function causes the following compile error:

src/root.zig:140:36: error: incompatible types: '*root.Cell' and 'comptime_int'
            else => |*val| val.* = @addWithOverflow(val, 1)[0],
                                   ^~~~~~~~~~~~~~~~~~~~~~~~
src/root.zig:140:53: note: type '*root.Cell' here
            else => |*val| val.* = @addWithOverflow(val, 1)[0],
                                                    ^~~
src/root.zig:140:58: note: type 'comptime_int' here
            else => |*val| val.* = @addWithOverflow(val, 1)[0],
                                                         ^

I’ve tried a few different ways to rectify this type error such as @addWithOverflow(val, @as(@TypeOf(val)), @intCast(1)) to no avail. I just don’t know the proper way to do this when working with a tagged union. Probably something simple.

My journey through the docs

Since you’re interested in creating more documentation, I figured I would share my process of coming upon this solution. I’ll go in chronological order starting after I tried changing every comptime_int to a type.

I still have all my tabs open so the research trail will be fairly accurate. I’ll also try to recall all the refactoring I did. However, I can’t guarantee perfect accuracy there because I haven’t been committing each change.

1. `comptime var int_type: type = u8;`

Off of your suggestion, I decided to try and figure out if I could simply drop in some comptime’s somewhere and make it work. I thought that maybe the compiler could resolve all the possible values through what is assigned to int_type if I simply marked it and each switch prong that modified the value as comptime. However, this resulted in an error stating something like int_type depends on runtime control flow.

2. `const int_types = .{u8, u16, u32, u64};`

After this, I had an idea. What if I defined a constant slice containing all of the valid types and simply indexed into it at runtime?

I defined values similar to the following:

const int_types = .{u8, u16, u32, u64};
var cur_int_type: usize = 0;

Then, I passed them to my unchanged functions using the following syntax:

switch (cur_command) {
    .Mod2 => cur_int_type = 1,
    .Mod4 => cur_int_type = 2,
    .Mod8 => cur_int_type = 3,
    .Increment => self.increment(int_types[cur_int_type]),
    //...
}

However, this did not work. Here, I got an error stating that cur_int_type was not resolvable at compile time. I tried many different permutations of this solution, including making cur_int_type comptime and throwing in comptime every time it was modified, getting various different compile errors.

Eventually, I realized that I was actually defining a tuple rather than a slice of types. I attempted to add []type to the definition but the compiler did not like that either.
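For reference, an array of types can be declared, but only as a comptime-known value, and it still cannot be indexed with a runtime integer. One way around that is `inline for`, sketched below. The helper name sizeOfSelected is hypothetical, and this is just one possible shape, not what the final interpreter used.

```zig
const std = @import("std");

// An array of types is allowed, but only as a comptime-known value.
const int_types = [_]type{ u8, u16, u32, u64 };

// Hypothetical helper: `index` is a runtime value. `inline for` unrolls the
// loop at compile time, so `T` and `i` are comptime-known in each copy of
// the body, and only the comparison `i == index` happens at runtime.
fn sizeOfSelected(index: usize) ?usize {
    inline for (int_types, 0..) |T, i| {
        if (i == index) return @sizeOf(T);
    }
    return null;
}

pub fn main() void {
    var index: usize = 0;
    index += 2; // stand-in for a value only known at runtime
    std.debug.print("{d}\n", .{sizeOfSelected(index) orelse 0}); // 4
}
```

This is the same trick as an inline switch prong: the runtime value is compared against every comptime candidate, and the matching branch gets to use the type.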

3. `inline`

It was around this time that I noticed a new reply from @andrewrk to this post. I had never seen this keyword before so I decided to research it.

I first went to the official docs for inline switch prongs. I remember the code being a bit confusing and thinking it was not quite applicable. I was thinking “How do I do that for a set of types?”. I also remember being quite confused about where I would put the inline prongs. Every idea became a catch-22, coming back to the fact that I would have to edit int_type at runtime, an impossibility because this would cause int_type to rely on runtime control flow.

This led me to googling “how to use inline zig”, which led me to Zig Comptime - WTF is Comptime (and Inline) - Zig NEWS. This article really helped me grasp the fundamentals of comptime, but left me hanging a bit on inline. Here were my main takeaways:

A function taking in any argument as comptime becomes a comptime function.
My functions like increment and load_int take in comptime type’s, making them comptime functions.
My friend [InKryption] put it succinctly in a quote: “comptime exists in a sort of pure realm where I/O doesn’t exist.”
Since the types passed to those functions were dependent on runtime input, I would need to refactor them somehow.
You can use the inline keyword on a function to tell the compiler to copy the function contents wherever the function is called instead of calling the function, similar to macro functions in C.

While I went in wanting to understand inline, I came out realizing that I was going at this fundamentally wrong. It was less that I didn’t understand what comptime was, it was more that I was unaware how it propagated.

I now know that changing the type passed using var int_type in any shape or form made it unresolvable when passed to functions like increment(self: *Self, int_type: type). This is because increment’s true signature should be increment(self: *Self, comptime int_type: type).

As a retrospective aside, I actually think that the compiler errors and/or the language design failed me here. The compiler only errored on this function where it was called in interpret. I think it should’ve errored on the function for not explicitly marking the int_type argument as comptime. Maybe this is because type is always comptime, and therefore does not require the keyword. I can understand wanting to type less, but this seems against the ethos of the language which favors readability over writability. Either that, or I’m remembering the compiler errors wrong.

4. Wtf do be that `union(enum)` tho frfr???

After that, I returned to the official docs on inline switch prongs. This is where I looked back at test_inline_else.zig and noticed something I had never seen before:


const AnySlice = union(enum) {
    a: SliceTypeA, //  ^--- what does this mean??
    b: SliceTypeB,
    c: []const u8,
    d: []AnySlice,
};

I already had a grasp on what a union was from C. I also know that optionals and errors in zig are implemented as unions. However, I had never run into a union(enum). I had to Ctrl-f for this pattern through the docs to find out what this was.

Eventually, I stumbled upon the tagged union. This is where everything started to really click. I realized that the tags could be runtime-resolvable enums that specify which union field to use.

What was a bit disappointing was that none of the code snippets under tagged union really demonstrated how hand-in-hand this plays with inline switch prongs. For that, I had to go back to inline switch prongs. Specifically, it didn’t all click until I looked again at test_inline_else.zig with the knowledge that AnySlice was a tagged union.

This was where I truly realized that inline also implies comptime in a way, meaning that the captured value (in this case, slice) is available at comptime. This is because a prong is generated at comptime for every possible tag of any:

fn withSwitch(any: AnySlice) usize {
    return switch (any) {
        // With `inline else` the function is explicitly generated
        // as the desired switch and the compiler can check that
        // every possible case is handled.
        inline else => |slice| slice.len,
    };
}

Now, based on the comment from the docs, I could’ve potentially realized this sooner. This is especially true when the following is the first line under the section title:

Switch prongs can be marked as inline to generate the prong’s body for each possible value it could have, making the captured value comptime.

However, I was mostly looking at the code. The comment featured in the code snippet above implies that slice is comptime because the compiler is checking for every possible case. I think if this comment explicitly mentioned that the captured slice value was now a comptime value, everything would’ve clicked a bit sooner.

Conclusion

I think tagged unions should be discussed more in documentation as a way to select a type to use for certain operations during runtime. In addition, there should be more emphasis placed on how inline can be used as a way to resolve runtime values to comptime values.

Sorry for the lack of convenience links when I reference the docs. As a new user I only get two links per post. Also, sorry the post got so long.

Thanks for your help!

andrewrk · June 2, 2024, 6:23pm

Although Zig supports peer type resolution in switch captures, as in this example:

test "switch capture peer type resolution" {
    const U = union(enum) {
        a: u32,
        b: u64,
        fn innerVal(u: @This()) u64 {
            switch (u) {
                .a, .b => |x| return x,
            }
        }
    };

    try expectEqual(@as(u64, 100), U.innerVal(.{ .a = 100 }));
    try expectEqual(@as(u64, 200), U.innerVal(.{ .b = 200 }));
}

…it does not work if you make the capture mutable (*val), because the type of the pointer would be different for each union field type. Looking at your code, can you answer this question: What integer type are you expecting @addWithOverflow to be operating on?

Essentially, you’re missing the logic here that would make incrementing the value from 255 to 256 work correctly.

Side note, what’s the point of having u8, u16, u32, and u64 in a union anyway? Just make it a u64. The smaller integers are taking up 8 bytes regardless, since it’s a union.

spicydll · June 2, 2024, 11:47pm

What I intend is for the value to overflow according to the active type’s size. For example, incrementing a Cell.c2 containing the value 65535 would result in 0. The same goes for incrementing a Cell.c1 containing the value 255.

I have realized I could achieve this simply by using a u64 everywhere and truncating as you state. However, I wanted to learn the language better by using language features I haven’t used before.
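For comparison, the u64-everywhere approach could look roughly like this. The function name incrementWrapped and its byte_count parameter are assumptions made for the sketch, not code from the interpreter.

```zig
const std = @import("std");

// Hypothetical helper: `byte_count` is 1, 2, 4, or 8 and is chosen at runtime.
// The cell value always lives in a u64, and the result is masked down to
// the requested width.
fn incrementWrapped(value: u64, byte_count: usize) u64 {
    const next = value +% 1;
    if (byte_count >= 8) return next;
    // Keep only the low `byte_count * 8` bits (at most 56 here).
    const bits: u6 = @intCast(byte_count * 8);
    const mask = (@as(u64, 1) << bits) - 1;
    return next & mask;
}

pub fn main() void {
    std.debug.print("{d}\n", .{incrementWrapped(255, 1)}); // 0
    std.debug.print("{d}\n", .{incrementWrapped(255, 2)}); // 256
}
```

No comptime type selection is needed in this version, because the width is ordinary runtime data.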

mnemnion · June 7, 2024, 1:36am

@addWithOverflow is probably not what you want here, but rather +%, which makes two’s-complement wraparound on overflow defined behavior.
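Putting +% together with an inline prong, a sketch of an increment that wraps at the active field’s width might look like this. It uses a simplified Cell declared as union(enum) with an increment method, which is an assumption for the example rather than the original signature.

```zig
const std = @import("std");

// Simplified version of the Cell union from earlier in the thread.
const Cell = union(enum) {
    c1: u8,
    c2: u16,
    c4: u32,
    c8: u64,

    fn increment(self: *Cell) void {
        switch (self.*) {
            // One prong is generated per field, so `val` has a single
            // concrete pointer type (*u8, *u16, ...) in each of them.
            inline else => |*val| val.* +%= 1,
        }
    }
};

pub fn main() void {
    var cell = Cell{ .c2 = 65535 };
    cell.increment();
    std.debug.print("{d}\n", .{cell.c2}); // 0
}
```

The `inline` is what makes the mutable capture work: without it, a single prong would need one pointer type for all four fields.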
