Newbie comptime/anytype Papercuts

robert-wallis · April 24, 2024, 6:19pm

I keep running into papercuts and spend hours fighting the comptime expression checker instead of writing code. In the following examples I won’t be changing any logic, just type signatures.

Example A:

This code works. It uses an anytype and returns input.len * 2 bytes. This is a pattern from std.fmt.bytesToHex which I’ll bring up in Example D.

const std = @import("std");

fn twice(input: anytype) [input.len * 2]u8 {
    var output: [input.len * 2]u8 = undefined;
    for (input, 0..input.len) |c, i| {
        output[i] = c;
        output[input.len + i] = c;
    }
    return output;
}

test twice {
    const input = "pizza";
    const actual: [input.len * 2]u8 = twice(input);
    try std.testing.expectEqualStrings("pizzapizza", &actual);
}

Example B:

By just changing input from anytype to []const u8

fn twice(input: []const u8) [input.len * 2]u8

I get this error:

% zig-0.12.0 test example_b.zig 
example_b.zig:3:35: error: unable to evaluate comptime expression
fn twice(input: []const u8) [input.len * 2]u8 {
                             ~~~~~^~~~
referenced by:
    decltest.twice: example_b.zig:14:39

When I read that error, not knowing about the anytype hack, I was confused about why the Standard Library can use a comptime expression for the return type but I am not allowed to.

Example C:

Changing the signature to add comptime with []const u8 works:

fn twice(comptime input: []const u8) [input.len * 2]u8

And that makes sense logically: the error says input.len is a comptime expression, so it makes sense to force input to be comptime.

But the original anytype version, Example A, doesn’t indicate that it’s a comptime function.

As a newbie, I don’t want it to be comptime, because to me that means it won’t work for dynamic strings, and it’s only working now because it’s in a comptime test.

Example D:

Let’s start over, and use a function I can’t change.
std.fmt.bytesToHex has the following signature:

pub fn bytesToHex(input: anytype, case: Case) [input.len * 2]u8

So to use it I started by asking for a []u8 slice, which won’t compile.

const std = @import("std");

test "bytesToHex" {
    const input = "\xde\xad\xc0\xde";
    const actual: []u8 = std.fmt.bytesToHex(input, .lower);
    try std.testing.expectEqualStrings("deadc0de", &actual);
}

example_d.zig:5:44: error: array literal requires address-of operator (&) to coerce to slice type '[]u8'
    const actual: []u8 = std.fmt.bytesToHex(input, .lower);

You probably already knew it was an array and I was asking for a slice. As a newbie, I haven’t burned that into my brain yet.

Example E:

Sure I’ll coerce it, no big deal.

% zig-0.12.0 test example_e.zig
example_e.zig:5:26: error: expected type '[]u8', found '*const [8]u8'
    const actual: []u8 = &std.fmt.bytesToHex(input, .lower);
                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
example_e.zig:5:26: note: cast discards const qualifier

Example F:

I see an [8] in the length, so it has a constant length.
bytesToHex returns an array of double length. I can do that too.
Just adding [input.len * 2] to change actual into an array should work:

example_f.zig:5:39: error: expected type '[8]u8', found pointer
    const actual: [input.len * 2]u8 = &std.fmt.bytesToHex(input, .lower);
                                      ^
example_f.zig:5:39: note: address-of operator always returns a pointer

Yeah, the ‘&’ is in there because zig wanted me to use the “address-of operator (&) to coerce to slice”.

Example G

Got it, actual is an array now not a slice.

const std = @import("std");

test "bytesToHex" {
    const input = "\xde\xad\xc0\xde";
    const actual: [input.len * 2]u8 = std.fmt.bytesToHex(input, .lower);
    try std.testing.expectEqualStrings("deadc0de", &actual);
}

Example H

But the sad part is I could have avoided all that by just not specifying the type.

const std = @import("std");

test "bytesToHex" {
    const input = "\xde\xad\xc0\xde";
    const actual = std.fmt.bytesToHex(input, .lower);
    try std.testing.expectEqualStrings("deadc0de", &actual);
}

Back and Forth

What it feels like to be a zig user is going back and forth with the compiler, trying to implement its suggestions but finding another error.

It sounds good to have a language that doesn’t hide allocations. But sometimes it does hide stack allocations at comptime. bytesToHex is doubling the memory used by input.

I’ve only been using the language for about two weeks. But I find myself writing code in another language to get the logic figured out, and then spending 2x to 10x more time translating that to zig. Dev time is valuable; sometimes it’s more expensive than runtime.

Solutions?

What do you think the solutions are? It’s a ‘skill issue’ on my part, but can something be done to help bring engineers’ skills up?

I read this documentation on type coercion, slices, arrays and pointers, but when actually coding like above, I still often end up iterating just to get it to compile.
Is not specifying a type a good shortcut to writing code faster that compiles?
Should the array literal coerce error not activate for const qualified arrays?

kristoff · April 24, 2024, 6:59pm

An array has a comptime-known length and a slice has a runtime-known length, which is why the former can be used to define the return type while the latter causes an error. Using anytype in this context is not a hack; it is just leveraging the correct language tool for the job. Slices are designed to represent references to sequences of data whose length is not statically known, so you shouldn’t accept a slice if you’re trying to define the length of the returned array at comptime.

anytype doesn’t mean that the argument is comptime-known. For example, if you were to pass a slice to the Example A implementation, you would see an error. anytype in that context is meant to accept array values, which works because the length is part of the type and is thus statically known.
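To make that concrete, here is a minimal sketch that reuses the twice function from Example A and passes it an array whose bytes are only filled in at runtime. The test name and the buffer contents are made up for illustration; the point is only that input.len comes from the array type, not from the value.

```zig
const std = @import("std");

// Same shape as Example A: `input.len` is read from the argument's type,
// so the argument's value does not have to be comptime-known.
fn twice(input: anytype) [input.len * 2]u8 {
    var output: [input.len * 2]u8 = undefined;
    for (input, 0..) |c, i| {
        output[i] = c;
        output[input.len + i] = c;
    }
    return output;
}

test "anytype accepts an array filled in at runtime" {
    // `buf` is an ordinary runtime variable; only its length (5) is part
    // of its type.
    var buf: [5]u8 = undefined;
    @memcpy(&buf, "pizza");
    buf[0] = 'P';

    const doubled = twice(buf);
    try std.testing.expectEqualStrings("PizzaPizza", &doubled);

    // Passing a runtime slice instead (for example `buf[0..]` coerced to
    // []const u8) is the case described above as an error, because a
    // slice's length is not part of its type.
}
```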

The slice cases (D, E) didn’t work because []u8 is a mutable slice. Had you used []const u8 together with the & from Example E, it would have worked. The slice has to be const because you’re taking a pointer to a temporary value (the array returned by bytesToHex), and those values are implicitly const.
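A small sketch of both options, using the same input as Example D. The variable names hex and buf are only illustrative: the first test takes a const slice of the temporary, and the second stores the array in a var first so that a mutable slice has something mutable to point at.

```zig
const std = @import("std");

test "const slice of the temporary result" {
    const input = "\xde\xad\xc0\xde";
    // The temporary array is const, so the slice must be []const u8.
    const hex: []const u8 = &std.fmt.bytesToHex(input, .lower);
    try std.testing.expectEqualStrings("deadc0de", hex);
}

test "mutable slice needs a mutable array to point into" {
    const input = "\xde\xad\xc0\xde";
    // Store the returned array in a `var`, then slice that variable.
    var buf = std.fmt.bytesToHex(input, .lower);
    const hex: []u8 = &buf;
    hex[0] = 'D';
    try std.testing.expectEqualStrings("Deadc0de", hex);
}
```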

The compiler could maybe give you better suggestions but no matter how hard it tries, it can’t reliably direct you towards success because the reason why you got stuck in the first place is that you yourself don’t know exactly what it is that you want. For example, given your final success examples, it turns out that you didn’t want a []u8, because expectEqualStrings accepts a []const u8 as the second argument.

But in a different case maybe another user might actually need to obtain a mutable slice because the function they eventually plan to pass it to does need mutability.

So IMO the only sound solution to this problem is the following:

1. Don’t overdo it with comptime metaprogramming, especially in the beginning.
2. Learn well the relationship between arrays, pointers and slices.

Point 2 takes some time, but if you make a deliberate effort to learn how this stuff works (as opposed to trying to pick it up from compile errors), you will get to a point where it all clicks into place. At that point you will be fully aware of the implications of each of the things that you tried in this post, and of why the Zig compiler was designed to error out in some of them and not others.

You’ve already linked to some resources on that topic, which is the right move in my opinion. Here are some other resources that might help:

LucasSantos91 · April 25, 2024, 3:18pm

One way to avoid headaches with anytype is to request the things that you need as explicit parameters, like so:

fn twice(comptime T: type, comptime n: usize, input: [n]T) [2*n]T
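For illustration, here is that signature with a body filled in, following the loop from Example A. The body and the test are a sketch, not something from the standard library.

```zig
const std = @import("std");

// The element type and length are explicit comptime parameters, so the
// return type no longer depends on inspecting an anytype argument.
fn twice(comptime T: type, comptime n: usize, input: [n]T) [2 * n]T {
    var output: [2 * n]T = undefined;
    for (input, 0..) |c, i| {
        output[i] = c;
        output[n + i] = c;
    }
    return output;
}

test "twice with explicit type and length" {
    const input = [_]u8{ 'p', 'i', 'z', 'z', 'a' };
    const actual = twice(u8, input.len, input);
    try std.testing.expectEqualStrings("pizzapizza", &actual);
}
```

The trade-off is a noisier call site: the caller has to spell out the element type and the length, but the signature now documents exactly what the function needs.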

Validark · April 27, 2024, 2:08pm

If you are confused by what types are being assigned automatically, you should use @compileLog. That will help you realize the problems that Loris points out.
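A sketch of how that could look for the bytesToHex case. Note that @compileLog deliberately stops the build after printing, so it is left commented out here and the same types are checked with @TypeOf so the test still runs.

```zig
const std = @import("std");

test "inspect the inferred types" {
    const input = "\xde\xad\xc0\xde";
    const actual = std.fmt.bytesToHex(input, .lower);

    // Uncomment to have the compiler print these types; the build then
    // fails on purpose because a compile log statement was found.
    // @compileLog(@TypeOf(input), @TypeOf(actual), @TypeOf(&actual));

    // A string literal is a pointer to a sentinel-terminated array.
    try std.testing.expect(@TypeOf(input) == *const [4:0]u8);
    // bytesToHex returns an array, not a slice.
    try std.testing.expect(@TypeOf(actual) == [8]u8);
    // Taking the address of a const array gives a const pointer to it.
    try std.testing.expect(@TypeOf(&actual) == *const [8]u8);
}
```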

pierrelgol · April 28, 2024, 1:13am

Not so long ago I was in the same position as you, and I can tell you that you’ll get acquainted with the type system sooner than you might think. I came to Zig from a C background, and trying to code in Zig the way I was used to in C (I guess it’s probably the same with most C-like languages) was a miserable experience. In C you sort of forget that the type system is mostly optional and that you are free to cast and assign anything to anything, so coming to Zig was a big shift. At first, just like you, I was frustrated that I needed to create so many intermediary values and that the array syntax and type declarations were confusing. Once you get used to it, it really does make a lot of sense, and even with the seemingly unhelpful compiler errors, at some point you’ll see patterns. The most frustrating error for me, and also the one I’m now able to resolve the quickest, is a forgotten try somewhere: it breaks all the type resolution, and you can get some pretty wild compiler errors.

TLDR: it will get better over time.

robert-wallis · April 28, 2024, 8:21am

Thanks everyone! Your suggestions have really helped me in the last few days.

My fluency writing zig, with fewer compiler error stops, has gone up dramatically by following these rules in my head:

input byte buffers should be []const u8, outputs []u8
slices are not arrays, slices are not arrays, slices are not arrays
in tests, use &"\xde\xad\xc0\xde".* to make a quick buffer, so that my tests aren’t dictating the type of the function just to pass the test, and then messing up the implementation that uses a dynamic buffer
adding compile time validations to the top of a function are pretty helpful
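As one possible reading of the first rule, here is a sketch of a runtime-friendly variant of twice. The name twiceInto and the NoSpaceLeft error are hypothetical choices for this example: the input is a []const u8, and the caller supplies the []u8 output buffer, so nothing about the lengths has to be comptime-known.

```zig
const std = @import("std");

// Hypothetical helper: writes `input` twice into `output` and returns
// the part of `output` that was written.
fn twiceInto(input: []const u8, output: []u8) error{NoSpaceLeft}![]u8 {
    const needed = input.len * 2;
    if (output.len < needed) return error.NoSpaceLeft;
    @memcpy(output[0..input.len], input);
    @memcpy(output[input.len..needed], input);
    return output[0..needed];
}

test "twiceInto works with runtime-length slices" {
    var storage: [16]u8 = undefined;
    const doubled = try twiceInto("pizza", &storage);
    try std.testing.expectEqualStrings("pizzapizza", doubled);
}

test "twiceInto reports a buffer that is too small" {
    var storage: [4]u8 = undefined;
    try std.testing.expectError(error.NoSpaceLeft, twiceInto("pizza", &storage));
}
```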

kristoff · April 28, 2024, 12:04pm

If the return value is the result of slicing an input, that too will need to be const. More generally, I think that const should be the default option unless you have an active reason for it to be mutable (both as input and output).
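A minimal sketch of that situation, with a made-up helper called firstWord: because the result is a sub-slice of a []const u8 input, its type has to be []const u8 as well.

```zig
const std = @import("std");

// Hypothetical helper: returns the part of `input` before the first
// space. The result points into `input`, so it must also be const.
fn firstWord(input: []const u8) []const u8 {
    const end = std.mem.indexOfScalar(u8, input, ' ') orelse input.len;
    return input[0..end];
}

test "a slice of a const input is const" {
    try std.testing.expectEqualStrings("hello", firstWord("hello world"));
    try std.testing.expect(@TypeOf(firstWord("hello world")) == []const u8);
}
```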