Best practices for type conversion methods

sorairolake · July 7, 2024, 3:38pm

I want to declare type conversion methods in Zig similar to From and TryFrom of Rust.

For methods to convert T to U, the method signature should probably look like the following:

// Conversions that always succeed.
pub fn from(self: T) U
// Conversions that may fail.
pub fn tryFrom(self: T) !U

In these cases, what are the recommended method names?

dimdin · July 7, 2024, 6:54pm

if T is a string:

pub fn parseSomething(text: []const u8) !U

tryFrom is not a good name for a Zig function, because the try keyword can end up directly before the name: try tryFrom(
There are 5 functions in the std library that follow the naming pattern:

fn something(self: T) ?U
fn somethingUnchecked(self: T) U

Most of the time, one version (the try/checked one) is enough, because the caller can easily choose to ignore the error:

const u = try t.from(); // caller handles the error
const u = t.from() catch unreachable; // crashes in production if Murphy is right
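
To make the single-checked-function approach concrete, here is a minimal sketch. The Percent type, fromInt and percentOrFull are hypothetical names invented for illustration; they are not from the standard library or from any post in this thread.

```zig
const std = @import("std");

// Hypothetical type for illustration: a percentage in the range 0..100.
const Percent = struct {
    value: u8,

    // The only conversion offered is the checked one.
    pub fn fromInt(n: u32) error{OutOfRange}!Percent {
        if (n > 100) return error.OutOfRange;
        return Percent{ .value = @intCast(n) };
    }
};

// A caller that prefers a fallback over propagating the error.
fn percentOrFull(n: u32) Percent {
    return Percent.fromInt(n) catch Percent{ .value = 100 };
}
```

A caller that wants to propagate the failure writes try Percent.fromInt(n) instead, so a separate unchecked variant is often unnecessary.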

Sze · July 7, 2024, 11:41pm

Language Reference: unreachable:

In ReleaseFast and ReleaseSmall mode, the optimizer uses the assumption that unreachable code will never be hit to perform optimizations.

How can it be used for optimizations if you rely on it causing a crash? Or do you mean that it may do whatever it wants?

I thought we had come to a consensus that this is a misuse, unless the error is actually impossible. In this doc (error discarding misuse):

mnemnion · July 8, 2024, 12:15am

There’s rather a lot of this in some parts of the standard library, unfortunately. My take is that if those error codes are unrecoverable, they should be using @panic, not unreachable. “A system my program doesn’t control will never give me this number in response to a syscall” is an invalid invitation to undefined behavior.
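
As a rough sketch of the call-site difference, the following replaces catch unreachable with an explicit panic, so the program stops with a message in every build mode. parsePort and portOrPanic are hypothetical helpers made up for this example; std.fmt.parseInt and std.debug.panic are the standard library functions.

```zig
const std = @import("std");

// Hypothetical checked conversion from text to a port number.
fn parsePort(text: []const u8) !u16 {
    return std.fmt.parseInt(u16, text, 10);
}

// Treats a failure as unrecoverable, but panics explicitly instead of
// promising the optimizer that the failure can never happen.
fn portOrPanic(text: []const u8) u16 {
    return parsePort(text) catch |err|
        std.debug.panic("invalid port '{s}': {s}", .{ text, @errorName(err) });
}
```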

AndrewCodeDev · July 8, 2024, 12:46am

A quick clarification for new users who want to read up on this - unreachable in a comptime context is not the same as the runtime context. It’s the runtime context we’re mostly talking about here.

I agree with @Sze here. unreachable sounds like the wrong tool for the job. I’ve brought this up before, but there are also places in the standard library where errors are just silently dropped, probably for good reason. Take a look at the default logging implementation to see an example of this.

mnemnion · July 8, 2024, 12:53am

Good point to make, and I wasn’t talking about the unreachables in system-specific code, but in the error-return switch statements. I think the best policy is to just return whatever bonkers error the kernel decides to hand out today, but if some of them should crash, they should really crash. The status quo can lead to undefined behavior in code which isn’t safety-checked, which I see as a bad combination with errors returned from syscalls. I hold that a runtime unreachable should always mean “by construction, this code will never produce this value/reach this branch”, and that is not a claim which can be made about a magic number which comes from outside the program.
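
A sketch of that policy for an error-return switch might look like the following. The status numbers, the IoError set and both function names are assumptions chosen for illustration, not taken from any real syscall wrapper.

```zig
const std = @import("std");

const IoError = error{ WouldBlock, Interrupted, Unexpected };

// Hypothetical status codes that arrive from outside the program.
fn checkStatus(code: i32) IoError!void {
    switch (code) {
        0 => return,
        4 => return error.Interrupted,
        11 => return error.WouldBlock,
        // Not `unreachable`: the value is not under this program's control,
        // so an unknown code is handed back to the caller as an error.
        else => return error.Unexpected,
    }
}

// If a failure really should crash, crash explicitly in every build mode.
fn mustSucceed(code: i32) void {
    checkStatus(code) catch |err|
        std.debug.panic("status {d}: {s}", .{ code, @errorName(err) });
}
```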

But sure, a comptime unreachable will crash the compiler, and while @compileError is a little nicer, what with the message and all, it’s not a risk to anyone’s runtime, it’s fine, just a bit terse.

AndrewCodeDev · July 8, 2024, 2:12am

Yes, and to be fair, it’s possible that many of those are placeholders for what will be a more permanent solution in the future.

@sorairolake, anyhow, back to the above example. Zig is already very particular about type conversion, so I’d like to see an example that isn’t already handled best by native code.

When we talk about conversion, I wonder if we’re referring to it in the “conversion constructor” sense of C++. In other words, I try to invoke some function quietly that knows how to take an integer and return an array-like-type (like std::vector(10) or something similar).

If we’re talking about going between native numeric types (f16 → u8) then there’s a lot you can do by just wrapping the builtin conversion operators and calling the correct one. I have something like this already but I ended up not using it because my designs changed. They are definitely helpful, but I don’t see a generic way forward for taking any kind of T and turning it into a U instead. That gets into the territory of “implicit conversion” and unless we’re scanning for named functions (aka, does this type have a function named init that takes a single i32), then I’d need to see a concrete example to be more clear here.
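
For the native numeric case, a wrapper can be as small as the sketch below. toU8 and floatToU8 are hypothetical names, and this is not the code referred to above; std.math.cast, std.math.isNan and @intFromFloat are existing standard library functions and builtins.

```zig
const std = @import("std");

// Checked integer narrowing: null when the value does not fit in a u8.
fn toU8(n: i32) ?u8 {
    return std.math.cast(u8, n);
}

// Checked float-to-integer conversion. @intFromFloat is only safe when the
// integer part fits in the destination, so the range is checked first.
fn floatToU8(x: f16) ?u8 {
    if (std.math.isNan(x) or x < 0.0 or x > 255.0) return null;
    const n: u8 = @intFromFloat(x);
    return n;
}
```

Returning an optional keeps the failure visible to the caller without defining an error set for each pair of types.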

sorairolake · July 9, 2024, 11:00pm

As a concrete example, suppose I want to define methods to convert the ExitCode type in the following library to primitive integer types, and from error types in the standard library.

pub fn foo(self: ExitCode) u8
pub fn bar(self: ExitCode) i32
pub fn baz(err: WriteError) !ExitCode

In this case, what names would be recommended for foo, bar and baz? Would toU8, toI32 and fromWriteError be recommended?

dude_the_builder · July 10, 2024, 12:38am

For the first two, maybe just a single method named as that takes a type parameter (haven’t tested this):

pub fn as(self: ExitCode, comptime T: type) T {
    // Coerces only when T can hold every value of the enum's tag type.
    return @intFromEnum(self);
}

// usage
const int_a = exit_code.as(u8);
const int_b = exit_code.as(u32);

The last one could be exitCodeFromError but that would require a more elaborate implementation.

AndrewCodeDev · July 10, 2024, 12:41am

In this case, what you’re technically referring to here is a take on the “Lippincott” function: C++ Secrets: Using a Lippincott Function for Centralized Exception Handling

Instead of using exceptions, you’re using enums/errors.

Enum tag types can be signed, and @intFromEnum makes the conversion relatively easy.

I don’t know of a clean way to convert errors to signed values, though (say, for instance, if some C library uses -1 as a return value). That will probably require a switch statement. Also, errors are not numbered like enums in all cases: if you coerce to the global error set, your values may not be what you anticipate (so @intFromError may return values you weren’t expecting).
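
Putting the suggestions from this thread together, a switch-based mapping could be sketched as follows. The ExitCode variants and the WriteError set here are simplified stand-ins defined only for this example (the numeric values follow the conventional sysexits codes); they are not the library's actual definitions or the standard library's write error set.

```zig
const std = @import("std");

// Stand-in error set for illustration only.
const WriteError = error{ AccessDenied, BrokenPipe, NoSpaceLeft };

// Stand-in for the library's ExitCode type.
const ExitCode = enum(u8) {
    ok = 0,
    software = 70,
    io_err = 74,
    no_perm = 77,

    pub fn toU8(self: ExitCode) u8 {
        return @intFromEnum(self);
    }

    pub fn toI32(self: ExitCode) i32 {
        // The u8 tag value widens to i32 without an explicit cast.
        return @intFromEnum(self);
    }

    // An explicit switch avoids depending on the compiler-assigned
    // numbering of error values.
    pub fn fromWriteError(err: WriteError) ExitCode {
        return switch (err) {
            error.AccessDenied => .no_perm,
            error.BrokenPipe, error.NoSpaceLeft => .io_err,
        };
    }
};
```

Because the switch is exhaustive over this error set, fromWriteError does not need to return an error union; adding an error to the set later becomes a compile error until the switch handles it.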