Un-escaping characters

Durobot · December 30, 2023, 10:35pm

Is there anything in stdlib (?) to help format a string as a character?

For example, if I have a string like \' (say, const s = "\\'";), or \x30 (const s = "\\x30";), how would I convert such a string to a single u8 containing ASCII code of ' (in the first case), or ASCII code of 0 (in the second case)?

I’ve been tinkering with bufPrint, but can’t get it to do this. Perhaps it’s not the right tool for the job.

mscott9437 · December 31, 2023, 12:33am

Any character you store in a u8 is stored as its numeric value. To get the character back out, print the u8 with the {c} specifier, or print the string that contains it with the {s} specifier.

   const s = "\x30";
   std.debug.print("{d}\n", .{ s[0] }); // outputs 48
   std.debug.print("{s}\n", .{ s }); // outputs 0
   const i: u8 = s[0];
   std.debug.print("{d}\n", .{ i }); // outputs 48
   const t: u8 = '\x30';
   std.debug.print("{d}\n", .{ t }); // outputs 48

You would use bufPrint if you want to format the integer back into a slice, so that the slice holds the character itself.
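A minimal sketch of that last point, assuming a Zig 0.11-era standard library (the thread's timeframe; other releases may differ). The variable names are only for illustration. It formats a u8 with {c} both directly and through std.fmt.bufPrint:

```zig
const std = @import("std");

pub fn main() !void {
    const code: u8 = 48; // ASCII code of '0'

    // {c} formats a u8 as the character it encodes; {d} formats it as a number.
    std.debug.print("{c} {d}\n", .{ code, code });

    // bufPrint writes the formatted text into a caller-supplied buffer
    // and returns the part of that buffer that was filled.
    var buf: [8]u8 = undefined;
    const text = try std.fmt.bufPrint(&buf, "{c}", .{code});
    std.debug.print("{s} (len {d})\n", .{ text, text.len });
}
```

This should print `0 48` followed by `0 (len 1)`. Note that it goes from a number to a character, which is the opposite direction of un-escaping a string like \x30.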

castholm · December 31, 2023, 12:37am

If what you’re asking is “how do I parse a string the same way the Zig compiler parses string literals”, check out the functions in the std.zig.string_literal namespace. I haven’t yet used any of them myself but parseCharLiteral seems to do exactly what you want with regard to individual characters.
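A rough sketch of what calling it could look like for the two strings from the question, assuming the Zig 0.11-era API where parseCharLiteral returns a union with .success and .failure fields. The function expects the text of a character literal including the surrounding single quotes, so the inputs here are wrapped in them:

```zig
const std = @import("std");

pub fn main() void {
    // Each input is the text of a character literal *including* its quotes,
    // exactly as it would appear in Zig source: '\'' and '\x30'.
    const inputs = [_][]const u8{ "'\\''", "'\\x30'" };

    for (inputs) |text| {
        switch (std.zig.string_literal.parseCharLiteral(text)) {
            // On success the payload is the decoded code point (a u21).
            .success => |codepoint| std.debug.print("{s} -> {d}\n", .{ text, codepoint }),
            .failure => std.debug.print("{s} -> parse error\n", .{text}),
        }
    }
}
```

With these inputs it should print `'\'' -> 39` and `'\x30' -> 48`, the ASCII codes of ' and 0.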

Durobot · December 31, 2023, 5:06am

Yes, this is precisely what I want.
Thank you, I will look into parseCharLiteral.

Update: parseCharLiteral does indeed work like that, but requires quite a few sanity checks before you can call it.

Otherwise you risk running into panic: reached unreachable code. For example, if the text after the opening ' starts with neither \ nor a 0 byte, parseCharLiteral passes it straight to utf8Decode, which hits unreachable for any input longer than 4 bytes.

The same goes for the functions utf8Decode calls (utf8Decode2, utf8Decode3, utf8Decode4): if the bytes you give them are not valid UTF-8, you end up at unreachable as well.

Anyway, here’s how I currently plan to call parseCharLiteral:

const std = @import("std");

pub fn main() !void
{
    const char_strings = [_][]const u8{ "'A'", "'\\''", "'hello'", "'\\x30'", "'⚡'", "'\\u{}'",
                                        "'abcd'", "'abc'", "'ab'", "'\\x31'", "abcde" };

    for (char_strings) |s|
    {
        // Must ensure the preconditions are met before calling `parseCharLiteral`,
        // or get `panic: reached unreachable code`
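        // The length cap applies only to raw UTF-8 text; escapes such as `\u{26A1}` are longer.
        // Only the lead byte is checked here, so bad continuation bytes can still panic.
        if (s.len < 3 or s[0] != '\'' or s[s.len - 1] != '\''
            or (s[1] != '\\' and s[1] != 0 and
                (s.len > 6 or
                 (s.len == 6 and s[1] & 0b11111000 != 0b11110000) or
                 (s.len == 5 and s[1] & 0b11110000 != 0b11100000) or
                 (s.len == 4 and s[1] & 0b11100000 != 0b11000000))))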
        {
            std.debug.print("!!! {s} does not represent a character\n", .{ s });
            continue;
        }
        switch (std.zig.parseCharLiteral(s))
        {
            // Capture the decoded code point instead of re-reading the union field
            .success => |codepoint|
            {
                if (codepoint > 255)
                    std.debug.print("Character value, {}, is greater than 255 (must use u21)\n", .{ codepoint })
                else
                    std.debug.print("Parsed character: {c}\n", .{ @as(u8, @intCast(codepoint)) });
            },
            .failure => std.debug.print("!!! Could not parse {s} as a character\n", .{ s })
        }
    }
}
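
The manual bit-mask checks above could probably also be expressed with helpers from std.unicode. The following is only a sketch under the same Zig 0.11-era assumption: parseCharChecked is a hypothetical wrapper name, not something in the standard library, and it relies on the escape-sequence path of parseCharLiteral reporting problems through .failure rather than panicking, which is how it appears to behave in that release.

```zig
const std = @import("std");

/// Hypothetical helper (not part of std): returns the code point for a quoted
/// character literal such as "'A'" or "'\\x30'", or null if the text is not one.
fn parseCharChecked(text: []const u8) ?u21 {
    // parseCharLiteral asserts these three conditions up front.
    if (text.len < 3 or text[0] != '\'' or text[text.len - 1] != '\'') return null;

    const body = text[1 .. text.len - 1];
    // Escapes and a leading 0 byte are handled without reaching utf8Decode;
    // anything else has to be exactly one valid UTF-8 sequence.
    if (body[0] != '\\' and body[0] != 0) {
        const expected_len = std.unicode.utf8ByteSequenceLength(body[0]) catch return null;
        if (body.len != expected_len) return null;
        if (!std.unicode.utf8ValidateSlice(body)) return null;
    }

    return switch (std.zig.string_literal.parseCharLiteral(text)) {
        .success => |codepoint| codepoint,
        .failure => null,
    };
}

pub fn main() void {
    const samples = [_][]const u8{ "'A'", "'\\x30'", "'⚡'", "'hello'", "abcde" };
    for (samples) |text| {
        if (parseCharChecked(text)) |codepoint| {
            std.debug.print("{s} -> {d}\n", .{ text, codepoint });
        } else {
            std.debug.print("{s} -> not a character literal\n", .{text});
        }
    }
}
```

For these samples the first three should come back as 65, 48 and 9889 (U+26A1), and the last two as null. Unlike the lead-byte masks, utf8ValidateSlice also checks the continuation bytes, so malformed multi-byte input is rejected before it can reach unreachable.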