How can I convert a []const u8 to a []const u16?

I’ve tried…

var f: []const u16 = undefined;
var i: usize = 0;

for (s) |v| { // s is the string I want to convert, s: []const u8
  f[i] = @as(u16, v);
  i = i + 1;
}

…but it doesn’t work; it errors with “error: cannot assign to constant” on the f[i] = @as(u16, v); line. What do I do?

It’s not possible to convert that way; you have to create a new slice of the appropriate type (allocating it on the heap) and copy the elements over. Something like this:

const std = @import("std");

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const v: []const u8 = &.{ 0, 1, 2, 3 };
    const f = try allocator.alloc(u16, v.len);
    defer allocator.free(f);
    for (v, f) |vi, *fi| {
        fi.* = vi;
    }
    std.debug.print("{any}\n", .{v});
    std.debug.print("{any}\n", .{f});
}

Also, you can use 0.. syntax to get the current iteration index like:

for (s, 0..) |v, i| {
  f[i] = @as(u16, v);
}

(Or take a pointer to the current element like in @adria’s solution.)

If you’re looking to interpret each individual u8 value as its equivalent u16 value, then I believe the above answers are correct; but if instead you’re looking to simply bend the memory to your will, then this works:

const std = @import("std");

pub fn main() void {
    const array = [_]u8{ 'h', 'e', 'l', 'l', 'o', ' ', 'h', 'i' };
    comptime if (array.len % 2 != 0) @compileError("array len must be even");

    const s8: []const u8 = array[0..];
    for (s8) |n| {
        std.debug.print("{x} ", .{n});
    }
    std.debug.print("\n", .{});

    const s16: []const u16 = @as(
        *const [s8.len / 2]u16,
        @ptrCast(@alignCast(s8.ptr[0..])),
    );
    for (s16) |n| {
        std.debug.print("{x} ", .{n});
    }
    std.debug.print("\n", .{});
}
$ zig run main.zig
68 65 6c 6c 6f 20 68 69
6568 6c6c 206f 6968

Arrays are value types; if they’re both arrays, you don’t need the pointer indirection:

const array = [_]u8{ 'h', 'e', 'l', 'l', 'o', ' ', 'h', 'i' };
comptime if (array.len % 2 != 0) @compileError("array len must be even");
const s16: [array.len / 2]u16 = @bitCast(array);

Edit: Which is of course not the same thing as OP, but I figured I’d add it here :sweat_smile:
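As a quick sanity check, here’s a sketch of a test (names are illustrative) showing that @bitCast only reinterprets the array’s bytes, so the check holds regardless of the target’s endianness:

```zig
const std = @import("std");

test "@bitCast reinterprets the array's bytes in place" {
    const array = [_]u8{ 'h', 'e', 'l', 'l', 'o', ' ', 'h', 'i' };
    const s16: [array.len / 2]u16 = @bitCast(array);
    // The u16 array occupies exactly the same bytes as the u8 array;
    // the individual u16 values depend on the target's endianness.
    try std.testing.expectEqualSlices(u8, &array, std.mem.asBytes(&s16));
}
```

Run it with zig test.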


No, I’m using it with OpenFile (the Windows one). Anyway, thanks to all who have helped: @smlavine @tsdtas @Zonion @adria

If you’re asking about UTF-8 → UTF-16, there is a variety of helpers in std.unicode; note that it doesn’t provide complete Unicode support, just what is necessary to work with Windows.

If this is what you are asking, then the solutions so far are not helpful: UTF-16 is not UTF-8 with larger integers, it is a different format.
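To make the difference concrete, here’s a small sketch (assuming a recent std.unicode with utf8ToUtf16LeStringLiteral): “é” is two bytes in UTF-8 but a single UTF-16 code unit, so widening each byte individually would produce garbage.

```zig
const std = @import("std");

test "UTF-16 is not UTF-8 with wider integers" {
    const utf8 = "é"; // encoded as the two bytes 0xC3 0xA9 in UTF-8
    const utf16 = std.unicode.utf8ToUtf16LeStringLiteral("é");
    try std.testing.expectEqual(@as(usize, 2), utf8.len);
    try std.testing.expectEqual(@as(usize, 1), utf16.len);
    try std.testing.expectEqual(@as(u16, 0x00E9), utf16[0]);
}
```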


ok…well noted!

Just a minor nitpick: Windows uses WTF-16, which is UTF-16 but allowing unpaired surrogates. So @f-tuason should use the wtf functions in std.unicode and not the utf ones when dealing with Windows paths.


Similarly, file paths on other operating systems are WTF-8. Invalid Unicode in file systems is not just a Windows issue.

Which operating systems are you referring to?

On Linux and the BSDs, filenames are opaque byte strings. On macOS, they’re normalized according (approximately) to Unicode NFD.

Neither of those things are WTF-8.

Theoretically, Linux filenames are opaque byte strings. But in many cases you have to display them to users or read them from somewhere else. In those cases you need to know the encoding, and the filenames can be invalid UTF-8 sequences.

Besides, I have also seen encodings that are valid but wrong nevertheless, e.g. a Latin letter A followed by a diaeresis character rather than a combining diaeresis.

File name encoding is a constant source of trouble in the real world.

I want to know these kinds of things, but I currently don’t know how to convert this… Is it supposed to be like this?

const utf16 = try allocator.alloc(u16, utf8.len);
defer allocator.free(utf16);
const sz = try std.unicode.wtf8ToWtf16Le(utf16, utf8);

or can you give me a sample on how to convert?

If you want to allocate on the heap, simplest would just be

const wtf16 = try std.unicode.wtf8ToWtf16LeAlloc(allocator, wtf8);
defer allocator.free(wtf16);

If you want to fill a buffer you allocate yourself, you can get the required size with std.unicode.calcWtf16LeLen (it returns the length needed for the []u16).
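For the buffer route, here’s a minimal sketch (assuming the current std.unicode signatures, where wtf8ToWtf16Le takes the destination slice and returns the number of code units written):

```zig
const std = @import("std");

test "wtf-8 to wtf-16 via a caller-provided buffer" {
    const wtf8 = "foo";
    var buf: [8]u16 = undefined;
    // calcWtf16LeLen reports how many u16 code units are needed.
    const needed = try std.unicode.calcWtf16LeLen(wtf8);
    try std.testing.expect(needed <= buf.len);
    const written = try std.unicode.wtf8ToWtf16Le(&buf, wtf8);
    const wtf16 = buf[0..written];
    try std.testing.expectEqual(needed, wtf16.len);
}
```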

If the WTF-8 string is a string literal, you can get a WTF-16 LE version of that with std.unicode.wtf8ToWtf16LeStringLiteral:

const wtf16 = std.unicode.wtf8ToWtf16LeStringLiteral("foo");

Relevant


macOS dropped Unicode normalization with APFS. Good riddance; it made dealing with CJK filenames very annoying.
