How can I convert a []const u8 to a []const u16?

I’ve tried…

var f: []const u16 = undefined;
var i: usize = 0;

for (s) |v| { // s is the string I want to convert, s: []const u8
  f[i] = @as(u16, v);
  i = i + 1;
}

…but it doesn’t work; it errors with “error: cannot assign to constant” on the f[i] = @as(u16, v); line. What do I do?

It’s not possible to convert that way; you have to create a new slice of the appropriate type (allocating it on the heap) and copy the elements over. Something like this:

const std = @import("std");

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const v: []const u8 = &.{ 0, 1, 2, 3 };
    const f = try allocator.alloc(u16, v.len);
    defer allocator.free(f);
    for (v, f) |vi, *fi| {
        fi.* = vi;
    }
    std.debug.print("{any}\n", .{v});
    std.debug.print("{any}\n", .{f});
}

Also, you can use 0.. syntax to get the current iteration index like:

for (s, 0..) |v, i| {
  f[i] = @as(u16, v);
}

(Or take a pointer to the current element like in @adria’s solution.)

If you’re looking to interpret each individual u8 value as its equivalent u16 value, then I believe the above answers are correct; but if instead you’re looking to simply bend the memory to your will, then this works:

const std = @import("std");

pub fn main() void {
    const array = [_]u8{ 'h', 'e', 'l', 'l', 'o', ' ', 'h', 'i' };
    comptime if (array.len % 2 != 0) @compileError("array len must be even");

    const s8: []const u8 = array[0..];
    for (s8) |n| {
        std.debug.print("{x} ", .{n});
    }
    std.debug.print("\n", .{});

    const s16: []const u16 = @as(
        *const [s8.len / 2]u16,
        @ptrCast(@alignCast(s8.ptr[0..])),
    );
    for (s16) |n| {
        std.debug.print("{x} ", .{n});
    }
    std.debug.print("\n", .{});
}
$ zig run main.zig
68 65 6c 6c 6f 20 68 69
6568 6c6c 206f 6968

Arrays are value types; if they’re both arrays, you don’t need the pointer indirection:

const array = [_]u8{ 'h', 'e', 'l', 'l', 'o', ' ', 'h', 'i' };
comptime if (array.len % 2 != 0) @compileError("array len must be even");
const s16: [array.len / 2]u16 = @bitCast(array);

Edit: Which is of course not the same thing as OP, but I figured I’d add it here :sweat_smile:
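As a quick sanity check, here’s a sketch of a test (names are illustrative) showing that @bitCast only reinterprets the array’s bytes, so the check holds regardless of the target’s endianness:

```zig
const std = @import("std");

test "@bitCast reinterprets the array's bytes in place" {
    const array = [_]u8{ 'h', 'e', 'l', 'l', 'o', ' ', 'h', 'i' };
    const s16: [array.len / 2]u16 = @bitCast(array);
    // The u16 array occupies exactly the same bytes as the u8 array;
    // the individual u16 values depend on the target's endianness.
    try std.testing.expectEqualSlices(u8, &array, std.mem.asBytes(&s16));
}
```

Run it with zig test.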


No, I’m using it with OpenFile (the Windows one). Anyway, thanks to all who have helped: @smlavine @tsdtas @Zonion @adria

If you’re asking about UTF-8 → UTF-16, there is a variety of helpers in std.unicode; note that it doesn’t provide complete Unicode support, just what is necessary to work with Windows.

If this is what you are asking, then the solutions so far are not helpful: UTF-16 is not UTF-8 with larger integers, it is a different format.
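To make the difference concrete, here’s a small sketch (assuming a recent std.unicode with utf8ToUtf16LeStringLiteral): “é” is two bytes in UTF-8 but a single UTF-16 code unit, so widening each byte individually would produce garbage.

```zig
const std = @import("std");

test "UTF-16 is not UTF-8 with wider integers" {
    const utf8 = "é"; // encoded as the two bytes 0xC3 0xA9 in UTF-8
    const utf16 = std.unicode.utf8ToUtf16LeStringLiteral("é");
    try std.testing.expectEqual(@as(usize, 2), utf8.len);
    try std.testing.expectEqual(@as(usize, 1), utf16.len);
    try std.testing.expectEqual(@as(u16, 0x00E9), utf16[0]);
}
```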


ok…well noted!

Just a minor nitpick: Windows uses WTF-16, which is UTF-16 but allowing unpaired surrogates. So @f-tuason should use the wtf functions in std.unicode and not the utf ones when dealing with Windows paths.


Similarly, file paths on other operating systems are WTF-8. Invalid Unicode in file systems is not just a Windows issue.

Which operating systems are you referring to?

On Linux and the BSDs, filenames are opaque byte strings. On macOS, they’re normalized according (approximately) to Unicode NFD.

Neither of those things are WTF-8.

Theoretically, Linux filenames are opaque byte strings. But in many cases you have to display them to users or read them from somewhere else. In those cases you need to know the encoding, and the filenames can be invalid UTF-8 sequences.

Besides, I have also seen encodings that are valid but wrong nevertheless, e.g. a Latin letter A followed by a diaeresis character rather than a combining diaeresis.

File name encoding is a constant source of trouble in the real world.

I want to know these kinds of things, but I currently don’t know how to convert this… Is it supposed to be like this?

const utf16 = try allocator.alloc(u16, utf8.len);
defer allocator.free(utf16);
const sz = try std.unicode.wtf8ToWtf16Le(utf16, utf8);

or can you give me a sample on how to convert?

If you want to allocate on the heap, simplest would just be

const wtf16 = try std.unicode.wtf8ToWtf16LeAlloc(allocator, wtf8);
defer allocator.free(wtf16);

If you want to fill a buffer you allocate yourself, you can get the required size with std.unicode.calcWtf16LeLen (it returns the length needed for the []u16).
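For the buffer route, here’s a minimal sketch (assuming the current std.unicode signatures, where wtf8ToWtf16Le takes the destination slice and returns the number of code units written):

```zig
const std = @import("std");

test "wtf-8 to wtf-16 via a caller-provided buffer" {
    const wtf8 = "foo";
    var buf: [8]u16 = undefined;
    // calcWtf16LeLen reports how many u16 code units are needed.
    const needed = try std.unicode.calcWtf16LeLen(wtf8);
    try std.testing.expect(needed <= buf.len);
    const written = try std.unicode.wtf8ToWtf16Le(&buf, wtf8);
    const wtf16 = buf[0..written];
    try std.testing.expectEqual(needed, wtf16.len);
}
```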

If the WTF-8 string is a string literal, you can get a WTF-16 LE version of that with std.unicode.wtf8ToWtf16LeStringLiteral:

const wtf16 = std.unicode.wtf8ToWtf16LeStringLiteral("foo");

Relevant


macOS dropped Unicode normalization with APFS. Good riddance; it made dealing with CJK filenames very annoying.
