> Think of it as a view into a sequence of bytes encoded in UTF-8.
I don’t think this is a particularly good/useful framing. The way I’ve framed it before is that “strings” in Zig are arbitrary sequences of bytes, and Zig makes it convenient to create UTF-8 encoded string literals.
Personally, I’d actually put even more emphasis on Zig not having strings at all (e.g. `[]u8` and `[]const u8` are just slice types like any other, so unless proven otherwise they’re just a slice of arbitrary bytes).
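To make that concrete, here’s a quick sketch of the “just bytes” framing (assuming a recent Zig; the exact format-specifier behavior of `std.debug.print` has shifted slightly between releases):

```zig
const std = @import("std");

pub fn main() void {
    // A string literal is a pointer to a constant, null-terminated byte array
    // (*const [6:0]u8 here), which happens to be UTF-8 encoded.
    const lit = "héllo";
    // It coerces to []const u8 — a plain slice of bytes, nothing string-specific.
    const s: []const u8 = lit;
    std.debug.print("{s} is {d} bytes, not 5 \"characters\"\n", .{ s, s.len });
    // Nothing about the type enforces UTF-8; arbitrary bytes fit just as well.
    const raw = [_]u8{ 0xff, 0x00, 0x41 };
    std.debug.print("valid UTF-8? {}\n", .{std.unicode.utf8ValidateSlice(&raw)});
}
```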
```zig
const a: []const u8 = "hello";
const b = a;
// a and b point to the same data
std.debug.print("a: {}, b: {}\n", .{ &a, &b });
```
This example isn’t showing what you intend. You’re printing the addresses of the slices themselves, not the pointers contained in the slices. As it stands, it only prints equal values because everything is comptime-known; if you changed either `a` or `b` to be runtime-known (`const` → `var`), then this would print different values for `&a` and `&b`. You actually want `a.ptr` and `b.ptr`:
```zig
const a: []const u8 = "hello";
var b = a;
_ = &b; // force b to be runtime-known for demonstration purposes
// a and b point to the same data
std.debug.print("a: {*}, b: {*}\n", .{ a.ptr, b.ptr });
```
> This is different from languages like JavaScript, where strings are immutable sequences of UTF-16 characters. In Zig, you’re much closer to the metal, working directly with bytes.
It might be interesting to note the similarities/differences of working with UTF-16 and UTF-8 in Zig:

- Similar in that you just use a slice type (`[]u16`/`[]const u16` for UTF-16, `[]u8`/`[]const u8` for UTF-8)
- Different in that it’s not convenient to write UTF-16 encoded string literals directly, but there’s a helper, `std.unicode.utf8ToUtf16LeStringLiteral`, that makes it more convenient (and `wtf8ToWtf16LeStringLiteral` if you want to explore that rabbit hole, which is relevant to JavaScript as well, as far as I understand); see the sketch after this list