Notes and reference about working with strings

Think of it as a view into a sequence of bytes encoded in UTF-8.

I don’t think this is a particularly good/useful framing. The way I’ve framed it before is that “strings” in Zig are arbitrary sequences of bytes, and Zig makes it convenient to create UTF-8 encoded string literals.

Personally, I’d actually put even more emphasis on Zig not having strings at all (e.g. []u8 and []const u8 are just slice types like any other, so unless proven otherwise they’re just a slice of arbitrary bytes).
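To make that concrete, here is a minimal sketch of the "just bytes" point. The helper name isValidUtf8 is made up for illustration; it only wraps std.unicode.utf8ValidateSlice, and the exact std API may differ between Zig versions.

```zig
const std = @import("std");

// Hypothetical helper for illustration: reports whether a byte slice
// happens to contain valid UTF-8.
fn isValidUtf8(bytes: []const u8) bool {
    return std.unicode.utf8ValidateSlice(bytes);
}

pub fn main() void {
    // A string literal coerces to an ordinary slice of bytes.
    const literal: []const u8 = "héllo";
    // Nothing stops a []const u8 from holding bytes that are not UTF-8.
    const arbitrary: []const u8 = &[_]u8{ 0xff, 0xfe, 0x00 };

    // .len counts bytes, not characters: 'é' takes two bytes in UTF-8.
    std.debug.print("literal.len = {d}\n", .{literal.len});
    std.debug.print("literal valid UTF-8: {}\n", .{isValidUtf8(literal)});
    std.debug.print("arbitrary valid UTF-8: {}\n", .{isValidUtf8(arbitrary)});
}
```

Both values have exactly the same type; whether the contents are UTF-8 is something you have to check (or know) yourself.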

const a: []const u8 = "hello";
const b = a;
// a and b point to the same data
std.debug.print("a: {}, b: {}\n", .{ &a, &b });

This example doesn’t show what you intend. You’re printing the addresses of the slices themselves, not the pointers contained in the slices. As it stands, it only prints equal values because everything is comptime-known; if you made either a or b runtime-known (by declaring it with var instead of const), it would print different values for &a and &b. You actually want a.ptr and b.ptr:

const a: []const u8 = "hello";
var b = a;
_ = &b; // force b to be runtime-known for demonstration purposes
// a and b point to the same data
std.debug.print("a: {*}, b: {*}\n", .{ a.ptr, b.ptr });

This is different from languages like JavaScript, where strings are immutable sequences of UTF-16 characters. In Zig, you’re much closer to the metal, working directly with bytes.

It might be interesting to note the similarities/differences of working with UTF-16 and UTF-8 in Zig:

  • Similar in that you just use a slice type ([]u16/[]const u16 for UTF-16, []u8/[]const u8 for UTF-8)
  • Different in that it’s not convenient to make UTF-16 encoded string literals, but there’s a helper in std.unicode.utf8ToUtf16LeStringLiteral that makes it more convenient (and wtf8ToWtf16LeStringLiteral if you want to explore that rabbit hole, which is relevant to JavaScript as well as far as I understand)
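A rough sketch of the comparison, assuming a Zig version where std.unicode.utf8ToUtf16LeStringLiteral takes a comptime UTF-8 string and returns a pointer to a null-terminated u16 array (the variable names are just for illustration):

```zig
const std = @import("std");

pub fn main() void {
    // UTF-8 literal: the usual case, no helper needed.
    const narrow: []const u8 = "héllo";
    // UTF-16LE literal, converted from the UTF-8 source text at compile time.
    const wide: []const u16 = std.unicode.utf8ToUtf16LeStringLiteral("héllo");

    // 6 bytes in UTF-8 ('é' takes two bytes), 5 code units in UTF-16.
    std.debug.print("utf8 len: {d}, utf16 len: {d}\n", .{ narrow.len, wide.len });
    // Code units are stored little-endian, so convert before inspecting them.
    std.debug.print("first unit: 0x{x}\n", .{std.mem.littleToNative(u16, wide[0])});
}
```

In both cases you end up with a plain slice; only the element type and the encoding of the contents differ.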