Because strings are complicated.
That zig-string library you linked, for example, would mishandle comparison:
```zig
var myString = String.init(allocator);
defer myString.deinit();
try myString.concat("Ç"); // NFD: 'C' (U+0043) + combining cedilla (U+0327)
assert(myString.cmp("Ç")); // NFC: precomposed 'Ç' (U+00C7) — this fails
```
This assertion would fail, even though the strings appear to be identical. That's because the first uses Normalization Form D: C (U+0043) + ◌̧ (U+0327), while the second uses Normalization Form C: Ç (U+00C7). To actually compare UTF-8 strings in ways a human might expect, decisions about normalization need to be made.
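To see the same pitfall in a language whose standard library does ship Unicode tables (a sketch in Python here, purely for illustration, since Zig's standard library deliberately has none):

```python
import unicodedata

nfd = "C\u0327"  # 'C' + combining cedilla (Normalization Form D)
nfc = "\u00C7"   # precomposed 'Ç' (Normalization Form C)

# A byte-wise comparison says the strings differ,
# even though both render as "Ç".
assert nfd != nfc

# Comparing after normalizing to a common form gives
# the answer a human would expect.
assert unicodedata.normalize("NFC", nfd) == nfc
assert unicodedata.normalize("NFD", nfc) == nfd
```

Doing this correctly requires the Unicode normalization data, which is exactly what a "proper" String type would have to carry around.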
The above is just one example. This series of articles by @dude_the_builder details the complications of Unicode well:
- Unicode Basics in Zig - Zig NEWS
- Ziglyph Unicode Wrangling - Zig NEWS
- Unicode String Operations - Zig NEWS
(note that ziglyph has now been superseded by zg)
So, for Zig to have a 'proper' UTF-8 String implementation, it would need to embed the Unicode data and handle all of Unicode's complications. My understanding is that's not something Zig-the-language or Zig-the-standard-library wants to take on if it doesn't have to (especially since the Unicode data is a moving target).
Additionally, a UTF-8 String type is unable to hold arbitrary data, meaning it could not be used for many of the things Zig cares about: file paths, environment variables, etc. See the pull request "Fix handling of Windows (WTF-16) and WASI (UTF-8) paths, etc" by squeek502 (ziglang/zig #19005) for more details on that sort of thing.
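The "arbitrary data" problem is easy to demonstrate: on POSIX systems a file name is just a sequence of bytes, and those bytes need not be valid UTF-8. A quick sketch (again in Python, for illustration only):

```python
# File names on POSIX systems are arbitrary bytes, not guaranteed UTF-8.
raw_name = b"report-\xff.txt"  # 0xFF can never appear in valid UTF-8

try:
    raw_name.decode("utf-8")
    print("decoded fine")
except UnicodeDecodeError:
    # A strict UTF-8 String type simply cannot represent this path.
    print("not valid UTF-8")
```

A `[]u8` slice, by contrast, represents such a path without any trouble, which is one reason Zig sticks with plain byte slices.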