Because strings are complicated.
That zig-string library you linked, for example, would mishandle comparison:
```zig
var myString = String.init(allocator);
defer myString.deinit();
try myString.concat("Ç"); // NFD: 'C' (U+0043) + combining cedilla (U+0327)
assert(myString.cmp("Ç")); // NFC: precomposed 'Ç' (U+00C7) — this fails
```
This assertion would fail, even though the strings appear to be identical. That's because the first uses Normalization Form D: C (U+0043) + ◌̧ (U+0327), while the second uses Normalization Form C: Ç (U+00C7). To actually compare UTF-8 strings in ways a human might expect, decisions about normalization need to be made.
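To see the same pitfall in a language whose standard library does ship Unicode tables (a sketch in Python here, purely for illustration, since Zig's standard library deliberately has none):

```python
import unicodedata

nfd = "C\u0327"  # 'C' + combining cedilla (Normalization Form D)
nfc = "\u00C7"   # precomposed 'Ç' (Normalization Form C)

# A byte-wise comparison says the strings differ,
# even though both render as "Ç".
assert nfd != nfc

# Comparing after normalizing to a common form gives
# the answer a human would expect.
assert unicodedata.normalize("NFC", nfd) == nfc
assert unicodedata.normalize("NFD", nfc) == nfd
```

Doing this correctly requires the Unicode normalization data, which is exactly what a "proper" String type would have to carry around.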
The above is just one example. This series of articles by @dude_the_builder details the complications of Unicode well:
- Unicode Basics in Zig - Zig NEWS
- Ziglyph Unicode Wrangling - Zig NEWS
- Unicode String Operations - Zig NEWS
(note that ziglyph has now been superseded by zg)
So, for Zig to have a 'proper' UTF-8 String implementation, it would need to embed the Unicode data and handle all of Unicode's complications. My understanding is that's not something Zig-the-language or Zig-the-standard-library wants to take on if it doesn't have to (especially since the Unicode data is a moving target).
Additionally, a UTF-8 String type is unable to hold arbitrary data, meaning it could not be used for many of the things Zig cares about: file paths, environment variables, etc. See the pull request "Fix handling of Windows (WTF-16) and WASI (UTF-8) paths, etc" by squeek502 (ziglang/zig #19005) for more details on that sort of thing.
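The "arbitrary data" problem is easy to demonstrate: on POSIX systems a file name is just a sequence of bytes, and those bytes need not be valid UTF-8. A quick sketch (again in Python, for illustration only):

```python
# File names on POSIX systems are arbitrary bytes, not guaranteed UTF-8.
raw_name = b"report-\xff.txt"  # 0xFF can never appear in valid UTF-8

try:
    raw_name.decode("utf-8")
    print("decoded fine")
except UnicodeDecodeError:
    # A strict UTF-8 String type simply cannot represent this path.
    print("not valid UTF-8")
```

A `[]u8` slice, by contrast, represents such a path without any trouble, which is one reason Zig sticks with plain byte slices.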