I appreciate you taking the time to review my work and share your thoughts. Constructive feedback is always valuable, and I understand the importance of aligning with std conventions where it makes sense. I’ll go through your points one by one to clarify my design choices and address any concerns.
1. Following std conventions (std.ArrayList)
I understand the preference for following std conventions, particularly how std.ArrayList manages memory with a capacity field. However, my goal was not to create a drop-in replacement for std.ArrayList, but rather a specialized string-handling library optimized for Unicode.
- std.ArrayList is great for general-purpose dynamic arrays, but strings have unique requirements, especially regarding efficient Unicode manipulation.
- That said, I’m open to adjusting terminology (e.g., using capacity instead of len where applicable) to improve compatibility if it provides real benefits.
2. Why create new types instead of using std.mem, std.unicode, and std.ArrayList?
I see the logic behind this question, but my reasoning is as follows:
- While std.unicode and std.ArrayList provide useful utilities, they are not optimized for seamless text handling.
- std.unicode offers limited Unicode support, primarily at the Codepoint level, but working with Grapheme Clusters (which are essential for proper text rendering and manipulation) requires extra effort.
- My library eliminates this complexity, providing a simple and efficient API to handle Unicode text correctly without requiring developers to compose multiple std functions manually.
3. “Saying the library supports Unicode is ambiguous”
I see why you’d bring this up, but the statement isn’t entirely accurate. My library fully supports Unicode, not just UTF-8.
- The key distinction is that std.unicode only supports Codepoints, while my library adds Grapheme Cluster support, which is crucial for correct string processing.
- Working with std.unicode at the Codepoint level can lead to incorrect results when dealing with complex characters (e.g., emoji sequences or modifier characters).
- My library simplifies this significantly.
Example:
const txt = "Aأ你🌟☹️👨🏭@";
var iterator = try unicode.Iterator.init(txt);
while (iterator.nextGraphemeCluster()) |grapheme_cluster| {
    std.debug.print("[{s}]\n", .{grapheme_cluster});
}
// Output:
// [A]
// [أ]
// [你]
// [🌟]
// [☹️]
// [👨🏭]
// [@]
With std.unicode, achieving the same result would require manual processing, making things far more complex.
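For comparison, here is a minimal sketch of what std.unicode reports on its own for two short strings that each display as a single character. It uses only std.unicode.Utf8View and its iterator; countCodepoints is a hypothetical helper written for this illustration, not part of std or of my library.

```zig
const std = @import("std");

/// Hypothetical helper: counts code points using only std.unicode.
fn countCodepoints(text: []const u8) !usize {
    // Utf8View.init validates the input and fails on invalid UTF-8.
    const view = try std.unicode.Utf8View.init(text);
    var it = view.iterator();
    var count: usize = 0;
    while (it.nextCodepoint()) |_| count += 1;
    return count;
}

pub fn main() !void {
    // U+2639 (frowning face) + U+FE0F (variation selector-16):
    // displayed as one character, stored as two code points.
    const frown = "\u{2639}\u{FE0F}";
    // 'e' + U+0301 (combining acute accent): also displayed as one character.
    const e_acute = "e\u{301}";

    std.debug.print("frown: {d} bytes, {d} code points\n", .{ frown.len, try countCodepoints(frown) });
    std.debug.print("e_acute: {d} bytes, {d} code points\n", .{ e_acute.len, try countCodepoints(e_acute) });
}

// Prints (to stderr):
// frown: 6 bytes, 2 code points
// e_acute: 3 bytes, 2 code points
```

Neither the byte length nor the code point count matches the single character a reader sees. Grouping those code points into one unit requires the segmentation rules from the Unicode standard, which is the step that has to be written by hand when working with std.unicode alone.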
4. “I would prefer if Unicode handling was like std.unicode rather than custom string types”
This depends on the intended goal. std.unicode provides basic Unicode utilities, but it does not offer the kind of structured text handling that my library does.
- std.unicode focuses on individual Codepoints, but real-world text often requires Grapheme Cluster awareness (especially for emoji, accented characters, and complex scripts).
- If I relied solely on std.unicode, developers would still need to manually handle Grapheme Clusters, whereas my library provides this functionality out of the box.
Conclusion:
I did not reinvent the wheel. I added practical Unicode handling that std.unicode lacks.
My library provides direct support for Grapheme Clusters, making text processing easier and more accurate.
While std.ArrayList is great, my library is designed specifically for efficient string handling.
I’m open to aligning some terminology with std conventions where it improves compatibility.
Again, I appreciate your feedback! If you have further thoughts, I’m happy to discuss. 