Dynamic string done right

maysara-elshewehy · January 6, 2025, 10:41am

Hello everyone,
Thank you so much for your valuable comments and the time you’ve taken to respond to this topic.

I deeply appreciate all opinions, respect them, and believe they can greatly help me improve my work.

I must admit that I may have overstated my work’s current state. However, if you read the post, you’ll notice that I am primarily seeking advice and constructive criticism because I am fully aware that my code (as it stands) is not perfect. Therefore, there’s no need to emphasize this point—it’s already the purpose of the post.

Regarding the term Super-ZIG, there seems to be some confusion, as I haven’t mentioned it in this post at all.

As for the lack of functions, I had many functions in earlier iterations of the code. However, after receiving substantial criticism, I decided to rewrite the code entirely from scratch. At this stage, I chose to focus on the core functionality to ensure a solid foundation, rather than spending time rewriting functions. Rest assured, I will reintroduce those functions later after making sure I’ve implemented the core correctly.

Concerning the suggestion to use the std.ArrayList interface by @pierrelgol, I admit I was initially resistant due to my lack of understanding. Now, I see things clearly, and I completely agree with your suggestion—thank you for pointing it out.

I will now work on rebuilding the code to make it better and more efficient. Thank you all once again for your time and constructive feedback.

Calder-Ty · January 6, 2025, 11:28am

There is plenty of room between constructive criticism and personal attacks. There’s no need for the latter.

maysara-elshewehy · January 6, 2025, 12:19pm

Thank you for pointing this out; it helped me a lot in understanding how to optimize my implementation.

Regarding the rest of your feedback, I’ve shared my thoughts in my earlier comments, and I’ll continue working on improving the library step by step.

I appreciate your time and input. Constructive criticism like this is invaluable as I learn and refine my approach.

maysara-elshewehy · February 8, 2025, 11:43pm

ForeverZer0 · February 9, 2025, 12:16am

This doesn’t exactly look like organic growth.

I fully support your endeavor to learn Zig and contribute to the community, but am turned off by the “shortcuts” (for lack of a better word) being taken. I will be charitable and assume that issues such as the comically misleading/inaccurate benchmark results are simply a mistake, and not false marketing, but along with other evidence (such as the stars, previous posts, “done right”, etc.), it is beginning to strain credulity.

maysara-elshewehy · February 9, 2025, 12:28am

I appreciate your concern and your support for Zig contributions. However, I’d like to clarify a few points:

Regarding the star growth: GitHub stars are a public metric, and anyone can verify the timeline. The growth pattern reflects organic interest, likely influenced by community engagement, discussions, or external mentions. If you have any concrete evidence suggesting otherwise, I’d be happy to address it.
Regarding the benchmarks: If you believe there are inaccuracies, I welcome specific feedback. I’m always open to refining tests to ensure they are as fair and transparent as possible. Instead of general skepticism, I’d appreciate concrete examples where results seem misleading so they can be improved.
Regarding naming (‘done right’): This is a common way to express confidence in an approach, not a claim of absolute superiority. If the name causes misunderstanding, I’m open to discussing it, but the focus remains on delivering a solid library.

I value constructive criticism, and I’m happy to engage in a productive discussion based on factual evidence. Let me know how we can move forward in a meaningful way.

Edit:
Regarding “but am turned off by the ‘shortcuts’ (for lack of a better word) being taken”, someone else asked me the same question, and I already addressed it. You can check the discussion here.

ForeverZer0 · February 9, 2025, 12:56am

I truly do not want to engage in some arbitrary back and forth on what I perceive to be quite apparent to even a casual observer, nor stir up any negativity. This is not a legal venue; I am not wasting either of our time litigating the obvious.

I stated and stand by my concerns, and I wish your project success.

maysara-elshewehy · February 9, 2025, 1:01am

I appreciate your well wishes.

However, constructive criticism is always more valuable when backed by specifics rather than general perceptions.

If something seems ‘apparent,’ it should be easy to demonstrate with concrete evidence.

That said, I respect your perspective and will continue focusing on improving the project based on actionable feedback.

Best of luck with your own endeavors as well!

TUSF · February 9, 2025, 1:42am

Why are your strings null-terminated? And why are you storing the length of the string again?

The whole point of having []const u8 is that it comes with a length already. If you want to have those be separate fields in your struct, then instead of []const u8, you would use [*]u8 (or [*]const u8) and your Viewer.initWithSlice (for example) would store initial_bytes.ptr and initial_bytes.len in those fields.
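A minimal sketch of the layout described above, assuming a hypothetical `Viewer` type with illustrative field names `ptr` and `len` (these are not the library's actual fields): the many-item pointer carries no length, so the length is stored exactly once.

```zig
const std = @import("std");

// Hypothetical viewer type; `ptr` and `len` are illustrative names.
const Viewer = struct {
    ptr: [*]const u8, // many-item pointer: has no length of its own
    len: usize, // the length lives here, and only here

    pub fn initWithSlice(initial_bytes: []const u8) Viewer {
        return .{ .ptr = initial_bytes.ptr, .len = initial_bytes.len };
    }

    // Rebuild an ordinary slice on demand from the two fields.
    pub fn slice(self: Viewer) []const u8 {
        return self.ptr[0..self.len];
    }
};

pub fn main() void {
    const v = Viewer.initWithSlice("hello");
    std.debug.print("{s} ({d} bytes)\n", .{ v.slice(), v.len });
}
```

Keeping a `[]const u8` plus a separate length field would instead store two lengths that can drift apart.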

maysara-elshewehy · February 9, 2025, 1:50am

Good question! However, this isn’t entirely accurate.

Edit:
It looks like I had already addressed this issue earlier. The current code does not null-terminate the string, even for fixed-size buffers. Could you provide a specific example where you see this happening?

Here’s how the current implementation works:

pub fn initWithSlice(slice: []const u8) Self {
    return Self{
        .m_src = makeArrayAndFillWithSlice(slice),
        .m_len = utils.bytes.countWritten(slice),
    };
}

Since self.m_len represents the length of the written bytes, there is no need for null termination.

Edit #2:
Additional point,

self.m_len and self.len() return the length of the written bytes.

self.m_src.len and self.size() return the capacity (how many bytes can be written to this bytes-array?)

TUSF · February 9, 2025, 4:23am

That countWritten function checks for null bytes in the string, and sets the length there. Does your library basically assume strings can’t contain null bytes?
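To make the concern concrete, here is a sketch using a hypothetical stand-in for `countWritten` that stops at the first zero byte, as described above. The library's real implementation may differ; the point is only that such a scan and the slice's own length disagree as soon as the data contains a NUL.

```zig
const std = @import("std");

// Hypothetical stand-in for the library's countWritten; assumed to stop
// at the first zero byte. The real implementation may differ.
fn countWritten(bytes: []const u8) usize {
    return std.mem.indexOfScalar(u8, bytes, 0) orelse bytes.len;
}

pub fn main() void {
    // A valid 5-byte string that happens to contain a NUL in the middle.
    const text: []const u8 = "ab\x00cd";
    std.debug.print("slice.len = {d}, countWritten = {d}\n", .{
        text.len,
        countWritten(text),
    });
}
```

Here the slice reports 5 bytes while the scan reports 2, so everything after the embedded NUL would be treated as unwritten.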

Gotcha. Valid. I would maybe store the capacity as a separate field to make that clear, but that’s just me.

maysara-elshewehy · February 9, 2025, 4:33am

We don’t need to store the capacity separately; it would just be unnecessary overhead. You can always use the inline functions len() and size() directly.

vulpesx · February 9, 2025, 4:49am

I would prefer if you followed the convention of std, std.ArrayList in particular, of storing a slice of used memory and the capacity as a second field, or at least used the same terminology, i.e. capacity instead of len where appropriate. That would make it almost a drop-in replacement for std.ArrayList, and would also remove the need for some of the helper functions.

I also don’t see the point of these new types. std.mem, std.unicode and std.ArrayList already implement most of the functionality, and the rest can be composed from them; you could have just made some wrapper functions.

I understand this was probably a learning project, but since you’re presenting this as a ‘real’ string library I felt I should mention it.

Also, it’s ambiguous to say it supports Unicode. There are many Unicode encodings, and unless I’m missing something it’s just UTF-8.

If you supported other encodings that would be more interesting; even then I would prefer it to be like std.unicode instead of string types, as it’s simply more versatile.

maysara-elshewehy · February 9, 2025, 5:26am

I appreciate you taking the time to review my work and share your thoughts. Constructive feedback is always valuable, and I understand the importance of aligning with std conventions where it makes sense. I’ll go through your points one by one to clarify my design choices and address any concerns.

1. Following `std` conventions (`std.ArrayList`)

I understand the preference for following std conventions, particularly how std.ArrayList manages memory with a capacity field. However, my goal was not to create a drop-in replacement for std.ArrayList, but rather a specialized string-handling library optimized for Unicode.

std.ArrayList is great for general-purpose dynamic arrays, but strings have unique requirements, especially regarding efficient Unicode manipulation.
That said, I’m open to adjusting terminology (e.g., using capacity instead of len where applicable) to improve compatibility if it provides real benefits.

2. Why create new types instead of using `std.mem`, `std.unicode`, and `std.ArrayList`?

I see the logic behind this question, but my reasoning is as follows:

While std.unicode and std.ArrayList provide useful utilities, they are not optimized for seamless text handling.
std.unicode offers limited Unicode support, primarily at the Codepoint level, but working with Grapheme Clusters (which are essential for proper text rendering and manipulation) requires extra effort.
My library eliminates this complexity, providing a simple and efficient API to handle Unicode text correctly without requiring developers to compose multiple std functions manually.

3. “Saying the library supports Unicode is ambiguous”

I see why you’d bring this up, but the statement isn’t entirely accurate. My library fully supports Unicode, not just UTF-8.

The key distinction is that std.unicode only supports Codepoints, while my library adds Grapheme Cluster support, which is crucial for correct string processing.
Working with std.unicode at the Codepoint level can lead to incorrect results when dealing with complex characters (e.g., emoji sequences or modifier characters).
My library simplifies this significantly.

Example:

const txt = "Aأ你🌟☹️👨‍🏭@";
var iterator = try unicode.Iterator.init(txt);
while (iterator.nextGraphemeCluster()) |grapheme_cluster| {
    std.debug.print("[{s}]\n", .{grapheme_cluster});
}

// Output:
// [A]
// [أ]
// [你]
// [🌟]
// [☹️]
// [👨‍🏭]
// [@]

With std.unicode, achieving the same result would require manual processing, making things far more complex.
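For comparison, a small sketch of what code point iteration alone gives you, using only `std.unicode` (no grapheme segmentation). The factory-worker emoji from the example above is written with escapes so its three code points are visible: MAN, ZERO WIDTH JOINER, FACTORY.

```zig
const std = @import("std");

pub fn main() !void {
    // One user-perceived character, built from three code points.
    const worker: []const u8 = "\u{1F468}\u{200D}\u{1F3ED}";

    const count = try std.unicode.utf8CountCodepoints(worker);
    std.debug.print("bytes: {d}, code points: {d}\n", .{ worker.len, count });

    // std.unicode walks code points; it does not group them into clusters.
    var it = (try std.unicode.Utf8View.init(worker)).iterator();
    while (it.nextCodepoint()) |cp| {
        std.debug.print("U+{X}\n", .{cp});
    }
}
```

The iterator yields three separate code points for what a reader sees as one character, which is the gap grapheme cluster segmentation is meant to close.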

4. “I would prefer if Unicode handling was like `std.unicode` rather than custom string types”

This depends on the intended goal. std.unicode provides basic Unicode utilities, but it does not offer the kind of structured text handling that my library does.

std.unicode focuses on individual Codepoints, but real-world text often requires Grapheme Cluster awareness (especially for emoji, accented characters, and complex scripts).
If I relied solely on std.unicode, developers would still need to manually handle Grapheme Clusters, whereas my library provides this functionality out of the box.

Conclusion:

I did not reinvent the wheel—I improved Unicode handling in a practical way that std.unicode lacks.
My library provides direct support for Grapheme Clusters, making text processing easier and more accurate.
While std.ArrayList is great, my library is designed specifically for efficient string handling.
I’m open to aligning some terminology with std conventions where it improves compatibility.

Again, I appreciate your feedback! If you have further thoughts, I’m happy to discuss.

vulpesx · February 9, 2025, 5:47am

I overlooked the grapheme support, that is good. But it still only supports one Unicode encoding, I would not describe that as full Unicode support. Again, supporting more encodings would be a selling point.

I wasn’t talking about using code points when saying I prefer std.unicode. I was talking about the container agnostic nature of its API.

I would prefer functions that operate on a slice, instead of being forced to use a container type that provides no other benefit.
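As a rough illustration of the container-agnostic style being asked for, the sketch below uses a hypothetical free function `trimSpaces` (not part of the library under discussion) that accepts any byte slice and owns nothing, so it works the same on a literal and on a stack buffer.

```zig
const std = @import("std");

// Hypothetical free function: takes any byte slice and returns a sub-slice.
// No wrapper type or allocator is involved.
fn trimSpaces(bytes: []const u8) []const u8 {
    return std.mem.trim(u8, bytes, " \t\r\n");
}

pub fn main() void {
    // Works on a string literal...
    std.debug.print("[{s}]\n", .{trimSpaces("  literal  ")});

    // ...and on a plain array on the stack.
    const buf = [_]u8{ ' ', 'z', 'i', 'g', ' ' };
    std.debug.print("[{s}]\n", .{trimSpaces(&buf)});
}
```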

maysara-elshewehy · February 9, 2025, 5:57am

I appreciate the discussion and your insights!

Regarding container-agnostic APIs:
My library already provides functions that operate directly on slices via utils.bytes and utils.unicode. Containers exist only to offer additional safety and memory management, but they are not required for working with strings.
Regarding Unicode support:
The library fully supports UTF-8, which is the dominant encoding today and the default in Zig. If you believe that supporting other encodings like UTF-16 or UTF-32 is necessary, could you share specific use cases where that would be useful?

My goal is to provide a safe, efficient, and unified way to handle text—assuming text is 8-bit byte sequences allows for a clean and robust implementation.

If you have concrete examples of missing functionality, I’d love to hear them and see if they fit within the scope of this library.

Thanks again for your feedback!

const-void · February 9, 2025, 6:44am

The library fully supports UTF-8, which is the dominant encoding today and the default in Zig. If you believe that supporting other encodings like UTF-16 or UTF-32 is necessary, could you share specific use cases where that would be useful?

WinAPI WriteConsoleW requires UTF-16 strings…sadly for us all.
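For reference, the standard library can already do this conversion. The sketch below uses `std.unicode.utf8ToUtf16LeAlloc` to turn UTF-8 into UTF-16LE code units; the wrapper name `toUtf16Le` is only illustrative, and the actual `WriteConsoleW` call is left out so the example runs on any platform.

```zig
const std = @import("std");

// Illustrative wrapper around the std conversion; the caller frees the result.
fn toUtf16Le(allocator: std.mem.Allocator, utf8: []const u8) ![]u16 {
    return std.unicode.utf8ToUtf16LeAlloc(allocator, utf8);
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // 'A' fits in one UTF-16 code unit; U+1F31F needs a surrogate pair.
    const utf16 = try toUtf16Le(allocator, "A\u{1F31F}");
    defer allocator.free(utf16);

    std.debug.print("{d} UTF-16 code units\n", .{utf16.len});
}
```

The resulting slice's pointer and length are the kind of buffer and character count a wide-character Windows API expects.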

Dok8tavo · February 9, 2025, 9:58am

The m_ prefix convention is an artifact of the Hungarian notation in C that C++ and Microsoft have carried to this day. This is terrible in my opinion.

In Zig, all fields are public by design. If you want to tweak the internal representation of an instance bit by bit, you always can and it’s valid.

By storing the capacity in the other field and the length in the slice, you allow Zig to add bounds checking on what would be uninitialized memory or garbage. Following your approach, accessing uninitialized bytes would still technically result in unwanted behavior (in any build mode), but following the std approach will result in safety-checked illegal behavior.

I would strongly prefer .bytes and .capacity fields.
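A minimal sketch of that layout, assuming a hypothetical `Buffer` type over caller-provided storage (the names `Buffer`, `init`, and `append` are illustrative): `bytes` covers only the written part, so indexing past it trips a bounds check in safe build modes rather than reading the undefined tail.

```zig
const std = @import("std");

// Hypothetical type following the std.ArrayList idea: the slice is the used
// part, the capacity is a separate field.
const Buffer = struct {
    bytes: []u8,
    capacity: usize,

    pub fn init(backing: []u8) Buffer {
        return .{ .bytes = backing[0..0], .capacity = backing.len };
    }

    pub fn append(self: *Buffer, extra: []const u8) error{NoSpaceLeft}!void {
        const old_len = self.bytes.len;
        if (extra.len > self.capacity - old_len) return error.NoSpaceLeft;
        // Extend the slice over the same backing memory, then copy in place.
        self.bytes.len = old_len + extra.len;
        @memcpy(self.bytes[old_len..], extra);
    }
};

pub fn main() !void {
    var storage: [16]u8 = undefined;
    var buf = Buffer.init(&storage);
    try buf.append("zig");
    std.debug.print("{s} len={d} capacity={d}\n", .{ buf.bytes, buf.bytes.len, buf.capacity });
    // Indexing buf.bytes[3] here would be caught by a bounds check in safe
    // build modes, because the slice ends at the written length.
}
```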

tgirod · February 9, 2025, 12:33pm

Hey, out of curiosity: do you use an LLM in any way when you post on this forum?

maysara-elshewehy · February 9, 2025, 2:00pm

Yes, after writing some of my responses, I occasionally ask an AI for improvements. In the end, the goal is to provide more precise and clear answers.

Do you see any issue with using available tools to enhance discussions?
