There are some things in Zig that continue to elude my understanding. Here’s a simple problem: define two strings/char arrays, like “hello” and “world”, and a third one with 3 exclamation marks; concatenate them into an array, then change the first character of the latter to a capital letter, literally, by assigning ‘H’ to the zeroth element.
The documentation, in test_arrays.zig, has something very similar, up to the point of concatenation, e.g.,
All that’s missing is the 3 bangs, but that’s not essential to the problem. The issue is that all of that happens, by definition of the ++ operator, at comptime, and although the data types are all inferred, they are constant arrays of null-terminated u8’s, so there appears to be no way to tell the compiler that you want hello_world to be variable (I haven’t tried it; perhaps a @constCast may let you sneak one by, but most likely not reliably).
So the only option seems to be to declare another variable, as an undefined var array of u8 with a predefined length of, presumably, hello.len + 1 + world.len + bangs.len, and then using @memcpy to copy the individual words at the right positions. Is there a better way?
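For reference, the manual approach described above could look roughly like this. It is only a sketch: `total_len` and `buildGreeting` are names made up for illustration, and it assumes a Zig version in which `@memcpy` takes a destination and a source of equal length.

```zig
const std = @import("std");

const hello = "hello";
const world = "world";
const bangs = "!!!";

// Total length: both words, one separating space, and the exclamation marks.
const total_len = hello.len + 1 + world.len + bangs.len;

/// Builds "hello world!!!" in a mutable array, then capitalizes the first byte.
/// (Hypothetical helper, not part of the standard library.)
fn buildGreeting() [total_len]u8 {
    var buf: [total_len]u8 = undefined;
    @memcpy(buf[0..hello.len], hello);
    buf[hello.len] = ' ';
    @memcpy(buf[hello.len + 1 ..][0..world.len], world);
    @memcpy(buf[hello.len + 1 + world.len ..][0..bangs.len], bangs);
    buf[0] = 'H'; // fine here: 'h' and 'H' are both single ASCII bytes
    return buf;
}

pub fn main() void {
    const greeting = buildGreeting();
    std.debug.print("{s}\n", .{&greeting});
}
```

All offsets are comptime-known here, so each destination slice has an exact length that `@memcpy` can check against its source.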
Perhaps my example appears contrived and in a sense it is, but I’m pretty sure one can find similar situations, e.g., an application that has to construct a query from data entered on a screen.
If I may add, still on the issue of strings, is there anything comparable to the C++ std::string find() method, or does one have to rely on a third-party library for that?
You mostly have it right.
A function I used to use a lot is std.fmt.bufPrint(), since it makes it brain-dead easy to write multiple values into a string in a sensible manner.
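A minimal sketch of that approach, assuming a fixed 64-byte stack buffer is large enough (the buffer size and the `greet` helper are illustrative choices, not anything from the standard library):

```zig
const std = @import("std");

/// Formats "<first> <second>!!!" into the caller's buffer and returns the
/// slice of `buf` that was actually written. Fails if `buf` is too small.
fn greet(buf: []u8, first: []const u8, second: []const u8) ![]u8 {
    return std.fmt.bufPrint(buf, "{s} {s}!!!", .{ first, second });
}

pub fn main() !void {
    var buf: [64]u8 = undefined;
    const msg = try greet(&buf, "hello", "world");
    msg[0] = 'H'; // the result points into `buf`, so it is mutable
    std.debug.print("{s}\n", .{msg});
}
```

Because the returned slice aliases the buffer, no allocator is involved, and the inputs do not need to be comptime-known.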
As for string-matching, there are a lot of good functions in std.mem:
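For instance, here is a small sketch using a few of the search helpers. It assumes a Zig release in which they are still spelled `indexOf`, `indexOfScalar`, `startsWith` and `endsWith`; development versions may use different names. `findWord` is a hypothetical wrapper added only to give the example a function to call.

```zig
const std = @import("std");

/// Returns the byte offset of `needle` in `haystack`, or null if absent.
/// Roughly comparable to std::string::find() in C++.
fn findWord(haystack: []const u8, needle: []const u8) ?usize {
    return std.mem.indexOf(u8, haystack, needle);
}

pub fn main() void {
    const text = "hello world!!!";

    if (findWord(text, "world")) |pos| {
        std.debug.print("'world' starts at byte {d}\n", .{pos});
    }
    if (std.mem.indexOfScalar(u8, text, '!')) |pos| {
        std.debug.print("first '!' is at byte {d}\n", .{pos});
    }
    if (std.mem.startsWith(u8, text, "hello") and std.mem.endsWith(u8, text, "!!!")) {
        std.debug.print("prefix and suffix both match\n", .{});
    }
}
```

The search functions return an optional index rather than a sentinel such as npos, so the "not found" case has to be handled explicitly.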
In general, this is an invalid string operation. One cannot simply mutate a string such that the first letter becomes capitalized. Show me a function that supposedly does this in any language and I’ll show you a string that behaves incorrectly.
Note that word is allocated on the stack, and would not perform any memory allocations. I cannot see how that could behave incorrectly, since the same thing could have been done by assigning “Hello”. Also, if you really want bounds checking, you could replace the second line with word.at(0) = ‘H’.
But perhaps you literally mean show me a function rather than some code that could exist in main. For that I guess I could argue that the std::string replace() methods could be used.
The “hard” part was allocating the right-size array and copying the correct characters in sequence (and it would’ve been harder if the string weren’t literals). Mutating the first character was trivial.
but, also, if we’re insisting on mutating buffers of bytes which we are treating as strings, may i introduce you to your new friends std.ArrayList(u8) and the useful and generic functions in std.mem
Yes, that’s what I was thinking later too, but unless the mutated first character occupies more or fewer bytes then it shouldn’t be a problem, even if using an auxiliary function equivalent to C’s toupper. A c converted to C should just be one byte being replaced by another. If you were to replace it by a Ç, then you would be in trouble.
The mutation of a character was simply an extra complication, but not the main issue. The issue is primarily the ease of use in concatenating strings in C++ vs. Zig, finding strings within strings (@tholmes mentioned findScalar above, but it seems that’s only in master ATM), erasing substrings, and other stuff like that. As far as my “new friends” I haven’t yet taken a look at ArrayList or much of the std library yet.
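To make the one-byte case concrete, here is a sketch that only touches the first byte when it is plain ASCII and otherwise leaves the buffer alone. `capitalizeFirstAscii` is a hypothetical helper; `std.ascii.toUpper` is the standard library's byte-wise counterpart to C's toupper.

```zig
const std = @import("std");

/// Uppercases the first byte in place, but only when it is plain ASCII.
/// Returns false and leaves the buffer untouched for an empty buffer or
/// a leading byte that belongs to a multi-byte UTF-8 sequence.
fn capitalizeFirstAscii(s: []u8) bool {
    if (s.len == 0 or s[0] >= 0x80) return false;
    s[0] = std.ascii.toUpper(s[0]);
    return true;
}

pub fn main() void {
    var plain = "cat".*; // mutable copy of the literal's bytes
    var accented = "ça va".*; // 'ç' is two bytes in UTF-8
    _ = capitalizeFirstAscii(&plain);
    _ = capitalizeFirstAscii(&accented);
    std.debug.print("{s} / {s}\n", .{ &plain, &accented });
}
```

This deliberately does nothing for `ç`: producing `Ç` would mean rewriting a multi-byte sequence, which is exactly the case where in-place byte assignment stops being safe.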
```zig
const hello = "hello";
const world = "world";
var hello_world = (hello ++ " " ++ world).*;
hello_world[0] = 'H';
```
String literals are typed as pointers to const byte arrays so you can just dereference them as arrays. This works for comptime-known strings of course (or at least strings of comptime-known length). For other cases you would either need a fixed size buffer and memcpy into it or use `std.mem.concat`.
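For the runtime case, a sketch with `std.mem.concat` might look like the following. It assumes `std.heap.page_allocator` is acceptable for a demo; `greet` is a hypothetical helper name.

```zig
const std = @import("std");

/// Joins two runtime strings with a space and appends "!!!".
/// The caller owns the returned slice and must free it.
fn greet(allocator: std.mem.Allocator, first: []const u8, second: []const u8) ![]u8 {
    return std.mem.concat(allocator, u8, &[_][]const u8{ first, " ", second, "!!!" });
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const msg = try greet(allocator, "hello", "world");
    defer allocator.free(msg);

    msg[0] = 'H'; // freshly allocated memory, so mutation is allowed
    std.debug.print("{s}\n", .{msg});
}
```

Unlike `++`, this works when the pieces are only known at runtime, at the cost of an allocation that has to be freed.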
Edit: to be clear, doing this on arbitrary strings can easily mangle UTF8 as others have pointed out but if these are all strings you control (e.g. to construct formatting strings) this is fine.
Yes, but I already pointed out that if we’re mutating a character that “occupies more or fewer bytes” then you could be in trouble. Your example is sneaky because what looks like a capital ‘b’ is actually the German eszett, which just happens to look like it. But what we’re talking about then is mutating UTF-8 strings in general. If I were to code that in C or C++ using wide-strings and called towupper(), the assignment would be fine (and would not give me an eszett).
It’s almost always a mistake to try to mutate within a string; neither UTF-8 nor UTF-16 properly supports this. Even if it did, the special case where the mutation happens to be exactly the size of the replaced part is just that, a weird special case.
Ergo, string building should not be approached this way. The question is about “modifying strings”, an extremely complex topic once you get into it, but right at the baseline: don’t do it by copying the entire string and changing individual bytes. It just doesn’t cover enough of the domain to be useful.
I was surprised to learn you can just “dereference away the const” like that. Is there an explanation in the docs for how this works?
It does make sense that it could work for comptime known values, but when reading code I think me and many others would think “surely, I can’t modify a string living in readonly memory” - which is true at runtime, but at compile-time it seems Zig gives you additional abilities.
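One way to look at it, as a sketch: the `.*` does not strip const from the literal. It reads the array value through the pointer, and that value then initializes a new variable with its own storage. The type checks below spell this out; `capitalizedCopy` is a hypothetical helper.

```zig
const std = @import("std");

const hello = "hello";

comptime {
    // A string literal is a pointer to a constant, null-terminated array.
    std.debug.assert(@TypeOf(hello) == *const [5:0]u8);
    // Dereferencing yields the array value itself, no longer a pointer.
    std.debug.assert(@TypeOf(hello.*) == [5:0]u8);
}

/// Copies the literal's bytes into a local array and capitalizes the copy.
fn capitalizedCopy() [hello.len:0]u8 {
    var copy = hello.*; // a by-value copy with its own, mutable storage
    copy[0] = 'H';
    return copy;
}

pub fn main() void {
    const copy = capitalizedCopy();
    // The literal is untouched; only the copy changed.
    std.debug.print("{s} {s}\n", .{ hello, &copy });
}
```

So nothing in read-only memory is ever written to: the mutation lands on the copy, much like initializing a `char word[] = "hello";` array in C.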