Modifying strings

jmafc · March 3, 2026, 4:18am

There are some things in Zig that continue to elude my understanding. Here’s a simple problem: define two strings/char arrays, like “hello” and “world”, and a third one with 3 exclamation marks; concatenate them into an array, then change the first character of the latter to a capital letter, literally, by assigning ‘H’ to the zeroth element.

The documentation, in test_arrays.zig has something very similar, up to the point of concatenation, e.g.,

const hello = "hello";
const world = "world";
const hello_world = hello ++ " " ++ world;

All that’s missing is the 3 bangs, but that’s not essential to the problem. The issue is that all of that happens, by definition of the ++ operator, at comptime, and although the data types are all inferred, they are constant arrays of null-terminated u8’s, so there appears to be no way to tell the compiler that you want hello_world to be variable (I haven’t tried it; perhaps a @constCast may let you sneak one by, but most likely not reliably).

So the only option seems to be to declare another variable, as an undefined var array of u8 with a predefined length of, presumably, hello.len + 1 + world.len + bangs.len, and then use @memcpy to copy the individual words into the right positions. Is there a better way?

Perhaps my example appears contrived and in a sense it is, but I’m pretty sure one can find similar situations, e.g., an application that has to construct a query from data entered on a screen.

jmafc · March 3, 2026, 4:41am

If I may add, still on the issue of strings, is there anything comparable to the C++ std::string find() method, or does one have to rely on a third-party library for that?

tholmes · March 3, 2026, 4:48am

You mostly have it right.
A function I used to use a lot is std.fmt.bufPrint(), since it makes it brain-dead easy to write multiple values into a string in a sensible manner.
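To make the bufPrint suggestion concrete, here is a minimal sketch. The helper name buildGreeting and the 32-byte buffer are assumptions made for illustration, not anything from the standard library or the original question.

```zig
const std = @import("std");

/// Hypothetical helper: formats the three pieces into `buf` and returns
/// the slice of `buf` that was actually written.
fn buildGreeting(buf: []u8, first: []const u8, second: []const u8, tail: []const u8) ![]u8 {
    // bufPrint fails with error.NoSpaceLeft if `buf` is too small.
    return std.fmt.bufPrint(buf, "{s} {s}{s}", .{ first, second, tail });
}

pub fn main() !void {
    var buf: [32]u8 = undefined; // fixed-size stack buffer, no allocator involved
    const greeting = try buildGreeting(&buf, "hello", "world", "!!!");
    greeting[0] = 'H'; // the result is a mutable slice into `buf`
    std.debug.print("{s}\n", .{greeting});
}
```

Because the result is an ordinary []u8 into a buffer you own, the inputs do not need to be comptime-known, which is the part that ++ cannot do.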

As for string-matching, there are a lot of good functions in std.mem:

std.mem.eql
std.mem.find
std.mem.findScalar or std.mem.sliceTo (for individual values)
std.mem.order
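A rough sketch of searching and comparing with std.mem follows. It assumes the long-standing names std.mem.indexOf and std.mem.indexOfScalar, since the find/findScalar spellings may not exist in the Zig release you have installed; check your version's std.mem before copying this.

```zig
const std = @import("std");

pub fn main() void {
    const text = "hello world!!!";

    // Substring search: byte offset of the first match, or null if absent.
    if (std.mem.indexOf(u8, text, "world")) |pos| {
        std.debug.print("\"world\" starts at byte {d}\n", .{pos});
    }

    // Single-byte search.
    if (std.mem.indexOfScalar(u8, text, '!')) |pos| {
        std.debug.print("first '!' is at byte {d}\n", .{pos});
    }

    // Equality is a function call; == does not compare slice contents.
    const same = std.mem.eql(u8, text[0..5], "hello");
    std.debug.print("prefix matches: {any}\n", .{same});
}
```

All offsets are byte offsets, not character positions, which matters as soon as the text is not plain ASCII.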

andrewrk · March 3, 2026, 5:01am

In general, this is an invalid string operation. One cannot simply mutate a string such that the first letter becomes capitalized. Show me a function that supposedly does this in any language and I’ll show you a string that behaves incorrectly.

jmafc · March 3, 2026, 5:10am

std::string word {"hello"};
word[0] = 'H';

Note that word is allocated on the stack, and would not perform any memory allocations. I cannot see how that could behave incorrectly, since the same thing could have been done by assigning "Hello". Also, if you really want bounds checking, you could replace the second line with word.at(0) = 'H'.

But perhaps you literally mean show me a function rather than some code that could exist in main. For that I guess I could argue that the std::string replace() methods could be used.

tholmes · March 3, 2026, 5:24am

I think his point was something like “memory-wise you’re allowed to do this, but it’s a bad idea, which is why Zig makes it so hard to do”.

jmafc · March 3, 2026, 5:32am

But it’s not so hard to do in Zig. Going back to my example,

const hello = "hello";
const world = "world";
var hello_world: [hello.len + 1 + world.len]u8 = undefined;
@memcpy(&hello_world, hello ++ " " ++ world);
hello_world[0] = 'H';

The “hard” part was allocating the right-size array and copying the correct characters in sequence (and it would’ve been harder if the string weren’t literals). Mutating the first character was trivial.

alanza · March 3, 2026, 5:41am

my understanding was that this is an observation about unicode. (sorry for the reply, andrew)

alanza · March 3, 2026, 5:43am

but, also, if we’re insisting on mutating buffers of bytes which we are treating as strings, may i introduce you to your new friends std.ArrayList(u8) and the useful and generic functions in std.mem

jmafc · March 3, 2026, 5:49am

Yes, that’s what I was thinking later too, but unless the mutated first character occupies more or fewer bytes, it shouldn’t be a problem, even if using an auxiliary function equivalent to C’s toupper. A c converted to C should just be one byte being replaced by another. If you were to replace it by a Ç, then you would be in trouble.
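The single-byte case described here can be sketched with std.ascii. The helper name capitalizeAsciiFirst is hypothetical; the point is only that it refuses to touch a first byte that is not an ASCII lowercase letter, so a multi-byte UTF-8 character is left alone rather than corrupted.

```zig
const std = @import("std");

/// Hypothetical helper: uppercases the first byte only when it is an ASCII
/// lowercase letter. Returns true if a byte was changed.
fn capitalizeAsciiFirst(s: []u8) bool {
    if (s.len == 0 or !std.ascii.isLower(s[0])) return false;
    s[0] = std.ascii.toUpper(s[0]);
    return true;
}

pub fn main() void {
    var plain = "cold".*; // first byte is ASCII 'c'
    var accented = "çold".*; // 'ç' is two bytes in UTF-8, so it is skipped
    _ = capitalizeAsciiFirst(&plain);
    _ = capitalizeAsciiFirst(&accented);
    std.debug.print("{s} {s}\n", .{ &plain, &accented });
}
```

This is byte-level ASCII handling only; it makes no attempt at Unicode case mapping.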

jmafc · March 3, 2026, 5:58am

The mutation of a character was simply an extra complication, but not the main issue. The issue is primarily the ease of use in concatenating strings in C++ vs. Zig, finding strings within strings (@tholmes mentioned findScalar above, but it seems that’s only in master ATM), erasing substrings, and other stuff like that. As for my “new friends”, I haven’t yet taken a look at ArrayList or much of the std library.

alanza · March 3, 2026, 6:01am

well, you’ll be glad to know that concatenating strings is a fantastic use case for std.ArrayList!

morezig · March 3, 2026, 6:04am

For the comptime strings you can use comptimePrint.

And for the mutation, you can use a utility function like this:

const std = @import("std");
const comptimePrint = std.fmt.comptimePrint;

pub inline fn copyToStack(comptime str: [:0]const u8) [str.len:0]u8 {
    comptime {
        var buf: [str.len:0]u8 = undefined;
        @memcpy(&buf, str);
        buf[buf.len] = 0;
        return buf;
    }
}

pub fn main() void {
    var str = copyToStack(comptimePrint("{s}, {s}!!!", .{"hello", "world"}));
    str[0] = 'H';
    std.debug.print("{s}\n", .{str}); // Hello, world!!!
}

But I think using the proper unicode string library is the best way to handle strings.

andrewrk · March 3, 2026, 8:16am

Here’s your string: "ßad code"

lacc97 · March 3, 2026, 9:32am

Is this more or less what you are looking for?

const hello = "hello";
const world = "world";
var hello_world = (hello ++ " " ++ world).*;
hello_world[0] = 'H';

String literals are typed as pointers to const byte arrays so you can just dereference them as arrays. This works for comptime-known strings of course (or at least strings of comptime-known length). For other cases you would either need a fixed size buffer and memcpy into it or use `std.mem.concat`.

Edit: to be clear, doing this on arbitrary strings can easily mangle UTF8 as others have pointed out but if these are all strings you control (e.g. to construct formatting strings) this is fine.
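For the runtime case, a minimal sketch with std.mem.concat might look like the following. The helper joinWords is hypothetical, and std.heap.page_allocator is used only to keep the example short; any allocator would do.

```zig
const std = @import("std");

/// Hypothetical helper: joins two runtime slices with a space and three bangs.
/// The caller owns the returned memory and must free it.
fn joinWords(allocator: std.mem.Allocator, a: []const u8, b: []const u8) ![]u8 {
    return std.mem.concat(allocator, u8, &[_][]const u8{ a, " ", b, "!!!" });
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const joined = try joinWords(allocator, "hello", "world");
    defer allocator.free(joined);

    joined[0] = 'H'; // heap memory owned by us, so mutation is allowed
    std.debug.print("{s}\n", .{joined});
}
```

The same UTF-8 caveat applies: overwriting one byte is only safe when you know the bytes involved are ASCII.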

jmafc · March 3, 2026, 12:39pm

Yes, but I already pointed out that if we’re mutating a character that “occupies more or fewer bytes” then you could be in trouble. Your example is sneaky because what looks like a capital ‘b’ is actually the German eszett, which just happens to look like it. But what we’re talking about then is mutating UTF-8 strings in general. If I were to code that in C or C++ using wide-strings and called towupper(), the assignment would be fine (and would not give me an eszett).

jmafc · March 3, 2026, 12:47pm

Yes, that is probably the most succinct solution in Zig for the given problem, and without using @memcpy.

mnemnion · March 3, 2026, 3:52pm

Yes, that is what we’re talking about.

It’s almost always a mistake to try to mutate within a string; neither UTF-8 nor UTF-16 properly supports this. Even if it did, the special case where the mutation happens to be exactly the size of the replaced part is just that, a weird special case.

Ergo, string building should not be approached this way. The question is about “modifying strings”, an extremely complex topic once you get into it, but right at the baseline: don’t do it by copying the entire string and changing individual bytes. It just doesn’t cover enough of the domain to be useful.

dupdrop · March 3, 2026, 4:51pm

...
var hello_world = (hello ++ " " ++ world).*;

I was surprised to learn you can just “dereference away the const” like that. Is there an explanation in the docs for how this works?

It does make sense that it could work for comptime known values, but when reading code I think me and many others would think “surely, I can’t modify a string living in readonly memory” - which is true at runtime, but at compile-time it seems Zig gives you additional abilities.

Sze · March 3, 2026, 4:59pm

You don’t dereference away a const, the const is a pointer to an array, dereferencing just allows you to make a copy of that array.
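A small sketch of that point: the literal is a pointer to a constant array, and .* produces a separate array value, so changing the copy leaves the literal untouched. The variable names are made up for the example.

```zig
const std = @import("std");

pub fn main() void {
    const original = "hello"; // type: *const [5:0]u8, points at constant data
    var copy = original.*; // type: [5:0]u8, an independent array held by value
    copy[0] = 'H'; // mutates only the copy

    std.debug.print("{s} {s}\n", .{ original, &copy });
}
```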