Copying slices

I’m revisiting Zig by trying to update some very simple programs. I had originally tried it out in 2023 using version 0.10.1, but now I’m using 0.15.2. The base code, in C++, excluding prompting and other irrelevant output, was this:

std::string name;
std::cin >> name;
std::cout << name << '\n';

An analogous C version would be something like this (excluding any error checking, of course):

char name[30];
scanf("%s", name);
printf("%s\n", name);

The Zig 0.10 version, which I believe worked fine back then, was somewhat as follows:

var name: [30]u8 = undefined;
const size = try stdin.read(&name);
if (size < name.len) {
    const line = std.mem.trimRight(u8, name[0..size], "\n\r");
    std.mem.copy(u8, &name, line);
}
try stdout.print("{s}\n", .{name[0 .. size - 1]});

My current 0.15.2 version is as follows:

var stdout_buffer: [1024]u8 = undefined;
var stdout_writer = std.fs.File.stdout().writer(&stdout_buffer);
const stdout = &stdout_writer.interface;

var stdin_buffer: [80]u8 = undefined;
var stdin_reader = std.fs.File.stdin().reader(&stdin_buffer);
const stdin = &stdin_reader.interface;

var name: [30:0]u8 = undefined;
const line = try stdin.takeDelimiterExclusive('\n');
const str = std.mem.trim(u8, line, " \n\r");
for (str, 0..) |c, i| {
    name[i] = c;
}

try stdout.print("{s}\n", .{name[0..str.len]});
try stdout.flush();

I understand the new I/O requirements and I’ve watched Andrew’s “Don’t forget to flush!”, but all the extra code is not my issue. What I struggled with was copying from str to name. At first, I looked at std.mem, but there is no longer a copy() function there. It does have a copyForwards(), but its documentation suggests using @memmove because the former is deprecated. Using @memcpy(name[0..str.len], str) does compile, but it gives a runtime error because “source and destination arguments have non-equal lengths”. So that’s why I ended up implementing a strncpy()-like loop by hand. I would think there must be an easier way, but I wasn’t able to find it.

A secondary issue, which was also present in the 0.10 version, is that to print just the name and not any garbage left over in the buffer, the variable has to be sliced using str.len (or size in the earlier code). In the actual test program, the name is printed a few times, so that’s also inconvenient (I sort of “fixed it” by declaring a const for that slice).

For your first issue:
There are multiple ways to copy some bytes. In the non-overlapping (noalias) case you can use @memcpy. If the memory regions do overlap, you have copyForwards and copyBackwards in std.mem, depending on what should be overwritten in what order. In your case @memcpy should be enough.
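A minimal sketch of the non-overlapping case, assuming the goal is to copy as much of the input as fits into a fixed buffer. The helper name copyTruncated is made up for illustration. The key point is that @memcpy wants both operands to have the same length, so both sides are sliced to a common length first:

```zig
const std = @import("std");

// Hypothetical helper (not from this thread): copies as much of `src` as
// fits into `buf` and returns the filled prefix of `buf`.
fn copyTruncated(buf: []u8, src: []const u8) []u8 {
    const len = @min(buf.len, src.len);
    // @memcpy requires equal lengths, so both sides are sliced to `len`.
    @memcpy(buf[0..len], src[0..len]);
    return buf[0..len];
}

pub fn main() void {
    var name_buf: [30]u8 = undefined;
    const name = copyTruncated(&name_buf, "Ada Lovelace");
    // `name` carries the correct length, so no garbage is printed.
    std.debug.print("{s}\n", .{name});
}
```

Keeping the returned slice around also avoids having to re-slice the buffer every time the name is used.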

For the second one you could maybe use std.mem.sliceTo to get the slice to the \0. Or maybe (I haven’t tested it) if you had a sentinel terminated slice this could also work automatically.
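A rough sketch of the std.mem.sliceTo idea, assuming the buffer is zero-filled up front so that the first 0 really marks the end of the data (with `= undefined` that is not guaranteed). The helper name nameIn is made up for illustration:

```zig
const std = @import("std");

// Hypothetical helper: returns the bytes before the first 0 in `buf`,
// or all of `buf` if it contains no 0.
fn nameIn(buf: []const u8) []const u8 {
    return std.mem.sliceTo(buf, 0);
}

pub fn main() void {
    // Zero-filled on purpose, unlike `= undefined`.
    var name_buf = [_]u8{0} ** 30;
    const input = "Grace";
    @memcpy(name_buf[0..input.len], input);

    // The length is recovered by scanning for the first 0.
    std.debug.print("{s}\n", .{nameIn(&name_buf)});
}
```

This rescans the buffer on every call, so storing the slice once is usually simpler.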

Both your issues come from not accounting for the length of the data.
You didn’t have to do that in C because it uses null termination: scanf writes the terminating 0 for you.
And C++ just handled it for you.

But copying into name is unnecessary in this example. You are forgetting that the reader has its own buffer and line is a slice into it. As long as you don’t read from the reader again before your last use of line, the slice stays valid.

Since that won’t always be possible, here is an example:

const line = try stdin.takeDelimiterExclusive('\n');
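var name_buf: [30]u8 = undefined;
// clamp to the buffer size so an overlong line is truncated
const name = name_buf[0..@min(name_buf.len, line.len)];
// @memcpy requires equal lengths, so slice the source to match
@memcpy(name, line[0..name.len]);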
try stdout.print("{s}\n", .{name});

The trimming of \n\r is unnecessary, since \n and anything after it is not within the bounds of line. It’s also not correct: you meant \r\n, in which case you would still have to trim the \r.

Lastly, I think you misunderstand what [n:0]u8 = undefined does. It does not fill the array with 0; rather, it is filled with undefined data, with an extra 0 at the end. There was a bug that caused the trailing sentinel to also be undefined; I can’t remember if it’s resolved in 0.15 or only on master.

Regardless, you can use [n:0]u8 = @splat(0) to initialise it with all 0. Or you can insert the 0 after you know the length of the data:

const line = try stdin.takeDelimiterExclusive('\n');
var name_buf: [30:0]u8 = undefined;
const len = @min(name_buf.len, line.len);
// with sentinel terminated arrays/slices
// you are allowed to index up to and including the length
// which is where the sentinel is
name_buf[len] = 0;
// creates a `[:0]` terminated slice, but **does not insert the sentinel**
// it only asserts that it is already there.
const name = name_buf[0..len :0];
// slice the source too: @memcpy requires equal lengths
@memcpy(name, line[0..len]);
try stdout.print("{s}\n", .{name});

Zig’s printing logic does not consider sentinel termination unless no length information is available, so the length still needs to be correct. Alternatively, you can pass a [*:0]u8, which has no length information.
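To make the difference concrete, here is a small sketch. The data and the names raw, as_slice and as_c_string are made up for illustration. A slice is printed by its length, while a [*:0] pointer is printed up to the first 0:

```zig
const std = @import("std");

// Illustrative data (an assumption, not from this thread):
// "Zig", a 0, then some leftover bytes and another 0.
const raw = [_]u8{ 'Z', 'i', 'g', 0, 'x', 'y', 'z', 0 };

// A slice carries its own length, so {s} prints exactly these three bytes.
const as_slice: []const u8 = raw[0..3];

// A [*:0] pointer has no length, so the end is found by scanning for the 0.
// The `:0` in the slice expression only asserts that raw[3] is already 0.
const as_c_string: [*:0]const u8 = raw[0..3 :0];

pub fn main() void {
    std.debug.print("{s}\n", .{as_slice});
    std.debug.print("{s}\n", .{as_c_string});
    // Printing the whole array emits all 8 bytes, including both 0s and "xyz".
    std.debug.print("{s}\n", .{raw[0..]});
}
```

The last call shows the behaviour described earlier in the thread: with a length of 8, everything in the buffer is printed, whether or not a 0 appears in the middle.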

The base C++ code was part of a larger introductory program of about 30 lines. Two other numeric variables were gotten first from the user, then the text string, and finally some text was displayed with all three. In Zig, I used var line to deal with the numbers and then with the name input. I was aware that I could use the line variable in the final text output, but it didn’t seem appropriate, since that identifier would look out of place there (I could use an alias, but that didn’t seem reasonable as a “translation” of the C++ or C versions).

In a more complete or realistic program, it seems, in Zig, each variable would likely have to be dealt with separately, since in C++, std::cin >> var not only accepts the input but also trims leading spaces and if needed converts the characters to a numeric type.

No, originally that was defined as var name: [30]u8 = undefined. The sentinel notation was added later as part of various iterations trying to figure out what could and could not be done. I did insert a 0 after knowing str.len but it didn’t affect the printing because print("{s}"), unless explicitly told so, will happily print everything in that buffer, including non-visible NUL bytes, valid characters added manually after the 0 (or 0’s) and all the undefined bytes.

I think the “trick” that does it is that assignment to name from name_buf using @min(), before being able to use @memcpy().
Thanks.

I figured, I just wanted to point out you don’t always need to copy it into a longer lived buffer.

It just needs a correct length, the separate assignment is unnecessary but convenient, you could instead do the slicing each time.

I have a slightly different problem now. In C++, there are two std::string variables, say var1 and var2, each of which is read with std::cin >> var and then compared in various ways to string literals, e.g., if (var1 == "abc" && var2 == "xyzzy"). I tried doing something like this in Zig:

var var1: []const u8 = undefined;
var var2: []const u8 = undefined;
// ... set up stdin interface
// ... prompt for var1 using stdout
var1 = try stdin.takeDelimiterExclusive('\n');
stdin.toss(1);
// ... prompt for var2
var2 = try stdin.takeDelimiterExclusive('\n');
stdin.toss(1);
if (std.mem.eql(u8, var1, "abc") and std.mem.eql(u8, var2, "xyzzy")) {
    // do something
}

The “do something” never happens, even when the correct values are entered. The problem appears to be that, although var1 has the value “abc” before the prompt for var2, prior to the if, var1 has the value “xyz”, and retains the previous length. I could use the approach given previously, i.e., null-terminate an intermediate line buffer and then use a slice with a 0-sentinel in order to @memcpy() into var1, but that doesn’t look like the correct approach.

I realize that in C++, there could be a hidden memory allocation using the strings, but due to the small string optimization that probably never happens in 80% of the actual usage. Therefore, var1 and var2 should probably be declared as, say, [80]u8 arrays, but then I do have to copy from the stdin buffer into the variables, and I think it should be possible to do that without using null-termination.

Yes, the issue is that the stdin buffer data doesn’t live long enough. You can either copy the data into another buffer that does live long enough, or ensure there is enough space in the stdin buffer. If the buffer is large enough, then all you should need to do is call stdin.rebase() to move the data to the start of the buffer yourself, so that it doesn’t happen automatically, which would invalidate your slices.
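Here is a simplified model of what goes wrong, and of the copy-based fix. It deliberately does not use the real reader: fakeRead is a made-up stand-in that always writes to the start of one shared buffer, and keep is a hypothetical helper that copies into storage the caller owns. No sentinel is involved, only lengths:

```zig
const std = @import("std");

// Hypothetical stand-in for a reader: every "read" reuses the start of the
// same buffer and returns a slice into it.
fn fakeRead(io_buf: []u8, input: []const u8) []u8 {
    @memcpy(io_buf[0..input.len], input);
    return io_buf[0..input.len];
}

// Copies `src` into caller-owned storage and returns the copy.
// Assumes `src` fits in `storage`.
fn keep(storage: []u8, src: []const u8) []const u8 {
    @memcpy(storage[0..src.len], src);
    return storage[0..src.len];
}

pub fn main() void {
    var io_buf: [16]u8 = undefined;
    var var1_storage: [16]u8 = undefined;

    const borrowed = fakeRead(&io_buf, "abc"); // slice into io_buf, no copy
    const var1 = keep(&var1_storage, borrowed); // independent copy

    // The next "read" overwrites the bytes that `borrowed` points at.
    _ = fakeRead(&io_buf, "xyzzy");

    // `borrowed` keeps its old length of 3 but now sees the new bytes.
    std.debug.print("borrowed={s} copied={s}\n", .{ borrowed, var1 });
}
```

In this model the borrowed slice ends up reading “xyz”, which matches the symptom described above, while the copy still compares equal to “abc”.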

Sentinel termination is very unnecessary, even in the previous approach you mentioned. Sentinels are only useful when the data may not always have its length available.

What has been frustrating is that the Zig code compiled and ran without any compile or runtime errors, yet it failed to work as expected, and I could only figure out what was wrong by using std.debug.print statements (I now tried using gdb, but with start it complained that Function “main” not defined., so I had to use nm to figure out the real name for main; lldb was even worse, since it couldn’t even print var1). So it doesn’t matter that there are no hidden allocations or no hidden control flow: one can still write wrong logic because the underlying system is not very (to use a cliché) “user-friendly”.

What is also disappointing is that Zig is still in, for lack of a better word, turmoil. Even if I were to learn the 0.15 version of I/O well, the preview of 0.16 “async” isn’t appealing. It seems that striving for the “perfect design” or “performance” gets in the way of providing basic abstractions similar to C++’s std::string and std::cin’s operator>>, which can handle conversion to different variable types, skip over whitespace and even accept two (or more) variables input on the same line.

Sorry for the rant.

yeah it is a shame that logic bugs and a shifting API design are possible in every language.

Logic bugs definitely, and “shifting API design” may be “possible”, but major redesigns of a core subsystem from one minor release to the next is only typical in mostly “immature” languages.

For example, Go is still at version 1.x and I’m not sure if they’ll ever come out with a 2.0, and although at the beginning they had issues with builds/packaging and it took them a while to deliver generics, AFAIK it has been pretty stable for the past five years or so. C++ took a very long while to finalize C++11 but is now in a very gradual three-year improvement cycle. OTOH, although I haven’t followed Rust closely, last time I checked they were trying to revamp async wholesale (or at least some people were not happy with the existing framework).

Also, of some languages that are more or less at the same, let’s say experimental stage as Zig, e.g., Carbon, Odin and V, two of them have at least a roadmap so a newcomer can gauge the stability or have some idea of what is coming up. I couldn’t find something similar for Zig.

I’m surprised you didn’t know that Zig is unstable based on the pre-1.0 version, but I agree that something up-front mentioning this would help (if there is something, I can’t find it). There is a roadmap in the release notes that talks about the rework of the IO libraries:

Because Zig is not at 1.0 yet, version changes from 0.14 to 0.15 are not considered minor releases. This is a very common practice and used in semver.

I did know that it was unstable. I first looked at it almost three years ago at version 0.10. But there are varying degrees of “stability” and “breaking changes”. Famously, Python’s transition from 2.x to 3.x took about a decade. But other changes or enhancements in other languages, particularly in “standard” libraries tend not to be as disruptive, IMHO.

On the 0.15.1 roadmap, that only mentions “the 0.16 release cycle” or, I’m guessing, a period of less than a year (or maybe six months?). V, now at 0.5, has apparently had a roadmap since 0.3 and expects one more release (0.6) before 1.0 (I only started looking at it, so I don’t know much more about V). Carbon also has a detailed roadmap up to 1.0 (in 2028) and beyond, although the doc is now a year old.

If it helps, one thing to know about the Zig std library is that it has, up until recently, been designed primarily to meet the needs of the compiler. The focus has been to stabilize the language, and recently it has been fairly stable. Now, there is more focus on preparing the std library for 1.0 so larger changes are being made there.

In general the Zig team is not afraid to change things prior to 1.0 if that will make it better and more stable when 1.0 does arrive, and to delay 1.0 until the designers are confident the language and std library is truly ready. Whether this is what other languages have done or not, this is very intentional. It has pros and cons; there is instability and it takes longer to complete, but the final result will be better for it. Knowing this, it is up to you whether you want to use the language prior to 1.0 or not.

This is all just my personal perception and viewpoint. I’m not a member of the Zig team.

Interesting take. Thanks.

OK, so now I’ve changed both the earlier example and the second one to essentially the following:

var var1_arr: [32]u8 = undefined;
// set up stdin, prompt
// const here because this snippet never reassigns line
const line = try stdin.takeDelimiterExclusive('\n');
const var1 = var1_arr[0..line.len];
@memcpy(var1, line);
// use var1 to compare or output, as needed

Thanks again.

I have to point out that the only language I know of that can prevent your logic error at compile time is Rust, but it still only prevents a small subset of logic errors related to memory lifetimes, and it has false positives.

I also pointed out the interface buffer lifetimes, specifying that you do need to copy into a buffer if you need to use the reader before you are done with the data you already read.

Comparing an unstable language to stable languages is wildly unfair.

You’re allowed to be frustrated, but in this case your frustration is due to your own inaccurate, and unreasonable (imo) for a young unstable language, expectations.

Yes, but as I mentioned earlier, there are varying degrees of stability. You can compare the historical development of Go, Carbon, Rust and Zig and see widely different timelines with respect to start, public announcement, 1.0 release or expected release, and degree of stability after the 1.0 release.

My inaccuracies or misunderstandings are in part due to the terseness of the documentation. In particular, the section on Pointers could use clearer descriptions and examples to illustrate the differences between *T, [*]T, *[N]T and []T, and the Operator Precedence section has no explanations (and it includes some “operators”, a!b and x{}, that are not in the Table of Operators).

“Young” depends on which other language you compare it with: Hoare started Rust in 2006 and 1.0 was published in 2015, Go was started in 2007 and 1.0 was released in 2012, Stroustrup started C with Classes in 1979 and the first C++ version came out in 1985.

I don’t think comparison with other languages makes sense, especially because those have widely different goals and also differ on what they actually deliver, for example if I applied my personal opinion on what a language should bring with it for 1.0 I could say that C++ shouldn’t be considered post 1.0 because it still doesn’t have a proper build system and instead just a mess of competing third party build systems.

But applying one language’s reasons to have certain features to some other language arbitrarily is just not a useful comparison, because those languages all also have widely different stories about who developed them in what context, with what different perspectives, resources and available talent and funding. Complaining while ignoring all those differences, seems unfair and unproductive.

I actually think pointing out things that could be better is good, but I would prefer it if it was done in the form of well described/researched and actually created issues on the issue tracker, so that it will be fixed eventually, instead of rants that will likely be mostly ignored.

also … if you go by all of these examples, Zig is still squarely within its “pre-1.0 deadline”?

i don’t intend to pile on, i know you’re already frustrated, and that my sometimes preferred choice of turning people’s bad vibes back at them in the same way that they emerged is not always … prosocial. but like. come on?