Sentinel or not sentinel?

Hello ziggers!

I’m experimenting with Zig, and I don’t quite understand what is happening under the hood here:

var buf: [64]u8 = undefined;
std.mem.copyForwards(u8, &buf, "Hello world!");
std.debug.print("-{s}- ({d})\n", .{&buf, buf.len});
// -Hello world!- (64)
  • I wonder why there’s no garbage after “Hello world!”. Does Zig add a sentinel at some point in the process? Is my buf zeroed on initialization? Is that guaranteed behaviour?
  • Is there a simple way to get my string length (i.e. up to what looks like a sentinel)?

Thank you

The garbage is there (0xaa bytes in Debug mode); it’s just not printable. std.debug.print writes to stderr, so append 2>&1 | xxd (assuming you have vim installed, which ships xxd) to your program invocation and see.

Hey!

Should I understand that compiling in Debug mode is what initializes my buffer with 0xaa? Are there any circumstances in which my buffer could contain real garbage? Or can I be certain that 0xaa will terminate my “Hello world!” under all circumstances?

No, you can’t depend on this; it is intended for debugging purposes only :grin:

You should always assume this whenever undefined is used: the value is undefined, and if your program depends on it, your program is broken.
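
One way to avoid depending on the undefined bytes at all is to keep a slice of only what was written. This is a minimal sketch, not something from the posts above: writeInto is a hypothetical helper name, and it assumes buf is at least as long as msg.

```zig
const std = @import("std");

/// Hypothetical helper: copies `msg` into `buf` and returns only the written part.
/// Assumes `buf.len >= msg.len` (copyForwards asserts this in safe builds).
fn writeInto(buf: []u8, msg: []const u8) []u8 {
    std.mem.copyForwards(u8, buf, msg);
    return buf[0..msg.len];
}

pub fn main() void {
    var buf: [64]u8 = undefined;
    // `text` covers only the bytes that were actually written;
    // the undefined tail of `buf` is never read or printed.
    const text = writeInto(&buf, "Hello world!");
    std.debug.print("-{s}- ({d})\n", .{ text, text.len });
}
```

Here text.len is the length of the message rather than the capacity of the buffer, so nothing depends on what the rest of buf happens to contain.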

All right, thank you for your explanations!

I think string literals are 0-terminated.

Even easier is just to use std.fmt.fmtSliceEscapeLower like so:

std.debug.print("-{s}- ({d})\n", .{
    std.fmt.fmtSliceEscapeLower(&buf),
    buf.len,
});

which makes the output (in Debug mode):

-Hello world!\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa- (64)

They are, but the copyForwards function has the following signature:

pub fn copyForwards(comptime T: type, dest: []T, source: []const T) void

which coerces the source string literal to a slice, where the 0-terminator does not count towards the length (as seen in the output provided by @squeek502). So the 0-terminator is not copied to the output buffer.
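
To make this concrete, here is a small sketch. The 0xff fill value and the copyIntoMarkedBuffer helper are my own assumptions for illustration: the buffer is pre-filled with a known marker instead of undefined, so it is possible to check that the byte after the copied text is untouched.

```zig
const std = @import("std");

/// Hypothetical helper: fills a 64-byte buffer with 0xff markers,
/// then copies `source` to the front of it.
fn copyIntoMarkedBuffer(source: []const u8) [64]u8 {
    var buf = [_]u8{0xff} ** 64;
    std.mem.copyForwards(u8, &buf, source);
    return buf;
}

pub fn main() void {
    const literal = "Hello world!"; // type: *const [12:0]u8
    const buf = copyIntoMarkedBuffer(literal); // literal coerces to []const u8

    // The sentinel is reachable at index `len`, but it is not counted in `len`.
    std.debug.print("len={d} sentinel={d}\n", .{ literal.len, literal[literal.len] });
    // The byte right after the copied text is still the 0xff marker, not 0.
    std.debug.print("byte after text=0x{x}\n", .{buf[literal.len]});
}
```

The literal carries its 0 at index 12, but only the 12 bytes counted by the slice length are copied, so the marker at buf[12] survives.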

Thanks, I should’ve checked!

It could be easier, but less visual, I think. Also, xxd doesn’t require any code changes, and I think that helps to avoid additional confusion.

A little follow-up to swenninger’s post: even if \0 somehow ended up at the end of the written string, the {s} format specifier, when used with slices, prints the entire slice, including the \0 and the undefined bytes.
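
If you do want "length up to the first 0", the cut has to be made explicitly before printing. A minimal sketch, assuming the buffer is zero-filled rather than undefined (textOf is a hypothetical helper name; std.mem.sliceTo does the actual work):

```zig
const std = @import("std");

/// Hypothetical helper: returns the bytes of `buf` up to, but not including,
/// the first 0 byte. If there is no 0 byte, the whole slice is returned.
fn textOf(buf: []const u8) []const u8 {
    return std.mem.sliceTo(buf, 0);
}

pub fn main() void {
    // Zero-filled rather than `undefined`, so the bytes after the text are known to be 0.
    var buf = [_]u8{0} ** 64;
    std.mem.copyForwards(u8, &buf, "Hello world!");

    // Cut at the first 0 before formatting; {s} then only sees the text.
    const text = textOf(&buf);
    std.debug.print("-{s}- ({d})\n", .{ text, text.len });
}
```

This only works because the tail of the buffer is known to be 0. With an undefined buffer, there is no guarantee that a 0 follows the text.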

I wonder whether {s} should escape non-printable bytes like this by default, with printing a buffer as-is being a separate specifier. That would avoid this common problem, where the terminal hides this kind of bug from the programmer.

This would have more drawbacks than positives IMO:

  • If std.fmt.fmtSliceEscapeLower was the default, it would output UTF-8 as escapes, e.g. "€" would be printed as \xe2\x82\xac
  • This could be gotten around by using std.unicode.fmtUtf8 instead, but then control characters would be printed unescaped, so, for example, printing "hello\x00\x00\x00" would still show up as just hello
  • There could be a combination of fmtSliceEscapeLower and fmtUtf8 that lets UTF-8 through but escapes ASCII control characters, but escaping control characters by default is not really desirable either. For example, let’s say you wanted to print ANSI escape sequences:
const color_green = "\x1b[32m";
const reset = "\x1b[0m";
std.debug.print("{s}abc{s}\n", .{ color_green, reset });

With default escaping, this would print \x1b[32mabc\x1b[0m instead of abc colored green.
