What are sentinels?

I was reading the sentinel termination chapter on zig.guide, but the only things I understood were that a sentinel-terminated array ends with the value I, as the developer, put at the end, and that a sentinel-terminated many-item pointer ([*:0]const u8) is preferred over a plain many-item pointer ([*]const u8). So I need more help understanding the differences. I would also like to clarify a few additional points: can I append items to a sentinel-terminated many-item pointer, how would it work with a data type like a union or struct (I only found u8 examples), and how do optionals work with the concept? I also worked through ziglings, but I did not understand it from there either.

Some links to other threads that might be helpful:

In the C programming language, it is common to terminate arrays of integers or characters with 0, and arrays of pointers with null.
C strings are pointers to characters; the characters themselves cannot contain 0, because a 0 marks the end of the string (0-terminated strings).
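
The null-terminated array of pointers mentioned above is also where optionals meet sentinels in Zig. Here is a minimal sketch, assuming a small list of C-style strings; the names `names`, `list` and `countEntries` are made up for illustration and are not from the thread:

```zig
const std = @import("std");

// Hypothetical helper: counts entries by walking until the null sentinel.
fn countEntries(list: [*:null]const ?[*:0]const u8) usize {
    var i: usize = 0;
    while (list[i] != null) : (i += 1) {}
    return i;
}

pub fn main() void {
    // Two strings followed by an implicit null, like C's argv or envp.
    const names = [_:null]?[*:0]const u8{ "zig", "build" };
    const list: [*:null]const ?[*:0]const u8 = &names;

    var i: usize = 0;
    // The loop ends when list[i] is null, which is the sentinel.
    while (list[i]) |name| : (i += 1) {
        // std.mem.span scans for the 0 byte and returns a slice with a length.
        std.debug.print("{s}\n", .{std.mem.span(name)});
    }
    std.debug.print("{d} entries\n", .{countEntries(list)});
}
```

The element type has to be optional here, because the sentinel value null must be a valid value of the element type.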

Sentinels are not useful in zig. It is a feature that makes sense only for C interfacing.

Zig string literals are actually pointers to zero-terminated arrays, and the only reason for that is C API usability.
For example:

const c = @cImport(@cInclude("stdio.h"));

pub fn main() void {
    const s: [:0]const u8 = "Hello World\n";
    _ = c.puts(s.ptr);
}

Run the example using: zig run example.zig -lc
The C library function puts prints a zero-terminated string.
This works because "Hello World\n" is a pointer to a zero-terminated array.
s is a zero-terminated slice (pointer and size) and s.ptr is the pointer to the zero-terminated string.
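
To make the difference between these pointer types more concrete, here is a small sketch that coerces the same literal to each of them. It is only an illustration; the variable names are made up, and it uses std.mem.len and std.mem.span from the standard library:

```zig
const std = @import("std");

pub fn main() void {
    const literal = "Hello"; // type: *const [5:0]u8

    // Pointer + length; the element at index len is guaranteed to be 0.
    const slice: [:0]const u8 = literal;
    // Pointer only; the length is found by scanning for the 0.
    const many: [*:0]const u8 = literal;
    // Pointer only; nothing in the type says where the data ends.
    const raw: [*]const u8 = literal;
    _ = raw;

    std.debug.print("{d}\n", .{slice.len}); // length is stored in the slice
    std.debug.print("{d}\n", .{slice[slice.len]}); // reading the sentinel is allowed
    std.debug.print("{d}\n", .{std.mem.len(many)}); // length is computed by scanning
    std.debug.print("{s}\n", .{std.mem.span(many)}); // span turns it back into a slice
}
```

With the plain [*]const u8 there is no equivalent of std.mem.len, so the length has to be tracked separately.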

See also: Conversions between slices and pointers with and without sentinel

I like your explanation, but I have a doubt: if sentinels are inherently useless and exist only for C interoperability, why are they used all over the standard library? One example off the top of my head is std.process.argsAlloc(). Another use case is suggested by the documentation and the guide itself: safer usage of many-item pointers.

As a follow-up question, can we add items to a sentinel-terminated many-item pointer like this:

var items: [*:0]const u8 = "Hello";
items[5] = '!';

Sentinels are used in the standard library because those functions ultimately need to talk with the OS in the form of C-like APIs that take and receive 0-terminated strings. For example, on most OSes (notably Linux and macOS, but not Windows), the arguments passed to your process are 0-terminated, which is why argsAlloc returns a [][:0]u8. And for standard library functions that must ultimately pass a 0-terminated string to the OS, taking a parameter of type [:0]const u8 instead of []const u8 avoids having to dupe the input string to append the sentinel before passing it on to the OS.
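
A rough sketch of the "no dupe needed" point. osCall is a hypothetical stand-in for an OS or C function (here it only measures the string), and viaSlice and viaSentinelSlice are made-up names, not standard library functions:

```zig
const std = @import("std");

// Hypothetical stand-in for an OS or C call that expects a 0-terminated string.
fn osCall(path: [*:0]const u8) usize {
    return std.mem.len(path);
}

// With a plain slice there is no guarantee of a terminator, so a copy is needed.
fn viaSlice(allocator: std.mem.Allocator, path: []const u8) !usize {
    const z = try allocator.dupeZ(u8, path); // allocates len + 1 bytes and adds the 0
    defer allocator.free(z);
    return osCall(z.ptr);
}

// With a sentinel-terminated slice the pointer can be passed on directly.
fn viaSentinelSlice(path: [:0]const u8) usize {
    return osCall(path.ptr);
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    std.debug.print("{d}\n", .{try viaSlice(allocator, "/tmp/example")});
    std.debug.print("{d}\n", .{viaSentinelSlice("/tmp/example")});
}
```

Both calls see the same string; the second one just skips the allocation.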

hello

I use [*:0] when I use pcre2posix.h or when I pass parameters to bash, etc. Otherwise, I don’t use this notation.

The reason is that there is a better way.
C strings are pointers that don’t know their size, they depend on the 0 terminator to calculate the size.
Zig strings know their size because they are slices with pointer and size.

@castholm explains their use in the zig library
Sentinels are also used in Windows as utf16 zero terminated strings ([*:0]const u16).

items is a variable pointer, but it points to read-only characters (const u8).
Assigning through items, as in items[5] = '!', is a compilation error because of this const.
If the Zig compiler allowed you to write to the string literal, your program would crash, since the string literal contents are placed in a read-only memory section.

Now let's assume there is no const:

var items: [*:0]u8 = undefined;

The declaration means that the variable items is a pointer to multiple u8, zero terminated.
You may allocate memory and assign it to items.
In this case you must maintain the 0 terminator.
Zig does not know its size, and to derive the size someone must count the characters up to 0.
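
A minimal sketch of what "appending" could look like under these rules: since the pointer does not own spare capacity, the example allocates a new, larger 0-terminated buffer and copies into it. appendByte is a hypothetical helper written for this reply, not a standard library function:

```zig
const std = @import("std");

// Hypothetical helper: returns a new 0-terminated buffer holding old plus one byte.
fn appendByte(allocator: std.mem.Allocator, old: [*:0]const u8, byte: u8) ![:0]u8 {
    const old_slice = std.mem.span(old); // scans for the 0 to find the length
    // allocSentinel allocates len + 1 elements and writes the 0 after the last one.
    const buf = try allocator.allocSentinel(u8, old_slice.len + 1, 0);
    std.mem.copyForwards(u8, buf, old_slice);
    buf[old_slice.len] = byte;
    return buf; // buf[buf.len] is still 0
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const items = try appendByte(allocator, "Hello", '!');
    defer allocator.free(items);
    std.debug.print("{s}\n", .{items});
}
```

Keeping the result as a [:0]u8 slice rather than a [*:0]u8 means the length does not have to be recomputed later; items.ptr still gives the many-item pointer when a C API needs it.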

Nitpick: this isn’t true. Sentinel termination is still helpful for saving memory, or for simplifying code, in some cases. The Zig compiler itself uses null-terminated strings extensively – see InternPool.NullTerminatedString.

Yes, it is obvious that “Sentinels are not useful in zig” is not true: the feature is there, anyone can use it, and it is useful. But its introduction in Zig has nothing to do with Zig itself; the reason is C interfacing.

Thank you for pointing to the InternPool. It is a really interesting intern case: having an arena for storage and the index for reference.
I am not sure whether NullTerminatedString is only used for identifiers. If that is the case, a single length byte at the head would be more useful than a 0 terminator at the end, but it would limit the identifier size to 255.

Yes, but you can also do it this way:

const c = @cImport(@cInclude("stdio.h"));

pub fn main() void {
    _ = c.puts("Hello World\n");
}

Because the compiler always creates a null-terminated array and passes the pointer to the C function.

Yes, that’s correct.

puts declaration is:

pub extern fn puts(s: [*c]const u8) c_int;

and the type of "Hello World\n" is *const [12:0]u8.

Now I was just playing around a bit and found something that I can’t explain:

const std = @import("std");
const c = @cImport(@cInclude("stdio.h"));

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const allocator = arena.allocator();

    const cstr = "Hello world.";

    const str = try allocator.alloc(u8, 20);
    std.mem.copyForwards(u8, str, cstr);

    std.debug.print("{s}\n", .{str});

    str[cstr.len + 2] = '!';
    str[16] = '!';

    std.debug.print("{s}\n", .{str});
    _ = c.puts(str.ptr);
}

Output:

Hello world.
Hello world.!!
Hello world.!!

Any idea?

Allocations are initialized to undefined, which is 0xaa in Debug/ReleaseSafe.
In hex, your string consists of the bytes 48 65 6c 6c 6f 20 77 6f 72 6c 64 2e aa aa 21 aa 21 aa aa aa. It’s possible that your terminal is ignoring non-ASCII characters and/or invalid UTF-8 sequences.

My terminal output:

Hello world.��������
Hello world.��!�!���
Hello world.��!�!���

When I run your example I get this in my terminal (20 characters):

Hello world.��������
Hello world.��!�!���
Hello world.��!�!���

I am guessing your terminal decides not to print the garbage characters.
When I initialize the str with zero like this:

const str = try allocator.alloc(u8, 20);
for (str) |*d| d.* = 0;
std.mem.copyForwards(u8, str, cstr);

I get:

Hello world.
Hello world.!!
Hello world.

Because Zig's print prints the whole slice, whereas C prints until the first zero (a zero byte isn’t displayed by the terminal).

If I change it to:

const str = try allocator.alloc(u8, 20);
std.mem.copyForwards(u8, str, cstr);
str[cstr.len] = 0;

I get (19 characters for the first 2):

Hello world.�������
Hello world.�!�!���
Hello world.

copyForwards doesn’t copy the sentinel value, so after setting it explicitly the C code stops right after Hello world., but Zig prints the whole slice of length 20, which contains one zero byte that the terminal doesn’t show.
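
As an alternative to setting the terminator by hand, here is a small sketch using allocator.dupeZ, which copies the bytes and adds the 0 in one step. This is just one possible way to do it; std.mem.len stands in for what a C function such as puts would see:

```zig
const std = @import("std");

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // dupeZ allocates 12 + 1 bytes, copies the text and writes the 0 terminator.
    const str = try allocator.dupeZ(u8, "Hello world.");
    defer allocator.free(str);

    // Zig prints exactly str.len bytes; no undefined bytes are part of the slice.
    std.debug.print("{s}\n", .{str});
    // The slice length and the length found by scanning for 0 now agree.
    std.debug.print("{d} {d}\n", .{ str.len, std.mem.len(str.ptr) });
}
```

Because the slice is exactly as long as the text, the Zig side and the C side agree on where the string ends.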

This is valid in C/C++, Nim, Pascal, etc.
You should not reason like that; practice passing parameters through bash or bat.

I don’t understand, what is valid in these languages?
Can you rephrase your comment?

I was providing an example, with some explanations.


Ahh I think you are replying to @chrboesch ?

Yes,

what I mean is that the same thing happens in other languages, so Zig's behavior is correct.

As soon as we go through bash or bat, we have the same problem.
