Coercing zero-length u8 arrays to slices

I can't understand why only line 2 fails to compile while the others succeed.

const empty_str: []u8 = &[_]u8{};                   // line 1
const empty_str_z: [:0]u8 = &[_:0]u8{};             // line 2
const c_empty_str: []const u8 = &[_]u8{};           // line 3
const c_empty_str_z: [:0]const u8 = &[_:0]u8{};     // line 4

The error message is:

error: expected type '[:0]u8', found '*const [0:0]u8'

But if that's the case, why does line 1 compile?


Line 2 doesn't compile because the right side is a pointer to constant data. To make it compile, you'd need to cast away the constness:

const empty_str_z: [:0]u8 = @constCast(&[_:0]u8{});
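For reference, here is the question's snippet with line 2 fixed that way, wrapped in a test so it can be run with zig test (the test name and the assertions are my own additions). @constCast only changes the pointer type; the literal is still constant data, so the resulting slice should only be read, never written through.

```zig
const std = @import("std");

test "the four declarations from the question, with line 2 fixed" {
    const empty_str: []u8 = &[_]u8{}; // line 1: accepted as-is (no data to protect)
    const empty_str_z: [:0]u8 = @constCast(&[_:0]u8{}); // line 2: needs @constCast
    const c_empty_str: []const u8 = &[_]u8{}; // line 3
    const c_empty_str_z: [:0]const u8 = &[_:0]u8{}; // line 4

    try std.testing.expect(empty_str.len == 0);
    try std.testing.expect(c_empty_str.len == 0);
    // Both sentinel-terminated slices still have a readable 0 at index len.
    try std.testing.expect(empty_str_z[empty_str_z.len] == 0);
    try std.testing.expect(c_empty_str_z[c_empty_str_z.len] == 0);
}
```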

Line 1 shouldn't compile, but it does... I'd expect it to error out with this message:

error: expected type '[]u8', found '*const [0]u8'

As with line 2, I'd expect line 1 to compile only once you cast away the constness:

const empty_str: []u8 = @constCast(&[_]u8{});

The first line compiles because there is no data.
If you add data to the array (e.g. a space), it fails:

error: expected type '[]u8', found '*const [1]u8'
    const empty_str: []u8 = &[_]u8{' '};
                            ^~~~~~~~~~~
test.zig:2:29: note: cast discards const qualifier
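By contrast, a non-empty mutable slice compiles once the bytes live in mutable storage, e.g. a local var array that the slice points into. A minimal sketch (the variable names are illustrative):

```zig
const std = @import("std");

test "a mutable slice needs mutable backing storage" {
    // The array literal is copied into a mutable local, so &buf is a *[1]u8
    // rather than a *const [1]u8, and it coerces to []u8.
    var buf = [_]u8{' '};
    const s: []u8 = &buf;
    s[0] = 'x'; // writes through the slice into buf
    try std.testing.expect(buf[0] == 'x');
}
```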

But that feels inconsistent.
"Sentinel" doesn't sound like data either.

I agree that it feels inconsistent, although I don't see how the "line 1" edge case could cause bugs.
Strictly speaking, the sentinel is data, which has to be allocated even though the array is empty.

Exactly: no allocation versus a one-byte allocation for the sentinel, which would not be read-only.
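The storage difference can be checked directly: the sentinel is counted in the size of the array type, and index len of a sentinel-terminated array reads it. A small sketch, assuming a recent Zig where @sizeOf of a sentinel-terminated array includes the sentinel:

```zig
const std = @import("std");

test "the sentinel occupies storage even when the array is empty" {
    // Assumption: @sizeOf counts the sentinel element of [N:s]T.
    try std.testing.expect(@sizeOf([0]u8) == 0);
    try std.testing.expect(@sizeOf([0:0]u8) == 1);

    const empty_z = [_:0]u8{}; // type [0:0]u8
    try std.testing.expect(empty_z.len == 0);
    // Index len of a sentinel-terminated array is the sentinel itself.
    try std.testing.expect(empty_z[empty_z.len] == 0);
}
```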


Sure, a sentinel value takes memory.
But as far as I know, there is no such thing as a non-const sentinel.
Of course you can touch the memory location where the sentinel value sits.
That's not what I mean.

It just looks strange if line 1 compiling successfully is not a bug.

Maybe I'm missing something here, but I really don't see the use case for initializing a const as an empty slice. Doing so means you can never assign a new slice to it later or modify the existing one. It's akin to writing
const a: u8 = undefined;
Not much you can do from there on.

If it's a var, an empty slice is only useful as a signal that there's no data, as a placeholder to be replaced with a new slice later on, or to be manipulated directly via its ptr and len fields.
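A minimal sketch of the placeholder use of a var empty slice (storage and current are illustrative names):

```zig
const std = @import("std");

test "a var empty slice as a 'no data yet' placeholder" {
    var storage = [_]u8{ 'a', 'b', 'c' };

    var current: []u8 = &.{}; // nothing assigned yet
    try std.testing.expect(current.len == 0);

    current = storage[0..2]; // later replaced by a real slice
    current[0] = 'z'; // writes through to storage
    try std.testing.expect(std.mem.eql(u8, &storage, "zbc"));
}
```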

IMO, an empty sentinel-terminated slice should always be an error. The whole idea is that the sentinel value must be present within the data being sliced, so by definition the slice shouldn't be empty.

That would be fairly inconvenient:

const std = @import("std");

test "empty sentinel" {
    const empty = "";
    // -> *const [0:0]u8
    std.debug.print("{}\n", .{@TypeOf(empty)});
    // -> 0
    std.debug.print("{d}\n", .{empty[0]});
}

The key difference between a sentinel-terminated array and a normal one is that it's legal to read sentinel[sentinel.len]: the element at index len is the sentinel itself. This is crucial for C interop.

If this is coerced to a plain slice, then no, you can't read anything from it anymore. But the null byte is still there: reading it through .ptr is harmless, and Zig also allows indexing it if you coerce to a sentinel-terminated slice:

const std = @import("std");

test "empty sentinel" {
    const empty = "";
    const empty_sentinel_slice: [:0]const u8 = empty;
    // -> [:0]const u8
    std.debug.print("{}\n", .{@TypeOf(empty_sentinel_slice)});
    // C interop version: -> 0
    std.debug.print("{d}\n", .{empty_sentinel_slice.ptr[0]});
    // Also legal Zig: -> 0
    std.debug.print("{d}\n", .{empty_sentinel_slice[0]});
    const empty_slice: []const u8 = empty;
    // error: indexing into empty slice is not allowed
    // std.debug.print("{d}", .{empty_slice[0]});
    // But this is still allowed: -> 0
    std.debug.print("{d}\n", .{empty_slice.ptr[0]});
}

So by construction, a sentinel-terminated slice is never truly empty: a sentinel-terminated array always has at least one accessible element. Coercing it to a plain slice hides that element, but it's still there, and reading it is still defined behavior.

If you try to make a zero-width array into a sentinel-terminated slice, that doesn't compile:

const std = @import("std");

test "truly empty sentinel?" {
    const really_empty: [0]u8 = .{};
    // error: expected type '[:0]u8', found '*const [0]u8'
    // note: destination pointer requires '0' sentinel
    const very_empty_sentinel_slice: [:0]u8 = &really_empty;
    // never gets here
    std.debug.print("? {d}\n", .{very_empty_sentinel_slice[0]});
}

I haven't been able to find a way to make a sentinel-terminated array or slice that isn't actually sentinel-terminated (undefined doesn't count), and that's good, because having a genuinely zero-width sentinel would undermine the type system fairly badly. I would read [0:0] as "zero data, one sentinel 0": the length counts the data, but the sentinel is always there.

I may have missed a way of constructing an actually-zero-width sentinel, though.
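The slicing syntax with a sentinel, buf[start..end :s], is one place where this guarantee is enforced: the result is only sentinel-terminated if buf[end] actually equals the sentinel. A sketch (the buffer contents are arbitrary):

```zig
const std = @import("std");

test "slicing with a sentinel requires the sentinel to be present" {
    const buf = [_]u8{ 'h', 'i', 0, 'x' };

    // buf[start..end :0] is only accepted if buf[end] == 0. With comptime-known
    // data the compiler checks this; with runtime data it is a safety check
    // (in safe build modes).
    const hi: [:0]const u8 = buf[0..2 :0];
    try std.testing.expect(hi.len == 2);
    try std.testing.expect(hi[hi.len] == 0);

    // A zero-length sentinel slice still points at a real 0 byte.
    const none: [:0]const u8 = buf[2..2 :0];
    try std.testing.expect(none.len == 0);
    try std.testing.expect(none[0] == 0);

    // buf[0..3 :0] would be rejected, because buf[3] is 'x', not 0.
}
```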


Yes, lines 1 and 2 differ in this respect, but I'm not sure that justifies line 1 compiling and thereby being the only place in the language where the type system allows coercing a pointer to a constant array (albeit a zero-bit one) to a mutable slice without an explicit @constCast. It's a very minor point, but still an inconsistency that could be smoothed out for greater predictability.

It's worth pointing out that this is actually consistent with the main rule for type coercion:

Type coercions are only allowed when it is completely unambiguous how to get from one type to another, and the transformation is guaranteed to be safe.

It passes the ambiguity test: to go from []const to [], drop the const. And it passes the safety test: you can't index into it, so it's as safe as any slice you don't do illegal things to, such as using .ptr to index past .len - 1, mutating .len and then indexing into invalid memory, or otherwise violating the established bounds.

I also see some utility in it: a function that takes a mutable slice of data can't take a []const, so this is an easy way to spell "empty mutable slice" without having to ask for a zero-byte allocation (I've done that, but it seems like a pointless inefficiency, and I'll probably do it this way next time).

This pattern is much safer than nullable pointers in C, imho, since "do something .len times" is the natural shape of algorithms that take slices, and when .len is zero the work is simply skipped, so there's no need for a guard clause that might be forgotten. A ?[]T makes sense, but only when the function has meaningful semantics for a null argument: sometimes it's the calling context that needs to call a function but has no data for one of the parameters. A ?[]T is the same size as a []T, so the choice really is about what makes sense for the function.
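A sketch of that shape, using a hypothetical helper upcaseInPlace: it takes a mutable slice, and an empty slice simply makes the loop run zero times, with no guard clause needed.

```zig
const std = @import("std");

// Hypothetical helper: upper-cases ASCII letters in place. It needs a
// mutable slice and has no special case for "no data".
fn upcaseInPlace(bytes: []u8) void {
    for (bytes) |*b| b.* = std.ascii.toUpper(b.*);
}

test "an empty mutable slice turns the loop into a no-op" {
    // &.{} coerces to a zero-length []u8, so the loop body runs zero times.
    upcaseInPlace(&.{});

    var word = [_]u8{ 'z', 'i', 'g' };
    upcaseInPlace(&word);
    try std.testing.expect(std.mem.eql(u8, &word, "ZIG"));
}
```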

I do take your point: if there's already a mutable slice of T around, it's better to take [0..0] of that slice; it's certainly easier to read. But I'm not convinced this is necessarily bad. Whether it adds mental surface area or removes it depends on whether Zig can stick to the "one rule" philosophy of type coercion, which it has so far.
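Taking [0..0] of an existing mutable slice might look like this (storage, all and none are illustrative names); the type annotation makes the result a []u8 of length 0:

```zig
const std = @import("std");

test "an empty slice taken from an existing mutable slice" {
    var storage = [_]u8{ 1, 2, 3 };
    const all: []u8 = &storage;

    // The annotation makes the result a []u8 (same element type, length 0).
    const none: []u8 = all[0..0];
    try std.testing.expect(none.len == 0);
    try std.testing.expect(all.len == 3);
}
```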


I don't know if it's "officially" idiomatic, but I've frequently seen it done like this:

var empty_str: []u8 = &.{};

This is just as usable from Zig: you can pass it to something that needs a mutable slice, and that something will handle it safely most of the time.

What it doesn't have is a sentinel terminator, so for C interop it's no bueno:

const std = @import("std");

test "empty slice" {
    var empty_str: []u8 = &.{};
    _ = &empty_str;
    std.debug.print("{d}\n", .{empty_str.ptr[0]});
}

This segfaults, because it's zero bytes of data behind an invalid pointer, whereas using "" gives one byte of data (the terminating 0) behind a valid pointer, with a length of zero written into the slice.

I had to check this, but the 'implicit const-cast for a zero-length slice' behavior works with "" as well:

const std = @import("std");

test "'mutable' sentinel" {
    var empty_sentinel: []u8 = "";
    _ = &empty_sentinel;
    // -> 0
    std.debug.print("{d}\n", .{empty_sentinel.ptr[0]});
}

Personally, I would prefer this, mostly because it looks nicer. It can spare some trouble when C gets involved, but as a principle, if we pass Zig slices to C without the type system guaranteeing a terminal zero, bad times will ensue. So I'd suggest to our general audience that they stick with [:0] slices if C interop is part of the program: taking a [] slice of a constant string doesn't remove the 0 from the end, but it does remove it from the type system, which gets dangerous.
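To illustrate why keeping the sentinel in the type matters, here is a sketch with a hypothetical cStyleLen function standing in for a C routine such as strlen. It takes a [*:0]const u8, which a [:0]const u8 provides via .ptr but a plain []const u8 does not:

```zig
const std = @import("std");

// Hypothetical stand-in for a C function like strlen: it only sees a
// many-item pointer and walks it until the 0 terminator.
fn cStyleLen(p: [*:0]const u8) usize {
    var i: usize = 0;
    while (p[i] != 0) : (i += 1) {}
    return i;
}

test "keep the sentinel in the type when C is involved" {
    const empty: [:0]const u8 = "";
    const hello: [:0]const u8 = "hello";

    // .ptr of a [:0]const u8 is a [*:0]const u8, so these calls type-check.
    try std.testing.expect(cStyleLen(empty.ptr) == 0);
    try std.testing.expect(cStyleLen(hello.ptr) == 5);

    // std.mem.span turns the sentinel pointer back into a [:0] slice.
    try std.testing.expect(std.mem.span(hello.ptr).len == 5);

    // By contrast, a []const u8 has .ptr of type [*]const u8, which does not
    // coerce to [*:0]const u8, so passing it to cStyleLen would not compile.
}
```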
