Coercing zero length u8 arrays to slices

I cannot understand why only line 2 fails to compile while others succeed.

const empty_str: []u8 = &[_]u8{};
const empty_str_z: [:0]u8 = &[_:0]u8{};
const c_empty_str: []const u8 = &[_]u8{};
const c_empty_str_z: [:0]const u8 = &[_:0]u8{};

The error message is:

error: expected type '[:0]u8', found '*const [0:0]u8'

But if this is true, why does line 1 compile?


Line 2 doesn’t compile because the right side is a pointer to constant data. In order for it to compile you’d need to discard the constness:

const empty_str_z: [:0]u8 = @constCast(&[_:0]u8{});

Line 1 shouldn’t compile, but it does… I’d expect it to error out with this message:

error: expected type '[]u8', found '*const [0]u8'

Similarly to line 2, I’d expect line 1 to compile only once you discard constness:

const empty_str: []u8 = @constCast(&[_]u8{});

The first line compiles because there is no data.
If you add data to the array (e.g. a space) it fails:

error: expected type '[]u8', found '*const [1]u8'
    const empty_str: []u8 = &[_]u8{' '};
                            ^~~~~~~~~~~
test.zig:2:29: note: cast discards const qualifier

But that feels inconsistent.
“Sentinel” doesn’t sound like data either.

I agree that it feels inconsistent, although I don’t see how the “line 1” edge-case can cause bugs.
Strictly speaking, sentinel is data, which has to be allocated, even though the array is empty.

Exactly that is the case; no allocation vs one byte allocation for sentinel which is not read-only.


Sure a sentinel value will take memory.
But as far as I know there is no such thing as non-const sentinel.
Of course you can touch the memory location where sentinel value sits.
That’s not what I mean.

It looks just strange if the successful compilation of line 1 is not a bug.

Maybe I’m missing something here, but I really don’t see the use case for initializing a const as an empty slice. Doing so means you can’t assign a new slice later on or modify the existing one ever. It’s akin to const a: u8 = undefined; there’s not much you can do from there on.

If it’s a var, an empty slice is only useful as a signal that there’s no data or to be replaced with a new slice later on, or to be manipulated directly via the ptr and len fields.

IMO, an empty sentinel terminated slice should always be an error. The whole idea is that the sentinel value must be there within the data to be sliced, so by definition it shouldn’t be empty.

That would be fairly inconvenient:

const std = @import("std");

test "empty sentinel" {
    const empty = "";
    // -> *const [0:0]u8
    std.debug.print("{}\n", .{@TypeOf(empty)});
    // -> 0
    std.debug.print("{d}\n", .{empty[0]});
}

The key difference between a sentinel-terminated array and a normal one, is that it’s legal to read sentinel[sentinel.len]. This is crucial for C interop.

If this is cast to a slice, then no, you can’t read anything from it anymore. But the null byte is still there, and reading it directly is harmless, and allowed in Zig as well if you cast to a sentinel terminated slice:

test "empty sentinel" {
    const empty = "";
    const empty_sentinel_slice: [:0]const u8 = empty;
    // -> [:0]const u8
    std.debug.print("{}\n", .{@TypeOf(empty_sentinel_slice)});
    // C interop version: -> 0
    std.debug.print("{d}\n", .{empty_sentinel_slice.ptr[0]});
    // Also legal Zig: -> 0
    std.debug.print("{d}\n", .{empty_sentinel_slice[0]});
    const empty_slice: []const u8 = empty;
    // error: indexing into empty slice is not allowed
    // std.debug.print("{d}", .{empty_slice[0]});
    // But this is still allowed: -> 0
    std.debug.print("{d}\n", .{empty_slice.ptr[0]});
}

So by construction, a sentinel-terminated slice is never empty: a sentinel-terminated array always has at least one accessible element, coercing it to a slice hides that element, but it’s still there, and it’s still defined behavior to read it.

If you try and make a zero-width array into a sentinel-terminated slice, that doesn’t compile:

test "truly empty sentinel?" {
    const really_empty: [0]u8 = .{};
    // error: expected type '[:0]u8', found '*const [0]u8'
    // note: destination pointer requires '0' sentinel
    const very_empty_sentinel_slice: [:0]u8 = &really_empty;
    // never gets here
    std.debug.print("? {d}\n", .{very_empty_sentinel_slice[0]});
}

I haven’t been able to find a way to make a sentinel-terminated array or slice which isn’t actually sentinel-terminated (undefined doesn’t count), and that’s good, because it would undermine the type system fairly badly to have a genuinely zero-width sentinel. I would read [0:0] as “zero data, one sentinel 0”: the length counts the data, but the sentinel is always there.

I may have missed a way of constructing an actually-zero-width sentinel, though.


Yes, line 1 and 2 are different in this respect, but I’m just not sure it justifies line 1 compiling and, hence, being the only place in the language where the type system allows coercing a pointer to a constant array, albeit a zero-bit one, to a mutable slice without an explicit @constCast. It’s a very minor, but still an inconsistent point that could be smoothed out for greater predictability.

Worth pointing out that it’s actually consistent with the main rule for type coercion:

Type coercions are only allowed when it is completely unambiguous how to get from one type to another, and the transformation is guaranteed to be safe.

It passes the ambiguity test: to go from []const to [], drop const. And it passes the safety test: you can’t index into it, so it’s as safe as any slice which you don’t do illegal things to, such as using the .ptr to index past .len - 1, mutating .len and then indexing into illegal memory, or otherwise violating the established bounds.

I also see some utility in it: any function which takes a mutable slice of data can’t take a []const, so this is an easy way to spell “empty mutable slice” without having to ask for zero bytes of allocation (I’ve done this but it seems like a pointless inefficiency and I’ll probably do it this way next time).

This pattern is much safer than nullable pointers in C, imho, since “do something .len” times is the natural shape of algorithms taking slices, and when .len is zero it just gets skipped, so no need for a guard clause which might be forgotten. A ?[]T makes sense, but only when the function has semantics for the null parameter variant: sometimes it’s the calling context which finds itself needing to call a function, but with no data for one of the parameters. A ?[]T is the same size as a []T, so it really is about what makes sense for the function.
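To make the "zero iterations, no guard clause" point concrete, here is a minimal sketch. The helper upperInPlace is hypothetical (it is not from this thread), and the empty-slice line relies on the zero-length coercion discussed above, which may behave differently on other compiler versions.

```zig
const std = @import("std");

// Hypothetical helper: upper-cases every byte of a mutable slice in place.
// There is no special case for an empty slice; the loop simply runs zero times.
fn upperInPlace(buf: []u8) void {
    for (buf) |*b| b.* = std.ascii.toUpper(b.*);
}

test "empty mutable slice is skipped without a guard clause" {
    // Zero-length array pointer coerced to a mutable slice (line 1 of the question).
    const empty: []u8 = &[_]u8{};
    upperInPlace(empty);
    try std.testing.expectEqual(@as(usize, 0), empty.len);

    // The same function on real data, for comparison.
    var word = [_]u8{ 'a', 'b' };
    upperInPlace(&word);
    try std.testing.expectEqualStrings("AB", &word);
}
```

The function signature stays []u8 rather than ?[]u8, so the caller with "no data" and the caller with data go through exactly the same code path.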

I do take your point, and if there’s any mutable slice of T around already, it’s better to take [0..0] of that slice, easier to read for sure. But I’m not convinced this is bad necessarily. Whether it’s more mental surface area or less kinda depends on if Zig can stick to the “one rule” philosophy of type coercion, which it has so far.
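For reference, the [0..0] alternative might look like the sketch below. The names storage, all and none are made up for illustration.

```zig
const std = @import("std");

test "empty mutable slice taken from existing mutable data" {
    // Any mutable buffer that happens to be in scope.
    var storage = [_]u8{ 'a', 'b', 'c' };
    const all: []u8 = &storage;

    // No const is involved anywhere, so no coercion rule needs to be stretched.
    const none: []u8 = all[0..0];
    try std.testing.expectEqual(@as(usize, 0), none.len);
}
```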


Don’t know if it’s “officially” idiomatic, but I’ve seen this quite frequently done like this:

var empty_str: []u8 = &.{};

This is just as usable from Zig, and you can pass it to something which needs a mutable slice, and that something will handle it safely most of the time.

What it doesn’t have is a sentinel terminator, so for C interop it’s a no bueno:

test "empty slice" {
    var empty_str: []u8 = &.{};
    _ = &empty_str;
    std.debug.print("{d}\n", .{empty_str.ptr[0]});
}

This segfaults, because it’s no bytes of data with an invalid pointer: using "" gives one byte of data with a valid pointer, but a zero-length written into the slice.

I had to check this, but the ‘implicit const-cast for zero slice’ thing works with "" as well:

test "'mutable' sentinel" {
    var empty_sentinel: []u8 = "";
    _ = &empty_sentinel;
    // -> 0;
    std.debug.print("{d}\n", .{empty_sentinel.ptr[0]});
}

I would prefer to use this personally, mostly because it looks nicer. It can spare some trouble when C gets involved, but as a principle, if we go passing Zig slices to C without ensuring a terminal zero with the type system, bad times will ensue. So I’d suggest to our general audience to stick with [:0] slices if C interop is part of the program: taking a [] slice of a constant string doesn’t remove the 0 from the end, but it removes it from the type system, which gets dangerous.
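A small sketch of what "keeping the 0 in the type system" buys you. The function cStyleLen is a hypothetical stand-in for a C function expecting a NUL-terminated string; it is not a real binding.

```zig
const std = @import("std");

// Hypothetical stand-in for a C API: it only accepts pointers whose type
// promises a terminating 0.
fn cStyleLen(s: [*:0]const u8) usize {
    return std.mem.len(s);
}

test "the sentinel-terminated slice can be handed to C-style code" {
    const z: [:0]const u8 = "";
    // z.ptr is a [*:0]const u8, so the terminator is guaranteed by the type.
    try std.testing.expectEqual(@as(usize, 0), cStyleLen(z.ptr));

    // Coercing to a plain slice keeps the byte in memory but drops the promise.
    const plain: []const u8 = z;
    // cStyleLen(plain.ptr); // rejected: plain.ptr is [*]const u8, no sentinel in the type
    try std.testing.expectEqual(@as(usize, 0), plain.len);
}
```

The commented-out call is the useful part: with a plain [] slice the compiler refuses the call instead of letting a possibly unterminated buffer reach C.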
