Zig []const u8 string to C string [:0]const u8 - 0.14.1

My input is a Zig string, and I want to use it in a C function.
I saw quite a few variations on this basic question, but none seem to fit 0.14.1.

A hardcoded path works. With:

const newPath: [:0]const u8 = "my-path.jpg";

I can do:

 m.MagickReadImage(wand, newPath.ptr)

But I want to start from a runtime variable, path: []const u8, and produce a [:0]const u8 from it.

I can iterate over an allocated buf, copy path into it, and add 0 at the end, but then I have a []u8. I looked into a few std.fmt functions without success so far.

Going from non-sentinel to sentinel is the direction of friction because that’s one more byte that’s just not accounted for in the slice. It might not even exist for all we know. There may be more ways to solve this, but here are the 3 solutions I know:

  1. Bite the bullet and allocate a new sentinel-terminated slice. The Allocator interface has a friendly dupeZ method.
  2. Assume the Zig string already ends in a null character and force it. With safety checks enabled (Debug or ReleaseSafe), this panics at runtime if you’re wrong; with them disabled, it is illegal behavior. Example usage:
     const slice = &[_]u8{ 'a', 'b', 'c', 0 };
     const sentinel: [:0]const u8 = slice[0 .. slice.len - 1 :0];
  3. Simply use sentinel-terminated strings upstream.
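Option 1 might look like this in practice. This is a minimal sketch, not code from the thread: fakeCReadPath and readWithRuntimePath are hypothetical names, and fakeCReadPath only stands in for a C function like MagickReadImage that takes a NUL-terminated const char *.

```zig
const std = @import("std");

/// Hypothetical stand-in for a C function (e.g. MagickReadImage) that expects
/// a NUL-terminated `const char *`. Here it just measures the string.
fn fakeCReadPath(path: [*:0]const u8) usize {
    return std.mem.len(path);
}

fn readWithRuntimePath(allocator: std.mem.Allocator, path: []const u8) !usize {
    // dupeZ allocates path.len + 1 bytes, copies `path`, writes the 0
    // terminator, and returns a [:0]u8 (which coerces to [:0]const u8).
    const path_z = try allocator.dupeZ(u8, path);
    // free() accounts for the sentinel and releases all path.len + 1 bytes.
    defer allocator.free(path_z);
    // `.ptr` of a [:0]u8 is a [*:0]u8, the Zig type for a C string argument.
    return fakeCReadPath(path_z.ptr);
}

pub fn main() !void {
    // Stands in for a path that is only known at runtime.
    const runtime_path: []const u8 = "my-path.jpg";
    const n = try readWithRuntimePath(std.heap.page_allocator, runtime_path);
    std.debug.print("C side saw {d} bytes\n", .{n});
}
```

The copy is freed right after the call here; if the C side keeps the pointer, the free has to move accordingly.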

I don’t know the exact context of your Zig string, but when in doubt, #1 is the safest choice. Also, using an arena as scratch space and resetting it around this exact operation is not a bad way to go, in my opinion.
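The arena variant could be sketched as below, assuming the converted string is only needed until the next conversion. toScratchCString is a hypothetical helper name, and page_allocator as the backing allocator is an arbitrary choice.

```zig
const std = @import("std");

/// Hypothetical helper: reuses `arena` as scratch space for the conversion.
/// The returned slice is only valid until the next call or arena.deinit().
fn toScratchCString(arena: *std.heap.ArenaAllocator, path: []const u8) ![:0]const u8 {
    // Forget earlier scratch allocations but keep the memory for reuse.
    _ = arena.reset(.retain_capacity);
    return try arena.allocator().dupeZ(u8, path);
}

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    // Everything the arena ever handed out is released here, in one go.
    defer arena.deinit();

    const paths = [_][]const u8{ "a.jpg", "images/b.png" };
    for (paths) |path| {
        const path_z = try toScratchCString(&arena, path);
        // path_z.ptr could now be handed to a C function.
        std.debug.print("{s}\n", .{path_z});
    }
}
```

The trade-off is that no individual free is needed, but each result silently becomes invalid at the next reset.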

If you’re reading from stdin, you can perform some buffer trickery and swap the delimiter byte (a newline, presumably) for a null character. It’s a little hacky. :upside_down_face:
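The stdin trick boils down to overwriting the newline in place and re-slicing with a sentinel. This sketch does not read stdin itself (the reader API has been changing between Zig versions), so the buffer is filled from a literal; terminateLineInPlace is a hypothetical name.

```zig
const std = @import("std");

/// Hypothetical helper: given a buffer holding a line that ends in '\n'
/// (as a line read from stdin might), overwrite the '\n' with 0 in place and
/// return a sentinel-terminated view of the line. No allocation, no copy.
/// Returns null if the buffer contains no newline.
fn terminateLineInPlace(buf: []u8) ?[:0]u8 {
    const nl = std.mem.indexOfScalar(u8, buf, '\n') orelse return null;
    buf[nl] = 0;
    // Slicing with `:0` asserts buf[nl] == 0, which was just made true.
    return buf[0..nl :0];
}

pub fn main() void {
    // Pretend this buffer was filled by reading a line from stdin.
    var buf: [64]u8 = undefined;
    const input = "my-path.jpg\n";
    @memcpy(buf[0..input.len], input);

    if (terminateLineInPlace(buf[0..input.len])) |line_z| {
        std.debug.print("{s} (len {d})\n", .{ line_z, line_z.len });
    }
}
```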

While extremely common, not all C functions assume zero-terminated strings.

There’s plenty of C code around that takes pointer and length, in order to be less vulnerable to buffer overflow attacks.
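With a pointer+length API, a Zig slice can be passed as-is, because a slice already is a pointer plus a length. In this sketch, countDots is a hypothetical Zig function standing in for a C signature like size_t count_dots(const char *s, size_t n).

```zig
const std = @import("std");

/// Hypothetical stand-in for a C function with a pointer+length signature.
/// Because it is told the length, it never reads past `n` and needs no
/// terminator.
fn countDots(s: [*]const u8, n: usize) usize {
    var count: usize = 0;
    for (s[0..n]) |c| {
        if (c == '.') count += 1;
    }
    return count;
}

pub fn main() void {
    const path: []const u8 = "archive.tar.gz";
    // No copy needed: pass the slice's two fields.
    std.debug.print("{d}\n", .{countDots(path.ptr, path.len)});
}
```

This also works for a slice cut out of the middle of a larger buffer, where no terminator could be added without copying.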

Ah, thanks! So I had:

// buf holds path plus one extra byte for the terminator
const buf = try allocator.alloc(u8, path.len + 1);
defer allocator.free(buf);
for (buf, 0..) |*c, i| {
    if (i < path.len) {
        c.* = path[i];
    } else {
        c.* = 0; // the sentinel byte
    }
}
const npath = buf[0..path.len :0];

but this seemed far more complicated than it should be.
Seeing dupeZ, it is pretty similar, just better :slight_smile:
However, is std.Allocator.dupeZ the namespace?

The context is embedded Zig code using ImageMagick in Elixir via Zigler: it sends a non-null-terminated string (the path) to be consumed.
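The index loop in the snippet above can be replaced by a single copy. A sketch, assuming a hypothetical helper name toCString; it does roughly what dupeZ does, using allocSentinel to reserve and terminate the buffer and @memcpy for the payload.

```zig
const std = @import("std");

/// Hypothetical helper that does by hand roughly what Allocator.dupeZ does.
/// The caller owns the result and releases it with allocator.free().
fn toCString(allocator: std.mem.Allocator, path: []const u8) ![:0]u8 {
    // Reserves path.len + 1 bytes and stores the 0 sentinel at index path.len.
    const buf = try allocator.allocSentinel(u8, path.len, 0);
    // Copy the payload in one call instead of a per-byte loop.
    @memcpy(buf[0..path.len], path);
    return buf;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const path: []const u8 = "my-path.jpg";
    const path_z = try toCString(allocator, path);
    defer allocator.free(path_z);
    std.debug.print("{s} ({d} bytes)\n", .{ path_z, path_z.len });
}
```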

However, is std.Allocator.dupeZ the namespace?

dupeZ is a method on Allocator, which lives in the std.mem namespace, so the full name is std.mem.Allocator.dupeZ (source: lib/std/mem/Allocator.zig in the ziglang/zig repository).
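To illustrate the namespace point: the method-call form and the fully qualified call are the same function, with the allocator passed as the first argument in the second form. A small sketch:

```zig
const std = @import("std");

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const path: []const u8 = "my-path.jpg";

    // Method-call syntax on an Allocator value...
    const a = try allocator.dupeZ(u8, path);
    defer allocator.free(a);

    // ...is sugar for calling the function through its full namespace,
    // std.mem.Allocator.dupeZ, with the allocator as the first argument.
    const b = try std.mem.Allocator.dupeZ(allocator, u8, path);
    defer allocator.free(b);

    std.debug.print("{s} {s}\n", .{ a, b });
}
```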

I’d be curious whether you’re able to use a null-terminated slice, even with embedded Zig. Is it Elixir calling Zig? The only time I’ve called Zig from external code, it was from C#. I’ve had no issues passing a [*:0]u8 (or [*:0]u16 for UTF-16) from the .NET side as a byte* (or char*), but that’s a different tech stack. If Elixir uses the C ABI for interop, I wonder if you can do something similar. I know next to nothing about Elixir. :sweat_smile:
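On the Zig side, receiving a null-terminated pointer from a foreign caller needs no allocation: std.mem.span turns it into a [:0]const u8. A sketch, assuming a hypothetical entry point handlePath; a real one would be exported with the C calling convention, which is left out here.

```zig
const std = @import("std");

/// Hypothetical entry point that a foreign caller (C, C#, ...) hands a
/// NUL-terminated string to. Nothing is copied or allocated.
fn handlePath(path: [*:0]const u8) usize {
    // std.mem.span scans for the terminator and returns a [:0]const u8,
    // usable as a normal slice or passed on to C again via .ptr.
    const slice: [:0]const u8 = std.mem.span(path);
    return slice.len;
}

pub fn main() void {
    // A string literal coerces to [*:0]const u8, mimicking a C caller.
    std.debug.print("{d}\n", .{handlePath("my-path.jpg")});
}
```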

This is, of course, the right approach. Unfortunately, not all commonly used C libraries accept ptr+len; Postgres’ libpq is one big culprit, last time I looked.

I tried to concatenate ‘0’ to the string on the Elixir side, but it does not produce a sentinel, just another string. So std.mem.Allocator.dupeZ it is.

C interop in Elixir/Erlang goes through NIFs (Native Implemented Functions). The Zigler library then lets you use Zig code directly as a “NIF”. It’s quite powerful and makes working with C pleasant.

A '0' or a '\0'?

One of these has the byte value 48, the other the byte value 0.
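In Zig terms the difference looks like this. Note that Zig itself has no '\0' escape; the NUL byte is written 0 or '\x00'. endsWithNul is a hypothetical helper that checks whether the last byte is a real NUL, i.e. whether option 2 from the first answer could succeed. Elixir’s own syntax for appending a NUL byte is not shown.

```zig
const std = @import("std");

/// Hypothetical check: does `s` end in a real NUL byte (value 0), so that
/// `s[0 .. s.len - 1 :0]` would pass its sentinel check?
fn endsWithNul(s: []const u8) bool {
    return s.len > 0 and s[s.len - 1] == 0;
}

pub fn main() void {
    // '0' is the digit character, byte value 48.
    std.debug.print("'0' = {d}\n", .{'0'});
    // The NUL byte, byte value 0.
    std.debug.print("'\\x00' = {d}\n", .{'\x00'});
    // Appending the digit '0' does not give a terminator.
    if (!endsWithNul("my-path.jpg0")) std.debug.print("no NUL at the end\n", .{});
}
```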

'\0' character: a u8 with the value zero.

Yes, hence the question.
I wanted to make sure this was the value they were using to null-terminate, since they said they used '0' (without the backslash) and that it “does not produce a sentinel, just another string.”
Appending a '\0' should give a proper sentinel value, unless Elixir does something odd that I’m unaware of when marshaling the value, which seems unlikely.