Pointer aliasing and reading/writing from/to overlapping memory, is this safe?

castholm · March 10, 2024, 4:37pm

This question requires a bit of context.

As an exercise, I am trying to optimize std.process.argsAlloc down to a single allocation that is exactly as small as it needs to be on WASI.

On WASI, the way you get the args for the current process is by first calling args_sizes_get(&argc, &argv_buf_size) to get the number of arguments and the total size of all argument strings (null terminators included), then allocating the necessary buffers and finally calling args_get(argv, argv_buf) to write the args and the strings to the provided buffers.

In Zig we would prefer to return a slice of slices [][:0]u8 over a slice of many-pointers [][*:0]u8. Because slices are fat pointers consisting of both a pointer and a length, this means that we need to allocate a bit of extra memory. We still want to allocate precisely the amount of memory necessary and no more.
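To make the size difference concrete, here is a minimal sketch. The helper name extraBytesForSlices is hypothetical, and the sketch assumes that a slice is laid out as a pointer plus a usize length, which holds on current targets. With 4-byte pointers (as on wasm32) and three arguments it works out to 3 * 4 = 12 extra bytes.

```zig
const std = @import("std");

comptime {
    // Assumption: a slice is a pointer followed by a usize length,
    // so it is twice the size of a many-pointer.
    std.debug.assert(@sizeOf([:0]u8) == 2 * @sizeOf([*:0]u8));
}

/// Extra bytes needed to hold `argc` slices instead of `argc` many-pointers.
/// (Hypothetical helper, for illustration only.)
fn extraBytesForSlices(argc: usize) usize {
    return argc * (@sizeOf([:0]u8) - @sizeOf([*:0]u8));
}
```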

My idea is to allocate a (properly aligned) chunk of memory that will be used like this:

s = slice
m = many-pointer
argc = 3

                                 1
 0 1 2 3 4 5 6 7 8 9 a b c d e f 0 1 2 3 4 5 6 7 8 9 a b c d e f ...
+---------------+---------------+--------------------------------
|s[0]           |s[1]           |s[2]           |
+---------------+-------+-------+-------+------------------------
|                       |m[0]   |m[1]   |m[2]   |
+-----------------------+-------+-------+------------------------
|                                               |<string data>
+----------------------------------------------------------------
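
The offsets in the diagram can be derived with a little arithmetic. The sketch below is only an illustration: Layout and computeLayout are hypothetical names, and the element sizes are passed in as plain numbers so the wasm32 values (8-byte slices, 4-byte many-pointers) can be reproduced on any host.

```zig
const std = @import("std");

/// Byte offsets within the single allocation (hypothetical type).
const Layout = struct {
    many_start: usize, // where the many-pointer view begins
    buf_start: usize, // where the string data begins
    total: usize, // size of the whole allocation
};

/// The many-pointer view is placed at the tail of the slice region,
/// and the string data follows directly after the slice region.
fn computeLayout(slice_size: usize, many_size: usize, argc: usize, buf_len: usize) Layout {
    const slices_len = slice_size * argc;
    const many_len = many_size * argc;
    return .{
        .many_start = slices_len - many_len,
        .buf_start = slices_len,
        .total = slices_len + buf_len,
    };
}
```

For argc = 3 with 8-byte slices and 4-byte many-pointers, many_start is 12 (0xc) and buf_start is 24 (0x18), matching the diagram.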

args_get will write its argv values starting at offset 0xc. Once written, we iterate over both the slice view and the many-pointer view and turn the pointers into slices:

for (args_slice, args_many) |*dst, src| {
    dst.* = mem.span(src);
}

About halfway through, we will start overwriting the data of the many-pointer view, but because we have already processed those elements this isn’t a problem.

My question is about the final iteration, however, where dst and src will point to overlapping regions of memory. Is this operation well-defined? Or is it undefined behavior that might result in corrupted values (perhaps depending on certain optimizations)?
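
The byte ranges involved can be checked with plain arithmetic. The following sketch uses hypothetical helper names and takes the element sizes as parameters; it only reasons about offsets and says nothing about what the compiler is allowed to do with the overlapping load and store.

```zig
const std = @import("std");

/// True if writing slice element `i` leaves every many-pointer that has
/// not been read yet (elements i+1 and later) untouched.
fn writeLeavesUnreadPointersIntact(slice_size: usize, many_size: usize, argc: usize, i: usize) bool {
    const many_start = (slice_size - many_size) * argc;
    const dst_end = slice_size * (i + 1); // one past the last byte written
    const next_src_start = many_start + many_size * (i + 1); // first unread pointer
    return dst_end <= next_src_start;
}

/// True if slice element `i` and many-pointer element `i` share any bytes.
fn elementOverlapsItsSource(slice_size: usize, many_size: usize, argc: usize, i: usize) bool {
    const many_start = (slice_size - many_size) * argc;
    const dst_start = slice_size * i;
    const dst_end = dst_start + slice_size;
    const src_start = many_start + many_size * i;
    const src_end = src_start + many_size;
    return dst_start < src_end and src_start < dst_end;
}
```

With 8-byte slices, 4-byte many-pointers and argc = 3, every write leaves the unread pointers intact, and only element 2 (bytes 0x10 to 0x17 for the slice, 0x14 to 0x17 for the pointer) overlaps its own source.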

I have tested the code and it produces the correct results both in debug builds and in optimized builds without safety checks. I’d still like to know whether code like this is sound or whether there are better alternative approaches.

Full code for context:

const std = @import("std");
const mem = std.mem;
const os = std.os;
const Allocator = mem.Allocator;

fn allocSliceWasi(allocator: Allocator) (Allocator.Error || os.UnexpectedError)![][:0]u8 {
    // First call: get the number of args and the total size of their strings.
    var args_len: usize = undefined;
    var buf_len: usize = undefined;
    switch (os.wasi.args_sizes_get(&args_len, &buf_len)) {
        .SUCCESS => {},
        else => |err| return os.unexpectedErrno(err),
    }
    if (args_len == 0) return &.{};
    const args_slice_bytes_len = @sizeOf([:0]u8) * args_len;
    const args_many_bytes_len = @sizeOf([*:0]u8) * args_len;
    const raw_len = args_slice_bytes_len + buf_len;

    // Single allocation: room for the slices, followed by the string data.
    const raw = try allocator.alignedAlloc(u8, @alignOf([:0]u8), raw_len);
    errdefer allocator.free(raw);

    // Slice view: covers the whole region in front of the string data.
    const args_slice_bytes_start = 0;
    const args_slice = @as([*][:0]u8, @ptrCast(@alignCast(raw.ptr + args_slice_bytes_start)))[0..args_len];
    // Many-pointer view: occupies the tail of that same region.
    const args_many_bytes_start = args_slice_bytes_len - args_many_bytes_len;
    const args_many = @as([*][*:0]u8, @ptrCast(@alignCast(raw.ptr + args_many_bytes_start)))[0..args_len];
    const buf_start = args_slice_bytes_len;
    const buf = raw[buf_start..];
    // Second call: WASI writes the many-pointers and the string data.
    switch (os.wasi.args_get(args_many.ptr, buf.ptr)) {
        .SUCCESS => {},
        else => |err| return os.unexpectedErrno(err),
    }
    // Widen each many-pointer into a slice in place, front to back.
    for (args_slice, args_many) |*dst, src| {
        dst.* = mem.span(src);
    }
    return args_slice;
}

dimdin · March 10, 2024, 6:04pm

I don’t see any problem.

The source of the library function std.mem.copyForwards is:

pub fn copyForwards(comptime T: type, dest: []T, source: []const T) void {
    for (dest[0..source.len], source) |*d, s| {
        d.* = s;
    }
}

The only stated precondition is:

If the slices overlap, dest.ptr must be <= src.ptr.

Your loop copies in the same way as copyForwards, and the precondition always holds (even for the last element, where the address of s[2] is lower than the address of m[2]).
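
As a small illustration of that precondition, here is a sketch that uses copyForwards on overlapping slices of one buffer. The helper shiftLeft is a hypothetical name. Note that both views here have the same element type, whereas the loop in the question widens each element from a many-pointer to a slice.

```zig
const std = @import("std");

/// Shifts the contents of `buf` left by `n` elements in place.
/// (Hypothetical helper, for illustration only. Requires n <= buf.len.)
fn shiftLeft(buf: []u8, n: usize) void {
    // dest starts at buf.ptr and source starts at buf.ptr + n,
    // so dest.ptr <= source.ptr and the precondition is met.
    std.mem.copyForwards(u8, buf[0 .. buf.len - n], buf[n..]);
}
```

Each source element is read before any later write can reach it, which is why copying front to back works when the destination starts at or before the source.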