Understanding when passing a pointer to a struct is needed and when it’s not

I was solving this problem in Zig and came up with the below function signature (I’m not sure of its correctness):

fn insertInterval(alloc: std.mem.Allocator, intervals: []Interval, new_interval: *Interval) ![]Interval {
    // implementation
}

I tried declaring the third parameter without a pointer (new_interval: Interval instead of new_interval: *Interval), but running the unit tests gave errors.

I was able to get around them by changing things based on the error messages, but I don’t understand it clearly:

  • Why does the third parameter have to be new_interval: *Interval instead of new_interval: Interval?

  • Why does the second parameter, intervals: []Interval, work without a pointer?


I’d also like to know if there’s a better way to go about it.

Implementation & Tests
const std = @import("std");
const testing = std.testing;
const test_alloc = testing.allocator;

const Interval = struct {
    start: u16,
    end: u16,
};

fn insertInterval(alloc: std.mem.Allocator, intervals: []Interval, new_interval: *Interval) ![]Interval {
    var periods: std.ArrayListUnmanaged(Interval) = .empty;
    // Free the partially built list if a later append fails.
    errdefer periods.deinit(alloc);

    for (0..intervals.len) |i| {
        if (new_interval.end < intervals[i].start) {
            // The pending interval ends first: emit it and carry intervals[i] forward.
            try periods.append(alloc, new_interval.*);
            new_interval.* = intervals[i];
        } else if (new_interval.start > intervals[i].end) {
            // intervals[i] ends first: emit it unchanged.
            try periods.append(alloc, intervals[i]);
        } else {
            // Overlap: widen the pending interval. This writes through the caller's pointer.
            new_interval.start = @min(new_interval.start, intervals[i].start);
            new_interval.end = @max(new_interval.end, intervals[i].end);
        }
    }

    try periods.append(alloc, new_interval.*);

    return periods.toOwnedSlice(alloc);
}

test "new interval overlaps with the first interval" {
    var initial = [2]Interval{
        .{ .start = 1, .end = 3 },
        .{ .start = 6, .end = 9 },
    };

    var new = Interval{ .start = 2, .end = 5 };

    const expected = [2]Interval{
        .{ .start = 1, .end = 5 },
        .{ .start = 6, .end = 9 },
    };

    const result = try insertInterval(test_alloc, &initial, &new);
    defer test_alloc.free(result);

    try testing.expectEqualSlices(Interval, &expected, result);
}

test "new interval overlaps with three current intervals" {
    var initial = [5]Interval{
        .{ .start = 1, .end = 2 },
        .{ .start = 3, .end = 5 },
        .{ .start = 6, .end = 7 },
        .{ .start = 8, .end = 10 },
        .{ .start = 12, .end = 16 },
    };

    var new = Interval{ .start = 4, .end = 8 };

    const expected = [3]Interval{
        .{ .start = 1, .end = 2 },
        .{ .start = 3, .end = 10 },
        .{ .start = 12, .end = 16 },
    };

    const result = try insertInterval(test_alloc, &initial, &new);
    defer test_alloc.free(result);

    try testing.expectEqualSlices(Interval, &expected, result);
}

See:

Arguments are immutable. If you wish to mutate a value passed as an argument, you must pass a pointer to the value instead. Then you can mutate the dereferenced value.

The obvious answer, then, as to why you don’t need a *[]Interval for intervals is that you’re not mutating the slice itself. A slice is a pointer plus a length, so you can still modify the values it points to. You would only need a *[]Interval to modify the slice itself (its length and where its pointer points).
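On the "better way" part of the question, one option is to take the interval by value and copy it into a local variable, so the caller's interval is never touched. This is only a sketch under that assumption: insertIntervalByValue and pending are names made up here, not part of the original code, and the slice is taken as []const Interval because the function only reads it.

```zig
const std = @import("std");

const Interval = struct {
    start: u16,
    end: u16,
};

// Hypothetical by-value variant: the caller's interval is left untouched.
fn insertIntervalByValue(
    alloc: std.mem.Allocator,
    intervals: []const Interval,
    new_interval: Interval,
) ![]Interval {
    var periods: std.ArrayListUnmanaged(Interval) = .empty;
    errdefer periods.deinit(alloc);

    // Parameters are immutable, so copy into a local `var` to get something mutable.
    var pending = new_interval;

    for (intervals) |current| {
        if (pending.end < current.start) {
            try periods.append(alloc, pending);
            pending = current;
        } else if (pending.start > current.end) {
            try periods.append(alloc, current);
        } else {
            pending.start = @min(pending.start, current.start);
            pending.end = @max(pending.end, current.end);
        }
    }

    try periods.append(alloc, pending);
    return periods.toOwnedSlice(alloc);
}
```

With this signature the tests could pass new directly instead of &new, and new could be declared const.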


Also see:

When you pass &initial as an argument, you are coercing the array to a slice.
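A small sketch of that coercion, and of writing through a slice without needing a pointer to the slice. shiftAll is a hypothetical helper invented for this illustration.

```zig
const std = @import("std");

const Interval = struct { start: u16, end: u16 };

// Hypothetical helper: writes through the slice, which modifies the caller's array.
fn shiftAll(intervals: []Interval, delta: u16) void {
    for (intervals) |*iv| {
        iv.start += delta;
        iv.end += delta;
    }
}

test "pointer to array coerces to a slice" {
    var initial = [2]Interval{
        .{ .start = 1, .end = 3 },
        .{ .start = 6, .end = 9 },
    };

    // &initial has type *[2]Interval; it coerces to []Interval at the call site.
    shiftAll(&initial, 10);

    try std.testing.expectEqual(@as(u16, 11), initial[0].start);
    try std.testing.expectEqual(@as(u16, 19), initial[1].end);
}
```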


Just to clarify, []Interval is a pointer type (a slice is a pointer + a length).


Q: When is passing a pointer to a struct needed?
A1: When the function needs to modify it and the caller needs to see that change.
A2: When you need to make a comparison using its address. E.g. say you wanted to know if the given *Interval is in the []Interval slice (by reference, not value).

Q: When don’t you need to pass a pointer to a struct?
A1: You only care about its value, not its reference. E.g. if you only needed to know if there was an existing Interval in an []Interval, but nothing is modified.
A2: You want to make a copy, or you want to ensure that no changes are made to the locally scoped parameter.
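The difference between comparing by value and comparing by address can be sketched roughly as follows. containsValue and containsElement are hypothetical helpers written for this example.

```zig
const std = @import("std");

const Interval = struct { start: u16, end: u16 };

// Value comparison: is there an element with the same field values?
fn containsValue(intervals: []const Interval, needle: Interval) bool {
    for (intervals) |iv| {
        if (std.meta.eql(iv, needle)) return true;
    }
    return false;
}

// Identity comparison: does the pointer refer to an element of this slice?
fn containsElement(intervals: []const Interval, needle: *const Interval) bool {
    for (intervals) |*iv| {
        if (iv == needle) return true;
    }
    return false;
}
```

A copy of an element matches by value but not by address, which is why the second helper has to take a pointer.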

In addition: when it is a big struct and you are only interested in the value, use *const Interval. That’s what I do.

Or not? What is the state of affairs around the “pass by value or by reference” optimization?

I think it’s always safe to just pass the struct by value if you only want to read its fields and don’t need their original addresses. If the compiler decides that the struct is too large to copy around, it will just use a reference anyway.

In this little test the compiler even turns bsByValue into a call to bsByConstRef:

const BigStruct = extern struct {
    header: [100]u64,
    important_data: [100000]u32,

    pub fn byValue(bs: BigStruct, header_idx: usize) u32 {
        const data_idx = bs.header[header_idx];
        return bs.important_data[data_idx];
    }
    pub fn byConstRef(bs: *const BigStruct, header_idx: usize) u32 {
        const data_idx = bs.header[header_idx];
        return bs.important_data[data_idx];
    }
};

extern var big_struct: BigStruct;

export fn bsByValue(header_idx: usize) u32 {
    return big_struct.byValue(header_idx);
}

export fn bsByConstRef(header_idx: usize) u32 {
    return big_struct.byConstRef(header_idx);
}

(godbolt)

I personally only use *const ... when I need to read a value in the struct from its original address and a copy wouldn’t produce correct code (for example *const std.atomic.Value(usize)).

That way intent is always clear:

  • fn (t: T) → only needs to read raw value
  • fn (t: *const T) → needs to read value from its original address
  • fn (t: *T) → will potentially modify t

But often the compiler cannot do this, because the reference might then alias something else. For example, someone could call a function like fn insertInterval(alloc: Allocator, intervals: []Interval, new_interval: Interval) ![]Interval with a new_interval that is an element of the passed intervals slice. If new_interval is passed by value, it is a copy and stays constant inside the function even if the function changes the elements of the slice. But if the compiler secretly replaces it with a reference, the value of new_interval will suddenly change partway through the function, and the function’s behavior might change with it.
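To make the aliasing concern concrete, here is a sketch that uses an explicit pointer, so the result does not depend on what the optimizer decides. clearAndRead is a hypothetical function invented for this illustration.

```zig
const std = @import("std");

const Interval = struct { start: u16, end: u16 };

// Hypothetical function: writes to the slice, then reads through the pointer.
fn clearAndRead(intervals: []Interval, probe: *const Interval) u16 {
    intervals[0].start = 0;
    // If probe points at intervals[0], this read observes the write above.
    return probe.start;
}

test "a pointer that aliases the slice sees the mutation" {
    var items = [_]Interval{ .{ .start = 7, .end = 9 } };

    // A separate copy is unaffected by the write to the slice.
    const copy = items[0];
    try std.testing.expectEqual(@as(u16, 7), clearAndRead(&items, &copy));

    // A pointer into the slice itself observes the write.
    items[0].start = 7;
    try std.testing.expectEqual(@as(u16, 0), clearAndRead(&items, &items[0]));
}
```

A by-value parameter is supposed to behave like the first call in every case, which is what a hidden switch to a reference would put at risk.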

I think this pass-by-reference optimization will only be done for “pure” functions in the future (according to ziglang/zig issue #5973, “eliminate hidden pass-by-reference footguns”), so for big structs it is a good idea to pass them by reference explicitly (*const Interval), maybe also with noalias.
