Pointer comparisons

I’m moving through an array by pointer and stride and trying to detect when I’m past the end.

test "sdf" {
    var arr = [4]u32{ 1, 2, 3, 99 };
    var ptr: [*]u32 = &arr;
    const end = ptr + 3;

    var s: u32 = 0;
    while (ptr < end) { // compile error here: < not allowed
        s += ptr[0];    // not the real code on this line
        ptr += 2;
    }
    try std.testing.expect(s == 4);
}

(This isn’t the actual code, just an abbreviated version).

Moving by pointer generates better code than indexing in performance-critical paths, and I also need the address, not just the value.

Yeah, multi-item pointers only support addition and subtraction, no other operators. But you can do what you want by converting them to integers and comparing those:

while (@as(usize, @intFromPtr(ptr)) < @as(usize, @intFromPtr(end))) ...

or

while (true) {
    const ptr_int: usize = @intFromPtr(ptr);
    const end_int: usize = @intFromPtr(end);
    if (ptr_int >= end_int) break;
...

NOTE: @castholm clarifies (see below) that @intFromPtr always returns usize, so the following is sufficient:

while (@intFromPtr(ptr) < @intFromPtr(end)) ...
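For completeness, here is the abbreviated test from the top rewritten with that comparison (a sketch; the stride and expected sum are taken from the example above):

```zig
const std = @import("std");

test "stride with @intFromPtr comparison" {
    var arr = [4]u32{ 1, 2, 3, 99 };
    var ptr: [*]u32 = &arr;
    const end = ptr + 3;

    var s: u32 = 0;
    while (@intFromPtr(ptr) < @intFromPtr(end)) {
        s += ptr[0]; // visits arr[0] and arr[2]
        ptr += 2;
    }
    try std.testing.expect(s == 4); // 1 + 3
}
```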

Thank you. This seems like a lot of friction for a simple pointer comparison. The provenance is pretty trivial here too, if that is the worry. Is there a reason for this? I could understand not comparing single-item pointers, but for multi-item pointers it would seem rather natural. Addition already assumes an ordering.

I don’t think so. Modern compilers will optimize this for you. To wit:

int sum(int* numbers, int count) {
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += numbers[i];
    }
    return total;
}

Output from gcc at -O2:

sum:
        test    esi, esi
        jle     .L4
        movsx   rsi, esi
        xor     eax, eax
        lea     rdx, [rdi+rsi*4]
.L3:
        add     eax, DWORD PTR [rdi]
        add     rdi, 4
        cmp     rdi, rdx
        jne     .L3
        ret
.L4:
        xor     eax, eax
        ret

As you can see, the end address is calculated once before the loop (the lea into rdx); inside the loop body (.L3), each iteration performs only an addition, a pointer increment, and a compare-and-branch.


the complex addressing modes ([mem+reg+reg] or [mem+reg+reg*imm]) in the lea instruction require an extra 1-2 cycles per access and have to go through an ALU first. You only get the simple [mem+reg] form for free on the load port.

This is true for x64, but not necessarily for anything else (there is some exotic hardware with crazy addressing modes and specialized hardware to handle them).

@intFromPtr always returns usize so this can be reduced to

while (@intFromPtr(ptr) < @intFromPtr(end)) {}

I think that this @intFromPtr() may seem uncommon to people coming from something like C. But look at the output of zig zen:

  • Communicate intent precisely.
  • Edge cases matter.
  • Favor reading code over writing code.
  • Only one obvious way to do things.
  • Runtime crashes are better than bugs.
  • Compile errors are better than runtime crashes.
  • Incremental improvements.
  • Avoid local maximums.
  • Reduce the amount one must remember.
  • Focus on code rather than style.
  • Resource allocation may fail; resource deallocation must succeed.
  • Memory is a resource.
  • Together we serve the users.

The @intFromPtr thing addresses at least two things: “communicate intent precisely” and “reduce the amount one must remember”.
“Communicate intent precisely”: pointers are meant to point to something (that’s literally the name) and numbers are meant to be operated on. Pointers are not intended to be compared. If you want to compare pointers, compare them as integers because… well… integers are intended to be compared.
“Reduce the amount one must remember”: as C developers, we have to keep in mind that a pointer variable is actually a pointer and that pointers can be compared, while in Zig we only have to remember that we can compare the integers produced by @intFromPtr.
I recommend a talk with Andrew Kelley about Zig (it’s around 5 years old, but still contains many true things): https://youtube.com/watch?v=Gv2I7qTux7g.


It completely hides intention. It turns a pointer into an int, and then you turn it back again. Passing it around as an int completely hides what it really is.

Pointer math is already defined for many-item pointers (ptr + int, ptr - int, ptr - ptr). Those imply an ordering already, so ptra < ptrb should be allowed. I hit this last week when I was striding through an array in chunks (ptr += 4) and testing when to terminate the loop.

pub fn next(s: *@This()) ?B.Out {
    var ptr_int: usize = @intFromPtr(s.ptr);
    const end_int: usize = @intFromPtr(s.ptr + s.len);
    while (ptr_int < end_int) {
        const item_ptr: *B.In = @ptrFromInt(ptr_int);
        if (item_ptr.* == needle)
            return item_ptr.*;
        ptr_int += @sizeOf(B.In) * s.stride;
    }
    return null;
}

I now have to deal with @sizeOf and other machinery that destroys the simple meaning of the code. Now you always have to know about the functions @intFromPtr and @ptrFromInt, and you have to know that the int you are passing around is really a pointer (and not forget to convert it back). And if you store it as an int in a struct to avoid all the casting, then you really have to document and remember that.

pub fn next(s: *@This()) ?B.Out {
    var ptr = s.ptr;
    const end = s.ptr + s.len;
    while (ptr < end) {
        if (ptr.* == needle)
            return ptr.*;
        ptr += s.stride;
    }
    return null;
}

The second version makes my intentions far clearer. Multi-item pointers clearly have an ordering: ptr < ptr + 1 is always true except on wraparound, and that could be tested for the same way as with ints. The restriction turns a trivial, very well-defined piece of code into a casting mess that obscures everything else.
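The arithmetic that is allowed on many-item pointers can be checked in isolation (a sketch against current Zig; the last comparison is the one this thread is about):

```zig
const std = @import("std");

test "many-item pointer arithmetic" {
    var arr = [4]u32{ 10, 20, 30, 40 };
    const base: [*]u32 = &arr;

    const p = base + 3; // ptr + int
    const q = p - 1;    // ptr - int
    try std.testing.expect(q[0] == 30);
    try std.testing.expect(p - base == 3); // ptr - ptr: element distance

    // `base < p` does not compile; the comparison has to go through integers:
    try std.testing.expect(@intFromPtr(base) < @intFromPtr(p));
}
```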


I agree with this.

In my opinion, it’s good that Zig has a pointer-taxonomy where different pointers have different characteristics and semantics associated with them. I don’t think that phrasing every problem in the language of unsigned ints is practical, more clear, or safer.

If you find that you’re doing a lot of odd and asymmetric math to get the result you want, consider a different approach. I found this was especially true when dealing with decrementing pointers and having to deal with unsigned values approaching zero.

Yes, you can do saturating operations in that specific direction, but… why bother? They aren’t safer or more performant and imo introduce cognitive overhead to what is otherwise a very simple problem.

If you just need to compare pointers, I think this suggestion is perfectly sufficient and works well with the pointer taxonomy:

@intFromPtr(ptr) < @intFromPtr(end)

If you are moving pointers and need to introduce functionality, consider using a different tool for the job.

I was just thinking: although I agree with you and recommend you open an issue (you make a very strong case), in this specific example I think a clearer solution that works is

while (ptr != end) {

The step size is greater than 1, so you can go past end without ever touching it. The only other way is to do a mod and precompute the exact address the pointer will land on, and that’s a performance hit in code I’m trying to shave cycles on.
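A minimal sketch of that failure mode: with a stride of 2 over a 3-element region, the pointer steps from arr + 2 straight to arr + 4 and is never equal to end:

```zig
const std = @import("std");

test "stride 2 steps over end without touching it" {
    var arr = [6]u32{ 1, 2, 3, 4, 5, 6 };
    var ptr: [*]u32 = &arr;
    const end = ptr + 3; // one past a 3-element region

    ptr += 2; // arr + 2: still in range
    try std.testing.expect(@intFromPtr(ptr) < @intFromPtr(end));

    ptr += 2; // arr + 4: past end, but `ptr != end` would still be true
    try std.testing.expect(@intFromPtr(ptr) != @intFromPtr(end));
    try std.testing.expect(@intFromPtr(ptr) > @intFromPtr(end));
}
```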


Totally forgot that detail. Bummer.