Set pointer to null

I wrote some iterator with a pointer like this:

ptr: ?[*]const Node,

get the next:

    pub fn next(self: *NodeIterator) ?*const Node {
        if (self.ptr) |p| {
            defer {
                if (!p[0].is_last) self.ptr.? += 1 else self.ptr = null;
            }
            return @ptrCast(p);
        }
        return null;
    }

Is it possible to do this without nullable? So with this:

ptr: [*]const Node

And a nice and legal way to set my ptr to “nothing”?

Or otherwise a nicer way to do it.
I don’t like that defer statement.

It seems that the isLast check is inexpensive (on par with null check), so I’d suggest making ptr non-nullable and check if the end has been reached on each call to next. I’d write it out as follows:

ptr: [*]const u32,
end: [*]const u32,

pub fn next(self: *NodeIterator) ?*const u32 {
    if (self.ptr == self.end) return null;
    defer self.ptr += 1;
    return @ptrCast(self.ptr); // or &self.ptr[0];
}

The defer statement here does the post-increment (see Zig defer Patterns).
In general, storing one-past-the-end pointers in iterators is common practice and is simpler as the mechanism for breaking at the last element is not needed.

Impossible. is_last is a little bitflag indicating “i am the last in this sequence” so we do not know end in advance. Otherwise that would be nice.

In any case, the defer here is unnecessary since p is constant and will not change until the return statement.

You could do this:

maybe_ptr: ?[*]const Node,

pub fn next(self: *NodeIterator) ?*const Node {
    const ptr = self.maybe_ptr orelse return null;
    self.maybe_ptr = if (ptr[0].is_last) null else ptr + 1;
    return &ptr[0];
}
2 Likes

No. [*] are, by design, not allowed to be null. Note that this doesn’t cost you anything, as optional pointers are special cased such that null is 0. You could point your pointer to some other address, like 1, to signify null, but you would not get anything out of this, only lose, as checking for 0 is faster than checking for any other number.

1 Like

Ok i like the orelse. Still I find - in general - actions with nullable’s a bit awkward - syntactal.

        const p = self.ptr orelse return null;
        if (!p[0].is_last) self.ptr.? += 1 else self.ptr = null;
        return @ptrCast(p);

That is why I collapsed it down into a single assignment to the optional, that uses an if expression.

1 Like

i very much the orelse though :slight_smile:

1 Like
const p = self.ptr orelse return null;
const result = &p[0];
self.ptr = if (result.is_last) null else p + 1;
return result;

Important Zig idioms:

  • Avoid .?
  • Avoid @ptrCast

Edit: oops, @Sze beat me to it!

3 Likes

Yes! That ? was bothering me as well!
Just use p

edit: i did not know this was possible… now i do.

const result = &p[0];

You could also store prev (non-optional) instead of next.

const result = if (self.ptr[0].is_last) return null else &self.ptr[1];
self.ptr = result;
return result;
1 Like

But the trouble with that is that you would need a pre-sentinel (sentinel node that is before the first node) that is in contiguous memory before the start of the array, which has .is_last set to false, otherwise the first element of the list is never yielded.

With linked lists the iterator could hold a dummy node that points to the first element and with that it would work, but I don’t know how to make it work with contiguous arrays, except having an extra element at the beginning.

If the to be iterated list has some sort of header struct, it might be possible to design the header in such a way that its memory layout happens to work as that dummy node.

You could store two nodes in the iterator like this:

curr: [*]const Node,
prev: *const Node,

pub fn next(self: *NodeIterator) ?*const Node {
    if (self.prev.is_last) return null;
    const result = &self.curr[0];
    self.prev = result;
    self.curr += 1;
    return result;
}

and start your iteration like this:

const never_last_node = Node{ .is_last = false };

pub fn init(node: *const Node) NodeIterator {
    return .{
        .curr = @ptrCast(node),
        .prev = &never_last_node,
    };
}

Then all the pointer arithmetic depends on curr and prev can point anywhere you want on the first iteration.
On the last iteration curr might point to an invalid memory location but that shouldn’t be a problem since you never dereference or return it.

1 Like

Philosofically it is maybe incorrect that this ptr is an optional while it never is null.
But it leads to the shortest and probably fastest code.

// optional
const ChildIterator1 = struct {
    ptr: ?[*]const Node,

    fn init_or_null(nodes: *const Nodes, parent: *const Node) ?ChildIterator1 {
        if (parent.childptr == 0) return null;
        return ChildIterator1 {
            .ptr = nodes.items[parent.childptr..].ptr,
        };
    }

    pub fn next(self: *ChildIterator1) ?*const Node {
        const p = self.ptr orelse return null;
        self.ptr = if (p[0].is_last) null else p + 1;
        return &p[0];
    }
};

// non-optional
const ChildIterator2 = struct {
    ptr: [*]const Node,
    end: bool,

    fn init_or_null(nodes: *const Nodes, parent: *const Node) ?ChildIterator2 {
        if (parent.childptr == 0) return null;
        return ChildIterator2 {
            .ptr = nodes.items[parent.childptr..].ptr,
            .end = false,
        };
    }

    pub fn next(self: *ChildIterator2) ?*const Node {
        if (self.end) return null;
        const p = self.ptr;
        self.end = p[0].is_last;
        self.ptr = p + 1;
        return &p[0];
    }
};

It is null once iteration stops.
I haven’t compared speed, but I like it because the code seems simpler, it also needs less memory.

I find the naming of

confusing when it is an index, I think child or child_id or child_idx would be better, or maybe even first_child, that way it would make more sense in combination with is_last.

Also I don’t think your init_or_null method makes a lot of sense, it should just be init and always return the ChildIterator (because I don’t see a good reason to make empty into a special case when it could just be the empty sequence, where the iterator’s next yields null immediately):

    fn init(nodes: *const Nodes, parent: *const Node) ChildIterator1 {
        return .{
            .ptr = if (parent.first_child == 0) null else nodes.items[parent.first_child..].ptr,
        };
    }

That way the usage of the iterator becomes simpler:

var it: ChildIterator1 = .init(nodes, parent);
while(it.next()) |node| { ... }

Using the else with while to do something once the iterator is done, is also useful.

If you need to write separate code for nodes that don’t have children, I think the intent would be more clear with something like parent.hasChildren() and branching on that, instead of making the iterator creation conditional (and less clear because it serves different purposes (iterator creation vs child-existence-check)).

Thanks. Yes, I was not 100% satisfied with the childptr name.
At the moment of writing or_null seemed the clearest to me. Not sure yet!
It reminds me of the std hash where there was (i believe) some ...OrNull and the name was changed.

In general I like one-liners

True but I think more cpu time. I need every inch of that.

But you also very right. the pre-check in my code node.has_chidren() is probably better.

this

// Try rack.
... else if (rack.len > 0)
{
    var iter = self.graph.get_child_iterator_or_null(node) orelse return;
    while (iter.next()) |child| {} // do my thing
}

i could change to

// Try rack.
... else if (rack.len > 0 and node.has_children())
{
    var iter = self.graph.get_child_iterator(node);
    while (iter.next()) |child| {} // do my thing
}

That being said: if there are no children the init() function should crash when there are no children :slight_smile:

One thing you could do here if you know that your nodes are always align>1 is storing is_last in the LSB of ptr to half the size of the non-optional ChildIterator version like this:

const ChildIterator2 = struct {
    ptr: [*]align(1) const Node,

    fn init_or_null(nodes: *const Nodes, parent: *const Node) ?ChildIterator2 {
        if (parent.childptr == 0) return null;
        return ChildIterator2 {
            .ptr = nodes.items[parent.childptr..].ptr,
        };
    }

    pub fn next(self: *ChildIterator2) ?*const Node {
        if ((@intFromPtr(self.ptr) & @as(usize, 1)) == 1) return null;
        const p = self.ptr
        self.ptr = @ptrFromInt(@intFromPtr(p + 1) | @as(usize, @intFromBool(p[0].is_last)));
        return @alignCast(&p[0]);
    }
};

I’m not sure if this could mess with the prefetcher in a bad way (maybe somebody here knows more about such things) but if not this might be faster than the optional version because there is only one very predictable branch per next() call and it takes up the same amount of memory.