Iterating a linked list of optional pointers

Question:

How does one iterate a (single) linked list structure and print the data of its members, when the members are linked through optional pointers like below?

Context:

Newbie to Zig here, I am trying to implement a member function which prints a single linked list by iterating over its elements.

I have the following structs, where Bord is a “list” of multiple Vakje structs:

pub const Vakje = struct {
    next: ?*Vakje = null,
    data: u8 = 0,

    pub fn addNext(vak: *Vakje, new: *Vakje) void {
        if (vak.next == null) {
            vak.next = new;
            std.debug.print("{s}\n", .{"added next!"});
        }
    }
};

pub const Bord = struct {
    name: []const u8 = undefined,
    first: ?*Vakje = null,
}

Most member functions are left out. I then tried to iterate over a Bord struct’s internal Vakje structs in some ways, which seems to not work properly (code most likely not useful for reader, can provide if wanted).
I have tried to adapt the linked list code from the zig std library to no avail;
https://github.com/ziglang/zig/blob/f9f894200891c8af6ce3a3ad222cd0bf1ee15587/lib/std/linked_list.zig#L56

  /// Iterate over each next node, returning the count of all nodes except the starting one.
  /// This operation is O(N).
  pub fn countChildren(node: *const Node) usize {
      var count: usize = 0;
      var it: ?*const Node = node.next;
      while (it) |n| : (it = n.next) {
          count += 1;
      }
      return count;
  }

I think this is due to a lack of understanding how optional pointers work. Unfortunately I have not been able to find an example, and something along the lines of https://ziggit.dev/t/iterating-optional-error-unions-i-e-t/5217
is above my current level of understanding, and does not seem to be the same question.

Question:

How does one iterate a (single) linked list structure and print the data of its members, when the members are linked through optional pointers like above?

I don’t understand how you intend to access the child elements with only the child pointer, you would at least need something like a next pointer that points to the next sibling from the first child.

Are you trying to create a tree structure or single linked list?
Is child used as first child and then next sibling in the later Vakje instances?

I guess the part that is confusing me is that Vakje can only have 0 or 1 children, unless that child node isn’t used somehow to point to a sibling.

1 Like

Hello @MeepMeep welcome to ziggit :slight_smile:

If child is the next item in the list, you can count the elements with the following function:

fn count(bord: Bord) usize {
    var next: ?*Vakje = bord.first;
    var result: usize = 0;
    // if next is null, while stops, otherwise vakje is the not null pointer
    while (next) |vakje| {
        // vakje type is *Vakje, we can access the members directly
        next = vakje.child;
        result += 1;
    }
    return result;
}
1 Like

Sorry for the confusion, as this is a pet project it went over my head that I was using confusing terminology :sweat_smile: .

My intention is to create a single linked list, taking heavy inspiration from here, in the std.
This means Bord is the surrounding struct containing Vakje structs, which point in one direction to each other. Bord keeps the first pointer as member variable, and we start traversing from there.

I have updated my original post, hopefully it is now more clear.

1 Like

Thanks @dimdin !
1. If I understand correctly, the payload capture is *Vakje and not ?*Vakje type as the while-condition guarantees that the captured payload is not a null pointer, yes?

While this answer helps, I would like to shamefully try to ask a bit more, as my original goal was to print the data of each vakje struct while iterating as well. At this point an MVE/MVP would probably be better forum etiquette, but I hope this passes too. In return I’ll try my best to update the question for feature forum readers when we find an answer :slight_smile:

Shamefully asking for more

My main function looks like this;

pub fn main() !void {
    var bord = Bord{ .name = "standaard bord" };
    _ = &bord;

    const values = [_]u8{1, 2, 3};

    for (values) |vv| {
        std.debug.print("{c}", .{vv});
    }

    bord.addFromArray(&values);
    std.debug.print("result: {d}", .{bord.count()});
}

Side question
2. Why does this code here not print the array elements. {c} seems to be a valid formatting, but what does it represent?
Initialising the board
The addFromArray member function then adds these values as vakje structs to a board, like so;

 pub fn addFromArray(bord: *Bord, vakjes: []const u8) void {
    if (vakjes.len > 0) {
        var vakje = Vakje{ .data = vakjes[0] };
        bord.first = &vakje;
    }
  
    var it = bord.first;
  
    for (vakjes[0 .. vakjes.len - 1]) |element| {
        var vakje = Vakje{ .data = element };
        it.?.addNext(&vakje);
        it = it.?.next;
    }
 }

I then call the count function, like you gave me;

// notice the * here, does that matter?
pub fn count(bord: *Bord) usize {
    var n: ?*Vakje = bord.first;
    var result: usize = 0;
    // if next is null, while breaks
    while (n) |vakje| {
        n = vakje.next;
        result += 1;
        // the second debug print causes a loop to occur,
        // while the first one "correctly" results in a segfault
        // after indeed every element has been iterated over
        std.debug.print( "result: {d}\n", .{result});
        // std.debug.print("result: {d} with data: {d}\n", .{ result, vakje.data });
    }
    return result;
}

3. It does not seem to make a difference if the parameter of count() is *Bord type of Bord type, but why? I would think one would need to deference as well if one only passes a non-pointer Bord type.

4. Remember that data has u8 type. Why does the currently not commented debug print give a “correct” output with a segfault

added next!
added next!
result: 1
result: 2
result: 3
Segmentation fault at address 0x16 @ n = vakje.next;
                                              ^

but the second debug print leads to a infinite loop of increasing “result” and overflowing “data” ?

1 Like

Welcome to Ziggit, @MeepMeep!

It does make a difference but that difference isn’t always important. Since count doesn’t mutate Bord, you can use count(bord: *const Bord) or count(bord: Bord), and the result will usually be the same.

Zig has Parameter Reference Optimization, so when a parameter is constant, the ‘preferred’ style is to pass as a reference: count(bord: Bord). This gives the compiler the choice of whether to pass by pointer or pass by value. Accessing the struct works the same way in both cases, the mutable pointer case comes with some differences you can read about in the documentation.

This comes with a caveat, however: there are some unsolved aliasing issues which can arise from the ambiguity between a reference and a value using the reference form count(bord: Bord). There’s a good video about this if you want to know more. I use references in my own code, TigerBeetle, which has very high requirements for robustness and reliability, currently has a policy of pass-by-constant-pointer. It’s just something to be aware of, if you learn how aliasing works, you can code around the possibility of it. There are plans afoot to address this directly at some point.

2 Likes

Yes, the condition type is with optional and the payload type is without optional.
See also while with Optionals and if with Optionals

It needs {d} in format to display an integer.
{c} means print a character, and ascii codes for 1,2 and 3 are invisible.
See: std.fmt.format

Normally you want to use *Bord if you need to modify it. Otherwise you use *const Bord or Bord. When Bord is used, zig decides if the call is by value or by reference.
See: Functions / Pass by value Parameters

  1. In addFromArray
    for (vakjes[0 .. vakjes.len - 1]) |element| {
        var vakje = Vakje{ .data = element };
        it.?.addNext(&vakje);
        it = it.?.next;
    }

vakje variable is local, when the block exits its lifetime ends and the same storage is reused for the next loop iteration. Actually each &vakje used in addNext must have the same address, in the stack.
When the function exits, the stack storage is replaced by the next function local variables that turns the pointers to garbage.

2 Likes

@mnemnion , thanks for the great references! I got some studying to do I see : )

@dimdin , thanks for the great answers! Regarding 4., I unfortunately can fully follow your line of reasoning (or at least apply it in practice). If I understand you correctly, you mean with

When the function exits, …

that the address passed to the function addNext is different on each for loop iteration, which leads to the pointer corruption?

I’ve tried making vakje not local to the while loop;

var it = bord.first;
var vakje = Vakje{ .data = 0 }; //dummy struct

for (vakjes[0 .. vakjes.len - 1]) |element| {
    vakje = Vakje{ .data = element };
    it.?.addNext(&vakje);
    // it.?.addNext(&Vakje{ .data = element});
    it = it.?.next;
}

I’ve also tried to pass an argument to addNext in place,

// -- it.?.addNext(&vakje);
// ++
it.?.addNext(&Vakje{ .data = element});

but this fails because the argument passed must be a non-const pointer as addNext mutates it.

Questions

5. How would one (preferably with an code example) fix
the addFromArray (and possibly addNext) function in order to not get the addition of the next pointer get turned to garbage by the stack changing?

6. Am I correct in stating that the problem was that the variable vakje was local to the for loop, and moving its initialisation to lower (see table) in the table is the correct way to alleviate this?

1 2 3 4
addNext
for vakjes for vakjes
addFromArray addFromArray addFromArray count
main main main main
1 Like

Using local variables is not appropriate for your case, you have two problems:

  • There is only one instance (every time you call &vakje you get the same location and you override the same data).
  • The life of the variable ends when the code exits the declared block (the function in the latest case).

What you need is an allocator. You call var vakje = try allocator.create(Vakje); to allocate memory and allocator.destroy(vakje); to free the memory.
You can start with the GeneralPurposeAllocator:

    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

Something important to remember is that an allocator allocates memory but does not initialize the memory. With var vakje: *Vakje = try allocator.create(Vakje); you get memory, and vakje is a pointer to Vakje, but vakje.next is not initialized to null and vakje.data is not initialized to 0.