Algore: A collection of small modules

7R35C0 · November 21, 2024, 12:44pm

It’s been a while since my last post, and first of all a warm greeting to everyone!

In the meantime, I’ve been working on a module with various common functions used in singly linked lists.

I know Zig already has SinglyLinkedList and for many it may not seem like an achievement, but considering that I started from scratch without any information about what these lists are, I am very happy with the result.

I don’t want to waste your time and I’ll get straight to the point.

About algore

Algore is a collection of small modules implemented in the Zig programming language. If you want, you can call it a library, but this title seems too much to me, compared to what it is now.

Each module has its own repo, for standalone use case, see the links below.

All code has been tested only with zig version 0.13.0 on Linux Fedora 39.

Examples

//:========================================================================
//: Licensed under the [MIT License](LICENSE).
//:
//: Tested with zig version 0.13.0 on Linux Fedora 39.
//:========================================================================

const std = @import("std");
const assert = std.debug.assert;

const countDigit = @import("algore").digit.countIterative;
const SinglyLinkedList = @import("algore").singly_linked_list.SinglyLinkedList;

pub fn main() !void {

    //^--------------------------------------------------------------
    //^ Module `digit`
    //^--------------------------------------------------------------

    {
        //+ Supported types.
        {
            assert(countDigit(0) == 1);
            assert(countDigit(0.0) == 1);
            assert(countDigit(@as(u32, 0)) == 1);
            assert(countDigit(@as(i32, 0)) == 1);
            assert(countDigit(@as(f32, 0.0)) == 1);
        }

        //+ Unsupported types, returns `null`.
        {
            assert(countDigit(std.math.maxInt(u129)) == null);
            assert(countDigit(@as(u129, std.math.maxInt(u129))) == null);
            assert(countDigit(std.math.minInt(i130)) == null);
            assert(countDigit(@as(i130, std.math.minInt(i130))) == null);
            assert(countDigit(@as(f16, 2050)) == null);
        }
    }

    //^--------------------------------------------------------------
    //^ Module `singly_linked_list`
    //^--------------------------------------------------------------

    {
        const Typ = usize;
        const Lst = SinglyLinkedList(Typ);
        const Nod = Lst.Node;

        var nod0 = Nod{ .data = 0, .next = null };
        var nod1 = Nod{ .data = 1, .next = null };
        var nod2 = Nod{ .data = 2, .next = null };
        var nod3 = Nod{ .data = 3, .next = null };
        var nod4 = Nod{ .data = 4, .next = null };

        var lst0 = Lst{};

        //+ Node `data` field type.
        {
            assert(Nod.Data == Typ);
            assert(@TypeOf(nod0).Data == Typ);
        }

        //+ Node `link()` and `unlink()` functions
        {
            assert(nod0.length() == 1);

            nod0.link(&nod1);
            nod0.link(&nod2);
            nod0.link(&nod3);
            nod0.link(&nod4);

            assert(nod0.length() == 5);

            nod0.unlink(&nod4);
            nod0.unlink(&nod3);
            nod0.unlink(&nod2);
            nod0.unlink(&nod1);

            assert(nod0.length() == 1);
        }

        //+ List `data` field type.
        {
            assert(Lst.Node.Data == Typ);
            assert(@TypeOf(lst0).Node.Data == Typ);
        }

        //+ List `insertLast()` and `removeLast()` functions
        {
            assert(lst0.length() == 0);
            assert(nod0.length() == 1);

            lst0.insertLast(&nod0);
            lst0.insertLast(&nod1);
            lst0.insertLast(&nod2);
            lst0.insertLast(&nod3);
            lst0.insertLast(&nod4);

            assert(lst0.length() == 5);
            assert(nod0.length() == 5);

            lst0.removeLast();
            lst0.removeLast();
            lst0.removeLast();
            lst0.removeLast();
            lst0.removeLast();

            assert(lst0.length() == 0);
            assert(nod0.length() == 1);
        }
    }
}

Module singly_linked_list

The module contains functions used in singly linked lists.

These functions do not make assertions, can lead to infinite loops in code and cycles in nodes or lists.

As a general rule, cycles should be broken before any other changes. Almost all functions are candidates for applying this rule. Keep this in mind when infinite loops occur in code and their causes are not very obvious.

Node functions

link() - link the node to another node
unlink() - unlink the node from another node
insert() - insert the node in node, before tail node
remove() - remove the node from node, before tail node
last() - return the last node of node
find() - return the node, if node is in node, otherwise null
previous() - return the previous node of node, if node is in node, otherwise null
loop() - return the loop node of node, if node has a loop, otherwise null
breakLoop() - break the node cycle
reset() - reset the node and its nodes
reverse() - reverse the link order of nodes, from a node
length() - return the node length
isLinked() - assert the link status of node
hasNode() - assert the node status in node
hasLoop() - assert the loop status of node
printNode() - print the node data values
toList() - convert the node to a list

List functions

insertFirst() - insert the node at first position in list
insertLast() - insert the node at last position in list
insertBefore() - insert the node in list, before tail node
removeFirst() - remove the first node from list
removeLast() - remove the last node from list
removeBefore() - remove the node from list, before tail node
removeNode() - remove the node from list
removeAll() - remove all nodes from the list
last() - return the last node of list, if list is not empty, otherwise null
find() - return the node, if node is in list, otherwise null
previous() - return the previous node of node, if node is in list, otherwise null
loop() - return the loop node of list, if list has a loop, otherwise null
breakLoop() - break the list cycle
reset() - reset the list and its nodes
reverse() - reverse the list in-place
length() - return the list length
isEmpty() - assert the empty status of list
isLinked() - assert the link status of list
hasNode() - assert the node status in list
hasLoop() - assert the loop status of list
printList() - print the list data values
toNode() - convert the list to a node, if the list is not empty, otherwise return null

Module digit

The module contains functions that return the number of digits for a number, using various algorithms.

All functions return the number of digits for a supported number, otherwise null.

After working quite a bit with the optional types in the singly_linked_list module, I started to like them.

I can mention a few advantages I have seen in using optionals:

makes the code more clearer and the development process much, much faster
- no more struggles to make error messages, which aren’t very helpful anyway, most users end up checking the code itself
makes the testing process safer and closer to real use cases
- the error cases return null, which can be tested very easily
indirectly makes you write a better and more on-topic documentation
- it’s easier to write a few lines of what it does than to cover all error cases with messages what it doesn't

I’m not saying that error sets and messages don’t have their place, but in general, optionals can save you a lot of time and hassle.

Back on track, these functions are actually the old digit counting functions that now return null and are in a single file and module.

If this change bothers anyone, I’m very sorry.

Supported number types and ranges:

.ComptimeInt, .Int: ∓340282366920938463463374607431768211455
- min(i129) = -340282366920938463463374607431768211456 it is accepted to cover the type
.ComptimeFloat, .Float:
- comptime_float: ∓10384593717069655257060992658440193
- f16: ∓2049
- f32: ∓16777217
- f64: ∓9007199254740993
- f80: ∓36893488147419103234
- f128: ∓10384593717069655257060992658440193

Note that for float numbers the type is important, for example:

2050 as f16: returns null
2050 as f32: returns 4, same as for any type greater than f16

Negative numbers are converted to their absolute value.

Digit functions

countIterative() - uses an iterative algorithm
countLogarithmic() - uses a logarithmic algorithm
countLookup() - uses a lookup algorithm
countRecursive() - uses a recursive algorithm
countStringify() - uses a stringify algorithm
countSwitcher() - uses a switcher algorithm

Final note

More detailed examples can be found in the demo files from each repo. Some parts of the demo code are inactive (commented) to clarify what it does, just choose what you want to run.

Regarding the singly_linked_list module, any feedback is greatly appreciated, especially if a common function is missing. I’m very interested in cases that cannot be solved with the existing functions.

For example, sorting nodes is not a common function, it depends on the data type which can be a number, a string, … even a structure, so it’s outside of the module scope.

Regarding algore itself, I noticed that the documentation is disconnected, the breadcrumbs at the top of webpages. A solution can be an intermediate file for each module, if I remember correctly, but I want to avoid doing that unless it is functionally necessary. If there is no another solution, it will remain like this for now.

Anyway, overall the modules work pretty well and any feedback is appreciated.

As usual, I wish you all the best!

gonzo · November 21, 2024, 6:58pm

The name cracked me up!

mnemnion · November 22, 2024, 2:33am

Great project! What I would find very interesting is a version which creates these functions for an intrusive linked list. So instead of having a data pointer to T, you pass in a type T and a string field_name for the name of the field which holds the linked list. Access to the field can be done generically using @field(instance, field_name).

Example use:

const IntrusiveType = struct {
    some_data: []const u8,
    etc: usize,
    etc2: bool,
    prev: ?*IntrusiveType,

    pub usingnamespace SingleLinked(IntrusiveType, "prev");

    // The rest of the functions go here
};

So just returning a namespace struct (all declarations, no fields) which implement the link functions taking an *T (*IntrusiveType in this case), or several as the case may be. Then pub usingnamespace adds them all as declarations to the type, and user code can just write the data-specific parts of working with linked lists, getting a rich vocabulary for doing all the linked-list-specific things, and avoiding the extra dereferencing to access the data.

Edit: some miscellaneous feedback: I’m not convinced that you really need a separate List type and Node type here, It seems like the functions which take *Self just check for the node at .head for the most part, and you could do those things directly to Nodes. For instance, List.isEmpty() can just be an optional Node, ?Node or ?*Node for a pointer.

Although (this brings me to my second observation) if you preserve the Node / List distinction, then the List might be a plausible place to store an Allocator. Several of your functions dispose of Nodes by setting the field to null, this won’t work if it’s heap allocated memory. That you’ll have to free (and allocate in the first place), and those things call for an Allocator in general (it’s possible to have a global Allocator but library code shouldn’t do that).

You could just as soon pass an Allocator in to Node functions which need to use it. That would be my pick, YMMV.

Last bit: you have some functions called printNode and printList. The idiomatic way to do this in Zig is to have a format function, that way if you want to debug print, you just do this:

std.debug.print("A node: {}" .{a_node});

Except this way works with any Writer, so it’s much more flexible, you’re not limited to debug printing to stdout.

Sze · November 22, 2024, 3:05am

I think this could be interesting, but I think whatever interface you end up with should support adding multiple intrusive data structures, without getting collisions/namespacing conflicts.

I am not sure what the best design for that would be, especially if you not only support linked lists, but maybe also other intrusive data structures. (Not quite sure which, maybe trees, double-linked-lists, something graph-like??)

If usingnamespace with multiple such intrusive data structures within one struct doesn’t quite work (without complicated naming conventions), maybe zero sized fields could be used instead to add an arbitrary amount of intrusive data structures linked to specific regular fields?

Just thinking out loud here, definitely a cool idea to explore.
I haven’t used intrusive data structures a lot, but I have seen them being used in a game.

The “adding one node to multiple collections”-feature seems like one of the major benefits of intrusive data structures and is something that probably could be useful in a lot of contexts. I think intrusive data structures are relatively unknown and a good library for them could help make them more popular, so that more people actually consider the option of using intrusively designed data (me included, I tend to forget about them until I get reminded that that is an option too).

mnemnion · November 22, 2024, 3:18am

Good point, this approach would limit it to one field, and because of naming conflicts there’s no good way to reuse the library for a second field.

I implemented an intrusive single-linked list just recently, so it’s been on my mind.

Comptime metaprogramming is very powerful, but I don’t think there’s a way to synthesize a name for a declaration. Still, a second field could create the namespace (without usingnamespace) and then do a list of declarations pointing to that set of functions. Doesn’t scale well, that might be ok though.

I did some somewhat fancy stuff using enums which double as field names, I’ve been meaning to write that up actually, I was pleased with how it turned out. That has potential here, making it a proper generic library is giving me a headache just thinking about it. It could be done.

I’m not quite getting this picture, sounds promising though. Other than void I haven’t found use for a zero-sized field yet.

My impression is that this gets done more often with double-linked lists, I could be wrong, it isn’t a pattern I’ve had reason to use.

I like all these ideas though. Intrusive data structures are a Zig-friendly thing to do, and it sounds like a nice project.

7R35C0 · November 22, 2024, 8:34am

Why?!
Did you want to use the name algore for something or did it not meet your expectations?

gonzo · November 22, 2024, 8:42am

It made me think of the guy who invented the Internet… (old joke, and false rumor after all).

7R35C0 · November 22, 2024, 8:59am

Well… as always, having no programming experience, please give me a bit of time to search for some information on what these intrusive link lists are.

" I’ll be back "

7R35C0 · November 22, 2024, 9:22am

I would have bet that’s what it’s about (Al Gore) . The first time I chose algor (algorithms) but every time, in the code I wrote algore, the correction became annoying and I said “that must be a sign” and it’s stayed algore ever since.

7R35C0 · November 22, 2024, 1:09pm

If I understand correctly, your proposal is related to something similar to array of structures (AoS) , structure of arrays (SoA), array of structures of arrays (AoSoA).

Honestly, I don’t see why such an approach wouldn’t already work. Essentially, a node is a list and vice versa. Even something like some_data: []const u8, etc: usize, etc2: bool, can be described with a single node. Furthermore, if I’m not mistaken, T can be of type pointer and this leads to whatever someone wants to combine behind those pointers, even another list or node of pointers.

The only problem would be that there is no such structure in the module, but something like this could be done, not a real code just as a concept:

const IntrusiveType = struct {
    linked_list: *ListOfPointers,
    next: ?*IntrusiveType,

    // Maybe some checks for the list type beyond pointers for the namespaces
    pub usingnamespace SingleLinked(IntrusiveType, "next");
    pub usingnamespace DoubleLinked(IntrusiveType, "next");

    // The rest of the functions go here
};

Of course, I might be missing important things related to implementing such a structure, but essentially I don’t see why it wouldn’t work already.

I’m sorry if I didn’t understand your proposal.

At first I started with the original code. After a while I realized that the code is somewhat duplicated, apart from checking list.head == null, and can be further optimized by omitting node functions. Moreover, considering that I want to do something similar for doubly_linked_list in the near future, I will have to decide how to get rid of this code duplication.

One solution can be to move the node code into list, so that the node no longer exists, which results in a bit of speed in execution.
Another solution could be to make the list without a head, promoting the node to the list level, the list is the node itself. I have a feeling that the code will become more complicated in terms of usage. A node cannot be null, it only exists or not.

I’m not sure what the best solution is, at least for now.

As I said, I want to do something similar for doubly_linked_list, possibly a linked_list module that contains both modules. If all goes well, I’ll see how the print functions can be better integrated. At this point, I don’t know much about debug, format, writer, and so on. I’m referring to how it actually works, beyond a simple usage like std.debug.print().

Regarding allocators, honestly I don’t see the need for two versions, with and without allocators, but I don’t have the experience to give a relevant opinion.

My idea would be to integrate an allocator into the parent linked_list module and make it optional. If it is null, it works as unmanaged, otherwise it uses an allocator.

Anyway, that’s pretty much the plan for the near future, but from wanting to doing it (if possible), it’s a long way.

Thank you very much for the feedback!

mnemnion · November 22, 2024, 6:40pm

Nope, something else.

No problem, let me take another crack at it.

A normal linked list looks like this:

pub const DataType = struct {
   something: []const u8,
   counter: usize,
   flag: bool,
   // Declarations etc.
};

pub const NodeType = struct {
    data_ptr: *DataType,
    next: ?*NodeType, // or prev, whatever
    // Declarations for node stuff (e.g. your library)
};

This is an intrusive linked list:

pub const DataType2 = struct {
    something: []const u8,
    counter: usize,
    flag: bool,
    next: ?*DataType2, // <- The link

    // Data operations *and* list operations go here

    // We'll write this next
    pub usingnamespace LinkedFns(DataType2, "next");
};

So-called because the list intrudes into the data struct. These are often nicer to work with, because all the data is already right there, no deferencing, and memory for nodes and data doesn’t have to be handled separately. Generally this is more efficient, and I would go so far as to say it’s the better way to do linked lists under most circumstances. A counterexample would be a Lisp interpreter.

The idea is something a little like this:

pub fn LinkedFns(T: type, comptime fieldname: []const u8) type {
    return struct {

        /// Links the last node of node to the first node of `node`.
        pub fn link(self: *T, node: *T) void {
            @field(self.last(), fieldname) = node;
        }

       // And so on
    };
};

So here, we pass in a type like DataType2 and a field name, "next". The function returns a pure namespace of list operations which use that field, and this is included within the DataType2 namespace with pub usingnamespace.

Writing @field(val, "name") is the same as writing val.name, but you can do it with arbitrary (comptime known) strings. Powerful stuff.

I hope that makes more sense. I jumped straight into advanced Zig stuff, because I got excited and I think the idea has a lot of potential, but I skipped too many details in the process.

In Zig these are the same thing. You can have a Node, which must exist, or a ?Node, which can exist or not, when it doesn’t exist, the value is always null. Same thing with a pointer, or any other type you’d like.

So if you decide to consolidate on one Node data structure, rather than a Node and a List (this is my suggestion, but it’s just a suggestion) then a Node which can be an empty list is just ?Node, and the link is ?*Node. You won’t need a function to tell a populated list from an empty one, the type system can do that for you.

This would be a bit tricky to achieve. Zig doesn’t have function overloading, and to do memory stuff when the allocator was null, you’d need to pass one in. There are ways to make this work but it’s a bit tricky and awkward.

I suggest getting the memory stuff written and correct using a passed-in Allocator, and then you can decide if you want to add more logic to handle lists with an allocator included on a field. But again, I’m making suggestions, you should do what makes sense to you.

You’re welcome. Good luck!

7R35C0 · November 23, 2024, 9:45am

I don’t think it’s a function overloading. It’s just a single function that checks, where necessary, the “null” allocator.

pub fn SinglyLinkedList(comptime T: type, allocator: ?Allocator) type {

    // Functions that check the allocator, and use it or not

}

Another way may be to pass an allocator as optional only for relevant functions.

But you’re right, the code can get very complicated and hard to understand. A clear separation from the beginning is a better way.

Regarding clear separation and taking into account:

I should change the name to something different from the zig standard library?

Regarding the other topics, I need to dig more and it will probably take longer.
Anyway, thanks for everything, some aspects I really didn’t know or missed completely.

Sze · November 23, 2024, 12:38pm

No I was talking to @mnemnion about the intrusive data structures idea and being able to add multiple of them to a single struct.

mnemnion · November 23, 2024, 2:19pm

This was the main thing I wanted to convey: getting the allocation logic correct is going to be work, it can be more subtle than it looks with linked lists (in particular), and my suggestion is to write it with a passed-in Allocator parameter.

After that you can explore different ways of handling managed vs. unmanaged, you’ll see a few data structures in the standard library with an Unmanaged variant if you want to go that route.

jibal · November 26, 2024, 6:22am

As someone who worked on developing the ARPANET at UCLA (where I knew Vint Cerf, co-author of the TCP/IP protocols that are the basis of the internet) back in 1969 I know that Al Gore didn’t invent the internet (and didn’t claim to) but he had a lot to do with turning the government-managed NSFnet into the internet and getting it into the hands of the public (that’s what his comment about “taking the initiative to create the internet” was about): Al Gore and information technology - Wikipedia

gonzo · November 26, 2024, 8:06am

Yup. That’s why I linked to that Snopes article on the story – it was rather unfair to say that he claimed to have “invented the Internet”.

The library name still cracks me up.

7R35C0 · November 28, 2024, 1:27pm

Following your hints, in ztester is what I have done so far.

Give or take, the main idea of code is:

functions make no assertions about the data types stored in list
the list does not store data, there is no memory for the actual data in list
a clear separation is made between data and nodes (lists)
for a node (list) the actual data type does not matter
the same data can be part of different nodes (lists)
the list itself is a double circular linked list

The demos are very long prints and running with zig build run -- src/demo_name.zig &>demo.txt is highly recommended.

Still are basic functions missing, but I posted to check if this time I’m on the right track.

Thank you in advance for your time!

mnemnion · November 28, 2024, 4:12pm

This is another, different thing.

What we have here:

pub const Node = struct {
    data: *anyopaque,
    prev: ?*Node = null,
    next: ?*Node = null,
};

Is called type erasure. data is a pointer to literally anything, and we don’t know what: the type of the pointer is erased, hence the name.

Which is probably not what you want. In order to use the *anyopaque, one would need to cast it back to what it originally was, which means that code has to track that manually somehow. If you get it wrong, then your program will exhibit illegal behavior, and no one wants that.

If you built a list of *Foo, and consed it to a list of *Bar, there’s no way to tell which is which, because both are *anyopaque now. This is not a fun type of bug to have.

This post covers some of the differences between an opaque linked list and an intrusive one. I probably shouldn’t have mentioned intrusive linked lists in the first place, I had an idea and just sort of blurted it out. But that idea relies on understanding comptime type reflection fairly well, and it’s a distraction from what you were doing. A classic linked list such as you originally posted is a valid and reasonable data structure; what I was talking about is really a new idea, it isn’t a more advanced or inherently better version of what you were already working on.

If you want to do a deep dive on intrusive linked lists, of course, feel free, there’s a lot of information out there about them. But in terms of the Algore collection, I suggest dropping the *anyopaque idea, and just continuing to tinker with and refine what you already have until you’re happy with it.

7R35C0 · November 30, 2024, 5:13am

I realized that *anyopaque might be a problem, but I couldn’t find another solution.

The first time I did something like this:

pub const Node = struct {
    next: ?*Node = null,
    prev: ?*Node = null,
};

pub const List = struct {
    head: Node,

    // bla bla bla implementation
}

This requires the user to have the structure with the node in it or to embed their structure in another intermediate structure.

const Data = struct { data: usize, node: Node };
var nod0 = Node{};
var dat0 = Data{ .data = 0, .node = nod0 };

// or with an intermediate structure

const Data = struct { data: usize};
const Wrap = struct { data: Data, node: Node };
var nod0 = Node{};
var dat0 = Data{ .data = 0 };
var wrp0 = Wrap{ .data = dat0, .node = nod0 };

For some reason, wrong of course, I said it was too complicated and there should be pointers, users know what type of data they are using, right?

As if it wasn’t complicated enough with *anyopaque, I also removed list.head as a dummy node at the beginning of the list. Very, very bad idea.

Anyway, I’ll probably continue with something similar to the first attempt, it seems closest to your clues.