Beginner Algorithms to get a feel for Zig

dynamTypedDisaster · February 11, 2024, 8:15am

I have only ever used high level languages but am trying to rewrite some of the algorithms from CLRS in Zig. So far going terribly, but quite fun.
I need to read docs on how to actually compile so this is my best guess at syntax for the naive string matching algorithm (p. 988, CLRS). I plan to write a little bit most days so hopefully tomorrow will have read through docs and understand how to build and try code below.

const std = @import("std");

pub fn main() void {
    
    fn naiveString(the_string: [_]u8, the_pattern: [_]u8) {
        
        var n: u8 = the_string.len;
        var m: u8 = the_pattern.len;
        var upper_lim: u8 = n-m+1;
        var start: u8 = 0;

        while (i < upper_lim): (i+=1) {
            if (the_pattern == the_string[i..i+m]) {
                return std.log.info(i)
            }
        }    
    }
}

Current uncertainties/questions:

No ranges to iterate over with for loops, is this to prevent getting range wrong?
First page on strings I read mentions pointers, and I don’t know working with pointers so this is worrying
How to decide which primitive type to use, so many options
So much I take for granted in Python

kristoff · February 11, 2024, 8:39am

Welcome to Zig & congratulations on starting your journey to learn more about lower level programming.

I will give you brief suggestions that address the main issues with the code snippet you posted. If those are enough to get you started and learn how to fix the problem on your own, then great. If instead you find that you get stuck even after pursuing any of these suggestions, feel free to come back and ask follow up questions.

make sure you are aware of the main learning resources listed here https://ziglang.org/learn/. In particular you are interested in the language reference, the stdlib docs, zig.guide and ziglings. I would recommend trying out ziglings as soon as you’re given a cursory read to the language reference (which you probably already did).
naiveString() cannot be defined inside of main() like this. place the definition of naiveString below main and then call it from main.
the parameters passed to naiveString have the wrong type. more specifically the underscore inside the brackets is special syntax that can’t be used in that context. Check out https://zig.news/kristoff/what-s-a-string-literal-in-zig-31e9 including the video embedded at the end of the article, and https://zig.news/david_vanderson/beginner-s-notes-on-slices-arrays-strings-5b67. You will want to have your function accept const slices.
testing two ranges of bytes for equality cannot be done with ==. Look in the stdlib docs for std.mem.eql
in Zig basic scanning over slices is done by using for. As you mentioned, this is less error prone, and also more concise to express.

dee0xeed · February 11, 2024, 10:42am

@dynamTypedDisaster - I’ve prepared an implementation of Naive String Matcher in Zig,
but do not want to spoil beforehand. However, if you do not mind, I can post it here.
Or maybe, I’ll post it only after you’ll have something that works, for comparison.

dynamTypedDisaster · February 12, 2024, 4:08am

Thank you for your very helpful reply @kristoff!

@dee0xeed that sounds like a great way for me to learn and see idiomatic examples if you are willing to.

Tried to change a few things after commencing some ziglings but it is clear I don’t understand enough about what is and isn’t permissible with static types. I tried to minimise things I was doing wrong by returning a single integer if there is a match but there are still some error’s for me to go through. Small steps

const std = @import("std");

pub fn main() !void {
    naiveString("bcdas", "cda");
}

fn naiveString(the_string: []const u8, the_pattern: []const u8) u8 {
    const n: u8 = the_string.len;
    const m: u8 = the_pattern.len;
    const upper_lim: u8 = n - m + 1;

    for (the_string, 0..upper_lim) |_, i| {
        if (std.mem.eql([]const u8, the_pattern, the_string[i..i + m])) {
            std.debug.print(1);
        }
    }
}

AndrewCodeDev · February 12, 2024, 5:15am

Hey @dynamTypedDisaster, glad to have you!

BT’W, that is an awesome user name… just saying…

Static types are tough to get your head around at first, but just keep at it.

I’m hoping you’ll communicate some of your challenges because I’m assembling Docs for our community - I’m sure other people will have similar question to you. It’s great that you’re sharing your learning experience because it can help strengthen our content here on the forum.

One thing to point out here - your for loop declaration can just be this:

for (0..upper_lim) |i| {

Permit me to reimagine a few things about your function to help you out. I see you’re returning a u8 which is an unsigned (only zero and above) integer that’s 8 bits wide. It looks like what you’re doing with your print statement is counting the number of matches. Let’s turn that u8 into a usize so we can count matches (that’s a typical data-type you see for counting).

fn naiveString(the_string: []const u8, the_pattern: []const u8) usize {

I know I’m probably getting ahead of myself, but let’s add a check here while we’re at it…

if (the_string.len < the_pattern.len) 
    return 0; // the pattern is larger than the string

Now let’s make a counter and bring it all together here…

fn naiveString(the_string: []const u8, the_pattern: []const u8) usize {
    const n: usize = the_string.len;
    const m: usize = the_pattern.len;

    if (n < m)
        return 0;

    // I always add parenthesis to express intent...
    // if the expression was n - (m + 1), we could go
    // below zero if n == m. This is a personal choice, 
    // but always check your code
    const upper_lim: usize = (n - m) + 1;

    var count: usize = 0;

    for (0..upper_lim) |i| {
        if (std.mem.eql(u8, the_pattern, the_string[i..i + m])) {
            count += 1;
        }
    }

    return count;
}

Let’s take a look at std.mem.eql to understand why I only needed to specify u8 there:

pub fn eql(comptime T: type, a: []const T, b: []const T) bool {

You can see that if we specify T, the next two arguments get filled in as []const T.

So by specifying u8, the arguments of std.mem.eql become []const u8.

You can run it like so:

pub fn main() !void {

    // this automatically deduces that count is usize
    const count = naiveString("catcat", "cat");

    std.debug.print("\nNumber of matches: {}\n", .{ count });
}

dee0xeed · February 12, 2024, 7:06am

Your second iteration is almost okay, and after @AndrewCodeDev’s code review my implementation is no longer a spoiler, so here it is:

const std = @import("std");
const eql = std.mem.eql;
const log = std.debug.print;

/// returns the number of matches
fn naiveStringMatcher(s: []const u8, p: []const u8) usize {

    const n = s.len;
    const m = p.len;
    var counter: usize = 0;

    if (n < m)
        return 0;

    for (0 .. n - m + 1) |pos| {
        if (std.mem.eql(u8, s[pos .. pos + m], p)) {
            log("match @ position {}\n", .{pos});
            counter += 1;
        }
    }
    return counter;
}
pub fn main() !void {
    const s = "hay hay hay needle hay hay hay need needle hay hay hay needle";
    const p = "needle";
    log("looking for '{s}' in '{s}'...\n", .{p, s});
    var n = naiveStringMatcher(s, p);
    log("found {} matches\n", .{n});

    const s1 = "hay";
    const p1 = "needle";
    log("looking for '{s}' in '{s}'...\n", .{p1, s1});
    n = naiveStringMatcher(s1, p1);
    log("found {} matches\n", .{n});
}

Also there was a version, where n < m case was considered as an error.
It does not make mush sense for string matcher, just demonstrates error handling in Zig.

error.StringIsShorterThanPattern

const std = @import("std");
const eql = std.mem.eql;
const log = std.debug.print;

/// returns the number of matches or error if s.len < p.len
fn naiveStringMatcher(s: []const u8, p: []const u8) !usize {

    const n = s.len;
    const m = p.len;
    var counter: usize = 0;

    if (n < m)
        return error.StringIsShorterThanPattern;

    for (0 .. n - m + 1) |pos| {
        if (std.mem.eql(u8, s[pos .. pos + m], p)) {
            log("match @ position {}\n", .{pos});
            counter += 1;
        }
    }
    return counter;
}

pub fn main() !void {
    const s = "hay hay hay needle hay hay hay need needle hay hay hay needle";
    const p = "needle";
    log("looking for '{s}' in '{s}'...\n", .{p, s});
    var n = try naiveStringMatcher(s, p);
    log("found {} matches\n", .{n});

    const s1 = "hay";
    const p1 = "needle";
    log("looking for '{s}' in '{s}'...\n", .{p1, s1});
    _ = naiveStringMatcher(s1, p1) catch |err| {
        log("failed: {}\n", .{err});
    };
}

dynamTypedDisaster · February 13, 2024, 9:33pm

Great to see two approaches and some extra tid-bits of code, very helpful.
One naive question, when working in python I try to essentially avoid for loops for anything serious and also consider ds/algos for the task at hand but I can’t lie I usually just see which approach is written in C and use that: so the question is when using zig is ‘speed’ always reliably estimated by the number of operations now?

AndrewCodeDev · February 13, 2024, 9:44pm

In short, yeah - you don’t need to be afraid of loop performance. It’s very fast.

Number of operations isn’t a bad place to start - there’s some caveats to making things run super fast (like SIMD etc), but at this point I’d recommend coming back to that when you’ve really become comfortable with the type system and have had some time to play around with the data structures in the standard library.

dynamTypedDisaster · February 15, 2024, 7:52am

I have another round of short, embarrassing, code that I tried to ‘translate’ from python.
A few random notes:

Ziglings is very enjoyable and easily the most accessible and comprehensive learning resource I’ve currently tried for this language.
Dude the builder videos have been helpful too.
ZLS has been troublesome to get working.

const std = @import("std");
const print = std.debug.print;
const append = std.ArrayList.append;

fn largestPerimeter(nums: [_]i32) usize {
    
    var negative_array: std.ArrayList(i32) = undefined;
    var array_total: usize = 0;
    
    for (nums) |each_item {
        negative_array.append(-each_item);
    }

    for (nums) |each_item| {
        array_total += each_item;
    }

    //var nums_heap: // heapify (so heap queue algo which is well beyond my current ability)

    while (nums_heap.len >= 3) {
        var current = //pop from heap ;

        if (array_total + current > -current) {
            return array_total;
        }
        array_total += current;
    }
    return -1;

}

dee0xeed · February 15, 2024, 8:05am

Same error here - [_] is not what you want, explained above.

Sze · February 15, 2024, 5:47pm

missing |

dynamTypedDisaster · February 16, 2024, 8:26am

I’ve spent a little while reading the documentation today and have a single question I couldn’t find the answer to (I think because it may be obvious to experienced developers):

Are the few builtin functions with a capitalised first letter capitalised for a reason/to indicate something about them? e.g. @This()

dee0xeed · February 16, 2024, 8:30am

Types are PascalCase, functions which return a type, also PascalCased.
See style guide.

If x is callable, and x 's return type is type , then x should be TitleCase

An example from stdlib:

pub fn ArrayList(comptime T: type) type {

This function returns some type, hence it’s name is in PascalCase/TitleCase.

dynamTypedDisaster · February 17, 2024, 8:19pm

I have been recommended the book “Understanding and Using C Pointers”, by Richard Reese to understand what pointers are all about (by a C programmer) and was wondering if someone from this community has read it and would be able to weigh in on whether it would ultimately be unhelpful in understanding how pointers work in Zig or if it would be ok?

pierrelgol · February 17, 2024, 8:30pm

Would absolutely recommend, It’s one of my favourite books about programming, it really helped me reason about pointers in general, so go for it.

dynamTypedDisaster · February 17, 2024, 8:31pm

Thank you and for such a speedy reply too.

dynamTypedDisaster · February 25, 2024, 1:56am

I (naively) thought this experience would involve more typing and less reading but it has been the exact opposite. Coming from a high level perspective, attempting to learn Zig has led to a lot of reading on topics I have never gone beyond surface level before (ASCII, Unicode, and some of the history of character encoding, LLVM, pointers, macros, so on…).

Is someone willing to weigh in on whether I have understood the following points correctly:

Some of the reasons Zig wants to divorce LLVM is because it is slow to compile and one of the aims of Zig is to reduce compilation time to speed up development process. Another is that it would be prohibitive of incremental compilation? And it also slows down Zig development trajectory due to the LLVM development cycles?
Macros are excluded from Zig because even though they can be useful in solving specific problems, those cases don’t warrant the outcome of unreadable/uninterpretable code for either the author 6 months down the track or others looking at the codebase?
The C ABI has some parts which are ‘fixed’ and other parts which are determined by the computer’s specs?
I know this is a lot to ask but would someone be willing to give their take of what might end up at, Zig Documentation, as I have been having trouble with io.getStdIn.

My long term goals at this stage are to write:

An old school text adventure game to get experience working with strings,
Maybe write something that might display information about sys calls.
Eventually write a very basic first person shooter outline world to understand how that works.

Keen for input if any sound ill advised.

AndrewCodeDev · February 25, 2024, 5:02am

Macros allow you to write a sub-language in the parent language. They also work in the opposite direction of functions. Classic example…

fn foo(x: i32) i32 { return x * x; }

foo(2 + 5); // becomes 49... 7^2

#define FOO(X) (X * X)

FOO(2 + 5)  // becomes 29... why? It expanded out the call as...

2 + 5 * 5 + 2... operator precedence says multiplaction comes first.

// fixed macro:
#define FOO(X) ((X) * (X))

In other words, with functions, the arguments resolve first before they enter the function. Macros resolve the text replacement first before the arguments. Exactly backwards and full of awful behavioral consequences.

I remember a long time ago I wanted lambda functions like Java in C++, so I made a set of macros with a ton of templates to makes it happen. It worked… mostly…

// this was the result...
result = obj.reduce(x == 42);

// really rough picture, but something like with a lot more machinery...
#define reduce(X) Reduce([&](auto x){ return (X); })

In other words, I defined my own language using them. Nobody, except for me, would realize they were reading C++.

Basically yes. LLVM is slow - very highly developed but terribly slow. Zig also cannot own it’s own faults - many things end up being LLVM’s issue and has to go through that pipeline to get handled.

Practically speaking, any compiler that implements the C language by it’s specifications could produce a different application binary interface. The OS will have a big role to play in how that turns out as well but some parts of it are processor dependent instead. Here’s a good StackOverflow on that: Does C have a standard ABI? - Stack Overflow.

That’s some helpful documentation if I’ve ever seen any XD

Let me link you to some code that may help. Sometimes an example is worth a thousand words… this has to do with std.io.getStderr.

/// The default implementation for the log function, custom log functions may
/// forward log messages to this function.
pub fn defaultLog(
    comptime message_level: Level,
    comptime scope: @Type(.EnumLiteral),
    comptime format: []const u8,
    args: anytype,
) void {
    const level_txt = comptime message_level.asText();
    const prefix2 = if (scope == .default) ": " else "(" ++ @tagName(scope) ++ "): ";
    const stderr = std.io.getStdErr().writer();
    var bw = std.io.bufferedWriter(stderr);
    const writer = bw.writer();

    std.debug.getStderrMutex().lock();
    defer std.debug.getStderrMutex().unlock();
    nosuspend {
        writer.print(level_txt ++ prefix2 ++ format ++ "\n", args) catch return;
        bw.flush() catch return;
    }
}

You can see how they instantiate it. They invoke the getStderr function, wrap it in a buffered writer and get the writer interface to do the work. Then, at the bottom, they do writer.print to start using it. Not sure if that helps, but that’s a common way for doing standard io.

AndrewCodeDev · February 25, 2024, 5:16am

To make that last example less murky… it could have been this:

    // get your io object
    const stderr = std.io.getStdErr().writer();
    var bw = std.io.bufferedWriter(stderr);

    // this is the actual thing that does writing... 
    // it's an interface similar to "Allocator" 
    const writer = bw.writer();

    // more on this in the next code block...
    writer.print("Hello... World?\n", .{)) catch return;

    // more on this in the code block after that...
    bw.flush() catch return;

// buffered writer tries to write to a buffer first (surprise lol)...
// if that fails, it try to flush what's in the buffer out and then
// if it still can't print the thing because it's too big, it calls
// its internal "unbuffered_writer" instead.

        pub fn write(self: *Self, bytes: []const u8) Error!usize {
            if (self.end + bytes.len > self.buf.len) {
                try self.flush();
                if (bytes.len > self.buf.len)
                    return self.unbuffered_writer.write(bytes);
            }

            const new_end = self.end + bytes.len;
            @memcpy(self.buf[self.end..new_end], bytes);
            self.end = new_end;
            return bytes.len;
        }

// it just calls writeAll and sets its end back to zero
pub fn flush(self: *Self) !void {
    try self.unbuffered_writer.writeAll(self.buf[0..self.end]);
    self.end = 0;
}

AndrewCodeDev · February 25, 2024, 5:22am

At this point, we should probably consider opening a new topic if we want to continue.