Code Review: String sorting

Hello everyone,

I am new to the zig language and trying toy exercises like string_sorting etc.
I would like to get my below code reviewed, please feel free to let me know any and every improvement I can make to it.

const std = @import("std");

pub fn main() void {
    const str = "cbawdd";
    std.debug.print("unsorted: {s}\n", .{str});
    const sort_str = sort_string(str);
    std.debug.print("sorted: {s}\n", .{sort_str});
}

fn sort_string(ss: []const u8) [6]u8 {
    // Function to sort the given 6 characters string.
    // in an ascending order.

    // Make the immutable string into mutable string
    // so that it can be manipulated.
    var buf: [6]u8 = undefined;
    for (0..ss.len) |i| {
        buf[i] = ss[i];
    }

    // sort the string
    for (0..buf.len) |i| {
        for (0..i) |k| {
            const j = i - k;
            if (buf[j] < buf[j - 1]) {
                const tmp = buf[j];
                buf[j] = buf[j - 1];
                buf[j - 1] = tmp;
            }
        }
    }
    return buf;
}

I do have a few questions:

  1. The above only works for six character strings, how can I make it to work with a string of any length?
  2. In the sort_string function I have to first make []const u8 to a mutable [6]u8 type and to do so I read we can use allocator too, which method is better the above for loop or the allocator one?
  3. As the str const is a pointer to the actual bytes cbawdd, I would like to edit the actual string using the pointer instead of storing it in a new const (sort_str), how can I do so?

Please let me know your thoughts.

Cheers,
DD.

1 Like

Hello,
For sorting strings of an arbitrary length, you are gonna want to allocate memory on the heap (or write the string into a pre-reserved mutable buffer; it doesn’t really matter which).
std.mem.sortUnstable should be perfect for your purposes, since it always sorts in-place (and doesn’t preserve order of equal elements, unlike std.mem.sort).

Here’s an example test I wrote:

pub fn sort_chars(_: void, lhs: u8, rhs: u8) bool {
	return lhs < rhs;
}

test sort_chars {
	const str = try std.testing.allocator.dupe(u8, "cbawdd");
	defer std.testing.allocator.free(str);
    std.debug.print("unsorted: {s}\n", .{str});
	std.mem.sortUnstable(u8, str, {}, sort_chars);
    std.debug.print("sorted: {s}\n", .{str});
}

It’s also worth noting that you don’t need to use a for-loop to write your slice into a fixed buffer; you can just do this:

var buf: [6]u8 = undefined;
@memcpy(buf, ss[0..6]); // We slice the string to ensure it never copies out of bounds

This will automatically be optimised with SIMD instructions if possible.
Though, the compiler will probably vectorise your for-loop anyway, since it doesn’t have any side effects.

If you want to alter the sorting implementation such that uppercase characters are considered equal to lowercase characters, you can simply alter the sort_chars function from the test.

1 Like

Here are two functions that might be useful or helpful:

// Lexicographic asc order
pub fn lessThan(_: void, a: []const u8, b: []const u8) bool {
    return std.mem.order(u8, a, b).compare(.lt);
}

// Natural/Alphanumeric asc order following ICU standard
pub fn naturalLess(_: void, a: []const u8, b: []const u8) bool {
    var i: usize = 0;
    var j: usize = 0;
    var a_zero_count: usize = 0;
    var b_zero_count: usize = 0;

    while (i < a.len and j < b.len) {
        const ca = a[i];
        const cb = b[j];
        // If both are digits, compare the full number
        if (std.ascii.isDigit(ca) and std.ascii.isDigit(cb)) {
            // Ignore starting zeros from a
            var start_i = i;
            while (i < a.len and a[i] == '0') : (i += 1) {}
            a_zero_count += i - start_i;
            start_i = i;

            // Create integer slice from a
            while (i < a.len and std.ascii.isDigit(a[i])) : (i += 1) {}
            const da = a[start_i..i];

            // Ignore starting zeors from b
            var start_j = j;
            while (j < b.len and b[j] == '0') : (j += 1) {}
            b_zero_count += j - start_j;
            start_j = j;

            // Create integer slice from b
            while (j < b.len and std.ascii.isDigit(b[j])) : (j += 1) {}
            const db = b[start_j..j];

            // Compare numeric size
            if (da.len != db.len)
                return da.len < db.len;

            switch (std.mem.order(u8, da, db)) {
                .lt => return true,
                .gt => return false,
                .eq => continue,
            }
        }

        // If characters differ, normal ASCII comparison
        if (ca != cb) {
            return ca < cb;
        }

        // Otherwise continue scanning one char at a time
        i += 1;
        j += 1;
    }

    // After the loop, compare string length ignoring starting zeros
    return a.len - a_zero_count < b.len - b_zero_count;
}

First function is a byte comparison in a string, second one follows natural sorting that ICU follows, but only for ASCII characters. There might be a better way to cut off starting zeroes and calculate their length