Canonical loop structure for reductions

One thing I like about Zig is that, while in general the language is minimal, it has a very rich vocabulary of control flow constructs (labeled block, break out of loop with value, for/else).

So, for example, expressing search in Zig is very natural:

const found = for(items) |item| {
    if (predicate(item)) break item;
} else null;

what bugs me, however, is that I can’t quite find an equally compelling form for reductions, like finding a sum or a maximal element.

The most straightforward way to write it is of course something like this:

var max: u32 = 0;
for (items) |item| max = @max(max, item);

The problem here is that max remains mutable after the loop, which of course isn’t a big problem, but still feels somewhat unsatisfactory. You can do something like

var max_running: u32 = 0;
for ...
const max = max_running;

but then you have an extra variable in scope…

The “clean” way to write it would be:

const max = b: {
    var max: u32 = 0;
    for (items) |item| max = @max(max, item);
    break b: max;
};

But then it reads too busy, with an extra indentation level, a block label and a break… Not really worth it, compared to the simple solution of making the max mutable.

Is there a nicer way to express this I am missing?

7 Likes

It doesn’t read too busy to me; maybe getting friendly with the block is one way forward?

2 Likes

If not, breaking out a good ol’ function for each reduction isn’t so bad, yeah?

fn maxOf(items: anytype) std.meta.Child(@TypeOf(items)) {
    var max: std.meta.Child(@TypeOf(items)) = 0; // or more metaprogramming, or an argument, for the minimum value
    for (items) |item| max = @max(max, item);
    return max;
}

or something like that

3 Likes

I wouldn’t exactly call it “nicer”, but I came up with this:

test "max of an array" {
	const items: [3]u32 = .{0, 0x7FFF, 0x2009};
	const max = outer: for(0..items.len) |i| {
		for(items[i + 1..]) |item| {
			if(item > items[i]) continue :outer;
		}
		break items[i];
	} else 0; 
	try std.testing.expectEqual(@max(items[0], items[1], items[2]), max);
	
	// Try some random arrays
	for(0..1000) |_| {
		var random_items: [4]u32 = undefined;
		try std.testing.io.randomSecure(std.mem.sliceAsBytes(random_items[0..]));
		
		const random_max = outer: for(0..random_items.len) |i| {
			for(random_items[i + 1..]) |item| {
				if(item > random_items[i]) continue :outer;
			}
			break random_items[i];
		} else 0;
		try std.testing.expectEqual(
			@max(random_items[0], random_items[1], random_items[2], random_items[3]),
			random_max,
		);
	}
}

There have been a few proposals for syntax to “close” a var declaration so that it’s no longer mutable after a certain point. With that you could do something like this:

var max: u32 = 0;
for (items) |item| max = @max(max, item);
const max;

Personally I wouldn’t mind a feature like this being added, but I think Andrew’s view is that the added language complexity isn’t worth the minor conveniences it enables, so I wouldn’t hold my breath.

2 Likes

One pattern that might be better sometimes is to name the intermediate value differently:

var acc: u32 = 0;
for (items) |item| acc = @max(acc, item);
const max = acc;

This leaves a “garbage” variable in scope, but you get the final result as const and not var without extra indentation.

If it gets a bit bigger though, I think I will often go for the labeled block instead to not pollute the scope needlessly.

Edit: I actually skimmed over the part of your post where you mention this so my comment doesn’t add much… oh well :roll_eyes:

Not cleaner, but I dislike labels.

    const max = while(true) {
        var max: u32 = 0;
        for (items) |item| max = @max(max, item);
        break max;
    };
3 Likes

I think when one runs into an extra indent level and label/break, that may be a signal that one might want to put that into a function. This provides the scoping and addresses the mutable concern as well.

That was my immediate thought, and, like @alanza and others, I like the block; it’s still skim-readable. I’m guessing that formatting to compress isn’t attractive to you? This is a bit overkill, but…

// zig fmt: off
const max = b: { var r: u32 = 0;
    for (items) |i|  r = @max(r, i);    break b: r;
};
// zig fmt: on

I think it may make sense to have a single function that takes the operation as a comptime enum parameter, then you can use a switch to implement the operation and don’t have to repeat the setup for each operation and the compiler will generate the separate versions. (allowing you to write one function that contains a switch instead of many)

I think I prefer accumulate when it makes sense to write the code so that it is aware of empty sequences and avoids trying to accumulate them (which often could be done by having an early return soon enough). Not having to deal with empty sequences makes the code simpler.
(something that sums to zero might be different from not having anything to sum, depending on what you are implementing)

const std = @import("std");

pub const Op = enum { max, min, sum };
fn accumulate(comptime op: Op, items: anytype) std.meta.Elem(@TypeOf(items)) {
    std.debug.assert(items.len > 0);
    var accum = items[0];
    for (items[1..]) |i| accum = switch (op) {
        .max => @max(accum, i),
        .min => @min(accum, i),
        .sum => accum + i,
    };
    return accum;
}

// if you want to allow empty collections
fn accumulate2(comptime op: Op, items: anytype) ?std.meta.Elem(@TypeOf(items)) {
    // NOTE also could return accum for empty collections but null retains the information that the input was empty
    if (items.len == 0) return null;
    const E = std.meta.Elem(@TypeOf(items));
    var accum: E = switch (op) {
        .max => std.math.minInt(E),
        .min => std.math.maxInt(E),
        .sum => 0,
    };
    for (items[0..]) |i| accum = switch (op) {
        .max => @max(accum, i),
        .min => @min(accum, i),
        .sum => accum + i,
    };
    return accum;
}

pub fn main() !void {
    const numbers = [_]i32{ -20, 3, 6, 22, 54, 28, -4, 7 };
    // const numbers = [_]i32{};

    const accum = accumulate;

    const max = accum(.max, &numbers);
    const min = accum(.min, &numbers);
    const sum = accum(.sum, &numbers);

    std.debug.print("max: {any}\n", .{max});
    std.debug.print("min: {any}\n", .{min});
    std.debug.print("sum: {any}\n", .{sum});
}
2 Likes

Overall, I am a fan of block expressions. The only problem is that if we need to reduce a large struct, block expression semantics involve an extra copy and cannot construct the reduced result in place.

I do not have a solution to the mutability problem, but by careful program design, it is possible to make the inherent bug obvious and limit the mutable variable to <= 1 in a file.

First, clear naming for the mutable variable:

var reusable_variable: u32 = 0
for (items) |item| reusable_variable = @max(reusable_variable, item);
const max = reusable_variable;

This way, if we have to loop later on, we don’t have to create another dangling variable, and if reusable_variable gets assigned outside a loop region, it becomes obvious why that will lead to a bug.

Maybe a little, but honestly not too bad. I personally would appreciate syntax more like this:

    const items = ...
    const max_item = {
        var max: u32 = 0;
        for (items) |item| max = @max(max, item);
        return max;
    };

    ... do something with max_item

But it’s not really a big deal for me.

TBH I like the extra indentation level.

1 Like

This is why Zig should add shadowing.

2 Likes