Break statements in labelled switches, footgun?

Consider the following example:

const std = @import("std");

test "nested breaks with labelled switches" {
    var counter: u8 = 0;
    for (0..2) |_| {
        const State = enum { first_state, increment_counter };
        state: switch (State.first_state) {
            .first_state => continue :state .increment_counter,
            .increment_counter => {
                counter += 1;
                break;
            },
        }
    }
    try std.testing.expectEqual(@as(u8, 2), counter);
}

Would you expect this test to pass?

Answer: The test does not pass, because the break statement breaks the for loop and doesnā€™t break the labelled switch. So the counter is only incremented once.

expected 2, found 1

I recently hit this footgun when writing some state machines, my brain was in ā€œloop modeā€ when using labelled switches and I used a break statement to break this loop.

Perhaps zig should only allow labelled breaks in labelled switches? What do you think?

And if you label the break with break :state you can get the test to pass since the break will break the labelled switch and not the for loop.

Note that unlabeled break is never a valid way to exit a switch. If you remove the for loop from your code it will fail to compile with something like:

switchbreak.zig:5:13: error: break expression outside loop
            break;
            ^~~~~
6 Likes

Found the C programmer :slight_smile:

2 Likes

Whatā€™s the point using break :state here?

1 Like

I remember having managed to shoot my foot the other way 'round in go, where a ā€˜breakā€™ in a switch or select would break the switch / select, not the for loop around it. Even though the docs are pretty clear, ā€˜breakā€™ always refers to the inner structure unless you use labels.

Since in Zig (0.14) you can label both switches and loops, why not use labels for both, to make clear what ā€˜breakā€™ refers to? Still, I understand that it might be preferable that the language pushes you towards something less ā€œfootgunnableā€ā€¦

1 Like

Sometimes you are deep as hell into implementing esoteric german industrial protocols and the labelled switch is a godsend for parsing byte streams :slight_smile:

pub fn readSMBitLengths(
    port: *nic.Port,
    station_address: u16,
    recv_timeout_us: u32,
    eeprom_timeout_us: u32,
) !?SMBitLengths {
    const sm_catagory = try readSMCatagory(
        port,
        station_address,
        recv_timeout_us,
        eeprom_timeout_us,
    ) orelse return null;

    var res = SMBitLengths{};
    for (sm_catagory.slice()) |sm| {
        try res.append(SMBitLength{ .sm_config = sm, .bit_length = 0 });
    }

    for ([2]CatagoryType{ .TXPDO, .RXPDO }) |catagory_type| {
        const catagory = try findCatagoryFP(
            port,
            station_address,
            catagory_type,
            recv_timeout_us,
            eeprom_timeout_us,
        ) orelse {
            std.log.info("station_addr: 0x{x}, couldnt find cat: {}, skipping", .{ station_address, catagory_type });
            continue;
        };
        std.log.info("trying: {}", .{catagory_type});

        // entries are 8 bytes, pdo header is 8 bytes, so
        // this should be a multiple of eight.
        if (catagory.byte_length % 8 != 0) return error.InvalidEEPROM;

        var stream = SIIStream.init(
            port,
            station_address,
            catagory.word_address,
            recv_timeout_us,
            eeprom_timeout_us,
        );
        var limited_reader = std.io.limitedReader(stream.reader(), catagory.byte_length);
        const reader = limited_reader.reader();

        const State = enum {
            pdo_header,
            entries,
            entries_skip,
        };
        var pdo_header: PDO.Header = undefined;
        var entries_remaining: u8 = 0;
        var current_sm: u8 = 0;
        state: switch (State.pdo_header) {
            .pdo_header => {
                assert(entries_remaining == 0);
                pdo_header = wire.packFromECatReader(PDO.Header, reader) catch |err| switch (err) {
                    error.EndOfStream => break :state,
                    error.LinkError => return error.LinkError,
                    error.Timeout => return error.Timeout,
                };
                if (pdo_header.n_entries > PDO.max_entries) return error.InvalidEEPROM;
                current_sm = pdo_header.syncM;
                entries_remaining = pdo_header.n_entries;
                if (pdo_header.isUsed()) continue :state .entries else continue :state .entries_skip;
            },
            .entries => {
                if (pdo_header.syncM + 1 > res.len) return error.InvalidEEPROM;
                switch (res.get(pdo_header.syncM).sm_config.syncM_type) {
                    .mailbox_in, .mailbox_out, .not_used_or_unknown => return error.InvalidEEPROM,
                    _ => return error.InvalidEEPROM,
                    .process_data_inputs => {
                        if (catagory_type == .RXPDO) return error.InvalidEEPROM;
                    },
                    .process_data_outputs => {
                        if (catagory_type == .TXPDO) return error.InvalidEEPROM;
                    },
                }
                assert(current_sm <= max_sm);
                assert(current_sm <= res.len);
                if (entries_remaining == 0) continue :state .pdo_header;
                const entry = try wire.packFromECatReader(PDO.Entry, reader);
                entries_remaining -= 1;
                res.slice()[current_sm].bit_length += entry.bit_length;
                continue :state .entries;
            },
            .entries_skip => {
                if (entries_remaining == 0) continue :state .pdo_header;
                _ = try wire.packFromECatReader(PDO.Entry, reader);
                entries_remaining -= 1;
                continue :state .entries_skip;
            },
        }
    }
    return res;
}

I could also add an .end state to my state machine, which is probably more readable.

Iā€™m starting to think that maybe labelled switch should be axed from the language, as the feature seems to steer people towards writing long switch loops (the correct term in this case) that are difficult to test.

Complex state machines can be more effectively implemented using continuation-passing style:

const std = @import("std");
const expect = std.testing.expect;

const StateMachine = struct {
    state_a: usize = 0,
    state_b: usize = 0,
    state_c: usize = 0,

    fn performA(self: *@This(), comptime Next: type) usize {
        self.state_a += 1;
        if (self.state_a & 1 == 0) {
            return @call(.always_tail, Next.performB, .{ self, Next });
        } else {
            return @call(.always_tail, Next.performC, .{ self, Next });
        }
    }

    test "performA" {
        const Test = struct {
            var last_called: u8 = 0;

            fn performB(_: *StateMachine, comptime _: type) usize {
                last_called = 'B';
                return 0;
            }

            fn performC(_: *StateMachine, comptime _: type) usize {
                last_called = 'C';
                return 0;
            }
        };
        // increment state_a, go to B afterward if it's even
        var state1: StateMachine = .{ .state_a = 1 };
        _ = state1.performA(Test);
        try expect(state1.state_a == 2);
        try expect(Test.last_called == 'B');
        // go to C otherwise
        var state2: StateMachine = .{ .state_a = 2 };
        _ = state2.performA(Test);
        try expect(state2.state_a == 3);
        try expect(Test.last_called == 'C');
    }

    fn performB(self: *@This(), comptime Next: type) usize {
        self.state_b += 1;
        return @call(.always_tail, Next.performC, .{ self, Next });
    }

    test "performB" {
        // ...
    }

    fn performC(self: *@This(), comptime Next: type) usize {
        self.state_c += 1;
        if (self.state_c % 10 != 0) {
            return @call(.always_tail, Next.performC, .{ self, Next });
        } else {
            return @call(.always_tail, Next.performD, .{ self, Next });
        }
    }

    test "performC" {
        // ...
    }

    fn performD(self: *@This(), comptime _: type) usize {
        return self.state_a + self.state_b + self.state_c;
    }

    fn compute(self: *@This()) usize {
        return self.performA(@This());
    }

    test "start" {
        var sm1: StateMachine = .{ .state_a = 0 };
        const result1 = sm1.compute();
        try expect(result1 == 11);
        var sm2: StateMachine = .{ .state_a = 1 };
        const result2 = sm2.compute();
        try expect(result2 == 13);
    }
};

test {
    _ = StateMachine;
}

Instead a big opaque ā€œmachineā€, we have have a number of smaller parts that can be tested individually. Doing things this way results in more performant code too, since here weā€™re doing real, hardware goto. With labelled switch youā€™re basically emulating goto using a make-shift program counter.

1 Like

I havenā€™t compared the assembly, but wasnā€™t the whole point of the labeled switch statement that the compiler is adding dedicated support for creating more optimal machine code for that?

And because it is one construct that contains all the cases, I also imagine it is easier to optimize that well in the compiler then having to follow a bunch of functions, figure out that they all form a unit that resembles a state machine and then perform optimizations. I think creating optimizations for labeled switch would be more straightforward and achieving the same with functions would require more analysis, before you can reach a similar level of optimization.

These are the things that I suspect, I would have to investigate in more detail, before I can be confident about that, but I would be surprised if the function version is easier to optimize then the switch version, because the switch version syntactically closes over all the code that is involved, while the function version first needs to be analyzed that it forms a closed island of a bunch of connected functions.

I think tail calls are good and useful, I think even the issue about adding labeled switch contained a discussion about tail call constructs, however it seems to me that tail calls are more general and having a construct that is specifically useful to create state machines makes sense to me.

Also I am not convinced that there isnā€™t any good way to write tests for a switch based state machine, for example I imagine you could use comptime to add/remove a field (switch between void and a debug mode field) that is used to track invariants and if they are violated exit with an error, because I havenā€™t had time to use the construct in detail I canā€™t present you an example right now, however I think it is to soon to just say the feature should be removed, before people even had time to figure out the different ways it can be used.

Once people are familiar with the construct, they also will find out when to use it and when not to use it.

3 Likes

Youā€™re referring to labelled switch prongs. What is being discussed here is labelling the top level switch statement. This has existed in the language for a long time. In fact, itā€™s just a labelled block in disguise. These two are equivalent:

blk: switch (a){
  1 => break :blk;
  else => return;
}

blk: {
  switch (a){
    1 => break :blk;
    else => return;
  }
}
1 Like

I meant this implementation introduce labeled continue syntax inside a switch expression Ā· Issue #8220 Ā· ziglang/zig Ā· GitHub
specifically continue from inside of a prong with the new value for the switch statement.

I am a bit confused, does this only work on switch expressions or also on switch statements? no, it seems it also works on switch statements

Lets all read the Langref - Labeled-switch to get more common ground on terminology and usecases/facts. (I hadnā€™t yet since it was added)

2 Likes

Optimal code generation for interpreters (goto & computed goto):

If the operand to continue is comptime-known, then it can be lowered to an unconditional branch to the relevant case. Such a branch is perfectly predicted, and hence typically very fast to execute.

If the operand is runtime-known, each continue can embed a conditional branch inline (ideally through a jump table), which allows a CPU to predict its target independently of any other prong. A loop-based lowering would force every branch through the same dispatch point, hindering branch prediction.

2 Likes

Maybe we could remove all unlabelled breaksā€¦

Eh itā€™s a code style thing Iā€™ll just do it for my own code.

There certainly are good usage scenarios for labelled switch. For example, Iā€™m working a function that accepts both functions and function pointers. The following is nice:

    const fn_info = comptime fn_type: switch (@typeInfo(T)) {
        .pointer => |pt| continue :fn_type @typeInfo(pt.child),
        .@"fn" => |f| f,
        else => @compileError("Not a function"),
    };

This happens quite frequently with types like ?T and !T, which can typically be handled in the manner as the base type. As a mean to deal with scenario where in C weā€™d use fall-through, labelled switch is a powerful mechanism. Iā€™m willing to forgo that capability though in order to avoid the confusing situation where switch statements may or may not be loops depending on the presence of a label somewhere.

Could you explain what the issue is with some switches being loops and others containing only forward branches? I donā€™t see what would be confusing about it.

If it helps make a difference, a switch is only a loop if it both has a label (label: switch (x) {}) and contains a matching labeled continue with a value (continue :label y). If both of these conditions are not met, the switch is not a loop.

Well, the OP of this thread could also have typed in the label for the break and we wouldnā€™t be having this conversation. But weā€™re human and humans make mistakes. We have neither perfect memory or perception. While browsing through code itā€™s easy to miss the label. Why would my brain be on the lookout for this information anyway when switch has always meant until now what the name describesā€“a switch.

1 Like

But that still doesnā€™t explain what the problem with switch-as-loop itself is. The problem in the OP with break not targeting the intended block appears to be mostly orthogonal from switch statements as a looping construct. Even before labeled switches, you could already run into similar problems if you had an outer loop containing a nested labeled block:

for (values) |x| {
    blk: {
        if (x == 100) break :blk;
        doSomething(x);
        break; // oops, we forgot the label
    }
    doSomeOtherThing(x);
}

I donā€™t think that means that labeled blocks should be removed. Perhaps it would be useful if it was a compile error to use unlabeled continues or breaks in situations where there could be multiple intended targets and the authorā€™s intent is ambiguous?

If what you mean by this is ā€œZig switches should not loop because switches in C and C-influenced languages donā€™t loopā€, then that seems like a good example of a local maximum, which is something Zig deliberately strives to avoid.

3 Likes

I label breaks unless itā€™s trivially obvious what they apply to, my criterion for ā€œtrivially obviousā€ is a loop which isnā€™t embedded in any larger control structure other than the function, and fits on about half of the screen.

Labeling all breaks and continues as a matter of policy is a defensible choice, I think if we had to label every break weā€™d end up with a bunch of this:

loop: for(slices) |slice| {
     if (slice.a == 5 and slice.b == 9) {
         found_it = true;
         break :loop;
     }
}

Which isnā€™t bad necessarily, but it is a bit tedious. This kind of short loop with a break condition is common enough that I favor keeping the anonymous form.

Labeling every use of break and continue is a surefire way of making sure they go where you intend though.

4 Likes

THIS! Iā€™ve been trying to draft a proposal to this effect for months - that break (or continue, mutatis mutandis) may only be unlabeled if it is inside exactly one break-able block, which is itself unlabeled. The somewhat arbitrary ā€œapplies to the innermost blockā€ convention strikes me as a necessity invented in the pre-labeled days of BCPL which, despite there now being clearer alternatives, lives on due to familiarity and ease of writing.

Iā€™ve been bitten by this in both C and Zig, since a change in one line of code (e.g. changing an if to a while) can affect the behavior of a break that is several screens away. With a restriction on unlabeled breaks, my change would have caused a compile error showing exactly where the problem was, instead of forcing me to spend time diagnosing runtime misbehavior.

3 Likes