here is a zig rendition of a string-scanning state machine used as part of the CoreMark benchmark suite:
const StringBuf = [*]u8;
const State = enum {
START,
INVALID,
S1,
S2,
INT,
FLOAT,
EXPONENT,
SCIENTIFIC,
};
...
fn nextState(pstr: *StringBuf, transcnt: [*]u32) State {
var str = pstr.*;
var state = State.START;
while (str[0] != 0 and state != State.INVALID) : (str += 1) {
asm volatile ("bkpt");
const ch = str[0];
if (ch == ',') {
str += 1;
break;
}
switch (state) {
.START => {
if (isDigit(ch)) {
state = .INT;
} else if (ch == '+' or ch == '-') {
state = .S1;
} else if (ch == '.') {
state = .FLOAT;
} else {
state = .INVALID;
transcnt[ord(.INVALID)] += 1;
}
transcnt[ord(.START)] += 1;
},
.S1 => {
if (isDigit(ch)) {
state = .INT;
transcnt[ord(.S1)] += 1;
} else if (ch == '.') {
state = .FLOAT;
transcnt[ord(.S1)] += 1;
} else {
state = .INVALID;
transcnt[ord(.S1)] += 1;
}
},
.INT => {
if (ch == '.') {
state = .FLOAT;
transcnt[ord(.INT)] += 1;
} else if (!isDigit(ch)) {
state = .INVALID;
transcnt[ord(.INT)] += 1;
}
},
.FLOAT => {
if (ch == 'E' or ch == 'e') {
state = .S2;
transcnt[ord(.FLOAT)] += 1;
} else if (!isDigit(ch)) {
state = .INVALID;
transcnt[ord(.FLOAT)] += 1;
}
},
.S2 => {
if (ch == '+' or ch == '-') {
state = .EXPONENT;
transcnt[ord(.S2)] += 1;
} else {
state = .INVALID;
transcnt[ord(.S2)] += 1;
}
},
.EXPONENT => {
if (isDigit(ch)) {
state = .SCIENTIFIC;
transcnt[ord(.EXPONENT)] += 1;
} else {
state = .INVALID;
transcnt[ord(.EXPONENT)] += 1;
}
},
.SCIENTIFIC => {
if (!isDigit(ch)) {
state = .INVALID;
transcnt[ord(.INVALID)] += 1;
}
},
.INVALID => {},
}
}
pstr.* = str;
return state;
}
in the official CoreMark suite, the buffer contains a variety of comma-separated numeric strings which nextState
will scan, track state transitions, update a pointer to the buffer, and return the next state…
the buffer begins with the characters "5012, ..."
, so it will visit the states START
, INT
, INT
, INT
, INT
before hitting INVALID
which forces the function to return…
i’ve cross-compiled this code for a cortex-m0+ MCU (ReleaseSmall
), and can easily measure execution time with my logic analyzer; i can also single step through the generated asm using a debugger…
so what’s the problem??? well, i have a c++ version of the same function compiled for the same MCU – albeit using’s segger’s LLVM-based compiler… and the c++ version performs SLIGHTLY BETTER for reasons i don’t fully understand…
here’s the c++ code, which i should note is actually GENERATED by a higher-level programming tool i’m in the process of “zigify-ing”:
State nextState(em_char** pStr, em$ArrC< em_uint32 > transCnt) {
auto str = *pStr;
auto state = State::START;
for ( ; *str && state != State::INVALID; str++) {
asm ("bkpt");
auto ch = *str;
if (ch == ',') {
str++;
break;
}
switch (state) {
case State::START: {
if (isDigit(ch)) {
state = State::INT;
}
else if (ch == '+' || ch == '-') {
state = State::S1;
}
else if (ch == '.') {
state = State::FLOAT;
}
else {
state = State::INVALID;
transCnt[ord(State::INVALID)] += 1;
}
transCnt[ord(State::START)] += 1;
break;
break;
}
case State::S1: {
if (isDigit(ch)) {
state = State::INT;
transCnt[ord(State::S1)] += 1;
}
else if (ch == '.') {
state = State::FLOAT;
transCnt[ord(State::S1)] += 1;
}
else {
state = State::INVALID;
transCnt[ord(State::S1)] += 1;
}
break;
break;
}
case State::INT: {
if (ch == '.') {
state = State::FLOAT;
transCnt[ord(State::INT)] += 1;
}
else if (!isDigit(ch)) {
state = State::INVALID;
transCnt[ord(State::INT)] += 1;
}
break;
break;
}
case State::FLOAT: {
if (ch == 'E' || ch == 'e') {
state = State::S2;
transCnt[ord(State::FLOAT)] += 1;
}
else if (!isDigit(ch)) {
state = State::INVALID;
transCnt[ord(State::FLOAT)] += 1;
}
break;
break;
}
case State::S2: {
if (ch == '+' || ch == '-') {
state = State::EXPONENT;
transCnt[ord(State::S2)] += 1;
}
else {
state = State::INVALID;
transCnt[ord(State::S2)] += 1;
}
break;
break;
}
case State::EXPONENT: {
if (isDigit(ch)) {
state = State::SCIENTIFIC;
transCnt[ord(State::EXPONENT)] += 1;
}
else {
state = State::INVALID;
transCnt[ord(State::EXPONENT)] += 1;
}
break;
break;
}
case State::SCIENTIFIC: {
if (!isDigit(ch)) {
state = State::INVALID;
transCnt[ord(State::INVALID)] += 1;
}
break;
break;
}
}
}
*pStr = str;
return state;
}
simply counting instructions from "bkpt"
to "bkpt"
, the first five state transitions in the zig version respectively execute [32,26,26,26,29] instructions… by contrast, the c++ version only executes [26,24,24,24,24,19] instructions…
(yes, some of these instructions take multiple cycles; but the disparity in instruction counts does line up with the slightly faster execution time of c++ over zig…)
some very simple experiements with basic string traversal suggest that zig is on parity with c/c++… perhaps the problem is in the implementation of the switch
statement??? does zig do an extra runtime check of the state
variable to ensure it remains in range???
you’ll also note that the c++ switch
does NOT handle the INVALID
case – since in fact this is already filtered in the top-level while
loop… zig, of course, requires me to cover the entire domain… is this the problem???
any thoughts/insights/suggestions would be much appreciated…
why is this important??? because the zig version otherwise OUTPERFORMS c++ in the CoreMark list-processing and matrix-manipulation algorithms…
if we can address this problem, our OVERALL CoreMark score will be the clear winner… right now, the impact of the state-machine is putting zig slightly behind c/c++…
help me make the case that zig offers higher-level programming AND higher-levels of performance just like those old bud-lite ads – tastes great AND less filling