Edit: moved from previous thread.
Hey, @endlessly_amused, I’d recommend trying it out on Godbolt before going to GitHub. In terms of branches, it’s not doing quite what I’d expect.
First, let’s reduce the function you’ve provided by directly returning the results:
fn isDigit_new(c: u8) bool {
    return switch (c >> 3) {
        6 => true,
        7 => c & 6 == 0,
        else => false,
    };
}
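If it helps to see why the switch works: `c >> 3 == 6` covers 48 through 55 ('0' to '7'), and `c >> 3 == 7` covers 56 through 63, where only 56 and 57 ('8' and '9') have bits 1 and 2 clear. Here’s a rough sketch of a test that checks the reduced version against `std.ascii.isDigit` for every possible byte. The name `isDigitSwitch` is just something I made up for the example, and I’m assuming a reasonably recent Zig.

```zig
const std = @import("std");

// Hypothetical name for the reduced shift-and-switch version.
fn isDigitSwitch(c: u8) bool {
    return switch (c >> 3) {
        6 => true, // 48..55, i.e. '0'..'7'
        7 => c & 6 == 0, // 56..63, only 56 and 57 have bits 1 and 2 clear
        else => false,
    };
}

test "switch version agrees with std.ascii.isDigit on every u8" {
    var c: u8 = 0;
    while (true) : (c += 1) {
        try std.testing.expectEqual(std.ascii.isDigit(c), isDigitSwitch(c));
        // Stop before the increment would overflow the u8.
        if (c == 255) break;
    }
}
```

Save it to a file and run `zig test` on it.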
Reduced or not, the generated assembly has quite a few jumps and labels (here’s the output for your version as you originally posted it, on ReleaseFast):
isDigit_new:
    push rbp
    mov rbp, rsp
    sub rsp, 3
    mov al, dil
    mov byte ptr [rbp - 3], al
    mov byte ptr [rbp - 1], al
    shr al, 3
    mov byte ptr [rbp - 2], al
    sub al, 6
    je .LBB0_2
    jmp .LBB0_6
.LBB0_6:
    mov al, byte ptr [rbp - 2]
    sub al, 7
    je .LBB0_3
    jmp .LBB0_1
.LBB0_1:
    xor eax, eax
    and al, 1
    movzx eax, al
    add rsp, 3
    pop rbp
    ret
.LBB0_2:
    mov al, 1
    and al, 1
    movzx eax, al
    add rsp, 3
    pop rbp
    ret
.LBB0_3:
    mov al, byte ptr [rbp - 3]
    and al, 6
    cmp al, 0
    jne .LBB0_5
    mov al, 1
    and al, 1
    movzx eax, al
    add rsp, 3
    pop rbp
    ret
.LBB0_5:
    xor eax, eax
    and al, 1
    movzx eax, al
    add rsp, 3
    pop rbp
    ret
Meanwhile, here’s what gets generated for the version currently in the standard library:
isDigit:
    push rbp
    mov rbp, rsp
    sub rsp, 3
    mov al, dil
    mov byte ptr [rbp - 2], al
    mov byte ptr [rbp - 1], al
    jmp .LBB0_2
.LBB0_1:
    mov al, byte ptr [rbp - 3]
    and al, 1
    movzx eax, al
    add rsp, 3
    pop rbp
    ret
.LBB0_2:
    mov cl, byte ptr [rbp - 2]
    cmp cl, 48
    setae al
    cmp cl, 57
    setbe cl
    and al, cl
    test al, 1
    jne .LBB0_3
    jmp .LBB0_4
.LBB0_3:
    mov al, 1
    mov byte ptr [rbp - 3], al
    jmp .LBB0_1
.LBB0_4:
    xor eax, eax
    mov byte ptr [rbp - 3], al
    jmp .LBB0_1
The one in the standard library generates fewer labels and jumps. Is that really indicative of performance? I’m happy to hear other takes on this, but instruction count does matter for simple things. Counting every line of the listings above, labels included, the one in the standard library comes to 33 (28 instructions plus 5 labels) while the one you’re proposing comes to 49 (43 instructions plus 6 labels)… we’d have to look at those extra instructions and the operations being generated to see if they’re worth it.
I’d actually recommend going one step simpler than this:
export fn isDigit_new(c: u8) bool {
    return ('0' <= c and c <= '9');
}
That gets about 28 instructions.
Now, again… is that actually better? I’m just counting instructions and paying attention to the number of labels and jumps but eh… probably not that big of a difference.
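Since the question is really about what the optimizer does with each version, putting them side by side in one Godbolt pane is probably the fairest comparison. Here’s a minimal sketch, assuming the Zig compiler is selected and `-O ReleaseFast` is typed into the compiler options box. The names `isDigit_std` and `isDigit_range` are made up for this example; `isDigit_std` just forwards to `std.ascii.isDigit` so the standard library version gets its own symbol.

```zig
const std = @import("std");

// Hypothetical wrapper around the standard library version.
export fn isDigit_std(c: u8) bool {
    return std.ascii.isDigit(c);
}

// The proposed shift-and-switch version.
export fn isDigit_new(c: u8) bool {
    return switch (c >> 3) {
        6 => true,
        7 => c & 6 == 0,
        else => false,
    };
}

// The plain range check (hypothetical name).
export fn isDigit_range(c: u8) bool {
    return '0' <= c and c <= '9';
}
```

If the optimized listings for the three come out the same or nearly the same, then the line counts above don’t tell us much about real performance.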
Edit: See @Eisenhauer’s answer for the results on ReleaseFast (post #5 in “I am a noob. How do I submit new code to the standard library of zig? :)”).