Hi all.
Since I was in the mood to learn something about the zig compiler, which has always seemed somewhat impenetrable, I thought I’d pick an issue and try to follow all the parts that are relevant to it. Randomly I picked https://codeberg.org/ziglang/zig/issues/31133.
With 0.16.0, it indeed reproduces, and with this diff:
diff --git a/src/codegen/x86_64/CodeGen.zig b/src/codegen/x86_64/CodeGen.zig
index 452d7e2136..44e291b419 100644
--- a/src/codegen/x86_64/CodeGen.zig
+++ b/src/codegen/x86_64/CodeGen.zig
@@ -176430,8 +176430,9 @@ fn airCondBr(self: *CodeGen, inst: Air.Inst.Index) !void {
if (cond_br.condition.toIndex()) |op_inst| try self.processDeath(op_inst, .{});
}
- const state = try self.saveState();
+ // Since genCondBrMir can allocate a register, we saveState after it's done, not before.
const reloc = try self.genCondBrMir(cond_ty, cond);
+ const state = try self.saveState();
for (liveness_cond_br.then_deaths) |death| try self.processDeath(death, .{});
try self.genBodyBlock(then_body);
it seems to compile correctly. Now, the reason I tagged this as “explain” is because I’m of course out of my depth here and I’d like to get some second opinion(s) on my thoughts.
What I’ll do is just explain in chronological order what I did and how I went about it (I took some notes), and then hopefully if something is clearly wrong or a bad idea someone can point it out to me. I had never done it before so I thought I could do it (:
It may be a long post (so I put the diff at the start) and mostly stream-of-consciousness. Maybe it’s interesting.
The first thing I try is to fiddle with the repro a bunch, but it seems to be very precise, and reduced to a minimal example pretty well already. -fstrip? It compiles correctly. One less or dummy3 != 0? It works (but more are OK). And so on.
Some more attempts and I get it to look like this:
const std = @import("std");
var min: usize = 100;
var dummy7: usize = 0;
pub fn main() void {
std.debug.print("s={}\n", .{repro()});
}
fn repro() usize {
var sum: usize = 0;
for (min..110) |i| {
{
const dummy0: usize = dummy7 + 1;
const dummy1: i64 = @as(i64, @intCast(dummy0)) - 1;
for (dummy7..dummy0) |dummy8| {
const dummy2: i64 = dummy1 + 0;
const dummy3: i64 = @intCast(dummy8);
const dummy4 = dummy3 != 0 or dummy3 != 0;
const dummy5 = dummy3 != 0 or dummy3 != 0 or dummy3 != 0;
if (dummy2 != 0) {}
if (!dummy4) {}
_ = dummy5;
const dummy6: usize = dummy0 + dummy8;
_ = dummy6;
}
}
sum += i;
}
return sum;
}
which is a bit simpler. This wrongly prints 145 (instead of 1045).
When I change the first line of the inner loop to const dummy2: i64 = dummy1; (no + 0), it compiles correctly, so I decide this is close enough to compare against each other.
It seems the parameter didn’t matter, it just had to be a runtime thing.
What seems to be going on is that the variable i is being overwritten from 100 → 0, somehow.
I note it’s tagged as a backend bug, and I know roughly that this means something goes wrong “after Air”, but I’ll look at that just to be sure. So I build a debug build of the compiler (with just zig build) and then use --verbose-air. Indeed the only difference is an extra add_with_overflow(var, 0) stored in an extra variable and the rest seems to be the same. So that doesn’t do much.
Then I compare the asm for both (with objdump), which is indeed substantially different. Since I can use the practice, I go through this line by line and try to annotate what’s going on.
if you're curious
0000000001149ab0 <bad.repro>:
1149ab0: 55 push %rbp
1149ab1: 48 89 e5 mov %rsp,%rbp
1149ab4: 41 57 push %r15
1149ab6: 41 56 push %r14
1149ab8: 41 55 push %r13
1149aba: 41 54 push %r12
1149abc: 41 53 push %r11
1149abe: 48 81 ec 88 01 00 00 sub $0x188,%rsp
1149ac5: 48 c7 04 24 00 00 00 movq $0x0,(%rsp) ; sum = 0
1149acc: 00
1149acd: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp) ; loop idx = 0
1149ad4: 00 00
1149ad6: 48 c7 c2 00 1c 20 01 mov $0x1201c00,%rdx ; addr of min
1149add: 48 8b 02 mov (%rdx),%rax ; rax = min
1149ae0: ba 6e 00 00 00 mov $0x6e,%edx ; edx = 110
1149ae5: 48 29 c2 sub %rax,%rdx ; rdx = 110 - min
1149ae8: 48 89 54 24 10 mov %rdx,0x10(%rsp) ; save it
1149aed: 0f 92 44 24 18 setb 0x18(%rsp) ; save CF
1149af2: 8a 5c 24 18 mov 0x18(%rsp),%bl ; move CF back
1149af6: 48 89 54 24 20 mov %rdx,0x20(%rsp) ; save rdx *again*
1149afb: 0f 92 44 24 28 setb 0x28(%rsp) ; save CF *again*
1149b00: 80 fb 01 cmp $0x1,%bl ; check for underflow in 110-min calculation
1149b03: 0f 85 05 00 00 00 jne 1149b0e <bad.repro+0x5e> ; if not 1 (no underflow), skip panic call
1149b09: e8 b2 eb ed ff call 10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
1149b0e: 48 8b 54 24 20 mov 0x20(%rsp),%rdx ; continue with rdx
1149b13: e9 00 00 00 00 jmp 1149b18 <bad.repro+0x68> ; skip nothing ( enter the loop I guess -- actually, enter the `i` computation block)
1149b18: 48 8b 5c 24 08 mov 0x8(%rsp),%rbx ; move loop idx into rbx
1149b1d: 48 89 d9 mov %rbx,%rcx ; also into rcx
1149b20: 48 89 d6 mov %rdx,%rsi ; rsi = 110 - min
1149b23: 48 39 f1 cmp %rsi,%rcx ; compare from rsi to rcx (i.e. rcx - rsi)
1149b26: 0f 83 b3 03 00 00 jae 1149edf <bad.repro+0x42f> ; check if rcx - rsi (= idx - 10) is >= 0 i.e. idx >= 10 i.e. we're done looping, if so jump out
1149b2c: 48 89 c1 mov %rax,%rcx ; rcx = min again
1149b2f: 48 01 d9 add %rbx,%rcx ; rcx = min+idx (= i)
1149b32: 48 89 4c 24 30 mov %rcx,0x30(%rsp) ; save to stack
1149b37: 0f 92 44 24 38 setb 0x38(%rsp) ; with overflow bit
1149b3c: 40 8a 74 24 38 mov 0x38(%rsp),%sil ; move overflow bit to sil
1149b41: 48 89 4c 24 40 mov %rcx,0x40(%rsp) ; save to stack again
1149b46: 0f 92 44 24 48 setb 0x48(%rsp) ; ditto
1149b4b: 40 80 fe 01 cmp $0x1,%sil ; check if overflow
1149b4f: 0f 85 05 00 00 00 jne 1149b5a <bad.repro+0xaa> ; if no overflow, skip panic
1149b55: e8 66 eb ed ff call 10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
1149b5a: 48 8b 4c 24 40 mov 0x40(%rsp),%rcx ; back
1149b5f: e9 00 00 00 00 jmp 1149b64 <bad.repro+0xb4> ; jump into loop body, finally
1149b64: 48 89 4c 24 50 mov %rcx,0x50(%rsp) ; save stuff
1149b69: 48 c7 c7 80 af 21 01 mov $0x121af80,%rdi ; load addr of dummy7
1149b70: 48 8b 37 mov (%rdi),%rsi ; load dummy7
1149b73: 48 83 c6 01 add $0x1,%rsi ; add 1 (this is dummy0)
1149b77: 48 89 74 24 58 mov %rsi,0x58(%rsp) ; save to stack (this first one is for the panic check)
1149b7c: 0f 92 44 24 60 setb 0x60(%rsp) ; save CF
1149b81: 40 8a 7c 24 60 mov 0x60(%rsp),%dil ; load CF
1149b86: 48 89 74 24 68 mov %rsi,0x68(%rsp) ; idem (this save is for the later use in the fn)
1149b8b: 0f 92 44 24 70 setb 0x70(%rsp)
1149b90: 40 80 ff 01 cmp $0x1,%dil ; overflow?
1149b94: 0f 85 05 00 00 00 jne 1149b9f <bad.repro+0xef>
1149b9a: e8 21 eb ed ff call 10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
1149b9f: 48 8b 74 24 68 mov 0x68(%rsp),%rsi ; ...which we load here to continue (rsi = dummy0 = 1 now)
1149ba4: e9 00 00 00 00 jmp 1149ba9 <bad.repro+0xf9>
1149ba9: 48 89 74 24 78 mov %rsi,0x78(%rsp)
1149bae: 48 bf ff ff ff ff ff movabs $0x7fffffffffffffff,%rdi ; i64max in rdi
1149bb5: ff ff 7f
1149bb8: 48 39 f7 cmp %rsi,%rdi ; rdi - rsi
1149bbb: 0f 83 05 00 00 00 jae 1149bc6 <bad.repro+0x116> >= 0? i.e. is rsi <= i64max? if so, skip out of bounds panic
1149bc1: e8 4a eb ed ff call 1028710 <debug.FullPanic((function 'defaultPanic')).integerOutOfBounds>
1149bc6: 48 89 f7 mov %rsi,%rdi ; rsi = dummy0 gets moved to rdi, then we sub1 for the dummy1 computation, then signed overflow check (seto this time)
1149bc9: e9 00 00 00 00 jmp 1149bce <bad.repro+0x11e>
1149bce: 48 83 ef 01 sub $0x1,%rdi
1149bd2: 48 89 bc 24 80 00 00 mov %rdi,0x80(%rsp)
1149bd9: 00
1149bda: 0f 90 84 24 88 00 00 seto 0x88(%rsp)
1149be1: 00
1149be2: 44 8a 84 24 88 00 00 mov 0x88(%rsp),%r8b
1149be9: 00
1149bea: 48 89 bc 24 90 00 00 mov %rdi,0x90(%rsp)
1149bf1: 00
1149bf2: 0f 90 84 24 98 00 00 seto 0x98(%rsp)
1149bf9: 00
1149bfa: 41 80 f8 01 cmp $0x1,%r8b
1149bfe: 0f 85 05 00 00 00 jne 1149c09 <bad.repro+0x159>
1149c04: e8 b7 ea ed ff call 10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
1149c09: 48 8b bc 24 90 00 00 mov 0x90(%rsp),%rdi ; we continue with dummy1 in rdi (which is 0)
1149c10: 00
1149c11: e9 00 00 00 00 jmp 1149c16 <bad.repro+0x166>
1149c16: 48 89 bc 24 a0 00 00 mov %rdi,0xa0(%rsp) ; save rdi again, get another 0 for the loop idx
1149c1d: 00
1149c1e: 48 c7 84 24 a8 00 00 movq $0x0,0xa8(%rsp)
1149c25: 00 00 00 00 00
1149c2a: 49 c7 c2 80 af 21 01 mov $0x121af80,%r10
1149c31: 4d 8b 02 mov (%r10),%r8
1149c34: 49 89 f2 mov %rsi,%r10
1149c37: 4d 29 c2 sub %r8,%r10
1149c3a: 4c 89 94 24 b0 00 00 mov %r10,0xb0(%rsp)
1149c41: 00
1149c42: 0f 92 84 24 b8 00 00 setb 0xb8(%rsp)
1149c49: 00
1149c4a: 44 8a 9c 24 b8 00 00 mov 0xb8(%rsp),%r11b
1149c51: 00
1149c52: 4c 89 94 24 c0 00 00 mov %r10,0xc0(%rsp)
1149c59: 00
1149c5a: 0f 92 84 24 c8 00 00 setb 0xc8(%rsp)
1149c61: 00
1149c62: 41 80 fb 01 cmp $0x1,%r11b
1149c66: 0f 85 05 00 00 00 jne 1149c71 <bad.repro+0x1c1>
1149c6c: e8 4f ea ed ff call 10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
1149c71: 4c 8b 94 24 c0 00 00 mov 0xc0(%rsp),%r10
1149c78: 00
1149c79: e9 00 00 00 00 jmp 1149c7e <bad.repro+0x1ce>
1149c7e: 4c 8b 9c 24 a8 00 00 mov 0xa8(%rsp),%r11 ; inner loop index
1149c85: 00
1149c86: 4d 89 dc mov %r11,%r12
1149c89: 4d 89 d5 mov %r10,%r13
1149c8c: 4d 39 ec cmp %r13,%r12
1149c8f: 0f 83 d9 01 00 00 jae 1149e6e <bad.repro+0x3be> ; maybe jump out of the loop after our iterations
1149c95: 4d 89 c4 mov %r8,%r12
1149c98: 4d 01 dc add %r11,%r12
1149c9b: 4c 89 a4 24 d0 00 00 mov %r12,0xd0(%rsp)
1149ca2: 00
1149ca3: 0f 92 84 24 d8 00 00 setb 0xd8(%rsp)
1149caa: 00
1149cab: 44 8a ac 24 d8 00 00 mov 0xd8(%rsp),%r13b
1149cb2: 00
1149cb3: 4c 89 a4 24 e0 00 00 mov %r12,0xe0(%rsp)
1149cba: 00
1149cbb: 0f 92 84 24 e8 00 00 setb 0xe8(%rsp)
1149cc2: 00
1149cc3: 41 80 fd 01 cmp $0x1,%r13b
1149cc7: 0f 85 05 00 00 00 jne 1149cd2 <bad.repro+0x222> ; this must be the check for dummy8 (the loop value)
1149ccd: e8 ee e9 ed ff call 10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
1149cd2: 4c 8b a4 24 e0 00 00 mov 0xe0(%rsp),%r12 ; stored in r12
1149cd9: 00
1149cda: e9 00 00 00 00 jmp 1149cdf <bad.repro+0x22f>
1149cdf: 4c 89 a4 24 f0 00 00 mov %r12,0xf0(%rsp) ; store dummy8 (=0) and start computing dummy2
1149ce6: 00
1149ce7: 49 89 fd mov %rdi,%r13
1149cea: 49 83 c5 00 add $0x0,%r13 ; compute dummy2
1149cee: 4c 89 ac 24 f8 00 00 mov %r13,0xf8(%rsp)
1149cf5: 00
1149cf6: 0f 90 84 24 00 01 00 seto 0x100(%rsp) ; ov check
1149cfd: 00
1149cfe: 44 8a b4 24 00 01 00 mov 0x100(%rsp),%r14b
1149d05: 00
1149d06: 4c 89 ac 24 08 01 00 mov %r13,0x108(%rsp)
1149d0d: 00
1149d0e: 0f 90 84 24 10 01 00 seto 0x110(%rsp)
1149d15: 00
1149d16: 41 80 fe 01 cmp $0x1,%r14b ; did we overflow?
1149d1a: 0f 85 05 00 00 00 jne 1149d25 <bad.repro+0x275>
1149d20: e8 9b e9 ed ff call 10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
1149d25: 4c 8b ac 24 08 01 00 mov 0x108(%rsp),%r13 ; store r13 = dummy2 = 0
1149d2c: 00
1149d2d: e9 00 00 00 00 jmp 1149d32 <bad.repro+0x282>
1149d32: 4c 89 ac 24 18 01 00 mov %r13,0x118(%rsp)
1149d39: 00
1149d3a: 49 be ff ff ff ff ff movabs $0x7fffffffffffffff,%r14
1149d41: ff ff 7f
1149d44: 4d 39 e6 cmp %r12,%r14
1149d47: 0f 83 05 00 00 00 jae 1149d52 <bad.repro+0x2a2>
1149d4d: e8 be e9 ed ff call 1028710 <debug.FullPanic((function 'defaultPanic')).integerOutOfBounds>
1149d52: 4d 89 e6 mov %r12,%r14 ; store r14 = dummy3 = 0
1149d55: e9 00 00 00 00 jmp 1149d5a <bad.repro+0x2aa> ; noop
1149d5a: 4c 89 b4 24 20 01 00 mov %r14,0x120(%rsp) ; and save it
1149d61: 00
1149d62: 49 83 fe 00 cmp $0x0,%r14 ; compare dummy3 to 0
1149d66: 0f 84 08 00 00 00 je 1149d74 <bad.repro+0x2c4> ; if equal, we need to do the second check (since dummy3 != 0 isn't true so we can't early-out)
1149d6c: 41 b7 01 mov $0x1,%r15b ; otherwise, we move true/1 to r15b
1149d6f: e9 0d 00 00 00 jmp 1149d81 <bad.repro+0x2d1> ; and we can early out
1149d74: 49 83 fe 00 cmp $0x0,%r14 ; now we need the second check
1149d78: 41 0f 95 c7 setne %r15b ; r14 = dummy3 != 0?
1149d7c: e9 00 00 00 00 jmp 1149d81 <bad.repro+0x2d1> ; done
1149d81: 44 88 bc 24 78 01 00 mov %r15b,0x178(%rsp) ; we save r15b (the result of the check , dummy4)
1149d88: 00
1149d89: 49 83 fe 00 cmp $0x0,%r14 ;;;; now we start with dummy5, which has THREE d3 != 0 checks. (check 1) d3 == 0?
1149d8d: 0f 84 0d 00 00 00 je 1149da0 <bad.repro+0x2f0> ; if true, go to da0 (another check, no early out)
1149d93: c6 84 24 79 01 00 00 movb $0x1,0x179(%rsp) ; otherwise we move true to stack
1149d9a: 01
1149d9b: e9 11 00 00 00 jmp 1149db1 <bad.repro+0x301> ; and leave
1149da0: 49 83 fe 00 cmp $0x0,%r14 ; (check 2) dummy 3 = 0?
1149da4: 0f 95 84 24 79 01 00 setne 0x179(%rsp) ; move dummy3 != 0 to stack
1149dab: 00
1149dac: e9 00 00 00 00 jmp 1149db1 <bad.repro+0x301> ; noop
1149db1: 48 89 84 24 28 01 00 mov %rax,0x128(%rsp) ; (we jumped here from the first early out) save min (to 0x128!)
1149db8: 00
1149db9: 8a 84 24 79 01 00 00 mov 0x179(%rsp),%al ; take dummy3 != 0 again from stack. at this point we should have short circuited I think, but it seems we continue anyways and just keep superfluously overwriting al = 1.
1149dc0: a8 01 test $0x1,%al ; does NOT compare al to 1, but computes al & 1 and sets ZF if we get 0 (?))
1149dc2: 0f 84 07 00 00 00 je 1149dcf <bad.repro+0x31f> ; thus we jump if al = 0
1149dc8: b0 01 mov $0x1,%al ; otherwise we set al=1
1149dca: e9 14 00 00 00 jmp 1149de3 <bad.repro+0x333> ; jump to end of dummy5 computation
1149dcf: 49 83 fe 00 cmp $0x0,%r14 ; (check 3) is dummy3 == 0?
1149dd3: 48 89 84 24 28 01 00 mov %rax,0x128(%rsp) ; save rax again? note THIS OVERWRITES MIN, which explains the bug (i will now be computed as 0)
1149dda: 00
1149ddb: 0f 95 c0 setne %al ; if dummy3 != 0, al=1
1149dde: e9 00 00 00 00 jmp 1149de3 <bad.repro+0x333> ; noop
1149de3: 88 84 24 7a 01 00 00 mov %al,0x17a(%rsp) ; save computation result (dummy5) to stack
1149dea: 49 83 fd 00 cmp $0x0,%r13 ; dummy2 = 0?
1149dee: 0f 84 05 00 00 00 je 1149df9 <bad.repro+0x349> ; this must be the if (dummy2 != 0) {} noop jumps
1149df4: e9 05 00 00 00 jmp 1149dfe <bad.repro+0x34e>
1149df9: e9 00 00 00 00 jmp 1149dfe <bad.repro+0x34e>
1149dfe: 41 80 f7 01 xor $0x1,%r15b
1149e02: 41 f6 c7 01 test $0x1,%r15b
1149e06: 0f 84 05 00 00 00 je 1149e11 <bad.repro+0x361>
1149e0c: e9 05 00 00 00 jmp 1149e16 <bad.repro+0x366>
1149e11: e9 00 00 00 00 jmp 1149e16 <bad.repro+0x366>
1149e16: 48 89 f0 mov %rsi,%rax
1149e19: 4c 01 e0 add %r12,%rax
1149e1c: 48 89 84 24 30 01 00 mov %rax,0x130(%rsp)
1149e23: 00
1149e24: 0f 92 84 24 38 01 00 setb 0x138(%rsp)
1149e2b: 00
1149e2c: 44 8a a4 24 38 01 00 mov 0x138(%rsp),%r12b
1149e33: 00
1149e34: 48 89 84 24 40 01 00 mov %rax,0x140(%rsp)
1149e3b: 00
1149e3c: 0f 92 84 24 48 01 00 setb 0x148(%rsp)
1149e43: 00
1149e44: 41 80 fc 01 cmp $0x1,%r12b
1149e48: 0f 85 05 00 00 00 jne 1149e53 <bad.repro+0x3a3>
1149e4e: e8 6d e8 ed ff call 10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
1149e53: 48 8b 84 24 40 01 00 mov 0x140(%rsp),%rax
1149e5a: 00
1149e5b: e9 00 00 00 00 jmp 1149e60 <bad.repro+0x3b0>
1149e60: 48 89 84 24 50 01 00 mov %rax,0x150(%rsp)
1149e67: 00
1149e68: 90 nop
1149e69: e9 05 00 00 00 jmp 1149e73 <bad.repro+0x3c3>
1149e6e: e9 19 00 00 00 jmp 1149e8c <bad.repro+0x3dc>
1149e73: 49 83 c3 01 add $0x1,%r11
1149e77: 4c 89 9c 24 a8 00 00 mov %r11,0xa8(%rsp)
1149e7e: 00
1149e7f: 48 8b 84 24 28 01 00 mov 0x128(%rsp),%rax ; load the wrong value into rax
1149e86: 00
1149e87: e9 f2 fd ff ff jmp 1149c7e <bad.repro+0x1ce>
1149e8c: 48 8b 34 24 mov (%rsp),%rsi
1149e90: 48 01 ce add %rcx,%rsi
1149e93: 48 89 b4 24 58 01 00 mov %rsi,0x158(%rsp)
1149e9a: 00
1149e9b: 0f 92 84 24 60 01 00 setb 0x160(%rsp)
1149ea2: 00
1149ea3: 8a 8c 24 60 01 00 00 mov 0x160(%rsp),%cl
1149eaa: 48 89 b4 24 68 01 00 mov %rsi,0x168(%rsp)
1149eb1: 00
1149eb2: 0f 92 84 24 70 01 00 setb 0x170(%rsp)
1149eb9: 00
1149eba: 80 f9 01 cmp $0x1,%cl
1149ebd: 0f 85 05 00 00 00 jne 1149ec8 <bad.repro+0x418>
1149ec3: e8 f8 e7 ed ff call 10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
1149ec8: 48 8b 8c 24 68 01 00 mov 0x168(%rsp),%rcx
1149ecf: 00
1149ed0: e9 00 00 00 00 jmp 1149ed5 <bad.repro+0x425>
1149ed5: 48 89 0c 24 mov %rcx,(%rsp)
1149ed9: 90 nop
1149eda: e9 05 00 00 00 jmp 1149ee4 <bad.repro+0x434>
1149edf: e9 0e 00 00 00 jmp 1149ef2 <bad.repro+0x442>
1149ee4: 48 83 c3 01 add $0x1,%rbx
1149ee8: 48 89 5c 24 08 mov %rbx,0x8(%rsp)
1149eed: e9 26 fc ff ff jmp 1149b18 <bad.repro+0x68>
1149ef2: 48 8b 04 24 mov (%rsp),%rax
1149ef6: 48 8d 65 d8 lea -0x28(%rbp),%rsp
1149efa: 41 5b pop %r11
1149efc: 41 5c pop %r12
1149efe: 41 5d pop %r13
1149f00: 41 5e pop %r14
1149f02: 41 5f pop %r15
1149f04: 5d pop %rbp
1149f05: c3 ret
The asm seems to be confusingly inefficient (at least, so it looks to me). Variables and their overflow bits are saved twice, I guess since each has two “uses”; one for checking to panic and one for the computations later on. It’s also littered with a lot of (superfluous?) copies of all the variables to stack and back again, multiple times. I don’t understand most of the why behind the codegen, but I can guess it’s to ease debugging or to make the backend simpler or more performant(?).
In any case, the fact that i was set from 100 to 0 was important, since in fact it is min which is overwritten in the triple dummy3 != 0 checks (at 0x128(%rsp)). It looks like that’s a consequence of the very confusing asm for calculating dummy5 (really, why is the control flow so complicated?).
Now I’ve found the result of the bug, but of course I’m clueless what causes it in codegen. Looking at the Air again, probably it happens when generating for cond_br, br or cmp_neq. I know one of the codegen files is huge, 200k lines or so, so I grep that one since it surely must contain all the interesting logic. The first hit is airCondBr.
It starts with finding a bunch of instruction data that is tightly encoded in the slices. Looks precise but probably not the problem. Then there is something about liveness, which I only know to be a compiler thing to track, well, lifetimes and such. Probably this is the problem, since we “forgot” that we had saved rax at a certain place? I also note there’s a special case with a comment, but the commit it was introduced in seems unrelated. Still I’ll try to see what it does.
The function operandDies() seems very low level and just checks whether a certain flag was set, probably earlier on. It seems like liveness_cond_br is important for that but I guess not; that’s only used later in the function. Looking at what liveness_cond_br does, it has deaths for “then” and “else” cases.
Thinking about this for a bit, maybe the distinction is whether a variable dies in the body of a condition, or in one of the cases after (as in, whether it dies in the () of an if statement, or in one of the {}'s). This agrees with what the function seems to do: first generate code for the check, then for the “then case” and then the “else case”. Then I guess doing this liveness check first is just a maybe-register-saving optimization to not needlessly spill to memory. Ok.
Moving on, I see there is a call to “save state” and later state is restored. This may be the problem since we were overwriting a register which was saved in memory twice. It calls some specialized “retroactive” function which seemingly just saves anything to do with the current register state, to then restore later after the body is generated.
Then we enter a genCondBrMir call. I note what we do depends on mcv, but some of those have useful doc comments. There’s only four possibilities for mcv, and load_frame makes the most sense since we were overwriting something at an offset from the stack pointer. This case allocates a “temporary register” and then calls itself again with that reg as new mcv. The copyToTmpRegister function has a useful doc comment that we may be spilling instructions to the stack to free up a register. Exactly what was happening. Then there’s a call to jccReloc.
So, presumably we spilled rax and forgot to notify anyone or update the state we’d just saved. Maybe we can update the state somehow? Let’s see what the rest of CodeGen does with saved state.
Grepping for “self.saveState” there are only four calls. The first one in airMulDivBinOp has the same pattern: first save state, then call asmJccReloc.
The second one in genTry does it in reverse: first genCondBrMir, then check the liveness of the operand, then save state after that. Call 3 is our airCondBr and the last call in airLoopSwitchBr has nothing to do with condBrMir but some things are similar. We first do the operand liveness check, then generate a register, then save state. (This is different from what happened in genCondBrMir, where we also allocated a register, but did that after saving state.) This function enters lowerSwitchBr which seems quite complicated, but I notice there’s also a bunch of calls to “cg.saveState” – the same thing.
There’s only two calls to that; one when generating a loop instruction and one for a cmp_eq instruction (!!).
This latter one does some things I don’t understand and then again goes for asmJccReloc. Looking at that function it seems to not mess with registers and instead just adds a mir instruction.
So, probably the problem was indeed that genCondBrMir allocated a register after the register state was saved, invalidating the saved state. This invalid state is then loaded later and on the basis of that we overwrite rax/al with 0, which then gets used as the value of min.
To try and verify my idea, I try to just move the saveState call after genCondBrMir. I rebuild the compiler, one recompilation of the repro, and… 1045. So it seems like that was indeed it.
Some questions: is this the “proper dev workflow”? Looking at the assembly and then working backwards to think what part of codegen was probably responsible for it based on what the asm is doing? I see something about dev.check() in codegen.generateFunction but I’m clueless how to use that. I imagine it’s not documented.
Is there a way to view Mir output, like with Air? Would that be useful? Reading the asm and comparing it to the source code took me some time, but probably I’d go faster if I did it more often.
Also, probably it’s time to open a book on compilers. Any recommendations?