Chasing a miscompilation (31133)

Hi all.

Since I was in the mood to learn something about the zig compiler, which has always seemed somewhat impenetrable, I thought I’d pick an issue and try to follow all the parts that are relevant to it. Randomly I picked https://codeberg.org/ziglang/zig/issues/31133.

With 0.16.0, it indeed reproduces, and with this diff:

diff --git a/src/codegen/x86_64/CodeGen.zig b/src/codegen/x86_64/CodeGen.zig
index 452d7e2136..44e291b419 100644
--- a/src/codegen/x86_64/CodeGen.zig
+++ b/src/codegen/x86_64/CodeGen.zig
@@ -176430,8 +176430,9 @@ fn airCondBr(self: *CodeGen, inst: Air.Inst.Index) !void {
         if (cond_br.condition.toIndex()) |op_inst| try self.processDeath(op_inst, .{});
     }

-    const state = try self.saveState();
+    // Since genCondBrMir can allocate a register, we saveState after it's done, not before.
     const reloc = try self.genCondBrMir(cond_ty, cond);
+    const state = try self.saveState();

     for (liveness_cond_br.then_deaths) |death| try self.processDeath(death, .{});
     try self.genBodyBlock(then_body);

it seems to compile correctly. Now, the reason I tagged this as “explain” is because I’m of course out of my depth here and I’d like to get some second opinion(s) on my thoughts.
What I’ll do is just explain in chronological order what I did and how I went about it (I took some notes), and then hopefully if something is clearly wrong or a bad idea someone can point it out to me. I had never done it before so I thought I could do it (:

It may be a long post (so I put the diff at the start) and mostly stream-of-consciousness. Maybe it’s interesting.

The first thing I try is to fiddle with the repro a bunch, but it seems to be very precise, and reduced to a minimal example pretty well already. -fstrip? It compiles correctly. One less or dummy3 != 0? It works (but more are OK). And so on.
Some more attempts and I get it to look like this:

const std = @import("std");

var min: usize = 100;
var dummy7: usize = 0;

pub fn main() void {
    std.debug.print("s={}\n", .{repro()});
}

fn repro() usize {
    var sum: usize = 0;
    for (min..110) |i| {
        {
            const dummy0: usize = dummy7 + 1;
            const dummy1: i64 = @as(i64, @intCast(dummy0)) - 1;
            for (dummy7..dummy0) |dummy8| {
                const dummy2: i64 = dummy1 + 0;
                const dummy3: i64 = @intCast(dummy8);
                const dummy4 = dummy3 != 0 or dummy3 != 0;
                const dummy5 = dummy3 != 0 or dummy3 != 0 or dummy3 != 0;
                if (dummy2 != 0) {}
                if (!dummy4) {}
                _ = dummy5;
                const dummy6: usize = dummy0 + dummy8;
                _ = dummy6;
            }
        }
        sum += i;
    }
    return sum;
}

which is a bit simpler. This wrongly prints 145 (instead of 1045).
When I change the first line of the inner loop to const dummy2: i64 = dummy1; (no + 0), it compiles correctly, so I decide this is close enough to compare against each other.

It seems the parameter didn’t matter, it just had to be a runtime thing.
What seems to be going on is that the variable i is being overwritten from 100 → 0, somehow.

I note it’s tagged as a backend bug, and I know roughly that this means something goes wrong “after Air”, but I’ll look at that just to be sure. So I build a debug build of the compiler (with just zig build) and then use --verbose-air. Indeed the only difference is an extra add_with_overflow(var, 0) stored in an extra variable and the rest seems to be the same. So that doesn’t do much.

Then I compare the asm for both (with objdump), which is indeed substantially different. Since I can use the practice, I go through this line by line and try to annotate what’s going on.

if you're curious
0000000001149ab0 <bad.repro>:
 1149ab0:	55                   	push   %rbp
 1149ab1:	48 89 e5             	mov    %rsp,%rbp
 1149ab4:	41 57                	push   %r15
 1149ab6:	41 56                	push   %r14
 1149ab8:	41 55                	push   %r13
 1149aba:	41 54                	push   %r12
 1149abc:	41 53                	push   %r11
 1149abe:	48 81 ec 88 01 00 00 	sub    $0x188,%rsp
 1149ac5:	48 c7 04 24 00 00 00 	movq   $0x0,(%rsp) ; sum = 0
 1149acc:	00 
 1149acd:	48 c7 44 24 08 00 00 	movq   $0x0,0x8(%rsp) ; loop idx = 0
 1149ad4:	00 00 
 1149ad6:	48 c7 c2 00 1c 20 01 	mov    $0x1201c00,%rdx ; addr of min
 1149add:	48 8b 02             	mov    (%rdx),%rax ; rax = min
 1149ae0:	ba 6e 00 00 00       	mov    $0x6e,%edx ; edx = 110
 1149ae5:	48 29 c2             	sub    %rax,%rdx ; rdx = 110 - min
 1149ae8:	48 89 54 24 10       	mov    %rdx,0x10(%rsp) ; save it
 1149aed:	0f 92 44 24 18       	setb   0x18(%rsp) ; save CF
 1149af2:	8a 5c 24 18          	mov    0x18(%rsp),%bl ; move CF back
 1149af6:	48 89 54 24 20       	mov    %rdx,0x20(%rsp) ; save rdx *again*
 1149afb:	0f 92 44 24 28       	setb   0x28(%rsp) ; save CF *again*
 1149b00:	80 fb 01             	cmp    $0x1,%bl ; check for underflow in 110-min calculation
 1149b03:	0f 85 05 00 00 00    	jne    1149b0e <bad.repro+0x5e> ; if not 1 (no underflow), skip panic call
 1149b09:	e8 b2 eb ed ff       	call   10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
 1149b0e:	48 8b 54 24 20       	mov    0x20(%rsp),%rdx ; continue with rdx
 1149b13:	e9 00 00 00 00       	jmp    1149b18 <bad.repro+0x68> ; skip nothing ( enter the loop I guess -- actually, enter the `i` computation block)
 1149b18:	48 8b 5c 24 08       	mov    0x8(%rsp),%rbx ; move loop idx into rbx
 1149b1d:	48 89 d9             	mov    %rbx,%rcx ; also into rcx
 1149b20:	48 89 d6             	mov    %rdx,%rsi ; rsi = 110 - min
 1149b23:	48 39 f1             	cmp    %rsi,%rcx ; compare from rsi to rcx (i.e. rcx - rsi)
 1149b26:	0f 83 b3 03 00 00    	jae    1149edf <bad.repro+0x42f> ; check if rcx - rsi (= idx - 10) is >= 0 i.e. idx >= 10 i.e. we're done looping, if so jump out
 1149b2c:	48 89 c1             	mov    %rax,%rcx ; rcx = min again
 1149b2f:	48 01 d9             	add    %rbx,%rcx ; rcx = min+idx (= i)
 1149b32:	48 89 4c 24 30       	mov    %rcx,0x30(%rsp) ; save to stack
 1149b37:	0f 92 44 24 38       	setb   0x38(%rsp) ; with overflow bit
 1149b3c:	40 8a 74 24 38       	mov    0x38(%rsp),%sil ; move overflow bit to sil
 1149b41:	48 89 4c 24 40       	mov    %rcx,0x40(%rsp) ; save to stack again
 1149b46:	0f 92 44 24 48       	setb   0x48(%rsp) ; ditto
 1149b4b:	40 80 fe 01          	cmp    $0x1,%sil ; check if overflow
 1149b4f:	0f 85 05 00 00 00    	jne    1149b5a <bad.repro+0xaa> ; if no overflow, skip panic
 1149b55:	e8 66 eb ed ff       	call   10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
 1149b5a:	48 8b 4c 24 40       	mov    0x40(%rsp),%rcx ; back
 1149b5f:	e9 00 00 00 00       	jmp    1149b64 <bad.repro+0xb4> ; jump into loop body, finally
 1149b64:	48 89 4c 24 50       	mov    %rcx,0x50(%rsp) ; save stuff
 1149b69:	48 c7 c7 80 af 21 01 	mov    $0x121af80,%rdi ; load addr of dummy7
 1149b70:	48 8b 37             	mov    (%rdi),%rsi ; load dummy7
 1149b73:	48 83 c6 01          	add    $0x1,%rsi ; add 1 (this is dummy0)
 1149b77:	48 89 74 24 58       	mov    %rsi,0x58(%rsp) ; save to stack (this first one is for the panic check)
 1149b7c:	0f 92 44 24 60       	setb   0x60(%rsp) ; save CF
 1149b81:	40 8a 7c 24 60       	mov    0x60(%rsp),%dil ; load CF
 1149b86:	48 89 74 24 68       	mov    %rsi,0x68(%rsp) ; idem (this save is for the later use in the fn)
 1149b8b:	0f 92 44 24 70       	setb   0x70(%rsp)
 1149b90:	40 80 ff 01          	cmp    $0x1,%dil ; overflow?
 1149b94:	0f 85 05 00 00 00    	jne    1149b9f <bad.repro+0xef>
 1149b9a:	e8 21 eb ed ff       	call   10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
 1149b9f:	48 8b 74 24 68       	mov    0x68(%rsp),%rsi ; ...which we load here to continue (rsi = dummy0 = 1 now)
 1149ba4:	e9 00 00 00 00       	jmp    1149ba9 <bad.repro+0xf9>
 1149ba9:	48 89 74 24 78       	mov    %rsi,0x78(%rsp)
 1149bae:	48 bf ff ff ff ff ff 	movabs $0x7fffffffffffffff,%rdi ; i64max in rdi
 1149bb5:	ff ff 7f 
 1149bb8:	48 39 f7             	cmp    %rsi,%rdi ; rdi - rsi
 1149bbb:	0f 83 05 00 00 00    	jae    1149bc6 <bad.repro+0x116> >= 0? i.e. is rsi <= i64max? if so, skip out of bounds panic
 1149bc1:	e8 4a eb ed ff       	call   1028710 <debug.FullPanic((function 'defaultPanic')).integerOutOfBounds>
 1149bc6:	48 89 f7             	mov    %rsi,%rdi ; rsi = dummy0 gets moved to rdi, then we sub1 for the dummy1 computation, then signed overflow check (seto this time)
 1149bc9:	e9 00 00 00 00       	jmp    1149bce <bad.repro+0x11e>
 1149bce:	48 83 ef 01          	sub    $0x1,%rdi
 1149bd2:	48 89 bc 24 80 00 00 	mov    %rdi,0x80(%rsp)
 1149bd9:	00 
 1149bda:	0f 90 84 24 88 00 00 	seto   0x88(%rsp)
 1149be1:	00 
 1149be2:	44 8a 84 24 88 00 00 	mov    0x88(%rsp),%r8b
 1149be9:	00 
 1149bea:	48 89 bc 24 90 00 00 	mov    %rdi,0x90(%rsp)
 1149bf1:	00 
 1149bf2:	0f 90 84 24 98 00 00 	seto   0x98(%rsp)
 1149bf9:	00 
 1149bfa:	41 80 f8 01          	cmp    $0x1,%r8b
 1149bfe:	0f 85 05 00 00 00    	jne    1149c09 <bad.repro+0x159>
 1149c04:	e8 b7 ea ed ff       	call   10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
 1149c09:	48 8b bc 24 90 00 00 	mov    0x90(%rsp),%rdi ; we continue with dummy1 in rdi (which is 0)
 1149c10:	00 
 1149c11:	e9 00 00 00 00       	jmp    1149c16 <bad.repro+0x166>
 1149c16:	48 89 bc 24 a0 00 00 	mov    %rdi,0xa0(%rsp) ; save rdi again, get another 0 for the loop idx
 1149c1d:	00 
 1149c1e:	48 c7 84 24 a8 00 00 	movq   $0x0,0xa8(%rsp)
 1149c25:	00 00 00 00 00 
 1149c2a:	49 c7 c2 80 af 21 01 	mov    $0x121af80,%r10
 1149c31:	4d 8b 02             	mov    (%r10),%r8
 1149c34:	49 89 f2             	mov    %rsi,%r10
 1149c37:	4d 29 c2             	sub    %r8,%r10
 1149c3a:	4c 89 94 24 b0 00 00 	mov    %r10,0xb0(%rsp)
 1149c41:	00 
 1149c42:	0f 92 84 24 b8 00 00 	setb   0xb8(%rsp)
 1149c49:	00 
 1149c4a:	44 8a 9c 24 b8 00 00 	mov    0xb8(%rsp),%r11b
 1149c51:	00 
 1149c52:	4c 89 94 24 c0 00 00 	mov    %r10,0xc0(%rsp)
 1149c59:	00 
 1149c5a:	0f 92 84 24 c8 00 00 	setb   0xc8(%rsp)
 1149c61:	00 
 1149c62:	41 80 fb 01          	cmp    $0x1,%r11b
 1149c66:	0f 85 05 00 00 00    	jne    1149c71 <bad.repro+0x1c1>
 1149c6c:	e8 4f ea ed ff       	call   10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
 1149c71:	4c 8b 94 24 c0 00 00 	mov    0xc0(%rsp),%r10
 1149c78:	00 
 1149c79:	e9 00 00 00 00       	jmp    1149c7e <bad.repro+0x1ce>
 1149c7e:	4c 8b 9c 24 a8 00 00 	mov    0xa8(%rsp),%r11 ; inner loop index
 1149c85:	00 
 1149c86:	4d 89 dc             	mov    %r11,%r12
 1149c89:	4d 89 d5             	mov    %r10,%r13
 1149c8c:	4d 39 ec             	cmp    %r13,%r12
 1149c8f:	0f 83 d9 01 00 00    	jae    1149e6e <bad.repro+0x3be> ; maybe jump out of the loop after our iterations
 1149c95:	4d 89 c4             	mov    %r8,%r12
 1149c98:	4d 01 dc             	add    %r11,%r12
 1149c9b:	4c 89 a4 24 d0 00 00 	mov    %r12,0xd0(%rsp)
 1149ca2:	00 
 1149ca3:	0f 92 84 24 d8 00 00 	setb   0xd8(%rsp)
 1149caa:	00 
 1149cab:	44 8a ac 24 d8 00 00 	mov    0xd8(%rsp),%r13b
 1149cb2:	00 
 1149cb3:	4c 89 a4 24 e0 00 00 	mov    %r12,0xe0(%rsp)
 1149cba:	00 
 1149cbb:	0f 92 84 24 e8 00 00 	setb   0xe8(%rsp)
 1149cc2:	00 
 1149cc3:	41 80 fd 01          	cmp    $0x1,%r13b
 1149cc7:	0f 85 05 00 00 00    	jne    1149cd2 <bad.repro+0x222> ; this must be the check for dummy8 (the loop value)
 1149ccd:	e8 ee e9 ed ff       	call   10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
 1149cd2:	4c 8b a4 24 e0 00 00 	mov    0xe0(%rsp),%r12 ; stored in r12
 1149cd9:	00 
 1149cda:	e9 00 00 00 00       	jmp    1149cdf <bad.repro+0x22f>
 1149cdf:	4c 89 a4 24 f0 00 00 	mov    %r12,0xf0(%rsp) ; store dummy8 (=0) and start computing dummy2
 1149ce6:	00 
 1149ce7:	49 89 fd             	mov    %rdi,%r13
 1149cea:	49 83 c5 00          	add    $0x0,%r13 ; compute dummy2
 1149cee:	4c 89 ac 24 f8 00 00 	mov    %r13,0xf8(%rsp)
 1149cf5:	00 
 1149cf6:	0f 90 84 24 00 01 00 	seto   0x100(%rsp) ; ov check
 1149cfd:	00 
 1149cfe:	44 8a b4 24 00 01 00 	mov    0x100(%rsp),%r14b
 1149d05:	00 
 1149d06:	4c 89 ac 24 08 01 00 	mov    %r13,0x108(%rsp)
 1149d0d:	00 
 1149d0e:	0f 90 84 24 10 01 00 	seto   0x110(%rsp)
 1149d15:	00 
 1149d16:	41 80 fe 01          	cmp    $0x1,%r14b ; did we overflow?
 1149d1a:	0f 85 05 00 00 00    	jne    1149d25 <bad.repro+0x275>
 1149d20:	e8 9b e9 ed ff       	call   10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
 1149d25:	4c 8b ac 24 08 01 00 	mov    0x108(%rsp),%r13 ; store r13 = dummy2 = 0
 1149d2c:	00 
 1149d2d:	e9 00 00 00 00       	jmp    1149d32 <bad.repro+0x282>
 1149d32:	4c 89 ac 24 18 01 00 	mov    %r13,0x118(%rsp)
 1149d39:	00 
 1149d3a:	49 be ff ff ff ff ff 	movabs $0x7fffffffffffffff,%r14
 1149d41:	ff ff 7f 
 1149d44:	4d 39 e6             	cmp    %r12,%r14
 1149d47:	0f 83 05 00 00 00    	jae    1149d52 <bad.repro+0x2a2>
 1149d4d:	e8 be e9 ed ff       	call   1028710 <debug.FullPanic((function 'defaultPanic')).integerOutOfBounds>
 1149d52:	4d 89 e6             	mov    %r12,%r14                            ; store r14 = dummy3 = 0
 1149d55:	e9 00 00 00 00       	jmp    1149d5a <bad.repro+0x2aa>            ; noop
 1149d5a:	4c 89 b4 24 20 01 00 	mov    %r14,0x120(%rsp)                     ; and save it
 1149d61:	00 
 1149d62:	49 83 fe 00          	cmp    $0x0,%r14                            ; compare dummy3 to 0
 1149d66:	0f 84 08 00 00 00    	je     1149d74 <bad.repro+0x2c4>            ; if equal, we need to do the second check (since dummy3 != 0 isn't true so we can't early-out)
 1149d6c:	41 b7 01             	mov    $0x1,%r15b                           ; otherwise, we move true/1 to r15b
 1149d6f:	e9 0d 00 00 00       	jmp    1149d81 <bad.repro+0x2d1>            ; and we can early out
 1149d74:	49 83 fe 00          	cmp    $0x0,%r14                            ; now we need the second check
 1149d78:	41 0f 95 c7          	setne  %r15b                                ; r14 = dummy3 != 0?
 1149d7c:	e9 00 00 00 00       	jmp    1149d81 <bad.repro+0x2d1>            ; done
 1149d81:	44 88 bc 24 78 01 00 	mov    %r15b,0x178(%rsp)                    ; we save r15b (the result of the check , dummy4)
 1149d88:	00 
 1149d89:	49 83 fe 00          	cmp    $0x0,%r14                         ;;;; now we start with dummy5, which has THREE d3 != 0 checks. (check 1) d3 == 0?
 1149d8d:	0f 84 0d 00 00 00    	je     1149da0 <bad.repro+0x2f0>            ; if true, go to da0 (another check, no early out)
 1149d93:	c6 84 24 79 01 00 00 	movb   $0x1,0x179(%rsp)                     ; otherwise we move true to stack
 1149d9a:	01 
 1149d9b:	e9 11 00 00 00       	jmp    1149db1 <bad.repro+0x301>            ; and leave
 1149da0:	49 83 fe 00          	cmp    $0x0,%r14                            ; (check 2) dummy 3 = 0?
 1149da4:	0f 95 84 24 79 01 00 	setne  0x179(%rsp)                          ; move dummy3 != 0 to stack
 1149dab:	00 
 1149dac:	e9 00 00 00 00       	jmp    1149db1 <bad.repro+0x301>            ; noop
 1149db1:	48 89 84 24 28 01 00 	mov    %rax,0x128(%rsp)                     ; (we jumped here from the first early out) save min (to 0x128!)
 1149db8:	00 
 1149db9:	8a 84 24 79 01 00 00 	mov    0x179(%rsp),%al                      ; take dummy3 != 0 again from stack. at this point we should have short circuited I think, but it seems we continue anyways and just keep superfluously overwriting al = 1.
 1149dc0:	a8 01                	test   $0x1,%al                             ; does NOT compare al to 1, but computes al & 1 and sets ZF if we get 0 (?))
 1149dc2:	0f 84 07 00 00 00    	je     1149dcf <bad.repro+0x31f>            ; thus we jump if al = 0
 1149dc8:	b0 01                	mov    $0x1,%al                             ; otherwise we set al=1
 1149dca:	e9 14 00 00 00       	jmp    1149de3 <bad.repro+0x333>            ; jump to end of dummy5 computation
 1149dcf:	49 83 fe 00          	cmp    $0x0,%r14                            ; (check 3) is dummy3 == 0?
 1149dd3:	48 89 84 24 28 01 00 	mov    %rax,0x128(%rsp)                     ; save rax again? note THIS OVERWRITES MIN, which explains the bug (i will now be computed as 0)
 1149dda:	00 
 1149ddb:	0f 95 c0             	setne  %al                                  ; if dummy3 != 0, al=1
 1149dde:	e9 00 00 00 00       	jmp    1149de3 <bad.repro+0x333>            ; noop
 1149de3:	88 84 24 7a 01 00 00 	mov    %al,0x17a(%rsp)                      ; save computation result (dummy5) to stack
 1149dea:	49 83 fd 00          	cmp    $0x0,%r13                            ; dummy2 = 0?
 1149dee:	0f 84 05 00 00 00    	je     1149df9 <bad.repro+0x349>            ; this must be the if (dummy2 != 0) {} noop jumps
 1149df4:	e9 05 00 00 00       	jmp    1149dfe <bad.repro+0x34e>
 1149df9:	e9 00 00 00 00       	jmp    1149dfe <bad.repro+0x34e>
 1149dfe:	41 80 f7 01          	xor    $0x1,%r15b
 1149e02:	41 f6 c7 01          	test   $0x1,%r15b
 1149e06:	0f 84 05 00 00 00    	je     1149e11 <bad.repro+0x361>
 1149e0c:	e9 05 00 00 00       	jmp    1149e16 <bad.repro+0x366>
 1149e11:	e9 00 00 00 00       	jmp    1149e16 <bad.repro+0x366>
 1149e16:	48 89 f0             	mov    %rsi,%rax
 1149e19:	4c 01 e0             	add    %r12,%rax
 1149e1c:	48 89 84 24 30 01 00 	mov    %rax,0x130(%rsp)
 1149e23:	00 
 1149e24:	0f 92 84 24 38 01 00 	setb   0x138(%rsp)
 1149e2b:	00 
 1149e2c:	44 8a a4 24 38 01 00 	mov    0x138(%rsp),%r12b
 1149e33:	00 
 1149e34:	48 89 84 24 40 01 00 	mov    %rax,0x140(%rsp)
 1149e3b:	00 
 1149e3c:	0f 92 84 24 48 01 00 	setb   0x148(%rsp)
 1149e43:	00 
 1149e44:	41 80 fc 01          	cmp    $0x1,%r12b
 1149e48:	0f 85 05 00 00 00    	jne    1149e53 <bad.repro+0x3a3>
 1149e4e:	e8 6d e8 ed ff       	call   10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
 1149e53:	48 8b 84 24 40 01 00 	mov    0x140(%rsp),%rax
 1149e5a:	00 
 1149e5b:	e9 00 00 00 00       	jmp    1149e60 <bad.repro+0x3b0>
 1149e60:	48 89 84 24 50 01 00 	mov    %rax,0x150(%rsp)
 1149e67:	00 
 1149e68:	90                   	nop
 1149e69:	e9 05 00 00 00       	jmp    1149e73 <bad.repro+0x3c3>
 1149e6e:	e9 19 00 00 00       	jmp    1149e8c <bad.repro+0x3dc>
 1149e73:	49 83 c3 01          	add    $0x1,%r11
 1149e77:	4c 89 9c 24 a8 00 00 	mov    %r11,0xa8(%rsp)
 1149e7e:	00 
 1149e7f:	48 8b 84 24 28 01 00 	mov    0x128(%rsp),%rax                     ; load the wrong value into rax
 1149e86:	00 
 1149e87:	e9 f2 fd ff ff       	jmp    1149c7e <bad.repro+0x1ce>
 1149e8c:	48 8b 34 24          	mov    (%rsp),%rsi
 1149e90:	48 01 ce             	add    %rcx,%rsi
 1149e93:	48 89 b4 24 58 01 00 	mov    %rsi,0x158(%rsp)
 1149e9a:	00 
 1149e9b:	0f 92 84 24 60 01 00 	setb   0x160(%rsp)
 1149ea2:	00 
 1149ea3:	8a 8c 24 60 01 00 00 	mov    0x160(%rsp),%cl
 1149eaa:	48 89 b4 24 68 01 00 	mov    %rsi,0x168(%rsp)
 1149eb1:	00 
 1149eb2:	0f 92 84 24 70 01 00 	setb   0x170(%rsp)
 1149eb9:	00 
 1149eba:	80 f9 01             	cmp    $0x1,%cl
 1149ebd:	0f 85 05 00 00 00    	jne    1149ec8 <bad.repro+0x418>
 1149ec3:	e8 f8 e7 ed ff       	call   10286c0 <debug.FullPanic((function 'defaultPanic')).integerOverflow>
 1149ec8:	48 8b 8c 24 68 01 00 	mov    0x168(%rsp),%rcx
 1149ecf:	00 
 1149ed0:	e9 00 00 00 00       	jmp    1149ed5 <bad.repro+0x425>
 1149ed5:	48 89 0c 24          	mov    %rcx,(%rsp)
 1149ed9:	90                   	nop
 1149eda:	e9 05 00 00 00       	jmp    1149ee4 <bad.repro+0x434>
 1149edf:	e9 0e 00 00 00       	jmp    1149ef2 <bad.repro+0x442>
 1149ee4:	48 83 c3 01          	add    $0x1,%rbx
 1149ee8:	48 89 5c 24 08       	mov    %rbx,0x8(%rsp)
 1149eed:	e9 26 fc ff ff       	jmp    1149b18 <bad.repro+0x68>
 1149ef2:	48 8b 04 24          	mov    (%rsp),%rax
 1149ef6:	48 8d 65 d8          	lea    -0x28(%rbp),%rsp
 1149efa:	41 5b                	pop    %r11
 1149efc:	41 5c                	pop    %r12
 1149efe:	41 5d                	pop    %r13
 1149f00:	41 5e                	pop    %r14
 1149f02:	41 5f                	pop    %r15
 1149f04:	5d                   	pop    %rbp
 1149f05:	c3                   	ret

The asm seems to be confusingly inefficient (at least, so it looks to me). Variables and their overflow bits are saved twice, I guess since each has two “uses”; one for checking to panic and one for the computations later on. It’s also littered with a lot of (superfluous?) copies of all the variables to stack and back again, multiple times. I don’t understand most of the why behind the codegen, but I can guess it’s to ease debugging or to make the backend simpler or more performant(?).

In any case, the fact that i was set from 100 to 0 was important, since in fact it is min which is overwritten in the triple dummy3 != 0 checks (at 0x128(%rsp)). It looks like that’s a consequence of the very confusing asm for calculating dummy5 (really, why is the control flow so complicated?).

Now I’ve found the result of the bug, but of course I’m clueless what causes it in codegen. Looking at the Air again, probably it happens when generating for cond_br, br or cmp_neq. I know one of the codegen files is huge, 200k lines or so, so I grep that one since it surely must contain all the interesting logic. The first hit is airCondBr.

It starts with finding a bunch of instruction data that is tightly encoded in the slices. Looks precise but probably not the problem. Then there is something about liveness, which I only know to be a compiler thing to track, well, lifetimes and such. Probably this is the problem, since we “forgot” that we had saved rax at a certain place? I also note there’s a special case with a comment, but the commit it was introduced in seems unrelated. Still I’ll try to see what it does.

The function operandDies() seems very low level and just checks whether a certain flag was set, probably earlier on. It seems like liveness_cond_br is important for that but I guess not; that’s only used later in the function. Looking at what liveness_cond_br does, it has deaths for “then” and “else” cases.
Thinking about this for a bit, maybe the distinction is whether a variable dies in the body of a condition, or in one of the cases after (as in, whether it dies in the () of an if statement, or in one of the {}'s). This agrees with what the function seems to do: first generate code for the check, then for the “then case” and then the “else case”. Then I guess doing this liveness check first is just a maybe-register-saving optimization to not needlessly spill to memory. Ok.

Moving on, I see there is a call to “save state” and later state is restored. This may be the problem since we were overwriting a register which was saved in memory twice. It calls some specialized “retroactive” function which seemingly just saves anything to do with the current register state, to then restore later after the body is generated.

Then we enter a genCondBrMir call. I note what we do depends on mcv, but some of those have useful doc comments. There’s only four possibilities for mcv, and load_frame makes the most sense since we were overwriting something at an offset from the stack pointer. This case allocates a “temporary register” and then calls itself again with that reg as new mcv. The copyToTmpRegister function has a useful doc comment that we may be spilling instructions to the stack to free up a register. Exactly what was happening. Then there’s a call to jccReloc.
So, presumably we spilled rax and forgot to notify anyone or update the state we’d just saved. Maybe we can update the state somehow? Let’s see what the rest of CodeGen does with saved state.

Grepping for “self.saveState” there are only four calls. The first one in airMulDivBinOp has the same pattern: first save state, then call asmJccReloc.
The second one in genTry does it in reverse: first genCondBrMir, then check the liveness of the operand, then save state after that. Call 3 is our airCondBr and the last call in airLoopSwitchBr has nothing to do with condBrMir but some things are similar. We first do the operand liveness check, then generate a register, then save state. (This is different from what happened in genCondBrMir, where we also allocated a register, but did that after saving state.) This function enters lowerSwitchBr which seems quite complicated, but I notice there’s also a bunch of calls to “cg.saveState” – the same thing.
There’s only two calls to that; one when generating a loop instruction and one for a cmp_eq instruction (!!).
This latter one does some things I don’t understand and then again goes for asmJccReloc. Looking at that function it seems to not mess with registers and instead just adds a mir instruction.
So, probably the problem was indeed that genCondBrMir allocated a register after the register state was saved, invalidating the saved state. This invalid state is then loaded later and on the basis of that we overwrite rax/al with 0, which then gets used as the value of min.

To try and verify my idea, I try to just move the saveState call after genCondBrMir. I rebuild the compiler, one recompilation of the repro, and… 1045. So it seems like that was indeed it.

Some questions: is this the “proper dev workflow”? Looking at the assembly and then working backwards to think what part of codegen was probably responsible for it based on what the asm is doing? I see something about dev.check() in codegen.generateFunction but I’m clueless how to use that. I imagine it’s not documented.
Is there a way to view Mir output, like with Air? Would that be useful? Reading the asm and comparing it to the source code took me some time, but probably I’d go faster if I did it more often.
Also, probably it’s time to open a book on compilers. Any recommendations?

3 Likes

I believe you should also check that no miscompilation were introduced by your change ? That means running the whole test-suite (or at least the parts of it that checks that code compiles correctly). It’s very easy to fix one bug and introduce 10 new bugs, fixing 1 and not introducing more is much harder.