I know I’ve veered off topic at this point, but I wanted to follow up on the size discussion: it appears that clang, even with -Oz, generally produces larger code than powerpc-eabi-gcc. Here is the same function from both:
gcc:
80009480 <sj_iter_object>:
80009480: 94 21 ff d0 stwu r1,-48(r1)
80009484: 7c 08 02 a6 mflr r0
80009488: 80 84 00 0c lwz r4,12(r4)
8000948c: bf a1 00 24 stmw r29,36(r1)
80009490: 7c 7d 1b 78 mr r29,r3
80009494: 7c be 2b 78 mr r30,r5
80009498: 90 01 00 34 stw r0,52(r1)
8000949c: 7c df 33 78 mr r31,r6
800094a0: 48 00 00 01 bl 800094a0 <sj_iter_object+0x20>
800094a4: 7f a4 eb 78 mr r4,r29
800094a8: 38 61 00 10 addi r3,r1,16
800094ac: 48 00 00 01 bl 800094ac <sj_iter_object+0x2c>
800094b0: 81 21 00 10 lwz r9,16(r1)
800094b4: 91 3e 00 00 stw r9,0(r30)
800094b8: 28 09 00 01 cmplwi r9,1
800094bc: 81 41 00 14 lwz r10,20(r1)
800094c0: 91 5e 00 04 stw r10,4(r30)
800094c4: 81 41 00 18 lwz r10,24(r1)
800094c8: 91 5e 00 08 stw r10,8(r30)
800094cc: 81 41 00 1c lwz r10,28(r1)
800094d0: 91 5e 00 0c stw r10,12(r30)
800094d4: 40 81 00 4c ble 80009520 <sj_iter_object+0xa0>
800094d8: 38 61 00 10 addi r3,r1,16
800094dc: 7f a4 eb 78 mr r4,r29
800094e0: 48 00 00 01 bl 800094e0 <sj_iter_object+0x60>
800094e4: 81 21 00 10 lwz r9,16(r1)
800094e8: 91 3f 00 00 stw r9,0(r31)
800094ec: 2c 09 00 01 cmpwi r9,1
800094f0: 30 69 ff ff addic r3,r9,-1
800094f4: 81 41 00 14 lwz r10,20(r1)
800094f8: 7c 63 49 10 subfe r3,r3,r9
800094fc: 91 5f 00 04 stw r10,4(r31)
80009500: 81 41 00 18 lwz r10,24(r1)
80009504: 91 5f 00 08 stw r10,8(r31)
80009508: 81 41 00 1c lwz r10,28(r1)
8000950c: 91 5f 00 0c stw r10,12(r31)
80009510: 40 a2 00 14 bne 80009524 <sj_iter_object+0xa4>
80009514: 3d 20 00 00 lis r9,0
80009518: 39 29 00 00 addi r9,r9,0
8000951c: 91 3d 00 10 stw r9,16(r29)
80009520: 38 60 00 00 li r3,0
80009524: 39 61 00 30 addi r11,r1,48
80009528: 54 63 07 fe clrlwi r3,r3,31
8000952c: 48 00 00 00 b 8000952c <sj_iter_object+0xac>
clang:
000091f4 <sj_iter_object>:
91f4: 7c 08 02 a6 mflr r0
91f8: 94 21 ff d0 stwu r1,-48(r1)
91fc: 90 01 00 34 stw r0,52(r1)
9200: 80 84 00 0c lwz r4,12(r4)
9204: 93 81 00 20 stw r28,32(r1)
9208: 7c bc 2b 78 mr r28,r5
920c: 93 a1 00 24 stw r29,36(r1)
9210: 7c dd 33 78 mr r29,r6
9214: 93 c1 00 28 stw r30,40(r1)
9218: 7c 7e 1b 78 mr r30,r3
921c: 48 00 00 01 bl 921c <sj_iter_object+0x28>
9220: 38 61 00 10 addi r3,r1,16
9224: 7f c4 f3 78 mr r4,r30
9228: 48 00 00 01 bl 9228 <sj_iter_object+0x34>
922c: 80 a1 00 10 lwz r5,16(r1)
9230: 80 61 00 1c lwz r3,28(r1)
9234: 90 bc 00 00 stw r5,0(r28)
9238: 80 81 00 18 lwz r4,24(r1)
923c: 90 7c 00 0c stw r3,12(r28)
9240: 80 61 00 14 lwz r3,20(r1)
9244: 80 bc 00 00 lwz r5,0(r28)
9248: 90 9c 00 08 stw r4,8(r28)
924c: 90 7c 00 04 stw r3,4(r28)
9250: 28 05 00 02 cmplwi r5,2
9254: 3b 80 00 00 li r28,0
9258: 41 80 00 64 blt 92bc <sj_iter_object+0xc8>
925c: 38 61 00 10 addi r3,r1,16
9260: 7f c4 f3 78 mr r4,r30
9264: 48 00 00 01 bl 9264 <sj_iter_object+0x70>
9268: 80 a1 00 10 lwz r5,16(r1)
926c: 80 61 00 1c lwz r3,28(r1)
9270: 90 bd 00 00 stw r5,0(r29)
9274: 90 7d 00 0c stw r3,12(r29)
9278: 80 7d 00 00 lwz r3,0(r29)
927c: 80 81 00 18 lwz r4,24(r1)
9280: 80 a1 00 14 lwz r5,20(r1)
9284: 28 03 00 00 cmplwi r3,0
9288: 90 9d 00 08 stw r4,8(r29)
928c: 90 bd 00 04 stw r5,4(r29)
9290: 41 82 00 20 beq 92b0 <sj_iter_object+0xbc>
9294: 28 03 00 01 cmplwi r3,1
9298: 40 82 00 20 bne 92b8 <sj_iter_object+0xc4>
929c: 3c 60 00 00 lis r3,0
92a0: 38 63 00 00 addi r3,r3,0
92a4: 38 63 00 51 addi r3,r3,81
92a8: 90 7e 00 10 stw r3,16(r30)
92ac: 48 00 00 10 b 92bc <sj_iter_object+0xc8>
92b0: 7c 7c 1b 78 mr r28,r3
92b4: 48 00 00 08 b 92bc <sj_iter_object+0xc8>
92b8: 3b 80 00 01 li r28,1
92bc: 57 83 07 fe clrlwi r3,r28,31
92c0: 83 c1 00 28 lwz r30,40(r1)
92c4: 83 a1 00 24 lwz r29,36(r1)
92c8: 83 81 00 20 lwz r28,32(r1)
92cc: 80 01 00 34 lwz r0,52(r1)
92d0: 38 21 00 30 addi r1,r1,48
92d4: 7c 08 03 a6 mtlr r0
92d8: 4e 80 00 20 blr
Seems like a decent amount of the increase comes from clang emitting full function prologs/epilogs inline instead of jumping to the out-of-line savegpr/restgpr helpers the way gcc does, and from it not using multi-word instructions like stmw.
Final size is 44K (gcc) vs 55K (clang), which isn’t abysmal but is enough to be a deciding factor for me. Oh well.
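
To put numbers on the two listings, here is a small C sketch that works out each function's size from the first and last instruction addresses in the dumps (PowerPC instructions are a fixed 4 bytes). The helper names func_size, percent_larger, gcc_size and clang_size are mine, purely for illustration.

```c
#include <stdint.h>

/* PowerPC instructions are fixed-width: 4 bytes each. */
#define PPC_INSN_BYTES 4u

/* Size in bytes of a function, given the addresses of its first and
 * last instruction as shown in the objdump listing. */
uint32_t func_size(uint32_t first_insn, uint32_t last_insn)
{
    return last_insn - first_insn + PPC_INSN_BYTES;
}

/* How much larger `bigger` is than `smaller`, in whole percent
 * (integer division, so the result is truncated). */
uint32_t percent_larger(uint32_t smaller, uint32_t bigger)
{
    return (bigger - smaller) * 100u / smaller;
}

/* Addresses copied from the two dumps of sj_iter_object. */
uint32_t gcc_size(void)   { return func_size(0x80009480u, 0x8000952cu); }
uint32_t clang_size(void) { return func_size(0x000091f4u, 0x000092d8u); }
```

That works out to 176 bytes (44 instructions) for gcc vs 232 bytes (58 instructions) for clang, so this one function is a bit under 32% larger, compared with 25% for the whole binary (44K vs 55K).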