Hey everyone,
I noticed that the self-hosted compiler on amd64 pads instruction sections (e.g., for function
alignment) with 0x00 (zero bytes).
I’m currently working on a binary rewriter, and this 0x00 padding makes linear-scan disassembly difficult: after a block of zeros it’s hard to determine where the next real instruction begins, because 0x00 is itself the opcode of a valid multi-byte instruction (ADD r/m8, r8, so `00 00` decodes as `add byte ptr [rax], al`).
Wouldn’t it be better to pad with 0x90 (NOP) or 0xCC (INT3)?
This would make disassembly easier because the instructions would then be properly tessellated
(non-overlapping, with no gaps). A linear scan would then be enough to disassemble the section correctly, because NOP and INT3 are both valid single-byte instructions.
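
To illustrate what I mean, here is a minimal sketch in Python. It is a toy decoder that only knows a handful of opcodes, not a real disassembler, and the byte sequences are made up for illustration rather than taken from actual compiler output. It shows how an odd number of 0x00 padding bytes makes a linear scan swallow the start of the next function, while 0xCC padding keeps the scan in sync.

```python
# Toy linear-scan decoder. Only a handful of opcodes are supported;
# this is an illustration, not a real x86-64 disassembler.
SINGLE_BYTE = {0x55: "push rbp", 0x90: "nop", 0xC3: "ret", 0xCC: "int3"}


def modrm_operand_length(code, pos):
    """Bytes taken by ModRM + optional SIB + displacement at code[pos]."""
    modrm = code[pos]
    mod, rm = modrm >> 6, modrm & 0b111
    length = 1  # the ModRM byte itself
    if mod == 3:
        return length  # register operand, nothing follows
    if rm == 4:  # a SIB byte follows
        sib_base = code[pos + 1] & 0b111 if pos + 1 < len(code) else 0
        length += 1
        if mod == 0 and sib_base == 5:
            length += 4  # disp32 with no base register
    if mod == 0 and rm == 5:
        length += 4  # RIP-relative disp32
    elif mod == 1:
        length += 1  # disp8
    elif mod == 2:
        length += 4  # disp32
    return length


def linear_scan(code):
    """Return the offsets at which a linear scan believes instructions start."""
    starts = []
    pos = 0
    while pos < len(code):
        starts.append(pos)
        opcode = code[pos]
        if opcode in SINGLE_BYTE:
            pos += 1
        elif opcode == 0x00:  # ADD r/m8, r8: opcode byte + ModRM operand
            if pos + 1 >= len(code):
                break  # truncated instruction at the end of the buffer
            pos += 1 + modrm_operand_length(code, pos + 1)
        else:
            raise ValueError(f"opcode {opcode:#04x} not handled by this toy decoder")
    return starts


# Hypothetical layout: 'ret' ends one function, three padding bytes follow,
# and the next function ('push rbp; ret') starts at offset 4.
NEXT_FUNCTION_OFFSET = 4
zero_padded = bytes([0xC3, 0x00, 0x00, 0x00, 0x55, 0xC3])
int3_padded = bytes([0xC3, 0xCC, 0xCC, 0xCC, 0x55, 0xC3])

zero_starts = linear_scan(zero_padded)
int3_starts = linear_scan(int3_padded)

print(zero_starts, NEXT_FUNCTION_OFFSET in zero_starts)
print(int3_starts, NEXT_FUNCTION_OFFSET in int3_starts)
```

In the zero-padded case the third 0x00 is decoded as an ADD whose ModRM byte is the 0x55 that was meant to be `push rbp`, so the scan never sees an instruction start at offset 4. With 0xCC every padding byte is its own instruction and the function boundary is preserved.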
Padding with 0xCC would also help catch miscompilations or incorrect branches, as any code accidentally
executing into the padding would immediately trigger a breakpoint.
Is there anything that I’m missing? Thanks!