16. Register renaming (PPro, PII and PIII)
Register renaming is an advanced technique used by these microprocessors to remove dependencies between different parts of the code. Example:
    MOV EAX, [MEM1]
    IMUL EAX, 6
    MOV [MEM2], EAX
    MOV EAX, [MEM3]
    INC EAX
    MOV [MEM4], EAX
Here the last three instructions are independent of the first three in the sense that they don't need any result from the first three. To optimize this on earlier processors, you would have to use a different register instead of EAX in the last three instructions and reorder the instructions so that the last three could execute in parallel with the first three. The PPro, PII and PIII processors do this for you automatically. They assign a new temporary register for EAX every time you write to it, so the MOV EAX,[MEM3] instruction becomes independent of the preceding instructions. With out-of-order execution, the processor is likely to finish the move to [MEM4] before the slow IMUL instruction has finished.
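The effect of renaming on this example can be illustrated with a small dataflow model. This is a sketch only: the latencies in `LATENCY` are hypothetical values chosen for illustration, the model assumes unlimited execution units, and the function name `schedule` is made up. Without renaming, an instruction that overwrites EAX must wait for every earlier instruction that uses EAX (false dependencies); with renaming, it waits only for the values it actually reads.

```python
# Hypothetical latencies in clock cycles, chosen only for illustration.
LATENCY = {"load": 3, "imul": 4, "inc": 1, "store": 1}

# (instruction, operation, registers read, registers written)
PROGRAM = [
    ("MOV EAX,[MEM1]", "load", [], ["EAX"]),
    ("IMUL EAX,6", "imul", ["EAX"], ["EAX"]),
    ("MOV [MEM2],EAX", "store", ["EAX"], []),
    ("MOV EAX,[MEM3]", "load", [], ["EAX"]),
    ("INC EAX", "inc", ["EAX"], ["EAX"]),
    ("MOV [MEM4],EAX", "store", ["EAX"], []),
]

def schedule(program, renaming):
    """Return the finish cycle of each instruction, assuming unlimited execution units."""
    value_ready = {}   # register -> cycle at which its latest value is available
    last_use = {}      # register -> cycle at which the last reader/writer finishes
    finish = []
    for _name, op, reads, writes in program:
        # True dependencies: wait for every source value.
        start = max([value_ready.get(r, 0) for r in reads], default=0)
        if not renaming:
            # Without renaming, overwriting a register must wait until every earlier
            # instruction that reads or writes it has finished (false dependencies).
            start = max([start] + [last_use.get(r, 0) for r in writes])
        end = start + LATENCY[op]
        for r in reads:
            last_use[r] = max(last_use.get(r, 0), end)
        for r in writes:
            value_ready[r] = end
            last_use[r] = max(last_use.get(r, 0), end)
        finish.append(end)
    return finish

if __name__ == "__main__":
    for renaming in (False, True):
        print("renaming   " if renaming else "no renaming", schedule(PROGRAM, renaming))
```

With these assumed latencies, the store to [MEM4] finishes at cycle 5 with renaming, while the IMUL finishes at cycle 7; without renaming, the second group cannot even start until the first store is done.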
Register renaming is fully automatic. A new temporary register is assigned as an alias for the permanent register every time an instruction writes to this register. An instruction that both reads and writes a register also causes renaming. For example, the INC EAX instruction above uses one temporary register for input and another temporary register for output. This does not, of course, remove any dependency, but it has some significance for subsequent register reads, as I will explain later.
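The alias mechanism can be sketched as a table that maps each permanent register to its current temporary. This toy `RegisterAliasTable` class is hypothetical, not a description of the actual circuit: it has an unlimited supply of temporaries named T0, T1, ..., and it ignores the flags register, which IMUL and INC also write. Note that sources are looked up before the destination is allocated, which is why INC EAX reads one temporary and writes another.

```python
from itertools import count

class RegisterAliasTable:
    """Toy register alias table: maps each permanent register to its current temporary."""

    def __init__(self):
        self._next_temp = count()   # unbounded supply of temporary register numbers
        self._alias = {}            # permanent register name -> temporary register name

    def _current(self, reg):
        # A register not yet renamed still refers to its permanent value.
        return self._alias.get(reg, reg)

    def rename(self, reads, writes):
        """Translate one instruction; return (temporaries read, temporaries written)."""
        srcs = [self._current(r) for r in reads]   # look up sources BEFORE allocating
        dsts = []
        for r in writes:
            temp = f"T{next(self._next_temp)}"     # every write gets a fresh temporary
            self._alias[r] = temp
            dsts.append(temp)
        return srcs, dsts

rat = RegisterAliasTable()
trace = [
    ("MOV EAX,[MEM1]", [], ["EAX"]),
    ("IMUL EAX,6", ["EAX"], ["EAX"]),
    ("MOV [MEM2],EAX", ["EAX"], []),
    ("MOV EAX,[MEM3]", [], ["EAX"]),
    ("INC EAX", ["EAX"], ["EAX"]),
    ("MOV [MEM4],EAX", ["EAX"], []),
]
renamed = [(name, *rat.rename(reads, writes)) for name, reads, writes in trace]
for name, srcs, dsts in renamed:
    print(f"{name:16} reads {srcs} writes {dsts}")
```

In the output, MOV EAX,[MEM3] reads nothing and writes T2, so it has no link to the IMUL result in T1, while INC EAX reads T2 and writes T3.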
All general purpose registers, the stack pointer, the flags, floating point registers, MMX registers, XMM registers and segment registers can be renamed. Control words and the floating point status word cannot be renamed, which is why the use of these registers is slow. There are 40 universal temporary registers, so it is unlikely that you will run out of temporary registers.
A common way of setting a register to zero is XOR EAX,EAX or SUB EAX,EAX. These instructions are not recognized as independent of the previous value of the register. If you want to remove the dependency on slow preceding instructions, use MOV EAX,0 instead.
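The difference between the zeroing idioms can be shown with a tiny dependency calculation. In this sketch the latencies are hypothetical, and the function `ready_after_imul` is made up for illustration. The only fact taken from the text is which instructions are treated as reading EAX: XOR and SUB are, MOV EAX,0 is not.

```python
# Hypothetical latencies in clock cycles, chosen only for illustration.
LATENCY = {"IMUL EAX,6": 4, "XOR EAX,EAX": 1, "SUB EAX,EAX": 1, "MOV EAX,0": 1}

# Registers each instruction is treated as reading on PPro/PII/PIII.
# XOR and SUB are not recognized as independent of EAX, so they "read" it.
SOURCES = {"IMUL EAX,6": ["EAX"], "XOR EAX,EAX": ["EAX"],
           "SUB EAX,EAX": ["EAX"], "MOV EAX,0": []}

def ready_after_imul(zeroing):
    """Cycle at which the zeroed EAX is available when it follows IMUL EAX,6."""
    ready = {"EAX": 0}  # the value of EAX before the IMUL is assumed ready at cycle 0
    for instr in ("IMUL EAX,6", zeroing):
        start = max([ready[r] for r in SOURCES[instr]], default=0)
        ready["EAX"] = start + LATENCY[instr]  # each write goes to a new temporary
    return ready["EAX"]

for z in ("XOR EAX,EAX", "SUB EAX,EAX", "MOV EAX,0"):
    print(z, "-> EAX = 0 ready at cycle", ready_after_imul(z))
```

Under these assumptions, the XOR and SUB versions must wait for the IMUL and deliver zero at cycle 5, while MOV EAX,0 delivers it at cycle 1.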
Register renaming is controlled by the register alias table (RAT) and the reorder buffer (ROB). The uops from the decoders go to the RAT via a queue, and then to the ROB and the reservation station. The RAT can handle only 3 uops per clock cycle. This means that the overall throughput of the microprocessor can never exceed 3 uops per clock cycle on average.
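The 3-uops-per-clock limit gives a simple lower bound on execution time: a stretch of code with n uops needs at least ceil(n/3) clock cycles, however well the out-of-order core schedules it. The uop counts below are assumptions for the example at the start of this chapter (a memory store is taken as 2 uops); the function name `min_cycles` is hypothetical.

```python
import math

RAT_UOPS_PER_CYCLE = 3  # from the text: the RAT handles at most 3 uops per clock

def min_cycles(uop_count):
    """Lower bound on clock cycles imposed by the RAT alone."""
    return math.ceil(uop_count / RAT_UOPS_PER_CYCLE)

# Assumed uop counts for the example above (a store to memory counted as 2 uops).
example_uops = {"MOV EAX,[MEM1]": 1, "IMUL EAX,6": 1, "MOV [MEM2],EAX": 2,
                "MOV EAX,[MEM3]": 1, "INC EAX": 1, "MOV [MEM4],EAX": 2}
total = sum(example_uops.values())
print(total, "uops need at least", min_cycles(total), "clock cycles")
```

This bound only accounts for the RAT; latencies, execution ports and other bottlenecks can make the real time longer, never shorter.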
There is no practical limit to the number of renamings. The RAT can rename three registers per clock cycle, and it can even rename the same register three times in one clock cycle.
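Renaming the same register several times in one clock cycle works because the uops in a group are translated in program order, each one seeing the aliases created by the uops before it. This sketch treats each group of 3 uops as one clock cycle; `rename_in_groups` is a hypothetical name, and the model ignores everything except the alias table.

```python
def rename_in_groups(uops, width=3):
    """Rename uops `width` at a time; each group models one clock cycle of the RAT.

    uops: list of (reads, writes) tuples of permanent register names.
    Returns a list of cycles, each a list of (sources, destinations) in temporaries.
    """
    alias = {}
    next_temp = 0
    cycles = []
    for start in range(0, len(uops), width):
        group = []
        for reads, writes in uops[start:start + width]:
            # Sources see renamings made by earlier uops, even in the same cycle.
            srcs = [alias.get(r, r) for r in reads]
            dsts = []
            for r in writes:
                alias[r] = f"T{next_temp}"
                dsts.append(alias[r])
                next_temp += 1
            group.append((srcs, dsts))
        cycles.append(group)
    return cycles

# Three consecutive INC EAX uops: EAX is renamed three times in one cycle.
cycles = rename_in_groups([(["EAX"], ["EAX"])] * 3)
print(cycles)
```

All three INC EAX uops are renamed in a single cycle, forming the chain EAX to T0 to T1 to T2.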