23. Reducing code size (all processors)
As explained in chapter 7, the code cache is 8 or 16 kb. If you have problems keeping the critical parts of your code within the code cache, then you may consider reducing the size of your code.
32 bit code is usually bigger than 16 bit code because addresses and data constants take 4 bytes in 32 bit code and only 2 bytes in 16 bit code. However, 16 bit code has other penalties such as prefixes and problems with accessing adjacent words simultaneously (see chapter 10.2 above). Some other methods for reducing the size of your code are discussed below.
Jump addresses, data addresses, and data constants all take less space if they can be expressed as a sign-extended byte, i.e. if they are within the interval from -128 to +127.
For jumps this means that a short jump takes two bytes of code, whereas a jump whose target lies outside this range (measured from the end of the jump instruction) takes 5 bytes if unconditional and 6 bytes if conditional.
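The choice between the two jump forms can be sketched as follows. This is a minimal Python illustration, not any particular assembler's code: the function name `encode_jump` is hypothetical, it assumes 32-bit mode, and it assumes the displacement `rel` has already been measured from the end of the jump instruction. A real assembler must recompute `rel` when it switches forms, because the short and near forms differ in length.

```python
def encode_jump(rel: int, conditional: bool = False, cc: int = 0x4) -> bytes:
    """Encode JMP or Jcc (32-bit mode) with displacement `rel`.

    `rel` is assumed to be measured from the end of the instruction.
    `cc` is the 4-bit condition code; 0x4 means E/Z (JE).
    """
    if -128 <= rel <= 127:
        # Short form: one opcode byte + sign-extended 8-bit displacement.
        opcode = bytes([0x70 | cc]) if conditional else b"\xEB"
        return opcode + rel.to_bytes(1, "little", signed=True)
    # Near form: 32-bit displacement; Jcc needs a two-byte opcode (0F 8x).
    opcode = bytes([0x0F, 0x80 | cc]) if conditional else b"\xE9"
    return opcode + rel.to_bytes(4, "little", signed=True)


print(len(encode_jump(100)), len(encode_jump(1000)), len(encode_jump(1000, True)))
# prints: 2 5 6
```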
Likewise, data addresses take less space if they can be expressed as a pointer and a displacement between -128 and +127. Example:
MOV EBX,DS:[100000] / ADD EBX,DS:[100004] ; 12 bytes
MOV EAX,100000 / MOV EBX,[EAX] / ADD EBX,[EAX+4] ; 10 bytes
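You can check the byte counts of this example by writing out the machine code. The Python sketch below assumes the absolute addresses are 100000 and 100004 (the locations the pointer version reads) and uses the standard 32-bit encodings; `le32` is a hypothetical helper name.

```python
def le32(value: int) -> bytes:
    """Little-endian 32-bit field (address, displacement or immediate)."""
    return (value & 0xFFFFFFFF).to_bytes(4, "little")


# Absolute addressing: every instruction carries a full 4-byte address.
absolute = [
    b"\x8B\x1D" + le32(100000),  # MOV EBX,DS:[100000]  (ModR/M 1D = EBX, [disp32])
    b"\x03\x1D" + le32(100004),  # ADD EBX,DS:[100004]
]

# Pointer addressing: load the address once, then use short displacements.
pointer = [
    b"\xB8" + le32(100000),      # MOV EAX,100000
    b"\x8B\x18",                 # MOV EBX,[EAX]    no displacement
    b"\x03\x58\x04",             # ADD EBX,[EAX+4]  8-bit displacement
]

print(sum(map(len, absolute)), sum(map(len, pointer)))  # prints: 12 10
```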
The advantage of using a pointer obviously increases if you use it many times. Storing data on the stack and using EBP or ESP as pointer will thus make your code smaller than if you use static memory locations and absolute addresses, provided of course that your data are within -128/+127 bytes of the pointer. Using PUSH and POP to write and read temporary data is even shorter.
Data constants may also take less space if they are between -128 and +127. Most instructions with immediate operands have a short form where the operand is a sign-extended single byte. Examples:
PUSH 200 ; 5 bytes
PUSH 100 ; 2 bytes
ADD EBX,128 ; 6 bytes
SUB EBX,-128 ; 3 bytes
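The short-form rule can be sketched in Python as below. The helper names are hypothetical and the sketch assumes 32-bit mode. It also shows why SUB EBX,-128 is preferable to ADD EBX,128: +128 is just outside the sign-extended byte range, while -128 is inside it.

```python
EBX = 3          # register number used in x86 encodings
ADD, SUB = 0, 5  # /digit values selecting the operation in opcodes 81 and 83


def fits_imm8(value: int) -> bool:
    """True if value survives being stored as a sign-extended byte."""
    return -128 <= value <= 127


def encode_push_imm(value: int) -> bytes:
    if fits_imm8(value):
        return b"\x6A" + value.to_bytes(1, "little", signed=True)  # PUSH imm8
    return b"\x68" + value.to_bytes(4, "little", signed=True)      # PUSH imm32


def encode_arith_reg_imm(op: int, reg: int, value: int) -> bytes:
    """ADD/SUB reg32,imm. Ignores the special EAX form (opcode 05/2D + imm32)."""
    modrm = 0xC0 | (op << 3) | reg  # mod=11: register operand
    if fits_imm8(value):
        return bytes([0x83, modrm]) + value.to_bytes(1, "little", signed=True)
    return bytes([0x81, modrm]) + value.to_bytes(4, "little", signed=True)


codes = (encode_push_imm(200), encode_push_imm(100),
         encode_arith_reg_imm(ADD, EBX, 128), encode_arith_reg_imm(SUB, EBX, -128))
print([len(c) for c in codes])  # prints: [5, 2, 6, 3]
```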
The most important instruction with an immediate operand which doesn't have such a short form is MOV.
MOV EAX, 0 ; 5 bytes
may be changed to:
XOR EAX,EAX ; 2 bytes
and
MOV EAX, 1 ; 5 bytes
may be changed to:
XOR EAX,EAX / INC EAX ; 3 bytes
or:
PUSH 1 / POP EAX ; 3 bytes
and
MOV EAX, -1 ; 5 bytes
may be changed to:
OR EAX, -1 ; 3 bytes
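These substitutions can be collected into a small chooser. The sketch below is a hypothetical Python helper, `load_eax`, not a feature of any assembler; it returns 32-bit machine code for the shortest listed alternative. Its `flags_free` parameter (also a hypothetical name) matters because XOR, INC and OR modify the flags while MOV does not; when the flags must survive, the sketch falls back to PUSH/POP or MOV.

```python
def load_eax(value: int, flags_free: bool = True) -> bytes:
    """Return a short 32-bit-mode sequence that sets EAX to `value`.

    `value` is assumed to be in the signed 32-bit range. `flags_free`
    means the flags need not be preserved, so XOR/INC/OR may be used.
    """
    if flags_free:
        if value == 0:
            return b"\x31\xC0"            # XOR EAX,EAX
        if value == 1:
            return b"\x31\xC0\x40"        # XOR EAX,EAX / INC EAX
        if value == -1:
            return b"\x83\xC8\xFF"        # OR EAX,-1
    if -128 <= value <= 127:
        # PUSH imm8 / POP EAX: leaves the flags alone, but uses the stack.
        return b"\x6A" + value.to_bytes(1, "little", signed=True) + b"\x58"
    return b"\xB8" + value.to_bytes(4, "little", signed=True)  # MOV EAX,imm32


print([len(load_eax(v)) for v in (0, 1, -1, 100, 100000)])  # prints: [2, 3, 3, 3, 5]
```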
If the same address or constant is used more than once then you may load it into a register. A MOV with a 4-byte immediate operand may sometimes be replaced by an arithmetic instruction if the value of the register before the MOV is known. Example:
MOV [mem1],200 ; 10 bytes
MOV [mem2],200 ; 10 bytes
MOV [mem3],201 ; 10 bytes
MOV EAX,100 ; 5 bytes
MOV EBX,150 ; 5 bytes
Assuming that mem1 and mem3 are both within -128/+127 bytes of mem2, this may be changed to:
MOV EBX, OFFSET mem2 ; 5 bytes
MOV EAX,200 ; 5 bytes
MOV [EBX+mem1-mem2],EAX ; 3 bytes
MOV [EBX],EAX ; 2 bytes
INC EAX ; 1 byte
MOV [EBX+mem3-mem2],EAX ; 3 bytes
SUB EAX,101 ; 3 bytes
LEA EBX,[EAX+50] ; 3 bytes
Be aware of the AGI stall in the LEA instruction (for PPlain and PMMX).
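To convince yourself that the rewritten sequence leaves memory and registers exactly as the original does, you can simulate both versions. In this Python sketch the addresses `MEM1`, `MEM2` and `MEM3` are hypothetical values chosen 8 bytes apart, and the byte counts are copied from the two listings rather than computed.

```python
# Hypothetical addresses; mem1 and mem3 must lie within -128/+127 of mem2.
MEM1, MEM2, MEM3 = 0x1000 - 8, 0x1000, 0x1000 + 8


def original():
    mem, reg = {}, {}
    mem[MEM1] = 200           # MOV [mem1],200
    mem[MEM2] = 200           # MOV [mem2],200
    mem[MEM3] = 201           # MOV [mem3],201
    reg["EAX"] = 100          # MOV EAX,100
    reg["EBX"] = 150          # MOV EBX,150
    return mem, reg


def rewritten():
    mem, reg = {}, {}
    reg["EBX"] = MEM2                            # MOV EBX, OFFSET mem2
    reg["EAX"] = 200                             # MOV EAX,200
    mem[reg["EBX"] + MEM1 - MEM2] = reg["EAX"]   # MOV [EBX+mem1-mem2],EAX
    mem[reg["EBX"]] = reg["EAX"]                 # MOV [EBX],EAX
    reg["EAX"] += 1                              # INC EAX          -> 201
    mem[reg["EBX"] + MEM3 - MEM2] = reg["EAX"]   # MOV [EBX+mem3-mem2],EAX
    reg["EAX"] -= 101                            # SUB EAX,101      -> 100
    reg["EBX"] = reg["EAX"] + 50                 # LEA EBX,[EAX+50] -> 150
    return mem, reg


ORIGINAL_SIZES = [10, 10, 10, 5, 5]
REWRITTEN_SIZES = [5, 5, 3, 2, 1, 3, 3, 3]

assert original() == rewritten()
print(sum(ORIGINAL_SIZES), sum(REWRITTEN_SIZES))  # prints: 40 25
```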
Remember also that different instructions have different lengths. The following instructions take only one byte and are therefore very attractive: PUSH reg, POP reg, INC reg32, DEC reg32.
INC and DEC with 8 bit registers take 2 bytes, so INC EAX is shorter than INC AL.
XCHG EAX,reg is also a single-byte instruction and thus takes less space than MOV EAX,reg, but it is slower.
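The one-byte instructions are those where the register number is simply added to the opcode byte. The Python sketch below (hypothetical helper names, 32-bit mode assumed) contrasts them with INC AL and MOV EAX,reg, which need a separate ModR/M byte.

```python
# x86 register numbers used inside opcodes and ModR/M bytes.
REGS = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
        "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}

# Opcodes where the register number is added to the opcode byte itself.
# Valid in 32-bit mode; in 64-bit mode 0x40-0x4F are REX prefixes instead.
PUSH, POP, INC, DEC, XCHG_EAX = 0x50, 0x58, 0x40, 0x48, 0x90


def one_byte(opcode: int, reg: str) -> bytes:
    """PUSH/POP/INC/DEC/XCHG EAX with a 32-bit register: a single byte."""
    return bytes([opcode + REGS[reg]])


def inc_al() -> bytes:
    return b"\xFE\xC0"                       # INC AL: opcode + ModR/M


def mov_eax_reg(reg: str) -> bytes:
    return bytes([0x8B, 0xC0 | REGS[reg]])   # MOV EAX,reg: opcode + ModR/M


print(one_byte(INC, "EAX").hex(), inc_al().hex(),
      one_byte(XCHG_EAX, "EBX").hex(), mov_eax_reg("EBX").hex())
# prints: 40 fec0 93 8bc3
```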
Some instructions take one byte less when they use the accumulator than when they use any other register.
MOV EAX,DS:[100000] is smaller than MOV EBX,DS:[100000]
ADD EAX,1000 is smaller than ADD EBX,1000
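The saving comes from special accumulator opcodes that need no ModR/M byte. This Python sketch uses hypothetical helper names and an arbitrary example address of 100000; it only covers the two instruction types mentioned here, in 32-bit mode.

```python
REGS = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
        "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}


def le32(value: int) -> bytes:
    return (value & 0xFFFFFFFF).to_bytes(4, "little")


def mov_reg_abs(reg: str, addr: int) -> bytes:
    """MOV reg,DS:[addr] with a 32-bit absolute address."""
    if reg == "EAX":
        return b"\xA1" + le32(addr)            # accumulator form: no ModR/M
    modrm = (REGS[reg] << 3) | 0b101           # mod=00, rm=101 means [disp32]
    return bytes([0x8B, modrm]) + le32(addr)


def add_reg_imm32(reg: str, value: int) -> bytes:
    """ADD reg,imm32; for values in -128..127 the 3-byte 83 form is shorter anyway."""
    if reg == "EAX":
        return b"\x05" + le32(value)           # accumulator form: no ModR/M
    return bytes([0x81, 0xC0 | REGS[reg]]) + le32(value)


print(len(mov_reg_abs("EAX", 100000)), len(mov_reg_abs("EBX", 100000)),
      len(add_reg_imm32("EAX", 1000)), len(add_reg_imm32("EBX", 1000)))
# prints: 5 6 5 6
```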
Instructions with pointers take one byte less when they have only a base pointer (not ESP) and a displacement than when they have a scaled index register, or both base pointer and index register, or ESP as base pointer.
MOV EAX,[array][EBX] is smaller than MOV EAX,[array][EBX*4]
MOV EAX,[EBP+12] is smaller than MOV EAX,[ESP+12]
Instructions with EBP as base pointer and no displacement and no index take one byte more than with other registers:
MOV EAX,[EBX] is smaller than MOV EAX,[EBP], but MOV EAX,[EBX+4] is the same size as MOV EAX,[EBP+4].
Instructions with a scaled index pointer and no base pointer must have a four byte displacement, even when it is 0:
LEA EAX,[EBX+EBX] is shorter than LEA EAX,[2*EBX].
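The pointer rules above all follow from how the ModR/M byte, the optional SIB byte and the displacement are encoded. The Python sketch below is a hypothetical size calculator for 32-bit memory operands (the function name and the `ARRAY` address are illustrative assumptions); it counts only the operand bytes, so 1 is added for the one-byte opcode of MOV and LEA.

```python
def mem_operand_size(base=None, index=None, scale=1, disp=0) -> int:
    """Bytes used by ModR/M, optional SIB and displacement for the 32-bit
    memory operand [base + index*scale + disp]. Registers are names like "EBX".
    The opcode byte(s) are not included.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    if index == "ESP":
        raise ValueError("ESP cannot be used as an index register")
    # A SIB byte is needed for any index register and for ESP as base.
    size = 1 + (1 if index is not None or base == "ESP" else 0)
    if base is None:
        return size + 4          # no base register: always disp32, even if 0
    if disp == 0 and base != "EBP":
        return size              # mod=00: no displacement
    if -128 <= disp <= 127:
        return size + 1          # mod=01: disp8 (EBP needs this even for 0)
    return size + 4              # mod=10: disp32


ARRAY = 0x404000  # hypothetical address of `array`
for operand in ({"base": "EBX"}, {"base": "EBP"},
                {"base": "EBP", "disp": 12}, {"base": "ESP", "disp": 12},
                {"base": "EBX", "disp": ARRAY},
                {"index": "EBX", "scale": 4, "disp": ARRAY},
                {"base": "EBX", "index": "EBX"}, {"index": "EBX", "scale": 2}):
    # MOV and LEA have a one-byte opcode, so add 1 for the full instruction.
    print(operand, 1 + mem_operand_size(**operand))
```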