
23. Reducing code size (all processors)

Date: 2000-04-02 15:00


As explained in chapter 7, the code cache is 8 or 16 kB. If you have problems keeping the critical parts of your code within the code cache, then you may consider reducing the size of your code.

32 bit code is usually bigger than 16 bit code because addresses and data constants take 4 bytes in 32 bit code and only 2 bytes in 16 bit code. However, 16 bit code has other penalties such as prefixes and problems with accessing adjacent words simultaneously (see chapter 10.2 above). Some other methods for reducing the size of your code are discussed below.

Jump addresses, data addresses, and data constants all take less space if they can be expressed as a sign-extended byte, i.e. if they are within the interval from -128 to +127.

For jump addresses this means that short jumps take two bytes of code, whereas jumps beyond 127 bytes take 5 bytes if unconditional and 6 bytes if conditional.
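The jump sizes quoted above can be written down as a small lookup. The following Python sketch is only a model of those figures for 32 bit code, not an assembler; the function names are hypothetical, and `displacement` is assumed to be the distance from the end of the short form of the jump to its target.

```python
def fits_in_signed_byte(value: int) -> bool:
    """True if value can be stored as a sign-extended 8-bit field."""
    return -128 <= value <= 127


def jump_size(displacement: int, conditional: bool) -> int:
    """Size in bytes of a jump in 32 bit code, per the figures in the text."""
    if fits_in_signed_byte(displacement):
        return 2                      # opcode + 8-bit displacement
    # Near form: opcode (two opcode bytes if conditional) + 32-bit displacement
    return 6 if conditional else 5


if __name__ == "__main__":
    for disp in (-129, -128, 127, 128):
        print(disp, jump_size(disp, conditional=False), jump_size(disp, conditional=True))
```

Moving a loop's entry or exit label so that the displacement drops from 128 to 127 saves 3 bytes on an unconditional jump and 4 bytes on a conditional one.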

Likewise, data addresses take less space if they can be expressed as a pointer and a displacement between -128 and +127. Example:

MOV EBX,DS:[100000] / ADD EBX,DS:[100004] ; 12 bytes

Reduce to:

MOV EAX,100000 / MOV EBX,[EAX] / ADD EBX,[EAX+4] ; 10 bytes

The advantage of using a pointer obviously increases if you use it many times. Storing data on the stack and using EBP or ESP as pointer will thus make your code smaller than if you use static memory locations and absolute addresses, provided of course that your data are within -128 to +127 bytes of the pointer. Using PUSH and POP to write and read temporary data is even shorter.
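To see where the pointer starts to pay off, the byte counts from the example can be tallied for any number of accesses. This Python sketch rests on several assumptions that are not stated in the text: every access is a two-operand instruction with a one-byte opcode and a non-accumulator register, the first access uses the pointer with no displacement, all later accesses are within -128 to +127 bytes of it, and the pointer register is neither ESP nor EBP. The function names are hypothetical.

```python
def absolute_total(accesses: int) -> int:
    """Bytes used when every access carries its own 32-bit absolute address."""
    return 6 * accesses               # opcode + ModR/M + 4-byte address


def pointer_total(accesses: int) -> int:
    """Bytes used when the address is loaded into a register once."""
    if accesses == 0:
        return 0
    load_pointer = 5                  # MOV reg, imm32
    first_access = 2                  # opcode + ModR/M, no displacement
    later_access = 3                  # opcode + ModR/M + 1-byte displacement
    return load_pointer + first_access + later_access * (accesses - 1)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, absolute_total(n), pointer_total(n))
```

Under these assumptions a single access is one byte longer with the pointer (7 against 6 bytes), two accesses reproduce the 12 and 10 bytes of the example, and every further access saves another 3 bytes.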

Data constants may also take less space if they are between -128 and +127. Most instructions with immediate operands have a short form where the operand is a sign-extended single byte. Examples:

PUSH 200 ; 5 bytes
PUSH 100 ; 2 bytes

ADD EBX,128 ; 6 bytes
SUB EBX,-128 ; 3 bytes
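The sizes in these examples follow from one test: does the constant fit in a sign-extended byte? A minimal Python sketch of that rule for 32 bit code is given here. The function names are hypothetical, and the 5-byte case for EAX is the accumulator short form discussed later in this section.

```python
def fits_in_signed_byte(value: int) -> bool:
    """True if value can be stored as a sign-extended 8-bit immediate."""
    return -128 <= value <= 127


def push_imm_size(value: int) -> int:
    """PUSH imm: one opcode byte plus a 1-byte or 4-byte immediate."""
    return 2 if fits_in_signed_byte(value) else 5


def alu_imm_size(value: int, dest_is_eax: bool = False) -> int:
    """ADD/SUB/AND/OR/XOR/CMP reg32, imm."""
    if fits_in_signed_byte(value):
        return 3                      # opcode + ModR/M + 1-byte immediate
    if dest_is_eax:
        return 5                      # accumulator form: opcode + 4-byte immediate
    return 6                          # opcode + ModR/M + 4-byte immediate


if __name__ == "__main__":
    print(push_imm_size(200), push_imm_size(100))     # PUSH 200, PUSH 100
    print(alu_imm_size(128), alu_imm_size(-128))      # ADD EBX,128 and SUB EBX,-128
```

The ADD/SUB pair shows why the interval is asymmetric: +128 does not fit in a signed byte but -128 does, so subtracting -128 is 3 bytes shorter than adding 128.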

The most important instruction with an immediate operand which doesn't have such a short form is MOV.

Examples:

MOV EAX, 0 ; 5 bytes

May be changed to:

XOR EAX,EAX ; 2 bytes

And

MOV EAX, 1 ; 5 bytes

May be changed to:

XOR EAX,EAX / INC EAX ; 3 bytes

or:

PUSH 1 / POP EAX ; 3 bytes

And

MOV EAX, -1 ; 5 bytes

May be changed to:

OR EAX, -1 ; 3 bytes
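These replacements are only valid because the result does not depend on what EAX held before. The Python sketch below models a 32-bit register with a mask and checks that for a few starting values; the helper names are hypothetical and the model ignores flags.

```python
MASK32 = 0xFFFFFFFF                   # a 32-bit register keeps only these bits


def xor_eax_eax(eax: int) -> int:
    return (eax ^ eax) & MASK32       # XOR EAX,EAX


def xor_then_inc(eax: int) -> int:
    return (xor_eax_eax(eax) + 1) & MASK32    # XOR EAX,EAX / INC EAX


def or_minus_one(eax: int) -> int:
    return (eax | (-1 & MASK32)) & MASK32     # OR EAX,-1


def as_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


if __name__ == "__main__":
    for start in (0, 1, 0x12345678, 0xFFFFFFFF):
        print(xor_eax_eax(start), xor_then_inc(start), as_signed(or_minus_one(start)))
```

Note that XOR, INC and OR modify the flags whereas MOV does not, so the shorter forms cannot be used where the flags from an earlier instruction are still needed.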

If the same address or constant is used more than once then you may load it into a register. A MOV with a 4-byte immediate operand may sometimes be replaced by an arithmetic instruction if the value of the register before the MOV is known. Example:

MOV [mem1],200 ; 10 bytes
MOV [mem2],200 ; 10 bytes
MOV [mem3],201 ; 10 bytes
MOV EAX,100 ; 5 bytes
MOV EBX,150 ; 5 bytes

Assuming that mem1 and mem3 are both within -128/+127 bytes of mem2, this may be changed to:

MOV EBX, OFFSET mem2 ; 5 bytes
MOV EAX,200 ; 5 bytes
MOV [EBX+mem1-mem2],EAX ; 3 bytes
MOV [EBX],EAX ; 2 bytes
INC EAX ; 1 byte
MOV [EBX+mem3-mem2],EAX ; 3 bytes
SUB EAX,101 ; 3 bytes
LEA EBX,[EAX+50] ; 3 bytes

Be aware of the AGI stall in the LEA instruction (for PPlain and PMMX).
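As a check on the rewritten sequence, the byte counts can be added up and the register arithmetic traced. This Python sketch simply replays the instructions listed above; the dictionary standing in for memory and the function name are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Byte counts copied from the two listings in the text
original_sizes = [10, 10, 10, 5, 5]
rewritten_sizes = [5, 5, 3, 2, 1, 3, 3, 3]


def run_rewritten():
    """Trace the rewritten sequence; returns (memory, EAX, EBX)."""
    memory = {}
    eax = 200                         # MOV EAX,200
    memory["mem1"] = eax              # MOV [EBX+mem1-mem2],EAX
    memory["mem2"] = eax              # MOV [EBX],EAX
    eax = (eax + 1) & MASK32          # INC EAX
    memory["mem3"] = eax              # MOV [EBX+mem3-mem2],EAX
    eax = (eax - 101) & MASK32        # SUB EAX,101
    ebx = (eax + 50) & MASK32         # LEA EBX,[EAX+50]
    return memory, eax, ebx


if __name__ == "__main__":
    print(sum(original_sizes), sum(rewritten_sizes))
    print(run_rewritten())
```

The totals are 40 bytes for the original and 25 bytes for the rewritten code, and the trace ends with the same contents: 200, 200 and 201 in memory, 100 in EAX and 150 in EBX.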

You may also consider that different instructions have different lengths. The following instructions take only one byte and are therefore very attractive: PUSH reg, POP reg, INC reg32, DEC reg32.

INC and DEC with 8 bit registers take 2 bytes, so INC EAX is shorter than INC AL.

XCHG EAX,reg is also a single-byte instruction and thus takes less space than MOV EAX,reg, but it is slower.

Some instructions take one byte less when they use the accumulator than when they use any other register.

Examples:

MOV EAX,DS:[100000] is smaller than MOV EBX,DS:[100000]
ADD EAX,1000 is smaller than ADD EBX,1000

Instructions with pointers take one byte less when they have only a base pointer (not ESP) and a displacement than when they have a scaled index register, or both base pointer and index register, or ESP as base pointer.

Examples:

MOV EAX,[array][EBX] is smaller than MOV EAX,[array][EBX*4]
MOV EAX,[EBP+12] is smaller than MOV EAX,[ESP+12]

Instructions with EBP as base pointer and no displacement and no index take one byte more than with other registers:

MOV EAX,[EBX] is smaller than MOV EAX,[EBP], but MOV EAX,[EBX+4] is same size as MOV EAX,[EBP+4].

Instructions with a scaled index pointer and no base pointer must have a four byte displacement, even when it is 0:

LEA EAX,[EBX+EBX] is shorter than LEA EAX,[2*EBX].
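The addressing rules in the last three paragraphs can be collected into one function that counts the bytes spent on a 32-bit memory operand (the ModR/M byte, an optional SIB byte, and the displacement). This is a sketch of those rules only, not a full encoder. The function name is hypothetical, `index` is assumed to mean a register that really occupies the index field (scaled, or combined with a base), and 0x400000 is an arbitrary stand-in for the address of `array`.

```python
def address_bytes(base=None, index=None, disp=0):
    """Bytes for ModR/M + SIB + displacement of a 32-bit memory operand."""
    if index == "ESP":
        raise ValueError("ESP cannot be used as an index register")
    size = 1                          # ModR/M byte is always present
    if index is not None or base == "ESP":
        size += 1                     # SIB byte
    if base is None:
        return size + 4               # no base: 4-byte displacement is mandatory
    if disp == 0 and base != "EBP":
        return size                   # no displacement needed
    if -128 <= disp <= 127:
        return size + 1               # 1-byte displacement (also EBP with 0)
    return size + 4                   # 4-byte displacement


if __name__ == "__main__":
    ARRAY = 0x400000                  # hypothetical address of 'array'
    print(address_bytes(base="EBX", disp=ARRAY), address_bytes(index="EBX", disp=ARRAY))
    print(address_bytes(base="EBP", disp=12), address_bytes(base="ESP", disp=12))
    print(address_bytes(base="EBX"), address_bytes(base="EBP"))
    print(address_bytes(base="EBX", index="EBX"), address_bytes(index="EBX"))
```

Each printed pair corresponds to one of the comparisons above: 5 against 6 bytes for the array access, 2 against 3 for EBP and ESP with a displacement, 1 against 2 for EBX and EBP without one, and 2 against 6 for the two LEA forms. Add one byte for the opcode to get the full instruction length in these examples.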
