小辉程序员之路, since 1996 http://www.xiaohui.com
乐走天涯: 工作并快乐着,职业并休闲着
 » 首页 > MMX 优化: How to optimize for the Pentium family of microprocessors

10.1 Pairing integer instructions (PPlain and PMMX): Perfect pairing


http://www.XiaoHui.com 日期: 2000-04-01 13:00

10. Pairing integer instructions (PPlain and PMMX)

10.1 Perfect pairing

The PPlain and PMMX have two pipelines for executing instructions, called the U-pipe and the V-pipe. Under certain conditions it is possible to execute two instructions simultaneously, one in the U-pipe and one in the V-pipe. This can almost double the speed. It is therefore advantageous to reorder your instructions to make them pair.

The following instructions are pairable in either pipe:

  • MOV register, memory, or immediate into register or memory
  • PUSH register or immediate, POP register
  • LEA, NOP
  • INC, DEC, ADD, SUB, CMP, AND, OR, XOR,
  • and some forms of TEST (see chapter 26.14)
The following instructions are pairable in the U-pipe only:
  • ADC, SBB
  • SHR, SAR, SHL, SAL with immediate count
  • ROR, ROL, RCR, RCL with an immediate count of 1
The following instructions can execute in either pipe but are only pairable when in the V-pipe:
  • near call
  • short and near jump
  • short and near conditional jump.
All other integer instructions can execute in the U-pipe only, and are not pairable.

Two consecutive instructions will pair when the following conditions are met:

1. The first instruction is pairable in the U-pipe and the second instruction is pairable in the V-pipe.

2. The second instruction does not read or write a register which the first instruction writes to.

Examples:

MOV EAX, EBX / MOV ECX, EAX ; read after write, do not pair MOV EAX, 1 / MOV EAX, 2 ; write after write, do not pair MOV EBX, EAX / MOV EAX, 2 ; write after read, pair OK MOV EBX, EAX / MOV ECX, EAX ; read after read, pair OK MOV EBX, EAX / INC EAX ; read and write after read, pair OK

3. In rule 2 partial registers are treated as full registers. Example:

MOV AL, BL / MOV AH, 0

writes to different parts of the same register, do not pair

4. Two instructions which both write to parts of the flags register can pair despite rule 2 and 3. Example:

SHR EAX, 4 / INC EBX ; pair OK

5. An instruction which writes to the flags can pair with a conditional jump despite rule 2. Example:

CMP EAX, 2 / JA LabelBigger ; pair OK

6. The following instruction combinations can pair despite the fact that they both modify the stack pointer:

PUSH + PUSH, PUSH + CALL, POP + POP

7. There are restrictions on the pairing of instructions with prefix. There are several types of prefixes:

  • instructions addressing a non-default segment have a segment prefix.
  • instructions using 16 bit data in 32 bit mode, or 32 bit data in 16 bit mode have an operand size prefix.
  • instructions using 32 bit base or index registers in 16 bit mode have an address size prefix.
  • repeated string instructions have a repeat prefix.
  • locked instructions have a LOCK prefix.
  • many instructions which were not implemented on the 8086 processor have a two byte opcode where the first byte is 0FH. The 0FH byte behaves as a prefix on the PPlain, but not on the other versions. The most common instructions with 0FH prefix are: MOVZX, MOVSX, PUSH FS, POP FS, PUSH GS, POP GS, LFS, LGS, LSS, SETcc, BT, BTC, BTR, BTS, BSF, BSR, SHLD, SHRD, and IMUL with two operands and no immediate operand.

On the PPlain, a prefixed instruction can only execute in the U-pipe, except for conditional near jumps.

On the PMMX, instructions with operand size, address size, or 0FH prefix can execute in either pipe, whereas instructions with segment, repeat, or lock prefix can only execute in the U-pipe.

8. An instruction which has both a displacement and immediate data is not pairable on the PPlain and only pairable in the U-pipe on the PMMX:

MOV DWORD PTR DS:[1000], 0 ; not pairable or only in U-pipe CMP BYTE PTR [EBX+8], 1 ; not pairable or only in U-pipe CMP BYTE PTR [EBX], 1 ; pairable CMP BYTE PTR [EBX+8], AL ; pairable

(Another problem with instructions which have both a displacement and immediate data on the PMMX is that such instructions may be longer than 7 bytes, which means that only one instruction can be decoded per clock cycle, as explained in chapter 12.)

9. Both instructions must be preloaded and decoded. This is explained in chapter 8.

10. There are special pairing rules for MMX instructions on the PMMX:

  • MMX shift, pack or unpack instructions can execute in either pipe but cannot pair with other MMX shift, pack or unpack instructions.
  • MMX multiply instructions can execute in either pipe but cannot pair with other MMX multiply instructions. They take 3 clock cycles and the last 2 clock cycles can overlap with subsequent instructions in the same way as floating point instructions can (see chapter 24).
  • an MMX instruction which accesses memory or integer registers can execute only in the U-pipe and cannot pair with a non-MMX instruction.

Tags: MMX 优化



 文章评论

目前没有任何评论.

↓ 快抢占第1楼,发表你的评论和意见 ↓
 
发表你的评论
如果你想针对此文发表评论, 请填写下列表单:
姓名: * 必填
E-mail: 可选 (不会被公开)
反垃圾广告: 为了防止广告机器人自动发贴, 请计算下列表达式的值:
8 + 18 = * 必填
评论内容:
* 必填
你可以使用下列标签修饰文字:
[b] 文字 [/b]: 加粗文字
[quote] 文字 [/quote]: 引用文字

 

小辉程序员之路 建站于 1997 ◇ 做一名最好的开发者是我不变的理想……
Copyright(C) 1997-2009 XiaoHui.com   All rights reserved
声明:站内所有原创文字,未经许可,均可转载、复制。
转载时必须以链接形式注明作者和原始出处