From: How to optimize for the Pentium family of microprocessors (MMX optimization series)

28.2 Floating point instructions


http://www.XiaoHui.com Date: 2000-04-03 13:00


Explanations:

Operands:

r = register, m = memory, m32 = 32-bit memory operand, etc.

Clock cycles:

The numbers are minimum values. Cache misses, misalignment, denormal operands, and exceptions may increase the clock counts considerably.

Pairability:

+ = pairable with FXCH, np = not pairable with FXCH.

i-ov:

Overlap with integer instructions. i-ov = 4 means that the last four clock cycles can overlap with subsequent integer instructions.

fp-ov:

Overlap with floating point instructions. fp-ov = 2 means that the last two clock cycles can overlap with subsequent floating point instructions. (WAIT is considered a floating point instruction here.)
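As a rough illustration of how the clock count, i-ov and fp-ov columns combine, the Python sketch below computes when a run of independent floating point instructions finishes and when an integer instruction may start after an FDIV. It is a simplified model under stated assumptions: no data dependencies, no FXCH pairing, no cache misses, and minimum clock counts only. It applies the FMUL-after-FMUL overlap of 1 from note n) and, as an interpretation of note o), assumes that an integer multiplication simply waits for the FDIV to finish. `FpRow`, `fp_overlap`, `finish_clock` and `next_integer_start` are hypothetical names; the numbers come from the FADD(P), FMUL(P) and FDIV(R)(P) rows (FDIV at 64-bit precision).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FpRow:
    """Hypothetical record for one row of the timing table."""
    name: str
    clocks: int                    # minimum clock cycles
    i_ov: int                      # last clocks that may overlap later integer instructions
    fp_ov: int                     # last clocks that may overlap later FP instructions
    no_imul_overlap: bool = False  # note o): i-ov does not apply to integer multiplication


FADD = FpRow("FADD", clocks=3, i_ov=2, fp_ov=2)
FMUL = FpRow("FMUL", clocks=3, i_ov=2, fp_ov=2)
FDIV = FpRow("FDIV", clocks=39, i_ov=38, fp_ov=2, no_imul_overlap=True)  # 64-bit precision


def fp_overlap(prev: FpRow, nxt: FpRow) -> int:
    """fp-ov of prev towards nxt; note n) lowers it to 1 for FMUL followed by FMUL."""
    if prev.name == "FMUL" and nxt.name == "FMUL":
        return 1
    return prev.fp_ov


def finish_clock(seq: list[FpRow]) -> int:
    """Clock at which the last of a non-empty run of independent FP instructions finishes."""
    start = 0  # start clock of the current instruction
    for prev, nxt in zip(seq, seq[1:]):
        # nxt may start once all but the overlapping clocks of prev have elapsed
        start += prev.clocks - fp_overlap(prev, nxt)
    return start + seq[-1].clocks


def next_integer_start(row: FpRow, is_imul: bool = False) -> int:
    """Clock, counted from the start of row, at which an independent integer instruction may start."""
    if is_imul and row.no_imul_overlap:
        return row.clocks  # assumption: the multiplication waits until row has finished
    return row.clocks - row.i_ov


print(finish_clock([FADD] * 4))                # 6
print(finish_clock([FMUL] * 4))                # 9
print(finish_clock([FMUL, FADD, FMUL, FADD]))  # 6
print(next_integer_start(FDIV), next_integer_start(FDIV, is_imul=True))  # 1 39
```

In this model, interleaving FADD with FMUL hides the extra clock that back-to-back FMULs cost, so the mixed sequence finishes as early as four FADDs, and almost the whole FDIV can run in parallel with ordinary integer work.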

Instruction        Operand     Clock cycles  Pairability  i-ov   fp-ov
FLD                r/m32/m64   1             +            0      0
FLD                m80         3             np           0      0
FBLD               m80         48-58         np           0      0
FST(P)             r           1             np           0      0
FST(P)             m32/m64     2 m)          np           0      0
FST(P)             m80         3 m)          np           0      0
FBSTP              m80         148-154       np           0      0
FILD               m           3             np           2      2
FIST(P)            m           6             np           0      0
FLDZ FLD1                      2             np           0      0
FLDPI FLDL2E etc.              5 s)          np           2      2
FNSTSW             AX/m16      6 q)          np           0      0
FLDCW              m16         8             np           0      0
FNSTCW             m16         2             np           0      0
FADD(P)            r/m         3             +            2      2
FSUB(R)(P)         r/m         3             +            2      2
FMUL(P)            r/m         3             +            2      2 n)
FDIV(R)(P)         r/m         19/33/39 p)   +            38 o)  2
FCHS FABS                      1             +            0      0
FCOM(P)(P) FUCOM   r/m         1             +            0      0
FIADD FISUB(R)     m           6             np           2      2
FIMUL              m           6             np           2      2
FIDIV(R)           m           22/36/42 p)   np           38 o)  2
FICOM              m           4             np           0      0
FTST                           1             np           0      0
FXAM                           17-21         np           4      0
FPREM                          16-64         np           2      2
FPREM1                         20-70         np           2      2
FRNDINT                        9-20          np           0      0
FSCALE                         20-32         np           5      0
FXTRACT                        12-66         np           0      0
FSQRT                          70            np           69 o)  2
FSIN FCOS                      65-100 r)     np           2      2
FSINCOS                        89-112 r)     np           2      2
F2XM1                          53-59 r)      np           2      2
FYL2X                          103 r)        np           2      2
FYL2XP1                        105 r)        np           2      2
FPTAN                          120-147 r)    np           36 o)  0
FPATAN                         112-134 r)    np           2      2
FNOP                           1             np           0      0
FXCH               r           1             np           0      0
FINCSTP FDECSTP                2             np           0      0
FFREE              r           2             np           0      0
FNCLEX                         6-9           np           0      0
FNINIT                         12-22         np           0      0
FNSAVE             m           124-300       np           0      0
FRSTOR             m           70-95         np           0      0
WAIT                           1             np           0      0

Notes:

m) The value to store is needed one clock cycle in advance.

n) fp-ov is 1 instead of 2 if the overlapping instruction is also an FMUL.

o) Cannot overlap integer multiplication instructions.

p) FDIV takes 19, 33, or 39 clock cycles for 24, 53, and 64 bit precision respectively. FIDIV takes 3 clocks more. The precision is defined by bits 8-9 of the floating point control word.

q) The first 4 clock cycles can overlap with preceding integer instructions. See chapter 26.7.

r) Clock counts are typical. Trivial cases may be faster, extreme cases may be slower.

s) May take up to 3 clocks more when the result is needed by FST, FCHS, or FABS.
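Note p) ties FDIV speed to the precision control field in bits 8-9 of the control word. The Python sketch below decodes and rewrites that field in a 16-bit control word value and looks up the matching minimum clock count; it only manipulates integers and does not touch a real FPU (on actual hardware the new word would be loaded with FLDCW, which the table lists at 8 clocks). It assumes the standard x87 encoding 00 = 24-bit, 10 = 53-bit, 11 = 64-bit (01 is reserved) and the control word 037Fh that FNINIT sets; `precision_field`, `with_precision` and `divide_clocks` are hypothetical helper names.

```python
PC_SHIFT = 8                            # precision control occupies bits 8-9
PC_MASK = 0b11 << PC_SHIFT
PC_24, PC_53, PC_64 = 0b00, 0b10, 0b11  # 0b01 is reserved
FDIV_CLOCKS = {PC_24: 19, PC_53: 33, PC_64: 39}  # minimum FDIV clocks from note p)
FNINIT_CW = 0x037F                      # control word after FNINIT: 64-bit precision


def precision_field(cw: int) -> int:
    """Extract the 2-bit precision control field from a 16-bit control word."""
    return (cw & PC_MASK) >> PC_SHIFT


def with_precision(cw: int, pc: int) -> int:
    """Return cw with its precision control field replaced by pc (other bits unchanged)."""
    return (cw & ~PC_MASK & 0xFFFF) | (pc << PC_SHIFT)


def divide_clocks(cw: int, fidiv: bool = False) -> int:
    """Minimum FDIV clocks at cw's precision; FIDIV takes 3 more. KeyError for the reserved setting."""
    return FDIV_CLOCKS[precision_field(cw)] + (3 if fidiv else 0)


single_cw = with_precision(FNINIT_CW, PC_24)
print(hex(single_cw))                        # 0x7f
print(divide_clocks(FNINIT_CW))              # 39
print(divide_clocks(single_cw))              # 19
print(divide_clocks(single_cw, fidiv=True))  # 22
```

By the table's numbers, single precision saves 20 clocks per FDIV (39 versus 19), against 8 clocks for each FLDCW used to switch the precision.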




Copyright(C) 1997-2009 XiaoHui.com   All rights reserved
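Notice: All original text on this site may be reprinted and copied without prior permission.
When reprinting, the author and the original source must be credited with a link.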