30. Testing speed


Date: 2000-04-03 14:00

The Pentium family of processors has an internal 64-bit clock counter which can be read into EDX:EAX using the instruction RDTSC (read time stamp counter). This is very useful for testing exactly how many clock cycles a piece of code takes.

The program below measures the number of clock cycles a piece of code takes. It executes the code to test 10 times and stores the 10 clock counts. The program can be used in both 16-bit and 32-bit mode on the PPlain and PMMX:

;************   Test program for PPlain and PMMX:    ********************

ITER    EQU     10                      ; number of iterations
OVERHEAD EQU    15                      ; 15 for PPlain, 17 for PMMX

RDTSC   MACRO                           ; define RDTSC instruction
        DB      0FH,31H
ENDM

;************   Data segment:    ********************
.DATA                                   ; data segment
ALIGN   4
COUNTER DD      0                       ; loop counter
TICS    DD      0                       ; temporary storage of clock
RESULTLIST DD   ITER DUP (0)            ; list of test results

;************   Code segment:    ********************
.CODE                                   ; code segment
BEGIN:  MOV     [COUNTER],0             ; reset loop counter
TESTLOOP:                               ; test loop
;************   Do any initializations here:    ********************
        FINIT
;************   End of initializations    ********************
        RDTSC                           ; read clock counter
        MOV     [TICS],EAX              ; save count
        CLD                             ; non-pairable filler
REPT    8
        NOP                             ; eight NOP's to avoid shadowing effect
ENDM
;************   Put instructions to test here:    ********************
        FLDPI                           ; this is only an example
        FSQRT
        RCR     EBX,10
        FSTP    ST
;************   End of instructions to test    ********************
        CLC                             ; non-pairable filler with shadow
        RDTSC                           ; read counter again
        SUB     EAX,[TICS]              ; compute difference
        SUB     EAX,OVERHEAD            ; subtract clocks used by fillers etc.
        MOV     EDX,[COUNTER]           ; loop counter
        MOV     [RESULTLIST][EDX],EAX   ; store result in table
        ADD     EDX,TYPE RESULTLIST     ; increment counter
        MOV     [COUNTER],EDX           ; store counter
        CMP     EDX,ITER * (TYPE RESULTLIST)
        JB      TESTLOOP                ; repeat ITER times

; insert here code to read out the values in RESULTLIST

The 'filler' instructions before and after the piece of code to test are included in order to get consistent results on the PPlain. The CLD is a non-pairable instruction which has been inserted to make sure the pairing is the same the first time as the subsequent times. The eight NOP instructions are inserted to prevent any prefixes in the code to test from being decoded in the shadow of the preceding instructions on the PPlain. Single-byte instructions are used here to obtain the same pairing the first time as the subsequent times. The CLC after the code to test is a non-pairable instruction which has a shadow under which the 0FH prefix of the RDTSC can be decoded, so that it is independent of any shadowing effect from the code to test on the PPlain.

On the PMMX you may want to insert XOR EAX,EAX / CPUID before the instructions to test if you want the FIFO instruction buffer to be empty, or some time-consuming instruction (e.g. CLI or AAD) if you want the FIFO buffer to be full. (CPUID has no shadow under which prefixes of subsequent instructions can decode.)

On the PPro, PII and PIII you have to insert XOR EAX,EAX / CPUID before and after each RDTSC to prevent it from executing in parallel with anything else, and remove the filler instructions. (CPUID is a serializing instruction which means that it flushes the pipeline and waits for all pending operations to finish before proceeding. This is useful for testing purposes.)
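As a rough illustration, the timing part of the test loop might look like the sketch below on the PPro, PII and PIII. This is not the author's listing: the CPUID macro (modelled on the RDTSC macro of the test program), the use of TICS to hold the result across the last CPUID, and the exact instruction order are assumptions. The OVERHEAD values of 15 and 17 apply only to the PPlain and PMMX, so a suitable value would have to be found by running the loop with an empty test section. Note that CPUID overwrites EAX, EBX, ECX and EDX, so the clock count must be stored before CPUID executes, and the code to test cannot rely on values left in these registers.

```asm
;************ Sketch: timing section for PPro, PII and PIII ************
; Assumption: CPUID is defined as a macro in the same way as RDTSC.
CPUID   MACRO                           ; define CPUID instruction
        DB      0FH,0A2H
ENDM

        XOR     EAX,EAX
        CPUID                           ; serialize before reading the clock
        RDTSC                           ; read clock counter
        MOV     [TICS],EAX              ; save count before CPUID overwrites EAX
        XOR     EAX,EAX
        CPUID                           ; serialize after reading the clock

;************ Put instructions to test here ************

        XOR     EAX,EAX
        CPUID                           ; wait for the code to test to finish
        RDTSC                           ; read counter again
        SUB     EAX,[TICS]              ; compute difference
        SUB     EAX,OVERHEAD            ; assumption: OVERHEAD re-measured for this processor
        MOV     [TICS],EAX              ; keep result, CPUID overwrites EAX..EDX
        XOR     EAX,EAX
        CPUID                           ; serialize after the second RDTSC
        MOV     EAX,[TICS]              ; reload result
        MOV     EDX,[COUNTER]           ; loop counter
        MOV     [RESULTLIST][EDX],EAX   ; store result in table
        ADD     EDX,TYPE RESULTLIST     ; increment counter
        MOV     [COUNTER],EDX           ; store counter
        CMP     EDX,ITER * (TYPE RESULTLIST)
        JB      TESTLOOP                ; repeat ITER times
```

The filler instructions (CLD, the eight NOPs and CLC) are left out here, as the text above prescribes for these processors.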

The RDTSC instruction cannot execute in virtual mode on the PPlain and PMMX, so if you are running DOS programs you must run in real mode (press F8 while booting and select "safe mode command prompt only" or "bypass startup files").

The complete test program is available from www.agner.org/assem/.

The Pentium processors have special performance monitor counters which can count events such as cache misses, misalignments, various stalls, etc. Details about how to use the performance monitor counters are not covered by this manual but can be found in "Intel Architecture Software Developer's Manual", vol. 3, Appendix A.
