All conversions from floating point to integer, and vice versa, must go via a memory location:
FISTP DWORD PTR [TEMP] MOV EAX, [TEMP]
On PPro, PII and PIII, this code is likely to have a penalty for attempting to read from [TEMP] before the write to [TEMP] is finished because the FIST instruction is slow (see chapter 17). It doesn't help to put in a WAIT (see chapter 26.6). It is recommended that you put in other instructions between the write to [TEMP] and the read from [TEMP] if possible in order to avoid this penalty. This applies to all the examples that follow.
The specifications for the C and C++ language requires that conversion from floating point numbers to integers use truncation rather than rounding. The method used by most C libraries is to change the floating point control word to indicate truncation before using an FISTP instruction and changing it back again afterwords. This method is very slow on all processors. On PPro, PII and PIII, the floating point control word cannot be renamed, so all subsequent floating point instructions must wait for the FLDCW instruction to retire.
Whenever you have a conversion from floating point to integer in C or C++, you should think of whether you can use rounding to nearest integer instead of truncation. If your standard library doesn't have a fast round function then make your own using the code examples listed below.
If you need truncation inside a loop then you should change the control word only outside the loop if the rest of the floating point instructions in the loop can work correctly in truncation mode.
You may use various tricks for truncating without changing the control word, as illustrated in the examples below. These examples presume that the control word is set to default, i.e. rounding to nearest or even.
Rounding to nearest or even
; extern "C" int round (double x); _round PROC NEAR PUBLIC _round FLD QWORD PTR [ESP+4] FISTP DWORD PTR [ESP+4] MOV EAX, DWORD PTR [ESP+4] RET _round ENDP
Truncation towards zero
; extern "C" int truncate (double x); _truncate PROC NEAR PUBLIC _truncate FLD QWORD PTR [ESP+4] ; x SUB ESP, 12 ; space for local variables FIST DWORD PTR [ESP] ; rounded value FST DWORD PTR [ESP+4] ; float value FISUB DWORD PTR [ESP] ; subtract rounded value FSTP DWORD PTR [ESP+8] ; difference POP EAX ; rounded value POP ECX ; float value POP EDX ; difference (float) TEST ECX, ECX ; test sign of x JS SHORT NEGATIVE ADD EDX, 7FFFFFFFH ; produce carry if difference < -0 SBB EAX, 0 ; subtract 1 if x-round(x) < -0 RET NEGATIVE: XOR ECX, ECX TEST EDX, EDX SETG CL ; 1 if difference > 0 ADD EAX, ECX ; add 1 if x-round(x) > 0 RET _truncate ENDP
Truncation towards minus infinity
; extern "C" int ifloor (double x); _ifloor PROC NEAR PUBLIC _ifloor FLD QWORD PTR [ESP+4] ; x SUB ESP, 8 ; space for local variables FIST DWORD PTR [ESP] ; rounded value FISUB DWORD PTR [ESP] ; subtract rounded value FSTP DWORD PTR [ESP+4] ; difference POP EAX ; rounded value POP EDX ; difference (float) ADD EDX, 7FFFFFFFH ; produce carry if difference < -0 SBB EAX, 0 ; subtract 1 if x-round(x) < -0 RET _ifloor ENDP
These procedures work for -231 < x < 231-1. They do not check for overflow or NAN's.
The PIII has instructions for truncation of single precision floating point numbers: CVTTSS2SI and CVTTPS2PI. These instructions are very useful if the single precision is satisfactory, but if you are converting a float with higher precision to single precision in order to use these truncation instructions then you have the problem that the number may be rounded up in the conversion to single precision.
Alternative to FISTP instruction (PPlain and PMMX)
Converting a floating point number to integer is normally done like this:
FISTP DWORD PTR [TEMP] MOV EAX, [TEMP]
An alternative method is:
.DATA ALIGN 8 TEMP DQ ? MAGIC DD 59C00000H ; f.p. representation of 2^51 + 2^52 .CODE FADD [MAGIC] FSTP QWORD PTR [TEMP] MOV EAX, DWORD PTR [TEMP]
Adding the 'magic number' of 251 + 252 has the effect that any integer between -231 and +231 will be aligned in the lower 32 bits when storing as a double precision floating point number. The result is the same as you get with FISTP for all rounding methods except truncation towards zero. The result is different from FISTP if the control word specifies truncation or in case of overflow. You may need a WAIT instruction for compatibility with the old 80287 processor, see chapter 26.6.
This method is not faster than using FISTP, but it gives better scheduling opportunities on PPlain and PMMX because there is a 3 clock void between FADD and FSTP which may be filled with other instrucions. You may multiply or divide the number by a power of 2 in the same operation by doing the opposite to the magic number. You may also add a constant by adding it to the magic number, which then has to be double precision.