You must issue an EMMS instruction after your last MMX instruction if there is a possibility that floating point code follows later.
On the PMMX there is a high penalty for switching between floating-point and MMX instructions. The first floating-point instruction after an EMMS takes approximately 58 extra clocks, and the first MMX instruction after a floating-point instruction takes approximately 38 extra clocks.
On the PII and PIII there is no such penalty. The delay after EMMS can be hidden by placing integer instructions between the EMMS and the first floating-point instruction.
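
The ordering described above can be sketched in C with the MMX intrinsics from <mmintrin.h>, where _mm_empty() is the intrinsic that emits EMMS. This is only an illustration under stated assumptions: the function name mmx_sum_then_average and its arguments are hypothetical, the code builds only for x86 targets with MMX enabled, and a compiler is free to schedule the integer loop differently from how it is written, so hand-written assembly is needed if the exact instruction order matters.

```c
#include <mmintrin.h>   /* MMX intrinsics (x86 only) */
#include <stdint.h>
#include <string.h>

/* Hypothetical example: add four 16-bit lanes with MMX, then average the
 * results in floating point. The point is the order of the three phases. */
double mmx_sum_then_average(const int16_t a[4], const int16_t b[4])
{
    __m64 va, vb, vsum;
    int16_t out[4];
    int total = 0;

    /* Phase 1: MMX code. */
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vsum = _mm_add_pi16(va, vb);        /* PADDW: lane-wise 16-bit add */
    memcpy(out, &vsum, sizeof out);

    /* EMMS after the last MMX instruction, because floating-point
     * code follows later in this function. */
    _mm_empty();

    /* Phase 2: integer work placed between EMMS and the first
     * floating-point instruction, where it can overlap the delay. */
    for (int i = 0; i < 4; i++)
        total += out[i];

    /* Phase 3: first floating-point operation. */
    return (double)total / 4.0;
}
```

Calling the function with a = {1, 2, 3, 4} and b = {10, 20, 30, 40} gives lane sums of 11, 22, 33 and 44, so the returned average is 27.5.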