A lot of useful literature and tutorials can be downloaded for free from Intel's website or acquired in print or on CD-ROM. It is recommended that you study this literature to become acquainted with the microprocessor architecture. However, the documents from Intel are not always accurate. The tutorials in particular have many errors (evidently, they haven't tested their own examples).
I will not give the URLs here because the file locations change very often. You can find the documents you need by using the search facilities at developer.intel.com, or by following the links from www.agner.org/assem
Some documents are in .PDF format. If you don't have software for viewing or printing .PDF files, you can download the Acrobat Reader from http://www.adobe.com/
The use of MMX and XMM (SIMD) instructions for optimizing specific applications is described in several application notes. The instruction set itself is described in various manuals and tutorials.
VTUNE is a software tool from Intel for optimizing code. I have not tested it and therefore cannot give an evaluation of it here.
Many sources other than Intel also have useful information. These sources are listed in the FAQ for the newsgroup comp.lang.asm.x86. For other Internet resources, follow the links from www.agner.org/assem