Intel SSE

The Streaming SIMD Extensions enhance the Intel x86 architecture in four ways:

8 new 128-bit SIMD floating-point registers that can be directly addressed;

50 new instructions that work on packed floating-point data;

8 new instructions designed to control cacheability of all MMX and 32-bit x86 data types, including the ability to stream data to memory without polluting the caches, and to prefetch data before it is actually used;

12 new instructions that extend the MMX instruction set.

This set lets the programmer develop algorithms that mix packed single-precision floating-point data and packed integer data, using SSE and MMX instructions respectively.
This approach was chosen because most media processing applications have the following characteristics:

inherently parallel

wide dynamic range, hence floating-point based

regular memory access patterns

data-independent control flow.
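As a rough illustration of these characteristics, the following C sketch scales a buffer of samples four at a time using the compiler intrinsics from <xmmintrin.h>. The function name scale_samples, the prefetch distance of 16 floats, and the requirements that both buffers be 16-byte aligned and that n be a multiple of 4 are assumptions made for the example, not part of the SSE definition. On 32-bit x86 targets a compiler switch such as -msse is typically needed.

```c
#include <stddef.h>
#include <xmmintrin.h>   /* SSE intrinsics */

/* Hypothetical helper: dst[i] = src[i] * gain for n samples.
   Assumes dst and src are 16-byte aligned and n is a multiple of 4. */
void scale_samples(float *dst, const float *src, size_t n, float gain)
{
    const __m128 g = _mm_set1_ps(gain);        /* gain copied into all four elements */

    for (size_t i = 0; i < n; i += 4) {
        /* Regular access pattern: hint that data further ahead is needed soon. */
        if (i + 16 < n)
            _mm_prefetch((const char *)(src + i + 16), _MM_HINT_T0);

        __m128 v = _mm_load_ps(src + i);       /* four consecutive samples */
        v = _mm_mul_ps(v, g);                  /* same operation on every element */
        _mm_stream_ps(dst + i, v);             /* non-temporal store, avoids cache pollution */
    }
    _mm_sfence();                              /* order the streamed stores before later stores */
}
```

Every iteration performs the same work regardless of the sample values, which is what makes this kind of loop a good fit for SIMD execution.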

Intel SSE provides eight 128-bit general-purpose registers, each of which can be directly addressed using the register names XMM0 to XMM7. Each register holds four 32-bit single-precision floating-point numbers, numbered 0 through 3. The MMX registers are mapped onto the x87 floating-point registers, so the EMMS instruction is required when passing from MMX code to x87 floating-point code. The SIMD floating-point registers, by contrast, form a separate register file, so MMX or floating-point instructions can be mixed with SSE instructions without executing a special instruction such as EMMS. On the downside, the new registers require support from the operating system, since they must be saved and restored when switching tasks.
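The element numbering can be made concrete with a small C sketch based on the __m128 type from <xmmintrin.h>, which compilers normally keep in an XMM register. The function name show_element_order is hypothetical and exists only for this example.

```c
#include <xmmintrin.h>   /* SSE intrinsics */

/* Hypothetical helper: builds one 128-bit value and copies it to out[0..3]. */
void show_element_order(float out[4])
{
    /* _mm_set_ps takes its arguments from element 3 down to element 0. */
    __m128 r = _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f);

    /* Unaligned store: element 0 is written to the lowest address. */
    _mm_storeu_ps(out, r);
}
```

After the call, out[k] holds the value that was placed in element k, so element 0 is the least significant 32 bits of the register.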
A new control/status register, MXCSR, is used to mask or unmask numerical exceptions, to set the rounding mode, to enable flush-to-zero mode, and to view status flags.
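From C, MXCSR is usually reached through the helpers declared in <xmmintrin.h> rather than through assembly. The sketch below is one possible way to switch modes and restore them afterwards; the function names set_fast_truncating_mode, restore_mxcsr and divide_by_zero_was_flagged are hypothetical.

```c
#include <xmmintrin.h>   /* SSE intrinsics and MXCSR helpers */

/* Hypothetical helper: enable flush-to-zero and round-toward-zero.
   Returns the previous MXCSR contents so the caller can restore them. */
unsigned int set_fast_truncating_mode(void)
{
    unsigned int saved = _mm_getcsr();             /* read the whole register */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);    /* flush-to-zero control bit */
    _MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);  /* rounding-control field */
    return saved;
}

/* Hypothetical helper: write back a previously saved MXCSR value. */
void restore_mxcsr(unsigned int saved)
{
    _mm_setcsr(saved);
}

/* Hypothetical helper: inspect the sticky divide-by-zero status flag. */
int divide_by_zero_was_flagged(void)
{
    return (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_DIV_ZERO) != 0;
}
```

Saving and restoring the register around a computation keeps the changed modes from leaking into unrelated code that runs on the same thread.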
SSE instructions operate in parallel on either all pairs or only the least significant pair of elements of their packed data operands. The packed instructions (with the PS suffix) operate on all four pairs of elements, while the scalar instructions (with the SS suffix) always operate on the least significant pair of elements of the two operands; for scalar operations, the three upper components of the first operand are passed through to the destination.

SSE Packed

SSE Scalar
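The difference between the packed and scalar forms can be sketched in C with the intrinsics _mm_add_ps and _mm_add_ss, which compilers typically translate to ADDPS and ADDSS. The wrapper names add_packed and add_scalar are hypothetical and only serve this example.

```c
#include <xmmintrin.h>   /* SSE intrinsics */

/* Packed add: all four element pairs are summed. */
void add_packed(float out[4], const float a[4], const float b[4])
{
    __m128 r = _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, r);
}

/* Scalar add: only element 0 is summed; elements 1 to 3 are
   passed through unchanged from the first operand. */
void add_scalar(float out[4], const float a[4], const float b[4])
{
    __m128 r = _mm_add_ss(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, r);
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, add_packed produces {11, 22, 33, 44}, while add_scalar produces {11, 2, 3, 4}.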

The SSE set consists of 70 instructions. The following sections give a brief overview of each group of instructions in the SSE set and of the instructions within each group.

Data movement instructions

Arithmetic instructions

Reciprocal instructions

Comparison instructions

Conversion instructions

Logical instructions

Additional SIMD integer instructions (SSE Primer)

Shuffle instructions

State Management instructions

Cacheability Control instructions
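To give a flavour of a few of the groups listed above, the C sketch below combines comparison and logical operations into a branch-free select, reverses a register with a shuffle, and computes an approximate reciprocal. The function names are hypothetical; the intrinsics come from <xmmintrin.h>.

```c
#include <xmmintrin.h>   /* SSE intrinsics */

/* Comparison + logical: out[i] = (a[i] < b[i]) ? a[i] : b[i], without branches. */
void select_smaller(float out[4], const float a[4], const float b[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 mask = _mm_cmplt_ps(va, vb);             /* all ones where a < b, else zero */
    __m128 r = _mm_or_ps(_mm_and_ps(mask, va),      /* keep a where the mask is set */
                         _mm_andnot_ps(mask, vb));  /* keep b elsewhere */
    _mm_storeu_ps(out, r);
}

/* Shuffle: reverse the order of the four elements. */
void reverse_elements(float out[4], const float in[4])
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3)));
}

/* Reciprocal: fast approximation of 1/x for each element (reduced precision). */
void approx_reciprocal(float out[4], const float in[4])
{
    _mm_storeu_ps(out, _mm_rcp_ps(_mm_loadu_ps(in)));
}
```

The reciprocal result is only an approximation, so code that needs full single-precision accuracy would use a division or refine the estimate.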