The Streaming SIMD Extensions (SSE) enhance the Intel x86 architecture in four ways:
- 8 new 128-bit SIMD floating-point registers that can be directly addressed;
- 50 new instructions that work on packed floating-point data;
- 8 new instructions designed to control the cacheability of all MMX and 32-bit x86 data types, including the ability to stream data to memory without polluting the caches and to prefetch data before it is actually used;
- 12 new instructions that extend the MMX instruction set.
This set enables the programmer to develop algorithms that mix packed single-precision floating-point data and packed integer data, using SSE and MMX instructions respectively.
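As an illustration of the cacheability-control instructions, here is a minimal sketch in C using the compiler intrinsics from `<xmmintrin.h>`. The function name `scale_stream`, the prefetch distance of 16 floats, and the requirement that `n` be a multiple of 4 are assumptions made for this example, not part of the SSE definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: dst[i] = k * src[i] for i in [0, n).
 * Assumes n is a multiple of 4 and dst is 16-byte aligned,
 * which the streaming store requires. */
void scale_stream(float *dst, const float *src, size_t n, float k)
{
    assert(n % 4 == 0);
    assert(((uintptr_t)dst & 15u) == 0);

    const __m128 factor = _mm_set1_ps(k);
    for (size_t i = 0; i < n; i += 4) {
        /* Prefetch hint for data 16 floats ahead (distance is an assumption). */
        if (i + 16 < n)
            _mm_prefetch((const char *)(src + i + 16), _MM_HINT_NTA);

        __m128 v = _mm_loadu_ps(src + i);   /* unaligned packed load */
        /* Streaming (non-temporal) store: writes to memory while
         * minimizing cache pollution. */
        _mm_stream_ps(dst + i, _mm_mul_ps(v, factor));
    }
    /* Make the streaming stores globally visible before returning. */
    _mm_sfence();
}
```

A streaming store of this kind is most useful when the output will not be read again soon; for data that is reused immediately, an ordinary store is usually the better choice.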
This approach was chosen because most media processing applications have the following characteristics:
- inherently parallel;
- wide dynamic range, hence floating-point based;
- regular memory access patterns;
- data-independent control flow.
Intel SSE provides eight 128-bit general-purpose registers, each of which can be directly addressed using the register names XMM0 to XMM7. Each register holds four 32-bit single-precision floating-point numbers, numbered 0 through 3. The MMX registers are mapped onto the x87 floating-point registers, so the EMMS instruction is required when passing from MMX code to x87 floating-point code. The SIMD floating-point registers, by contrast, form a separate register file, so MMX or x87 floating-point instructions can be mixed with SSE instructions without executing a special instruction such as EMMS. On the downside, the new registers require support from the operating system, since they must be saved and restored when switching tasks.
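The element numbering can be made concrete with a small C sketch based on the `<xmmintrin.h>` intrinsics. The helper names `make_0123` and `xmm_to_array` are hypothetical and exist only for this example; the compiler, not the programmer, decides which of XMM0 to XMM7 actually holds the value.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; __m128 is one 128-bit XMM value */

/* Hypothetical helper: build a value whose element i holds the number i.
 * Note that _mm_set_ps lists its arguments from element 3 down to element 0. */
__m128 make_0123(void)
{
    return _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f);
}

/* Hypothetical helper: copy the four elements to memory.
 * Element 0 (the least significant) lands at the lowest address, out[0]. */
void xmm_to_array(__m128 v, float out[4])
{
    assert(out != NULL);
    _mm_storeu_ps(out, v);  /* unaligned store of all four floats */
}
```

Calling `xmm_to_array(make_0123(), out)` should leave `out[i]` equal to `i` for each of the four elements.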
There is a new control/status register, MXCSR, which is used to mask or unmask numerical exceptions, to set the rounding mode, to set flush-to-zero mode, and to view status flags.
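The uses of MXCSR listed above can be sketched with the access macros that `<xmmintrin.h>` provides. The helper names below are hypothetical, and the choice of round-toward-zero with flush-to-zero enabled is only an example configuration.

```c
#include <assert.h>
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr and the _MM_* MXCSR macros */

/* Hypothetical helper: switch to round-toward-zero with flush-to-zero on.
 * Returns the previous MXCSR contents so the caller can restore them. */
unsigned int mxcsr_set_truncating(void)
{
    unsigned int saved = _mm_getcsr();               /* read MXCSR */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);      /* flush-to-zero bit */
    _MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);    /* rounding-control bits */
    assert(_MM_GET_ROUNDING_MODE() == _MM_ROUND_TOWARD_ZERO);
    return saved;
}

/* Hypothetical helper: write a previously saved value back to MXCSR. */
void mxcsr_restore(unsigned int saved)
{
    _mm_setcsr(saved);
}

/* Hypothetical helper: nonzero if the "inexact result" status flag is set. */
int mxcsr_inexact_raised(void)
{
    return (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_INEXACT) != 0;
}
```

Saving and restoring the register around a region of code keeps the changed rounding behavior from leaking into callers that expect the default settings.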
SSE instructions operate in parallel on either all pairs or only the least significant pair of packed data elements. Packed instructions (with PS suffix) operate on all four pairs of elements, while scalar instructions (with SS suffix) operate only on the least significant pair of elements of the two operands; for scalar operations, the three upper elements of the first operand are passed through to the destination.
[Figure: SSE packed operation]
[Figure: SSE scalar operation]
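The difference between packed and scalar operation can be sketched in C with the `_mm_add_ps` and `_mm_add_ss` intrinsics, which correspond to the ADDPS and ADDSS instructions. The wrapper names `add_packed` and `add_scalar` are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Packed add (ADDPS): out[i] = a[i] + b[i] for all four elements. */
void add_packed(const float a[4], const float b[4], float out[4])
{
    assert(a != NULL && b != NULL && out != NULL);
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Scalar add (ADDSS): out[0] = a[0] + b[0];
 * elements 1..3 are passed through unchanged from the first operand a. */
void add_scalar(const float a[4], const float b[4], float out[4])
{
    assert(a != NULL && b != NULL && out != NULL);
    _mm_storeu_ps(out, _mm_add_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

With `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, the packed form yields `{11, 22, 33, 44}`, while the scalar form yields `{11, 2, 3, 4}`.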
The SSE set consists of 70 instructions. The following sections give a brief overview of each group of instructions in the SSE set and of the instructions within each group.