|
MMX Conversion Instructions
Elements of packed data often need to be repositioned within an operand,
or the elements of two packed data operands need to be merged. The input
format, or the desired output format, of the data may not be ideal for
maximizing computational throughput. In other situations the intermediate
computations must be performed in a wider format (for example, packed
words) while the result is presented in packed byte format.
In all of these cases, some elements of a packed data type must be
extracted and written to a different position in the packed result. One
general solution is to provide an instruction that takes two packed data
operands and merges their bytes in any arbitrary order into the destination
packed data operand. Such a general solution is expensive to implement,
however, because it requires a full crossbar connection.
MMX technology defines instructions that
require only a relatively simple swizzle network, yet allow efficient
repositioning and combining of elements from packed data operands in most
cases.
SSE technology
adds a shuffle words instruction that
represents a better general solution at the expense of backward compatibility.
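As an illustration of what such a shuffle provides, the sketch below models a four-word shuffle in portable C. It assumes the instruction meant here is PSHUFW, in which each 2-bit field of an 8-bit immediate selects the source word for one result word. The function name model_pshufw is hypothetical, and a 64-bit register is represented as a uint64_t.

```c
#include <stdint.h>

/* Scalar model of a four-word shuffle (assumed to correspond to PSHUFW).
 * Bits [2i+1 : 2i] of `order` select which source word lands in result word i. */
uint64_t model_pshufw(uint64_t src, uint8_t order)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (order >> (2 * i)) & 3u;           /* source word index 0..3 */
        uint64_t word = (src >> (16 * sel)) & 0xFFFFull;  /* extract selected word  */
        result |= word << (16 * i);                       /* place it in position i */
    }
    return result;
}
```

With this encoding, an order of 0xE4 leaves the words in place, 0x1B reverses them, and 0x00 broadcasts the lowest word to all four positions.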
PACKSSWB mm, mm/m64
PACKSSDW mm, mm/m64 |
The PACKSS (Pack with Signed Saturation)
instructions pack and saturate the signed data elements from the source
and the destination operands and write the signed results to the destination
operand.
PACKSSWB packs four signed words from
the source operand and four signed words from the destination operand into
eight signed bytes in the destination register. If the signed value of
a word lies outside the range of a signed byte, it is saturated: to 0x7F
on overflow and to 0x80 on underflow.

The PACKSSDW instruction packs two signed doublewords
from the source operand and two signed doublewords from the destination
operand into four signed words in the destination register. If the signed
value of a doubleword lies outside the range of a signed word, it is
saturated: to 0x7FFF on overflow and to 0x8000 on underflow. |
PACKSSWB instruction with 64-bit operands:
DEST[7..0] ← SaturateSignedWordToSignedByte DEST[15..0];
DEST[15..8] ← SaturateSignedWordToSignedByte DEST[31..16];
DEST[23..16] ← SaturateSignedWordToSignedByte DEST[47..32];
DEST[31..24] ← SaturateSignedWordToSignedByte DEST[63..48];
DEST[39..32] ← SaturateSignedWordToSignedByte SRC[15..0];
DEST[47..40] ← SaturateSignedWordToSignedByte SRC[31..16];
DEST[55..48] ← SaturateSignedWordToSignedByte SRC[47..32];
DEST[63..56] ← SaturateSignedWordToSignedByte SRC[63..48];
PACKSSDW instruction with 64-bit operands:
DEST[15..0] ← SaturateSignedDoublewordToSignedWord DEST[31..0];
DEST[31..16] ← SaturateSignedDoublewordToSignedWord DEST[63..32];
DEST[47..32] ← SaturateSignedDoublewordToSignedWord SRC[31..0];
DEST[63..48] ← SaturateSignedDoublewordToSignedWord SRC[63..32]; |
PACKSSWB __m64 _mm_packs_pi16(__m64 m1, __m64 m2)
PACKSSDW __m64 _mm_packs_pi32 (__m64 m1, __m64 m2) |
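The signed pack operations above can be modeled in portable C as follows. This is a sketch of the documented semantics only, not of the hardware or of the intrinsics: the names model_packsswb and model_packssdw are hypothetical, and a 64-bit MMX register is represented as a uint64_t.

```c
#include <stdint.h>

/* Clamp a signed word to the signed-byte range; returns the byte's bit pattern. */
static uint8_t sat_s16_to_s8(int16_t w)
{
    if (w > INT8_MAX) return 0x7F;   /* overflow  */
    if (w < INT8_MIN) return 0x80;   /* underflow */
    return (uint8_t)w;
}

/* Clamp a signed doubleword to the signed-word range; returns the word's bit pattern. */
static uint16_t sat_s32_to_s16(int32_t d)
{
    if (d > INT16_MAX) return 0x7FFF;
    if (d < INT16_MIN) return 0x8000;
    return (uint16_t)d;
}

/* Model of PACKSSWB: destination words fill the low four bytes of the
 * result, source words fill the high four bytes.
 * The unsigned-to-signed casts assume a two's-complement implementation. */
uint64_t model_packsswb(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        int16_t d = (int16_t)(uint16_t)(dest >> (16 * i));
        int16_t s = (int16_t)(uint16_t)(src >> (16 * i));
        result |= (uint64_t)sat_s16_to_s8(d) << (8 * i);
        result |= (uint64_t)sat_s16_to_s8(s) << (8 * (i + 4));
    }
    return result;
}

/* Model of PACKSSDW: destination doublewords fill the low two words of the
 * result, source doublewords fill the high two words. */
uint64_t model_packssdw(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 2; i++) {
        int32_t d = (int32_t)(uint32_t)(dest >> (32 * i));
        int32_t s = (int32_t)(uint32_t)(src >> (32 * i));
        result |= (uint64_t)sat_s32_to_s16(d) << (16 * i);
        result |= (uint64_t)sat_s32_to_s16(s) << (16 * (i + 2));
    }
    return result;
}
```

For example, a destination word of 0x7FFF packs to the byte 0x7F, and a word of 0x8000 packs to 0x80, while a word such as 0xFFFF (that is, -1) is in range and packs to 0xFF unchanged.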
PACKUSWB mm, mm/m64 |
The PACKUSWB (Pack with Unsigned
Saturation) instruction packs and saturates four signed words of the
source operand and four signed words of the destination operand into eight
unsigned bytes stored in the destination operand. If the signed value
of a word lies outside the range of an unsigned byte, it is saturated:
to 0xFF on overflow and to 0x00 on underflow (that is, for any negative
word).
 |
PACKUSWB instruction with 64-bit operands:
DEST[7..0] ← SaturateSignedWordToUnsignedByte DEST[15..0];
DEST[15..8] ← SaturateSignedWordToUnsignedByte DEST[31..16];
DEST[23..16] ← SaturateSignedWordToUnsignedByte DEST[47..32];
DEST[31..24] ← SaturateSignedWordToUnsignedByte DEST[63..48];
DEST[39..32] ← SaturateSignedWordToUnsignedByte SRC[15..0];
DEST[47..40] ← SaturateSignedWordToUnsignedByte SRC[31..16];
DEST[55..48] ← SaturateSignedWordToUnsignedByte SRC[47..32];
DEST[63..56] ← SaturateSignedWordToUnsignedByte SRC[63..48]; |
PACKUSWB __m64 _mm_packs_pu16(__m64 m1, __m64 m2) |
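A portable C sketch of the unsigned-saturating pack may make the difference from the signed forms clearer. As before, the name model_packuswb is hypothetical and a 64-bit register is represented as a uint64_t; the inputs are interpreted as signed words even though the outputs are unsigned bytes.

```c
#include <stdint.h>

/* Clamp a signed word to the unsigned-byte range [0, 255]. */
static uint8_t sat_s16_to_u8(int16_t w)
{
    if (w > 255) return 0xFF;   /* overflow                  */
    if (w < 0)   return 0x00;   /* any negative word -> zero */
    return (uint8_t)w;
}

/* Model of PACKUSWB: destination words fill the low four bytes of the
 * result, source words fill the high four bytes.
 * The unsigned-to-signed cast assumes a two's-complement implementation. */
uint64_t model_packuswb(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        int16_t d = (int16_t)(uint16_t)(dest >> (16 * i));
        int16_t s = (int16_t)(uint16_t)(src >> (16 * i));
        result |= (uint64_t)sat_s16_to_u8(d) << (8 * i);
        result |= (uint64_t)sat_s16_to_u8(s) << (8 * (i + 4));
    }
    return result;
}
```

Note that a word of 0x8000 or 0xFFFF is negative when read as signed, so it packs to 0x00 rather than to 0xFF.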
PUNPCKHBW mm, mm/m64
PUNPCKHWD mm, mm/m64
PUNPCKHDQ mm, mm/m64 |
The PUNPCKH (Unpack High Packed Data)
instructions unpack and interleave the high-order data elements of the
destination and source operands into the destination operand, ignoring
the low-order data elements. If the source operand is all zeros, the result
is a zero extension of the high-order elements of the destination operand.
PUNPCKH supports packed byte (PUNPCKHBW),
packed word (PUNPCKHWD) and packed doubleword (PUNPCKHDQ) source data types.
 |
PUNPCKHBW instruction with 64-bit operands:
DEST[7..0] ← DEST[39..32];
DEST[15..8] ← SRC[39..32];
DEST[23..16] ← DEST[47..40];
DEST[31..24] ← SRC[47..40];
DEST[39..32] ← DEST[55..48];
DEST[47..40] ← SRC[55..48];
DEST[55..48] ← DEST[63..56];
DEST[63..56] ← SRC[63..56];
PUNPCKHWD instruction with 64-bit operands:
DEST[15..0] ← DEST[47..32];
DEST[31..16] ← SRC[47..32];
DEST[47..32] ← DEST[63..48];
DEST[63..48] ← SRC[63..48];
PUNPCKHDQ instruction with 64-bit operands:
DEST[31..0] ← DEST[63..32];
DEST[63..32] ← SRC[63..32]; |
PUNPCKHBW __m64 _mm_unpackhi_pi8(__m64 m1, __m64 m2)
PUNPCKHWD __m64 _mm_unpackhi_pi16(__m64 m1, __m64 m2)
PUNPCKHDQ __m64 _mm_unpackhi_pi32(__m64 m1, __m64 m2) |
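The three high-unpack forms differ only in element width, so they can be modeled with one helper. The following portable C sketch follows the pseudocode above; the names interleave and model_punpckh* are hypothetical, and a 64-bit register is represented as a uint64_t.

```c
#include <stdint.h>

/* Interleave elements of width `bits` (8, 16 or 32), taking 32/bits elements
 * from each operand starting at element index `first`. Destination elements
 * land in even result positions, source elements in odd positions. */
static uint64_t interleave(uint64_t dest, uint64_t src, int bits, int first)
{
    uint64_t mask = (1ull << bits) - 1;
    int count = 32 / bits;               /* elements taken from each operand */
    uint64_t result = 0;
    for (int i = 0; i < count; i++) {
        uint64_t d = (dest >> (bits * (first + i))) & mask;
        uint64_t s = (src >> (bits * (first + i))) & mask;
        result |= d << (bits * (2 * i));
        result |= s << (bits * (2 * i + 1));
    }
    return result;
}

/* High-order halves start at element 4 (bytes), 2 (words) or 1 (doublewords). */
uint64_t model_punpckhbw(uint64_t dest, uint64_t src) { return interleave(dest, src, 8, 4); }
uint64_t model_punpckhwd(uint64_t dest, uint64_t src) { return interleave(dest, src, 16, 2); }
uint64_t model_punpckhdq(uint64_t dest, uint64_t src) { return interleave(dest, src, 32, 1); }
```

Passing zero as the source shows the zero-extension case: model_punpckhbw(0x0807060504030201, 0) yields 0x0008000700060005, the four high bytes widened to words.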
PUNPCKLBW mm, mm/m32
PUNPCKLWD mm, mm/m32
PUNPCKLDQ mm, mm/m32 |
The PUNPCKL (Unpack Low Packed Data)
instructions unpack and interleave the low-order data elements of the destination
and source operands into the destination operand. When unpacking from a
memory operand, only 32 bits are accessed. If the source operand is all
zeros, the result is a zero extension of the low-order elements
of the destination operand. PUNPCKL supports packed byte (PUNPCKLBW), packed
word (PUNPCKLWD) and packed doubleword (PUNPCKLDQ) source data types.
 |
PUNPCKLBW instruction with 64-bit operands:
DEST[63..56] ← SRC[31..24];
DEST[55..48] ← DEST[31..24];
DEST[47..40] ← SRC[23..16];
DEST[39..32] ← DEST[23..16];
DEST[31..24] ← SRC[15..8];
DEST[23..16] ← DEST[15..8];
DEST[15..8] ← SRC[7..0];
DEST[7..0] ← DEST[7..0];
PUNPCKLWD instruction with 64-bit operands:
DEST[63..48] ← SRC[31..16];
DEST[47..32] ← DEST[31..16];
DEST[31..16] ← SRC[15..0];
DEST[15..0] ← DEST[15..0];
PUNPCKLDQ instruction with 64-bit operands:
DEST[63..32] ← SRC[31..0];
DEST[31..0] ← DEST[31..0]; |
PUNPCKLBW __m64 _mm_unpacklo_pi8 (__m64 m1, __m64 m2)
PUNPCKLWD __m64 _mm_unpacklo_pi16 (__m64 m1, __m64 m2)
PUNPCKLDQ __m64 _mm_unpacklo_pi32 (__m64 m1, __m64 m2) |
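The low-unpack forms can be modeled in the same way, and together with a saturating pack they give the widen, compute, narrow pattern described at the start of this section. The portable C sketch below is illustrative only: interleave_low, the model_punpckl* functions and add_bias_u8x4 are hypothetical names, a 64-bit register is represented as a uint64_t, and the final clamp stands in for an unsigned-saturating pack of the low four words.

```c
#include <stdint.h>

/* Interleave the low-order elements of width `bits` (8, 16 or 32):
 * destination elements go to even positions, source elements to odd ones. */
static uint64_t interleave_low(uint64_t dest, uint64_t src, int bits)
{
    uint64_t mask = (1ull << bits) - 1;
    uint64_t result = 0;
    for (int i = 0; i < 32 / bits; i++) {
        result |= ((dest >> (bits * i)) & mask) << (bits * (2 * i));
        result |= ((src >> (bits * i)) & mask) << (bits * (2 * i + 1));
    }
    return result;
}

uint64_t model_punpcklbw(uint64_t dest, uint64_t src) { return interleave_low(dest, src, 8); }
uint64_t model_punpcklwd(uint64_t dest, uint64_t src) { return interleave_low(dest, src, 16); }
uint64_t model_punpckldq(uint64_t dest, uint64_t src) { return interleave_low(dest, src, 32); }

/* Hypothetical example: add a signed bias to four unsigned bytes.
 * Step 1 widens the bytes to words by unpacking against zero.
 * Step 2 does the arithmetic at word width, where it cannot overflow
 *        (byte value 0..255 plus a bias of -128..127 fits a signed word).
 * Step 3 narrows back to bytes with unsigned saturation. */
uint32_t add_bias_u8x4(uint32_t pixels, int8_t bias)
{
    uint64_t words = model_punpcklbw(pixels, 0);   /* zero-extend to words */
    uint32_t packed = 0;
    for (int i = 0; i < 4; i++) {
        int v = (int)((words >> (16 * i)) & 0xFFFF) + bias;
        if (v > 255) v = 255;                      /* saturate on overflow  */
        if (v < 0)   v = 0;                        /* saturate on underflow */
        packed |= (uint32_t)v << (8 * i);
    }
    return packed;
}
```

Doing the addition at byte width would wrap around for bright pixels; widening first lets the saturating narrow step clamp the result instead.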
|
|