
Division and square root are basic building-block operations in geometry processing. For instance, a perspective transformation divides each x, y, and z coordinate by the perspective coordinate W, and normalization, another common geometry operation, requires computing a reciprocal square root, 1/sqrt(x). To optimize these cases, SSE introduces two families of approximation instructions: RCP (reciprocal) and RSQRT (reciprocal square root). These instructions are implemented via hardware lookup tables and are inherently less precise (12 bits of mantissa) than the full IEEE-compliant DIV and SQRT instructions (24 bits of mantissa), but they are much faster than the full-precision versions. When greater precision is needed, an approximation instruction can be followed by a single Newton-Raphson iteration to achieve almost the same precision as the IEEE instructions (~22 bits of mantissa). For a basic geometry pipeline, these instructions can improve overall performance on the order of 15%.
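The Newton-Raphson refinement mentioned above can be sketched with the packed intrinsics as follows. This is a minimal illustration rather than a reference implementation: the helper names rcp_nr_ps and rsqrt_nr_ps are hypothetical, and the update formulas x1 = x0 * (2 - a * x0) and y1 = 0.5 * y0 * (3 - a * y0 * y0) are the standard Newton-Raphson steps for 1/a and 1/sqrt(a), which the text does not spell out.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: RCPPS estimate refined by one Newton-Raphson step.
   x1 = x0 * (2 - a * x0) */
static inline __m128 rcp_nr_ps(__m128 a)
{
    __m128 x0  = _mm_rcp_ps(a);                 /* ~12-bit estimate of 1/a */
    __m128 two = _mm_set1_ps(2.0f);
    return _mm_mul_ps(x0, _mm_sub_ps(two, _mm_mul_ps(a, x0)));
}

/* Hypothetical helper: RSQRTPS estimate refined by one Newton-Raphson step.
   y1 = 0.5 * y0 * (3 - a * y0 * y0) */
static inline __m128 rsqrt_nr_ps(__m128 a)
{
    __m128 y0    = _mm_rsqrt_ps(a);             /* ~12-bit estimate of 1/sqrt(a) */
    __m128 half  = _mm_set1_ps(0.5f);
    __m128 three = _mm_set1_ps(3.0f);
    __m128 ayy   = _mm_mul_ps(a, _mm_mul_ps(y0, y0));
    return _mm_mul_ps(_mm_mul_ps(half, y0), _mm_sub_ps(three, ayy));
}
```

Note that a zero input makes the initial estimate infinite, so the refinement step yields NaN for that element; code that may see zeros has to handle them separately.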
RCPSS xmm1, xmm2/m32 
Computes an approximate reciprocal of the low single-precision floating-point value in the source operand (second operand) and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged.

DEST[31-0] ← APPROXIMATE(1.0/(SRC[31-0]));
RCPSS __m128 _mm_rcp_ss(__m128 a) 
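As a small illustration of the scalar form, the sketch below applies _mm_rcp_ss to a four-element vector so that the pass-through of the three upper elements is visible. The helper name rcp_low_lane and the array-based interface are assumptions made for this example.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: applies RCPSS to the vector loaded from in[0..3].
   Only element 0 is replaced by its approximate reciprocal; elements 1..3
   are copied through unchanged. */
static inline void rcp_low_lane(const float in[4], float out[4])
{
    __m128 a = _mm_loadu_ps(in);          /* unaligned load of four floats */
    _mm_storeu_ps(out, _mm_rcp_ss(a));    /* low lane: ~1/in[0]; rest as-is */
}
```

For an input of {4, 10, 20, 30}, the low element of the result is an approximation of 0.25 and the remaining elements stay 10, 20, and 30.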
RCPPS xmm1, xmm2/m128 
Performs a SIMD computation of the approximate reciprocals of the four packed single-precision floating-point values in the source operand (second operand) and stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.

DEST[31-0] ← APPROXIMATE(1.0/(SRC[31-0]));
DEST[63-32] ← APPROXIMATE(1.0/(SRC[63-32]));
DEST[95-64] ← APPROXIMATE(1.0/(SRC[95-64]));
DEST[127-96] ← APPROXIMATE(1.0/(SRC[127-96]));
RCPPS __m128 _mm_rcp_ps(__m128 a)
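The perspective divide described in the introduction might look like the following with the packed intrinsic. This is a sketch under stated assumptions: the function name perspective_divide4 and the structure-of-arrays layout (four vertices, one array per coordinate) are hypothetical choices for the example, and the result carries only the approximate precision of RCPPS.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: divides x, y, z of four vertices by their w
   coordinate, using one approximate reciprocal and three multiplies
   instead of three packed divides. */
static inline void perspective_divide4(float x[4], float y[4], float z[4],
                                       const float w[4])
{
    __m128 inv_w = _mm_rcp_ps(_mm_loadu_ps(w));   /* ~1/w for all four vertices */
    _mm_storeu_ps(x, _mm_mul_ps(_mm_loadu_ps(x), inv_w));
    _mm_storeu_ps(y, _mm_mul_ps(_mm_loadu_ps(y), inv_w));
    _mm_storeu_ps(z, _mm_mul_ps(_mm_loadu_ps(z), inv_w));
}
```

Computing the reciprocal once and reusing it for all three coordinates is the point of the design: the cost of the reciprocal is shared across x, y, and z.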
RSQRTSS xmm1, xmm2/m32 
Computes an approximate reciprocal of the square root of the low single-precision floating-point value in the source operand (second operand) and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. The three high-order doublewords of the destination operand remain unchanged.

DEST[31-0] ← APPROXIMATE(1.0/SQRT(SRC[31-0]));
RSQRTSS __m128 _mm_rsqrt_ss(__m128 a) 
RSQRTPS xmm1, xmm2/m128 
Performs a SIMD computation of the approximate reciprocals of the square roots of the four packed single-precision floating-point values in the source operand (second operand) and stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.

DEST[31-0] ← APPROXIMATE(1.0/SQRT(SRC[31-0]));
DEST[63-32] ← APPROXIMATE(1.0/SQRT(SRC[63-32]));
DEST[95-64] ← APPROXIMATE(1.0/SQRT(SRC[95-64]));
DEST[127-96] ← APPROXIMATE(1.0/SQRT(SRC[127-96]));
RSQRTPS __m128 _mm_rsqrt_ps(__m128 a) 
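Vector normalization, the second use case named in the introduction, can be sketched with _mm_rsqrt_ps as shown below. The function name normalize4 and the structure-of-arrays layout are assumptions for illustration, and the output is only as precise as the RSQRTPS estimate.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: normalizes four 3D vectors stored as separate
   x, y, z arrays by scaling each with an approximate 1/sqrt(x*x+y*y+z*z). */
static inline void normalize4(float x[4], float y[4], float z[4])
{
    __m128 vx = _mm_loadu_ps(x);
    __m128 vy = _mm_loadu_ps(y);
    __m128 vz = _mm_loadu_ps(z);

    /* Squared length of each of the four vectors. */
    __m128 len2 = _mm_add_ps(_mm_add_ps(_mm_mul_ps(vx, vx),
                                        _mm_mul_ps(vy, vy)),
                             _mm_mul_ps(vz, vz));

    __m128 inv_len = _mm_rsqrt_ps(len2);   /* ~1/length, no SQRT or DIV */

    _mm_storeu_ps(x, _mm_mul_ps(vx, inv_len));
    _mm_storeu_ps(y, _mm_mul_ps(vy, inv_len));
    _mm_storeu_ps(z, _mm_mul_ps(vz, inv_len));
}
```

A zero-length vector gives an infinite estimate and therefore NaN components after the multiply, so callers that may pass degenerate vectors need to guard against that case.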