CPU SSE instruction set C++ code

Source: Internet
Author: User

Only VS2002 and later versions support the SSE intrinsic function library.

Currently, most CPUs (Intel and AMD) on the market support the SSE instruction set.

 

To use the SSE intrinsic functions, include the following header file:

#include <xmmintrin.h>

 

The SSE instructions themselves are not described in detail here. We only discuss their batch (SIMD) computation capability.

A batch operation processes four 32-bit values at a time (single-precision floats, for the functions in xmmintrin.h). Each operation handles 128 bits of data, that is, 16 bytes.

Therefore, when processing large arrays, the SSE instruction set has an obvious advantage over ordinary scalar instructions, which handle one value at a time.
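
As a rough illustration of processing an array four floats at a time, here is a minimal sketch. The function name add_arrays_sse is made up for this example. It uses the unaligned load/store intrinsics so the arrays need no special alignment, and falls back to a scalar loop for any leftover elements.

```cpp
#include <xmmintrin.h>
#include <cstddef>

// Hypothetical helper (not from the original article):
// dst[i] = a[i] + b[i] for i in [0, n).
void add_arrays_sse(const float* a, const float* b, float* dst, std::size_t n)
{
    std::size_t i = 0;

    // Main loop: four floats (128 bits) per iteration.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }

    // Scalar tail for the remaining 0..3 elements.
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Whether this is actually faster than a plain loop depends on the compiler and the data, since modern compilers often vectorize such loops on their own.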

 

The xmmintrin.h header file provides, among others:

Arithmetic functions such as addition, subtraction, multiplication, division, square root, reciprocal, maximum, and minimum; logical functions such as AND, OR, and AND-NOT;

as well as various comparison and conversion functions. Please refer to the header file itself for details.
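
To give a feel for the maximum/minimum, comparison, and logical functions, here is a small sketch. The helper names clamp4 and keep_greater4 are invented for illustration; the intrinsics they call come from xmmintrin.h.

```cpp
#include <xmmintrin.h>

// Hypothetical helper: clamp each of four floats to the range [lo, hi].
void clamp4(const float* in, float* out, float lo, float hi)
{
    __m128 v = _mm_loadu_ps(in);
    v = _mm_max_ps(v, _mm_set1_ps(lo));   // raise values below lo
    v = _mm_min_ps(v, _mm_set1_ps(hi));   // lower values above hi
    _mm_storeu_ps(out, v);
}

// Hypothetical helper: keep elements greater than threshold, zero the rest.
void keep_greater4(const float* in, float* out, float threshold)
{
    __m128 v = _mm_loadu_ps(in);
    // The comparison yields all-one bits in each element where v > threshold,
    // and all-zero bits elsewhere.
    __m128 mask = _mm_cmpgt_ps(v, _mm_set1_ps(threshold));
    // AND with the mask keeps the selected elements and clears the others.
    _mm_storeu_ps(out, _mm_and_ps(v, mask));
}
```

The comparison functions do not return booleans; they return a bit mask per element, which is why they are usually combined with the logical functions as shown.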

 

The data type that the SSE intrinsics operate on is __m128, which is defined in xmmintrin.h.
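
A brief sketch of moving data out of a __m128 value follows. The helper name horizontal_sum is an assumption for this example. It stores the four elements into a 16-byte-aligned temporary array, which is what the aligned store _mm_store_ps requires.

```cpp
#include <xmmintrin.h>

// A __m128 holds exactly 128 bits: four 32-bit floats.
static_assert(sizeof(__m128) == 16, "__m128 is expected to be 16 bytes");

// Hypothetical helper: add up the four floats held in a __m128.
float horizontal_sum(__m128 v)
{
    alignas(16) float tmp[4];   // _mm_store_ps needs a 16-byte-aligned address
    _mm_store_ps(tmp, v);       // tmp[0] receives element 0, tmp[3] element 3
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```

When the destination address is not known to be aligned, _mm_storeu_ps is the safer choice.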

 

For example, the D3DXVec3Cross function commonly used in DirectX can be rewritten this way (efficiency is not considered here; the code only demonstrates the SSE instructions).

The DirectX D3DXVec3Cross function is as follows:

D3DXINLINE D3DXVECTOR3* D3DXVec3Cross
    (D3DXVECTOR3 *pOut, CONST D3DXVECTOR3 *pV1, CONST D3DXVECTOR3 *pV2)
{
    D3DXVECTOR3 v;

#ifdef D3DX_DEBUG
    if (!pOut || !pV1 || !pV2)
        return NULL;
#endif

    v.x = pV1->y * pV2->z - pV1->z * pV2->y;
    v.y = pV1->z * pV2->x - pV1->x * pV2->z;
    v.z = pV1->x * pV2->y - pV1->y * pV2->x;

    *pOut = v;
    return pOut;
}

Rewritten with SSE intrinsics, the code looks like this (CPU support is required):

inline D3DXVECTOR3* DXVec3Cross_SEE(D3DXVECTOR3* vout, D3DXVECTOR3* a, D3DXVECTOR3* b)
{
    // _mm_set_ps takes its arguments from the highest element (3) down to the
    // lowest (0), so the first argument lands in element 3 and the trailing 0
    // in element 0.
    __m128 a1 = _mm_set_ps(a->y, a->z, a->x, 0);
    __m128 b1 = _mm_set_ps(b->z, b->x, b->y, 0);
    __m128 o1 = _mm_mul_ps(a1, b1);   // (a.y*b.z, a.z*b.x, a.x*b.y, 0)

    // Second set of products, with the components rotated the other way.
    a1 = _mm_set_ps(a->z, a->x, a->y, 0);
    b1 = _mm_set_ps(b->y, b->z, b->x, 0);

    __m128 oo = _mm_sub_ps(o1, _mm_mul_ps(a1, b1));

    // m128_f32 is specific to the Microsoft compiler.
    // Element 3 holds x, element 2 holds y, element 1 holds z.
    vout->x = oo.m128_f32[3];
    vout->y = oo.m128_f32[2];
    vout->z = oo.m128_f32[1];

    return vout;
}

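_mm_set_ps initializes a __m128 variable from four floats. The arguments are listed from the highest element down to the lowest, so the last argument lands in element 0.

_mm_mul_ps multiplies two __m128 values element by element and returns the products as a new __m128.

_mm_sub_ps subtracts two __m128 values element by element.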
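
The m128_f32 member used above is specific to the Microsoft compiler. As a hedged alternative, the sketch below does the same cross product using only intrinsics from xmmintrin.h. The Vec3 struct and the name Vec3Cross_SSE are stand-ins invented here so the example compiles without the DirectX headers. It uses _mm_setr_ps, which takes its arguments from element 0 upward, so the element order matches the argument order.

```cpp
#include <xmmintrin.h>

// Hypothetical stand-in for D3DXVECTOR3 so the sketch builds without DirectX.
struct Vec3 { float x, y, z; };

inline Vec3* Vec3Cross_SSE(Vec3* out, const Vec3* a, const Vec3* b)
{
    // _mm_setr_ps: first argument goes to element 0, last to element 3.
    __m128 a_yzx = _mm_setr_ps(a->y, a->z, a->x, 0.0f);
    __m128 b_zxy = _mm_setr_ps(b->z, b->x, b->y, 0.0f);
    __m128 a_zxy = _mm_setr_ps(a->z, a->x, a->y, 0.0f);
    __m128 b_yzx = _mm_setr_ps(b->y, b->z, b->x, 0.0f);

    // Element 0: a.y*b.z - a.z*b.y  (x)
    // Element 1: a.z*b.x - a.x*b.z  (y)
    // Element 2: a.x*b.y - a.y*b.x  (z)
    __m128 r = _mm_sub_ps(_mm_mul_ps(a_yzx, b_zxy),
                          _mm_mul_ps(a_zxy, b_yzx));

    // Copy the result out through memory instead of m128_f32.
    float tmp[4];
    _mm_storeu_ps(tmp, r);
    out->x = tmp[0];
    out->y = tmp[1];
    out->z = tmp[2];
    return out;
}
```

As in the version above, this is meant to show the intrinsics at work rather than to be fast: building the vectors one component at a time costs more than the arithmetic saves.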

 

 

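In fact, it is that simple. Other functions work the same way:

_mm_div_ss divides (the _ss suffix means only the lowest element is computed).

_mm_sqrt_ss takes the square root (again, lowest element only).

... and so on.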
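
To illustrate the _ss (scalar single) variants, here is a minimal sketch; the helper name sqrt_of_ratio is invented for this example. Unlike the _ps functions, these compute only the lowest element and pass the upper three elements of the first operand through unchanged.

```cpp
#include <xmmintrin.h>

// Hypothetical helper: sqrt(num / den), computed on the lowest element only.
float sqrt_of_ratio(float num, float den)
{
    __m128 n = _mm_set_ss(num);    // element 0 = num, elements 1..3 = 0
    __m128 d = _mm_set_ss(den);    // element 0 = den, elements 1..3 = 0
    __m128 q = _mm_div_ss(n, d);   // divides element 0; elements 1..3 copied from n
    __m128 r = _mm_sqrt_ss(q);     // square root of element 0 only
    return _mm_cvtss_f32(r);       // extract element 0 as a float
}
```

For example, sqrt_of_ratio(18.0f, 2.0f) computes the square root of 9.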
