ZFXEngine development notes: the SSE version of ZFXVector

Source: Internet
Author: User
Tags: intel, pentium
SSE Introduction

When I studied the techniques of the 3D game programming masters, I used a technique called "single instruction, multiple data" (SIMD) to write a 3D math library. With it, operations we use all the time, such as vector arithmetic and matrix transformations, can be sped up considerably, because one instruction processes several values at once (an SSE instruction, for example, operates on four single-precision floats). We use the same technique here while learning 3D engine development. SIMD is the name of a technique, not a specific tool. To implement it, CPU vendors have introduced different instruction sets, such as MMX, 3DNow!, SSE, SSE2, SSE3, and so on. My computer has an Intel processor that supports MMX, SSE, and SSE2, so I use SSE instructions here. If you are using an AMD processor, you don't have to worry: 3DNow! is AMD's own separate instruction set, but AMD processors from the Athlon XP onward also implement SSE, so they can still run the code here.


SSE support

To speed up vector computation we use SSE instructions, so we first need to know whether both the CPU and the operating system support SSE. One way to find out is to query the CPU feature flags with the following assembly code:

    mov eax, 1
    cpuid
    mov flag, edx

This saves the CPU feature flags in flag. By testing individual bits of that value (in EDX, bit 23 indicates MMX, bit 25 SSE, and bit 26 SSE2), we can determine whether the computer supports the instruction sets mentioned above. However, I do not use this method here. If you want to use it, you can find the relevant details online.
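As a rough illustration of parsing that value, here is a minimal sketch. The names `CpuFeatures`, `decodeFeatureFlags`, `readFeatureFlags`, and `HAVE_CPUID_HELPER` are hypothetical and do not come from ZFXEngine. It assumes GCC or Clang on x86, whose compiler-provided `<cpuid.h>` header (unrelated to the MSDN sample's "cpuid.h" used later) offers `__get_cpuid`. It checks CPU support only; whether the operating system saves the SSE registers is a separate question that this sketch does not answer.

```cpp
#include <cstdint>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>            // GCC/Clang helper header providing __get_cpuid
#define HAVE_CPUID_HELPER 1   // hypothetical macro, used only in this sketch
#endif

// Hypothetical result type for this sketch.
struct CpuFeatures {
    bool mmx;
    bool sse;
    bool sse2;
};

// Decode the EDX value returned by CPUID with EAX = 1.
CpuFeatures decodeFeatureFlags(std::uint32_t edx) {
    CpuFeatures f;
    f.mmx  = ((edx >> 23) & 1u) != 0;  // bit 23: MMX
    f.sse  = ((edx >> 25) & 1u) != 0;  // bit 25: SSE
    f.sse2 = ((edx >> 26) & 1u) != 0;  // bit 26: SSE2
    return f;
}

// Execute CPUID leaf 1 and store EDX in edxOut.
// Returns false when the helper is unavailable or the leaf is unsupported.
bool readFeatureFlags(std::uint32_t& edxOut) {
#ifdef HAVE_CPUID_HELPER
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
        return false;
    edxOut = edx;
    return true;
#else
    (void)edxOut;
    return false;
#endif
}
```

A caller would pass the value obtained from `readFeatureFlags` to `decodeFeatureFlags` and fall back to the scalar code path when `sse` is false.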

Instead, I use the CPUID sample code published on MSDN to perform this check.

To use it, just download, compile, and run the sample; it tells you whether your computer supports SSE.

Building on that sample, I wrote the following code for ZFXEngine to determine whether the user's computer supports SSE:

    #include "ZFX3D.h"
    #include <iostream>
    #include <stdio.h>
    #include <fstream>
    #include "cpuid.h"

    using namespace ZFXEngine;
    using namespace std;

    /**
    * Global variable
    */
    bool g_bSSE = false;

    // For every feature selected in mask, log whether it is available
    void expand(int avail, int mask, ofstream* pOut)
    {
        char buffer[64];
        if (mask & _CPU_FEATURE_MMX) {
            sprintf(buffer, "\t%s\t_CPU_FEATURE_MMX\n",
                    avail & _CPU_FEATURE_MMX ? "yes" : "no");
            (*pOut) << buffer;
        }
        if (mask & _CPU_FEATURE_SSE) {
            sprintf(buffer, "\t%s\t_CPU_FEATURE_SSE\n",
                    avail & _CPU_FEATURE_SSE ? "yes" : "no");
            (*pOut) << buffer;
        }
        if (mask & _CPU_FEATURE_SSE2) {
            sprintf(buffer, "\t%s\t_CPU_FEATURE_SSE2\n",
                    avail & _CPU_FEATURE_SSE2 ? "yes" : "no");
            (*pOut) << buffer;
        }
        if (mask & _CPU_FEATURE_3DNOW) {
            sprintf(buffer, "\t%s\t_CPU_FEATURE_3DNOW\n",
                    avail & _CPU_FEATURE_3DNOW ? "yes" : "no");
            (*pOut) << buffer;
        }
    }

    //---------------------------------------------------------------------------------------------
    // Check whether both the CPU and the operating system support SSE
    bool ZFX3DInitCPU(void)
    {
        _p_info info;
        _cpuid(&info);

        ofstream out;
        out.open("ZFXEngine_CPU_Info.log");

        char buffer[64];
        sprintf(buffer, "v_name:\t\t%s\n", info.v_name);
        out << buffer;
        sprintf(buffer, "model:\t\t%s\n", info.model_name);
        out << buffer;
        sprintf(buffer, "family:\t\t%d\n", info.family);
        out << buffer;
        sprintf(buffer, "model:\t\t%d\n", info.model);
        out << buffer;
        sprintf(buffer, "stepping:\t%d\n", info.stepping);
        out << buffer;
        sprintf(buffer, "feature:\t%08x\n", info.feature);
        out << buffer;
        expand(info.feature, info.checks, &out);
        sprintf(buffer, "os_support:\t%08x\n", info.os_support);
        out << buffer;
        expand(info.os_support, info.checks, &out);
        sprintf(buffer, "checks:\t\t%08x\n", info.checks);
        out << buffer;

        if ((info.feature & _CPU_FEATURE_SSE) && (info.os_support & _CPU_FEATURE_SSE))
            g_bSSE = true;
        else
            g_bSSE = false;

        out.close();
        return g_bSSE;
    }// end for ZFX3DInitCPU

You only need to call ZFX3DInitCPU() to check whether SSE is supported. The function also writes the user's CPU information to the log file ZFXEngine_CPU_Info.log, as shown below:

    v_name:         GenuineIntel
    model:          INTEL Pentium-III
    family:         6
    model:          10
    stepping:       9
    feature:        00000007
        yes     _CPU_FEATURE_MMX
        yes     _CPU_FEATURE_SSE
        yes     _CPU_FEATURE_SSE2
        no      _CPU_FEATURE_3DNOW
    os_support:     00000007
        yes     _CPU_FEATURE_MMX
        yes     _CPU_FEATURE_SSE
        yes     _CPU_FEATURE_SSE2
        no      _CPU_FEATURE_3DNOW
    checks:         0000000f

ZFXVector implementation

The header file of ZFXVector is as follows:

    /**
    * Define ZFXVector
    */
    class _declspec(dllexport) ZFXVector
    {
    public:
        float x, y, z, w;

    public:
        ZFXVector(void) { x = 0, y = 0, z = 0, w = 1.0f; }
        ZFXVector(float _x, float _y, float _z) : x(_x), y(_y), z(_z), w(1.0f) {}
        ~ZFXVector() {}

    public:
        inline void set(float _x, float _y, float _z, float _w = 1.0f);
        inline float getLength(void);
        inline float getSqrtLength(void) const;   // squared length; no square root is taken
        inline void negate(void);
        inline void normalize(void);
        inline float angleWith(ZFXVector& v);
        inline void difference(const ZFXVector& u, const ZFXVector& v);

        void operator +=(const ZFXVector& v);
        void operator -=(const ZFXVector& v);
        void operator *=(float f);
        void operator /=(float f);
        float operator *(const ZFXVector& v) const;       // dot product
        ZFXVector operator *(float f) const;
        ZFXVector operator *(const ZFXMatrix& m) const;
        ZFXVector operator +(const ZFXVector& v) const;
        ZFXVector operator -(const ZFXVector& v) const;

        inline void cross(const ZFXVector& u, const ZFXVector& v);
    }; // end for ZFXVector

The following is the implementation file of this class:

    #include "ZFX3D.h"
    #include <cmath>

    using namespace ZFXEngine;

    extern bool g_bSSE;

    // Note: the _asm blocks below use MSVC inline assembly,
    // which is only available in 32-bit x86 builds.

    float _fabs(float f)
    {
        if (f < 0.0f)
            return -f;
        return f;
    }// end for _fabs

    inline void ZFXVector::set(float _x, float _y, float _z, float _w)
    {
        x = _x; y = _y; z = _z; w = _w;
    }// end for set

    void ZFXVector::operator+=(const ZFXVector& v)
    {
        x += v.x; y += v.y; z += v.z;
    }// end for +=

    ZFXVector ZFXVector::operator+(const ZFXVector& v) const
    {
        return ZFXVector(x + v.x, y + v.y, z + v.z);
    }// end for +

    void ZFXVector::operator -=(const ZFXVector& v)
    {
        x -= v.x; y -= v.y; z -= v.z;
    }// end for -=

    ZFXVector ZFXVector::operator -(const ZFXVector& v) const
    {
        return ZFXVector(x - v.x, y - v.y, z - v.z);
    }// end for -

    void ZFXVector::operator *=(float f)
    {
        x *= f; y *= f; z *= f;
    }// end for *=

    void ZFXVector::operator /=(float f)
    {
        x /= f; y /= f; z /= f;
    }// end for /=

    ZFXVector ZFXVector::operator *(float f) const
    {
        return ZFXVector(x * f, y * f, z * f);
    }// end for *

    float ZFXVector::operator*(const ZFXVector& v) const
    {
        return (x * v.x + y * v.y + z * v.z);
    }// end for *

    // Returns the squared length
    inline float ZFXVector::getSqrtLength(void) const
    {
        return (x * x + y * y + z * z);
    }// end for getSqrtLength

    inline void ZFXVector::negate(void)
    {
        x = -x; y = -y; z = -z;
    }// end for negate

    inline void ZFXVector::difference(const ZFXVector& v1, const ZFXVector& v2)
    {
        x = v2.x - v1.x;
        y = v2.y - v1.y;
        z = v2.z - v1.z;
        w = 1.0f;
    }// end for difference

    inline float ZFXVector::angleWith(ZFXVector& v)
    {
        return (float)acos(((*this) * v) / (this->getLength() * v.getLength()));
    }// end for angleWith

    inline float ZFXVector::getLength(void)
    {
        float f = 0.0f;
        if (!g_bSSE)
        {
            f = (float)sqrt(x*x + y*y + z*z);
        }
        else
        {
            float *pf = &f;
            w = 0.0f;                       // keep w out of the sum
            _asm
            {
                mov    ecx, pf              ; point to the result
                mov    esi, this            ; copy the pointer of this to esi
                movups xmm0, [esi]          ; copy the this vector to xmm0
                mulps  xmm0, xmm0           ; square every component
                movaps xmm1, xmm0           ; copy result to xmm1
                shufps xmm1, xmm1, 4Eh      ; lanes x..w = z, w, x, y
                addps  xmm0, xmm1           ; pairwise partial sums
                movaps xmm1, xmm0           ; copy xmm0 to xmm1
                shufps xmm1, xmm1, 11h      ; lanes x..w = y, x, y, x
                addps  xmm0, xmm1           ; every lane holds the sum of squares
                sqrtss xmm0, xmm0           ; sqrt of the first element
                movss  [ecx], xmm0          ; copy the first element to the result
            }// end for _asm
            w = 1.0f;
        }
        return f;
    }// end for getLength

    inline void ZFXVector::normalize(void)
    {
        if (x == 0 && y == 0 && z == 0)
            return;

        if (!g_bSSE)
        {
            float f = (float)sqrt(x*x + y*y + z*z);
            x /= f; y /= f; z /= f;
        }
        else
        {
            w = 0.0f;
            _asm
            {
                mov     esi, this           ; copy the pointer of this to esi
                movups  xmm0, [esi]         ; copy the this vector to xmm0
                movaps  xmm2, xmm0          ; keep the original components
                mulps   xmm0, xmm0          ; square every component
                movaps  xmm1, xmm0          ; copy result to xmm1
                shufps  xmm1, xmm1, 4Eh     ; lanes x..w = z, w, x, y
                addps   xmm0, xmm1
                movaps  xmm1, xmm0          ; copy xmm0 to xmm1
                shufps  xmm1, xmm1, 11h     ; lanes x..w = y, x, y, x
                addps   xmm0, xmm1          ; every lane holds the sum of squares
                rsqrtps xmm0, xmm0          ; approximate reciprocal square root
                mulps   xmm2, xmm0          ; multiply by the reciprocal square root
                movups  [esi], xmm2
            }// end for _asm
            w = 1.0f;
        }// end if...else...
    }// end for normalize

    inline void ZFXVector::cross(const ZFXVector& v, const ZFXVector& u)
    {
        if (!g_bSSE)
        {
            x = v.y * u.z - v.z * u.y;
            y = v.z * u.x - v.x * u.z;
            z = v.x * u.y - v.y * u.x;
            w = 1.0f;
        }
        else
        {
            _asm
            {
                mov    esi, v
                mov    edi, u
                movups xmm0, [esi]
                movups xmm1, [edi]
                movaps xmm2, xmm0
                movaps xmm3, xmm1
                shufps xmm0, xmm0, 0xC9     ; lanes x..w = v.y, v.z, v.x, v.w
                shufps xmm1, xmm1, 0xD2     ; lanes x..w = u.z, u.x, u.y, u.w
                mulps  xmm0, xmm1
                shufps xmm2, xmm2, 0xD2     ; lanes x..w = v.z, v.x, v.y, v.w
                shufps xmm3, xmm3, 0xC9     ; lanes x..w = u.y, u.z, u.x, u.w
                mulps  xmm2, xmm3
                subps  xmm0, xmm2
                mov    esi, this
                movups [esi], xmm0
            }// end for _asm
            w = 1.0f;
        }// end if...else...
    }// end for cross

    ZFXVector ZFXVector::operator*(const ZFXMatrix& m) const
    {
        ZFXVector vcResult;
        if (!g_bSSE)
        {
            vcResult.x = x * m._11 + y * m._21 + z * m._31 + w * m._41;
            vcResult.y = x * m._12 + y * m._22 + z * m._32 + w * m._42;
            vcResult.z = x * m._13 + y * m._23 + z * m._33 + w * m._43;
            vcResult.w = x * m._14 + y * m._24 + z * m._34 + w * m._44;
        }
        else
        {
            float *ptrRet = (float*)&vcResult;
            // Copy the four matrix rows into vectors
            ZFXVector s; s.set(m._11, m._12, m._13, m._14);
            ZFXVector t; t.set(m._21, m._22, m._23, m._24);
            ZFXVector u; u.set(m._31, m._32, m._33, m._34);
            ZFXVector v; v.set(m._41, m._42, m._43, m._44);
            float* ps = (float*)&s;
            float* pt = (float*)&t;
            float* pu = (float*)&u;
            float* pv = (float*)&v;
            __asm
            {
                mov    esi, this
                movups xmm0, [esi]
                movaps xmm1, xmm0
                movaps xmm2, xmm0
                movaps xmm3, xmm0
                shufps xmm0, xmm2, 0x00     ; broadcast x
                shufps xmm1, xmm2, 0x55     ; broadcast y
                shufps xmm2, xmm2, 0xAA     ; broadcast z
                shufps xmm3, xmm3, 0xFF     ; broadcast w
                mov    edx, ps
                movups xmm4, [edx]
                mov    edx, pt
                movups xmm5, [edx]
                mov    edx, pu
                movups xmm6, [edx]
                mov    edx, pv
                movups xmm7, [edx]
                mulps  xmm0, xmm4
                mulps  xmm1, xmm5
                mulps  xmm2, xmm6
                mulps  xmm3, xmm7
                addps  xmm0, xmm1
                addps  xmm0, xmm2
                addps  xmm0, xmm3
                mov    edx, ptrRet
                movups [edx], xmm0
            }// end for _asm
        }// end if...else...

        // homogenize
        if (vcResult.w != 1.0f && vcResult.w != 0.0f)
        {
            vcResult.x /= vcResult.w;
            vcResult.y /= vcResult.w;
            vcResult.z /= vcResult.w;
            vcResult.w = 1.0f;
        }
        return vcResult;
    }// end for *
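The shuffle constants in getLength and normalize (4Eh and 11h) are easier to follow with a scalar model of shufps. The sketch below is not part of ZFXEngine: `Vec4`, `shufps`, `addps`, and `horizontalSum` are hypothetical names that emulate the instructions in plain C++ so that the lane movement can be traced by hand. It assumes lane 0 is the lowest element (x).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<float, 4>;  // lane 0 is the lowest element (x)

// Scalar model of "shufps dst, src, imm8": the two low result lanes are
// picked from dst and the two high lanes from src, each selected by one
// 2-bit field of imm8 (lowest field first).
Vec4 shufps(const Vec4& dst, const Vec4& src, std::uint8_t imm8) {
    Vec4 r;
    r[0] = dst[imm8 & 3];
    r[1] = dst[(imm8 >> 2) & 3];
    r[2] = src[(imm8 >> 4) & 3];
    r[3] = src[(imm8 >> 6) & 3];
    return r;
}

// Scalar model of addps: lane-wise addition.
Vec4 addps(const Vec4& a, const Vec4& b) {
    Vec4 r;
    for (std::size_t i = 0; i < 4; ++i)
        r[i] = a[i] + b[i];
    return r;
}

// Mirrors the two shuffle/add steps used in getLength and normalize:
// afterwards every lane holds v[0] + v[1] + v[2] + v[3].
Vec4 horizontalSum(Vec4 v) {
    v = addps(v, shufps(v, v, 0x4E));  // add (v2, v3, v0, v1)
    v = addps(v, shufps(v, v, 0x11));  // add (v1, v0, v1, v0) of the partial sums
    return v;
}
```

For the squared components (1, 2, 3, 0), the first step yields (4, 2, 4, 2) and the second yields 6 in every lane. This is why the assembly sets w to 0 before loading the vector: otherwise w*w would be included in the sum.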

Pay attention to the following points:

1. Not every operation needs to be implemented with SSE. Moving data into the dedicated SSE registers and moving the SSE results back out has a relatively high cost. If the computation itself costs less than the data movement, as with a single addition, SSE is not a good fit. Reserve SSE for the somewhat more complex functions that are worth optimizing.


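2. For SSE instructions, you need to know which instructions require aligned data (movaps with a memory operand, for example) and which accept unaligned data (movups), which instructions operate on packed data (the ps suffix) and which on a single scalar (the ss suffix), and what the shufps shuffle instruction does.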


3. When writing these functions, you can follow the mathematical definitions of the vector operations directly. However, since computer arithmetic has limited precision anyway, we often use an approximate algorithm for functions such as normalize, which uses rsqrtps and therefore gets only an approximate reciprocal square root. The approximation makes the function faster, and as long as the error stays within the accuracy we need, it can be ignored for graphics.
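To put a number on the accuracy trade-off in point 3: Intel documents the rsqrtps estimate as having a relative error of at most 1.5 × 2^-12. If that turns out to be too coarse for a given use, one common remedy (not something the code above does) is a single Newton-Raphson step on the estimate. The sketch below is plain C++ with hypothetical function names and takes the estimate as an input, so it does not depend on SSE.

```cpp
#include <cmath>

// One Newton-Raphson step for y = 1/sqrt(a), starting from an estimate:
//   y' = y * (1.5 - 0.5 * a * y * y)
// Each step roughly squares the relative error of the estimate.
float refineRsqrt(float a, float estimate) {
    return estimate * (1.5f - 0.5f * a * estimate * estimate);
}

// Relative error of approx with respect to a nonzero exact value.
float relativeError(float approx, float exact) {
    return std::fabs(approx - exact) / std::fabs(exact);
}
```

For example, with a = 4 the exact result is 0.5. Starting from the deliberately coarse estimate 0.5002 (relative error about 4 × 10^-4), one step brings the error down to the order of 10^-7, which is close to the limit of single precision.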


That's all for today's notes.
