A simple wrapper for DirectX Math
About DirectX Math
DirectX Math (DXM) was originally called XNA Math. It is a cross-platform C++ math library fully optimized with SIMD instructions. The current version is 3.03; it supports the x86, x64, and ARM platforms and is meant to replace the math library from the DX 9/10 era.
Why DirectX Math
If you have enough time, energy, and knowledge, and firmly believe that your own math library will be the best, you can ignore this article :). According to the introduction of a technical article on Gamasutra, it is quite hard to do better than DXM. It is also completely cross-platform and consists only of .h and .inl files, so even OpenGL programs can use it. Finally, DXM is very comprehensive: it implements most common 3D computations and even includes a simple collision-detection library.
Correct use of DXM
First, you can define the _XM_NO_INTRINSICS_ macro to disable SIMD acceleration and make DXM generate ordinary scalar instructions. Unless you have to be compatible with very old CPUs, you should not use this macro. Although DXM encapsulates the SIMD instructions, you still need to know a little about how they work (please read the DXM documentation carefully). DXM defines a large number of common data types, but it only defines the mathematical operations for XMVECTOR and XMMATRIX; all DXM calculations are based on these two types. On the x86 platform, XMVECTOR is defined as follows:
typedef __m128 XMVECTOR;
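(As an aside, disabling the SIMD path mentioned above only requires defining the macro before the header is included; a minimal sketch, assuming the DirectXMath 3 header name:)

// Define the macro before the header to force the portable, non-SIMD code path
// (only sensible if you must support very old CPUs).
#define _XM_NO_INTRINSICS_
#include <DirectXMath.h>
using namespace DirectX;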
XMMATRIX is simply a group of four XMVECTORs. Since __m128 requires strict 16-byte alignment in memory, it is not recommended to use these two types directly as data members, for example:
class Camera
{
    XMVECTOR position;
    XMMATRIX viewMatrix;
};
Code like the above will basically not run correctly: under 32-bit builds, new does not guarantee 16-byte alignment, and because of class member layout the XMVECTOR members are not necessarily 16-byte aligned either. Accessing such data may crash the program outright. Therefore, DXM also defines a large number of storage types, such as XMFLOAT4, XMFLOAT3, and XMFLOAT4X4:
class Camera
{
    XMFLOAT3 position;
    XMFLOAT4X4 viewMatrix;
};
These types are essentially just float arrays, so there is no alignment to worry about; you convert them to XMVECTOR or XMMATRIX when you need to do calculations. For example:
XMFLOAT3 position;
XMVECTOR myVector = XMLoadFloat3(&position);   // load into a SIMD register
// perform calculations with SIMD ...
// ...
XMStoreFloat3(&position, myVector);            // store back to memory
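As a more complete illustration, here is a minimal sketch of how the Camera class from earlier could rebuild its view matrix with a single load/compute/store round trip (the UpdateViewMatrix method and its parameters are my own illustration, not part of DXM or this article):

#include <DirectXMath.h>
using namespace DirectX;

class Camera
{
public:
    XMFLOAT3   position;     // plain storage types, no alignment requirements
    XMFLOAT4X4 viewMatrix;

    // Hypothetical helper: load once, compute in registers, store once.
    void UpdateViewMatrix(const XMFLOAT3& target, const XMFLOAT3& up)
    {
        XMVECTOR eye = XMLoadFloat3(&position);
        XMVECTOR at  = XMLoadFloat3(&target);
        XMVECTOR upv = XMLoadFloat3(&up);
        XMStoreFloat4x4(&viewMatrix, XMMatrixLookAtLH(eye, at, upv));
    }
};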
Obviously, this load/store pattern is troublesome: even a simple calculation on two vectors takes four or five lines of code. The DirectX team therefore wrote a simple wrapper named SimpleMath, included in the DirectX Tool Kit, which adds computation functions to the common storage types. However, SimpleMath simply wraps a load and a store around every operation:
struct Vector2 : public XMFLOAT2
{
    // ... other member functions
    float Dot(const Vector2& V) const;
    float LengthSquared() const;
    // ... other member functions
};

inline float Vector2::Dot(const Vector2& V) const
{
    using namespace DirectX;
    XMVECTOR v1 = XMLoadFloat2(this);
    XMVECTOR v2 = XMLoadFloat2(&V);
    XMVECTOR X = XMVector2Dot(v1, v2);
    return XMVectorGetX(X);
}
This implementation may make sense for float4, but it is overkill for float2: copying two float2 values into SIMD registers and then storing the result back to ordinary memory can be slower than simply adding them with scalar instructions! In addition, for SIMD computing we should minimize loads and stores and keep data in registers as long as possible. Before a calculation, load the data into XMVECTORs, execute the whole series of computations, and only then store the result back to an ordinary location, instead of loading and storing on every single operation:
Case 1:
    load to XMVECTOR
    perform calc 1
    perform calc 2
    ...
    perform calc n
    store to XMFLOAT4

Case 2:
    load to XMVECTOR
    perform calc 1
    store to XMFLOAT4
    load to XMVECTOR
    perform calc 2
    store to XMFLOAT4
    ...
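In concrete DirectXMath code the two cases look roughly like this (a sketch only; the add, multiply, and normalize calls just stand in for "calc 1 .. calc n"):

#include <DirectXMath.h>
using namespace DirectX;

void Example(const XMFLOAT4& a, const XMFLOAT4& b, XMFLOAT4& result)
{
    // Case 1: load once, keep intermediate results in registers, store once.
    XMVECTOR va = XMLoadFloat4(&a);
    XMVECTOR vb = XMLoadFloat4(&b);
    XMVECTOR r  = XMVectorAdd(va, vb);    // calc 1
    r = XMVectorMultiply(r, vb);          // calc 2
    r = XMVector4Normalize(r);            // calc n
    XMStoreFloat4(&result, r);

    // Case 2: every single operation pays for its own load and store.
    XMStoreFloat4(&result, XMVectorAdd(XMLoadFloat4(&a), XMLoadFloat4(&b)));            // calc 1
    XMStoreFloat4(&result, XMVectorMultiply(XMLoadFloat4(&result), XMLoadFloat4(&b)));  // calc 2
    XMStoreFloat4(&result, XMVector4Normalize(XMLoadFloat4(&result)));                  // calc n
}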
These are two different ways to write the same computation, and unless the compiler optimizes unusually well, the difference in efficiency will be obvious. So even with a good math library, you have to use it in the right way to get the best performance.
FlameMath
FlameMath keeps the SimpleMath interface but fixes some of these performance problems by replacing the implementation of certain functions: for simple, lightweight calculations it uses ordinary scalar instructions directly, while complex calculations such as matrix operations use SIMD.
inline float Float2::Dot(const Float2& v) const
{
    return (x * v.x + y * v.y);
}
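For heavier operations the trade-off is reversed: one load and one store are worth paying for because the SIMD work in between is substantial. The FlameMath source is not shown here, so the following is only a hypothetical sketch of what such a matrix multiply could look like (a Matrix4x4 deriving from XMFLOAT4X4 is my assumption):

#include <DirectXMath.h>

// Hypothetical sketch, not the actual FlameMath implementation.
struct Matrix4x4 : public DirectX::XMFLOAT4X4
{
    Matrix4x4 operator*(const Matrix4x4& m) const
    {
        using namespace DirectX;
        XMMATRIX m1 = XMLoadFloat4x4(this);   // a full 4x4 multiply justifies the load...
        XMMATRIX m2 = XMLoadFloat4x4(&m);
        Matrix4x4 result;
        XMStoreFloat4x4(&result, XMMatrixMultiply(m1, m2));   // ...and the store
        return result;
    }
};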
However, FlameMath still cannot avoid repeated loads/stores between separate computations. The best approach is therefore: use Float3, Float4, and the other storage types to hold your data; for lightweight, simple calculations, call the FloatX member functions directly; for complex calculations that run as a continuous sequence, load the data into Vector4 and the other SIMD types, compute directly with the DXM-style functions, and store the result back into a FloatX at the end. The FloatX types in FlameMath correspond to the storage types in DXM and SimpleMath, Vector4 and Matrix4x4 are equivalent to XMVECTOR and XMMATRIX, and all DXM functions are exposed as static functions of SimdMath.
Simple usage:

Float2 v1;
Float2 v2;
Float2 result = v1 + v2;    // done

Advanced usage:

Float2 f1;
Float2 f2;
Vector4 v1 = SimdMath::LoadFloat2(f1);
Vector4 v2 = SimdMath::LoadFloat2(f2);
Vector4 v3 = SimdMath::Vector2Normalize(v1);
Vector4 v4 = SimdMath::Vector2Normalize(v2);
Vector4 v5 = SimdMath::Vector2AngleBetweenNormals(v3, v4);
float result;
SimdMath::StoreFloat(&result, v5);    // done
PS: The FlameMath code has not been fully tested. To use it, you only need to include FlameMath.h :)
http://files.cnblogs.com/clayman/FlameMath.rar