Efficient 3D Graphics Math Library

Source: Internet
Author: User

http://dev.gameres.com/

Vector Overview
I've been lurking for a long time; it's time to make a contribution. I recently wrote this article and am posting it here: SSE and Matrix Multiplication.
It introduces the SSE extended instruction set and the optimization of matrix multiplication. If you don't like assembler, feel free to send the WM_CLOSE message now!!
The history of the SSE instructions is documented everywhere, so this article mainly describes my own understanding of the instruction set's principles, functions, and usage.

 

Lately I have been reading a lot of compiler output, and looking at the assembly generated from C++ source is simply torture. That forced me to re-implement the entire math library with hand-written assembly instructions, including CPUID detection and use of the extended instruction sets. I compared the results against the D3DX9 math functions, and the outcome is satisfying: except for matrix multiplication, which still trails D3DXMatrixMultiply by about 7%, everything else is equal or well ahead (maybe I am fooling myself; skeptical readers can test it for themselves). Since my skills are limited and my efficiency-testing method is simple, corrections are welcome!
First, let me introduce my Vector class. Here is the declaration:

struct __declspec(dllexport) Vector
{

/****************** Variables ********************/

    float x, y, z, w;

/************************************************/

    // Constructor
    Vector() {}
    // Constructor
    Vector(const float *v);
    // Constructor
    Vector(float _x, float _y, float _z, float _w);

/************************************************/

    // Set the vector
    void setvector(const float *v);
    // Set the vector
    void setvector(float _x, float _y, float _z, float _w);
    // Subtraction
    void difference(const Vector *psrc, const Vector *pdest);
    // Invert the vector
    void inverse();
    // Make this a unit vector
    void normalize();
    // Is this a unit vector?
    bool isnormalized();
    // Vector length (slow)
    float getlength();
    // Square of the vector length (fast)
    float getlengthsq();
    // Cross product of two vectors; the result is stored in this vector
    void cross(const Vector *pu, const Vector *pv);
    // Angle between two vectors
    float anglewith(Vector &v);

/************************************************/

    // Operator overloads
    void operator += (Vector &v);
    void operator -= (Vector &v);
    void operator *= (float v);
    void operator /= (float v);
    Vector operator + (Vector &v) const;
    Vector operator - (Vector &v) const;
    float operator * (Vector &v) const;    // dot product
    void operator *= (Matrix &m);
    Vector operator * (float f) const;
    bool operator == (Vector &v);
    bool operator != (Vector &v);
    // void operator = (Vector &v);
};

Then come the simple inline functions:

// Constructor
inline Vector::Vector(const float *v)
: x(v[0])
, y(v[1])
, z(v[2])
, w(v[3])
{
}

// Constructor
inline Vector::Vector(float _x, float _y, float _z, float _w)
: x(_x)
, y(_y)
, z(_z)
, w(_w)
{
}

// Set the vector
inline void Vector::setvector(const float *v)
{
    x = v[0]; y = v[1]; z = v[2];
}

// Set the vector
inline void Vector::setvector(float _x, float _y, float _z, float _w)
{
    x = _x; y = _y; z = _z; w = _w;
}

// Subtraction
inline void Vector::difference(const Vector *psrc, const Vector *pdest)
{
    x = pdest->x - psrc->x;
    y = pdest->y - psrc->y;
    z = pdest->z - psrc->z;
}

// Invert the vector
inline void Vector::inverse()
{
    x = -x; y = -y; z = -z;
}

// Is this a unit vector?
inline bool Vector::isnormalized()
{
    return cmpfloatsame(x * x + y * y + z * z, 1.0f);
}

// Operator overloads
inline void Vector::operator += (Vector &v)
{
    x += v.x; y += v.y; z += v.z;
}

inline void Vector::operator -= (Vector &v)
{
    x -= v.x; y -= v.y; z -= v.z;
}

inline void Vector::operator *= (float f)
{
    x *= f; y *= f; z *= f;
}

inline void Vector::operator /= (float f)
{
    f = 1.0f / f;
    x *= f; y *= f; z *= f;
}

inline Vector Vector::operator + (Vector &v) const
{
    return Vector(x + v.x, y + v.y, z + v.z, w);
}

inline Vector Vector::operator - (Vector &v) const
{
    return Vector(x - v.x, y - v.y, z - v.z, w);
}

// Dot product
inline float Vector::operator * (Vector &v) const
{
    return (x * v.x + y * v.y + z * v.z);
}

inline Vector Vector::operator * (float f) const
{
    return Vector(x * f, y * f, z * f, w);
}

inline bool Vector::operator == (Vector &v)
{
    return ((((x - v.x) > FLOAT_EPS) || ((x - v.x) < -FLOAT_EPS) ||
             ((y - v.y) > FLOAT_EPS) || ((y - v.y) < -FLOAT_EPS) ||
             ((z - v.z) > FLOAT_EPS) || ((z - v.z) < -FLOAT_EPS)) ? false : true);
}

inline bool Vector::operator != (Vector &v)
{
    return ((((x - v.x) > FLOAT_EPS) || ((x - v.x) < -FLOAT_EPS) ||
             ((y - v.y) > FLOAT_EPS) || ((y - v.y) < -FLOAT_EPS) ||
             ((z - v.z) > FLOAT_EPS) || ((z - v.z) < -FLOAT_EPS)) ? true : false);
}

There are several important optimizations here. They can serve as general coding principles, and they matter a great deal:

1. Use const wherever possible! The compiler uses it for optimization.
2. When returning a value, return it as a constructor call where you can, for example:
return Vector(x + v.x, y + v.y, z + v.z, w);
3. When several numbers are divided by the same divisor, write it the way Vector::operator /= (float f) does: compute the reciprocal once, then multiply.
4. Small functions like these must be inline!

Follow these four points, or the generated code will be terrible; the efficiency difference is huge. Keep them in mind.

Next come the more advanced Vector functions:

// Square of the vector length (fast)
float Vector::getlengthsq()    // potential danger
{
    __asm
    {
        fld   dword ptr [ecx];
        fmul  dword ptr [ecx];
        fld   dword ptr [ecx+4];
        fmul  dword ptr [ecx+4];
        faddp st(1), st;
        fld   dword ptr [ecx+8];
        fmul  dword ptr [ecx+8];
        faddp st(1), st;
    }
    // return x * x + y * y + z * z;
}

// Vector length (slow)
float Vector::getlength()
{
    float f;
    if (g_bUseSSE2)
    {
        __asm
        {
            lea   ecx, f;
            mov   eax, this;
            mov   dword ptr [eax+12], 0;          // w = 0.0f

            movups xmm0, [eax];
            mulps  xmm0, xmm0;
            movaps xmm1, xmm0;
            shufps xmm1, xmm1, 4Eh;               // shuffle
            addps  xmm0, xmm1;
            movaps xmm1, xmm0;
            shufps xmm1, xmm1, 11h;               // shuffle
            addss  xmm0, xmm1;

            sqrtss xmm0, xmm0;                    // square root of block 0 only
            movss  dword ptr [ecx], xmm0;         // store block 0 into the memory at ecx

            mov   dword ptr [eax+12], 3F800000h;  // 3F800000h = 1.0f
        }
    }
    else
    {
        f = (float)sqrt(x * x + y * y + z * z);
    }
    return f;
}

// Make this a unit vector
void Vector::normalize()
{
    if (g_bUseSSE2)
    {
        __asm
        {
            mov   eax, this;
            mov   dword ptr [eax+12], 0;

            movups xmm0, [eax];
            movaps xmm2, xmm0;
            mulps  xmm0, xmm0;
            movaps xmm1, xmm0;
            shufps xmm1, xmm1, 4Eh;
            addps  xmm0, xmm1;
            movaps xmm1, xmm0;
            shufps xmm1, xmm1, 11h;
            addps  xmm0, xmm1;

            rsqrtps xmm0, xmm0;        // approximate reciprocal square root
            mulps  xmm2, xmm0;
            movups [eax], xmm2;

            mov   dword ptr [eax+12], 3F800000h;
        }
    }
    else
    {
        float f = (float)sqrt(x * x + y * y + z * z);
        if (f != 0.0f)
        {
            f = 1.0f / f;
            x *= f; y *= f; z *= f;
        }
    }
}

// Cross product of two vectors; the result is stored in this vector
void Vector::cross(const Vector *pu, const Vector *pv)
{
    if (g_bUseSSE2)
    {
        __asm
        {
            mov   eax, pu;
            mov   edx, pv;

            movups xmm0, [eax]
            movups xmm1, [edx]
            movaps xmm2, xmm0
            movaps xmm3, xmm1

            shufps xmm0, xmm0, 0xC9
            shufps xmm1, xmm1, 0xD2
            mulps  xmm0, xmm1

            shufps xmm2, xmm2, 0xD2
            shufps xmm3, xmm3, 0xC9
            mulps  xmm2, xmm3

            subps  xmm0, xmm2

            mov   eax, this
            movups [eax], xmm0

            mov   [eax+12], 3F800000h;
        }
    }
    else
    {
        x = pu->y * pv->z - pu->z * pv->y;
        y = pu->z * pv->x - pu->x * pv->z;
        z = pu->x * pv->y - pu->y * pv->x;
        w = 1.0f;
    }
}

// Operator overload
void Vector::operator *= (Matrix &m)    // potential danger
{
#ifdef _DEBUG
    assert(w == 1.0f || w == 0.0f);
#endif

    if (g_bUseSSE2)
    {
        __asm
        {
            mov   ecx, this;
            mov   edx, m;
            movss xmm0, [ecx];
            // lea eax, vr;
            shufps xmm0, xmm0, 0;        // xmm0 = x, x, x, x

            movss xmm1, [ecx+4];
            mulps xmm0, [edx];
            shufps xmm1, xmm1, 0;        // xmm1 = y, y, y, y

            movss xmm2, [ecx+8];
            mulps xmm1, [edx+16];
            shufps xmm2, xmm2, 0;        // xmm2 = z, z, z, z

            movss xmm3, [ecx+12];
            mulps xmm2, [edx+32];
            shufps xmm3, xmm3, 0;        // xmm3 = w, w, w, w

            addps xmm0, xmm1;
            mulps xmm3, [edx+48];

            addps xmm0, xmm2;
            addps xmm0, xmm3;            // xmm0 = result
            movups [ecx], xmm0;
            mov   [ecx+12], 3F800000h;
        }
    }
    else
    {
        Vector vr;
        vr.x = x * m._11 + y * m._21 + z * m._31 + w * m._41;
        vr.y = x * m._12 + y * m._22 + z * m._32 + w * m._42;
        vr.z = x * m._13 + y * m._23 + z * m._33 + w * m._43;
        vr.w = x * m._14 + y * m._24 + z * m._34 + w * m._44;

        x = vr.x;
        y = vr.y;
        z = vr.z;
        w = 1.0f;
    }
}

// Angle between two vectors
float Vector::anglewith(Vector &v)
{
    return (float)acosf((*this * v) / (this->getlength() * v.getlength()));
}

Three of these functions deserve comment: getlengthsq, operator *=, and anglewith.

getlengthsq is potentially dangerous because it is written against the VC++ .NET 2003 compiler. I know that ecx == this, and that a float return value goes straight from the floating-point stack register (fstp) back to the caller, so I wrote it this way and didn't even write a return statement! You may not be using the same compiler as I am when reading this; once you understand the idea, implement it in whatever way suits your own math library. All the subsequent functions are written in a compiler-independent style.

The operator *= overload is potentially dangerous because Vector is 4D and can represent either a 3D direction or a 3D point. A direction has w = 0 and is affected only by rotation and scaling. A point has w = 1 and is subject to every kind of transform: translation, rotation, and scaling. Since directions cannot be translated, and for the sake of efficiency, the distinction is left to the caller of the math library, who needs to keep it in mind.

anglewith is not inlined because in a future article I will optimize this code further: neither getlength nor acosf is an inline function, and I intend to expand them and reschedule the generated code by hand. This function does not seem to exist in the D3DX9 math library, so there is nothing to compare it against.

The efficiency of the functions above, compared with the D3DX library, is roughly as follows:
getlengthsq is slightly faster than D3DX.
getlength is twice as fast as D3DX, because the D3DX version does not use the SSE instructions.
normalize and cross are much faster than D3DX, for the same reason: the D3DX versions do not use SSE.
operator *= is about 7% slower than D3DXVec3Transform, which may still improve; interestingly, the D3DX version uses 3DNow!, which here beats SSE, probably because of my AMD 3000+. On Intel the speeds should be about the same.
anglewith cannot be evaluated, because there is nothing to compare it with.

After manually rescheduling many of these routines, I found that instruction order has a huge impact on efficiency! Be careful when you change instruction order, and keep a copy of the original version, or you will get dizzy working through long stretches of assembly. ~o~
By the way, a few points that confuse many people:
1. Intrinsics like the _mm_*_ps() functions in the C++ library are garbage! If you want efficiency, never use them; learn assembly and write it yourself. The code those library functions generate is terrible!
2. The efficiency gap between movups and movaps is negligible! Do not declare your vectors and matrices as __m128 for the sake of a speedup as small as 1%; it will bite you later when you build arrays of them!
3. My testing method is crude: loop ten million times, measure with timeGetTime(), run several times, and take the average. As a result, once release mode inlines everything, the small functions can no longer be measured this way. If you have time you can test them yourself; my estimate is that the inline functions are close to the efficiency limit and not worth optimizing further.
If you have doubts about my tests, take the code, measure the efficiency yourself, and try it on a variety of CPUs. I am here to accept bricks from anyone!

Next, I will detail my understanding of the SSE and floating-point instructions, and the most useful piece: the matrix multiplication algorithm.

 

The biggest feature of the SSE instruction set is the ability to perform four float operations in parallel. It is not so much that this happens to suit graphics algorithms as that it was designed for graphics programming. For example, adding two vectors means computing x1+x2, y1+y2, z1+z2, w1+w2; doing that with one instruction rather than four separate additions saves time, and that is exactly what SSE does for us. SSE introduces eight registers: xmm0, xmm1, xmm2, ... xmm6, xmm7. Each register is 128 bits wide and can hold four float values. So the four floats x1, y1, z1, w1 can be placed in xmm0, and x2, y2, z2, w2 in xmm1; then xmm0 += xmm1, and the result is left in xmm0! In code:

struct Vector
{
    float x, y, z, w;

    void add(const Vector *pin)
    {
        __asm
        {
            mov   eax, pin;        // strictly speaking: mov eax, dword ptr [pin]
            mov   ecx, this;

            movups xmm0, [ecx];    // load this->xyzw into xmm0
            movups xmm1, [eax];    // load pin->xyzw into xmm1
            addps  xmm0, xmm1;     // xmm0 += xmm1
            movups [ecx], xmm0;    // store xmm0 back into this
            mov   [ecx+12], 3F800000h; // don't forget w = 1.0f
        }
    }
};

That is really all there is to it. movups moves a vector from memory into a register, or moves it from the register back to memory; in short, it is an instruction that moves 128 bits at a time. But this instruction is about 70% slower than an ordinary mov, which makes it much slower than a floating-point multiply. So much time is spent in movups that simple algorithms are not worth doing with SSE instructions.

In fact, the SSE vector add above may well be slower than the plain version:

void add(const Vector *pin)
{
    x += pin->x;
    y += pin->y;
    z += pin->z;
    w += pin->w;
}

Now, what if I want to add the same float value to all four of a vector's xyzw components at once? Say I want to add the float variable h to each of the four components held in xmm0; I would do the following:

float h = ...;
__asm
{
    movss  xmm1, h
    shufps xmm1, xmm1, 0
    addps  xmm0, xmm1
}

This introduces two new instructions: movss and shufps.

movss moves one 32-bit value into the low 32 bits of a 128-bit register. That is, only block 0 of the four 32-bit blocks of xmm1 holds the value of h.

At this point I want the other three blocks to hold the same value, so I can add to xmm0 in parallel; that is why I used the next instruction.

shufps swaps, copies, or overwrites the values of the four blocks of a 128-bit register, like shuffling cards, hence the name shuffle instruction. Notice that the third operand of the instruction above is 0. That operand is 8 bits wide: 0 = 00 00 00 00. I have split the 8 bits into 4 two-bit fields; each field selects one block of a register:
00 selects block 0
01 selects block 1
10 selects block 2
11 selects block 3

shufps DEST, SRC, 00 00 00 00
For each field of the third operand, a block is selected from SRC and its value is dropped into the corresponding block of DEST. Here all four fields are 0, so all four blocks of DEST receive the value of block 0 of SRC. When DEST and SRC are the same register, the register shuffles itself; that is why all four blocks of xmm1 end up holding the value of xmm1's block 0. The following example should make this clearer:

DEST = W1, Z1, Y1, X1
SRC  = W2, Z2, Y2, X2

Run this instruction: shufps DEST, SRC, 01 11 00 10
Under this simple reading, the result would be: DEST = Y2, W2, X2, Z2.
Laid out for clarity:

DEST block 3 <------ SRC field 01 (block 1)
DEST block 2 <------ SRC field 11 (block 3)
DEST block 1 <------ SRC field 00 (block 0)
DEST block 0 <------ SRC field 10 (block 2)

Note: the block order encoded in the third operand here is 3210, not 0123. x is stored in block 0, at the lowest address in memory. In short, don't reverse it; this is very easy to get confused about.

In fact, there is an evil restriction!!! When DEST and SRC are different registers, blocks 0 and 1 of the result are selected from DEST, not from SRC; only blocks 2 and 3 are selected from SRC. So the instruction above actually produces:
DEST = Y2, W2, X1, Z1

DEST block 3 <------ SRC field 01 (Y2)
DEST block 2 <------ SRC field 11 (W2)
DEST block 1 <------ DEST field 00 (X1)
DEST block 0 <------ DEST field 10 (Z1)

If you still don't understand it, you can send me a private message on the forum.

The basic SSE arithmetic instructions are addps, subps, mulps, and divps: easy to remember, the familiar x86 operations with a ps suffix. An ss suffix means only block 0 is computed, which runs about 20% faster than the ps form; and on AMD CPUs, SSE can even be slower than the x87 floating-point instructions. So the advertised 400% performance improvement is pure talk, a theoretical value. But 300% is achievable, especially for complex algorithms such as matrix multiplication.

Now here is the matrix multiplication code. To keep the focus on how the SSE instructions are used, I favored readability and did not reschedule the instructions, so this version is about 25% slower than my optimized one; even so, it is already three times faster than the general algorithm. Take it home and experiment; the rescheduled code will be published later.

void MultMatrix(Matrix *pout, const Matrix *pin1, const Matrix *pin2)
{
    if (!g_bUseSSE2)
    {
        // [edx] = xmm0 * xmm4 + xmm1 * xmm5 + xmm2 * xmm6 + xmm3 * xmm7
        pout->_11 = pin1->_11 * pin2->_11 + pin1->_12 * pin2->_21 + pin1->_13 * pin2->_31 + pin1->_14 * pin2->_41;
        pout->_12 = pin1->_11 * pin2->_12 + pin1->_12 * pin2->_22 + pin1->_13 * pin2->_32 + pin1->_14 * pin2->_42;
        pout->_13 = pin1->_11 * pin2->_13 + pin1->_12 * pin2->_23 + pin1->_13 * pin2->_33 + pin1->_14 * pin2->_43;
        pout->_14 = pin1->_11 * pin2->_14 + pin1->_12 * pin2->_24 + pin1->_13 * pin2->_34 + pin1->_14 * pin2->_44;

        pout->_21 = pin1->_21 * pin2->_11 + pin1->_22 * pin2->_21 + pin1->_23 * pin2->_31 + pin1->_24 * pin2->_41;
        pout->_22 = pin1->_21 * pin2->_12 + pin1->_22 * pin2->_22 + pin1->_23 * pin2->_32 + pin1->_24 * pin2->_42;
        pout->_23 = pin1->_21 * pin2->_13 + pin1->_22 * pin2->_23 + pin1->_23 * pin2->_33 + pin1->_24 * pin2->_43;
        pout->_24 = pin1->_21 * pin2->_14 + pin1->_22 * pin2->_24 + pin1->_23 * pin2->_34 + pin1->_24 * pin2->_44;

        pout->_31 = pin1->_31 * pin2->_11 + pin1->_32 * pin2->_21 + pin1->_33 * pin2->_31 + pin1->_34 * pin2->_41;
        pout->_32 = pin1->_31 * pin2->_12 + pin1->_32 * pin2->_22 + pin1->_33 * pin2->_32 + pin1->_34 * pin2->_42;
        pout->_33 = pin1->_31 * pin2->_13 + pin1->_32 * pin2->_23 + pin1->_33 * pin2->_33 + pin1->_34 * pin2->_43;
        pout->_34 = pin1->_31 * pin2->_14 + pin1->_32 * pin2->_24 + pin1->_33 * pin2->_34 + pin1->_34 * pin2->_44;

        pout->_41 = pin1->_41 * pin2->_11 + pin1->_42 * pin2->_21 + pin1->_43 * pin2->_31 + pin1->_44 * pin2->_41;
        pout->_42 = pin1->_41 * pin2->_12 + pin1->_42 * pin2->_22 + pin1->_43 * pin2->_32 + pin1->_44 * pin2->_42;
        pout->_43 = pin1->_41 * pin2->_13 + pin1->_42 * pin2->_23 + pin1->_43 * pin2->_33 + pin1->_44 * pin2->_43;
        pout->_44 = pin1->_41 * pin2->_14 + pin1->_42 * pin2->_24 + pin1->_43 * pin2->_34 + pin1->_44 * pin2->_44;
    }
    else
    {
        __asm
        {
            mov   edx, pin2;          // load the pin2 pointer
            movups xmm4, [edx];       // row 1 of pin2
            movups xmm5, [edx+16];    // row 2 of pin2
            movups xmm6, [edx+32];    // row 3 of pin2
            movups xmm7, [edx+48];    // row 4 of pin2

            mov   eax, pin1;          // load the pin1 pointer
            mov   edx, pout;

            mov   ecx, 4;             // four iterations, one per row

        loopit:                       // loop start
            movss xmm0, [eax];        // xmm0 = pin1->x
            shufps xmm0, xmm0, 0;     // broadcast: xmm0 = x, x, x, x
            mulps xmm0, xmm4;

            movss xmm1, [eax+4];      // xmm1 = pin1->y
            shufps xmm1, xmm1, 0;     // broadcast: xmm1 = y, y, y, y
            mulps xmm1, xmm5;

            movss xmm2, [eax+8];      // xmm2 = pin1->z
            shufps xmm2, xmm2, 0;     // broadcast: xmm2 = z, z, z, z
            mulps xmm2, xmm6;

            movss xmm3, [eax+12];     // xmm3 = pin1->w
            shufps xmm3, xmm3, 0;     // broadcast: xmm3 = w, w, w, w
            mulps xmm3, xmm7;

            addps xmm0, xmm1;
            addps xmm2, xmm3;
            addps xmm0, xmm2;         // the finished result row is in xmm0

            movups [edx], xmm0;       // store the row to pout
            add   edx, 16;
            add   eax, 16;            // advance to the next row

            loop  loopit;
        }
    }
}

Compared with the general algorithm above, the assembly here is easy to follow. Next time, I will discuss the key algorithms of the matrix class.
