0.033 seconds of art: traps in the XNA Math Library

Source: Internet
Author: User

For personal use only, do not reprint, do not use for any commercial purposes. 

Last time I described how to view the ASM code of a .NET program and analyzed some functions under System.Math. This time, we take a closer look at how to use the math library in XNA efficiently. Matrix and Vector3 serve as the examples below; other types can be inferred by analogy. For testing and demonstration purposes, I wrote some fairly "idiotic" code:

Test 1:

Code
AABB box = new AABB(new Vector3(34.4f, 4, 23));
Vector3 seed = box.Center;

public class AABB
{
    private Vector3 center;
    public Vector3 Center
    {
        get { return center; }
        set { center = value; }
    }

    public AABB(Vector3 center)
    {
        this.center = center;
    }
}

The purpose of Test 1 is to check whether the JIT inlines a simple property that returns a Vector3. Unfortunately, the answer is no.
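If the getter call is not inlined, two common workarounds are to expose the field directly or to provide an accessor that writes into an out parameter. The following is only a minimal sketch: Vec3 and Aabb are hypothetical stand-ins (not the XNA types), used so the code compiles without XNA, and whether either form is actually faster depends on the JIT version in use.

```csharp
using System;

// Hypothetical stand-in for XNA's Vector3 so the sketch compiles without XNA.
public struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

public class Aabb
{
    // Public field: reading it is a plain memory access, no method call.
    public Vec3 Center;

    public Aabb(Vec3 center) { Center = center; }

    // Property form, as in Test 1, kept for comparison.
    public Vec3 CenterProperty
    {
        get { return Center; }
        set { Center = value; }
    }

    // Out-parameter accessor: the caller supplies the destination,
    // so the struct is written in place instead of being returned by value.
    public void GetCenter(out Vec3 result) { result = Center; }
}

public static class PropertyDemo
{
    public static void Main()
    {
        Aabb box = new Aabb(new Vec3(34.4f, 4f, 23f));

        Vec3 viaProperty = box.CenterProperty; // goes through the getter
        Vec3 viaField = box.Center;            // direct field read
        Vec3 viaOut;
        box.GetCenter(out viaOut);             // written straight into viaOut

        // All three paths yield the same value.
        Console.WriteLine(viaProperty.X == viaField.X && viaField.X == viaOut.X);
    }
}
```

Exposing a public field gives up encapsulation, so it is a trade-off best reserved for hot paths that have actually been measured.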

Test 2:

Code
public static Vector3 Foo(Vector3 v, float radius)
{
    Matrix A;
    Matrix.CreateTranslation(ref v, out A);

    Matrix B = Matrix.CreateTranslation(v);

    Matrix.Multiply(ref A, ref B, out A);

    Vector3 F1 = B.Forward;

    Vector3 F2;
    F2.X = B.M31;
    F2.Y = B.M32;
    F2.Z = B.M33;

    Vector3 F3;
    Vector3.Add(ref F1, ref F2, out F3);
    Vector3.Transform(ref F3, ref A, out F1);
    return F1;
}

Test 2 is the focus of this article. To make the discussion easier, we divide the code of Test 2 into several sections:

Section 1:

Code
public static Vector3 Foo(Vector3 v, float radius)
{
    Matrix A;
    Matrix.CreateTranslation(ref v, out A);
00000000 push ebp
00000001 mov ebp, esp
00000003 push edi
00000004 push esi
00000005 sub esp, 0F4h
0000000b mov esi, ecx
0000000d lea edi, [ebp+FFFFFF54h]
00000013 mov ecx, 29h // 29h = 41
00000018 xor eax, eax // eax = 0
0000001a rep stos dword ptr es:[edi] // store 0 into 41 consecutive dwords
0000001c mov ecx, esi
0000001e mov dword ptr [ebp+FFFFFF04h], ecx
00000024 cmp dword ptr ds:[03A97E8Ch], 0
0000002b je 00000032
0000002d call 76CFD6C9 // throw an exception here???
00000032 lea ecx, [ebp+0Ch] // pass argument
00000035 lea edx, [ebp-48h] // pass argument
00000038 call dword ptr ds:[00F46FB8h] // call CreateTranslation

This part of the code confused me for a long time at first, because only three instructions, 032~038, have anything to do with calling CreateTranslation. What is all the code before them? A compiler error? I had assumed that with the ASM in hand I would not need to check the IL, but it was only after going back to the IL that I realized what I had forgotten: the function has to initialize its local stack frame. Foo uses five local variables: two Matrix values and three Vector3 values. Interestingly, the compiler does not initialize five separate structs; it zeroes 41 individual floats (2 × 16 + 3 × 3 = 41). Instructions 00d~01a carry this out, setting all 41 floats to 0. The 000~00b part is the ordinary function prologue and can be ignored. The odd part is 01c~02d, whose purpose I never fully understood. My guess is that it is a safety check of some kind, for example for a stack overflow, that calls the function at 76cfd6c9 when the check fails. It may equally be a debugger-related probe inserted by the JIT rather than a stack check; the listing alone does not settle it.

The conclusion: the number of local variables in a function affects its efficiency, because the rep stos at 01a repeats once per dword (41 times here) to zero all of them.

Section 2:

Code
Matrix B = Matrix.CreateTranslation(v);
0000003e lea eax, [ebp+0Ch]
00000041 push dword ptr [eax+8]
00000044 sub esp, 8
00000047 movq xmm0, mmword ptr [eax]
0000004b movq mmword ptr [esp], xmm0
00000050 lea ecx, [ebp+FFFFFF14h]
00000056 call dword ptr ds:[00F46FACh]
0000005c lea edi, [ebp+FFFFFF78h] // address of B
00000062 lea esi, [ebp+FFFFFF14h] // address of return value
00000068 mov ecx, 10h // 10h = 16
0000006d rep movs dword ptr es:[edi], dword ptr [esi] // copy return value to B

Clearly, this call to CreateTranslation is much more involved than the previous one. I am not an assembly expert, and 041~050 confused me at first. As far as I can tell, 041~04b copy the 12-byte Vector3 argument onto the stack (4 bytes with push, 8 bytes with movq), and 050 passes the address of a temporary that receives the return value. In 05c~06d, the returned value is copied from that temporary into B; note that 16 floats are copied here. Less obvious is that the address called this time is completely different from the previous one, because this is a different overload. If you look at the ASM of CreateTranslation(Vector3), you will find that it does two things more than CreateTranslation(ref Vector3, out Matrix): 1. it reserves local stack space for 16 floats; 2. when the calculation is done, it copies those locals into the temporary return buffer, which takes another 16 movs.

The conclusion is that the non-ref version executes 48 more instructions than the ref version! For value types, the parameter-passing method alone can make a huge performance difference.
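The two overload shapes compared above can be sketched as follows. Vec3 and Mat4 are hypothetical stand-ins for the XNA types so the code compiles on its own; the by-value overload simply forwards to the ref/out one, which is an assumption made for brevity and not a claim about how XNA implements it.

```csharp
using System;

// Hypothetical stand-ins for XNA's Vector3 and Matrix.
public struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

public struct Mat4
{
    public float M11, M12, M13, M14;
    public float M21, M22, M23, M24;
    public float M31, M32, M33, M34;
    public float M41, M42, M43, M44;

    // By-value overload: the 12-byte argument is copied in, and the
    // 64-byte result is copied back out to the caller.
    public static Mat4 CreateTranslation(Vec3 position)
    {
        Mat4 result;
        CreateTranslation(ref position, out result);
        return result;
    }

    // ref/out overload: only two addresses are passed, and the result
    // is written directly into the caller's variable.
    public static void CreateTranslation(ref Vec3 position, out Mat4 result)
    {
        result = default(Mat4);   // start from all zeros
        result.M11 = 1f;          // identity diagonal
        result.M22 = 1f;
        result.M33 = 1f;
        result.M44 = 1f;
        result.M41 = position.X;  // translation in the fourth row
        result.M42 = position.Y;
        result.M43 = position.Z;
    }
}

public static class RefOutDemo
{
    public static void Main()
    {
        Vec3 v = new Vec3(1f, 2f, 3f);

        Mat4 byValue = Mat4.CreateTranslation(v);  // struct returned by value

        Mat4 byRef;
        Mat4.CreateTranslation(ref v, out byRef);  // struct filled in place

        // Both overloads produce the same matrix.
        Console.WriteLine(byValue.M41 == byRef.M41 && byValue.M43 == byRef.M43);
    }
}
```

The ref/out form is noisier at the call site, so it is usually worth it only in code that runs many times per frame.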

Section 3:

Code
Matrix.Multiply(ref A, ref B, out A);
0000006f lea eax, [ebp-48h]
00000072 push eax
00000073 lea ecx, [ebp-48h]
00000076 lea edx, [ebp+FFFFFF78h]
0000007c call dword ptr ds:[00F472A8h]

This confirms the previous conclusion once more. For the ref version, the caller only has to pass the argument addresses and call the function.

Section 4:

Code
Part 1:
Vector3 F1 = B.Forward;
00000082 lea ecx, [ebp+FFFFFF78h]
00000088 lea edx, [ebp+FFFFFF08h]
0000008e call dword ptr ds:[00F46F28h]
00000094 lea edi, [ebp+FFFFFF6Ch]
0000009a lea esi, [ebp+FFFFFF08h]
000000a0 movq xmm0, mmword ptr [esi]
000000a4 movq mmword ptr [edi], xmm0
000000a8 add esi, 8
000000ab add edi, 8
000000ae movs dword ptr es:[edi], dword ptr [esi]

Part 2:
Vector3 F2;
F2.X = B.M31;
000000af fld dword ptr [ebp-68h]
000000b2 fstp dword ptr [ebp+FFFFFF60h]
F2.Y = B.M32;
000000b8 fld dword ptr [ebp-64h]
000000bb fstp dword ptr [ebp+FFFFFF64h]
F2.Z = B.M33;
000000c1 fld dword ptr [ebp-60h]
000000c4 fstp dword ptr [ebp+FFFFFF68h]

The two short pieces of code here do the same job: the first uses the Matrix property directly, while the second accesses the underlying elements by hand. Forward is not inlined, but it is not quite the same case as the Center property discussed in Test 1. Center is much simpler, directly returning an existing value, whereas Forward has to assemble a value to return, so its not being inlined is barely acceptable. What is surprising is that accessing such a property takes 10 instructions. Counting the function called at 08e adds another 14, for a total of 24! The manually inlined code uses only six instructions: a big win. (One caveat: XNA defines Forward as the negated third row, (-M31, -M32, -M33), so an exact manual equivalent would also negate each component.)

The conclusion: applying manual inlining judiciously improves performance.
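As an illustration, the sketch below places a Forward-style property next to its manually inlined equivalent. Vec3 and Mat4 are hypothetical stand-ins rather than the XNA types, and the negation in the getter follows XNA's definition of Forward, which should be treated as an assumption of this sketch. It shows only that the two forms are equivalent, not how fast either one is.

```csharp
using System;

// Hypothetical stand-ins for XNA's Vector3 and Matrix.
public struct Vec3
{
    public float X, Y, Z;
}

public struct Mat4
{
    public float M11, M12, M13, M14;
    public float M21, M22, M23, M24;
    public float M31, M32, M33, M34;
    public float M41, M42, M43, M44;

    // Property in the style of Matrix.Forward: it assembles a new Vec3
    // from the third row (negated, by assumption) and returns it by value.
    public Vec3 Forward
    {
        get
        {
            Vec3 v;
            v.X = -M31;
            v.Y = -M32;
            v.Z = -M33;
            return v;
        }
    }
}

public static class ManualInlineDemo
{
    // Goes through the property getter (a call, if the JIT does not inline it).
    public static Vec3 ForwardViaProperty(ref Mat4 m)
    {
        return m.Forward;
    }

    // Manual inline: reads the three fields directly at the use site.
    public static Vec3 ForwardManual(ref Mat4 m)
    {
        Vec3 f;
        f.X = -m.M31;
        f.Y = -m.M32;
        f.Z = -m.M33;
        return f;
    }

    public static void Main()
    {
        Mat4 b = new Mat4();
        b.M31 = 0.5f;
        b.M32 = -2f;
        b.M33 = 4f;

        Vec3 f1 = ForwardViaProperty(ref b);
        Vec3 f2 = ForwardManual(ref b);

        // Both paths yield the same vector.
        Console.WriteLine(f1.X == f2.X && f1.Y == f2.Y && f1.Z == f2.Z);
    }
}
```

Manual inlining duplicates the property's logic at every call site, so a comment naming the property it replaces helps keep the two in sync.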

Section 5: This part (the Vector3.Add and Vector3.Transform calls at the end of Foo) carries little meaning of its own; it is there only to keep the compiler from optimizing away the preceding code.

Finally, I hope this gives a rough idea of how to use the math library in XNA correctly and where to optimize. If you study the ASM code further, you will surely find more :)


 
