(Translator's note: The author of this article is an MVP from the United States, and this is the first article in his "Under the Cover" series. It looks at code performance from the lowest level and serves as the technical foundation for the author's other articles. This approach may seem excessive at first, but it is very helpful for understanding many of the underlying operations of .NET.)
We start with the basics of using Visual Studio for unmanaged (native) code debugging, so that you can follow the examples in later posts; this article is the foundation for those posts. Although I also use WinDbg, Visual Studio has become a powerful debugger and is easier to use for simple code-optimization problems.
When we need to tune performance-critical code, looking at the IL is usually not the best approach, because the JIT optimizer silently optimizes our code. Using Reflector or ILDASM, you can quickly see that the IL generated in Release and Debug mode is almost identical. So what makes the code run so much faster in Release mode? The answer is JIT optimization. We cannot see these optimizations by looking at the managed (IL) code, so we will look for clues in the native code instead.
I should stress that I do not advocate doing this often, and I do not agree with premature optimization. You must make your code work first, and you must clearly know which code is not worth optimizing. Once your code is complete, you can find the places that need to be faster: when you find that 10% of the code takes 70% of the time, go back and optimize that 10%. You should always base your judgment on actual measurements rather than on simply reading the code. Finally, choosing the right data structure matters more than low-level optimization.
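As a minimal sketch of "measure, don't guess", the program below times a piece of code with `System.Diagnostics.Stopwatch`. The `Measure.Time` helper is a hypothetical name, not something from the original article. It runs the action once before timing so that JIT compilation of the target is not counted. A serious benchmark would also repeat the measurement several times and run a Release build outside the debugger.

```csharp
using System;
using System.Diagnostics;

static class Measure
{
    // Hypothetical helper: runs `action` once as a warm-up (so the JIT
    // compilation of its target is not timed), then times `iterations` runs.
    public static TimeSpan Time(Action action, int iterations)
    {
        action(); // warm-up call, not included in the measurement
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }
}

class Program
{
    static void Main()
    {
        long sum = 0;
        TimeSpan elapsed = Measure.Time(() =>
        {
            for (int i = 0; i < 1000; i++) sum += i;
        }, 10000);
        Console.WriteLine($"{elapsed.TotalMilliseconds:F3} ms (sum = {sum})");
    }
}
```

Printing `sum` keeps the work observable, so an optimizer has a reason to keep the loop.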
Of course, it is also interesting to uncover the secrets hidden in the lower layers of .NET. Let's set up Visual Studio and experiment with a simple example.
First, we need some test code.
using System;

class Program {
    static void Main(string[] args) {
        for (int i = 0; i < 10; i++) {
            Console.WriteLine("Hello World!");
        }
    }
}
To enable unmanaged code debugging, you need to configure Visual Studio. Open the project's properties, go to the Debug tab, and select the "Enable unmanaged code debugging" check box.
(Note: This option applies only to the current configuration, so set it for every configuration you use.) Insert a breakpoint at the beginning of the loop and run the program; you will hit the breakpoint as usual. At this point your screen should look like Figure 2 (translator's note: the figure is missing). If the Call Stack window is not visible, open it from the menu via Windows > Call Stack (or press Ctrl+D, C). In the Call Stack window, right-click and select Go To Disassembly to see the following code.
static void Main(string[] args) {
00000000 push ebp
00000001 mov ebp,esp
00000003 push edi
00000004 push esi
00000005 push ebx
00000006 sub esp,38h
00000009 xor eax,eax
0000000b mov dword ptr [ebp-10h],eax
0000000e xor eax,eax
00000010 mov dword ptr [ebp-1Ch],eax
00000013 mov dword ptr [ebp-3Ch],ecx
00000016 cmp dword ptr ds:[00912DC8h],0
0000001d je 00000024
0000001f call 792B228E
00000024 xor esi,esi
00000026 xor edi,edi
00000028 nop
for (int i = 0; i < 10; i++) {
00000029 xor esi,esi
0000002b nop
0000002c jmp 0000003D
0000002e nop
Console.WriteLine("Hello World!");
0000002f mov ecx,dword ptr ds:[022B303Ch]
00000035 call 785D9074
0000003a nop
}
0000003b nop
for (int i = 0; i < 10; i++) {
0000003c inc esi
0000003d cmp esi,0Ah
00000040 setl al
00000043 movzx eax,al
00000046 mov edi,eax
00000048 test edi,edi
0000004a jne 0000002E
}
0000004c nop
0000004d lea esp,[ebp-0Ch]
00000050 pop ebx
00000051 pop esi
00000052 pop edi
00000053 pop ebp
What we are looking at is the native code that the JIT produced for our method, which shows how a simple loop runs at the native level. If you have never studied native code, it may look quite strange, so let's walk through what is happening here.
00000029 xor esi,esi
0000002b nop
0000002c jmp 0000003D
The code above initializes our counter in ESI, an index register that can also be used to index arrays. Notice the old trick used to zero the counter: instead of moving the value 0 into the register, the code XORs the register with itself, which always produces 0. The nop on the next line means "no operation" and, as its name says, does nothing. The code then jumps straight to 3D. Jumps like this mean that native code does not always run from top to bottom the way statements read in high-level languages such as C, VB and C#; follow the jump to the other part of the loop to continue the analysis.
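To make the jump layout concrete, here is a sketch that rewrites our `for` loop with C# `goto` so that it follows the same order as the debug listing: zero the counter, jump straight to the condition test, and only then fall back into the body. `LoopShape.RunLikeJit` and its local names are hypothetical and only mirror the registers. This illustrates the control flow and makes no claim about how the JIT is implemented.

```csharp
using System;

static class LoopShape
{
    // Hypothetical helper mirroring the debug listing's jump layout for
    // "for (int i = 0; i < limit; i++) body(i);". Returns the body's run count.
    public static int RunLikeJit(int limit, Action<int> body)
    {
        int esi = limit;
        esi ^= esi;              // xor esi,esi: x ^ x == 0, whatever the old value was
        int iterations = 0;
        goto Test;               // jmp 0000003D: the first pass skips "inc esi"
    Body:
        body(esi);               // 2E..3A: the loop body (Console.WriteLine)
        iterations++;
        esi++;                   // 3C: inc esi
    Test:
        if (esi < limit)         // 3D..4A: compare with the stop value...
            goto Body;           // ...and jump back to the body while i < limit
        return iterations;
    }
}

class Program
{
    static void Main()
    {
        int n = LoopShape.RunLikeJit(10, i => Console.WriteLine("Hello World!"));
        Console.WriteLine($"body ran {n} times"); // body ran 10 times
    }
}
```

Putting the test at the bottom means each iteration needs only one conditional jump. The unconditional jump before the loop runs once.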
0000003c inc esi
The instruction at 3C increments the counter in ESI (you can watch its value in the Registers window, or press Ctrl+D, R). On the first pass through the loop this line is skipped, because the preceding jump goes directly to 3D.
0000003d cmp esi,0Ah
00000040 setl al
00000043 movzx eax,al
00000046 mov edi,eax
00000048 test edi,edi
0000004a jne 0000002E
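The code from 3D to 4A compares the counter with the loop's stop value and then jumps. cmp compares ESI with 10 (0Ah). setl sets AL to 1 if ESI was less than 10 and to 0 otherwise. movzx widens that byte into EAX, and the result is copied to EDI and tested. If we have not yet reached the stop value (i < 10), EDI is non-zero and the jne on the last line jumps to 2E (translator's note: the original article says 4A here, which is a typo). That continues the loop at the point where the loop body begins.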
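The debug-mode test at 3D..4A first turns the comparison into a 0/1 value and only then branches on it. The sketch below models that with a byte that is zero-extended to an int, and sets it beside a loop that branches on the comparison directly, the way a `cmp`/`jl` pair would. All names are hypothetical, and both methods are ordinary C#: the point is only that the extra steps do not change the result.

```csharp
using System;

static class LoopConditions
{
    // Models "setl al; movzx eax,al": AL becomes 1 if a < b, else 0,
    // and widening the byte to 32 bits is a zero-extension.
    static int SetLessZeroExtended(int a, int b)
    {
        byte al = (byte)(a < b ? 1 : 0);
        int eax = al;
        return eax;
    }

    // Debug-build shape: materialize the comparison, copy it to "edi",
    // test it, and keep looping only while it is non-zero.
    public static int CountDebugShape(int limit)
    {
        int esi = 0, count = 0;
        while (true)
        {
            int edi = SetLessZeroExtended(esi, limit); // cmp / setl / movzx / mov edi,eax
            if (edi == 0) break;                       // test edi,edi; jne not taken
            count++;
            esi++;
        }
        return count;
    }

    // Direct-branch shape: branch on the comparison itself ("cmp" then "jl"),
    // with no 0/1 temporary in between.
    public static int CountDirectBranch(int limit)
    {
        int count = 0;
        for (int esi = 0; esi < limit; esi++) count++;
        return count;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(LoopConditions.CountDebugShape(10));   // 10
        Console.WriteLine(LoopConditions.CountDirectBranch(10)); // 10
    }
}
```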
0000002f mov ecx,dword ptr ds:[022B303Ch]
00000035 call 785D9074
The first line above loads a reference to our string from memory into the ECX register, a general-purpose register. ECX is normally used to pass the first argument to a method, and in an instance method ECX always holds this. EDX holds the next argument, and any further arguments are pushed onto the stack.
The next instruction performs the actual call. We will discuss later how to find out which method is being called. For now we can refer to the source code, which shows that this is clearly Console.WriteLine. The code then increments the index and goes back to run the body of the loop again.
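The register rule described above can be written down as a tiny model. `X86ArgLocations.Locate` is a hypothetical helper that applies only the rule stated in the text for the 32-bit CLR: the first argument goes in ECX, the second in EDX, and the rest on the stack, with `this` counted as the first argument of an instance method. The real convention has more rules, for example for arguments that do not fit in a 32-bit register, and this sketch ignores them.

```csharp
using System;
using System.Collections.Generic;

static class X86ArgLocations
{
    // Hypothetical model of the rule described above. Simplification:
    // every argument is assumed to fit in one 32-bit register.
    public static List<string> Locate(bool isInstance, params string[] parameters)
    {
        var args = new List<string>();
        if (isInstance) args.Add("this"); // `this` is the hidden first argument
        args.AddRange(parameters);

        var result = new List<string>();
        for (int k = 0; k < args.Count; k++)
        {
            string where = k == 0 ? "ecx" : k == 1 ? "edx" : "stack";
            result.Add($"{args[k]} -> {where}");
        }
        return result;
    }
}

class Program
{
    static void Main()
    {
        // Static Console.WriteLine(string): the string goes in ECX.
        Console.WriteLine(string.Join(", ", X86ArgLocations.Locate(false, "value")));
        // prints "value -> ecx"

        // An instance call such as writer.WriteLine(value): this in ECX, value in EDX.
        Console.WriteLine(string.Join(", ", X86ArgLocations.Locate(true, "value")));
        // prints "this -> ecx, value -> edx"
    }
}
```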
However, even our trivial example produces noticeable waste. Here is one case:
00000009 xor eax,eax
0000000b mov dword ptr [ebp-10h],eax
0000000e xor eax,eax
00000010 mov dword ptr [ebp-1Ch],eax
We set EAX to 0 twice in a row. This happens because we are running in debug mode, where no optimization is performed. The code is still compiled by the JIT, but the JIT is not allowed to perform any clever optimizations.
Next, let's look at the optimized code.
There are a couple of pitfalls when trying to view optimized code:
1) By default, the debugger disables JIT optimization. (It took me a long time to realize that I had been looking at unoptimized code.)
2) You also have to deal with the effect of the "Just My Code" option on optimized code.
I first saw the solution to this problem in a post by Vance Morrison. Thanks to Vance: this problem had troubled me for a long time, and until then I had been reading raw disassembly without any source code.
To solve this problem, follow these steps:
1) Open Tools > Options > Debugging > General.
2) Make sure that "Suppress JIT optimization on module load" is not selected.
3) Make sure that "Enable Just My Code" is not selected.
Vance also recommends opening the Advanced Build Settings and setting Debug Info for the Release build to pdb-only. We can then run this code the same way as before.
Another way to view the JIT-optimized code is to build in Release mode, start the executable without the debugger, and then attach Visual Studio to the running process.
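For the attach approach, a small test harness helps. This sketch uses hypothetical names (`BuildInfo`, `Loop`). It reports whether the compiler marked the assembly with `DebuggableAttribute.IsJITOptimizerDisabled` (normally true for a Debug build and false for a Release build) and then waits for Enter so you can attach. The loop sits in its own `NoInlining` method, which the JIT compiles on its first call, after the pause. The attribute tells you how the assembly was built; it cannot tell you whether an attached debugger later suppressed optimization.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;

static class BuildInfo
{
    // Reads the DebuggableAttribute that the compiler emits on the assembly.
    // Returns false when the attribute is absent (optimizer not disabled).
    public static bool JitOptimizerDisabled(Assembly assembly)
    {
        var attr = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attr != null && attr.IsJITOptimizerDisabled;
    }
}

class Program
{
    static void Main()
    {
        bool disabled = BuildInfo.JitOptimizerDisabled(typeof(Program).Assembly);
        Console.WriteLine($"JIT optimizer disabled by build settings: {disabled}");
        Console.WriteLine("Attach the Visual Studio debugger now, then press Enter...");
        Console.ReadLine();
        Loop();
    }

    // NoInlining keeps the loop in its own method with its own disassembly.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Loop()
    {
        if (Debugger.IsAttached) Debugger.Break(); // stop here, then open the disassembly
        for (int i = 0; i < 10; i++)
        {
            Console.WriteLine("Hello World!");
        }
    }
}
```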
Either method lets us view the optimized code, which looks like this:
for (int i = 0; i < 10; i++) {
00000000 push esi
00000001 xor esi,esi
Console.WriteLine("Hello World!");
00000003 cmp dword ptr ds:[02271084h],0
0000000a jne 00000016
0000000c mov ecx,1
00000011 call 786FC654
00000016 mov ecx,dword ptr ds:[02271084h]
0000001c mov edx,dword ptr ds:[0227307Ch]
00000022 mov eax,dword ptr [ecx]
00000024 call dword ptr [eax+000000D8h]
for (int i = 0; i < 10; i++) {
0000002a add esi,1
0000002d cmp esi,0Ah
00000030 jl 00000003
00000032 pop esi
}
}
00000033 ret
Wow, this time there is far less code than before: JIT optimization really does work well. This is why inspecting the actual native code, rather than the IL, is so important, because the JIT often optimizes by recognizing patterns in the IL. Alert readers may notice that more code is now generated inside our loop. That looks bad at first, but it actually means the optimizer has inlined the Console.WriteLine method for us, which saves a call on every iteration. Inlining is an important optimization, and I will cover it in a later post.
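Here is one reading of the extra code in the loop, offered as an interpretation rather than a certainty. It checks whether a static field holding the console's output writer is still null (`cmp ... ,0` / `jne`), runs a one-time initialization if so (`mov ecx,1` / `call`), and then makes a virtual call with the writer in ECX and the string in EDX (`call dword ptr [eax+000000D8h]`). `MiniConsole` below is a hypothetical stand-in written only to show that shape: a lazily initialized static field followed by a virtual `TextWriter.WriteLine` call. It is not the actual BCL source.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for System.Console, not the real implementation.
static class MiniConsole
{
    static TextWriter s_out;      // plays the static field tested by "cmp dword ptr ds:[...],0"
    public static int InitCount;  // how many times the slow path ran (for the demo)

    // Slow path, taken only while s_out is still null.
    static void InitializeOut()
    {
        s_out = Console.Out;
        InitCount++;
    }

    public static TextWriter Out
    {
        get
        {
            if (s_out == null) InitializeOut(); // "jne" skips this once s_out is set
            return s_out;
        }
    }

    // Once inlined into the loop, this becomes: writer in ECX, string in EDX,
    // then a virtual call through the writer's method table.
    public static void WriteLine(string value) => Out.WriteLine(value);
}

class Program
{
    static void Main()
    {
        for (int i = 0; i < 10; i++)
            MiniConsole.WriteLine("Hello World!");
        Console.WriteLine($"initialized {MiniConsole.InitCount} time(s)"); // initialized 1 time(s)
    }
}
```

The null check is cheap after the first iteration, and the virtual call remains the only call left in the loop.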
Now that we know how to view both optimized and unoptimized code in the debugger, we have a good start. In the following posts I will build a solid foundation for understanding how the JIT optimizes code in general, and we will also look at some tools and see how they can help us produce better code.
Hope to see you there.
Original article:
http://codebetter.com/blogs/gregyoung/archive/2006/06/09/146298.aspx
Resources from the original article:
http://en.wikipedia.org/wiki/X86
http://www.codeguru.com/csharp/.net/net_general/il/article.php/c4635/ (IL tutorial)
http://burks.brighton.ac.uk/burks/language/asm/asmtut/asm1.htm (ASM tutorial)