Starting with such simple code

Source: Internet
Author: User

Some time ago, in my company's technical chat group, we discussed which of the two methods below is faster.

    public class Demo
    {
        private int count;
        const int maxNum = int.MaxValue;

        public void Run1()
        {
            for (int i = 0; i < maxNum; i++)
            {
                this.count++;
            }
            Console.WriteLine(this.count);
            Console.ReadKey();
        }

        public void Run2()
        {
            int temp = 0;
            for (int i = 0; i < maxNum; i++)
            {
                temp++;
            }
            this.count = temp;
            Console.WriteLine(this.count);
            Console.ReadKey();
        }
    }

Many people correctly sense that Run2 is faster. But why? Let's first look at the IL for the two methods.

    Demo.Run1:
    IL_0000:  nop
    IL_0001:  ldc.i4.0
    IL_0002:  stloc.0     // i
    IL_0003:  br.s        IL_0019
    IL_0005:  nop
    IL_0006:  ldarg.0
    IL_0007:  dup
    IL_0008:  ldfld       UserQuery+Demo.count
    IL_000D:  ldc.i4.1
    IL_000E:  add
    IL_000F:  stfld       UserQuery+Demo.count
    IL_0014:  nop
    IL_0015:  ldloc.0     // i
    IL_0016:  ldc.i4.1
    IL_0017:  add
    IL_0018:  stloc.0     // i
    IL_0019:  ldloc.0     // i
    IL_001A:  ldc.i4      FF FF FF 7F
    IL_001F:  clt
    IL_0021:  stloc.1     // CS$4$0000
    IL_0022:  ldloc.1     // CS$4$0000
    IL_0023:  brtrue.s    IL_0005
    IL_0025:  ldarg.0
    IL_0026:  ldfld       UserQuery+Demo.count
    IL_002B:  call        System.Console.WriteLine
    IL_0030:  nop
    IL_0031:  call        System.Console.ReadKey
    IL_0036:  pop
    IL_0037:  ret

    Demo.Run2:
    IL_0000:  nop
    IL_0001:  ldc.i4.0
    IL_0002:  stloc.0     // temp
    IL_0003:  ldc.i4.0
    IL_0004:  stloc.1     // i
    IL_0005:  br.s        IL_0011
    IL_0007:  nop
    IL_0008:  ldloc.0     // temp
    IL_0009:  ldc.i4.1
    IL_000A:  add
    IL_000B:  stloc.0     // temp
    IL_000C:  nop
    IL_000D:  ldloc.1     // i
    IL_000E:  ldc.i4.1
    IL_000F:  add
    IL_0010:  stloc.1     // i
    IL_0011:  ldloc.1     // i
    IL_0012:  ldc.i4      FF FF FF 7F
    IL_0017:  clt
    IL_0019:  stloc.2     // CS$4$0000
    IL_001A:  ldloc.2     // CS$4$0000
    IL_001B:  brtrue.s    IL_0007
    IL_001D:  ldarg.0
    IL_001E:  ldloc.0     // temp
    IL_001F:  stfld       UserQuery+Demo.count
    IL_0024:  ldarg.0
    IL_0025:  ldfld       UserQuery+Demo.count
    IL_002A:  call        System.Console.WriteLine
    IL_002F:  nop
    IL_0030:  call        System.Console.ReadKey
    IL_0035:  pop
    IL_0036:  ret

    Demo..ctor:
    IL_0000:  ldarg.0
    IL_0001:  call        System.Object..ctor
    IL_0006:  ret

The difference is between the ldfld and ldloc instructions (and the matching stfld and stloc): Run1 loads and stores the field inside the loop, while Run2 works on a local. These are all IL instructions, though, so it is hard to judge speed from them. IL is only converted to machine code at run time, by the JIT compiler. So let's look at the methods from the assembly perspective: attach WinDbg to the running program and use the !U command to view the JIT-generated code. The plain u command works too; the advantage of !U is that it resolves the call targets into readable method names.

    0:003> !U 00007ff963450380
    Normal JIT generated code
    Demo.Run2()
    Begin 00007ff963450380, size 4d
    >>> 00007ff9`63450380 53              push    rbx
    00007ff9`63450381 4883ec30        sub     rsp,30h
    00007ff9`63450385 488bd9          mov     rbx,rcx
    00007ff9`63450388 33c0            xor     eax,eax
    00007ff9`6345038a 8bc8            mov     ecx,eax
    00007ff9`6345038c 0f1f4000        nop     dword ptr [rax]
    00007ff9`63450390 83c101          add     ecx,1
    00007ff9`63450393 83c001          add     eax,1
    00007ff9`63450396 3dffffff7f      cmp     eax,7FFFFFFFh
    00007ff9`6345039b 7cf3            jl      00007ff9`63450390
    00007ff9`6345039d 894b08          mov     dword ptr [rbx+8],ecx
    00007ff9`634503a0 8b5b08          mov     ebx,dword ptr [rbx+8]
    00007ff9`634503a3 e8b816aa5e      call    mscorlib_ni+0x391a60 (00007ff9`c1ef1a60) (System.Console.get_Out(), mdToken: 06000776)
    00007ff9`634503a8 4c8bd8          mov     r11,rax
    00007ff9`634503ab 498b03          mov     rax,qword ptr [r11]
    00007ff9`634503ae 8bd3            mov     edx,ebx
    00007ff9`634503b0 498bcb          mov     rcx,r11
    00007ff9`634503b3 ff9068010000    call    qword ptr [rax+168h]
    00007ff9`634503b9 33d2            xor     edx,edx
    00007ff9`634503bb 488d4c2420      lea     rcx,[rsp+20h]
    00007ff9`634503c0 e8bbd5035f      call    mscorlib_ni+0x92d980 (00007ff9`c248d980) (System.Console.ReadKey(Boolean), mdToken: 060007aa)
    00007ff9`634503c5 90              nop
    00007ff9`634503c6 4883c430        add     rsp,30h
    00007ff9`634503ca 5b              pop     rbx
    00007ff9`634503cb f3c3            rep ret

    0:003> !U 00007ff963450310
    Normal JIT generated code
    Demo.Run1()
    Begin 00007ff963450310, size 51
    >>> 00007ff9`63450310 53              push    rbx
    00007ff9`63450311 4883ec30        sub     rsp,30h
    00007ff9`63450315 33d2            xor     edx,edx
    00007ff9`63450317 660f1f840000000000 nop  word ptr [rax+rax]
    00007ff9`63450320 8b4108          mov     eax,dword ptr [rcx+8]
    00007ff9`63450323 83c001          add     eax,1
    00007ff9`63450326 894108          mov     dword ptr [rcx+8],eax
    00007ff9`63450329 83c201          add     edx,1
    00007ff9`6345032c 81faffffff7f    cmp     edx,7FFFFFFFh
    00007ff9`63450332 7cec            jl      00007ff9`63450320
    00007ff9`63450334 8b5908          mov     ebx,dword ptr [rcx+8]
    00007ff9`63450337 e82417aa5e      call    mscorlib_ni+0x391a60 (00007ff9`c1ef1a60) (System.Console.get_Out(), mdToken: 06000776)
    00007ff9`6345033c 4c8bd8          mov     r11,rax
    00007ff9`6345033f 498b03          mov     rax,qword ptr [r11]
    00007ff9`63450342 8bd3            mov     edx,ebx
    00007ff9`63450344 498bcb          mov     rcx,r11
    00007ff9`63450347 ff9068010000    call    qword ptr [rax+168h]
    00007ff9`6345034d 33d2            xor     edx,edx
    00007ff9`6345034f 488d4c2420      lea     rcx,[rsp+20h]
    00007ff9`63450354 e827d6035f      call    mscorlib_ni+0x92d980 (00007ff9`c248d980) (System.Console.ReadKey(Boolean), mdToken: 060007aa)
    00007ff9`63450359 90              nop
    00007ff9`6345035a 4883c430        add     rsp,30h
    00007ff9`6345035e 5b              pop     rbx
    00007ff9`6345035f f3c3            rep ret

We only care about the instructions before the Console calls, and there the difference is easy to see. In Run2, the loop body is essentially two add instructions, each operating on a register: eax is i++ and ecx is temp++.

    00007ff9`63450390 83c101          add     ecx,1
    00007ff9`63450393 83c001          add     eax,1

The main instructions in Run1's loop body are shown below. Compared with Run2, there are two extra mov instructions that operate on eax.

    00007ff9`63450320 8b4108          mov     eax,dword ptr [rcx+8]
    00007ff9`63450323 83c001          add     eax,1
    00007ff9`63450326 894108          mov     dword ptr [rcx+8],eax
    00007ff9`63450329 83c201          add     edx,1

Note: for the register conventions, see http://zh.wikipedia.org/wiki/x86call conventions. On x64 Windows the first argument is passed in rcx, so in an instance method rcx holds the this reference.
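The difference can also be checked empirically instead of by reading disassembly. The following is a minimal timing sketch, not the author's measurement: the method names IncrementField and IncrementLocal and the use of System.Diagnostics.Stopwatch are assumptions added for illustration. They mirror the Run1 and Run2 loop patterns but return the count instead of waiting on Console.ReadKey.

```csharp
using System;
using System.Diagnostics;

public class Demo
{
    private int count;

    // Run1 pattern: the field is incremented on every iteration.
    public int IncrementField(int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            this.count++;
        }
        return this.count;
    }

    // Run2 pattern: a local is incremented and stored to the field once.
    public int IncrementLocal(int iterations)
    {
        int temp = 0;
        for (int i = 0; i < iterations; i++)
        {
            temp++;
        }
        this.count = temp;
        return this.count;
    }
}

public static class Program
{
    public static void Main()
    {
        const int iterations = int.MaxValue;

        // Time the field-increment loop.
        var sw = Stopwatch.StartNew();
        int a = new Demo().IncrementField(iterations);
        sw.Stop();
        Console.WriteLine("field: {0} in {1} ms", a, sw.ElapsedMilliseconds);

        // Time the local-increment loop.
        sw.Restart();
        int b = new Demo().IncrementLocal(iterations);
        sw.Stop();
        Console.WriteLine("local: {0} in {1} ms", b, sw.ElapsedMilliseconds);
    }
}
```

Timings depend on the runtime version, the JIT, and whether the build is optimized, so treat the numbers as indicative only. A newer JIT may also optimize either loop differently from the listings in this article.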
The mov instructions access the memory at rcx + 8, which is where the field lives inside the object on the managed heap. The first instruction loads count from memory into the eax register, the second adds 1 to the register, and the third writes the register value back to memory. The fourth instruction is the i++ operation, which we can ignore for now.

To satisfy everyone's curiosity, here is the object layout. We can see that offset 8 holds the count value we are after.

    0:003> !do …00000253baf0
    Name:        Demo
    MethodTable: …
    Size:        24(0x18) bytes
    File:        C:\Users\cuiweifu\AppData\Local\Temp\SnippetCompilerTemp\Program\output.exe
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name
    00007ff9c1f9f060  4000002        8         System.Int32  1 instance       2147483647 count

Now that we know what the instructions do, let's review some basics of computer organization. Ordered by access speed, from fastest to slowest: CPU > L1 > L2 > L3 > memory > hard disk > other peripherals. Registers are part of the CPU (for more information, see http://zh.wikipedia.org/wiki/register), so register operations are the fastest. Cache access time depends on which level the access hits. In this example, the result is essentially registers versus L1.

One thing worth mentioning: the increment in Run1 is served from L1, and the CPU decides when the value is actually written back to main memory. If Run1 is called concurrently from multiple threads, the output value is nondeterministic, because the load, add, and store are three separate instructions and threads can interleave between them and lose updates. In .NET, the way to avoid this problem is System.Threading.Interlocked.Increment(ref this.count); it is eventually translated into an atomic CPU instruction (as I recall, the instruction names differ between CPUs, so I won't name them here). This instruction updates the value atomically. Although this approach is atomic, it is slower than Run1. If you are interested in the specific reason, it is worth studying on your own.
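The multithreading point can be illustrated with a small sketch. This is an assumption-laden example rather than code from the original discussion: the Counter class, the RunOnThreads helper, and the thread and iteration counts are all hypothetical. Only Interlocked.Increment and the Thread API are real .NET calls.

```csharp
using System;
using System.Threading;

public class Counter
{
    private int count;

    public int Count { get { return count; } }

    // Plain increment: a separate load, add, and store, so updates can be lost.
    public void IncrementPlain() { count++; }

    // Atomic increment via System.Threading.Interlocked.
    public void IncrementAtomic() { Interlocked.Increment(ref count); }
}

public static class Program
{
    // Hypothetical helper: runs `action` perThread times on each of threadCount threads.
    public static void RunOnThreads(int threadCount, int perThread, Action action)
    {
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < perThread; i++) action();
            });
            threads[t].Start();
        }
        // Wait for every worker before reading the counter.
        foreach (var thread in threads) thread.Join();
    }

    public static void Main()
    {
        const int threadCount = 4;
        const int perThread = 1000000;

        // May print less than threadCount * perThread because of lost updates.
        var plain = new Counter();
        RunOnThreads(threadCount, perThread, plain.IncrementPlain);
        Console.WriteLine("plain:  " + plain.Count);

        // Always ends at threadCount * perThread.
        var atomic = new Counter();
        RunOnThreads(threadCount, perThread, atomic.IncrementAtomic);
        Console.WriteLine("atomic: " + atomic.Count);
    }
}
```

The plain counter's final value varies from run to run and may occasionally be correct by chance, while the atomic counter always reaches threadCount * perThread. The price is that every atomic increment is slower than a plain one.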
Aside: First of all, I would like to thank Yang Jie for pushing me to write this article. It is only a rough account, but I hope it helps. I wanted to cover the topic from the perspective of compiler theory, CPU instructions, and the operating system, but I found it hard to explain everything clearly in a single blog post with my limited writing ability. There is also too much to cover and too many related concepts to introduce, so this is only a first attempt. If you are interested in the detailed principles, feel free to discuss them with me.
