How Far Can the C++ Compiler Go in Optimizing Our Code?


Reposted from: http://www.kuqin.com/language/20120324/319283.html

A simple summation program:

TYPE s = 0;
for (int i = 0; i < SIZE; i++) {
    s += a[i];
}

Many people feel that this code is poorly written and that the compiler cannot generate good assembly from it, so they apply "optimizations" such as the following:

#include <iostream>
using namespace std;

int main(int argc, char** argv)
{
#define TYPE int
#define SIZE 10000

    TYPE* a = new TYPE[SIZE];
    for (int i = 0; i < SIZE; ++i) {
        a[i] = i;
    }

    // Sum, the usual version
    TYPE s = 0;
    for (int i = 0; i < SIZE; i++) {
        s += a[i];
    }
    cout << s << endl;

    // Version 1: treat the loop variable i as redundant and walk a pointer instead
    TYPE s2 = 0;
    TYPE* end = a + SIZE;
    for (; a != end;) {
        s2 += *(a++);
    }
    cout << s2 << endl;

    // Version 1 moved a to the end of the array; move it back to the start
    a = end - SIZE;

    // Version 2: treat the iteration count as too high and halve it
    TYPE s3 = 0;
    for (int i = 0; i < SIZE;) {  // only valid when SIZE is even
        s3 += a[i++];
        s3 += a[i++];
    }
    cout << s3 << endl;

    // Version 3: version 2 chains every addition through s3, which keeps the CPU
    // from executing the additions out of order, so the partial sums should go
    // into separate registers, as one would do in assembly.
    // Thanks to Menzi11's article for pointing out that data dependencies in a
    // program prevent out-of-order execution.
    // Plain C++ stands in for the assembly here. The original declared r1 and r2
    // with the `register` keyword, which was removed in C++17.
    TYPE r1 = 0;
    TYPE r2 = 0;
    for (int i = 0; i < SIZE;) {  // only valid when SIZE is even
        r1 += a[i++];
        r2 += a[i++];
    }
    TYPE s4 = r1 + r2;
    cout << s4 << endl;

    delete[] a;
    return 0;
}

Each of the versions above is reasonable in itself, but they all rest on the assumption that the compiler cannot generate efficient assembly code.

Let's look at what the compiler actually generates (VS2010, Release build):

for (int i = 0; i < SIZE; i++) {
    s += a[i];
013B1040  mov  ebx,dword ptr [eax+4]    // load a[3], a[7], a[11], ... into ebx
013B1043  add  ecx,dword ptr [eax-8]    // add a[0], a[4], a[8], ... to ecx
013B1046  add  edx,dword ptr [eax-4]    // add a[1], a[5], a[9], ... to edx
013B1049  add  esi,dword ptr [eax]      // add a[2], a[6], a[10], ... to esi
013B104B  add  dword ptr [ebp-4],ebx    // add ebx to the partial sum kept at [ebp-4]
013B104E  add  eax,10h
013B1051  dec  dword ptr [ebp-8]
013B1054  jne  main+40h (13B1040h)
}
cout << s << endl;
013B1056  mov  eax,dword ptr [ebp-4]
013B1059  add  eax,esi
013B105B  add  eax,edx
013B105D  mov  edx,dword ptr [__imp_std::endl (13B204Ch)]
013B1063  add  ecx,eax    // the three add instructions fold the partial sums in [ebp-4], esi, and edx into ecx, so ecx holds the final sum

As you can see, the compiler already generates very good code on its own: it eliminates the loop variable i, reduces the number of iterations by unrolling the loop, and removes the data dependency that would keep the CPU from executing the additions out of order.
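For readers less used to reading disassembly, the loop generated for SIZE = 10000 corresponds roughly to the hand-written C++ below. This is only a sketch for illustration, not compiler output: the function name sum_four_accumulators and the variables s0 to s3 are invented here, and the code assumes the element count is a multiple of four, as 10000 is.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original article): sums n ints using
// four independent accumulators, like the four partial sums in the listing.
// Assumes n is a multiple of 4, which holds for SIZE = 10000.
int sum_four_accumulators(const int* a, std::size_t n) {
    assert(n % 4 == 0);
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (std::size_t i = 0; i < n; i += 4) {
        s0 += a[i];      // a[0], a[4], a[8], ...
        s1 += a[i + 1];  // a[1], a[5], a[9], ...
        s2 += a[i + 2];  // a[2], a[6], a[10], ...
        s3 += a[i + 3];  // a[3], a[7], a[11], ...
    }
    // The four partial sums are combined only once, after the loop.
    return s0 + s1 + s2 + s3;
}
```

Because no accumulator depends on another inside the loop, the four additions of one iteration are independent of each other, which is the property the hand-written version 3 was trying to achieve with two registers.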

BTW:

One might ask: if SIZE is not an even number, can the compiler still generate similarly efficient assembly code?

When SIZE = 9999:

// When SIZE = 9999 the compiler keeps the partial sums in three registers. Perfect.
for (int i = 0; i < SIZE; i++) {
    s += a[i];
01341030  add  ecx,dword ptr [eax-8]
01341033  add  edx,dword ptr [eax-4]
01341036  add  esi,dword ptr [eax]
01341038  add  eax,0Ch
0134103B  dec  ebx
0134103C  jne  main+30h (1341030h)
}

When SIZE = 9997:

// When SIZE = 9997 things are a bit more complicated: a[0] through a[9995] are first summed into ecx and edx,
// then a[9996] is loaded into edi, and finally ecx and edi are added to edx.
// edx is pushed onto the stack and operator<< is called.
for (int i = 0; i < SIZE; i++) {
00D31024  xor  eax,eax
    s += a[i];
00D31026  add  ecx,dword ptr [esi+eax*4]
00D31029  add  edx,dword ptr [esi+eax*4+4]
00D3102D  add  eax,2
00D31030  cmp  eax,270Ch
00D31035  jl   main+26h (0D31026h)
for (int i = 0; i < SIZE; i++) {
00D31037  cmp  eax,270Dh
00D3103C  jge  main+41h (0D31041h)
    s += a[i];
00D3103E  mov  edi,dword ptr [esi+eax*4]
}
cout << s << endl;
00D31041  mov  eax,dword ptr [__imp_std::endl (0D3204Ch)]
00D31046  add  edx,ecx
00D31048  mov  ecx,dword ptr [__imp_std::cout (0D32050h)]
00D3104E  push eax
00D3104F  add  edx,edi
00D31051  push edx
00D31052  call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (0D32048h)]

The analysis above assumes that SIZE, the length of the array, is known at compile time. What does the compiler do when the length is not known?

TYPE MySum(TYPE* a, int size) {
    TYPE s = 0;
    for (int i = 0; i < size; ++i) {
        s += a[i];
    }
    return s;
}

Generated assembly code:

// First sum the elements in pairs (a[0] through a[size-2] when size is odd)
TYPE s = 0;
00ED100C  xor  esi,esi
for (int i = 0; i < size; ++i) {
00ED100E  xor  eax,eax
00ED1010  cmp  ebx,2
00ED1013  jl   MySum+27h (0ED1027h)
00ED1015  dec  ebx
    s += a[i];
00ED1016  add  ecx,dword ptr [edi+eax*4]      // add a[0], a[2], a[4], ... to ecx
00ED1019  add  edx,dword ptr [edi+eax*4+4]    // add a[1], a[3], a[5], ... to edx
00ED101D  add  eax,2
00ED1020  cmp  eax,ebx
00ED1022  jl   MySum+16h (0ED1016h)
00ED1024  mov  ebx,dword ptr [size]
for (int i = 0; i < size; ++i) {
00ED1027  cmp  eax,ebx    // check whether one last element still has to be added
00ED1029  jge  MySum+2Eh (0ED102Eh)
    s += a[i];
00ED102B  mov  esi,dword ptr [edi+eax*4]    // executed only when size is odd; skipped when size is even
00ED102E  add  edx,ecx
}
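Read back into C++, the listing above has roughly the following shape: a main loop that consumes two elements per iteration into two independent accumulators, followed by a single conditional step that picks up the last element when size is odd. The SIZE = 9997 listing follows the same pattern. The sketch below is a hand-written approximation for illustration only; the name my_sum_unrolled and its variable names are invented here and do not come from the compiler or the original article.

```cpp
// Hypothetical hand-written approximation of the code generated for MySum.
int my_sum_unrolled(const int* a, int size) {
    int even_sum = 0;  // plays the role of ecx: a[0], a[2], a[4], ...
    int odd_sum = 0;   // plays the role of edx: a[1], a[3], a[5], ...
    int tail = 0;      // plays the role of esi: the leftover element, if any
    int i = 0;

    // Main loop: runs while at least two elements remain.
    for (; i < size - 1; i += 2) {
        even_sum += a[i];
        odd_sum += a[i + 1];
    }

    // Tail: reached with i < size only when size is odd.
    if (i < size) {
        tail = a[i];
    }

    return even_sum + odd_sum + tail;
}
```

The extra comparison after the loop is the price of not knowing size in advance: the compiler cannot pick an unroll factor that divides the length, so it unrolls by two and handles the possible remainder separately.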

Summary: in most cases, the assembly code generated by a C++ compiler is on par with the best assembly code a person would write by hand.

More importantly, compilers keep being upgraded and adapted to new CPU instructions, new systems, and so on, whereas hand-written assembly code tends to be left behind.

It is very important for programmers to know what the compiler optimizes and how it does so.
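One way to check the summary's claim on your own toolchain is to time the plain loop against a hand-unrolled one. The harness below is a minimal sketch: the function names, the array length, and the repetition count are arbitrary choices made for this illustration, and timings depend heavily on the compiler, its flags, and the CPU, so no particular result is implied.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Plain loop, like the "usual version" in the article.
int sum_plain(const std::vector<int>& a) {
    int s = 0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i];
    return s;
}

// Hand-unrolled loop with two accumulators, like version 3 in the article,
// extended with a tail step so that odd lengths also work.
int sum_two_accumulators(const std::vector<int>& a) {
    int r1 = 0, r2 = 0;
    std::size_t i = 0;
    for (; i + 1 < a.size(); i += 2) {
        r1 += a[i];
        r2 += a[i + 1];
    }
    if (i < a.size()) r1 += a[i];
    return r1 + r2;
}

// Times `repeats` calls of f(a) and returns the elapsed milliseconds.
// a must be non-empty: a[0] is overwritten before each call so that the
// input changes, which discourages the compiler from hoisting the call
// out of the loop. The results are accumulated into sink for the same reason.
template <typename F>
double time_ms(F f, std::vector<int>& a, int repeats, long long& sink) {
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        a[0] = r;
        sink += f(a);
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical driver: prints one timing per variant plus a checksum.
void run_comparison() {
    std::vector<int> a(10000);
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = static_cast<int>(i);

    long long sink = 0;
    const int repeats = 100000;
    double plain_ms = time_ms(sum_plain, a, repeats, sink);
    double unrolled_ms = time_ms(sum_two_accumulators, a, repeats, sink);

    std::cout << "plain loop:       " << plain_ms << " ms\n"
              << "two accumulators: " << unrolled_ms << " ms\n"
              << "(checksum " << sink << ")\n";
}
```

Call run_comparison() from main in an optimized build (for example /O2 with MSVC, or -O2 with GCC or Clang). If the article's conclusion holds for your compiler, the two timings should be close; a single run is noisy, so repeat the measurement before drawing conclusions.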
