C++ performance analysis (1)

Source: Internet
Author: User


Performance problems cannot be solved by "technology" alone. They usually span architecture, testing, and assumptions. Even so, an engineer has to start early and fix the small, "obvious" problems first. Otherwise they pile up, and, as the saying goes, a long dam can collapse because of a single ant nest.

Why does C++ always trail C in performance (see the latest results on sites such as http://benchmarksgame.alioth.debian.org/u32/performance.php?test=binarytrees)? I think there are three reasons:

1) The C++ compilers used in the tests do not apply the latest optimization techniques.

2) The tests do not take the added value of C++ into account.

3) The "subtlety" of C++ at the application layer (see my other blog posts on C++) intimidates ordinary programmers into sticking with "textbook use cases", so some side effects are never removed at the application layer.

I remember that more than ten years ago, when I was a developer at Microsoft, I wrote to Stan Lippman, author of one of the earliest C++ compilers (and at the time an architect on Microsoft VC++), about a series of C++ performance difficulties in our group. With his help we used techniques such as inlining and RVO (return value optimization) in key areas to solve the performance problems completely, and we also found several major bugs in VC++ along the way. I came to realize that most C++ performance problems stem from our own understanding of C++, and that most of them are not hard to solve.

Next we will use an example to compare and see how some subtle details affect program performance.

 

struct IntPair
{
    int ip1;
    int ip2;
    IntPair(int i1, int i2) : ip1(i1), ip2(i2) {}
    IntPair(int i1) : ip1(i1), ip2(i1) {}   // also acts as an implicit int -> IntPair conversion
};

// Calc sum (using value semantics)
int Sum1(IntPair p)
{
    return p.ip1 + p.ip2;
}

// Calc sum (using reference semantics)
int Sum2(IntPair& p)
{
    return p.ip1 + p.ip2;
}

// Calc sum (using const reference semantics)
int Sum3(const IntPair& p)
{
    return p.ip1 + p.ip2;
}

For the simple struct above there are three Sum functions, but do they perform the same? We use the following program to test:

#include <ctime>   // clock, clock_t, CLOCKS_PER_SEC

// Runs test case t for loop iterations and returns the elapsed time in seconds.
double Sum(int t, int loop)
{
    using namespace std;
    if (t == 1)
    {
        // Case 1: Sum1 (by value) with a temporary constructed on every call
        clock_t begin = clock();
        int x = 0;
        for (int i = 0; i < loop; ++i)
        {
            x += Sum1(IntPair(1, 2));
        }
        clock_t end = clock();
        return double(end - begin) / CLOCKS_PER_SEC;
    }
    else if (t == 2)
    {
        // Case 2: Sum1 (by value) with an existing object, copied on every call
        clock_t begin = clock();
        int x = 0;
        IntPair p(1, 2);
        for (int i = 0; i < loop; ++i)
        {
            x += Sum1(p);
        }
        clock_t end = clock();
        return double(end - begin) / CLOCKS_PER_SEC;
    }
    else if (t == 3)
    {
        // Case 3: Sum2 (by reference)
        clock_t begin = clock();
        int x = 0;
        IntPair p(1, 2);
        for (int i = 0; i < loop; ++i)
        {
            x += Sum2(p);
        }
        clock_t end = clock();
        return double(end - begin) / CLOCKS_PER_SEC;
    }
    else if (t == 4)
    {
        // Case 4: Sum3 (by const reference), no conversion
        clock_t begin = clock();
        int x = 0;
        IntPair p(1, 2);
        for (int i = 0; i < loop; ++i)
        {
            x += Sum3(p);
        }
        clock_t end = clock();
        return double(end - begin) / CLOCKS_PER_SEC;
    }
    else if (t == 5)
    {
        // Case 5: Sum3 (by const reference) with an implicit int -> IntPair conversion
        clock_t begin = clock();
        int x = 0;
        for (int i = 0; i < loop; ++i)
        {
            x += Sum3(10);
        }
        clock_t end = clock();
        return double(end - begin) / CLOCKS_PER_SEC;
    }
    return 0;
}

We use five test cases: two calling styles each for Sum1 and Sum3, and one for Sum2. Each case makes 100,000 calls:

double sec = Sum(1, 100000);
printf("Sum1 (use ctor) time: %f\n", sec);
sec = Sum(2, 100000);
printf("Sum1 (use no c'tor) time: %f\n", sec);
sec = Sum(3, 100000);
printf("Sum2 time: %f\n", sec);
sec = Sum(4, 100000);
printf("Sum3 without conversion time: %f\n", sec);
sec = Sum(5, 100000);
printf("Sum3 with conversion time: %f\n", sec);

 

In Visual Studio 2010 the results are:

 

Case 1 18 ms

Case 2 9 ms

Case 3 6 ms

Case 4 7 ms

Case 5 12 ms

In other words, cases 1 and 5 are the slowest, and there is essentially no difference among the others.
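The timings above are only a few milliseconds, which is close to the resolution of clock() on many systems, and an optimizer is free to discard a loop whose result is never used. The following is a minimal sketch of an alternative harness, not the author's code: it assumes a C++11 compiler, uses std::chrono::steady_clock, and writes the running sum to a volatile variable so that it cannot be optimized away. TimedResult and TimeSum3WithTemporary are hypothetical names introduced only for this illustration.

```cpp
#include <chrono>

struct IntPair
{
    int ip1;
    int ip2;
    IntPair(int i1, int i2) : ip1(i1), ip2(i2) {}
    IntPair(int i1) : ip1(i1), ip2(i1) {}
};

int Sum3(const IntPair& p) { return p.ip1 + p.ip2; }

// Hypothetical helper type (not in the original article): carries both the
// elapsed time and the accumulated sum back to the caller.
struct TimedResult
{
    double seconds;
    long long total;
};

// Times `loop` calls of Sum3, constructing a temporary IntPair on each call.
TimedResult TimeSum3WithTemporary(int loop)
{
    using Clock = std::chrono::steady_clock;   // monotonic clock
    volatile long long sink = 0;               // volatile: every write must really happen
    const Clock::time_point begin = Clock::now();
    for (int i = 0; i < loop; ++i)
    {
        sink = sink + Sum3(IntPair(1, 2));
    }
    const Clock::time_point end = Clock::now();
    const std::chrono::duration<double> elapsed = end - begin;
    return TimedResult{elapsed.count(), sink};
}
```

Calling TimeSum3WithTemporary(100000) returns the elapsed seconds together with the sum, so the caller can print both. The absolute numbers will depend on the compiler, optimization level, and machine.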

Careful readers will easily see the following:

1) The performance problem in case 5 is that Sum3 relies on C++ implicit conversion, which silently turns the integer into a temporary IntPair object. This is an application-layer problem: if we really have to convert integers, we have to pay the performance cost.

2) The problem in case 1 is similar to case 5, because a temporary has to be created on every call. Of course, forcing the constructor to be inlined can reduce the cost of creating the temporary.

3) Case 2 invokes the compiler-generated copy constructor on every function call. However, the IntPair object is small, so the impact is negligible.

4) Case 3 performs consistently, but it accesses the data "indirectly" (for details, see a reference book), so it generates two more instructions than case 2. This has little effect on performance, presumably thanks to Intel's L1 and L2 caches.

* Note that if a member function only accesses data through this, the cache can be used to full effect unless the object is too large.

5) Cases 4 and 3 generate exactly the same code, so there should be no difference between them. const only matters at compile time; the generated code is the same with or without it.
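One way to check the points above without relying on timings is to count how many objects each calling style creates. The sketch below is an illustration under stated assumptions, not part of the original test: CountedPair, Counts, SumByValue, SumByConstRef and CountCase are hypothetical names, and CountedPair mirrors IntPair with counters added to its constructors.

```cpp
// Mirrors IntPair, but counts constructor and copy-constructor calls.
struct CountedPair
{
    static int ctorCalls;   // calls to the two user-written constructors
    static int copyCalls;   // calls to the copy constructor
    int ip1;
    int ip2;
    CountedPair(int i1, int i2) : ip1(i1), ip2(i2) { ++ctorCalls; }
    CountedPair(int i1) : ip1(i1), ip2(i1) { ++ctorCalls; }
    CountedPair(const CountedPair& o) : ip1(o.ip1), ip2(o.ip2) { ++copyCalls; }
};
int CountedPair::ctorCalls = 0;
int CountedPair::copyCalls = 0;

int SumByValue(CountedPair p) { return p.ip1 + p.ip2; }            // like Sum1
int SumByConstRef(const CountedPair& p) { return p.ip1 + p.ip2; }  // like Sum3

struct Counts
{
    int ctor;    // constructor calls made inside the loop
    int copy;    // copy-constructor calls made inside the loop
    int total;   // accumulated sum, so the loop has an observable result
};

// Runs case t (1, 2, 4 or 5 from the article) `loop` times and reports the counts.
Counts CountCase(int t, int loop)
{
    CountedPair p(1, 2);          // built once, before the counters are reset
    CountedPair::ctorCalls = 0;
    CountedPair::copyCalls = 0;
    int x = 0;
    for (int i = 0; i < loop; ++i)
    {
        if (t == 1)      x += SumByValue(CountedPair(1, 2));  // temporary per call
        else if (t == 2) x += SumByValue(p);                  // copy per call
        else if (t == 4) x += SumByConstRef(p);               // no new object
        else if (t == 5) x += SumByConstRef(10);              // implicit conversion per call
    }
    return Counts{CountedPair::ctorCalls, CountedPair::copyCalls, x};
}
```

With this instrumentation, cases 1 and 5 construct one object per call, case 2 makes one copy per call, and case 4 creates nothing inside the loop, which matches the ordering of the timings.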

There are far too many performance issues to cover here. This article only scratches the surface, but it has already touched on two major performance risks in C++:

A) Temporary variables

B) Implicit conversion (silent conversion)
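A common way to keep risk B from going unnoticed is to mark the single-argument constructor explicit, so that the conversion has to be written out at the call site. The following is a minimal sketch assuming a C++11 compiler; StrictPair, SumStrict and SumOfTen are hypothetical names, and this is a variation on the article's struct rather than the author's code.

```cpp
#include <type_traits>

struct StrictPair
{
    int ip1;
    int ip2;
    StrictPair(int i1, int i2) : ip1(i1), ip2(i2) {}
    explicit StrictPair(int i1) : ip1(i1), ip2(i1) {}   // explicit: no silent int -> StrictPair
};

// The compiler now confirms that an int no longer converts implicitly.
static_assert(!std::is_convertible<int, StrictPair>::value,
              "int must not convert to StrictPair implicitly");

int SumStrict(const StrictPair& p) { return p.ip1 + p.ip2; }

int SumOfTen()
{
    // return SumStrict(10);            // would not compile: the conversion is not implicit
    return SumStrict(StrictPair(10));   // the temporary is now visible at the call site
}
```

The cost of the temporary is the same as before. The difference is that the programmer has to ask for it, so it cannot slip into a hot loop unnoticed.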

 

September, Seattle


