C++ Performance Profiling (I)

Source: Internet
Author: User

Analysis of C++ Performance (I)

Performance problems cannot be solved by "technology" alone; they are usually a combined problem of architecture, testing, and assumptions. An engineer, however, has to start small and fix the "obvious" small problems first. Otherwise small problems accumulate, and, as the saying goes, a thousand-mile levee collapses because of an ant hole.

Why does C++ always rank behind C in performance (see the latest test results on sites such as http://benchmarksgame.alioth.debian.org/u32/performance.php?test=binarytrees)? I think there are three reasons:

1) The C++ compilers used for the tests do not apply the latest optimization techniques.

2) The tests do not take the added value of C++ into account.

3) The "subtlety" of C++ at the application level (see my other blog posts about C++) intimidates ordinary programmers, who fall back on "textbook use cases", so some side effects are never removed at the application level.

I remember that more than 10 years ago, when I was developing at Microsoft, I consulted Stan Lippman, author of the first C++ compiler (then an architect on Microsoft VC++), about a series of C++ performance challenges our team was facing. With his help we applied techniques such as inline and RVO at key locations, solved the performance problems completely, and also found several small bugs in VC++. I came to realize that most C++ performance problems come from our own shallow understanding of C++, and that most of them are not hard to solve.

Here is a comparison that shows how subtle details affect program performance.

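    struct Intpair
    {
        int ip1;
        int ip2;

        // Two-argument constructor
        Intpair(int i1, int i2) : ip1(i1), ip2(i2) {}

        // One-argument constructor; not explicit, so it allows implicit int -> Intpair conversion
        Intpair(int i1) : ip1(i1), ip2(i1) {}
    };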

    // Calc sum (using value semantics)
    int Sum1(Intpair p)
    {
        return p.ip1 + p.ip2;
    }

    // Calc sum (using ref semantics)
    int Sum2(Intpair& p)
    {
        return p.ip1 + p.ip2;
    }

    // Calc sum (using const ref semantics)
    int Sum3(const Intpair& p)
    {
        return p.ip1 + p.ip2;
    }

For this simple struct there are three Sum functions that do exactly the same thing, but do they perform the same? We test with the following program:

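    #include <ctime>    // clock(), clock_t, CLOCKS_PER_SEC

    // Runs test case t for `loop` iterations and returns the elapsed time in seconds.
    // Note: the constructor arguments in cases 1-4 were lost in the source text;
    // (1, 2) is used here only as a placeholder.
    double Sum(int t, int loop)
    {
        using namespace std;
        if (t == 1)
        {
            // Case 1: Sum1 (by value), constructing a temporary on every call
            clock_t begin = clock();
            int x = 0;
            for (int i = 0; i < loop; ++i)
            {
                x += Sum1(Intpair(1, 2));
            }
            clock_t end = clock();
            return double(end - begin) / CLOCKS_PER_SEC;
        }
        else if (t == 2)
        {
            // Case 2: Sum1 (by value), passing an existing object
            clock_t begin = clock();
            int x = 0;
            Intpair p(1, 2);
            for (int i = 0; i < loop; ++i)
            {
                x += Sum1(p);
            }
            clock_t end = clock();
            return double(end - begin) / CLOCKS_PER_SEC;
        }
        else if (t == 3)
        {
            // Case 3: Sum2 (by reference)
            clock_t begin = clock();
            int x = 0;
            Intpair p(1, 2);
            for (int i = 0; i < loop; ++i)
            {
                x += Sum2(p);
            }
            clock_t end = clock();
            return double(end - begin) / CLOCKS_PER_SEC;
        }
        else if (t == 4)
        {
            // Case 4: Sum3 (by const reference), passing an existing object
            clock_t begin = clock();
            int x = 0;
            Intpair p(1, 2);
            for (int i = 0; i < loop; ++i)
            {
                x += Sum3(p);
            }
            clock_t end = clock();
            return double(end - begin) / CLOCKS_PER_SEC;
        }
        else if (t == 5)
        {
            // Case 5: Sum3 (by const reference), implicit conversion from int
            clock_t begin = clock();
            int x = 0;
            for (int i = 0; i < loop; ++i)
            {
                x += Sum3(10);
            }
            clock_t end = clock();
            return double(end - begin) / CLOCKS_PER_SEC;
        }
        return 0;
    }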

We use five cases: Sum1 and Sum3 are each called in two different ways, and Sum2 is called in one way. We test 100,000 calls for each case:

    // Requires #include <cstdio> for printf; these statements belong inside main().
    double sec = Sum(1, 100000);
    printf("Sum1 (use ctor) time: %f\n", sec);

    sec = Sum(2, 100000);
    printf("Sum1 (use no ctor) time: %f\n", sec);

    sec = Sum(3, 100000);
    printf("Sum2 time: %f\n", sec);

    sec = Sum(4, 100000);
    printf("Sum3 without conversion time: %f\n", sec);

    sec = Sum(5, 100000);
    printf("Sum3 with conversion time: %f\n", sec);

We tested in Visual Studio, with the following results:

Use case 1: 18 ms

Use case 2: 9 ms

Use case 3: 6 ms

Use case 4: 7 ms

Use case 5: 12 ms

In other words, use cases 1 and 5 are the slowest, and the others are essentially the same.
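The timings above rely on clock() and on an accumulator x that is never read, so an optimizing compiler may shorten or remove the loops. As a hedged alternative, here is a minimal sketch of the same kind of measurement using std::chrono::steady_clock and a volatile sink. The helper name TimeMs is hypothetical and not part of the original test; Intpair and Sum3 are repeated from the article so that the block stands alone. It assumes a C++11 compiler.

```cpp
#include <chrono>

// Repeated from the article so this sketch is self-contained.
struct Intpair
{
    int ip1;
    int ip2;
    Intpair(int i1, int i2) : ip1(i1), ip2(i2) {}
    Intpair(int i1) : ip1(i1), ip2(i1) {}
};

int Sum3(const Intpair& p)
{
    return p.ip1 + p.ip2;
}

// Hypothetical helper (not in the original test): calls f() `loop` times
// and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double TimeMs(F f, int loop)
{
    volatile int sink = 0;  // volatile writes keep the optimizer from discarding the loop
    auto begin = std::chrono::steady_clock::now();
    for (int i = 0; i < loop; ++i)
    {
        sink = sink + f();
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}
```

A call such as TimeMs([] { return Sum3(10); }, 100000) would measure use case 5. Absolute numbers will differ by compiler, optimization flags, and machine, so only relative comparisons on the same setup are meaningful.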

The attentive reader will easily see the following:

1) Use case 5 is slow because Sum3 relies on C++ implicit conversion to turn the integer into a temporary Intpair automatically. This is an application-level problem: if we really have to convert from an integer, we have to pay this price.

2) The problem with use case 1 is similar to that of use case 5, because a temporary has to be created on every call. Of course, you can force the constructor to be inlined to reduce the cost of creating the temporary.

3) Use case 2 invokes the compiler-generated copy constructor before each function call, but because the Intpair object is small, the impact is negligible.

4) Use case 3 performs consistently, but it accesses the object indirectly (see my blog post about references for details), so it generates two more instructions than use case 2. The impact on performance is small, however, which is presumably related to Intel's L1 and L2 caches.

* Note that if an OOP member function only accesses data members through this, it can generally make full use of the cache, unless the object is too large.

5) Use case 4 and use case 3 generate exactly the same code, so there should be no difference between them. const only matters at compile time; the generated code is the same with or without it.
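One way to check the explanations above without reading assembly is to count how many objects each call style constructs. The sketch below uses a hypothetical CountedPair, an instrumented copy of Intpair that is not part of the original test, with a hand-written copy constructor so that copies become visible. The function names SumByValue and SumByConstRef are also assumptions that stand in for Sum1 and Sum3. It only counts constructions and says nothing about timing.

```cpp
// Hypothetical instrumented variant of Intpair: counts constructor calls.
struct CountedPair
{
    static int ctorCalls;  // calls to the one- and two-argument constructors
    static int copyCalls;  // calls to the copy constructor

    int ip1;
    int ip2;

    CountedPair(int i1, int i2) : ip1(i1), ip2(i2) { ++ctorCalls; }
    CountedPair(int i1) : ip1(i1), ip2(i1) { ++ctorCalls; }
    CountedPair(const CountedPair& other) : ip1(other.ip1), ip2(other.ip2) { ++copyCalls; }
};

int CountedPair::ctorCalls = 0;
int CountedPair::copyCalls = 0;

// Stand-in for Sum1: the parameter is a separate object.
int SumByValue(CountedPair p)
{
    return p.ip1 + p.ip2;
}

// Stand-in for Sum3: the parameter refers to the caller's object or to a temporary.
int SumByConstRef(const CountedPair& p)
{
    return p.ip1 + p.ip2;
}
```

With an existing object p, SumByValue(p) increments copyCalls once per call (use case 2), SumByConstRef(p) increments nothing (use case 4), and SumByConstRef(10) increments ctorCalls once per call because a temporary is built from the integer (use case 5).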

Performance is far too large a topic for one article, and this one only skims the surface, but it has already touched on two of the biggest performance pitfalls in C++:

a) Temporary variables

b) Implicit conversion (silent conversion)
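A common guard against pitfall b) is to mark single-argument constructors explicit, so that the conversion has to be spelled out at the call site. The sketch below uses a hypothetical ExplicitPair and SumExplicit (neither is in the original article) and assumes C++11 for static_assert and <type_traits>. The temporary is still created when you convert on purpose; explicit only keeps it from happening silently.

```cpp
#include <type_traits>

// Hypothetical variant of Intpair whose one-argument constructor is explicit.
struct ExplicitPair
{
    int ip1;
    int ip2;

    ExplicitPair(int i1, int i2) : ip1(i1), ip2(i2) {}
    explicit ExplicitPair(int i1) : ip1(i1), ip2(i1) {}
};

int SumExplicit(const ExplicitPair& p)
{
    return p.ip1 + p.ip2;
}

// An int can still be used to construct an ExplicitPair directly...
static_assert(std::is_constructible<ExplicitPair, int>::value,
              "direct construction from int is allowed");

// ...but it no longer converts implicitly, so SumExplicit(10) would not compile.
static_assert(!std::is_convertible<int, ExplicitPair>::value,
              "implicit conversion from int is disabled");
```

A caller now has to write SumExplicit(ExplicitPair(10)), which makes the cost of the temporary visible in the source code.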

2014-6-20 Seattle


