Analysis of C++ Performance (Part 1)
Performance problems cannot be solved by "technology" alone; they usually involve architecture, testing, assumptions, and other factors together. For an engineer, though, the work has to start small, with the "obvious" little problems. Otherwise small problems pile up, and as the saying goes, a thousand-mile levee is breached by a single ant hole.
Why does C++ always rank behind C in performance (see the latest test results on sites like http://benchmarksgame.alioth.debian.org/u32/performance.php?test=binarytrees)? I think there are three reasons:
1) The C++ compilers used in the tests do not apply the latest optimization techniques.
2) The tests do not account for the added value of C++.
3) The "subtlety" of C++ at the application level (see my other blog posts about C++) makes ordinary programmers wary, so they stick to "textbook usage" and some side effects are never removed at the application level.
I remember that more than 10 years ago, when I was a developer at Microsoft, I consulted Stan Lippman, an author of the first C++ compiler and at that time an architect on Microsoft VC++, about a series of C++ performance challenges our team was facing. With his help we applied techniques such as inline and RVO in key places, solved the performance problems completely, and also found several small bugs in VC++. I realized that most C++ performance problems come from our shallow understanding of C++, and most of them are not hard to solve.
Here is a comparison that shows how subtle details affect program performance.
#include <cstdio>   // printf
#include <ctime>    // clock, clock_t, CLOCKS_PER_SEC

struct Intpair
{
    int ip1;
    int ip2;
    Intpair(int i1, int i2) : ip1(i1), ip2(i2) {}
    Intpair(int i1) : ip1(i1), ip2(i1) {}   // non-explicit: allows int -> Intpair conversion
};

// Calc sum (using value semantics)
int Sum1(Intpair p)
{
    return p.ip1 + p.ip2;
}

// Calc sum (using reference semantics)
int Sum2(Intpair& p)
{
    return p.ip1 + p.ip2;
}

// Calc sum (using const reference semantics)
int Sum3(const Intpair& p)
{
    return p.ip1 + p.ip2;
}
For this simple struct there are three Sum functions that do exactly the same thing, but do they perform the same? We use the following function to test:
// Runs test case t for `loop` iterations and returns the elapsed time in seconds.
// Note: the constructor arguments for cases 1 to 4 were lost in the original
// listing; (1, 2) is an assumed placeholder.
double Sum(int t, int loop)
{
    using namespace std;
    if (t == 1)
    {
        // Case 1: Sum1 (by value), constructing a temporary on every call
        clock_t begin = clock();
        int x = 0;
        for (int i = 0; i < loop; ++i)
        {
            x += Sum1(Intpair(1, 2));
        }
        clock_t end = clock();
        return double(end - begin) / CLOCKS_PER_SEC;
    }
    else if (t == 2)
    {
        // Case 2: Sum1 (by value), passing an existing object
        clock_t begin = clock();
        int x = 0;
        Intpair p(1, 2);
        for (int i = 0; i < loop; ++i)
        {
            x += Sum1(p);
        }
        clock_t end = clock();
        return double(end - begin) / CLOCKS_PER_SEC;
    }
    else if (t == 3)
    {
        // Case 3: Sum2 (by reference)
        clock_t begin = clock();
        int x = 0;
        Intpair p(1, 2);
        for (int i = 0; i < loop; ++i)
        {
            x += Sum2(p);
        }
        clock_t end = clock();
        return double(end - begin) / CLOCKS_PER_SEC;
    }
    else if (t == 4)
    {
        // Case 4: Sum3 (by const reference), no conversion
        clock_t begin = clock();
        int x = 0;
        Intpair p(1, 2);
        for (int i = 0; i < loop; ++i)
        {
            x += Sum3(p);
        }
        clock_t end = clock();
        return double(end - begin) / CLOCKS_PER_SEC;
    }
    else if (t == 5)
    {
        // Case 5: Sum3 (by const reference), implicit int -> Intpair conversion
        clock_t begin = clock();
        int x = 0;
        for (int i = 0; i < loop; ++i)
        {
            x += Sum3(10);
        }
        clock_t end = clock();
        return double(end - begin) / CLOCKS_PER_SEC;
    }
    return 0;
}
We use five cases: Sum1 and Sum3 are each called in two different ways, and Sum2 in one. We tested with 100,000 calls:
double sec = Sum(1, 100000);
printf("Sum1 (use ctor) time: %f\n", sec);
sec = Sum(2, 100000);
printf("Sum1 (use no ctor) time: %f\n", sec);
sec = Sum(3, 100000);
printf("Sum2 time: %f\n", sec);
sec = Sum(4, 100000);
printf("Sum3 without conversion time: %f\n", sec);
sec = Sum(5, 100000);
printf("Sum3 with conversion time: %f\n", sec);
We tested in Visual Studio, with the following results:
Use case 1: 18 ms
Use case 2: 9 ms
Use case 3: 6 ms
Use case 4: 7 ms
Use case 5: 12 ms
In other words, use cases 1 and 5 are the slowest, and the others are basically the same.
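A caveat for anyone reproducing these numbers: the test loops never use x afterwards, so an optimizing build may remove the loop entirely. The following is only a sketch of one way to guard against that, not the harness used for the results above. TimeLoopMs, Pair, SumByValue and TimeByValueMs are hypothetical names introduced for illustration, and std::chrono replaces clock() here.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls fn() `loop` times and returns elapsed milliseconds.
// Each result is added to a volatile sink so the optimizer must keep the loop.
// It may still inline fn or elide temporaries inside it.
template <typename Fn>
double TimeLoopMs(Fn fn, int loop)
{
    assert(loop >= 0);
    volatile int sink = 0;
    auto begin = std::chrono::steady_clock::now();
    for (int i = 0; i < loop; ++i)
    {
        sink = sink + fn();
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}

// Hypothetical stand-in for the article's struct, kept minimal.
struct Pair
{
    int a;
    int b;
    Pair(int x, int y) : a(x), b(y) {}
};

int SumByValue(Pair p) { return p.a + p.b; }

// Times the "construct a temporary on every call" pattern of use case 1.
double TimeByValueMs(int loop)
{
    return TimeLoopMs([] { return SumByValue(Pair(1, 2)); }, loop);
}
```

Calling TimeByValueMs(100000) returns the elapsed time in milliseconds for 100,000 calls. Absolute numbers depend heavily on the compiler and optimization level, so they should only be compared within a single build.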
The attentive reader will easily see:
1) Use case 5 is slow because Sum3 relies on C++ implicit conversion to turn the integer into an Intpair automatically. This is an application-level problem: if we must convert an integer, we must pay this cost.
2) Use case 1 has a problem similar to use case 5: a temporary object has to be created on every call.
3) Use case 2 benefits from the VC++ optimization of by-value arguments, which avoids calling the copy constructor, but I do not know whether all compilers apply this optimization. It makes use case 2 perform almost as well as use case 3.
4) Use case 3 performs consistently, but it accesses the object indirectly (see my blog post about references for details), so it generates two more instructions than use case 2. The impact on performance is small, however, which I estimate is related to Intel's instruction pipeline.
5) Use cases 4 and 3 generate exactly the same code, so there should be no difference. const matters only at compile time; the generated code does not depend on it.
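To make points 1) and 2) visible without a profiler, one option is to count constructor calls. The sketch below uses a hypothetical instrumented copy of the struct (CountedPair, SumByConstRef and the two Count functions are names invented here, not part of the original test) and compares the call pattern of use case 5 with that of use case 4.

```cpp
#include <cassert>

// Hypothetical instrumented copy of the article's struct: every constructor
// bumps a counter, so the number of objects built can be observed.
struct CountedPair
{
    static int constructed;   // objects constructed so far, copies included
    int ip1;
    int ip2;
    CountedPair(int i1, int i2) : ip1(i1), ip2(i2) { ++constructed; }
    CountedPair(int i1) : ip1(i1), ip2(i1) { ++constructed; }
    CountedPair(const CountedPair& o) : ip1(o.ip1), ip2(o.ip2) { ++constructed; }
};
int CountedPair::constructed = 0;

int SumByConstRef(const CountedPair& p) { return p.ip1 + p.ip2; }

// Passes an int on every call, as in use case 5, and returns how many
// CountedPair objects were constructed along the way.
int CountImplicitConversions(int loop)
{
    assert(loop >= 0);
    CountedPair::constructed = 0;
    int x = 0;
    for (int i = 0; i < loop; ++i)
    {
        x += SumByConstRef(10);   // int -> CountedPair temporary on each call
    }
    (void)x;
    return CountedPair::constructed;
}

// Builds one object up front and passes it by const reference, as in use case 4.
int CountReusedObject(int loop)
{
    assert(loop >= 0);
    CountedPair::constructed = 0;
    CountedPair p(1, 2);
    int x = 0;
    for (int i = 0; i < loop; ++i)
    {
        x += SumByConstRef(p);    // binds directly, no new object
    }
    (void)x;
    return CountedPair::constructed;
}
```

With a loop count of 1000, CountImplicitConversions reports 1000 constructions while CountReusedObject reports 1. Because the counter is an observable side effect, the compiler cannot silently skip these constructions, which is not necessarily true of the plain struct in an optimized build.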
The topic of performance is vast, and this article only skims the surface, but it has touched on two of the biggest C++ performance pitfalls:
- Temporary variables
- Implicit conversions (silent conversions)
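One common way to keep silent conversions from creeping in is to mark single-argument constructors explicit, so that each conversion, and its cost, has to be written out at the call site. This is a minimal sketch under that assumption; ExplicitPair, SumExplicit and SumOfTen are hypothetical names, and the original article does not say whether this was the fix used.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical variant of the article's struct with an explicit
// single-argument constructor.
struct ExplicitPair
{
    int ip1;
    int ip2;
    ExplicitPair(int i1, int i2) : ip1(i1), ip2(i2) {}
    explicit ExplicitPair(int i1) : ip1(i1), ip2(i1) {}
};

int SumExplicit(const ExplicitPair& p) { return p.ip1 + p.ip2; }

// The implicit int -> ExplicitPair conversion no longer exists, so a call
// such as SumExplicit(10) is rejected at compile time.
static_assert(!std::is_convertible<int, ExplicitPair>::value,
              "int must not convert to ExplicitPair implicitly");

// The conversion is still available, but it has to be spelled out,
// which makes the temporary visible to whoever reads the call.
int SumOfTen()
{
    return SumExplicit(ExplicitPair(10));
}
```

The temporary is still created in SumOfTen; explicit does not remove the cost, it only makes it visible, which fits the point that this is an application-level decision.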
2014-6-20 Seattle