Introduction to C++ Performance Optimization Technology (1)

Tags: valgrind

 

By rushing out of the universe

2011-8-24

 

1. Performance Optimization Principles

Before discussing performance optimization techniques, two points must be made clear. First, the code must be well written: messy code (for example, code lacking comments or using vague names) is difficult to optimize. Second, there must be a good architectural design: performance optimization can only improve individual programs, it cannot rescue a poor architecture. That said, with the Internet as developed as it is, as long as you do not invent a framework purely out of your own imagination and you actively study architectures that others have used successfully, you will rarely end up with a poor architecture.

 

 

 

1.1 Counting Calls and Measuring Time Consumption of Functions and Code Segments

To count how many times a function is called, a simple counter is enough. A more general framework maintains a global table of counters: each time a function or code segment is entered, the corresponding counter is incremented by 1.
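A minimal sketch of such a counting framework (the names g_callCounts and COUNT_CALL are my own, not from the original text):

#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical global table: function/segment name -> number of times entered.
static std::map<std::string, uint64_t> g_callCounts;

// Increment the counter for the enclosing function each time it is entered.
#define COUNT_CALL() (++g_callCounts[__func__])

void Foo()
{
    COUNT_CALL();
    // ... actual work ...
}

int main()
{
    for (int i = 0; i < 10; ++i)
        Foo();
    printf("Foo was called %llu times\n", (unsigned long long)g_callCounts["Foo"]);
    return 0;
}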

To accurately measure the time consumed by a piece of code, we need a high-precision time function. gettimeofday is a good choice: it has a resolution of 1 microsecond and can be called several hundred thousand times per second. Note that a modern CPU executes on the order of billions of instructions per second, so in 1 microsecond it can execute thousands or even tens of thousands of instructions; for a short function, a single execution may take less than 1 microsecond. The most accurate timing method is the instruction provided by the CPU itself, rdtsc, which is accurate to a single clock cycle (one instruction takes a few CPU cycles).
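The GetTime()/GetTimeUS() helpers used in the snippets below are not defined in the original text; a minimal sketch based on gettimeofday might look like this:

#include <stdint.h>
#include <sys/time.h>

// Current wall-clock time in microseconds, built on gettimeofday (1 us resolution).
inline uint64_t GetTimeUS()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000ULL + (uint64_t)tv.tv_usec;
}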

 

 

 

Note that when the operating system schedules a program, it may move it to a different CPU core, and the tick counters of different cores are not synchronized, so timing with rdtsc can produce incorrect results. The solution is to call sched_setaffinity on Linux to force the process to run on a single fixed CPU core.
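A minimal sketch of pinning the current process to CPU core 0 with sched_setaffinity (PinToCore0 is my own helper name; error handling is kept to a bare minimum; compile with g++ on Linux, which defines _GNU_SOURCE by default):

#include <sched.h>
#include <stdio.h>

// Restrict the calling process (pid 0) to CPU core 0 so that rdtsc readings
// always come from the same core's time-stamp counter.
int PinToCore0()
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}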

Reference code for measuring elapsed time:

// Basic way to measure the time consumed by a code segment
uint64_t preTime = GetTime();
// ... code segment ...
uint64_t timeUsed = GetTime() - preTime;

// Improved approach: an RAII helper records the elapsed time in its destructor
uint64_t g_timeUsed = 0;   // global variable that receives the measured time

struct TimeHelper {
    uint64_t preTime;
    TimeHelper() : preTime(GetTime()) {}
    ~TimeHelper() {
        g_timeUsed = GetTime() - preTime;
    }
};

// Usage
{
    TimeHelper th;
    // ... code segment ...
}
// g_timeUsed now holds the elapsed time

 

// Read the CPU tick counter. The cpuid instruction (which serializes the pipeline)
// costs about 300 cycles; it can be skipped if extreme precision is not required.
inline uint64_t GetTickCPU()
{
    uint32_t op = 0;   // input: eax
    uint32_t eax;      // output: eax
    asm volatile(
        "pushl %%ebx\n\t"
        "cpuid\n\t"
        "popl %%ebx\n\t"
        : "=a"(eax) : "a"(op) : "ecx", "edx", "cc");

    uint64_t ret;
    asm volatile("rdtsc" : "=A"(ret));   // "=A" places the result in edx:eax (32-bit x86)
    return ret;
}

 

// Measure the CPU clock rate in ticks per second. The first call takes about 0.01 seconds.
inline uint64_t GetCpuTickPerSecond()
{
    static uint64_t ret = 0;
    if (ret == 0)
    {
        const uint64_t gap = 1000000 / 100;   // busy-wait for 10,000 us = 0.01 s
        uint64_t endTime = GetTimeUS() + gap;
        uint64_t curTime = 0;
        uint64_t tickStart = GetTickCPU();
        do {
            curTime = GetTimeUS();
        } while (curTime < endTime);
        uint64_t tickCount = GetTickCPU() - tickStart;
        ret = tickCount * 1000000L / (curTime - endTime + gap);
    }
    return ret;
}

 

1.2 Other Strategies

In addition to basic counting of execution times and elapsed time, there are several other performance analysis strategies:

A. Probability-based (sampling)

By repeatedly interrupting the program and recording which function it is in at each interruption, the function that shows up most often is the one consuming the most time.

B. Event-based

Some CPUs can notify the process when a hardware event occurs. If the events include, for example, the number of L1 cache misses, we can find out why the program runs slowly.

C. Avoid interference

External interference is the biggest taboo in performance testing. For example, if memory is insufficient, memory reads turn into disk (swap) operations.

 

 

 

1.3 Performance Analysis Tool: callgrind

Valgrind is the most commonly used profiling tool on Linux, not least because it is free. Callgrind is one of the tools in the valgrind suite. Its main function is to simulate the CPU cache: it can count hits and misses in the multi-level cache and collect call counts and time statistics for each function.

Callgrind's implementation mechanism (it simulates program execution rather than running it natively) brings several drawbacks: the program is slowed down severely, highly optimized programs are not well supported, and the timing statistics can have large errors.
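For reference, a typical way to run callgrind against a program (test_gcd is a placeholder name for the test program below; the output file name contains the process id):

valgrind --tool=callgrind ./test_gcd    # run under callgrind; produces callgrind.out.<pid>
callgrind_annotate callgrind.out.<pid>  # print a per-function summary of the counts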

 

 

We wrote a simple test program to exercise the common performance analysis tools. The code is as follows:

// Test program; relies on GetTickCPU() from section 1.1 and a PERFOMANCE macro
// that accumulates the time spent in a function into a global counter.
#include <stdio.h>

// Compute the greatest common divisor
inline int gcd(int m, int n)
{
    PERFOMANCE("gcd");   // macro: globally accumulate the time spent in this function
    int d = 0;
    do {
        d = m % n;
        m = n;
        n = d;
    } while (d > 0);
    return m;
}

// Main function
int main()
{
    int g = 0;
    uint64_t preTime = GetTickCPU();
    for (int idx = 1; idx < 1000000; idx++)
        g += gcd(1234134, idx);
    uint64_t time = GetTickCPU() - preTime;

    printf("%d, %llu\n", g, (unsigned long long)time);
    return 0;
}
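The PERFOMANCE macro used above is not defined in the original text; one plausible implementation, combining the TimeHelper idea from section 1.1 with a global per-name accumulator, might look like this (the names PerfScope and g_perfTicks are assumptions of this sketch, not part of the original code):

#include <stdint.h>
#include <map>
#include <string>

// Hypothetical global table: block name -> accumulated CPU ticks.
static std::map<std::string, uint64_t> g_perfTicks;

// RAII helper: adds the ticks spent between construction and destruction
// (measured with GetTickCPU() from section 1.1) to the named counter.
struct PerfScope {
    const char* name;
    uint64_t start;
    PerfScope(const char* n) : name(n), start(GetTickCPU()) {}
    ~PerfScope() { g_perfTicks[name] += GetTickCPU() - start; }
};

// Time the enclosing scope under the given name (spelling kept as in the test program).
#define PERFOMANCE(name) PerfScope perfScope_(name)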

 

The callgrind running result is as follows:

 

 

 

We then analyzed the callgrind output on Windows and obtained the following results:

 

 

 

1.4 gprof Performance Analysis

gprof is the GNU profiling tool that ships with g++. It instruments every function with extra code to measure how much time the function consumes. In principle it should work on highly optimized code, but in practice it is not friendly to code compiled with -O2, which may be related to where the instrumentation is inserted (after code optimization). gprof's design means it has only a small impact on the program's running speed.
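A typical gprof workflow on Linux (test_gcd is just a placeholder name for the test program above): compile with -pg, run the binary once to produce gmon.out, then feed both to gprof:

g++ -pg -O0 test_gcd.cpp -o test_gcd    # build with profiling instrumentation
./test_gcd                              # run; writes gmon.out in the current directory
gprof ./test_gcd gmon.out               # print the flat profile and call graph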

 

 

The following is the gprof result for the same program:

 

 

We can see that this result is much more accurate than that calculated by callgrind.
