Take You for a Spin Through Visual Studio: Performance Analysis and Optimization

The previous article, "Take You for a Spin Through Visual Studio: Multithreaded Development in VC++", covered the main ways to use multithreading in VC++. Multithreading is an effective way to improve performance and handle concurrency. In commercial software development, performance is an important metric, and optimizing a program's performance is important work.

Find Performance Bottlenecks

The 80/20 rule (the Pareto principle) applies to many things: a small part, about 20%, matters most, while the remaining 80%, although the majority, is secondary. Program code is similar: about 20% of the code (or even less) determines the application's performance. So in practice we concentrate on the most time-consuming 20% of the code; that code is the program's performance bottleneck, and it is where optimization pays off.

Common optimization methods:

I won't cover this part myself; see the article "Performance Tuning Strategies" instead, since I'm not confident I could write it any better.

If you don't need that much depth, the article "Common Performance Tuning Methods for C++ Programs" is also a good read.

Application case

We will work through a concrete example so the explanation is less dry and easier to follow.

Recall that a prime number is an integer greater than 1 whose only divisors are 1 and itself. Let Sn denote the sum of the integers from 1 to n (Sn = 1 + 2 + 3 + ... + n). The task is to compute Sn for every prime n between 10000 and 100000.

You might think this problem is easy and, without much thought, quickly write code like this:

```cpp
#include <cstdint>   // int64_t (64-bit integer type)
#include <iostream>
#include <windows.h>

// Time from January 1, 1601 00:00:00.000 to January 1, 1970 00:00:00.000, in units of 100 ns
#define EPOCHFILETIME (116444736000000000ULL)

// Gets the current system time, in microseconds (us)
int64_t GetSysTimeMicros()
{
    FILETIME ft;
    LARGE_INTEGER li;
    int64_t tt = 0;
    GetSystemTimeAsFileTime(&ft);
    li.LowPart = ft.dwLowDateTime;
    li.HighPart = ft.dwHighDateTime;
    // Number of microseconds from January 1, 1970 00:00:00.000 (UTC) to now
    tt = (li.QuadPart - EPOCHFILETIME) / 10;
    return tt;
}

// Calculates the sum of all integers from 1 to n
int64_t CalculateSum(int n)
{
    if (n < 0) {
        return -1;
    }
    int64_t sum = 0;
    for (int i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}

// Determines whether the integer n is prime
bool IsPrime(int n)
{
    if (n < 2) {
        return false;
    }
    for (int i = 2; i < n; i++) {
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}

void PrintPrimeSum()
{
    int64_t startTime = GetSysTimeMicros();
    int count = 0;
    int64_t sum = 0;
    for (int i = 10000; i <= 100000; i++) {
        if (IsPrime(i)) {
            sum = CalculateSum(i);
            std::cout << sum << "\t";
            count++;
            if (count % 10 == 0) {   // ten values per line
                std::cout << std::endl;
            }
        }
    }
    int64_t usedTime = GetSysTimeMicros() - startTime;
    int second = static_cast<int>(usedTime / 1000000);
    int64_t temp = usedTime % 1000000;
    int millise = static_cast<int>(temp / 1000);
    int micros = static_cast<int>(temp % 1000);
    std::cout << "Execution time: " << second << "s " << millise << "' " << micros << "\"" << std::endl;
}

int main()
{
    PrintPrimeSum();
    return 0;
}
```

Running it takes 9s 659' 552" (9 seconds, 659 milliseconds and 552 microseconds). That is probably not the result you want (it is too slow); if you are happy with it, you can skip the rest of this article.

VS Performance Analysis Tools

Selecting a performance analysis tool

To open a profiling session, choose Debug > Start Diagnostic Tools Without Debugging (or press Alt+F2). In VS2013, this command is in the Analyze menu.


Performance analysis

CPU Usage

Measures CPU usage; mainly used to find code that consumes a lot of CPU time (CPU bottlenecks).

GPU Usage

Measures GPU usage; often used for graphics applications such as DirectX programs, mainly to determine whether the bottleneck is on the CPU or the GPU.

Memory Usage

Measures the application's memory usage and helps track down memory problems.

Performance Wizard

The Performance (Monitoring) Wizard examines the program's performance bottlenecks comprehensively. It is the most commonly used option and is described below.

Performance (Monitoring) Wizard
    1. Specify the profiling method;

      Profiling methods
      CPU Sampling:
      Collects statistical samples to monitor CPU-intensive applications with low overhead. This greatly reduces the profiling overhead for computation-heavy programs.
      Instrumentation:
      Collects complete statistics: measures exact function call counts and elapsed times.
      .NET memory allocation:
      Tracks managed memory allocations. As far as I can tell, it only applies to managed code (such as C#), so it is generally not needed for native C++ code.
      Resource contention data (concurrency):
      Detects threads that wait on other threads; useful for multithreaded (concurrent) programs.
    2. Select the module or application to profile;
    3. Start profiling.
Performance Analysis Report

When profiling finishes, Visual Studio generates an analysis report, which contains the results we need.


Overview of performance analysis reports

View Types

Several views are available, and you can switch between them:
Summary: an overview of the entire report.
Call Tree: shows the calling relationships between functions as a tree.
Modules: shows the time spent in each program module that is called, such as different DLLs.
Caller/Callee: shows, as numeric values, the functions that call a given function and the functions it calls.
Functions (function statistics): shows each function's execution time and number of calls as numeric values.
Marks:
Processes:
Function Details: a graphical view of the callers, the current function and the child functions it calls, showing each one's share of the time.



Call Tree

Function Details

function statistics

Special terminology

If this is your first time reading such a report, it may be hard to follow. You first need to know a few terms (they are the columns shown in the Call Tree and Functions views):
Num of Calls: number of times the function was called
Elapsed Inclusive Time: elapsed time including the time spent in the functions it calls
Elapsed Exclusive Time: elapsed time excluding the time spent in the functions it calls
Avg Elapsed Inclusive Time: average inclusive elapsed time per call
Avg Elapsed Exclusive Time: average exclusive elapsed time per call
Module Name: the module the function belongs to, typically an executable (.exe) or a dynamic library (.dll); code from a static library (.lib) is linked into the module that uses it.

If you are still confused, it will all make sense once you understand the difference between exclusive and inclusive.

Exclusive versus inclusive

The inclusive value (samples or time) is the total execution time of a function, including the time spent in the child functions it calls.
The exclusive value is the time spent executing the function's own body only; it excludes the time spent in its child functions (the whole subtree of callees).

Solving the Application Case

We now have an overview of how to use the VS2015 performance analysis tools. Let's get back to the point and solve the application case from above.

1. Select the Function Details view. Starting from the root function, keep selecting the entry with the largest percentage until you reach PrintPrimeSum. You will see something like this:


Identify performance bottlenecks 1
We can see that I/O takes nearly 60% of the time (49.4% + 9.7%), so I/O is the biggest bottleneck. In fact, anyone with some programming experience knows that printing to the console is slow. We only need the results; they don't all have to be printed to the console (where they are hard to read anyway). Saving them to a file instead is faster than writing to the console.

Note: the percentages shown should be shares of the inclusive time.

Now that we know the bottleneck, change the code to remove it:

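```cpp
#include <fstream>   // std::ofstream; the other includes and helpers are as in the first listing

void PrintPrimeSum()
{
    int64_t startTime = GetSysTimeMicros();
    // Write the results to a file instead of the console
    std::ofstream outfile;
    outfile.open("D:\\test\\primesum.dat", std::ios::out | std::ios::app);
    if (!outfile.is_open()) {
        std::cout << "Failed to open output file" << std::endl;
        return;
    }
    int count = 0;
    int64_t sum = 0;
    for (int i = 10000; i <= 100000; i++) {
        if (IsPrime(i)) {
            sum = CalculateSum(i);
            outfile << sum << "\t";
            count++;
            if (count % 10 == 0) {   // ten values per line
                outfile << std::endl;
            }
        }
    }
    outfile.close();
    int64_t usedTime = GetSysTimeMicros() - startTime;
    int second = static_cast<int>(usedTime / 1000000);
    int64_t temp = usedTime % 1000000;
    int millise = static_cast<int>(temp / 1000);
    int micros = static_cast<int>(temp % 1000);
    std::cout << "Execution time: " << second << "s " << millise << "' " << micros << "\"" << std::endl;
}
```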

Running it again, the time drops to 3s 798' 218". A clear improvement!

2. This is still not enough. Let's look for further problems by profiling the new code again.


Identify performance bottlenecks 2

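This time the IsPrime function takes 62% of the time, so it is now the bottleneck. Can we improve the algorithm? On reflection, the prime test above is the most naive approach possible: it tries every divisor from 2 to n - 1. Two small changes make it much faster. Even numbers greater than 2 are never prime, so only odd divisors need to be tried; and if n = a × b with a ≤ b, then a × a ≤ n, so it is enough to try divisors up to √n: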

```cpp
// Determines whether the integer n is prime
bool IsPrime(int n)
{
    if (n < 2) {
        return false;
    }
    if (n == 2) {
        return true;
    }
    // Exclude multiples of 2
    if (n % 2 == 0) {
        return false;
    }
    // If no odd i with i * i <= n (that is, i <= sqrt(n)) divides n, n is prime
    for (int i = 3; i * i <= n; i += 2) {
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}
```

Running it again, the time drops to 1s 312' 75", roughly a third of the previous time.

3. This is still a bit slow; let's see whether it can be optimized further. Profile the new code again.


Identify performance bottlenecks 3

Now the CalculateSum function accounts for 88.5% of the time, so it is clearly the main factor limiting performance. Let's tackle it. Summing 1 to n is really just summing the arithmetic progression 1, 2, 3, ..., n, which has the closed form n(n + 1) / 2. The optimized code is as follows:

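```cpp
// Calculates the sum of all integers from 1 to n
int64_t CalculateSum(int n)
{
    if (n < 0) {
        return -1;
    }
    // (n * (1 + n)) / 2, computed in 64 bits: in int arithmetic,
    // n * (1 + n) overflows once n exceeds 46340
    return (static_cast<int64_t>(n) * (1 + n)) / 2;
}
```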
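The closed form used in CalculateSum follows from writing the sum forwards and backwards and adding the two lines term by term; each of the n pairs adds up to n + 1:

$$
\begin{aligned}
S_n &= 1 + 2 + \cdots + n \\
S_n &= n + (n-1) + \cdots + 1 \\
2S_n &= \underbrace{(n+1) + (n+1) + \cdots + (n+1)}_{n\ \text{terms}} = n(n+1) \\
S_n &= \frac{n(n+1)}{2}
\end{aligned}
$$

For example, for the smallest prime in the range, $S_{10007} = 10007 \cdot 10008 / 2 = 50\,075\,028$. Because one of $n$ and $n+1$ is always even, the division by 2 is exact in integer arithmetic.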

Running it once more, the time drops to 0s 91' 6", well under a second, which basically meets our requirements.

Summary

Performance tuning is an iterative process: find a bottleneck, improve it, measure again, and repeat until the program meets the application's requirements. This example solved the problem using only one metric in one view (each function's percentage of the total time). For large, complex applications, combine several metrics across several views to pinpoint the real performance bottlenecks.

Previous article:
Take You for a Spin Through Visual Studio: Multithreaded Development in VC++

Next article:
Take You for a Spin Through Visual Studio: Unit Testing
