Use gprof to Optimize Your C/C++ Programs

Source: Internet
Author: User

  gprof offers a simple but effective way to optimize C/C++ programs, and it makes it easy to identify the code that deserves attention. A simple case study shows how gprof helped reduce an application's running time from more than 3 minutes to under 5 seconds by identifying and optimizing two key data structures.
gprof dates back to the SIGPLAN Symposium on Compiler Construction in 1982. It has since become a standard tool on various UNIX platforms.


Profiling in a nutshell
The concept of profiling is very simple: by recording when each function is entered and when it returns, we can work out which parts of the program take the most run time. This sounds like a lot of effort, but fortunately it is not. We only need to add one extra option ('-pg') when compiling with gcc, run the compiled program (to collect the profiling data), and then run 'gprof' to present the results in a more convenient form.
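To make the idea concrete, here is a minimal sketch of what recording entry and exit times by hand could look like. ScopedTimer, profileTable, and sumTo are hypothetical names invented for this illustration and are not part of gprof. gprof itself counts calls and samples the program counter instead of timing every call, so treat this as an illustration of the concept, not of gprof's internals.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical bookkeeping for hand-written instrumentation.
struct FunctionStats {
    long calls;      // how many times the function was entered
    double seconds;  // total time between entry and exit
};

// Table of per-function statistics, keyed by function name.
inline std::map<std::string, FunctionStats>& profileTable() {
    static std::map<std::string, FunctionStats> table;
    return table;
}

// Records the entry time on construction and adds the elapsed time to the
// table on destruction, that is, when the enclosing function returns.
class ScopedTimer {
public:
    explicit ScopedTimer(const std::string& name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        FunctionStats& stats = profileTable()[name_];  // value-initialized to zero on first use
        stats.calls += 1;
        stats.seconds += elapsed.count();
    }
private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Example function instrumented by hand (hypothetical workload).
long sumTo(long n) {
    assert(n >= 0);
    ScopedTimer timer("sumTo");
    long total = 0;
    for (long i = 1; i <= n; ++i) total += i;
    return total;
}
```

After a few calls to sumTo, profileTable()["sumTo"] holds the call count and the accumulated time. Compiling with '-pg' spares you from adding this kind of code to every function.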

  Case study: Pathalizer
I use a real program as an example. It is part of pathalizer: event2dot, an executable that translates a pathalizer 'events' file into a graphviz 'dot' file.
Simply put, it reads events from a file and stores them as graphs (with pages as nodes and transitions between pages as edges). These graphs are then combined into one large graph, which is saved as a graphviz 'dot' file.


Program timing
First, let's see how long the unoptimized program takes to run. On my computer, running event2dot with the example from the source code as input (about 55000 entries) takes more than three minutes:
real    3m36.316s
user    0m55.590s
sys     0m1.070s


Program Analysis
To profile with gprof, you must add the '-pg' option when compiling. We recompile the source code as follows:
g++ -pg dotgen.cpp readfile.cpp main.cpp graph.cpp config.cpp -o event2dot
Now we run event2dot again on the same test data as before. During this run, profiling data for event2dot is collected and saved in the file 'gmon.out'. You can view the result by running 'gprof event2dot | less'.
gprof shows that the following functions matter most:
  %   cumulative   self               self     total
 time   seconds   seconds     calls   s/call   s/call  name
 43.32     46.03    46.03 339952989     0.00     0.00  CompareNodes(Node *, Node *)
 25.06     72.66    26.63     55000     0.00     0.00  getnode(char *, NodeListNode *&)
 16.80     90.51    17.85 339433374     0.00     0.00  CompareEdges(Edge *, AnnotatedEdge *)
 12.70    104.01    13.50     51987     0.00     0.00  addAnnotatedEdge(AnnotatedGraph *, Edge *)
  1.98    106.11     2.10     51987     0.00     0.00  addEdge(Graph *, Node *, Node *)
  0.07    106.18     0.07         1     0.07     0.07  FindTreshold(AnnotatedEdge *, int)
  0.06    106.24     0.06         1     0.06    28.79  getGraphFromFile(char *, NodeListNode *&, Config *)
  0.02    106.26     0.02         1     0.02    77.40  summarize(GraphListNode *, Config *)
  0.00    106.26     0.00     55000     0.00     0.00  fixname(char *)
As you can see, the first function matters most: with about 43% of the total, it accounts for the largest share of the program's run time.
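The columns of the flat profile are related by simple arithmetic: '% time' is a function's self seconds divided by the total run time, and 'cumulative seconds' is the running sum of the self seconds. The sketch below checks this against the figures above; percentTime and cumulativeSeconds are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <vector>

// Percentage of the total run time spent inside one function itself.
double percentTime(double selfSeconds, double totalSeconds) {
    assert(totalSeconds > 0.0);
    return 100.0 * selfSeconds / totalSeconds;
}

// Running sum of self seconds, as in the "cumulative seconds" column.
std::vector<double> cumulativeSeconds(const std::vector<double>& self) {
    std::vector<double> out;
    double sum = 0.0;
    for (double s : self) {
        sum += s;
        out.push_back(sum);
    }
    return out;
}
```

For CompareNodes, percentTime(46.03, 106.26) gives roughly 43.3, which matches the first row. Adding the self seconds of the first three rows (46.03, 26.63, 17.85) gives 90.51, the cumulative value in the third row.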

  Optimization
The results above show that the program spends most of its time in the CompareNodes function. A quick grep shows that CompareNodes is called only by CompareEdges, which in turn is called only by addAnnotatedEdge. Both of these also appear in the list above. This is where we should optimize!
We noticed that addAnnotatedEdge traverses a linked list. Although a linked list is easy to implement, it is not the best data structure for this job. We decided to replace the linked list g->edges with a binary tree, which makes lookups much faster.
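The article does not show the actual code change, so the following is only a sketch of the idea under stated assumptions. Edge, EdgeLess, addEdgeToList, and addEdgeToTree are hypothetical stand-ins, and std::set (typically implemented as a balanced binary search tree) takes the place of whatever tree the author wrote.

```cpp
#include <cassert>
#include <list>
#include <set>
#include <tuple>

// Hypothetical stand-in for the program's edge type: a pair of node ids.
struct Edge {
    int from;
    int to;
};

// Strict weak ordering on edges, in the role CompareEdges plays in the profile.
struct EdgeLess {
    bool operator()(const Edge& a, const Edge& b) const {
        return std::tie(a.from, a.to) < std::tie(b.from, b.to);
    }
};

// Before: every insertion walks the whole list looking for an equal edge,
// which costs O(n) comparisons per call.
bool addEdgeToList(std::list<Edge>& edges, const Edge& e) {
    for (const Edge& existing : edges) {
        if (existing.from == e.from && existing.to == e.to) return false;
    }
    edges.push_back(e);
    return true;
}

// After: an ordered tree finds or inserts the edge in O(log n) comparisons.
bool addEdgeToTree(std::set<Edge, EdgeLess>& edges, const Edge& e) {
    return edges.insert(e).second;
}

// Usage sketch: both containers accept a new edge and reject its duplicate.
void demoEdges() {
    std::list<Edge> asList;
    std::set<Edge, EdgeLess> asTree;
    assert(addEdgeToList(asList, Edge{1, 2}) && addEdgeToTree(asTree, Edge{1, 2}));
    assert(!addEdgeToList(asList, Edge{1, 2}) && !addEdgeToTree(asTree, Edge{1, 2}));
    assert(asList.size() == 1 && asTree.size() == 1);
}
```

With tens of thousands of insertions, the list version performs on the order of n squared comparisons in total, which is consistent with the hundreds of millions of CompareEdges calls in the profile. The tree version needs only a handful of comparisons per insertion.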

  Result
Now let's take a look at the optimized running results:
real    2m19.314s
user    0m36.370s
sys     0m0.940s


Second time
Run gprof again for analysis:
  %   cumulative   self               self     total
 time   seconds   seconds     calls   s/call   s/call  name
 87.01     25.25    25.25     55000     0.00     0.00  getnode(char *, NodeListNode *&)
 10.65     28.34     3.09     51987     0.00     0.00  addEdge(Graph *, Node *, Node *)
The functions that used to take up most of the run time are no longer the biggest consumers. Instead, getnode now dominates with 87% of the run time. Let's optimize it too: replace the node list with a node hash table.
This is a huge improvement:
real    0m3.269s
user    0m0.830s
sys     0m0.090s
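A hash table keyed by node name turns each lookup into an average constant-time operation, which explains the size of the gain. The sketch below illustrates the idea with std::unordered_map. Node and NodeTable are hypothetical types, since the article does not show the real data structures, and the author's own hash table may well look different.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical node type; the real Node in event2dot is not shown in the article.
struct Node {
    std::string name;
    int visits;
};

// Hypothetical find-or-create lookup backed by a hash table.
class NodeTable {
public:
    // Returns the node with this name, creating it on first use.
    Node& getNode(const std::string& name) {
        auto it = nodes_.find(name);
        if (it == nodes_.end()) {
            it = nodes_.emplace(name, Node{name, 0}).first;
        }
        return it->second;
    }
    std::size_t size() const { return nodes_.size(); }
private:
    std::unordered_map<std::string, Node> nodes_;
};

// Usage sketch: repeated lookups of the same name reuse one node.
void demoNodes() {
    NodeTable table;
    table.getNode("/index.html").visits += 1;
    table.getNode("/index.html").visits += 1;  // same node, no duplicate created
    table.getNode("/about.html").visits += 1;
    assert(table.size() == 2);
    assert(table.getNode("/index.html").visits == 2);
}
```

Each of the 55000 lookups then costs roughly one hash computation and one comparison, instead of a walk through every node stored so far.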


Other C/C++ profilers
There are several other profilers that can use gprof data, such as KProf and cgprof. Although a graphical interface looks more comfortable, I personally find gprof on the command line more convenient to use.


Analyze programs in other languages
This article has shown how to use gprof to profile C/C++ programs. Similar tools exist for other languages. For Perl, you can use the Devel::DProf module: start your program with 'perl -d:DProf mycode.pl' and use dprofpp to view and analyze the results. If you can compile your Java program with gcj, you can also use gprof. However, only single-threaded Java code is currently supported.


Conclusion
As we have seen, profiling lets us quickly find the parts of a program that are worth optimizing. By optimizing exactly those parts, we reduced the running time of the example program from 3 minutes 36 seconds to less than 5 seconds.
