gprof offers an unusually simple yet very effective way to profile C++ programs and quickly identify the code that is worth optimizing. A short case study shows how gprof helped cut the runtime of a real application from more than 3 minutes to less than 5 seconds by identifying two key data structures and replacing them.
gprof dates back to the 1982 SIGPLAN Symposium on Compiler Construction and is now a standard tool on a wide range of Unix platforms.
Profiling in a nutshell
The concept of program profiling is simple: by recording when each function is entered and when it returns, we can work out which parts of the program take up most of the runtime. That sounds like a lot of effort, but luckily it takes very little. We only need to compile with GCC using one extra option ('-pg'), run the compiled program (to collect the profiling data), and then run 'gprof' to present the results in a form that is easy to analyze.
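To make the idea of recording entry and exit times concrete, here is a minimal hand-rolled sketch. It is not how gprof gathers its data (gprof relies on compiler-inserted call counting plus periodic sampling); ScopedTimer, ProfileEntry, profileTable and slowSum are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <map>
#include <string>

// Hypothetical bookkeeping for this sketch: call count and total time per function.
struct ProfileEntry {
    long calls = 0;
    std::chrono::nanoseconds total{0};
};

// One global table, keyed by function name.
std::map<std::string, ProfileEntry>& profileTable() {
    static std::map<std::string, ProfileEntry> table;
    return table;
}

// Notes the entry time when constructed and adds the elapsed time when destroyed,
// so placing one at the top of a function measures that whole call.
class ScopedTimer {
public:
    explicit ScopedTimer(const std::string& name)
        : name_(name), start_(std::chrono::steady_clock::now()) {
        assert(!name.empty());
    }
    ~ScopedTimer() {
        ProfileEntry& entry = profileTable()[name_];
        entry.calls += 1;
        entry.total += std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start_);
    }
private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Example function instrumented by hand.
long slowSum(int n) {
    ScopedTimer timer("slowSum");
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += i % 7;
    return sum;
}

// Prints one line per instrumented function: name, call count, total nanoseconds.
void printProfile() {
    for (const auto& item : profileTable()) {
        std::cout << item.first << " calls=" << item.second.calls
                  << " total_ns=" << item.second.total.count() << "\n";
    }
}
```

Calling slowSum a few times and then printProfile() gives a crude per-function summary. Doing this by hand for every function is exactly the effort that '-pg' and gprof save us.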
Case study: Pathalizer
As an example I use a real-world program that is part of Pathalizer: event2dot, an executable which translates a Pathalizer 'events' file into a Graphviz 'dot' file.
Put simply, it reads events from a file and stores them as graphs (with pages as nodes and the transitions between pages as edges), then merges those graphs into one large graph and saves it as a Graphviz 'dot' file.
Timing the program
First, let's see how long the program takes to run before any optimization. On my computer, running event2dot with the example from the source code as input (about 55,000 entries) takes more than three minutes:
real    3m36.316s
user    0m55.590s
sys     0m1.070s
Program Analysis
To profile with gprof, add the '-pg' option at compile time. We recompile the source code as follows:
g++ -pg dotgen.cpp readfile.cpp main.cpp graph.cpp config.cpp -o event2dot
Now we can run event2dot again on the same test data as before. This time the profiling data is collected while event2dot runs and is stored in the file 'gmon.out'. We can view the results by running 'gprof event2dot | less'.
gprof reports the following functions as the most significant:
  %   cumulative    self                 self    total
 time    seconds  seconds      calls   s/call   s/call  name
43.32      46.03    46.03  339952989     0.00     0.00  CompareNodes(node*, node*)
25.06      72.66    26.63      55000     0.00     0.00  getNode(char*, nodelistnode*&)
16.80      90.51    17.85  339433374     0.00     0.00  CompareEdges(edge*, annotatededge*)
12.70     104.01    13.50      51987     0.00     0.00  addAnnotatedEdge(annotatedgraph*, edge*)
 1.98     106.11     2.10      51987     0.00     0.00  addEdge(graph*, node*, node*)
 0.07     106.18     0.07          1     0.07     0.07  FindTreshold(annotatededge*, int)
 0.06     106.24     0.06          1     0.06    28.79  getGraphFromFile(char*, nodelistnode*&, config*)
 0.02     106.26     0.02          1     0.02    77.40  summarize(graphlistnode*, config*)
 0.00     106.26     0.00      55000     0.00     0.00  FixName(char*)
As you can see, the first function matters most: it accounts for the largest share of the program's running time.
Optimization
The results above show that most of the time is spent in the CompareNodes function. A quick grep shows that CompareNodes is called only from CompareEdges, which in turn is called only by addAnnotatedEdge. Both of those also appear in the list above, so this is where we should optimize.
We notice that addAnnotatedEdge traverses a linked list. Linked lists are easy to implement, but they are a poor choice for lookups, because every search has to walk the list element by element. We decided to replace the list g->edges with a binary tree, which makes the lookup much faster.
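The difference between the two lookups can be sketched by counting comparisons. The article does not show the Pathalizer source, so everything here is an assumption made for illustration: Edge is a hypothetical pair of node ids, addEdgeList and addEdgeTree are invented helpers, and std::set (typically implemented as a balanced binary search tree) stands in for the binary tree.

```cpp
#include <cassert>
#include <list>
#include <set>
#include <utility>

// Hypothetical stand-in for an edge: a pair of node ids (from, to).
using Edge = std::pair<int, int>;

// Shared counter so both containers are measured the same way.
static long g_comparisons = 0;

// Ordering for the tree that also counts how often it is invoked.
struct CountingLess {
    bool operator()(const Edge& a, const Edge& b) const {
        ++g_comparisons;
        return a < b;
    }
};

// Linked-list version: walk the list until the edge is found, append if absent.
// Returns true if the edge was newly added.
bool addEdgeList(std::list<Edge>& edges, const Edge& e) {
    for (const Edge& existing : edges) {
        ++g_comparisons;
        if (existing == e) return false;
    }
    edges.push_back(e);
    return true;
}

// Tree version: the ordered set locates the edge in a logarithmic number of steps.
// Returns true if the edge was newly added.
bool addEdgeTree(std::set<Edge, CountingLess>& edges, const Edge& e) {
    return edges.insert(e).second;
}

// Inserts n distinct edges into a list and returns the number of comparisons made.
long countListComparisons(int n) {
    assert(n >= 0);
    g_comparisons = 0;
    std::list<Edge> edges;
    for (int i = 0; i < n; ++i) addEdgeList(edges, Edge(i, i + 1));
    return g_comparisons;
}

// Same workload, but stored in the tree-based set.
long countTreeComparisons(int n) {
    assert(n >= 0);
    g_comparisons = 0;
    std::set<Edge, CountingLess> edges;
    for (int i = 0; i < n; ++i) addEdgeTree(edges, Edge(i, i + 1));
    return g_comparisons;
}
```

With 1000 distinct edges the list version performs 499500 comparisons, since the i-th insertion walks all i edges already stored. The tree version needs far fewer, and the gap widens as the input grows. This is the same pattern as the hundreds of millions of CompareNodes calls in the profile.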
Results
Now let's time the optimized program:
real    2m19.314s
user    0m36.370s
sys     0m0.940s
Second pass
Run gprof again to analyze the new build:
  %   cumulative    self                 self    total
 time    seconds  seconds      calls   s/call   s/call  name
87.01      25.25    25.25      55000     0.00     0.00  getNode(char*, nodelistnode*&)
10.65      28.34     3.09      51987     0.00     0.00  addEdge(graph*, node*, node*)
The functions that used to take up most of the runtime no longer matter, and getNode now accounts for most of what remains. Let's optimize once more, this time replacing the node container that getNode searches with a node hash table.
This is a huge improvement:
real    0m3.269s
user    0m0.830s
sys     0m0.090s
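A hash-based node lookup of the kind described above might look like the following sketch. The real Pathalizer node type and hash table are not shown in this article, so Node, NodeTable and the choice of std::unordered_map are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical node type; the real one is not shown in the article.
struct Node {
    std::string name;
    int hits = 0;
};

// Maps a page name to its node, with average constant-time lookup.
class NodeTable {
public:
    // Returns the node for `name`, creating it on first use.
    Node* getNode(const std::string& name) {
        assert(!name.empty());
        auto it = nodes_.find(name);
        if (it == nodes_.end()) {
            std::unique_ptr<Node> node(new Node());
            node->name = name;
            it = nodes_.emplace(name, std::move(node)).first;
        }
        return it->second.get();
    }

    // Number of distinct nodes stored so far.
    std::size_t size() const { return nodes_.size(); }

private:
    // Nodes are owned through unique_ptr so the returned pointers stay valid
    // when the table rehashes.
    std::unordered_map<std::string, std::unique_ptr<Node>> nodes_;
};
```

Asking for the same name twice returns the same Node pointer, and each lookup costs roughly the same no matter how many nodes have been stored. That is why the 55,000 getNode calls stop dominating the profile.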
Other C++ profilers
Several other tools can work with gprof data, such as KProf and cgprof. Although a graphical interface looks more comfortable, I personally find the command-line gprof more convenient to use.
Profiling programs in other languages
This article covers using gprof to profile C++ programs, but the same can be done in other languages. For Perl, we can use the Devel::DProf module: start your program with 'perl -d:DProf mycode.pl' and use 'dprofpp' to view and analyze the results. If you can compile your Java program with GCJ, you can also use gprof, although only single-threaded Java code is currently supported.
Conclusions
As we have seen, profiling lets us quickly find the parts of a program that are worth optimizing. By optimizing only where it matters, we reduced the runtime of the example program from 3 minutes 36 seconds to less than 5 seconds.