Address: http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html
Using the CPU profiler involves three steps: linking the library into the application, running the code, and analyzing the output.
1. Linking the library into the application
To use the CPU profiler, link your executable with -lprofiler.
You can also load the library at run time with LD_PRELOAD, e.g.
% env LD_PRELOAD="/usr/lib/libprofiler.so" <binary>
(this method is not recommended).
Linking the library does not turn on CPU profiling; it only inserts the code. For that reason it is practical to always link with -lprofiler.
2. Running the code
There are several ways to turn on CPU profiling:
Method 1: set the environment variable CPUPROFILE to the file the profile should be written to.
E.g. % env CPUPROFILE=/tmp/mybin.prof /usr/local/bin/my_binary_compiled_with_libprofiler_so
Method 2: bracket the code you want to profile with calls to ProfilerStart() and ProfilerStop().
These functions are declared in <google/profiler.h>.
For more information on their use, see the comments in the profiler header file.
In addition, some environment variables give finer control over the CPU profiler. For example:
CPUPROFILE_FREQUENCY=x: the sampling frequency.
CPUPROFILE_REALTIME=1: not set by default. When set, ITIMER_REAL is used instead of ITIMER_PROF to gather the profile, but ITIMER_REAL is less accurate.
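When profiled runs are launched from a script rather than by hand, the same environment variables can be set programmatically. The following is only a minimal sketch in Python: the helper names profiled_env and run_profiled are hypothetical, and it assumes the target binary was linked with -lprofiler as described in step 1.

```python
import os
import subprocess


def profiled_env(profile_path, frequency=None, realtime=False, base=None):
    """Return a copy of the environment with the CPU profiler variables set.

    Hypothetical helper: only the variable names come from the text above.
    """
    env = dict(os.environ if base is None else base)
    env["CPUPROFILE"] = profile_path                  # file the profile is written to
    if frequency is not None:
        env["CPUPROFILE_FREQUENCY"] = str(frequency)  # sampling frequency
    if realtime:
        env["CPUPROFILE_REALTIME"] = "1"              # ITIMER_REAL instead of ITIMER_PROF
    return env


def run_profiled(argv, profile_path, **options):
    """Run argv (assumed to be linked with -lprofiler) with profiling enabled."""
    env = profiled_env(profile_path, **options)
    return subprocess.run(argv, env=env, check=True)


# Example call (paths taken from the text above):
# run_profiled(["/usr/local/bin/my_binary_compiled_with_libprofiler_so"],
#              "/tmp/mybin.prof", frequency=250)
```

Building the environment in a separate function keeps the variable handling easy to inspect without actually starting the binary.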
3. Analyzing the output
pprof is the script used to analyze a profile. It requires perl5 to be installed. For graphical output you also need dot, and for --gv mode you also need gv.
There are several ways to call pprof:
% pprof /bin/ls ls.prof
    Enters "interactive" mode
% pprof --text /bin/ls ls.prof
    Outputs one line per procedure
% pprof --gv /bin/ls ls.prof
    Displays annotated call-graph via 'gv'
% pprof --gv --focus=Mutex /bin/ls ls.prof
    Restricts to code paths including a .*Mutex.* entry
% pprof --gv --focus=Mutex --ignore=string /bin/ls ls.prof
    Code paths including Mutex but not string
% pprof --list=getdir /bin/ls ls.prof
    (Per-line) annotated source listing for getdir()
% pprof --disasm=getdir /bin/ls ls.prof
    (Per-PC) annotated disassembly for getdir()
% pprof --text localhost:1234
    Outputs one line per procedure for localhost:1234
% pprof --callgrind /bin/ls ls.prof
    Outputs the call information in callgrind format
Analyzing callgrind output:
Use the kcachegrind tool to analyze the callgrind output.
E.g. % pprof --callgrind /bin/ls ls.prof > ls.callgrind
% kcachegrind ls.callgrind
Node information:
In the various graphical output modes of pprof, the output is a call graph annotated with the execution time of each function.
Each node represents a procedure, in the following format:
Class Name
Method Name
local (percentage)
of cumulative (percentage)
local is the number of samples taken in the body of the function itself, while cumulative is the number of samples taken in the function and everything it calls.
The size of a node reflects its percentage, so the bottlenecks of the program are easy to spot visually.
An edge with an arrow indicates a caller-to-callee relationship. E.g. vsnprintf has 18 samples in total, of which 3 are in _IO_old_init and 6 in another callee.
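To make the difference between local and cumulative concrete, here is a small Python sketch that derives both counts from a list of sampled call stacks. It is an illustration only, not pprof's implementation: the function names, the sample data, and the choice to count a recursive function once per sample are all assumptions of this example.

```python
from collections import Counter


def local_and_cumulative(samples):
    """Compute local and cumulative sample counts per function.

    samples: list of call stacks, each ordered from outermost caller
    to the function that was executing when the sample was taken.
    """
    local = Counter()
    cumulative = Counter()
    for stack in samples:
        local[stack[-1]] += 1       # only the innermost frame gets a local hit
        for name in set(stack):     # every frame on the stack gets one cumulative hit
            cumulative[name] += 1
    return local, cumulative


def node_label(name, local, cumulative, total):
    """Format one node as 'name local (pct) of cumulative (pct)'."""
    def pct(count):
        return 100.0 * count / total
    return (f"{name} {local[name]} ({pct(local[name]):.1f}%) "
            f"of {cumulative[name]} ({pct(cumulative[name]):.1f}%)")


# Hypothetical samples: four stacks captured by the profiler.
samples = [
    ["main", "format"],
    ["main", "format", "init"],
    ["main", "parse"],
    ["main"],
]
local, cumulative = local_and_cumulative(samples)
print(node_label("format", local, cumulative, total=len(samples)))
```

Here format was executing in one of the four samples but was on the stack in two of them, so its local count is 1 and its cumulative count is 2.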
Header metadata displayed:
/tmp/profiler2_unittest
    Total samples: 202
    Focusing on: 202
    Dropped nodes with <= 1 abs(samples)
    Dropped edges with <= 0 samples
The program name and the total number of samples are given first. If a focus option is in effect, the number of samples in the focused set is shown. The thresholds for dropped nodes and edges follow.
Focus and ignore:
You can ask pprof to show only a specific part of the program by supplying a regular expression. If any procedure on a call stack matches the regular expression, that call path is output; all other paths are discarded.
E.g. to focus on vsnprintf: % pprof --gv --focus=vsnprintf /tmp/profiler2_unittest test.prof
Similarly, you can use the --ignore option to discard the call paths that match a regular expression.
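The path selection described for --focus and --ignore can be sketched in a few lines of Python. This is a simplified model for illustration, not pprof's code: filter_stacks and the sample stacks are hypothetical, and a "path" is modeled as a plain list of function names.

```python
import re


def filter_stacks(samples, focus=None, ignore=None):
    """Keep call paths the way --focus / --ignore are described above.

    A path is kept if at least one frame matches `focus` (when given)
    and no frame matches `ignore` (when given).
    """
    focus_re = re.compile(focus) if focus else None
    ignore_re = re.compile(ignore) if ignore else None
    kept = []
    for stack in samples:
        if focus_re and not any(focus_re.search(name) for name in stack):
            continue    # no frame matches the focus expression
        if ignore_re and any(ignore_re.search(name) for name in stack):
            continue    # some frame matches the ignore expression
        kept.append(stack)
    return kept


# Hypothetical call paths.
stacks = [
    ["main", "Mutex::Lock"],
    ["main", "string::append", "Mutex::Lock"],
    ["main", "getdir"],
]
print(filter_stacks(stacks, focus="Mutex"))
print(filter_stacks(stacks, focus="Mutex", ignore="string"))
```

The second call mirrors the --focus=Mutex --ignore=string example from the command list: only paths that include Mutex but not string remain.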
By default, pprof runs in interactive mode. Type help in this mode to list the available commands.
Output type settings:
--text
    Produces a textual listing. (Note: if you have an X display, and dot and gv installed, you will probably be happier with the --gv output.)
--gv
    Generates the annotated call-graph, converts it to PostScript, and displays it via gv (requires dot and gv to be installed).
--dot
    Generates the annotated call-graph in dot format and emits it to stdout (requires dot to be installed).
--ps
    Generates the annotated call-graph in PostScript format and emits it to stdout (requires dot to be installed).
--pdf
    Generates the annotated call-graph in PDF format and emits it to stdout (requires dot and ps2pdf to be installed).
--gif
    Generates the annotated call-graph in GIF format and emits it to stdout (requires dot to be installed).
--list=<regexp>
    Outputs a source-code listing of the routines whose name matches <regexp>. Each line in the listing is annotated with flat and cumulative sample counts. In the presence of inlined calls, the samples associated with inlined code tend to get assigned to a line that follows the location of the inlined call. A more precise accounting can be obtained by disassembling the routine using the --disasm flag.
--disasm=<regexp>
    Generates a disassembly of the routines that match <regexp>, annotated with flat and cumulative sample counts, and emits it to stdout.
Report granularity settings:
--addresses
    Produce one node per program address.
--lines
    Produce one node per source line.
--functions
    Produce one node per function (this is the default).
--files
    Produce one node per source file.
Graph display controls:
--nodecount=<n>
    This option controls the number of displayed nodes. The nodes are first sorted by decreasing cumulative count, and then only the top N nodes are kept. The default value is 80.
--nodefraction=<f>
    This option provides another mechanism for discarding nodes from the display. If the cumulative count for a node is less than this option's value multiplied by the total count for the profile, the node is dropped. The default value is 0.005; i.e. nodes that account for less than half a percent of the total time are dropped. A node is dropped if either this condition is satisfied, or the --nodecount condition is satisfied.
--edgefraction=<f>
    This option controls the number of displayed edges. First of all, an edge is dropped if either its source or destination node is dropped. Otherwise, the edge is dropped if the sample count along the edge is less than this option's value multiplied by the total count for the profile. The default value is 0.001; i.e., edges that account for less than 0.1% of the total time are dropped.
--focus=<re>
    This option controls what region of the graph is displayed based on the regular expression supplied with the option. For any path in the callgraph, we check all nodes in the path against the supplied regular expression. If none of the nodes match, the path is dropped from the output.
--ignore=<re>
    This option controls what region of the graph is displayed based on the regular expression supplied with the option. For any path in the callgraph, we check all nodes in the path against the supplied regular expression. If any of the nodes match, the path is dropped from the output.
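The interaction of --nodecount, --nodefraction and --edgefraction is easier to follow with a worked example. The Python sketch below follows the rules as stated above; it is not pprof's own code, and the function names and counts are made up for illustration.

```python
def keep_nodes(cumulative, total, nodecount=80, nodefraction=0.005):
    """Return the set of node names that survive both node filters."""
    # --nodecount: sort by decreasing cumulative count, keep the top N.
    ranked = sorted(cumulative, key=lambda name: cumulative[name], reverse=True)
    top = ranked[:nodecount]
    # --nodefraction: drop nodes below fraction * total.
    return {name for name in top if cumulative[name] >= nodefraction * total}


def keep_edges(edges, kept_nodes, total, edgefraction=0.001):
    """Return the edges that survive, given the nodes that were kept."""
    return {
        (src, dst): count
        for (src, dst), count in edges.items()
        if src in kept_nodes and dst in kept_nodes   # both endpoints must be shown
        and count >= edgefraction * total            # --edgefraction threshold
    }


# Hypothetical profile with 1000 samples in total.
total = 1000
cumulative = {"main": 1000, "a": 600, "b": 300, "c": 4}
edges = {("main", "a"): 600, ("a", "b"): 300, ("b", "c"): 4, ("main", "b"): 0}

nodes = keep_nodes(cumulative, total)
print(sorted(nodes))
print(keep_edges(edges, nodes, total))
```

With the default values, c falls below 0.5% of the total and is dropped, which also removes the edge from b to c; the edge from main to b is dropped because it carries no samples.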