CPU Profiler User Guide

Source: Internet
Author: User

Address: http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html

Using the CPU profiler consists of three steps: linking the profiler library into the application, running the code, and analyzing the output.

1. Link the library into the application

To use the CPU profiler at run time, link your application with -lprofiler.

Alternatively, you can use LD_PRELOAD, e.g.
% env LD_PRELOAD="/usr/lib/libprofiler.so" <binary>
(this method is not recommended)

Linking -lprofiler does not itself turn the CPU profiler on; it only inserts the profiling code. It is therefore safe to always link with -lprofiler.

2. Running the code

There are several ways to run a profile:

Method 1: use the CPUPROFILE environment variable to specify the file the profile is written to.

E.g. % env CPUPROFILE=/tmp/mybin.prof /usr/local/bin/my_binary_compiled_with_libprofiler_so

Method 2: bracket the code you want profiled with calls to ProfilerStart() and ProfilerStop(). These functions are declared in <google/profiler.h>.

For more information, see the comments in the profiler header file.

In addition, some environment variables can be used to better control the CPU profiler.

For example, CPUPROFILE_FREQUENCY=x sets the sampling frequency.

CPUPROFILE_REALTIME=1: not set by default. When set, ITIMER_REAL is used instead of ITIMER_PROF for profiling, i.e. samples are taken in wall-clock time rather than CPU time; this is less accurate.

3. Analyzing the output

pprof is a script used to analyze the profile. perl5 must be installed before using pprof. To produce graphical output, install dot; to use --gv mode, install gv as well.

There are several ways to call pprof:

% pprof /bin/ls ls.prof                            Enters "interactive" mode
% pprof --text /bin/ls ls.prof                     Outputs one line per procedure
% pprof --gv /bin/ls ls.prof                       Displays annotated call-graph via 'gv'
% pprof --gv --focus=Mutex /bin/ls ls.prof         Restricts to code paths including a .*Mutex.* entry
% pprof --gv --focus=Mutex --ignore=string /bin/ls ls.prof
                                                   Code paths including Mutex but not string
% pprof --list=getdir /bin/ls ls.prof              (Per-line) annotated source listing for getdir()
% pprof --disasm=getdir /bin/ls ls.prof            (Per-PC) annotated disassembly for getdir()
% pprof --text localhost:1234                      Outputs one line per procedure for localhost:1234
% pprof --callgrind /bin/ls ls.prof                Outputs the call information in callgrind format

Analyze callgrind output:

Use the kcachegrind tool to analyze the callgrind output.

E.g. % pprof --callgrind /bin/ls ls.prof > ls.callgrind

% kcachegrind ls.callgrind

Node information in graphical output:

In the various pprof chart output formats, the output is a call graph annotated with the execution time attributed to each function.


Each node represents a function, labeled in the following format:

Class Name
Method Name
local (percentage)
of cumulative (percentage)

Local counts the samples taken while the function itself was executing; cumulative counts the samples taken while the function or any function it called was executing.

Node size is proportional to the cumulative percentage, so the bottlenecks of the execution stand out visually.

Lines with arrows indicate call relationships, and each edge is labeled with the number of samples along it. E.g. vsnprintf has 18 samples in total, of which 3 arrived via _io_old_unit and 6 via another path.

Header metadata:

/tmp/profiler2_unittest
Total samples: 202
Focusing on: 202
Dropped nodes with <= 1 abs(samples)
Dropped edges with <= 0 samples

The header gives the program name and the total number of samples. If --focus is in effect, it also shows the number of samples in the focused set. The final lines report how many nodes and edges were dropped from the display.

Focusing and ignoring:

You can configure pprof to report on only part of the program by supplying a regular expression. If any function on a sample's call stack matches the regular expression, the sample is included in the output; the rest are discarded.

E.g. to focus on vsnprintf: % pprof --gv --focus=vsnprintf /tmp/profiler2_unittest test.prof

Similarly, you can use the --ignore option to exclude matching call paths from the output.

By default, pprof runs in interactive mode; type help at its prompt to see the available commands.

Output type settings:

--text Produces a textual listing. (Note: if you have an X display, and dot and gv installed, you will probably be happier with --gv output.)
--gv Generates annotated call-graph, converts to postscript, and displays via gv (requires dot and gv to be installed).
--dot Generates the annotated call-graph in dot format and emits it to stdout (requires dot to be installed).
--ps Generates the annotated call-graph in PostScript format and emits it to stdout (requires dot to be installed).
--pdf Generates the annotated call-graph in PDF format and emits it to stdout (requires dot and ps2pdf to be installed).
--gif Generates the annotated call-graph in GIF format and emits it to stdout (requires dot to be installed).
--list=<regexp>

Outputs a source-code listing of routines whose names match <regexp>. Each line in the listing is annotated with flat and cumulative sample counts.

In the presence of inlined calls, the samples associated with inlined code tend to get assigned to the line that follows the location of the inlined call. A more precise accounting can be obtained by disassembling the routine using the --disasm flag.

--disasm=<regexp> Generates disassembly of routines that match <regexp>, annotated with flat and cumulative sample counts, and emits it to stdout.

Report granularity settings:

--addresses Produce one node per program address.
--lines Produce one node per source line.
--functions Produce one node per function (this is the default).
--files Produce one node per source file.

Controlling the chart display format:

--nodecount=<n> This option controls the number of displayed nodes. The nodes are first sorted by decreasing cumulative count, and then only the top N nodes are kept. The default value is 80.
--nodefraction=<f> This option provides another mechanism for discarding nodes from the display. If the cumulative count for a node is less than this option's value multiplied by the total count for the profile, the node is dropped. The default value is 0.005; i.e., nodes that account for less than half a percent of the total time are dropped. A node is dropped if either this condition or the --nodecount condition is satisfied.
--edgefraction=<f> This option controls the number of displayed edges. First, an edge is dropped if either its source or destination node is dropped. Otherwise, the edge is dropped if the sample count along the edge is less than this option's value multiplied by the total count for the profile. The default value is 0.001; i.e., edges that account for less than 0.1% of the total time are dropped.
--focus=<re> This option controls which region of the graph is displayed, based on the supplied regular expression. For any path in the call graph, all nodes in the path are checked against the regular expression. If none of the nodes match, the path is dropped from the output.
--ignore=<re> This option controls which region of the graph is displayed, based on the supplied regular expression. For any path in the call graph, all nodes in the path are checked against the regular expression. If any of the nodes match, the path is dropped from the output.
