Performance analysis under Linux profiling (dynamic)


Profiling is an alternative to benchmarking that's often more effective, as it gives you more fine-grained measurements for the components of the system you're measuring, thus minimising external influences from consideration. It also gives the relative cost of the various components, further discounting external influence.

As a consequence of giving more fine-grained information for a component, profiling is really just a special case of monitoring and often uses the same infrastructure. As systems become more complex, it's increasingly important to know what monitoring tools are available. Being able to drill down into software components, as I'll describe below, is a large advantage that open systems have over closed ones.

GNU/Linux profiling and monitoring tools are currently progressing rapidly, and are in some flux, but I'll summarise the readily available utilities below.

System Wide Profiling

The Linux kernel has recently implemented a very useful perf infrastructure for profiling various CPU and software events. To get the perf command, install linux-tools-common on Ubuntu, linux-base on Debian, perf-utils on Arch Linux, or perf on Fedora. Then you can profile the whole system like:

$ perf record -a -g sleep 10   # record the whole system for 10s
$ perf report                  # display the report

That will display a handy curses interface on basically any hardware platform, which you can use to drill down to the area of interest.

See Brendan Gregg's perf examples for a more up-to-date and detailed exploration of perf's capabilities.

Other system-wide profiling tools to consider are sysprof and oprofile.
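
For instance, a minimal oprofile session (a sketch, assuming a recent oprofile that ships the operf tool, and using ./a.out as a stand-in for your program) looks like:

$ operf ./a.out        # collect samples for one command (operf --system-wide needs root)
$ opreport --symbols   # summarise the recorded samples per symbol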

It's worth noting that profiling can be problematic on x86_64 at least, due to -fno-omit-frame-pointer being removed to increase performance, and 32-bit Fedora at least is going the same way.
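
If call graphs come out truncated for this reason, a workaround (a sketch, assuming you can rebuild the code in question, with test.c and a.out as placeholder names) is to keep the frame pointer, or on newer perf versions to switch to DWARF unwinding:

$ gcc -O2 -g -fno-omit-frame-pointer -o a.out test.c   # keep the frame pointer for unwinding
$ perf record -g ./a.out                               # frame-pointer based call graphs
$ perf record --call-graph dwarf ./a.out               # or DWARF unwinding, on newer perf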

Application level Profiling

One can use perf to profile a particular command too, with a variant of the above, like perf record -g $command, or see Ingo Molnar's example of using perf to analyze the string comparison bottleneck in 'git gc'. There are other useful userspace tools available though. Here is an example of profiling ulc_casecoll, where the graphical profile below is generated using the following commands.

$ valgrind --tool=callgrind ./a.out
$ kcachegrind callgrind.out.*

Note kcachegrind is part of the 'kdesdk' package on my Fedora system, and can be used to read oprofile data (mentioned above) or to profile Python code too.
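
As a sketch of the Python case (assuming the third-party pyprof2calltree converter is installed, and with script.py as a placeholder), cProfile output can be converted into a format kcachegrind understands:

$ python -m cProfile -o script.prof script.py   # write cProfile data
$ pyprof2calltree -i script.prof -k             # convert and launch kcachegrind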

Profiling Hardware Events

I've detailed previously how important efficient use of the memory hierarchy is for performance. Newer CPUs provide counters to help tune your use of this hierarchy, and the previously mentioned Linux perf tools expose these well. Unfortunately my Pentium M laptop doesn't expose any cache counters, but the following example from Ingo Molnar shows how useful this technique can be.

static char array[1000][1000];

int main(void)
{
  int i, j;

  for (i = 0; i < 1000; i++)
    for (j = 0; j < 1000; j++)
      array[j][i]++;

  return 0;
}

On hardware that supports enumerating cache hits and misses, you can run:

$ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u \
    -e l1-dcache-load-misses:u ./a.out

 Performance counter stats for './a.out' (10 runs):

        6,719,130 cycles:u                                 ( +-   0.662% )
        5,084,792 instructions:u           #   0.757 IPC   ( +-   0.000% )
        1,037,032 l1-dcache-loads:u                        ( +-   0.009% )
        1,003,604 l1-dcache-load-misses:u                  ( +-   0.003% )

       0.003802098 seconds time elapsed   ( +-  13.395% )

Note the large ratio of cache misses to cache loads.
Now if we change array[j][i]++ to array[i][j]++ and re-run perf stat:

$ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u \
    -e l1-dcache-load-misses:u ./a.out

 Performance counter stats for './a.out' (10 runs):

        2,395,407 cycles:u                                 ( +-   0.365% )
        5,084,788 instructions:u           #   2.123 IPC   ( +-   0.000% )
        1,035,731 l1-dcache-loads:u                        ( +-   0.006% )
            3,955 l1-dcache-load-misses:u                  ( +-   4.872% )

       0.001806438 seconds time elapsed   ( +-   3.831% )

We can see the L1 cache is now used much more effectively.
To identify the hot spots to concentrate on, you can use:

$ perf top -e l1-dcache-load-misses -e l1-dcache-loads

   PerfTop:    1923 irqs/sec  kernel: 0.0%  exact:  0.0% [l1-dcache-load-misses ...
--------------------------------------------------------------------------------

   weight    samples  pcnt  funct  DSO
   ______    _______  ____  _____  ___________________

      1.9       6184  98.8% func2  /home/padraig/a.out
      0.0              1.1% func1  /home/padraig/a.out

Specialised Profiling

System entry points
    • strace -c $cmd
    • ltrace -c $cmd
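
The -c option of both tools prints a summary table of calls rather than a full trace; a minimal sketch (with ./a.out standing in for any command):

$ strace -c -f ./a.out   # per-syscall counts, errors and time, printed at exit
$ ltrace -c ./a.out      # the same summary for library calls
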
Heap Memory
    • Google Perftools (does CPU profiling too)
    • Go Perftools
    • Valgrind Massif (see the example after this list)
    • GLib mem tracing with Systemtap
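
For example, a minimal Valgrind Massif run (again with ./a.out as a placeholder) is:

$ valgrind --tool=massif ./a.out   # writes massif.out.<pid>
$ ms_print massif.out.*            # text graph of heap usage over time
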
I/O
    • I/O profiling with Systemtap
    • Process I/O profiling with ioprofile (uses strace & lsof)
    • Process I/O profiling with iogrind
GCC
    • gprof sampling (see the sketch after this list)
    • -finstrument-functions and __cyg_profile_func_enter
    • Instrument functions and display with Graphviz
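
A minimal gprof sampling workflow (a sketch, assuming your code is in a placeholder test.c) is:

$ gcc -O2 -g -pg -o a.out test.c   # -pg adds profiling instrumentation
$ ./a.out                          # running the program writes gmon.out
$ gprof ./a.out gmon.out           # flat profile and call graph
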
Misc
    • systemd, startup profiling
    • Bootchart, startup profiling
    • Latencytop
    • Online Web Page Profiler
    • Google Wide Profiling
    • Gotchas with Gprof and Kcachegrind
    • viewing profiling data with flame graphs
