Understanding and using perf, the Linux performance tuning tool

Source: Internet
Author: User

I am currently doing some performance analysis and did not know how to get started with perf, so I read a few articles, sorted out the material, and recorded it here as a list of questions.

This makes it easy for me to look things up later.

 

What is perf?

perf is a Linux performance tuning tool for software performance analysis that ships with the kernel. Installing perf is very easy on Linux kernel 2.6.31 and later versions.

It can handle almost all performance-related events.

What is a performance event?

A performance event is a hardware or software event that occurs in the processor or the operating system and may affect program performance.

What are the main areas of focus?

Algorithm optimization (space complexity, time complexity) and code optimization (increasing execution speed, reducing memory consumption).

Evaluating the program's use of hardware resources, such as the number of accesses to each cache level, the number of misses at each cache level, pipeline stall cycles, and the number of front-side bus accesses.

Evaluating the program's use of operating system resources, such as the number of system calls, the number of context switches, and the number of task migrations.

Fundamentals.

Hardware events: the PMU (Performance Monitoring Unit), a component of the CPU, uses its performance monitoring counters (PMCs) to detect whether a performance event occurs under the given conditions and to count how many times it occurs.

Software events: counters built into the kernel and distributed across its functional modules collect statistics on performance events related to the operating system.

How do I use high-precision sampling?

If you need high-precision sampling, add the suffix ":p" or ":pp" after the event name when you specify the performance event. There are four precision levels:

0: No precision guarantee
1: The offset between the sampled instruction and the instruction that triggered the performance event is constant (:p)
2: Try to ensure that the offset is 0 (:pp)
3: The offset is guaranteed to be 0 (:ppp)
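
The suffix rule above can be sketched as a small Python helper that appends one "p" per precision level to an event name. The function name with_precision is hypothetical and exists only for this illustration; whether a given level is actually available depends on the hardware.

```python
def with_precision(event: str, level: int) -> str:
    """Return the event name with the precise-sampling suffix for level 0-3."""
    if level not in (0, 1, 2, 3):
        raise ValueError("precision level must be 0, 1, 2 or 3")
    if level == 0:
        return event  # level 0: no suffix, no precision guarantee
    return f"{event}:{'p' * level}"  # one 'p' per precision level


if __name__ == "__main__":
    for level in range(4):
        print(level, with_precision("cycles", level))
```

The returned string is what would be passed to the -e option, for example the level 2 form of the cycles event.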

What are the common commands?

1. perf list: lists all events that can trigger perf sampling points (the performance events supported by the current hardware environment)

The events fall broadly into three classes: hardware (generated by the hardware), software (generated by kernel software), and tracepoint (triggered by static tracepoints in the kernel).

List of pre-defined events (to be used in -e):

  cpu-cycles OR cycles                               [Hardware event]   (processor cycle event)
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  instructions                                       [Hardware event]
  cache-references                                   [Hardware event]
  cache-misses                                       [Hardware event]
  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]

  cpu-clock                                          [Software event]
  task-clock                                         [Software event]
  page-faults OR faults                              [Software event]
  minor-faults                                       [Software event]
  major-faults                                       [Software event]
  context-switches OR cs                             [Software event]
  cpu-migrations OR migrations                       [Software event]
  alignment-faults                                   [Software event]
  emulation-faults                                   [Software event]

  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-dcache-store-misses                             [Hardware cache event]
  L1-dcache-prefetches                               [Hardware cache event]
  L1-dcache-prefetch-misses                          [Hardware cache event]
  L1-icache-loads                                    [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  L1-icache-prefetches                               [Hardware cache event]
  L1-icache-prefetch-misses                          [Hardware cache event]
  LLC-loads                                          [Hardware cache event]
  LLC-load-misses                                    [Hardware cache event]
  LLC-stores                                         [Hardware cache event]
  LLC-store-misses                                   [Hardware cache event]
  LLC-prefetches                                     [Hardware cache event]
  LLC-prefetch-misses                                [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-stores                                        [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  dTLB-prefetches                                    [Hardware cache event]
  dTLB-prefetch-misses                               [Hardware cache event]
  iTLB-loads                                         [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
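
To see which event names belong to which class, the listing can be grouped by the label in square brackets. The following is a minimal Python sketch, assuming the one-event-per-line layout that perf list prints ("name OR alias", then the bracketed class). The sample lines are copied from the listing in this article; the function name group_events is hypothetical.

```python
import re
from collections import defaultdict

# A few lines in the layout printed by perf list (taken from this article).
SAMPLE = """\
  cpu-cycles OR cycles                               [Hardware event]
  cache-misses                                       [Hardware event]
  context-switches OR cs                             [Software event]
  L1-dcache-load-misses                              [Hardware cache event]
"""

# "names" is everything before the bracket, "kind" is the bracketed class.
LINE_RE = re.compile(r"^\s*(?P<names>.+?)\s+\[(?P<kind>[^\]]+)\]\s*$")


def group_events(text: str) -> dict:
    """Map each event class to its event names, aliases included."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip headers and blank lines
        names = [name.strip() for name in match.group("names").split(" OR ")]
        groups[match.group("kind")].extend(names)
    return dict(groups)


if __name__ == "__main__":
    for kind, names in group_events(SAMPLE).items():
        print(kind, "->", ", ".join(names))
```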

2. perf stat: analyzes the overall performance of a program

It uses 10 typical events to profile the application.

task-clock: the processor time actually consumed by the target task, in milliseconds, which we call the task execution time.

The value that follows it is the task's processor utilization (the ratio of execution time to duration).

The duration is the total time from the submission of the task to its completion (this total time is printed when stat ends).

context-switches: the number of context switches. The first value is the number of switches, followed by the average number per second (M means 10 to the power of 6).

cpu-migrations: processor migrations. To keep the load balanced across processors,

Linux moves a task from one processor to another under certain conditions; each such move is a processor migration.

page-faults: page faults. The Linux memory management subsystem uses a paging mechanism.

A page fault is triggered when the page requested by an application has not been set up yet, when the requested page is not in memory,

or when the requested page is in memory but the mapping between its physical and virtual addresses has not been established.

cycles: the number of processor cycles consumed by the task.

instructions: the number of processor instructions executed during the task, reported together with the IPC (instructions per cycle).

IPC is an important indicator for evaluating processor and application performance (many instructions need several processor cycles to complete).

The larger the IPC the better: a high IPC means the program makes good use of the processor's features.
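
As a worked example, the IPC can be recomputed in Python from the cycles and instructions counts in the sample perf stat output in this article. The "stalled cycles per insn" figure in that output appears to be the larger of the two stall counts divided by the instruction count; that is an assumption inferred from the sample numbers, and the function names are hypothetical.

```python
# Counters copied from the sample perf stat output in this article.
CYCLES = 82_341_400_508
INSTRUCTIONS = 44_023_301_495
STALLED_FRONTEND = 61_262_984_952
STALLED_BACKEND = 43_113_701_768


def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle: higher means better use of the processor."""
    return instructions / cycles


def stalled_cycles_per_insn(frontend: int, backend: int, instructions: int) -> float:
    """Assumed formula: the larger stall count divided by the instruction count."""
    return max(frontend, backend) / instructions


if __name__ == "__main__":
    print(f"IPC: {ipc(INSTRUCTIONS, CYCLES):.2f}")
    stalls = stalled_cycles_per_insn(STALLED_FRONTEND, STALLED_BACKEND, INSTRUCTIONS)
    print(f"stalled cycles per insn: {stalls:.2f}")
```

An IPC of about 0.5, as in this sample, means that on average two cycles pass for every instruction completed, which fits the high share of stalled cycles in the same output.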

branches: the number of branch instructions the program encountered during execution.

branch-misses: the number of branch instructions that were mispredicted.

cache-misses: the number of cache misses.

cache-references: the number of cache references (accesses).

The common parameters are as follows:

-e: specify the performance event
-p: specify the PID of the process to be analyzed
-t: specify the TID of the thread to be analyzed
-r N: run the analysis N times in a row
-d: more complete performance analysis, reporting more performance events
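
The options above can be combined into one command line. The following is a minimal Python sketch that only builds the argument list and does not run perf; the function name perf_stat_command and its parameter names are hypothetical, and passing several events to -e as a comma-separated list is perf's own convention rather than something stated in this article.

```python
import shlex


def perf_stat_command(events=None, pid=None, tid=None, repeat=None, detailed=False):
    """Build the argv list for a perf stat invocation from the options above."""
    argv = ["perf", "stat"]
    if events:
        argv += ["-e", ",".join(events)]  # -e: performance events
    if pid is not None:
        argv += ["-p", str(pid)]          # -p: process to analyze
    if tid is not None:
        argv += ["-t", str(tid)]          # -t: thread to analyze
    if repeat is not None:
        argv += ["-r", str(repeat)]       # -r N: analyze N times in a row
    if detailed:
        argv.append("-d")                 # -d: report more events
    return argv


if __name__ == "__main__":
    argv = perf_stat_command(events=["cycles", "instructions"], pid=21787)
    print(shlex.join(argv))
```

The list can be handed to subprocess.run on a machine where perf is installed; attaching to another user's process usually needs elevated privileges.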


The result of one analysis is as follows:

 Performance counter stats for process id '21787':

      42677.253367 task-clock                #    0.142 CPUs utilized
           587,906 context-switches          #    0.014 M/sec
            29,209 cpu-migrations            #    0.001 M/sec
               117 page-faults               #    0.000 M/sec
    82,341,400,508 cycles                    #    1.929 GHz                      [83.48%]
    61,262,984,952 stalled-cycles-frontend   #   74.40% frontend cycles idle     [83.28%]
    43,113,701,768 stalled-cycles-backend    #   52.36% backend cycles idle      [66.72%]
    44,023,301,495 instructions              #    0.53  insns per cycle
                                             #    1.39  stalled cycles per insn  [83.50%]
     8,137,448,528 branches                  #  190.674 M/sec                    [83.22%]
       430,957,756 branch-misses             #    5.30% of all branches          [83.34%]

     300.393753095 seconds time elapsed
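
The values after the "#" signs can be recomputed from the raw counters. The Python sketch below does this for the sample output above. The numbers are consistent with the per-second rates being computed against the task-clock time rather than the elapsed time; that reading is inferred from this sample, and the function name derived_metrics is hypothetical.

```python
# Raw counters copied from the sample perf stat output above.
TASK_CLOCK_MS = 42677.253367
ELAPSED_S = 300.393753095
CONTEXT_SWITCHES = 587_906
CYCLES = 82_341_400_508
BRANCHES = 8_137_448_528
BRANCH_MISSES = 430_957_756


def derived_metrics() -> dict:
    """Recompute the right-hand column of the sample output."""
    task_s = TASK_CLOCK_MS / 1000.0  # task execution time in seconds
    return {
        # execution time divided by total duration
        "cpus_utilized": task_s / ELAPSED_S,
        # rates are per second of task-clock time; M means 10**6
        "context_switches_m_per_sec": CONTEXT_SWITCHES / task_s / 1e6,
        "branches_m_per_sec": BRANCHES / task_s / 1e6,
        # cycles per second of execution time, in GHz
        "ghz": CYCLES / task_s / 1e9,
        # share of branch instructions that were mispredicted
        "branch_miss_pct": 100.0 * BRANCH_MISSES / BRANCHES,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.3f}")
```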

3. perf top: displays system/process performance statistics in real time

By default it profiles the whole system using the "cycles" (CPU cycles) performance event.

The common parameters are as follows:

-p: specify the PID of the process
-t: specify the TID of the thread
-a: analyze the performance of the entire system (default)
-d: interface refresh interval, default is 2 seconds


The output has the columns samples, pcnt, function, and DSO. The pcnt column is the percentage of performance events triggered by that symbol within the entire monitored domain, often referred to as its heat.

   samples  pcnt function                                                                                 DSO
   _______ _____ ________________________________________________________________________________________ _________

     61.00 19.4% native_write_msr_safe                                                                    [kernel]
     18.00  5.7% JVM_InternString                                                                         libjvm.so
     17.00  5.4% find_busiest_group                                                                       [kernel]
     17.00  5.4% _spin_lock                                                                               [kernel]
     12.00  3.8% dev_hard_start_xmit                                                                      [kernel]
     11.00  3.5% tg_load_down                                                                             [kernel]
      9.00  2.9% futex_wake                                                                               [kernel]
      8.00  2.5% do_futex                                                                                 [kernel]
      7.00  2.2% load_balance_fair                                                                        [kernel]
      7.00  2.2% weighted_cpuload                                                                         [kernel]
      7.00  2.2% update_cfs_shares                                                                        [kernel]
      7.00  2.2% JVM_LatestUserDefinedLoader                                                              libjvm.so
      6.00  1.9% update_cfs_load                                                                          [kernel]
      5.00  1.6% _ZN16SystemDictionary30resolve_instance_class_or_nullE12symbolHandle6HandleS1_P6Thread   libjvm.so
      5.00  1.6% br_sysfs_delbr                                                                           [bridge]
      5.00  1.6% futex_wait

4. perf record / perf report: record system/process performance events over a period of time

perf record writes its data file to the current directory by default: perf.data

perf report reads the generated perf.data file; the -i parameter specifies its path.
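
The record-then-report workflow can be sketched as two command lines built in Python, without running perf. The function names are hypothetical. The -o option of perf record, which sets the output file, and the "--" separator before the profiled command are not covered in this article and come from perf itself; the -i option is the one described above.

```python
import shlex


def perf_record_command(output="perf.data", pid=None, command=None):
    """Build argv for perf record, writing samples to the given data file."""
    argv = ["perf", "record", "-o", output]  # -o: output file (perf option)
    if pid is not None:
        argv += ["-p", str(pid)]             # attach to a running process
    if command:
        argv += ["--"] + list(command)       # or launch and profile a command
    return argv


def perf_report_command(input_path="perf.data"):
    """Build argv for perf report, reading the given data file with -i."""
    return ["perf", "report", "-i", input_path]


if __name__ == "__main__":
    print(shlex.join(perf_record_command(output="/tmp/app.data", command=["sleep", "1"])))
    print(shlex.join(perf_report_command("/tmp/app.data")))
```

Using the same path for both calls keeps the report pointed at the file that was just recorded; without -o and -i, both commands fall back to perf.data in the current directory.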

Understanding Perf is the beginning of performance analysis.

http://www.ibm.com/developerworks/cn/linux/l-cn-perf1/

