1. VTune Introduction
The Intel VTune Performance Analyzer is a tool that Intel provides to developers for analyzing and optimizing program performance. It is designed specifically to find hardware and software performance bottlenecks: it identifies the hotspots of a program and helps find the cause of unsatisfactory performance, so that the developer can optimize the program accordingly.
The VTune Performance Analyzer locates performance issues by collecting performance data from the running system, organizing and presenting that data at different levels (from the whole system down to the source code), identifying potential performance issues, and suggesting improvements.
2. VTune Download and Installation
Downloading and installing VTune is somewhat cumbersome. The following briefly describes how to download the VTune software and how to install it in a Linux environment.
2.1 Downloading VTune
The full version of VTune is expensive, so you can choose to download the trial version instead. To download the trial version, you need to register an account, which gives you a free 31-day trial. After registration is complete, an email containing the download link and the registration code for the software is sent to the address you registered with.
Click the download link in the email, select "Linux System Products", and select the version of the software you want to download. This document uses "vtune_amplifier_xe_2013_update17.tar.gz" as an example.
2.2 Installing VTune
Move the downloaded installation package to a Linux system and extract it:
tar zxvf vtune_amplifier_xe_2013_update17.tar.gz
Enter the extracted folder and run the "install.sh" script. Follow the installation wizard and accept the default settings.
Once the installation is complete, source the environment script that the VTune installation created. The version directory in the path depends on the release that was actually installed:
source /home/.../intel/vtune_amplifier_xe_2017.1.0.486011/amplxe-vars.sh
Use the "amplxe-gui" command to start the VTune software, as shown in Figure 2.1.
Figure 2.1 VTune Start command
3. Use of VTune
In a Linux environment, start the VTune Performance Analyzer, as shown in Figure 3.1, and click the New Project button to create a new performance analysis project.
Figure 3.1 New Project
As shown in Figure 3.2, select the target file to analyze and fill in the command-line arguments for the program being analyzed.
Figure 3.2 Target File selection
As shown in Figure 3.3, right-click the target project to create a new analysis for the target file.
Figure 3.3 Creating a new analysis type
As shown in Figure 3.4, the analysis types offered by the Intel VTune Performance Analyzer fall into four categories: "Algorithm Analysis", "Microarchitecture Analysis", "Knights Corner Platform Analysis", and "Custom Analysis".
Figure 3.4 VTune Analysis Type
As shown in Figure 3.5, Algorithm Analysis is the most widely used category. It contains four analysis types: "Basic Hotspots", "Advanced Hotspots", "Concurrency", and "Locks and Waits". The use of Basic Hotspots is described in detail below.
Figure 3.5 Algorithm Analysis subclass
4. Basic Hotspots Analysis
As shown in Figure 4.1, select the target program to be analyzed as described in Chapter 3, and select Basic Hotspots under Algorithm Analysis. Set the CPU sampling interval, then click the Start button in the upper right corner to start analyzing the target program.
Figure 4.1 Establishment of basic performance analysis (Basic Hotspots)
As shown in Figure 4.2, after you click Start, VTune runs the target program and collects the relevant performance data. Once the data you need has been collected, stop the data collection manually.
Figure 4.2 Stop collecting data (Basic Hotspots)
As shown in Figure 4.3, after the data collection stops, 8 kinds of data are available: "Analysis Target", "Analysis Type", "Collection Log", "Summary", "Bottom-up", "Caller/Callee", "Top-down Tree", and "Tasks and Frames". The first three, "Analysis Target", "Analysis Type", and "Collection Log", are not discussed in depth here. The following sections focus on the contents of the other five.
Figure 4.3 Collection of data classifications (Basic Hotspots)
4.1 Summary
As shown in Figure 4.4, the Summary view mainly presents four groups of data: "Elapsed Time", "Top Hotspots", "CPU Usage Histogram", and "Collection and Platform Info".
Figure 4.4 Summary Data display (Basic Hotspots)
As shown in Figure 4.5, the Elapsed Time section mainly includes the total thread count, the overhead time (time spent in synchronization and threading library functions), the spin time (time the CPU spends busy-waiting for a synchronization resource), the CPU time (total time the CPU spends running the program), and the paused time.
Figure 4.5 Elapsed Time (Basic Hotspots)
As shown in Figure 4.6, Top Hotspots lists the most active (and most time-consuming) parts of the analyzed program, such as spin locks and functions.
Figure 4.6 Top Hotspots (Basic Hotspots)
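To make the idea behind a hotspot list concrete, here is a minimal sketch of how a ranking like Top Hotspots could be derived from sampling data. It is not VTune's implementation: the function names, the sample list, and the 10 ms interval are all hypothetical values chosen for illustration. Each sample records which function was running at a sampling tick, so sample count times interval gives an estimate of CPU time.

```python
from collections import Counter

# Assumed sampling interval (10 ms), for illustration only.
SAMPLING_INTERVAL_S = 0.010

# Hypothetical data: the function that was executing at each sampling tick.
samples = ["multiply", "multiply", "spin_wait", "multiply", "load_input",
           "multiply", "spin_wait", "multiply"]

def top_hotspots(samples, interval_s, limit=3):
    """Return (function, estimated CPU seconds) pairs, most expensive first."""
    counts = Counter(samples)
    return [(name, hits * interval_s) for name, hits in counts.most_common(limit)]

for name, seconds in top_hotspots(samples, SAMPLING_INTERVAL_S):
    print(f"{name:<12} {seconds:.3f} s")
```

A shorter sampling interval gives a finer estimate at the cost of more collection overhead, which is why the interval is configurable before the analysis starts.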
As shown in Figure 4.7, the CPU Usage Histogram shows how the elapsed time is distributed over the number of CPUs that were in use simultaneously.
Figure 4.7 CPU Usage Histogram (Basic Hotspots)
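As a rough illustration of what such a histogram represents, the sketch below accumulates elapsed time per number of simultaneously busy CPUs. The trace, the slice length, and the function name cpu_usage_histogram are hypothetical; VTune derives its own data from the collected samples.

```python
from collections import defaultdict

# Hypothetical length of each observation slice, in seconds.
SLICE_S = 0.5

# Hypothetical trace: number of logical CPUs busy during each slice.
busy_cpus_per_slice = [1, 1, 2, 4, 4, 4, 2, 1, 0, 4]

def cpu_usage_histogram(trace, slice_s):
    """Map 'CPUs simultaneously busy' to the elapsed seconds spent at that level."""
    histogram = defaultdict(float)
    for busy in trace:
        histogram[busy] += slice_s
    return dict(sorted(histogram.items()))

hist = cpu_usage_histogram(busy_cpus_per_slice, SLICE_S)
for busy, seconds in hist.items():
    # One '#' per slice spent at this utilization level.
    print(f"{busy} CPUs | {'#' * int(seconds / SLICE_S)} {seconds:.1f} s")
```

A program that spends most of its elapsed time in the low-utilization bars is a candidate for better parallelization.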
As shown in Figure 4.8, Collection and Platform info contains information about the application command line, operating system, CPU, and so on.
Figure 4.8 Collection and Platform info (Basic Hotspots)
4.2 Bottom-up
As shown in Figure 4.9, the Bottom-up view shows the time cost of functions, modules, and threads. The main data analyzed are process, thread, module, function, and call stack information. The view can display the program's processes, thread IDs, function start addresses, CPU time, CPU spin time, and other information.
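The grouping that a bottom-up view performs can be sketched as follows: time is attributed to the innermost function of each sampled call stack, and the chains of callers that led there are listed underneath. The stacks, timings, and the function name bottom_up are hypothetical, and this is only an assumption about the general technique, not VTune's internal logic.

```python
from collections import defaultdict

# Hypothetical samples: (call stack from outermost to innermost frame, CPU seconds).
samples = [
    (("main", "solve", "multiply"), 0.40),
    (("main", "report", "multiply"), 0.10),
    (("main", "solve"), 0.05),
    (("main", "load_input"), 0.02),
]

def bottom_up(samples):
    """Group CPU time by the innermost function, then by its chain of callers."""
    view = defaultdict(lambda: defaultdict(float))
    for stack, seconds in samples:
        hotspot = stack[-1]
        callers = " <- ".join(reversed(stack[:-1]))
        view[hotspot][callers] += seconds
    return view

view = bottom_up(samples)
# Most expensive function first, with its caller chains indented below it.
for func in sorted(view, key=lambda f: -sum(view[f].values())):
    print(f"{func}: {sum(view[func].values()):.2f} s")
    for callers, seconds in view[func].items():
        print(f"    <- {callers}: {seconds:.2f} s")
```

Reading from the hotspot outward answers the question "who is responsible for calling this expensive function?".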
Figure 4.9 Bottom-up (Basic Hotspots)
4.3 Caller/Callee
As shown in Figure 4.10, the main data analyzed by the Caller/Callee view are: the total CPU time of each function, the self CPU time of each function, and the callers and callees of each function.
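The difference between self time and total time is easy to confuse, so here is a small sketch under stated assumptions: self time counts only the samples where a function is the innermost frame, while total time counts every sample in which the function appears anywhere on the stack. The sample data and the function name caller_callee are hypothetical.

```python
from collections import defaultdict

# Hypothetical samples: (call stack from outermost to innermost frame, CPU seconds).
samples = [
    (("main", "solve", "multiply"), 0.40),
    (("main", "solve"), 0.05),
    (("main", "report", "multiply"), 0.10),
    (("main",), 0.01),
]

def caller_callee(samples):
    """Compute self time, total time, callers, and callees for every function."""
    self_time = defaultdict(float)
    total_time = defaultdict(float)
    callers = defaultdict(set)
    callees = defaultdict(set)
    for stack, seconds in samples:
        self_time[stack[-1]] += seconds       # innermost frame only
        for func in set(stack):               # a recursive function counts once
            total_time[func] += seconds
        for parent, child in zip(stack, stack[1:]):
            callees[parent].add(child)
            callers[child].add(parent)
    return self_time, total_time, callers, callees

self_time, total_time, callers, callees = caller_callee(samples)
for func in sorted(total_time, key=total_time.get, reverse=True):
    print(f"{func:<9} total {total_time[func]:.2f} s  self {self_time[func]:.2f} s  "
          f"callers {sorted(callers[func])}  callees {sorted(callees[func])}")
```

A function with a large total time but a small self time is mostly a dispatcher: the cost lies in what it calls, not in its own body.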
Figure 4.10 Caller/Callee (Basic Hotspots)
4.4 Top-down Tree
As shown in Figure 4.11, the Top-down Tree view shows the time and percentage of each call in the form of a tree. You can expand the tree one level at a time, starting from the most expensive entries, to find the key functions and analyze their performance. Its content is essentially the same as that of Caller/Callee.
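A top-down tree can be pictured as the same call-stack data merged from the root downward. The sketch below is an illustration only, with hypothetical stacks and helper names (build_tree, print_tree): it accumulates total time along each stack and prints the children of every node from most to least expensive, with their share of the overall time.

```python
# Hypothetical samples: (call stack from outermost to innermost frame, CPU seconds).
samples = [
    (("main", "solve", "multiply"), 0.40),
    (("main", "solve"), 0.05),
    (("main", "report", "multiply"), 0.10),
    (("main",), 0.01),
]

def build_tree(samples):
    """Merge call stacks into a tree whose nodes carry total CPU time."""
    root = {"total": 0.0, "children": {}}
    for stack, seconds in samples:
        root["total"] += seconds
        node = root
        for func in stack:
            node = node["children"].setdefault(func, {"total": 0.0, "children": {}})
            node["total"] += seconds
    return root

def print_tree(node, grand_total, depth=0):
    """Print children from most to least expensive, indented by call depth."""
    ordered = sorted(node["children"].items(), key=lambda kv: -kv[1]["total"])
    for name, child in ordered:
        share = 100.0 * child["total"] / grand_total
        print(f"{'  ' * depth}{name}  {child['total']:.2f} s  ({share:.1f}%)")
        print_tree(child, grand_total, depth + 1)

tree = build_tree(samples)
print_tree(tree, tree["total"])
```

Following the first child at every level mimics expanding the most expensive branch layer by layer.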
Figure 4.11 Top-down Tree (Basic Hotspots)
4.5 Tasks and Frames
As shown in Figure 4.12, the Tasks and Frames view shows, in the form of a histogram, the program's analysis time, its CPU time, and the running time of the process and its individual threads.
Figure 4.12 Tasks and Frames (Basic Hotspots)
5. Summary
VTune helps users locate "hotspots" in a program, that is, the code sections in which the program spends the most time. The VTune Performance Analyzer collects performance data on applications and the system, and then displays it in graphical and tabular form. From this data, the user can analyze the performance of the application to learn which parts of the program run slowly and why.
6. Disclaimer
This is an internal communication document. If you find errors or have suggestions, please contact the document creator promptly so that it can be revised and updated.