Performance Analysis of Multi-Core Multithreaded Programs Using Oprofile

Yang Xiaohua

To Do a Good Job, First Sharpen Your Tools: A Brief Introduction to Performance Analysis Tools

When continuously tuning an application, you need not only a complete benchmark but also a key tool that goes straight to the heart of the problem: a performance analysis tool.

Depending on their complexity and the functionality they provide, performance tools can be divided into two levels:

Basic Timing Tools

In everyday life, a stopwatch is the simplest timing tool. In the same spirit, you can place timer calls anywhere in the code and invoke them repeatedly to measure the elapsed time of the entire application or any part of it. This method, however, is imprecise and its errors are large.

Software Analysis Tools

At present, there are two kinds of software analysis tools: sampling-based and instrumentation-based.

Sampling-Based Analysis Tools

These tools record performance information, such as the processor's instruction pointer, thread ID, processor ID, and event counters, mainly by means of periodic interrupts. This approach has low overhead and high precision. On Linux, Oprofile and the Intel VTune Performance Analyzer are common examples.

Instrumentation-Based Analysis Tools

These tools insert analysis code into the application, either by rewriting the binary directly or through the compiler. The approach is similar to adding timer calls to the application: it brings considerable overhead but provides more functionality, such as call trees, call counts, and per-function overhead. On Linux, gprof and the Intel VTune Performance Analyzer are common examples.

This article uses the sampling tool Oprofile to analyze the performance of a multi-core multithreaded program, showing how useful it can be in practice.

Ways to Measure Performance Gains

As technology advances, computer architectures are moving toward multi-core designs, which has pushed concurrent programming into the spotlight. But how can we measure the performance benefit of a concurrent design?

Here one must mention the outstanding contribution of Gene Amdahl. In 1967 he proposed what is now known as Amdahl's law, which gives the theoretical maximum speedup of a parallel program over the best serial algorithm.

Amdahl's Law

$$\text{Speedup} = \frac{1}{S + \dfrac{1 - S}{n} + H(n)}$$

where S is the fraction of the program's execution that is serial, n is the number of processor cores, and H(n) is the system overhead.

Amdahl's law rests on several assumptions that do not necessarily hold in the real world; most notably, it assumes the problem size stays fixed as cores are added. Even so, it left the computer industry discouraged for many years, since by Amdahl's law the performance gained from greater parallelism might be negligible. Only the advent of Gustafson's law changed this view.

Building on work at Sandia National Laboratories, E. Barsis proposed Gustafson's law:

$$\text{Scaled speedup} = n + (1 - n)\,S$$

where S is the fraction of execution spent in the serial part and n is the number of processor cores.

In 1996, Shi showed that Gustafson's law is equivalent to Amdahl's law.

A Brief Introduction to How Oprofile Works

Depending on the CPU architecture, Oprofile supports two kinds of sampling: event-based sampling and time-based sampling.

If the CPU has performance counter registers, Oprofile uses event-based sampling: it counts occurrences of a particular event, such as branch-prediction events, and takes one sample each time the count reaches a preset value. Otherwise it uses time-based sampling, which relies on the operating system's clock interrupt and takes a sample each time a clock interrupt occurs. It follows that with time-based sampling, code that runs with interrupts disabled cannot be sampled, so its accuracy is lower than that of event-based sampling.

On x86, the sampling method differs between CPU models, as detailed in the following table:
