I hope to answer this question after some in-depth study: "How do programs written with OpenMP, CUDA, MPI, and TBB compare in performance, and what are the advantages and disadvantages of each?"

Source: Internet
Author: User

I came across this question by chance: how do programs written with OpenMP, CUDA, MPI, and TBB compare in performance, and what are the advantages and disadvantages of each? Any advice would be appreciated.

I hope studying it leads to a better understanding!

 

This question is too broad to answer clearly in two or three sentences.

Start by looking at the parallel programming models involved: shared memory versus distributed memory, pure data parallelism versus task parallelism, the programming languages each one supports, and how each is implemented (as a language extension or as a class/template library)...

Once you understand these, you can probably tell which approach suits your application or algorithm.
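To make the data-parallel versus task-parallel distinction concrete, here is a minimal sketch in C using OpenMP directives. The function names `scale` and `min_and_max` are made up for illustration and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Data parallelism: every iteration applies the same operation to a
   different element, so the iterations can be split across threads. */
void scale(float *x, int n, float factor) {
    assert(x != NULL || n == 0);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        x[i] *= factor;
    }
}

/* Task parallelism: two different jobs (find the minimum, find the
   maximum) may run side by side, each in its own section. */
void min_and_max(const float *x, int n, float *min_out, float *max_out) {
    assert(x != NULL && n > 0);
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            float m = x[0];
            for (int i = 1; i < n; ++i) {
                if (x[i] < m) m = x[i];
            }
            *min_out = m;   /* only this section writes min_out */
        }
        #pragma omp section
        {
            float m = x[0];
            for (int i = 1; i < n; ++i) {
                if (x[i] > m) m = x[i];
            }
            *max_out = m;   /* only this section writes max_out */
        }
    }
}
```

Both functions give the same results whether or not OpenMP is enabled at compile time (for example with `gcc -fopenmp`); without it, the directives are ignored and the code runs serially.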

There is no absolute better or worse here, just as with the perennial question of which programming language is best.
Posted by: intel_iclifort (http://hi.csdn.net/intel_iclifort)

My opinion is:
Advantages and disadvantages: OpenMP is somewhat simpler to program with. A single directive is enough to parallelize a loop and improve performance (see key code 1), and for matrix multiplication alone the results are very good. MPI is somewhat more complicated to use, but it can also achieve good results.

Key code 1:

    /* N is assumed to be a compile-time constant defined elsewhere. */
    void parallelMxM(float c[N][N], float a[N][N], float b[N][N]) {
        /* Hand out rows of c to threads dynamically. */
        #pragma omp parallel for schedule(dynamic)
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j) {
                float sum = 0;
                for (int k = 0; k < N; ++k) {
                    sum += a[i][k] * b[k][j];
                }
                c[i][j] = sum;
            }
        }
    }

Posted by: dxmgood (http://hi.csdn.net/dxmgood)

 


If the threads in a program interact in complex ways, it is better to use raw threads. Otherwise, use OpenMP or TBB.
As a matter of programming style, OpenMP suits C programs and TBB suits C++ programs.
Posted by: horreaper (http://hi.csdn.net/horreaper)
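As one illustration of the kind of thread interaction that is awkward to express with loop directives, here is a minimal sketch of a producer handing values to a consumer through a one-slot mailbox, written with POSIX threads. POSIX threads are an assumption here, since the answer does not name a threading API, and the `mailbox` type and `sum_via_handoff` function are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* One-slot mailbox shared by a producer and a consumer (hypothetical type). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;  /* signalled whenever `full` flips */
    bool full;
    int  value;
    int  count;               /* how many values the producer sends */
} mailbox;

/* Producer thread: sends 1, 2, ..., count, one value at a time. */
static void *producer(void *arg) {
    mailbox *mb = arg;
    for (int i = 1; i <= mb->count; ++i) {
        pthread_mutex_lock(&mb->lock);
        while (mb->full) {                      /* wait for an empty slot */
            pthread_cond_wait(&mb->changed, &mb->lock);
        }
        mb->value = i;
        mb->full = true;
        pthread_cond_signal(&mb->changed);      /* wake the consumer */
        pthread_mutex_unlock(&mb->lock);
    }
    return NULL;
}

/* Consumer (the calling thread): receives `count` values and sums them. */
int sum_via_handoff(int count) {
    assert(count >= 0);
    mailbox mb;
    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.changed, NULL);
    mb.full = false;
    mb.value = 0;
    mb.count = count;

    pthread_t worker;
    int rc = pthread_create(&worker, NULL, producer, &mb);
    if (rc != 0) {
        return -1;                              /* thread could not be started */
    }

    int sum = 0;
    for (int i = 0; i < count; ++i) {
        pthread_mutex_lock(&mb.lock);
        while (!mb.full) {                      /* wait for a value */
            pthread_cond_wait(&mb.changed, &mb.lock);
        }
        sum += mb.value;
        mb.full = false;
        pthread_cond_signal(&mb.changed);       /* wake the producer */
        pthread_mutex_unlock(&mb.lock);
    }

    pthread_join(worker, NULL);
    pthread_cond_destroy(&mb.changed);
    pthread_mutex_destroy(&mb.lock);
    return sum;
}
```

The two threads must strictly alternate, which a mutex and a condition variable express directly. With GCC this would typically be built with `gcc -pthread`.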

The previous answer is the most insightful!

Looking forward to a definitive answer!


 

I have also been working on high-performance computing recently, but I am just getting started, so what I can say is fairly basic, and I hope the experts here will offer some guidance.
The libraries and standards the original poster lists are all used for parallel computing, but each has a different focus and achieves parallelism in a different way.
MPI is the Message Passing Interface, used to pass messages between processes running on different computers. In other words, it mainly targets parallelism on computer clusters or supercomputers. I don't know much about it because I have no cluster to experiment on.
OpenMP achieves parallelism mainly through compiler directives, which in C/C++ take the form "#pragma omp ...". At the time of writing, the latest version is 3.0, which GCC and Intel C++ support; Microsoft's compiler supports only 2.0. OpenMP is mainly a multithreading technology, so it pays off on a single machine with multiple processors or cores. There are papers on using OpenMP on clusters, but I have not looked into them and only know that they exist. Because OpenMP only adds directives, the same source can be compiled and run correctly both as a parallel program and as a serial one. Its biggest advantage is therefore that an existing serial C program can be made multithreaded with only small changes. OpenMP is said to target loop parallelism mainly; I have only just started learning it and do not yet understand it in depth.
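A small sketch of the "same source, serial or parallel" property: the only OpenMP-specific line below is the directive, and the function `dot` is a made-up example rather than anything from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two vectors of length n.
   With OpenMP enabled, the loop iterations are divided among threads and
   the per-thread partial sums are combined by the reduction clause.
   Without OpenMP, the directive is ignored and the loop runs serially. */
double dot(const double *a, const double *b, int n) {
    assert(n >= 0);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

With GCC, adding `-fopenmp` turns the directive on; leaving the flag off gives the original serial program. Without the `reduction` clause, the threads would race on `sum`, so this is a small change rather than no change at all.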
TBB is an Intel product. It is a C++ template library in the style of the STL, it won a Jolt Award, and it is a very good threading library. Its biggest advantage is that it is well structured and offers a higher level of abstraction than OpenMP, so in many cases TBB is the better choice for writing new programs. However, it supports only C++, not C. For those working in graphics and imaging: the OpenCV library, if you know it, uses TBB.
I had planned to use TBB for CPU parallelism, but the FAQ on the TBB website says that OpenMP should be used wherever it fits, so I have decided to learn OpenMP first. OpenMP and TBB can coexist, but neither can be mastered overnight. The relevant passage from the FAQ on Intel's official TBB website is:
Everyone should use OpenMP as much as they can. It is easy to use, it is standard, it is supported by all major compilers, and it exploits parallelism well. But it is very loop oriented, and does not address algorithm or data structure level parallelism. When OpenMP works for your code, you should use it. We've seen it used to great advantage in financial applications, MP3 codecs, scientific programs and high definition video editing software. OpenMP is best geared for Fortran and C code.
CUDA is a programming platform that NVIDIA developed for its GPUs, much as Intel developed TBB for CPUs. GPUs are inherently parallel and excel at data-intensive parallelism, but few programming interfaces used to be available: it took experts working through OpenGL to use a GPU for general parallel computing. Only after NVIDIA developed CUDA could GPUs be programmed in C/C++. The libraries and standards discussed above target the CPU, while CUDA targets the GPU, so it is natural to combine them with CUDA. In fact, many such solutions already exist, for example MPI with CUDA for high-performance GPU clusters, and OpenMP or TBB with CUDA for cooperation between the CPU and the GPU.
In terms of tooling, OpenMP and TBB are supported by a wide range of compilers, while CUDA is newer than both and is supported by fewer. Since TBB is just a class library in the style of the STL, it should be the easier one to combine with CUDA. That said, combining CUDA with OpenMP has already been done.
That is all I can say from the implementation point of view. As for the advantages and disadvantages, I am not yet able to judge them. I mainly want to study heterogeneous cooperation between the CPU and the GPU, which may involve OpenMP, TBB, and CUDA. If the original poster or other experts have similar interests, we can discuss this together.
Posted by: snow_bird (http://hi.csdn.net/snow_bird)
