OpenMP configuration and basic usage


1. Configuring OpenMP in Visual Studio

Project Properties -> C/C++ -> Language -> OpenMP Support, and select Yes (/openmp) from the drop-down menu.

2. Basic usage

(1) Check how many cores (hardware threads) the machine has.

Add the following code:

#include <omp.h>
#include <iostream>
int main()
{
    std::cout << "Parallel begin:\n";
    #pragma omp parallel
    {
        std::cout << omp_get_thread_num();
    }
    std::cout << "\nparallel end.\n";
    std::cin.get();
    return 0;
}


Running the program prints one number per thread (0 through 7, in some order), which indicates that this computer has 8 cores, i.e. 8 hardware threads.

(2) Basic directives

OpenMP uses the parallel directive to mark a parallel region in the code, in the following form:

#pragma omp parallel
{
    // each thread executes the code inside the braces
}


parallel

parallel indicates that the following statement (or statement block) will be executed by multiple threads in parallel; the statement block after #pragma omp parallel is called the parallel region. The order in which the threads execute is not guaranteed.
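As a side note (not in the original article), the number of threads in the team can be set explicitly with the standard runtime routine omp_set_num_threads() or the num_threads clause; a minimal sketch:

#include <omp.h>
#include <iostream>
int main()
{
    omp_set_num_threads(4);   // request 4 threads for subsequent parallel regions
    #pragma omp parallel      // equivalently: #pragma omp parallel num_threads(4)
    {
        // each thread prints its id; the order is still not guaranteed
        std::cout << omp_get_thread_num();
    }
    std::cout << "\n";
    return 0;
}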

for

We typically split a large computation into smaller parts in order to shorten the computation time. The key point is that each thread performs a different piece of the work (the data it operates on, or the computing task itself, differs), and the threads together complete the whole computation.

The OpenMP for directive distributes the iterations of a C++ for loop among the threads (each thread gets its own subset of the iterations, and the threads' subsets together cover every iteration of the loop exactly once). The C++ for loop must satisfy certain restrictions so that the number of iterations can be determined before the loop executes; for example, it must not contain break, etc.

It can be used in either of two forms:

1)

#pragma omp parallel for
for (...)

2)

#pragma omp parallel
{   // note: the opening brace must be on its own line
    #pragma omp for
    for (...)
}

Note: in the second form, the parallel keyword must not appear again inside the parallel block (use #pragma omp for, not #pragma omp parallel for).

The scope of the first form is only the for loop that immediately follows it; with the second form, the whole parallel block can contain multiple for directives (see the sketch below).
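For illustration, here is a minimal sketch (assumed, not taken from the original article) of the second form with two for directives sharing a single parallel region; the implicit barrier at the end of the first worksharing loop means the second loop can safely read the array a:

#include <iostream>
int main()
{
    int a[8], b[8];
    #pragma omp parallel
    {
        #pragma omp for      // iterations split among the threads of the region
        for (int i = 0; i < 8; i++)
            a[i] = i * i;
        #pragma omp for      // a second worksharing loop in the same parallel region
        for (int i = 0; i < 8; i++)
            b[i] = a[i] + 1;
    }
    for (int i = 0; i < 8; i++)
        std::cout << b[i] << " ";
    std::cout << "\n";
    return 0;
}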

When dealing with multiple (e.g. nested or irregular) for loops, OpenMP 3.0 and later already support tasks, which handle irregular loops and recursive function calls more effectively.

The basic idea is: create a team of threads; one thread is responsible for creating the tasks, and the remaining threads are responsible for executing them.

#pragma omp parallel
{
    #pragma omp single
    {
        for (i = 0; i < N; ++i)
        {
            for (j = 0; j < M; ++j)
            {
                #pragma omp task
                {
                    // compute this part
                }
            }
        }
    }
}
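The fragment above omits the surrounding declarations; a complete, runnable sketch of the same pattern (the values of N and M and the per-element work are placeholders chosen for illustration) might look like this:

#include <iostream>
int main()
{
    const int N = 4, M = 3;
    int result[N][M] = {};
    #pragma omp parallel
    {
        // only one thread runs the single block and creates the tasks;
        // the other threads in the team pick the tasks up and execute them
        #pragma omp single
        {
            for (int i = 0; i < N; ++i)
                for (int j = 0; j < M; ++j)
                {
                    #pragma omp task
                    {
                        result[i][j] = i * M + j;   // placeholder computation for element (i, j)
                    }
                }
        }
    }
    for (int i = 0; i < N; ++i)
    {
        for (int j = 0; j < M; ++j)
            std::cout << result[i][j] << " ";
        std::cout << std::endl;
    }
    return 0;
}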
(3) Practical examples

1) There is a simple test() function, and in main() a for loop runs test() 8 times.

#include <iostream>
#include <time.h>
void test()
{
    int a = 0;
    for (int i=0;i<100000000;i++)
        a++;
}
int main()
{
    clock_t t1 = clock();
    for (int i=0;i<8;i++)
        test();
    clock_t t2 = clock();
    std::cout<< "Time:" <<t2-t1<<std::endl;
}
After compiling and running, the printed time is 1.971 seconds. Below, with a single line, we make the above program run on multiple cores.

#include <iostream>
#include <time.h>
void test()
{
    int a = 0;
    for (int i=0;i<100000000;i++)
        a++;
}
int main()
{
    clock_t t1 = clock();
    #pragma omp parallel for
    for (int i=0;i<8;i++)
        test();
    clock_t t2 = clock();
    std::cout<< "Time:" <<t2-t1<<std::endl;
}

After compiling and running, the printed time is 0.546 seconds, almost a quarter of the time above.

When the compiler encounters #pragma omp parallel for, it automatically divides the following for loop's iterations into n parts (n is the number of CPU cores), assigns each part to a core, and the cores execute in parallel. The following code validates this analysis.

#include <iostream>
int main()
{
#pragma omp parallel for
    for (int i=0;i<10;i++)
        std::cout<<i<<std::endl;
    return 0;
}

You will find that the console prints something like 0 3 4 5 8 9 6 7 1 2. Note: because the cores execute in parallel, the order printed may differ from run to run.

Now let's talk about the race condition, the toughest problem in all multithreaded programming. The problem can be described as follows: when multiple threads execute in parallel, they may read and write the same variable at the same time, leading to unpredictable results. For example, for an array a of 10 integer elements, we use a for loop to sum its elements and save the result in the variable sum.

#include <iostream>
int main()
{
    int sum = 0;
    int a[10] = {1,2,3,4,5,6,7,8,9,10};
#pragma omp parallel for
    for (int i=0;i<10;i++)
        sum = sum + a[i];
    std::cout<< "sum:" <<sum<<std::endl;
    return 0;
}

If we comment out #pragma omp parallel for and let the program execute serially in the traditional way, then clearly sum = 55. However, when executed in parallel, sum becomes some other value; for example, in one run sum = 49. The reason is that while thread A is executing sum = sum + a[i], thread B is updating sum at the same time, so A accumulates onto a stale value of sum and the result is wrong.

So how do we implement parallel array summation with OpenMP? Let's start with a basic solution. The idea is to first create an array sumarray whose length is the number of threads executing in parallel (by default, that number equals the number of CPU cores). In the for loop, each thread updates its own element of sumarray, and afterwards the elements of sumarray are added up into sum. The code is as follows:

#include <iostream>
#include <omp.h>
int main()
{
    int sum = 0;
    int a[10] = {1,2,3,4,5,6,7,8,9,10};
    int corenum = omp_get_num_procs();   // number of processors
    int* sumarray = new int[corenum];    // create an array with one element per processor
    for (int i=0;i<corenum;i++)          // initialize the array elements to 0
        sumarray[i] = 0;
#pragma omp parallel for
    for (int i=0;i<10;i++)
    {
        int k = omp_get_thread_num();    // get the current thread's id
        sumarray[k] = sumarray[k] + a[i];
    }
    for (int i=0;i<corenum;i++)
        sum = sum + sumarray[i];
    delete[] sumarray;                   // free the temporary array
    std::cout<< "sum:" <<sum<<std::endl;
    return 0;
}
Note that in the above code we use the omp_get_num_procs() function to get the number of processors and the omp_get_thread_num() function to get each thread's id; to use these two functions we need to include <omp.h>.

Although the above code achieves its purpose, it introduces extra work, such as building the sumarray array and finally using another for loop to add up its elements. Is there an easier way? Yes: OpenMP provides another tool, reduction, as shown in the following code:

#include <iostream>
int main()
{
    int sum = 0;
    int a[10] = {1,2,3,4,5,6,7,8,9,10};
#pragma omp parallel for reduction(+:sum)
    for (int i=0;i<10;i++)
        sum = sum + a[i];
    std::cout<< "sum:" <<sum<<std::endl;
    return 0;
}

In the above code we add reduction(+:sum) to #pragma omp parallel for, which tells the compiler: run the following for loop in parallel, but each thread keeps its own private copy of the variable sum, and after the loop finishes all the threads' copies are added together to form the final result.

reduction is convenient, but it supports only basic operations, such as +, -, *, &, |, && and ||. In some cases we need to avoid a race condition, but the operation involved is beyond the scope of reduction. What should we do then? This is where another OpenMP tool, critical, comes in. Consider the following example, in which we find the maximum value of array a and save the result in max.

#include <iostream>
int main()
{
    int max = 0;
    int a[10] = {11,2,33,49,113,20,321,250,689,16};
#pragma omp parallel for
    for (int i=0;i<10;i++)
    {
        int temp = a[i];
#pragma omp critical
        {
            if (temp > max)
                max = temp;
        }
    }
    std::cout<< "max:" <<max<<std::endl;
    return 0;
}
In the example above, the for loop is still automatically divided into n parts and executed in parallel, but we wrap if (temp > max) max = temp; in #pragma omp critical, which means: each thread executes the statements inside the loop in parallel, but when it reaches the critical section it must check whether another thread is currently executing inside it, and if so, wait until that thread finishes. This avoids the race condition, but it obviously runs more slowly, because threads may have to wait for one another.
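A common way to reduce that waiting (a standard pattern, sketched here rather than taken from the original article) is to let each thread keep a private local maximum and enter the critical section only once, after its share of the iterations is done:

#include <iostream>
int main()
{
    int max = 0;
    int a[10] = {11,2,33,49,113,20,321,250,689,16};
    #pragma omp parallel
    {
        int localmax = 0;          // private to each thread
        #pragma omp for
        for (int i = 0; i < 10; i++)
            if (a[i] > localmax)
                localmax = a[i];
        #pragma omp critical       // each thread enters critical only once, not once per iteration
        {
            if (localmax > max)
                max = localmax;
        }
    }
    std::cout << "max:" << max << std::endl;
    return 0;
}

With only 10 elements the difference is negligible, but for large arrays this keeps the critical section out of the inner loop.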
