OpenMP for Parallel Computing

In the previous article, we introduced the basic concepts of parallel computing and OpenMP.

OpenMP provides a high-level abstraction for describing parallelism, reducing the difficulty and complexity of parallel programming: programmers can devote their energy to the parallel algorithm itself rather than to implementation details. This makes OpenMP a good choice for multi-threaded programming over data-parallel workloads. It also offers considerable flexibility and adapts easily to different parallel system configurations. Thread granularity and load balancing, two classic difficulties of traditional multi-threaded programming, are taken over from the programmer by the OpenMP runtime. On the other hand, as a high-level abstraction, OpenMP is not well suited to scenarios that require complicated thread synchronization and mutual exclusion, and it cannot be used on non-shared-memory systems such as computer clusters; MPI is generally used on such systems instead.

To use OpenMP in Visual Studio, you only need to enable OpenMP support (the compiler switch is /openmp) so that VC++ accepts OpenMP syntax during compilation. In the program itself, add #include <omp.h>. The following is an example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>
#include <windows.h>

#define MAX_VALUE 10000000

/* Sums 1.0/index over every second integer above value. */
double _test(int value)
{
    int index = 0;
    double result = 0.0;
    for (index = value + 1; index < MAX_VALUE; index += 2)
        result += 1.0 / index;
    return result;
}

void OpenMPTest()
{
    int index = 0;
    int time1 = 0;
    int time2 = 0;
    double value1 = 0.0, value2 = 0.0;
    double result[2];

    /* Serial version of the summation. */
    time1 = GetTickCount();
    for (index = 1; index < MAX_VALUE; index++)
        value1 += 1.0 / index;
    time1 = GetTickCount() - time1;

    memset(result, 0, sizeof(double) * 2);

    /* Parallel version: the odd and even terms are summed by two threads. */
    time2 = GetTickCount();
#pragma omp parallel for
    for (index = 0; index < 2; index++)
        result[index] = _test(index);
    value2 = result[0] + result[1];
    time2 = GetTickCount() - time2;

    printf("time1 = %d, time2 = %d\n", time1, time2);
}

int main()
{
    OpenMPTest();
    system("pause");
    return 0;
}

A key directive is used in this example:

#pragma omp parallel for

This line illustrates the basic syntax of OpenMP in C/C++:

#pragma omp <directive> [clause [clause] ...]
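
For instance, a directive can be followed by several clauses. The combination below is a hypothetical illustration, not taken from the article's example:

#include <stdio.h>
#include <omp.h>

int main()
{
    /* One directive (parallel for) followed by two clauses:
       num_threads fixes the thread count, schedule selects the
       iteration scheduling policy. */
#pragma omp parallel for num_threads(4) schedule(static)
    for (int i = 0; i < 8; i++)
        printf("i = %d, ThreadId = %d\n", i, omp_get_thread_num());
    return 0;
}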

1. OpenMP directives and library functions

OpenMP includes the following directives (a short sketch combining several of them follows this list):

  • parallel: placed before a code block; the block will be executed in parallel by multiple threads.
  • for: placed before a for loop; the iterations are distributed among multiple threads for parallel execution, and there must be no dependence between iterations.
  • parallel for: the combination of parallel and for, also placed before a for loop; the loop will be executed in parallel by multiple threads.
  • sections: placed before a group of code segments that may be executed in parallel.
  • parallel sections: the combination of parallel and sections.
  • critical: placed before a critical section of code; only one thread may execute it at a time.
  • single: placed before a code block that should be executed by a single thread only.
  • barrier: synchronizes the threads in a parallel region; each thread that reaches the barrier stops until all threads have reached it.
  • atomic: specifies that a memory location is updated atomically.
  • master: specifies that a code block is executed by the main thread only.
  • ordered: specifies that part of a parallel loop is executed in sequential (iteration) order.
  • threadprivate: specifies that a variable is private to each thread.
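
As a rough sketch, not taken from the original article, of how several of these directives combine inside one parallel region:

#include <stdio.h>
#include <omp.h>

int main()
{
    int sum = 0;

#pragma omp parallel num_threads(4)
    {
#pragma omp single
        printf("single: executed by one thread only\n");

#pragma omp critical
        sum += 1;       /* only one thread at a time enters the critical section */

#pragma omp barrier     /* every thread waits here until all have arrived */

#pragma omp master
        printf("master: sum = %d\n", sum);   /* executed by the main thread only */
    }
    return 0;
}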

In addition to the directives above, OpenMP also provides a number of library functions; the most common ones are listed below (a lock sketch follows this list):

  • omp_get_num_procs: returns the number of processors available to the program.
  • omp_get_num_threads: returns the number of active threads in the current parallel region.
  • omp_get_thread_num: returns the calling thread's number.
  • omp_set_num_threads: sets the number of threads used to execute parallel code.
  • omp_init_lock: initializes a simple lock.
  • omp_set_lock: acquires a lock.
  • omp_unset_lock: releases a lock; must be paired with omp_set_lock.
  • omp_destroy_lock: the counterpart of omp_init_lock; destroys a lock.
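
The four lock functions are used together; a minimal sketch, assuming a shared counter that several threads update:

#include <stdio.h>
#include <omp.h>

int main()
{
    omp_lock_t lock;
    int counter = 0;

    omp_init_lock(&lock);            /* initialize the simple lock */

#pragma omp parallel num_threads(4)
    {
        omp_set_lock(&lock);         /* acquire: only one thread proceeds */
        counter++;                   /* protected update of the shared counter */
        omp_unset_lock(&lock);       /* release: paired with omp_set_lock */
    }

    omp_destroy_lock(&lock);         /* paired with omp_init_lock */
    printf("counter = %d\n", counter);
    return 0;
}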

OpenMP also includes the following clauses (a sketch using several of them follows this list):

  • private: specifies that each thread has its own private copy of one or more variables.
  • firstprivate: like private, but each thread's copy is initialized from the variable's value in the main thread.
  • lastprivate: specifies that, after the parallel region ends, the value of a thread-private variable is copied back to the corresponding variable in the main thread.
  • reduction: specifies that one or more variables are private and that their private copies are combined with a specified operation when the parallel region ends.
  • nowait: removes the implied wait (barrier) at the end of a construct.
  • num_threads: sets the number of threads for the parallel region.
  • schedule: specifies how the iterations of a for loop are scheduled.
  • shared: specifies that one or more variables are shared among all threads.
  • ordered: specifies that a for loop contains code that must be executed in iteration order.
  • copyprivate: broadcasts the value of a private variable from the thread that executed a single construct to the other threads.
  • copyin: initializes each thread's copy of a threadprivate variable with the main thread's value.
  • default: specifies the default data-sharing attribute of variables in a parallel region; the default is shared.
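
As a sketch of a few of these clauses working together (a reduction-style rewrite of a small summation, not the article's code):

#include <stdio.h>
#include <omp.h>

int main()
{
    int offset = 100;   /* copied into each thread by firstprivate */
    int sum = 0;        /* private copies combined by reduction(+:sum) */

#pragma omp parallel for firstprivate(offset) reduction(+:sum)
    for (int i = 0; i < 10; i++)
        sum += i + offset;

    printf("sum = %d\n", sum);   /* 10*100 + 45 = 1045 */
    return 0;
}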

2. Usage of the parallel directive

parallel constructs a parallel block. It can also be combined with other directives, such as for and sections. The usage is as follows:

#pragma omp parallel [for | sections] [clause [clause] ...]
{
    // code to be executed in parallel
}

For example, you can write a simple piece of code that prints a greeting from each thread:

#pragma omp parallel num_threads(8)
{
    printf("Hello, World!, ThreadId=%d\n", omp_get_thread_num());
}

Running this locally shows that the printf call is executed by eight different threads, and the order in which the threads run is not deterministic. Compared with creating threads through a traditional thread-function API, this OpenMP construct is equivalent to creating a group of threads that each run the block as their entry function and then waiting for all of them to finish. If num_threads(8) is removed from the code above, the runtime will, by default, create one thread per CPU core.
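
To make that default explicit, you can query the processor count and set the thread count yourself, using the library functions listed earlier:

#include <stdio.h>
#include <omp.h>

int main()
{
    /* omp_get_num_procs reports the processors available;
       omp_set_num_threads makes the default choice explicit. */
    printf("processors: %d\n", omp_get_num_procs());
    omp_set_num_threads(omp_get_num_procs());

#pragma omp parallel
    printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    return 0;
}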

3. Usage of the for directive

The for directive distributes the iterations of a for loop among multiple threads for execution. It is generally combined with parallel as the parallel for directive, or used on its own inside a parallel block. The syntax is as follows:

#pragma omp [parallel] for [clause [clause] ...]
    for-loop statement

For example:

#pragma omp parallel for
for (int j = 0; j < 4; j++)
{
    printf("j = %d, ThreadId = %d\n", j, omp_get_thread_num());
}

The output shows that the iterations of the for loop are distributed to different threads for execution. Note that if the parallel keyword is omitted, all four iterations run in the same thread, and every output line reports the same thread ID.
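
As mentioned above, for can also be used on its own inside an existing parallel block; a minimal sketch:

#include <stdio.h>
#include <omp.h>

int main()
{
#pragma omp parallel num_threads(4)
    {
        /* The for directive on its own: it splits the iterations
           among the threads of the enclosing parallel region. */
#pragma omp for
        for (int j = 0; j < 4; j++)
            printf("j = %d, ThreadId = %d\n", j, omp_get_thread_num());
    }
    return 0;
}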

4. Usage of the sections and section directives

The section directive is used inside a sections construct to divide the enclosed code into several independent segments, each executed in parallel. The usage is as follows:

#pragma omp [parallel] sections [clause]
{
#pragma omp section
    {
        // code block
    }
}

For example:

#pragma omp parallel sections
{
#pragma omp section
    printf("section 1 ThreadId = %d\n", omp_get_thread_num());
#pragma omp section
    printf("section 2 ThreadId = %d\n", omp_get_thread_num());
#pragma omp section
    printf("section 3 ThreadId = %d\n", omp_get_thread_num());
#pragma omp section
    printf("section 4 ThreadId = %d\n", omp_get_thread_num());
}

The output shows that the code in each section is executed in parallel, allocated to different threads. When using section, make sure the execution times of the individual sections are roughly similar; if one section takes much longer than the others, the intended parallel speedup is lost.

If the code above is split into two sections constructs:

#pragma omp parallel sections
{
#pragma omp section
    printf("section 1 ThreadId = %d\n", omp_get_thread_num());
#pragma omp section
    printf("section 2 ThreadId = %d\n", omp_get_thread_num());
}
#pragma omp parallel sections
{
#pragma omp section
    printf("section 3 ThreadId = %d\n", omp_get_thread_num());
#pragma omp section
    printf("section 4 ThreadId = %d\n", omp_get_thread_num());
}

The output shows that the two sections constructs execute serially with respect to each other, while the section blocks inside each construct execute in parallel.

Note:

The for directive lets the runtime distribute the loop iterations automatically; as long as the iterations take roughly the same amount of time, the distribution is very even. Using section divides work among threads by hand, so the resulting load balance depends on the programmer.
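
For the for directive, the schedule clause listed earlier gives some control over that automatic distribution; a small sketch, not from the original article, using dynamic scheduling:

#include <stdio.h>
#include <omp.h>

int main()
{
    /* dynamic,1: each thread grabs one iteration at a time, so
       iterations of uneven cost balance themselves across threads. */
#pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < 8; i++)
        printf("i = %d, ThreadId = %d\n", i, omp_get_thread_num());
    return 0;
}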

The OpenMP directives discussed in this article, parallel, for, sections, and section, are all, at bottom, ways of creating threads. Compared with calling thread-creation functions directly, this style of thread creation is both more convenient and more efficient.
