First, let's look at the simplest OpenMP parallel program. If your machine has two or more cores, you will see the effect right away.
// Add /Qopenmp to the compilation options
#include <stdio.h>

int main(int argc, char *argv[])
{
    #pragma omp parallel for
    for (int i = 0; i < 10; i++)
    {
        printf("i = %d\n", i);
    }
    return 0;
}
1. Concept of the fork/join parallel execution mode

OpenMP is a set of compiler directives and library functions designed for writing parallel programs on shared-memory computers. In the previous article we tried OpenMP's parallel for directive. We could also see there that the serial code following a parallel part runs only after all the code executed in parallel by OpenMP has finished. This is the standard fork/join parallel mode, and it is the mode used by shared-memory parallel programs.

The basic idea is that the program starts with a single main thread. The main thread executes the serial parts of the program, and the parallel parts are executed by forking additional threads. The serial part that follows a parallel part does not start until the parallel part has ended. Take the following code from the previous article:

    int main(int argc, char *argv[])
    {
        clock_t t1 = clock();
        #pragma omp parallel for
        for (int j = 0; j < 2; j++) {
            test();
        }
        clock_t t2 = clock();
        printf("Total time = %ld\n", (long)(t2 - t1));
        test();
        return 0;
    }

Until the code in the for loop has finished executing, the line clock_t t2 = clock(); is not executed. Compared with calling a thread-creation function, this is equivalent to creating the threads and then waiting for them to finish. In this parallel mode, therefore, the threads created by the main thread do not run in parallel with the serial code that follows in the main thread.
2. Introduction to OpenMP directives and library functions

The following describes the basic, commonly used OpenMP directives. In C/C++, an OpenMP directive has the format

    #pragma omp directive [clause [clause]…]

The parallel for mentioned above is a directive. Some books call an OpenMP "directive" a "compiler directive". The clauses that follow it are optional. For example, in

    #pragma omp parallel private(i, j)

parallel is the directive and private is a clause. For convenience, a line containing #pragma and an OpenMP directive is called a statement, so the line above is called the parallel statement. OpenMP directives include the following:
parallel: placed before a code block; indicates that the block will be executed in parallel by multiple threads.
for: placed before a for loop; distributes the loop iterations among multiple threads for parallel execution. There must be no dependences between iterations.
parallel for: a combination of the parallel and for directives; also placed before a for loop, indicating that the loop body will be executed in parallel by multiple threads.
sections: placed before a code block whose parts may be executed in parallel.
parallel sections: a combination of the parallel and sections directives.
critical: placed before a critical section of code.
single: placed before a code block that is to be executed by only one thread.
flush: makes a thread's temporary view of memory consistent with memory.
barrier: used to synchronize the threads in a parallel region. Each thread that reaches the barrier stops there until all threads have reached it.
atomic: specifies that a memory location is to be updated atomically.
master: specifies that a block of code is executed by the main thread.
ordered: specifies that part of a loop in a parallel region is executed in sequential order.
threadprivate: specifies that a variable is private to each thread.

In addition to these directives, OpenMP provides a number of library functions. Several common ones are listed below:
omp_get_num_procs: returns the number of processors available to the program.
omp_get_num_threads: returns the number of active threads in the current parallel region.
omp_get_thread_num: returns the number of the calling thread.
omp_set_num_threads: sets the number of threads used to execute parallel code.
omp_init_lock: initializes a simple lock.
omp_set_lock: acquires the lock.
omp_unset_lock: releases the lock; must be paired with omp_set_lock.
omp_destroy_lock: the counterpart of omp_init_lock; destroys a lock.

OpenMP clauses include the following:
private: specifies that each thread has its own private copy of the variable.
firstprivate: specifies that each thread has its own private copy of the variable, initialized with the value the variable has in the main thread.
lastprivate: copies the value of the private variable from the thread that runs the last iteration or section back to the corresponding variable in the main thread when the parallel processing ends.
reduction: specifies that one or more variables are private and that the private copies are combined with the specified operation when the parallel processing ends.
nowait: removes the implicit wait at the end of a construct.
num_threads: specifies the number of threads.
schedule: specifies how the iterations of a for loop are scheduled.
shared: specifies that one or more variables are shared among the threads.
ordered: specifies that part of the for loop is to be executed in sequential order.
copyprivate: used with the single directive to copy a variable's value from the thread that executed the single block to the corresponding variables of the other threads.
copyin: specifies that a threadprivate variable is initialized with the value it has in the main thread.
default: specifies the default data-sharing of variables in the parallel region; the default is shared.
3. Usage of the parallel directive

parallel is used to construct a parallel block. It can also be combined with other directives, such as for and sections. In C/C++, parallel is used as follows:

    #pragma omp parallel [for | sections] [clause [clause]…]
    {
        // code
    }

The parallel statement is followed by a block, normally enclosed in braces, that contains the code to be executed in parallel.

    int main(int argc, char *argv[])
    {
        #pragma omp parallel
        {
            printf("Hello, world!\n");
        }
        return 0;
    }

Executing the code above prints the following result:

    Hello, world!
    Hello, world!
    Hello, world!
    Hello, world!

The code in the parallel block was executed four times, which shows that a total of four threads were created to execute it. You can also specify how many threads are used, with the num_threads clause:

    int main(int argc, char *argv[])
    {
        #pragma omp parallel num_threads(8)
        {
            printf("Hello, world!, threadid = %d\n", omp_get_thread_num());
        }
        return 0;
    }

Executing the code above prints the following result:

    Hello, world!, threadid = 2
    Hello, world!, threadid = 6
    Hello, world!, threadid = 4
    Hello, world!, threadid = 0
    Hello, world!, threadid = 5
    Hello, world!, threadid = 7
    Hello, world!, threadid = 1
    Hello, world!, threadid = 3

From the differing thread IDs we can see that eight threads were created to execute the code. The parallel directive, then, creates multiple threads for a piece of code, and every line in the parallel block is executed once by each of those threads. Compared with traditional thread creation, it is equivalent to calling the thread-creation function repeatedly on one thread entry function and then waiting for the threads to finish.
4. Usage of the for directive

The for directive distributes the iterations of a for loop among multiple threads. It is generally either combined with the parallel directive to form the parallel for directive, or used on its own inside the block of a parallel statement.

    #pragma omp [parallel] for [clause]
    for loop statement

First, let's look at the effect of using the for statement on its own:

    int j = 0;
    #pragma omp for
    for (j = 0; j < 4; j++) {
        printf("j = %d, threadid = %d\n", j, omp_get_thread_num());
    }

Running the code above prints the following result:

    j = 0, threadid = 0
    j = 1, threadid = 0
    j = 2, threadid = 0
    j = 3, threadid = 0

All four iterations were executed in one thread, so the for directive has an effect only when it is combined with the parallel directive. For example, the following code combines parallel and for into parallel for:

    int j = 0;
    #pragma omp parallel for
    for (j = 0; j < 4; j++) {
        printf("j = %d, threadid = %d\n", j, omp_get_thread_num());
    }

The following result is printed after execution:

    j = 0, threadid = 0
    j = 2, threadid = 2
    j = 1, threadid = 1
    j = 3, threadid = 3

The loop iterations were distributed among four different threads. The code above can also be rewritten in the following form:

    int j = 0;
    #pragma omp parallel
    {
        #pragma omp for
        for (j = 0; j < 4; j++) {
            printf("j = %d, threadid = %d\n", j, omp_get_thread_num());
        }
    }

Executing the code above prints the following result:

    j = 1, threadid = 1
    j = 3, threadid = 3
    j = 2, threadid = 2
    j = 0, threadid = 0

A parallel block can also contain several for statements, for example:

    int j;
    #pragma omp parallel
    {
        #pragma omp for
        for (j = 0; j < 100; j++) {
            ...
        }
        #pragma omp for
        for (j = 0; j < 100; j++) {
            ...
        }
        ...
    }

The for loop itself must be written in a prescribed form. The parentheses of the for statement contain three statements:

    for (i = start; i < end; i++)

i = start is the first statement in the for loop; it must be written in the form "variable = initial value", for example i = 0. i < end is the second statement in the for loop.
This statement can be written in one of the following four forms:

    variable < bound
    variable <= bound
    variable >= bound
    variable > bound

for example i > 10, i < 10, i >= 10, or i <= 10. The last statement, i++, can be written in one of the following nine forms:

    i++
    ++i
    i--
    --i
    i += inc
    i -= inc
    i = i + inc
    i = inc + i
    i = i - inc

For example, i += 2; i -= 2; i = i + 2; and i = i - 2; all conform to the specification.
5. Usage of the sections and section directives

The section statement is used inside a sections statement to divide the code of the sections statement into several segments, which are executed in parallel. Usage:

    #pragma omp [parallel] sections [clause]
    {
        #pragma omp section
        {
            code block
        }
    }

Let's take a look at the following example code:

    int main(int argc, char *argv[])
    {
        #pragma omp parallel sections
        {
            #pragma omp section
            printf("section 1 threadid = %d\n", omp_get_thread_num());
            #pragma omp section
            printf("section 2 threadid = %d\n", omp_get_thread_num());
            #pragma omp section
            printf("section 3 threadid = %d\n", omp_get_thread_num());
            #pragma omp section
            printf("section 4 threadid = %d\n", omp_get_thread_num());
        }
        return 0;
    }

The following result is printed after execution:

    section 1 threadid = 0
    section 2 threadid = 2
    section 4 threadid = 3
    section 3 threadid = 1

From the result we can see that the code in section 4 was executed before the code in section 3: the sections are executed in parallel, and each section is assigned to a different thread. When using the section statement, make sure that the sections take roughly the same time to execute. If one section takes much longer than the others, the benefit of parallel execution is lost.

The code above can also be rewritten in the following form:

    int main(int argc, char *argv[])
    {
        #pragma omp parallel
        {
            #pragma omp sections
            {
                #pragma omp section
                printf("section 1 threadid = %d\n", omp_get_thread_num());
                #pragma omp section
                printf("section 2 threadid = %d\n", omp_get_thread_num());
            }
            #pragma omp sections
            {
                #pragma omp section
                printf("section 3 threadid = %d\n", omp_get_thread_num());
                #pragma omp section
                printf("section 4 threadid = %d\n", omp_get_thread_num());
            }
        }
        return 0;
    }

The following result is printed after execution:

    section 1 threadid = 0
    section 2 threadid = 3
    section 3 threadid = 3
    section 4 threadid = 1

The difference from the previous version is that the two sections statements are executed one after the other: the code in the second sections statement can be executed only after the code in the first sections statement has finished.
With the for statement, the division of work among threads is done automatically by the system; as long as the iterations do not differ much in execution time, the distribution is fairly even. Using sections is a way of dividing the work among threads by hand, and how much parallelism is achieved in the end depends on the programmer. The OpenMP directives covered in this article (parallel, for, sections, and section) are all, in effect, used to create threads. Creating threads this way is more convenient than the traditional approach of calling thread-creation functions, and more efficient. Of course, once the threads have been created, further questions arise. Are the variables in the threads shared, or handled in some other way? Does a variable defined in the main thread behave the same way inside a parallel block as it does under the traditional thread-creation approach? How are the created threads scheduled? And so on.