OpenMP Parallel Program Design


OpenMP parallel program design (1)
OpenMP supports parallel programming for shared-memory machines and is especially well suited to parallel programming on multi-core CPUs. Today I tried OpenMP parallel programming on a dual-core CPU machine and found the efficiency gain beyond what I had imagined, so I am writing it up to share.
To enable OpenMP in VC 8.0, open the project's Properties dialog, select Configuration Properties > C/C++ > Language in the left pane, and set "OpenMP Support" to "Yes (/openmp)".
First, look at a simple OpenMP program:
#include <stdio.h>

int main(int argc, char *argv[])
{
#pragma omp parallel for
    for (int i = 0; i < 10; i++)
    {
        printf("i = %d\n", i);
    }
    return 0;
}
After this program is executed, the following results are printed:
i = 0
i = 5
i = 1
i = 6
i = 2
i = 7
i = 3
i = 8
i = 4
i = 9
The body of the for loop is executed in parallel. (The printed order may differ from run to run.)
The #pragma omp parallel for directive specifies that the for loop is to be executed in parallel. Of course, the loop must actually be parallelizable: each iteration must be independent of the others, and no iteration may depend on the result of a previous one.
The details of the #pragma omp parallel for directive and of the related OpenMP directives and functions come later; for now it is enough to know that this directive makes the for loop execute in parallel.
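As a minimal sketch of what "independent iterations" means (my example, not from the original article), the first loop below is safe to parallelize because each iteration touches a different array element, while the second is not:

#include <stdio.h>

int main(void)
{
    int a[10] = {0};

    /* Safe: iteration i writes only a[i], so iterations are independent. */
#pragma omp parallel for
    for (int i = 0; i < 10; i++)
    {
        a[i] = i * i;
    }

    /* NOT safe to parallelize: iteration i reads a[i - 1], the result of
       the previous iteration, so the iterations depend on each other. */
    for (int i = 1; i < 10; i++)
    {
        a[i] = a[i - 1] + 1;
    }

    printf("a[9] = %d\n", a[9]);
    return 0;
}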
Does converting the statements in a for loop to parallel execution actually improve efficiency? I think that is what we care about most. Below is a simple test program:
#include <stdio.h>
#include <time.h>

void test()
{
    int a = 0;
    clock_t t1 = clock();
    for (int i = 0; i < 100000000; i++)
    {
        a = i + 1;
    }
    clock_t t2 = clock();
    printf("time = %d\n", (int)(t2 - t1));
}
int main(int argc, char *argv[])
{
    clock_t t1 = clock();
#pragma omp parallel for
    for (int j = 0; j < 2; j++)
    {
        test();
    }
    clock_t t2 = clock();
    printf("total time = %d\n", (int)(t2 - t1));
    test();
    return 0;
}
The test() function runs 100 million loop iterations; it simply performs one long computation.
In main(), test() is called in a for loop of only two iterations. Here are the results on a dual-core CPU:
time = 297
time = 297
total time = 297
time = 297
Both test() calls inside the for loop took 297 ms, yet the total printed time was also only 297 ms, and the test() call executed separately afterwards likewise took 297 ms. Clearly, parallel execution doubled the efficiency.
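One caveat, my own note rather than the article's: clock() measures elapsed (wall-clock) time on Windows but CPU time on many other platforms, where a parallel run can appear to take just as long or longer. OpenMP's omp_get_wtime() is a portable wall-clock timer; a minimal sketch:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    double t1 = omp_get_wtime();           /* wall-clock time in seconds */
#pragma omp parallel for
    for (int j = 0; j < 2; j++)
    {
        volatile int a = 0;                /* volatile keeps the loop from
                                              being optimized away */
        for (int i = 0; i < 100000000; i++)
            a = i + 1;
    }
    double t2 = omp_get_wtime();
    printf("total time = %f seconds\n", t2 - t1);
    return 0;
}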

OpenMP parallel program design (2)
1. Concept of fork/join Parallel Execution Mode
OpenMP is a set of compiler directives and library functions, designed for writing parallel programs on shared-memory computers.
In the previous article we tried OpenMP's parallel for directive, and we also saw that the non-parallel part of the code executes only after all of the work handed to OpenMP has finished. This is the standard fork/join parallel execution mode, and shared-memory parallel programs use it.
The basic idea of fork/join execution is that a program begins with a single main thread. The serial parts of the program are executed by the main thread, and the parallel parts are executed by additional threads forked from it; until a parallel part has finished, the serial part that follows it does not run. Consider the code from the previous article:
int main(int argc, char *argv[])
{
    clock_t t1 = clock();
#pragma omp parallel for
    for (int j = 0; j < 2; j++)
    {
        test();
    }
    clock_t t2 = clock();
    printf("total time = %d\n", (int)(t2 - t1));
    test();
    return 0;
}
The line clock_t t2 = clock(); does not execute until the code in the for loop has finished. Compared with calling a thread-creation function, this is like creating a thread and then waiting for it to finish before continuing; in this parallel mode, the threads created from the main thread do not go on running concurrently with the serial code that follows.
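A minimal sketch of the fork/join pattern (my illustration, not code from the article):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* fork: worker threads are created at the start of the parallel block */
#pragma omp parallel
    {
        printf("thread %d working\n", omp_get_thread_num());
    }
    /* join: an implicit barrier at the end of the block; the line below
       runs in the main thread only after every thread has finished */
    printf("back to serial execution in the main thread\n");
    return 0;
}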
2. Introduction to OpenMP directives and library functions
The following describes the basic OpenMP directives and their common usage.
In C/C++, an OpenMP directive has the format
#pragma omp directive [clause [clause] ...]
The parallel for used above is a directive; some books also call OpenMP directives "compiler directives". The clauses that follow a directive are optional. For example:
#pragma omp parallel private(i, j)
Here parallel is the directive and private is a clause.
For convenience, we call a line containing #pragma omp and a directive a statement; the line above, for example, is called a parallel statement.
OpenMP directives include the following (a small sketch using a few of them follows the list):
parallel, used before a code block to specify that the block is executed in parallel by multiple threads.
for, used before a for loop to distribute the loop iterations across multiple threads for parallel execution; it must be guaranteed that the iterations do not depend on one another.
parallel for, the combination of parallel and for, also used before a for loop to specify that the loop's code is executed in parallel by multiple threads.
sections, used before a group of code segments that may be executed in parallel.
parallel sections, the combination of parallel and sections.
critical, used before a critical section of code.
single, used before a block of code that is to be executed by only a single thread.
flush, used to make a thread's temporary view of shared variables consistent with memory.
barrier, used to synchronize the threads in a parallel region: each thread stops at the barrier until all threads have reached it.
atomic, used to specify that a memory location is updated atomically.
master, used to specify that a block of code is executed by the main thread.
ordered, used to specify that loop iterations in a parallel region are executed in sequential order.
threadprivate, used to specify that a variable is private to each thread.
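As a small sketch of my own (not from the article) showing atomic, barrier, and master together:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int count = 0;

#pragma omp parallel num_threads(4)
    {
#pragma omp atomic              /* the increment is performed atomically */
        count++;

#pragma omp barrier             /* every thread waits here until all arrive */

#pragma omp master              /* only the main thread executes this line */
        printf("count = %d\n", count);
    }
    return 0;
}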
In addition to the directives above, OpenMP provides a number of library functions. Several common ones are listed below, followed by a short sketch that exercises a few of them:
omp_get_num_procs, returns the number of processors available on the machine.
omp_get_num_threads, returns the number of active threads in the current parallel region.
omp_get_thread_num, returns the calling thread's thread number.
omp_set_num_threads, sets the number of threads used to execute parallel code.
omp_init_lock, initializes a simple lock.
omp_set_lock, acquires a lock.
omp_unset_lock, releases a lock; must be paired with omp_set_lock.
omp_destroy_lock, the counterpart of omp_init_lock; destroys a lock.
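A short sketch of my own exercising several of these functions (the lock serializes the printf calls):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_lock_t lock;
    omp_init_lock(&lock);            /* initialize a simple lock */

    printf("processors: %d\n", omp_get_num_procs());
    omp_set_num_threads(4);          /* request four threads */

#pragma omp parallel
    {
        omp_set_lock(&lock);         /* lock: only one thread prints at a time */
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
        omp_unset_lock(&lock);       /* unlock: paired with omp_set_lock */
    }

    omp_destroy_lock(&lock);         /* counterpart of omp_init_lock */
    return 0;
}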
OpenMP clauses include the following (a sketch using some of them follows the list):
private, specifies that each thread has its own private copy of a variable.
firstprivate, specifies that each thread has its own private copy of a variable, initialized from the variable's value in the main thread.
lastprivate, specifies that after parallel processing ends, the value of a thread's private copy is copied back to the corresponding variable in the main thread.
reduction, specifies that one or more variables are private to each thread, and that when parallel processing ends the private copies are combined with the specified operation.
nowait, removes the implied wait (barrier) at the end of a construct.
num_threads, specifies the number of threads.
schedule, specifies how for-loop iterations are scheduled.
shared, specifies that one or more variables are shared among the threads.
ordered, specifies that the for loop is executed in sequential order.
copyprivate, broadcasts the value of a variable in a single construct to the other threads.
copyin, specifies that a threadprivate variable is initialized with the main thread's value.
default, specifies the default data-sharing attribute of variables in the parallel region; the default is shared.
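A brief sketch of my own combining the reduction and firstprivate clauses:

#include <stdio.h>

int main(void)
{
    int sum = 0;
    int base = 10;

    /* reduction(+:sum): each thread accumulates into a private copy of sum,
       and the copies are added together when the loop ends.
       firstprivate(base): each thread's private copy of base starts at 10. */
#pragma omp parallel for reduction(+:sum) firstprivate(base)
    for (int i = 0; i < 100; i++)
    {
        sum += base + i;
    }

    printf("sum = %d\n", sum);   /* always 100*10 + (0+1+...+99) = 5950 */
    return 0;
}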
3. Usage of the parallel directive
parallel is used to construct a parallel block; it can also be combined with other directives, such as for and sections.
In C/C++, parallel is used as follows:
#pragma omp parallel [for | sections] [clause [clause] ...]
{
// Code
}
The parallel statement is followed by a pair of braces enclosing the code to be executed in parallel:
#include <stdio.h>

int main(int argc, char *argv[])
{
#pragma omp parallel
    {
        printf("Hello, world!\n");
    }
    return 0;
}
After the above code is executed, the following results are printed:
Hello, world!
Hello, world!
Hello, world!
Hello, world!
The code in the parallel block was executed four times, indicating that four threads were created to run it.
You can also specify how many threads to use with the num_threads clause:
#include <stdio.h>
#include <omp.h>

int main(int argc, char *argv[])
{
#pragma omp parallel num_threads(8)
    {
        printf("Hello, world!, threadid = %d\n", omp_get_thread_num());
    }
    return 0;
}
After running the above code, the following results are printed:
Hello, world!, threadid = 2
Hello, world!, threadid = 6
Hello, world!, threadid = 4
Hello, world!, threadid = 0
Hello, world!, threadid = 5
Hello, world!, threadid = 7
Hello, world!, threadid = 1
Hello, world!, threadid = 3
The differing threadid values show that eight threads were created to execute the code. The parallel directive, then, creates multiple threads for a block of code, and every line in the parallel block is executed repeatedly, once by each thread. Compared with traditional thread creation, it is as if a single thread entry function were passed repeatedly to a thread-creation call, with the caller then waiting for all of those threads to finish.
4. Usage of the for directive
The for directive distributes the iterations of a for loop across multiple threads. It is generally combined with parallel to form the parallel for directive, or used on its own inside a parallel block.
#pragma omp [parallel] for [clause]
    for loop statement
First, let's look at the effect of using the for directive on its own:
int j = 0;
#pragma omp for
for (j = 0; j < 4; j++)
{
    printf("j = %d, threadid = %d\n", j, omp_get_thread_num());
}
Running the above code prints the following results:
j = 0, threadid = 0
j = 1, threadid = 0
j = 2, threadid = 0
j = 3, threadid = 0
All four iterations ran in a single thread: on its own, the for directive has no effect. It must be combined with the parallel directive:
The following code combines parallel and for into parallel for:
int j = 0;
#pragma omp parallel for
for (j = 0; j < 4; j++)
{
    printf("j = %d, threadid = %d\n", j, omp_get_thread_num());
}
The following results are printed after execution:
j = 0, threadid = 0
j = 2, threadid = 2
j = 1, threadid = 1
j = 3, threadid = 3
This time the loop iterations were distributed across four different threads.
The above code can also be rewritten in the following form:
int j = 0;
#pragma omp parallel
{
#pragma omp for
    for (j = 0; j < 4; j++)
    {
        printf("j = %d, threadid = %d\n", j, omp_get_thread_num());
    }
}
Executing the above code prints the following results:
j = 1, threadid = 1
j = 3, threadid = 3
j = 2, threadid = 2
j = 0, threadid = 0
A parallel block may also contain multiple for directives, for example:
int j;
#pragma omp parallel
{
#pragma omp for
    for (j = 0; j < 100; j++)
    {
        ...
    }
#pragma omp for
    for (j = 0; j < 100; j++)
    {
        ...
    }
    ...
}
The for loop controlled by the directive must be written in a canonical form; each of the three expressions inside the for parentheses is restricted:
for (i = start; i < end; i++)
The first expression, i = start, must have the form "variable = initial value", for example i = 0.
The second expression, i < end, must take one of the following four forms:
variable < boundary value
variable <= boundary value
variable > boundary value
variable >= boundary value
For example: i < 10, i <= 10, i > 10, i >= 10.
The third expression, i++, can be written in one of the following nine forms:
i++
++i
i--
--i
i += inc
i -= inc
i = i + inc
i = inc + i
i = i - inc
For example, i += 2, i -= 2, i = i + 2, and i = i - 2 all conform to the canonical form.
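To illustrate (my sketch, not the article's), the first loop below fits the canonical form while the second does not:

#include <stdio.h>

int main(void)
{
    /* Fits the canonical form: "i = 0", "i < 100", "i += 2". */
#pragma omp parallel for
    for (int i = 0; i < 100; i += 2)
    {
        /* work on element i */
    }

    /* Does not fit: the exit test is not a simple comparison of the loop
       variable against a bound, so the iteration count cannot be computed
       up front and the for directive cannot be applied. */
    int j = 0;
    while (j * j < 100)
        j++;

    printf("j = %d\n", j);
    return 0;
}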
5. Usage of the sections and section directives
The section directive is used inside a sections construct to divide the enclosed code into several segments, each of which is executed in parallel. The usage is as follows:
#pragma omp [parallel] sections [clause]
{
#pragma omp section
    {
        code block
    }
}
Let's take a look at the following example code:
#include <stdio.h>
#include <omp.h>

int main(int argc, char *argv[])
{
#pragma omp parallel sections
    {
#pragma omp section
        printf("section 1 threadid = %d\n", omp_get_thread_num());
#pragma omp section
        printf("section 2 threadid = %d\n", omp_get_thread_num());
#pragma omp section
        printf("section 3 threadid = %d\n", omp_get_thread_num());
#pragma omp section
        printf("section 4 threadid = %d\n", omp_get_thread_num());
    }
    return 0;
}
After execution, the following results are printed:
section 1 threadid = 0
section 2 threadid = 2
section 4 threadid = 3
section 3 threadid = 1
Section 4's code ran before section 3's, showing that the sections execute in parallel, each allocated to a different thread.
When using the section directive, make sure the execution times of the individual sections are roughly equal; if one section takes much longer than the others, the benefit of parallel execution is lost.
The above code can also be rewritten in the following form:
#include <stdio.h>
#include <omp.h>

int main(int argc, char *argv[])
{
#pragma omp parallel
    {
#pragma omp sections
        {
#pragma omp section
            printf("section 1 threadid = %d\n", omp_get_thread_num());
#pragma omp section
            printf("section 2 threadid = %d\n", omp_get_thread_num());
        }
#pragma omp sections
        {
#pragma omp section
            printf("section 3 threadid = %d\n", omp_get_thread_num());
#pragma omp section
            printf("section 4 threadid = %d\n", omp_get_thread_num());
        }
    }
    return 0;
}
After execution, the following results are printed:
section 1 threadid = 0
section 2 threadid = 3
section 3 threadid = 3
section 4 threadid = 1
The difference from the previous form is that here the two sections constructs execute serially: the code in the second sections construct runs only after the code in the first has finished.
With the for directive, the division of iterations among threads is handled automatically by the system; as long as the iterations take roughly the same time, the allocation is very even. Dividing work among threads with sections is a manual form of partitioning, and the eventual degree of parallelism depends on the programmer.
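For completeness, the schedule clause listed earlier controls how that automatic allocation is done; a small sketch of mine, where schedule(dynamic, 1) hands iterations out one at a time to whichever thread is free:

#include <stdio.h>
#include <omp.h>

int main(void)
{
#pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < 8; i++)
    {
        printf("i = %d, threadid = %d\n", i, omp_get_thread_num());
    }
    return 0;
}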
The OpenMP directives discussed in this article, parallel, for, sections, and section, are in essence all used to create threads. This way of creating threads is more convenient, and more efficient, than the traditional approach of calling thread-creation functions.
Of course, questions remain once the threads are created: how are variables shared among the threads? Does a variable defined in the main thread behave the same inside a parallel block as it would under traditional thread creation? How are the created threads scheduled? These topics remain to be discussed.

 

 

----------------------

Reposted from http://blog.csdn.net/drzhouweiming/archive/2006/08/28/1131537.aspx for my own study. Quite useful!
