OpenMP parallel program design (2)


1. The concept of the fork/join parallel execution mode

2. Introduction to OpenMP directives and library functions

3. Usage of the parallel directive

4. Usage of the for directive

5. Usage of the sections and section directives

1. The concept of the fork/join parallel execution mode

OpenMP is a collection of compiler directives and library functions, designed mainly for shared-memory parallel programming.

The previous article tried out OpenMP's parallel for directive. From that article we could also see that the code after an OpenMP parallel region executes only once all of the parallel code has finished. This is the standard fork/join parallel mode, and shared-memory parallel programs use fork/join parallelism.

The basic idea of the standard parallel mode is that a program starts with only one main thread. The serial parts of the program are executed by the main thread, while the parallel parts are executed by threads forked from it; the serial code after a parallel part does not execute until that parallel part has finished, as in the following code from the previous article:

#include <stdio.h>
#include <time.h>

void test();   /* defined in the previous article */

int main(int argc, char *argv[])
{
    clock_t t1 = clock();

    #pragma omp parallel for
    for (int j = 0; j < 2; j++) {
        test();
    }

    clock_t t2 = clock();
    printf("Total time = %d\n", (int)(t2 - t1));

    test();
    return 0;
}

Until the code in the for loop has finished executing, the line clock_t t2 = clock(); is not executed. Compared with calling a thread-creation function directly, this is equivalent to creating a thread and then waiting for it to finish. In this parallel mode, therefore, the threads forked from the main thread do not run in parallel with the serial code that follows in the main thread.
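As a rough analogy, the following minimal sketch (an illustration added here, not code from the original article; it uses POSIX threads) shows the same fork/join pattern written with an explicit thread-creation call: the serial part resumes only once the worker thread has been joined.

#include <pthread.h>
#include <stdio.h>

/* stand-in for the parallel part of the program */
void *worker(void *arg)
{
    printf("parallel part\n");
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);  /* "fork": derive a thread */
    pthread_join(tid, NULL);                   /* "join": wait for it to end */
    printf("the serial part resumes only after the join\n");
    return 0;
}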

2. Introduction to OpenMP directives and library functions

The following introduces OpenMP's basic directives and common library functions.

In C/C++, an OpenMP directive has the format:

#pragma omp <directive> [clause [clause] ...]

The parallel for used above is a directive (some books call OpenMP directives "compiler directives"), and the clauses that follow it are optional. For example:

#pragma omp parallel private(i, j)

Here parallel is the directive and private is a clause.

For convenience, a line containing #pragma and an OpenMP directive is called a statement below; the line above, for example, is called a parallel statement.

OpenMP directives include the following:

parallel: placed before a code block to indicate that the block will be executed in parallel by multiple threads.

for: placed before a for loop to distribute the loop iterations among multiple threads for parallel execution; the iterations must not depend on one another.

parallel for: the combination of parallel and for, also placed before a for loop, indicating that the loop will be executed in parallel by multiple threads.

sections: placed before a code block containing segments that may be executed in parallel.

parallel sections: the combination of parallel and sections.

critical: placed before a critical section of code.

single: placed before a code block to indicate that the block will be executed by only a single thread.

flush: makes a thread's view of shared variables consistent with memory.

barrier: synchronizes the threads in a parallel region; every thread stops when it reaches the barrier until all threads have reached it.

atomic: specifies that a memory location is updated atomically.

master: specifies that a code block is executed by the main thread.

ordered: specifies that part of a loop in a parallel region executes in sequential order.

threadprivate: specifies that a variable is private to each thread.
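To make a few of these directives concrete, here is a minimal sketch (an illustration added here, not code from the original article) combining atomic, barrier, and master in a single parallel block:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int counter = 0;
    #pragma omp parallel
    {
        #pragma omp atomic      /* the update below is performed atomically */
        counter++;

        #pragma omp barrier     /* every thread waits here until all arrive */

        #pragma omp master      /* only the main thread executes this line */
        printf("counter = %d\n", counter);
    }
    return 0;
}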

In addition to the preceding directives, OpenMP also provides a number of library functions. Several common ones are listed below:

omp_get_num_procs: returns the number of processors available to the program.

omp_get_num_threads: returns the number of active threads in the current parallel region.

omp_get_thread_num: returns the calling thread's thread number.

omp_set_num_threads: sets the number of threads used to execute parallel code.

omp_init_lock: initializes a simple lock.

omp_set_lock: acquires a lock.

omp_unset_lock: releases a lock; must be paired with omp_set_lock.

omp_destroy_lock: the counterpart of omp_init_lock; destroys a lock.
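A minimal sketch (added here for illustration; the printed text is arbitrary) tying several of these library functions together:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_lock_t lock;
    omp_init_lock(&lock);                /* initialize a simple lock */

    printf("processors: %d\n", omp_get_num_procs());
    omp_set_num_threads(4);              /* request 4 threads for the next region */

    #pragma omp parallel
    {
        omp_set_lock(&lock);             /* acquire: one thread prints at a time */
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
        omp_unset_lock(&lock);           /* release, paired with omp_set_lock */
    }

    omp_destroy_lock(&lock);             /* paired with omp_init_lock */
    return 0;
}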

OpenMP clauses include the following:

private: specifies that each thread has its own private copy of a variable.

firstprivate: specifies that each thread has its own private copy of a variable, initialized from the variable's value in the main thread.

lastprivate: specifies that, after the parallel region ends, the value of a thread's private copy is copied back to the corresponding variable in the main thread.

reduction: specifies that one or more variables are private and that a given operation combines the private copies when the parallel region ends.

nowait: removes the implicit wait (barrier) at the end of a construct.

num_threads: specifies the number of threads.

schedule: specifies how the iterations of a for loop are scheduled.

shared: specifies that one or more variables are shared among all threads.

ordered: specifies that a for loop is (partly) executed in sequential order.

copyprivate: used with the single directive to broadcast a variable's value from the single executing thread to the other threads.

copyin: specifies that a threadprivate variable is initialized with the main thread's value.

default: specifies the default data-sharing attribute of variables in a parallel region; the default is shared.
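As an added illustration of the clauses (not code from the original article), the sketch below uses num_threads together with reduction to sum the integers 0 through 99; each thread accumulates into a private copy of sum, and the copies are combined with + when the loop ends:

#include <stdio.h>

int main(void)
{
    int sum = 0;
    int i;

    /* each thread gets a private copy of sum initialized to 0;
       the copies are added together when the loop finishes */
    #pragma omp parallel for reduction(+:sum) num_threads(4)
    for (i = 0; i < 100; i++) {
        sum += i;
    }

    printf("sum = %d\n", sum);   /* prints sum = 4950 */
    return 0;
}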

3. Usage of the parallel directive

The parallel directive is used to construct a parallel block. It can also be combined with other directives, such as for and sections.

In C/C++, parallel is used as follows:

#pragma omp parallel [for | sections] [clause [clause] ...]
{
    // code
}

The parallel statement is followed by a brace-enclosed block containing the code to be executed in parallel:

#include <stdio.h>

int main(int argc, char *argv[])
{
    #pragma omp parallel
    {
        printf("Hello, world!\n");
    }
    return 0;
}

After the above code is executed, the following results are printed:

Hello, world!

Hello, world!

Hello, world!

Hello, world!

The code in the parallel block was executed four times, which shows that four threads were created to execute it.

You can also specify how many threads should execute the block by using the num_threads clause:

#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    #pragma omp parallel num_threads(8)
    {
        printf("Hello, world!, threadid = %d\n", omp_get_thread_num());
    }
    return 0;
}

After the preceding code runs, results like the following are printed:

Hello, world!, threadid = 2

Hello, world!, threadid = 6

Hello, world!, threadid = 4

Hello, world!, threadid = 0

Hello, world!, threadid = 5

Hello, world!, threadid = 7

Hello, world!, threadid = 1

Hello, world!, threadid = 3

The different threadid values show that eight threads were created to execute the code. The parallel directive, then, creates multiple threads to execute a block of code, and each line in the parallel block is executed once by every thread.

Compared with traditional thread creation, this is equivalent to repeatedly creating threads for the same thread entry function and then waiting for all of them to finish.

4. Usage of the for directive

The for directive distributes the iterations of a for loop among multiple threads for execution. It is generally either combined with parallel into the parallel for directive or used on its own inside a parallel block.

#pragma omp [parallel] for [clause [clause] ...]
    for loop statement

First, let's take a look at the effect of using the for statement separately:

int j = 0;
#pragma omp for
for (j = 0; j < 4; j++) {
    printf("j = %d, threadid = %d\n", j, omp_get_thread_num());
}

Running the above code prints the following results:

j = 0, threadid = 0

j = 1, threadid = 0

j = 2, threadid = 0

j = 3, threadid = 0

The results show that all four iterations were executed by a single thread: the for directive must be combined with the parallel directive to have any effect.

The following code combines parallel and for into parallel for:

int j = 0;
#pragma omp parallel for
for (j = 0; j < 4; j++) {
    printf("j = %d, threadid = %d\n", j, omp_get_thread_num());
}

The following results are printed after execution:

j = 0, threadid = 0

j = 2, threadid = 2

j = 1, threadid = 1

j = 3, threadid = 3

As you can see, the loop iterations were distributed among four different threads.

The above code can also be rewritten in the following form:

int j = 0;
#pragma omp parallel
{
    #pragma omp for
    for (j = 0; j < 4; j++) {
        printf("j = %d, threadid = %d\n", j, omp_get_thread_num());
    }
}

After the preceding code executes, the following results are printed:

j = 1, threadid = 1

j = 3, threadid = 3

j = 2, threadid = 2

j = 0, threadid = 0

A parallel block can also contain multiple for statements, for example:

int j;
#pragma omp parallel
{
    #pragma omp for
    for (j = 0; j < 100; j++) {
        ...
    }

    #pragma omp for
    for (j = 0; j < 100; j++) {
        ...
    }
    ...
}
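Each for directive ends with an implicit barrier, which is what the nowait clause listed earlier removes. A minimal sketch (added here for illustration; the arrays a and b are arbitrary) in which the first loop can safely carry nowait because the two loops touch independent data:

#include <stdio.h>

int main(void)
{
    int a[100], b[100];

    #pragma omp parallel
    {
        /* nowait removes the implicit barrier after this loop, so a
           thread may start the second loop before the others finish;
           this is safe here because a and b are independent */
        #pragma omp for nowait
        for (int j = 0; j < 100; j++)
            a[j] = j;

        #pragma omp for
        for (int j = 0; j < 100; j++)
            b[j] = 2 * j;
    }

    printf("a[99] = %d, b[99] = %d\n", a[99], b[99]);
    return 0;
}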

A for loop controlled by the for directive must be written in a canonical form; specifically, the three expressions inside the for parentheses must each follow certain rules:

for (i = start; i < end; i++)

i = start is the initialization expression. It must have the form "variable = initial value", for example i = 0.

i < end is the test expression. It may take one of the following four forms:

variable < boundary value

variable <= boundary value

variable > boundary value

variable >= boundary value

For example: i < 10, i <= 10, i > 10, i >= 10, and so on.

The last expression, the increment i++, may be written in one of the following nine ways:

i++

++i

i--

--i

i += inc

i -= inc

i = i + inc

i = inc + i

i = i - inc

For example, i += 2, i -= 2, i = i + 2, and i = i - 2 all conform to the canonical form.
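As an added illustration (not from the original article), the loop below uses the canonical increment form i += 2 together with the schedule clause listed in section 2; schedule(static, 1) deals the iterations out to the threads round-robin, one iteration per chunk:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    /* canonical form: i = 0; i < 10; i += 2 */
    #pragma omp parallel for schedule(static, 1) num_threads(2)
    for (int i = 0; i < 10; i += 2) {
        printf("i = %d, threadid = %d\n", i, omp_get_thread_num());
    }
    return 0;
}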

5. Usage of the sections and section directives

The section statement is used inside a sections statement to divide the enclosed code into several segments that are executed in parallel. The usage is as follows:

#pragma omp [parallel] sections [clause]
{
    #pragma omp section
    {
        code block
    }
}

Let's take a look at the following example code:

#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("Section 1 threadid = %d\n", omp_get_thread_num());
        #pragma omp section
        printf("Section 2 threadid = %d\n", omp_get_thread_num());
        #pragma omp section
        printf("Section 3 threadid = %d\n", omp_get_thread_num());
        #pragma omp section
        printf("Section 4 threadid = %d\n", omp_get_thread_num());
    }
    return 0;
}

After execution, the following results are printed:

Section 1 threadid = 0

Section 2 threadid = 2

Section 4 threadid = 3

Section 3 threadid = 1

The results show that the code in section 4 executed before the code in section 3, which indicates that the sections run in parallel, with each section allocated to a different thread.

When using the section statement, make sure the execution times of the individual sections do not differ too much; otherwise, a section that runs much longer than the others will defeat the purpose of parallel execution.

The above code can also be rewritten in the following form:

#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            printf("Section 1 threadid = %d\n", omp_get_thread_num());
            #pragma omp section
            printf("Section 2 threadid = %d\n", omp_get_thread_num());
        }

        #pragma omp sections
        {
            #pragma omp section
            printf("Section 3 threadid = %d\n", omp_get_thread_num());
            #pragma omp section
            printf("Section 4 threadid = %d\n", omp_get_thread_num());
        }
    }
    return 0;
}

After execution, the following results are printed:

Section 1 threadid = 0

Section 2 threadid = 3

Section 3 threadid = 3

Section 4 threadid = 1

The difference from the previous version is that here the two sections statements execute serially with respect to each other: the code in the second sections statement can run only after the code in the first sections statement has finished.

With the for statement, the division of work among threads is handled automatically by the system, and as long as the iterations take roughly the same time the allocation is very even. Dividing work among threads with sections is manual, and the parallelism ultimately achieved depends on the programmer.

The OpenMP directives discussed in this article, parallel, for, sections, and section, are all in effect used to create threads. This way of creating threads is more convenient, and can be more efficient, than the traditional method of calling thread-creation functions.

Of course, questions remain: after the threads are created, are the variables in them shared or handled in some other way? Once a variable defined in the main thread is used inside a parallel block, does it behave the same as under conventional thread creation? How are the created threads scheduled? And so on.
