Study Notes: OpenMP Grammar and Notes

Source: Internet
Author: User

1/ OpenMP is just a compiler extension: code is annotated with #pragma directives (compiler directives). If the compiler does not support OpenMP, it simply ignores the directives and the code runs serially without error. The effect is that you can easily parallelize a piece of code without major changes.

2/ The biggest difference between MIMD and SIMD is that MIMD uses multiple cores, each running its own instruction stream, while SIMD applies the same instruction to multiple data elements on a single core.

3/ Using OpenMP requires adding the -fopenmp option to the gcc compile command, and including the omp.h header if you need the built-in runtime library functions.
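
For illustration, a minimal program might look like the sketch below (the file name hello_omp.c is just an assumed choice); it uses the runtime function omp_get_thread_num and is compiled with -fopenmp.

#include <stdio.h>
#include <omp.h>                /* header for the runtime library functions */

int main(void)
{
    #pragma omp parallel
    {
        /* each thread in the team prints its own ID */
        printf("Hello from thread %d\n", omp_get_thread_num());
    }
    return 0;
}

/* compile with: gcc -fopenmp hello_omp.c -o hello_omp */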

4/ All OpenMP compiler directives begin with #pragma omp (note that this prefix alone is not a complete directive; it must be followed by a directive name and, optionally, clauses), and each directive is written on its own line.

The directive format for OpenMP is:

#pragma omp parallel [for | sections] [clause [clause] ...]

parallel is used to open a parallel region. The region that follows parallel must be enclosed in curly braces {} when it spans more than one statement. The num_threads(x) clause can be used to specify the number of threads.
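
A minimal sketch of a parallel region with num_threads (the count of 4 is an arbitrary choice):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel num_threads(4)
    {   /* the curly braces delimit the parallel region */
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}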

However, opening a parallel region without sharing the work among the threads wastes the value of parallel computing, so work-sharing constructs are needed. The work-sharing constructs are for and sections. The for construct applies only to loops: when there is no dependence between loop iterations, the for construct divides a range of iterations among the threads. The for construct also takes a schedule clause, used to choose the policy for assigning iterations to threads:

schedule(static[, size]): the assignment of iterations to threads can be determined beforehand;

schedule(dynamic[, size]): iterations are handed out on demand, size iterations at a time;

schedule(guided[, size]): chunk sizes start large and decrease, but never below size;

schedule(runtime): which of the above to use is decided at run time.

static is the system's default scheduling policy for assigning work. When static is used without a size (the number of iterations handed out at a time), the default chunk is roughly (total iterations / number of threads) per thread; if that is not an integer, typically the threads with lower IDs receive one extra iteration.


For example, with 12 iterations and 3 threads under static scheduling, no matter in which order the threads start, the thread with ID 0 in the team always executes iterations 0,1,2,3; the thread with ID 1 always executes iterations 4,5,6,7; and the thread with ID 2 always executes iterations 8,9,10,11.
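
A sketch of that static assignment, using the 12 iterations and 3 threads from the example above:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i;
    /* schedule(static) with no size: the 12 iterations are split into equal
       chunks, so thread 0 gets 0..3, thread 1 gets 4..7, thread 2 gets 8..11 */
    #pragma omp parallel for num_threads(3) schedule(static)
    for (i = 0; i < 12; i++)
        printf("iteration %d executed by thread %d\n", i, omp_get_thread_num());
    return 0;
}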

Let's talk about the other work-sharing construct: sections

Both sections and ordered follow the same pattern: you first declare sections (or the ordered clause) on the enclosing directive, and then mark each section (ordered block) inside it.

The difference from for is that with for the division of work is done automatically, while the task assignment with sections is manual. The format is as follows:

#pragma omp parallel sections
{
    #pragma omp section
    {
        ...
    }

    #pragma omp section
    {
        ...
    }
}

This example illustrates the point well:

With sections, each section block is executed exactly once by some thread in the team (different sections generally go to different threads), and the code inside a section runs serially. Note that this approach requires the execution time of each section to be comparable; otherwise one section takes much longer than the others and the remaining threads sit idle waiting. With the for construct the workload is divided automatically by the system, and as long as the iterations take roughly the same time the division is fairly even; dividing work with section is a manual division, so the final load balance depends on the programmer.
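
A minimal sketch of this manual division of work (the two printf calls stand in for real tasks):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {   /* one independent task */
            printf("section A run by thread %d\n", omp_get_thread_num());
        }

        #pragma omp section
        {   /* another independent task */
            printf("section B run by thread %d\n", omp_get_thread_num());
        }
    }
    return 0;
}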

In addition, clauses have two major functions: ① synchronization (preventing data races) ② data scoping

There are two main ways of synchronizing:

(1) Using thread locks

There are three kinds of locking mechanisms: the critical construct, the atomic construct, and the runtime library mutex locks (omp_*_lock).

I. critical: threads enter the critical-protected region one by one, and only one thread executes the critical-section code at any time. critical constructs cannot be nested within one another, but parallel regions can be nested, so it is not only the master thread that can create new threads.

While one thread is inside the critical section, no other thread can enter it, so the protected memory updates (read, update, write) complete without interference from other threads.

#pragma omp critical (global_name)

critical can protect a block of code, but atomic can only protect a single statement.

II. atomic: atomicity means the operation either happens completely or not at all. atomic ensures the atomicity of a statement that can be mapped to a single machine-level update, guaranteeing that all threads see a consistent value of the variable.

Only the memory update in that single statement (the read-modify-write of the variable) is performed atomically, so no other thread can interleave with it.
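
A sketch contrasting the two (the shared counters a and b are illustrative):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int a = 0, b = 0, i;

    #pragma omp parallel for
    for (i = 0; i < 1000; i++) {
        #pragma omp critical        /* protects a whole block of code */
        {
            a += 1;
        }

        #pragma omp atomic          /* protects a single memory update */
        b += 1;
    }

    printf("a = %d, b = %d\n", a, b);   /* both print 1000 */
    return 0;
}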

III. Mutex locks

The programmer must ensure that the corresponding unlock operation is called after each lock operation; otherwise the multi-threaded program may deadlock.

Mutex locks use runtime library API functions for synchronization:

void omp_init_lock(omp_lock_t *) initializes the mutex

void omp_destroy_lock(omp_lock_t *) destroys the mutex

void omp_set_lock(omp_lock_t *) acquires the mutex (blocking)

void omp_unset_lock(omp_lock_t *) releases the mutex

int omp_test_lock(omp_lock_t *) tries to acquire the mutex; returns nonzero (true) on success, 0 otherwise

The usage pattern is: initialize the lock first, then acquire and release it around the protected code; the region between acquire and release is equivalent to a critical section. Destroy the lock when it is no longer needed.
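
A minimal sketch of that pattern (the shared counter is illustrative):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_lock_t lock;
    int count = 0, i;

    omp_init_lock(&lock);           /* initialize first */

    #pragma omp parallel for
    for (i = 0; i < 1000; i++) {
        omp_set_lock(&lock);        /* acquire */
        count++;                    /* protected region, like a critical section */
        omp_unset_lock(&lock);      /* release */
    }

    omp_destroy_lock(&lock);        /* destroy when no longer needed */
    printf("count = %d\n", count);  /* prints 1000 */
    return 0;
}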


(2) Using barriers, single/master constructs, or ordered constructs

Barrier:

The barrier synchronizes threads in OpenMP. A thread that reaches a barrier must wait until all the threads in the parallel region have reached the same point before it continues with the following code. There is an implied barrier at the end of each parallel region and each work-sharing construct: the team of threads executing that region synchronizes there before any thread moves on. That is, at the end of parallel, for, sections (note sections, not section) and single there is an implicit barrier.

When you need to insert a barrier explicitly, use:

#pragma omp barrier

Conversely, when a barrier is not needed, you can remove the implicit barrier with the nowait clause to speed things up.

For example: #pragma omp for nowait
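
A sketch combining nowait with an explicit barrier (the two independent loops and the array size are made up for illustration):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int a[100], b[100], i;

    #pragma omp parallel
    {
        #pragma omp for nowait      /* no implicit barrier after this loop */
        for (i = 0; i < 100; i++)
            a[i] = i;

        #pragma omp for nowait      /* independent of the first loop */
        for (i = 0; i < 100; i++)
            b[i] = 2 * i;

        #pragma omp barrier         /* wait for all threads before using the results */
        #pragma omp single
        printf("a[99] = %d, b[99] = %d\n", a[99], b[99]);
    }
    return 0;
}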

Ordered vs Critical:

ordered is used on loops, like schedule. The purpose of ordered is to ensure that a marked block inside the loop body executes in the order of the loop iterations (so at any moment only one thread is inside the ordered block, and the blocks run in iteration order). critical is different: it only guarantees that one thread at a time enters the critical section, without guaranteeing any order. To use #pragma omp ordered inside the loop body, the enclosing loop directive must carry the clause: #pragma omp parallel for ordered
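
A sketch of ordered (the squaring is a placeholder computation):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i;

    #pragma omp parallel for ordered    /* the ordered clause is required */
    for (i = 0; i < 8; i++) {
        int sq = i * i;                 /* this part may run out of order */

        #pragma omp ordered             /* this block runs in iteration order */
        printf("%d squared is %d\n", i, sq);
    }
    return 0;
}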

Single vs Master:

master specifies that a piece of code is executed only by the master thread. The master directive is similar to the single directive, except that the code contained in a master construct is executed only by the master thread, while the code contained in a single construct can be executed by any one thread; also, master has no implied barrier at the end and cannot take a nowait clause.

single specifies that the enclosed code is executed by only one thread, and the other threads skip it. If there is no nowait clause, all threads synchronize at an implicit barrier at the end of the single construct. If the single directive has a nowait clause, the other threads continue directly without waiting at that barrier. The single directive is placed before a piece of code that should be executed by only one thread, indicating that the following code block will be executed single-threaded.
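
A sketch contrasting the two directives:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel num_threads(4)
    {
        #pragma omp master          /* only thread 0 executes this; no implied barrier */
        printf("master block run by thread %d\n", omp_get_thread_num());

        #pragma omp single          /* any one thread executes this; implied barrier at the end */
        printf("single block run by thread %d\n", omp_get_thread_num());
    }
    return 0;
}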

The other major function of clauses: data scoping

There are several data-scoping attributes:

1. Shared (shared)

Shared variables are stored in shared memory. In a multi-threaded environment, reading and writing shared memory requires synchronization locks; otherwise races make the contents unpredictable. The recommended and common approach is to turn the shared variable into a private variable before reading and writing it. If a shared variable is written directly in the parallel region without lock protection, there is a data race, which can produce unpredictable, incorrect results. If shared data enters the parallel region through the private, firstprivate, lastprivate, threadprivate or reduction clauses, it becomes thread-private and no longer needs lock protection.

2. Per-thread private within a parallel region (private)

It is important to note that the loop variable of a parallel for loop is private to each thread, and variables declared inside the loop body are also private to each thread. For a variable listed in private, even though a shared variable of the same name exists, that shared variable has no effect inside the parallel region, and operations inside the region do not touch the outer shared variable. * A variable that appears in a reduction clause cannot also appear in a private clause.

A variable k declared before the for loop and the private variable k inside the loop are actually two different variables. The initial value of a private variable declared with the private clause is undefined at the entry to the parallel region; it does not inherit the value of the shared variable with the same name.

So how do you make the private variable inherit the value of the shared variable?

Use the firstprivate clause.

Note that with firstprivate alone the value of the original shared variable does not change, because all modifications are made to the private copies.

Of course, if you need to write the final private value back to the shared variable of the same name, add lastprivate as well.
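
A sketch with both clauses on the same variable k (the values 100 and i * 10 are made up):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i, k = 100;

    /* firstprivate: each thread's private k starts with the value 100;
       lastprivate: after the loop, k receives the value written by the last iteration */
    #pragma omp parallel for firstprivate(k) lastprivate(k)
    for (i = 0; i < 4; i++) {
        printf("thread %d sees k = %d at iteration %d\n",
               omp_get_thread_num(), k, i);
        k = i * 10;                 /* the last iteration (i = 3) writes 30 */
    }

    printf("k after the loop = %d\n", k);   /* 30, copied back by lastprivate */
    return 0;
}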


3. Global per-thread private (threadprivate)

A private variable becomes invalid after the parallel region exits, while a threadprivate thread-specific variable keeps its value across consecutive parallel regions. (copyin copies the value of the threadprivate variable in the master thread into the threadprivate copy of each thread executing the parallel region, so that all threads can start from the master thread's value. copyprivate broadcasts the value of a thread-private variable from one thread to the other threads executing the same parallel region; the copyprivate clause can only be attached to a single construct, and the broadcast completes before the barrier at the end of the single construct.)
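
A sketch of threadprivate with copyin (the global counter and its values are illustrative; persistence between the two regions assumes the same number of threads and no dynamic thread adjustment):

#include <stdio.h>
#include <omp.h>

int counter = 0;                    /* one copy of this global per thread */
#pragma omp threadprivate(counter)

int main(void)
{
    counter = 10;                   /* set the master thread's copy */

    /* copyin copies the master thread's value (10) into every thread's copy */
    #pragma omp parallel num_threads(3) copyin(counter)
    {
        counter += omp_get_thread_num();
        printf("thread %d: counter = %d\n", omp_get_thread_num(), counter);
    }

    /* the per-thread copies keep their values in the next parallel region */
    #pragma omp parallel num_threads(3)
    printf("thread %d still sees counter = %d\n", omp_get_thread_num(), counter);

    return 0;
}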

4. Reduction (reduction)

Each thread creates a private copy of each variable listed in the clause; at the end of the parallel region or work-sharing construct, the private copies are combined with the specified reduction operator and the result is written back to the original shared variable. The operators that can be used in a reduction clause, with the default initial value of each private copy (the actual initial value also depends on the data type of the reduction variable), are: + (0), - (0), * (1), & (~0), | (0), ^ (0), && (1), || (0).
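
A sketch of reduction with the + operator (summing 1..100 is a made-up task):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i, sum = 0;

    /* each thread gets a private copy of sum initialized to 0 (the identity for +);
       at the end of the loop the copies are combined into the shared sum */
    #pragma omp parallel for reduction(+:sum)
    for (i = 1; i <= 100; i++)
        sum += i;

    printf("sum = %d\n", sum);      /* prints 5050 */
    return 0;
}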
