reduction: a private copy of each variable in the list is created for each thread. At the end of the region, the reduction operator is applied to all of the private copies, and the final result is written back to the global shared variable.
The reduction clause specifies an operator for each listed variable. Each thread creates a private copy of the reduction variable. At the end of the OpenMP region, the private copies of all threads are combined with the specified operator and the result is assigned to the original variable.
The syntax of the clause is reduction(operator: list). Unlike the other data-attribute clauses, it takes an operator parameter. Because a combining operation is performed at the end, not every operator can be used with reduction. In addition, each private copy needs an initial value, and that value depends on the operator. The initial values of the common operators are: + (0), * (1), - (0), & (~0), | (0), ^ (0), && (1), || (0). For some operators other starting values would still be meaningful; a sum, for example, can start from any value, although the expression then means something different. For other operators certain initial values make no sense at all: if the initial value of a product reduction were 0, the result would always be 0!
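For instance, a product reduction depends on each private copy starting at the identity value 1. Below is a minimal sketch of that case (the loop bound and the variable name product are only illustrative, not taken from the original example):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int product = 1;                  /* value outside the parallel region */
    /* Each thread's private copy of 'product' starts at 1 (the identity for *).
       The private results are multiplied together and combined with the
       original value when the region ends. */
    #pragma omp parallel for reduction(*:product)
    for (int i = 1; i <= 5; i++)
    {
        product *= i;
    }
    printf("Product: %d\n", product); /* 120 = 5! */
    return 0;
}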
A typical use of the reduction clause is a sum (accumulation) operation:
#include <omp.h>
#include <stdio.h>
#include <tchar.h>

#define COUNT 10

int main(int argc, _TCHAR* argv[])
{
    int sum = 100; // Assign an initial value.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < COUNT; i++)
    {
        sum += i;
    }
    printf("Sum: %d\n", sum);
    return 0;
}
In this example the loop adds 0 through COUNT-1 to sum. Because the initial value is 100, an extra 100 is included in the result; if you only want the sum of the loop, set the initial value to 0. Using reduction avoids a data race: change COUNT in the example above to a relatively large value and remove the reduction clause, and you will find that the race leads to inconsistent results, whereas with reduction you get the correct result every time.
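A minimal sketch of that race-prone variant is shown below (the larger COUNT and the long long type are chosen here only to make the race visible; they are not from the original example):

#include <omp.h>
#include <stdio.h>

#define COUNT 1000000

int main(void)
{
    long long sum = 0;  /* long long so the correct total fits */
    /* No reduction clause: all threads update the shared 'sum' directly,
       so the read-modify-write on 'sum' races and the printed total
       typically varies from run to run. */
    #pragma omp parallel for
    for (int i = 0; i < COUNT; i++)
    {
        sum += i;
    }
    printf("Sum: %lld (expected %lld)\n",
           sum, (long long)COUNT * (COUNT - 1) / 2);
    return 0;
}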
The reduction clause itself is simple to use; what needs more thought is the "initial value" mentioned above. The first sense is the initial value of 100, i.e. the value the variable has outside the parallel region, which takes part when the final result is combined. There is also an implicit initial value: we know that with reduction each thread constructs a private copy of the reduction variable, so what value does that copy start with? From the example above we can see that it starts at 0; if it started at 100, the result would include an extra 100 for each thread. How is this initial value determined? As listed above: + (0), * (1), - (0), & (~0), | (0), ^ (0), && (1), || (0).
With that in mind, the workflow of reduction can be understood as follows:
(1) When the parallel region is entered, each new thread in the team constructs a copy of the reduction variable. In the example above, with four threads, the values on entering the region would be: sum0 = 100, sum1 = sum2 = sum3 = 0. Why is sum0 100? Because the main thread is not a new thread, so presumably no copy needs to be constructed for it (I have not found an official statement to this effect, but intuitively it seems that only one thread would use the value from outside the parallel region while the rest start at 0).
(2) Each thread performs its computation using its own copy.
(3) When the parallel region ends, the copies of all threads are combined with the specified operator. For the example above: sum' = sum0' + sum1' + sum2' + sum3'.
(4) The combined result is assigned to the original variable: sum = sum'.
Note:
The reduction clause can only be used with scalar types (int, float, and so on);
The reduction clause can only appear on a parallel region or work-sharing construct, and inside that region the reduction variable may only be used in statements of the following forms (a short sketch follows the list):
x = x op expr
x = expr op x (except subtraction)
x binop= expr
x++
++x
x--
--x
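To make these forms concrete, here is a small sketch that uses a few of them in one loop (the data array and the variable names total, hits and parity are invented for illustration):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int data[8] = {3, -1, 4, 1, -5, 9, 2, -6};
    int total = 0, hits = 0, parity = 0;

    #pragma omp parallel for reduction(+:total, hits) reduction(^:parity)
    for (int i = 0; i < 8; i++)
    {
        total = total + data[i];   /* x = x op expr  */
        if (data[i] > 0)
            hits++;                /* x++            */
        parity ^= data[i];         /* x binop= expr  */
    }

    printf("total=%d hits=%d parity=%d\n", total, hits, parity);
    return 0;
}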
Note: in my tests, code that does not follow this rule still compiles and runs, and one can even explain why it produces the result it does. But in any case, reduction is meant for combining partial results, and the semantics should be kept clear. Consider the following "wrong" example:
#include <omp.h>
#include <stdio.h>
#include <tchar.h>

#define COUNT 10

int main(int argc, _TCHAR* argv[])
{
    int sum = 100; // Assign an initial value.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < COUNT; i++)
    {
        sum += i;
        sum = 1;
    }
    printf("Sum: %d\n", sum);
    return 0;
}
The output is 104 (on a 4-core machine). The statement sum = 1; should not appear here, but if it does, the code still compiles and runs, and the result can even be explained: after each thread finishes, its private copy of sum is 1; there are four threads, and the initial value was 100, so the final result is 104. :) In any case, even though it can be explained, do not write code like this, and at the very least do not rely on this behaviour. This "wrong" example also reveals that the workflow described above is not entirely correct. Step (1) claimed that on entering the parallel region the initial values are sum0 = 100, sum1 = sum2 = sum3 = 0. If that were so, every thread's copy would end up as 1 in this example and the result would be 4, not 104. So the more accurate picture is that the main thread also creates a private copy, whose initial value is likewise 0, and the final combination uses the original value of sum together with each thread's copy. The process is as follows:
(1) sum = 100
(2) Enter the parallel region; the four threads create four copies: sum0 = sum1 = sum2 = sum3 = 0;
(3) After the computation completes, the copies hold sum0', sum1', sum2', sum3';
(4) Compute the final value: sum = sum op sum0' op sum1' op sum2' op sum3' (a conceptual sketch of this model follows below).
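One way to picture this model is to emulate it by hand. The sketch below is purely conceptual, under the assumption that combining happens as in step (4); it is not how any particular compiler actually implements reduction:

#include <omp.h>
#include <stdio.h>

#define COUNT 10

int main(void)
{
    int sum = 100;              /* original value kept aside, as in step (1) */

    #pragma omp parallel
    {
        int local = 0;          /* per-thread copy, initialized to the identity of + */

        #pragma omp for
        for (int i = 0; i < COUNT; i++)
        {
            local += i;         /* each thread works on its own copy */
        }

        #pragma omp critical    /* combine: sum = sum + local0' + local1' + ... */
        sum += local;
    }

    printf("Sum: %d\n", sum);   /* 145, matching reduction(+:sum) with sum = 100 */
    return 0;
}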
In short, it does not matter exactly how a particular compiler implements it; the key is to understand how reduction works.