Note:
User Guide: http://software.intel.com/zh-cn/forums/showthread.php? T000077996&o=a&s=lr(cilk_user_guide.pdf)
These are study notes on the User Guide (Chinese version). They condense the material and aim to give a better understanding of the code; refer to the original document for more details.
The core of Cilk is its three keywords: cilk_spawn, cilk_sync, and cilk_for. Beyond these, we also have to consider data races. Data races are a concern in any parallel or multithreaded program, so how do we deal with them in Cilk programs?
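To make the term "data race" concrete, here is a tiny sketch of a lost update. It is only an illustration: it uses no threads at all and simply writes out by hand one unlucky interleaving of two strands that both execute "shared = shared + 1". The function name lost_update_demo is made up for this note.

```cpp
#include <cassert>

// Hand-written interleaving of two strands, A and B, that each want to
// increment the same variable once. No real threads are involved; the
// statements are just ordered the way an unlucky schedule could order them.
int lost_update_demo() {
    int shared = 0;
    int a = shared;   // strand A reads 0
    int b = shared;   // strand B reads 0 before A has written back
    shared = a + 1;   // strand A writes 1
    shared = b + 1;   // strand B also writes 1, so A's update is lost
    return shared;    // 1, although two increments were requested
}
```

With real parallel strands the interleaving changes from run to run, which is why a racy program can give a different answer each time.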
Data races in Cilk programs are the same as in any other parallel program. The remedies are restructuring the code, changing the algorithm, replacing global variables with local variables, and locking. In most cases a lock is what finally resolves the race. Cilk does not provide a locking data structure of its own, but it recognizes the locking mechanisms of several other frameworks.
Cilk recognizes the following locking mechanisms:
1. The Intel Threading Building Blocks library provides tbb::mutex for creating critical sections. Inside a critical section it is safe to update and access shared memory and other shared resources. The Intel Parallel Studio tools recognize this lock, so memory accesses protected by tbb::mutex are not reported as data races.
2. Windows* operating system: the CRITICAL_SECTION object works essentially the same way as tbb::mutex. Intel Parallel Studio does not report data races for accesses protected by EnterCriticalSection(), TryEnterCriticalSection(), or LeaveCriticalSection().
3. Linux* operating system: the POSIX* pthread mutex (pthread_mutex_t) works the same way as tbb::mutex. Intel Parallel Studio does not report data races for accesses protected by pthread_mutex_lock(), pthread_mutex_trylock(), or pthread_mutex_unlock().
4. The Intel Parallel Studio tools recognize atomic machine instructions. C/C++ programmers can use these instructions through compiler intrinsics.
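The locking options above all follow the same pattern: take the lock, touch the shared data, release the lock. The sketch below shows that pattern, together with the atomic alternative from item 4, in portable C++. It rests on several assumptions: std::thread stands in for Cilk strands, std::mutex stands in for tbb::mutex or pthread_mutex_t, std::atomic stands in for compiler intrinsics, and the function names sum_with_mutex and sum_with_atomic are made up for this note.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Adds 0 + 1 + ... + (n - 1) into one shared total, guarded by a mutex.
// Hypothetical helper; std::thread is used here instead of cilk_for.
unsigned long long sum_with_mutex(unsigned long long n, unsigned workers) {
    assert(workers > 0);
    unsigned long long total = 0;  // shared by all workers
    std::mutex guard;              // protects `total`
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&total, &guard, n, workers, w] {
            // Worker w handles indices w, w + workers, w + 2 * workers, ...
            for (unsigned long long i = w; i < n; i += workers) {
                std::lock_guard<std::mutex> lock(guard);  // critical section
                total += i;
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return total;
}

// Same computation, but each update is a single atomic read-modify-write.
unsigned long long sum_with_atomic(unsigned long long n, unsigned workers) {
    assert(workers > 0);
    std::atomic<unsigned long long> total{0};
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&total, n, workers, w] {
            for (unsigned long long i = w; i < n; i += workers) {
                total.fetch_add(i, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return total.load();
}
```

Both versions give the same total for any worker count, but they serialize every single update. That cost is the lock contention and atomic-operation overhead that parallel programs have to watch for.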
In addition, problems such as deadlock arise in Cilk just as they do in other parallel programs. As for performance, besides task granularity, the usual parallel-programming issues also affect Cilk programs: lock contention, cache efficiency and memory bandwidth, false sharing, and atomic operations.
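Of the performance issues listed, false sharing is the least obvious, so here is a minimal sketch of the usual countermeasure: give each worker its own slot and pad the slots so that no two share a cache line. The 64-byte line size is an assumption about the target machine, std::thread stands in for Cilk workers, and the names kCacheLine, PaddedSlot, and sum_with_padded_slots are made up for this note.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size; 64 bytes is common but not universal.
constexpr std::size_t kCacheLine = 64;

// alignas rounds sizeof(PaddedSlot) up to a full line, so neighbouring
// slots in an array are always at least one cache line apart.
struct alignas(kCacheLine) PaddedSlot {
    unsigned long long value = 0;
};

// Each worker accumulates into its own padded slot; the slots are added
// together only after every worker has finished.
unsigned long long sum_with_padded_slots(unsigned long long n, unsigned workers) {
    assert(workers > 0);
    std::vector<PaddedSlot> slots(workers);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&slots, n, workers, w] {
            for (unsigned long long i = w; i < n; i += workers) {
                slots[w].value += i;  // private slot, no lock needed
            }
        });
    }
    for (std::thread& t : pool) t.join();
    unsigned long long total = 0;
    for (const PaddedSlot& s : slots) total += s.value;
    return total;
}
```

Without the padding the code would still be correct, but adjacent counters could sit in the same cache line and the workers would keep invalidating each other's copies of it.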
Besides modifying the code and locking, there is one more, special way to resolve data races: the reducer. Of course, it applies only to certain kinds of data race. A reducer serves the same purpose as the OpenMP reduction; for more information, see http://blog.csdn.net/gengshenghong/article/details/7000685. The similarity is only in function: in Cilk a reducer is a data structure (data type), whereas the OpenMP reduction is a clause.
Reducers have the following important properties:
Reducers allow non-local variables to be accessed reliably, without data races.
Reducers need no locks, so they avoid the lock contention caused by locking non-local variables and the loss of parallelism that comes with it.
When correctly defined and used, reducers retain serial semantics. The result of a Cilk program that uses reducers is the same as that of the serial version, and it does not depend on the number of processors on the target machine or on how the worker threads are scheduled. Using reducers does not require significant changes to the existing code structure.
The implementation of reducers is efficient.
Unlike reduction constructs that are tied to a control structure such as a loop, reducers do not depend on the control structure of the program.
Refer to http://software.intel.com/zh-cn/blogs/2010/06/25/intel-cilk-plus-reducer/ and the User Guide to understand reducer views.
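The view idea can be sketched without Cilk at all. Each strand works on a private view that starts at the identity value, and the views are later merged in serial order with an associative operation. The code below simulates this sequentially, with string concatenation as the operation, because concatenation is associative but not commutative and so shows why the merge order matters. This is a simulation under assumptions, not the Cilk runtime: the function name concat_with_views and the even chunking are made up for this note.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simulates `strands` private views over a list of words.
// Hypothetical helper, run sequentially for clarity.
std::string concat_with_views(const std::vector<std::string>& words,
                              std::size_t strands) {
    assert(strands > 0);
    // One view per strand, each starting at the identity (empty string).
    std::vector<std::string> views(strands);
    const std::size_t chunk = (words.size() + strands - 1) / strands;
    for (std::size_t s = 0; s < strands; ++s) {
        const std::size_t begin = std::min(s * chunk, words.size());
        const std::size_t end = std::min(begin + chunk, words.size());
        // A strand touches only its own view, so there is nothing to lock.
        for (std::size_t i = begin; i < end; ++i) views[s] += words[i];
    }
    // Merge the views left to right, i.e. in serial order.
    std::string result;
    for (const std::string& v : views) result += v;
    return result;
}
```

Because the operation is associative and the views are merged in serial order, the result is the same for 1, 3, or 8 strands. That is the "serial semantics" property listed among the reducer properties.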
In general, I personally think reducers are easy to understand. They are mainly used for accumulation-style operations, such as repeated addition and repeated multiplication.
The following summation example illustrates how a reducer is used:
// File: test1.cpp
#include <stdio.h>
#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>

#define N 1000000

// You do not need to initialize the reducer to 0. If you need a different
// initial value, you need to modify the code yourself.
cilk::reducer_opadd<unsigned long> sum;
unsigned long sum0 = 0, sum1 = 0;

int main()
{
    // case 1: serial loop, gives the reference result
    for (int i = 0; i < N; i++) {
        sum0 = sum0 + i;
    }
    printf("correct sum is %lu\n", sum0);

    // case 2: cilk_for updating a plain global variable (data race)
    cilk_for (int i = 0; i < N; i++) {
        sum1 = sum1 + i;
    }
    printf("No reducer sum is %lu\n", sum1);

    // case 3: cilk_for updating a reducer (no data race)
    cilk_for (int i = 0; i < N; i++) {
        sum += i;
    }
    printf("reducer sum is %lu\n", sum.get_value());
    return 0;
}
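A quick way to check the three cases is to compare them against the closed form for 0 + 1 + ... + (N - 1), which is N(N - 1)/2. The sketch below assumes the loop bound N is 1000000 (an assumption; adjust it to the value actually used), and the helper names expected_sum and fits_in_32_bits are made up for this note.

```cpp
#include <cassert>
#include <cstdint>

// Closed form for 0 + 1 + ... + (n - 1).
std::uint64_t expected_sum(std::uint64_t n) {
    return n * (n - 1) / 2;
}

// True if the value can be stored in a 32-bit unsigned integer.
bool fits_in_32_bits(std::uint64_t v) {
    return v <= UINT32_MAX;
}
```

For N = 1000000 the expected sum is 499999500000, which does not fit in 32 bits. Cases 1 and 3 should therefore print that value only where unsigned long is 64 bits wide and the printf format matches the type, while case 2 may print a smaller value that changes from run to run because of the data race.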