I have been reading around lately and collected some notes from other people's write-ups.

With multicore processors increasingly common, more and more programs are parallelized with multithreading to improve performance. Writing a correct multithreaded program, however, has always been hard, and misuse of the volatile keyword is a typical example.

volatile in C/C++

In C/C++, volatile is not normally a tool for multithreaded synchronization. Declaring a variable volatile tells the compiler that the variable is "volatile": it can be modified from somewhere else at any time. The compiler therefore must not optimize accesses to it. Every read and write must go directly to the variable's memory address, and accesses to volatile variables must be performed strictly in the order the program specifies.

For example, one of the most common compiler optimizations is to cache a frequently read variable in a register to speed up access. If the variable's value can be changed at any time by something outside the CPU, such as a hardware device, the cached value may no longer be the latest one, and the program misbehaves. In that case the variable must be declared volatile so that the compiler does not cache its reads and writes.

Another example is memory-mapped I/O, as in the following code:

    int *p = get_io_address();
    int a, b;
    a = *p;
    b = *p;

Here p points to a hardware I/O port whose value changes after each read. The program reads the port twice in a row and expects two different values in a and b. If the object p points to is not declared volatile (that is, if p is not a volatile int *), the compiler may "cleverly" assume that the two reads of *p yield the same value and optimize b = *p into b = a, which breaks the program.
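The memory-mapped I/O pattern can be sketched in compilable form by making the pointee volatile. This is only an illustration under stated assumptions: get_io_address() here is a stand-in that returns the address of an ordinary variable (fake_port, a hypothetical name), because a real port address depends on the hardware, and read_port_twice_sum() is likewise made up for the example.

```cpp
// Stand-in for a hardware register. On real hardware this would be a
// fixed, device-specific address (assumption for illustration only).
static volatile int fake_port = 7;

// Hypothetical helper mirroring the get_io_address() used in the text.
volatile int *get_io_address() {
    return &fake_port;
}

// Reads the port twice. Because *p is volatile, the compiler must
// emit two separate loads and may not rewrite b = *p as b = a.
int read_port_twice_sum() {
    volatile int *p = get_io_address();
    int a = *p;  // first load from the port
    int b = *p;  // second load, not folded into the first
    return a + b;
}
```

With an ordinary variable behind the pointer both loads return the same value, so the sketch only shows the declaration that keeps both loads in the generated code, not a value that changes between reads.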
Although volatile can protect reads and writes of this kind of "changeable" variable, it is not suitable for synchronizing variables shared between threads. The root of the problem is that the C++ standard gives volatile neither atomicity nor ordering semantics.

Atomicity

Here is an example of atomicity. The seemingly atomic statement i++ actually consists of three operations: load the value from memory into a register, add 1 to the register, and write the new value back to memory. Because i++ is not atomic, two threads executing i++ at the same time still produce a data race, and the final value of i may not be 2. The volatile keyword in C++ provides no guarantee of atomicity in this situation.

    volatile int i = 0;
    // thread 1
    i++;
    // thread 2
    i++;

Ordering

Unfortunately, the volatile keyword in the current C++ standard also provides no guarantee about the order in which operations on shared variables are seen by other threads. Take the Dekker algorithm used in this article as an example: when two threads execute the functions dekker1 and dekker2 respectively, the program uses reads and writes of flag1, flag2 and turn to give the two threads mutually exclusive access to the shared variable gcounter in the critical section. The key to the algorithm is that each thread reads the other thread's flag and turn only after its own writes to its flag and to turn have taken effect. Only then are the operations of dekker1 and dekker2 on gcounter mutually exclusive, which is equivalent to placing gcounter++ in a critical section.
The Dekker algorithm is as follows:

    volatile int flag1 = 0;
    volatile int flag2 = 0;
    volatile int turn = 1;
    volatile int gcounter = 0;

    void dekker1() {
        flag1 = 1;
        turn = 2;
        while ((flag2 == 1) && (turn == 2)) {}
        // enter the critical section
        gcounter++;
        flag1 = 0;  // leave the critical section
    }

    void dekker2() {
        flag2 = 1;
        turn = 1;
        while ((flag1 == 1) && (turn == 1)) {}
        // enter the critical section
        gcounter++;
        flag2 = 0;  // leave the critical section
    }

Although volatile keeps the compiler from optimizing away or reordering the accesses to a volatile variable, that is a weak guarantee. It constrains only the instructions the compiler emits, not what other threads observe, and ordinary non-volatile accesses may still be moved across volatile ones. If the read of flag2 in dekker1 took effect before the writes to flag1 and turn, mutual exclusion would fail and the gcounter++ operations would end up in a data race.

In fact, even when the compiler emits every access exactly in program order, the volatile keyword does not stop a multicore CPU from reordering them. On common x86 hardware, a store to x followed by a load from y (store x -> load y, for different variables x and y) may be reordered, so that the load of y completes before the store to x is visible to other cores. In that case the read of flag2 in dekker1 can still take effect before the writes to flag1 and turn, producing an incorrect result.

So why do compilers and multicore CPUs perform this kind of reordering on multithreaded programs? Seen from a single thread, the writes to flag1 and turn and the subsequent read of flag2 have no dependency on each other, so the compiler or CPU is free to reorder them to hide part of the memory access latency and make better use of the CPU pipeline.
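One way the same algorithm could be made to work is with std::atomic, assuming a C++11 or later compiler. This is a sketch rather than the article's own code: run_dekker is a hypothetical driver added for the example, the yield inside the spin loop is only there to be polite to the scheduler, and the default sequentially consistent ordering of std::atomic is what rules out the store -> load reordering described above.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> flag1{0};
std::atomic<int> flag2{0};
std::atomic<int> turn{1};
int gcounter = 0;  // plain int: only touched inside the critical section

void dekker1() {
    flag1.store(1);  // seq_cst by default
    turn.store(2);
    while (flag2.load() == 1 && turn.load() == 2) {
        std::this_thread::yield();  // spin until it is our turn
    }
    gcounter++;      // critical section
    flag1.store(0);  // leave the critical section
}

void dekker2() {
    flag2.store(1);
    turn.store(1);
    while (flag1.load() == 1 && turn.load() == 1) {
        std::this_thread::yield();
    }
    gcounter++;      // critical section
    flag2.store(0);  // leave the critical section
}

// Hypothetical driver: each thread enters the critical section
// `iters` times, then the final counter value is returned.
int run_dekker(int iters) {
    gcounter = 0;
    std::thread t1([iters] { for (int i = 0; i < iters; ++i) dekker1(); });
    std::thread t2([iters] { for (int i = 0; i < iters; ++i) dekker2(); });
    t1.join();
    t2.join();
    return gcounter;
}
```

Calling run_dekker(n) should return 2 * n, because every increment happens under mutual exclusion. The sequentially consistent stores typically cost a full barrier on x86, which is the price of forbidding the reordering.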
In other words, such optimizations are not wrong from a single thread's point of view, but they violate the multithreaded semantics the algorithm was designed around. To solve this problem, we need to add memory fences that explicitly guarantee the order, or simply not implement such an algorithm at all and instead use a lock operation such as pthread_mutex_lock to obtain mutually exclusive access.

In general, since the existing C/C++ standard attaches no multithreading semantics to volatile, using volatile for multithreaded synchronization is wrong in most C/C++ programs. In practice, the reason people want to synchronize with volatile variables is simply that locks, condition variables and the like are considered too expensive, so a lightweight, efficient synchronization mechanism is desired.
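As a rough sketch of what the alternatives look like, assuming a C++11 or later compiler: the shared counter from the atomicity example can be protected either with a lightweight std::atomic read-modify-write or with a lock. std::mutex is used here in place of pthread_mutex_lock to keep the example portable, and the function names count_with_atomic and count_with_mutex are made up for illustration.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Lightweight option: fetch_add is a single atomic read-modify-write,
// so concurrent increments cannot be lost.
int count_with_atomic(int iters) {
    std::atomic<int> counter{0};
    auto work = [&counter, iters] {
        for (int i = 0; i < iters; ++i) counter.fetch_add(1);
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter.load();
}

// Lock-based option: the mutex makes ++counter a critical section.
int count_with_mutex(int iters) {
    int counter = 0;
    std::mutex m;
    auto work = [&counter, &m, iters] {
        for (int i = 0; i < iters; ++i) {
            std::lock_guard<std::mutex> lock(m);  // released at scope exit
            ++counter;
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

Both functions return 2 * iters. A volatile int counter incremented the same way gives no such guarantee, because i++ remains three separate operations.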