1 Overview
This article is part of my reading-notes series on "The Art of Java Concurrency Programming" and continues with Chapter 3, the Java memory model.
2 Reordering
2.1 Data dependencies
If two operations access the same variable and at least one of them is a write, there is a data dependency between the two operations. There are three types of data dependency:
| Name | Code example | Description |
| --- | --- | --- |
| Read after write | a = 1; b = a; | After writing a variable, read it. |
| Write after write | a = 1; a = 2; | After writing a variable, write it again. |
| Write after read | a = b; b = 1; | After reading a variable, write it. |
In all three cases, reordering the two operations changes the execution result of the program.
As mentioned earlier, the compiler and the processor may reorder operations. When they do, they respect data dependencies: neither the compiler nor the processor changes the execution order of two operations that have a data dependency.
Note that the data dependencies described here apply only to instruction sequences executed on a single processor and to operations performed in a single thread. Data dependencies between different processors and between different threads are not considered by the compiler and the processor.
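The three dependency types can be sketched in a few lines of Java. The class and method names below are hypothetical, chosen only for illustration. In each method, swapping the two marked statements would change the returned value, which is why the compiler and the processor keep them in order.

```java
// Hypothetical demo class; the names are illustrative and not from the book.
class DataDependencyDemo {

    // Read after write: the read of a depends on the preceding write to a.
    static int readAfterWrite() {
        int a = 1;   // write a
        int b = a;   // read a
        return b;
    }

    // Write after write: the final value of a depends on which write runs last.
    static int writeAfterWrite() {
        int a = 1;   // first write
        a = 2;       // second write
        return a;
    }

    // Write after read: a must read b before b is overwritten.
    static int writeAfterRead() {
        int b = 0;
        int a = b;   // read b
        b = 1;       // write b
        return a;
    }

    public static void main(String[] args) {
        System.out.println(readAfterWrite());
        System.out.println(writeAfterWrite());
        System.out.println(writeAfterRead());
    }
}
```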
2.2 as-if-serial semantics
as-if-serial semantics means that no matter how operations are reordered (by compilers and processors, to improve parallelism), the execution result of a (single-threaded) program must not change. Compilers, the runtime, and processors must all comply with as-if-serial semantics.
To comply with as-if-serial semantics, the compiler and the processor do not reorder operations that have data dependencies, because such reordering would change the execution result. However, if there is no data dependency between operations, the compiler and the processor may reorder them.
Here the author gives an example that calculates the area of a circle:
    double pi = 3.14;         // A
    double r = 1.0;           // B
    double area = pi * r * r; // C
C depends on both A and B, so it cannot be moved ahead of either of them. A and B do not depend on each other, so swapping them does not affect the final result. The goal is to allow as much parallelism as possible without changing the execution result of the program. Compilers and processors follow this goal, and from the definition of happens-before we can see that JMM follows it as well.
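As a small check of the area example, the sketch below writes out the program order (A, B, C) and one legal reordering (B, A, C) by hand and compares the results. The class and method names are assumptions made for this illustration; a real compiler or processor performs such reordering invisibly.

```java
// Hypothetical demo class: both methods compute the same area.
class AreaDemo {

    // Program order: A, B, C.
    static double areaInProgramOrder() {
        double pi = 3.14;         // A
        double r = 1.0;           // B
        double area = pi * r * r; // C
        return area;
    }

    // One legal reordering: B, A, C. C stays last because it depends on A and B.
    static double areaWithAAndBSwapped() {
        double r = 1.0;           // B
        double pi = 3.14;         // A
        double area = pi * r * r; // C
        return area;
    }

    public static void main(String[] args) {
        System.out.println(areaInProgramOrder() == areaWithAAndBSwapped());
    }
}
```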
2.3 The effect of reordering on multithreading
The author gives an example of how reordering affects a multithreaded program. It is not reproduced here (see the original book); only the conclusion is recorded.
In a single-threaded program, reordering operations that have a control dependency does not change the execution result (which is why as-if-serial semantics allows such operations to be reordered). In a multithreaded program, however, reordering operations that have a control dependency may change the execution result of the program.
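Since the book's example is not reproduced in these notes, the following is only a sketch of the kind of program the conclusion refers to. The class name, fields and return convention are assumptions. The comments mark where reordering is permitted; run from a single thread, as in runSequentially(), the result is always the expected one.

```java
// Hypothetical sketch of an unsynchronized writer/reader pair.
class ReorderExample {
    int a = 0;
    boolean flag = false;

    void writer() {
        a = 1;        // 1
        flag = true;  // 2: no data dependency on 1, so 1 and 2 may be reordered
    }

    // Returns a * a if flag was seen as true, otherwise -1.
    int reader() {
        if (flag) {        // 3
            return a * a;  // 4: control-dependent on 3, so a may be read speculatively
        }
        return -1;
    }

    // Single-threaded use: as-if-serial semantics guarantees the expected results.
    static int[] runSequentially() {
        ReorderExample example = new ReorderExample();
        int before = example.reader();
        example.writer();
        int after = example.reader();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] results = runSequentially();
        System.out.println(results[0] + " " + results[1]);
    }
}
```

If writer() and reader() ran in different threads without synchronization, reader() could see flag as true and still read the old value of a, because either pair of operations may be reordered.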
3 Sequential consistency
3.1 Data races and the sequential consistency guarantee
A data race may exist when a program is not correctly synchronized. The Java memory model specification defines a data race as follows:
- A variable is written in one thread,
- the same variable is read in another thread,
- and the write and the read are not ordered by synchronization.
JMM makes the following guarantee about the memory consistency of correctly synchronized multithreaded programs:
- If the program is correctly synchronized, its execution will be sequentially consistent: that is, the execution result of the program is the same as its execution result in the sequentially consistent memory model (as we will see shortly, this is a very strong guarantee for programmers). Synchronization here means synchronization in the broad sense, including the correct use of the common synchronization primitives (synchronized, volatile and final).
3.2 The sequentially consistent memory model
The sequentially consistent memory model has two major features:
- All operations in a thread must execute in program order.
- All threads see one single order of operation execution (regardless of whether the program is synchronized). In the sequentially consistent memory model, every operation must execute atomically and be immediately visible to all threads.
Conceptually, the sequential consistency model has a single global memory that can be connected to any one thread through a switch that swings from side to side, and each thread must perform its memory reads and writes in program order. At most one thread can be connected to memory at any point in time, so when multiple threads execute concurrently, the switch serializes all memory read/write operations of all threads.
3.3 Sequential consistency of synchronized programs
Here we synchronize the earlier ReorderExample program with a monitor lock to see how a correctly synchronized program achieves sequential consistency.
    class SynchronizedExample {
        int a = 0;
        boolean flag = false;

        public synchronized void writer() { // acquire lock
            a = 1;
            flag = true;
        }                                   // release lock

        public synchronized void reader() { // acquire lock
            if (flag) {
                int i = a;
                // ...
            }
        }                                   // release lock
    }
In the example code above, suppose thread A executes the writer() method and thread B executes the reader() method. This is a correctly synchronized multithreaded program. According to the JMM specification, its execution result will be the same as its execution result in the sequential consistency model. The two memory models differ in execution timing as follows.
In the sequential consistency model, all operations execute serially in program order. In JMM, the code inside a critical section can be reordered (but JMM does not allow code in a critical section to "escape" outside it, since that would break the semantics of the monitor). JMM performs some special processing at two key points, exiting the monitor and entering the monitor, so that at both points the thread has the same memory view as in the sequential consistency model (explained in detail later). Although thread A reorders operations within its critical section, thread B cannot observe that reordering, because the monitor enforces mutually exclusive execution. The reordering therefore improves execution efficiency without changing the execution result of the program. This shows the basic policy of JMM implementations: open the door to compiler and processor optimization as wide as possible, as long as the execution result of a (correctly synchronized) program does not change.
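The claim that thread B can never observe a reordering inside thread A's critical section can be exercised with a small harness. This is a sketch under assumptions: the class name, the -1 return convention and the helper methods are hypothetical additions around the SynchronizedExample logic. The reader either runs before the writer and sees flag as false, or runs after it and sees a == 1; it can never see flag as true together with a == 0.

```java
// Hypothetical harness around the synchronized writer/reader pair.
class SynchronizedExampleRun {
    private int a = 0;
    private boolean flag = false;

    synchronized void writer() {
        a = 1;
        flag = true;
    }

    // Returns a if the writer has already run, otherwise -1.
    synchronized int reader() {
        return flag ? a : -1;
    }

    // Runs writer() in thread A and reader() in thread B, returns what B saw.
    static int runOnce() {
        SynchronizedExampleRun example = new SynchronizedExampleRun();
        int[] seen = new int[1];
        Thread threadA = new Thread(example::writer);
        Thread threadB = new Thread(() -> seen[0] = example.reader());
        threadA.start();
        threadB.start();
        try {
            threadA.join();
            threadB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    // True if every run produced one of the two sequentially consistent results.
    static boolean allRunsConsistent(int runs) {
        for (int i = 0; i < runs; i++) {
            int result = runOnce();
            if (result != -1 && result != 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allRunsConsistent(200));
    }
}
```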
3.4 Execution characteristics of unsynchronized programs
JMM does not guarantee that the execution result of an unsynchronized program is the same as its execution result in the sequential consistency model. When an unsynchronized program executes in the sequential consistency model, its execution as a whole is unordered and its result is unpredictable, so it would be meaningless to guarantee that the results in the two models agree.
JMM does not guarantee that reads and writes of 64-bit long and double variables are atomic, whereas the sequential consistency model guarantees atomicity for all memory read/write operations. In a computer, data passes between the processor and memory over the bus. Each transfer between the processor and memory is completed through a series of steps called a bus transaction. Bus transactions include read transactions and write transactions. A read transaction transfers data from memory to the processor, a write transaction transfers data from the processor to memory, and each transaction reads or writes one or more physically contiguous words in memory. The key point is that the bus synchronizes transactions that try to use it concurrently: while one processor is executing a bus transaction, the bus prevents all other processors and I/O devices from reading or writing memory.
This bus mechanism serializes all processor accesses to memory, so at any point in time at most one processor can access memory. This ensures that the memory read/write operations within a single bus transaction are atomic.
On some 32-bit processors, requiring writes of 64-bit data to be atomic carries a significant overhead. To accommodate such processors, the Java Language Specification encourages, but does not require, the JVM to make writes to 64-bit long and double variables atomic. When the JVM runs on such a processor, it may split a write of a 64-bit long/double variable into two 32-bit writes. These two 32-bit writes may be assigned to different bus transactions, in which case the write to the 64-bit variable is not atomic. Starting with the JSR-133 memory model (that is, starting with JDK 5), only a write of a 64-bit long/double variable may be split into two 32-bit writes; every read operation in JSR-133 must be atomic (that is, every read must be performed in a single read transaction).
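Where a torn 64-bit value would be a problem, the usual remedy is to declare the field volatile, because the Java Language Specification makes reads and writes of volatile long and double values atomic. The sketch below is an illustration under assumptions: the class and method names are hypothetical. Two threads write all-zero and all-one bit patterns while the calling thread checks that it only ever sees one of the two whole values.

```java
// Hypothetical demo: a volatile long is always read and written as a whole.
class VolatileLongDemo {
    // volatile guarantees that each read and write of this 64-bit field is atomic.
    private volatile long value = 0L;

    static boolean sawOnlyWholeValues(int iterations) {
        VolatileLongDemo demo = new VolatileLongDemo();
        Thread zeros = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                demo.value = 0L;   // all 64 bits clear
            }
        });
        Thread ones = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                demo.value = -1L;  // all 64 bits set
            }
        });
        zeros.start();
        ones.start();
        boolean ok = true;
        for (int i = 0; i < iterations; i++) {
            long v = demo.value;
            if (v != 0L && v != -1L) {
                ok = false;        // half of one write combined with half of the other
            }
        }
        try {
            zeros.join();
            ones.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println(sawOnlyWholeValues(100_000));
    }
}
```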
4 Memory semantics of volatile
A volatile variable itself has the following properties:
- Visibility: a read of a volatile variable always sees the last write (by any thread) to that volatile variable.
- Atomicity: a read or write of any single volatile variable is atomic, but a compound operation such as volatile++ is not.
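The difference between a single volatile access and a compound operation can be sketched as follows. The class and field names are hypothetical. Two threads increment a volatile int with ++ and an AtomicInteger with incrementAndGet(); only the latter is a single atomic read-modify-write, so only its final value is guaranteed to equal the number of increments.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: volatile++ may lose updates, AtomicInteger does not.
class VolatileIncrementDemo {
    // Each read and each write is atomic and visible, but ++ is
    // a read, an add and a write: three separate steps.
    private volatile int volatileCounter = 0;
    private final AtomicInteger atomicCounter = new AtomicInteger();

    // Returns { final volatile counter, final atomic counter }.
    static int[] run(int perThread) {
        VolatileIncrementDemo demo = new VolatileIncrementDemo();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                demo.volatileCounter++;               // not atomic, updates may be lost
                demo.atomicCounter.incrementAndGet(); // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { demo.volatileCounter, demo.atomicCounter.get() };
    }

    public static void main(String[] args) {
        int[] result = run(10_000);
        System.out.println("volatile: " + result[0] + ", atomic: " + result[1]);
    }
}
```

The volatile counter can end up anywhere at or below the expected total, depending on how the threads interleave.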
4.1 The happens-before relationship established by volatile write-read
In terms of memory semantics, a volatile write has the same memory semantics as a lock release, which is equivalent to exiting a synchronized block, and a volatile read has the same memory semantics as a lock acquisition, which is equivalent to entering a synchronized block.
The author gives an example, assuming that thread B executes the reader() method after thread A executes the writer() method.
    class VolatileExample {
        int a = 0;
        volatile boolean flag = false;

        public void writer() {
            a = 1;           // 1
            flag = true;     // 2
        }

        public void reader() {
            if (flag) {      // 3
                int i = a;   // 4
                // ...
            }
        }
    }
Here thread A writes a volatile variable and thread B then reads the same volatile variable. All shared variables that were visible to thread A before it wrote the volatile variable become visible to thread B immediately after thread B reads that volatile variable.
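To see this guarantee in a running program, the sketch below lets the reader wait until it observes the volatile write and then read the ordinary variable. The class name and the spin loop are assumptions added for the demonstration; the book's reader() only checks the flag once. Because operation 1 happens-before 2, and 2 happens-before the read at 3 that sees it, the read at 4 must return 1.

```java
// Hypothetical variant of VolatileExample whose reader waits for the flag.
class VolatileHandoff {
    private int a = 0;
    private volatile boolean flag = false;

    void writer() {
        a = 1;        // 1: ordinary write
        flag = true;  // 2: volatile write
    }

    int reader() {
        while (!flag) {   // 3: volatile read, repeated until the write is seen
            Thread.yield();
        }
        return a;         // 4: ordinary read, guaranteed to see 1
    }

    // Starts the reader (thread B) first, then the writer (thread A).
    static int run() {
        VolatileHandoff example = new VolatileHandoff();
        int[] seen = new int[1];
        Thread threadB = new Thread(() -> seen[0] = example.reader());
        Thread threadA = new Thread(example::writer);
        threadB.start();
        threadA.start();
        try {
            threadA.join();
            threadB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```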
4.2 Memory semantics of volatile write-read
The memory semantics of a volatile write are as follows:
- When a volatile variable is written, JMM flushes the shared variables in that thread's local memory to main memory.
The memory semantics of a volatile read are as follows:
- When a volatile variable is read, JMM invalidates that thread's local memory. The thread then reads shared variables from main memory.
After thread A writes the flag variable, the values of the two shared variables that thread A updated in local memory A are flushed to main memory. After thread B reads the flag variable, local memory B is invalidated, so thread B must read the shared variables from main memory. Thread B's reads make the values of the shared variables in local memory B consistent with those in main memory.
The memory semantics of volatile writes and volatile reads can be summarized as follows:
- When thread A writes a volatile variable, it is in essence sending a message (about its modifications to shared variables) to whichever thread reads that volatile variable next.
- When thread B reads a volatile variable, it is in essence receiving the message sent by an earlier thread (about the modifications that thread made to shared variables before writing the volatile variable).
- When thread A writes a volatile variable and thread B then reads it, thread A is in essence sending a message to thread B through main memory.
4.3 Implementation of volatile memory semantics
As mentioned in the previous article, reordering is divided into compiler reordering and processor reordering. To implement volatile memory semantics, JMM restricts both types of reordering. The following table lists the volatile reordering rules that JMM defines for the compiler:
| Can reorder? First operation (rows), second operation (columns) | Normal read/write | Volatile read | Volatile write |
| --- | --- | --- | --- |
| Normal read/write | | | NO |
| Volatile read | NO | NO | NO |
| Volatile write | | NO | NO |
[Figure: instruction sequence generated for a volatile write after memory barriers are inserted under the conservative strategy]
The memory barrier insertion strategy for volatile writes and volatile reads described above is very conservative. In actual execution, the compiler can omit unnecessary barriers as long as the volatile write-read memory semantics are preserved.
Because different processors have memory models of different strictness, the insertion of memory barriers can be optimized further for the specific processor memory model. On x86 processors, for example, all barriers except the final StoreLoad barrier are omitted.
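To make the barrier positions concrete, here is a sketch that marks them with the explicit fence methods of java.lang.invoke.VarHandle (available from Java 9). It assumes the conservative strategy as it is commonly stated for JSR-133: a StoreStore barrier before each volatile write, a StoreLoad barrier after it, and LoadLoad plus LoadStore barriers after each volatile read. The figure with the exact sequence is not reproduced in these notes, so treat the placement as an assumption. The class name is hypothetical, and the fences only approximate the barriers: fullFence() is stronger than StoreLoad alone, and acquireFence() covers both LoadLoad and LoadStore.

```java
import java.lang.invoke.VarHandle;

// Hypothetical sketch: explicit fences placed where the conservative
// strategy would insert barriers around a volatile write and read.
class BarrierPlacementSketch {
    int a = 0;
    boolean flag = false; // plain field; the fences stand in for volatile here

    void writer() {
        a = 1;
        VarHandle.storeStoreFence(); // StoreStore: earlier stores stay before the flag store
        flag = true;                 // plays the role of the volatile write
        VarHandle.fullFence();       // full fence, which includes StoreLoad
    }

    // Returns a if flag was seen as true, otherwise -1.
    int reader() {
        boolean seen = flag;         // plays the role of the volatile read
        VarHandle.acquireFence();    // covers LoadLoad and LoadStore
        return seen ? a : -1;
    }

    // Single-threaded walk-through of the two methods.
    static int[] runSequentially() {
        BarrierPlacementSketch sketch = new BarrierPlacementSketch();
        int before = sketch.reader();
        sketch.writer();
        int after = sketch.reader();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] results = runSequentially();
        System.out.println(results[0] + " " + results[1]);
    }
}
```

This only illustrates where the barriers sit. Application code should simply declare the field volatile and let the JVM choose the barriers for the target processor.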
4.4 Why JSR-133 enhanced the memory semantics of volatile
In the old memory model, volatile write-read did not have the memory semantics of lock release-acquire. To provide a mechanism for inter-thread communication that is more lightweight than locks, the JSR-133 expert group decided to enhance the memory semantics of volatile: strictly limit the reordering of volatile variables with ordinary variables by the compiler and the processor, so that volatile write-read has the same memory semantics as lock release-acquire. Judging from the compiler reordering rules and the processor memory barrier insertion strategy, any reordering between a volatile variable and an ordinary variable that could break volatile memory semantics is prohibited by those rules and that strategy.
volatile only guarantees that a read or write of a single volatile variable is atomic, whereas the mutual exclusion of a lock ensures that the whole critical section executes atomically. Locks are more powerful than volatile in functionality, while volatile has the advantage in scalability and execution performance. The conditions for using volatile are:
The write to the variable does not depend on the variable's current value, and no locking is needed when the variable is accessed.
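A stop flag is a typical case that meets both conditions: the write stores a constant that does not depend on the current value, and nothing else needs to be protected together with the flag. The sketch below uses hypothetical names and is only one possible illustration.

```java
// Hypothetical worker that is stopped through a volatile flag.
class StoppableWorker implements Runnable {
    private volatile boolean stopRequested = false;

    // The write does not depend on the current value of the flag.
    void requestStop() {
        stopRequested = true;
    }

    @Override
    public void run() {
        while (!stopRequested) { // volatile read on every iteration
            Thread.yield();      // stand-in for real work
        }
    }

    // Starts a worker, requests a stop, and reports whether it terminated.
    static boolean stopsAfterRequest() {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker);
        thread.start();
        worker.requestStop();
        try {
            thread.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(stopsAfterRequest());
    }
}
```

A counter incremented by several threads would not qualify, because each new value depends on the previous one.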
5 Memory semantics of locks
The memory semantics of lock release-acquire are the same as those of volatile write-read, so they are not repeated here.
Implementation of lock memory semantics: see my earlier notes on ReentrantLock.
5.1 CAS
In this article, a call to Java's compareAndSet() method is abbreviated as CAS. The JDK documentation describes the method as follows: if the current state value equals the expected value, atomically set the synchronization state to the given update value. This operation has the memory semantics of a volatile read and write. For more on CAS, see my earlier notes.
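The behaviour described in the JDK documentation can be tried directly with AtomicInteger.compareAndSet(), which exposes the same operation. The demo class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo of compareAndSet on an AtomicInteger.
class CasDemo {

    // Returns { first CAS succeeded, second CAS succeeded, state is still 1 }.
    static boolean[] run() {
        AtomicInteger state = new AtomicInteger(0);

        // Current value is 0 and expected value is 0: the update to 1 is applied.
        boolean first = state.compareAndSet(0, 1);

        // Current value is now 1 but expected value is 0: nothing is changed.
        boolean second = state.compareAndSet(0, 2);

        return new boolean[] { first, second, state.get() == 1 };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println(result[0] + " " + result[1] + " " + result[2]);
    }
}
```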
Here we analyze, from the perspective of the compiler and the processor, how CAS has the memory semantics of both a volatile read and a volatile write.
As mentioned earlier, the compiler does not reorder a volatile read with any memory operation that follows it, and does not reorder a volatile write with any memory operation that precedes it. Taken together, these two conditions mean that, to implement the memory semantics of both a volatile read and a volatile write, the compiler must not reorder a CAS with any memory operation before or after it.
Next, let's analyze how CAS obtains the memory semantics of both a volatile read and a volatile write on the common Intel x86 processor.
The following is the source code of the compareAndSwapInt() method of the sun.misc.Unsafe class:
    public final native boolean compareAndSwapInt(Object o, long offset, int expected, int x);
As you can see, this is a native method call. The native method calls, in order, the following C++ code in OpenJDK: unsafe.cpp, atomic.cpp and atomic_windows_x86.inline.hpp. The final implementation of this native method is at the following location in OpenJDK: openjdk-7-fcs-src-b147-27jun2011\openjdk\hotspot\src\os_cpu\windows_x86\vm\atomic_windows_x86.inline.hpp
The source code is not reproduced here; the analysis is as follows. The program decides, based on the type of the current processor, whether to add the lock prefix to the cmpxchg instruction. If the program is running on a multiprocessor, the lock prefix is added (lock cmpxchg). If it is running on a single processor, the lock prefix is omitted (a single processor maintains sequential consistency within itself and does not need the memory barrier effect that the lock prefix provides).
The Intel manual describes the lock prefix as follows:
- It ensures that the read-modify-write operation on memory executes atomically. On the Pentium and earlier processors, an instruction with the lock prefix locks the bus during execution, temporarily preventing other processors from accessing memory through the bus. This is obviously expensive. Starting with the Pentium 4, Intel Xeon and P6 processors, Intel made a significant optimization on top of the original bus lock: if the area of memory being accessed is cached inside the processor while the lock-prefixed instruction executes (that is, the cache line containing the memory area is currently in the exclusive or modified state), and the area is fully contained in a single cache line, the processor executes the instruction directly. Because the cache line is locked for the duration of the instruction, other processors cannot read or write the memory area that the instruction accesses, which guarantees that the instruction executes atomically. This procedure is called cache locking. Cache locking significantly reduces the execution overhead of lock-prefixed instructions, but the bus is still locked when contention between processors is high or when the memory address that the instruction accesses is misaligned.
- It prohibits reordering the instruction with read and write instructions before and after it.
- It flushes all data in the write buffer to memory.
The second and third points above have a memory barrier effect that is sufficient to implement the memory semantics of both a volatile read and a volatile write.
Through the above analysis, we can now finally understand why the JDK document says that CAS has both volatile read and volatile write memory semantics.
Now let's summarize the memory semantics of fair and unfair locks:
- When a fair lock or an unfair lock is released, the last step is a write to the volatile variable state.
- When a fair lock is acquired, the volatile variable is read first.
- When an unfair lock is acquired, the volatile variable is first updated with CAS, an operation that has the memory semantics of both a volatile read and a volatile write.
From this analysis of ReentrantLock, we can see that the memory semantics of lock release-acquire can be implemented in at least the following two ways:
- Use the memory semantics of a write-read of a volatile variable.
- Use the volatile read and volatile write memory semantics that come with CAS.
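Both approaches can be combined in a toy lock: acquisition uses CAS on a state variable, and release is a plain volatile write of that variable. This is a minimal sketch with hypothetical names, not the ReentrantLock implementation; it is not reentrant, has no wait queue, and simply spins. The counter it protects is an ordinary array element, so the correct final count relies entirely on the memory semantics of the CAS and the volatile write.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical spin lock: CAS to acquire, volatile write to release.
class CasSpinLock {
    private final AtomicInteger state = new AtomicInteger(0); // 0 = free, 1 = held

    void lock() {
        // CAS has the memory semantics of a volatile read and a volatile write.
        while (!state.compareAndSet(0, 1)) {
            Thread.yield();
        }
    }

    void unlock() {
        state.set(0); // volatile write, as the last step of the release
    }

    // Two threads increment an unsynchronized counter under the lock.
    static int runTwoThreads(int perThread) {
        CasSpinLock lock = new CasSpinLock();
        int[] counter = new int[1]; // ordinary memory, protected only by the lock
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                lock.lock();
                try {
                    counter[0]++;
                } finally {
                    lock.unlock();
                }
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(10_000));
    }
}
```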
5.2 Implementation of the concurrent package
CAS in Java uses the efficient machine-level atomic instructions available on modern processors, which perform read-modify-write operations on memory atomically and are the key to synchronization on multiprocessors. In addition, reads/writes of volatile variables and CAS can implement communication between threads. Combined, these features form the cornerstone on which the entire concurrent package is built. If we carefully analyze the source code of the concurrent package, we find a common implementation pattern:
- First, declare the shared variable volatile.
- Then, use the atomic conditional update of CAS to implement synchronization between threads.
- At the same time, use volatile reads/writes and the volatile read and write memory semantics of CAS to implement communication between threads.
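The pattern in the list above can be sketched with a small non-blocking stack. This is an illustration under assumptions rather than code from the concurrent package: the class name is hypothetical, and AtomicReference is used because it wraps a volatile reference and offers compareAndSet on it. Each operation reads the shared variable, prepares a new value, and retries the CAS until no other thread has changed the variable in between.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical non-blocking stack built on a volatile reference plus CAS.
class CasStack<T> {
    private static final class Node<E> {
        final E value;
        final Node<E> next;

        Node(E value, Node<E> next) {
            this.value = value;
            this.next = next;
        }
    }

    // AtomicReference holds its value in a volatile field.
    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    void push(T value) {
        Node<T> oldTop;
        Node<T> newTop;
        do {
            oldTop = top.get();                        // volatile read
            newTop = new Node<>(value, oldTop);
        } while (!top.compareAndSet(oldTop, newTop));  // retry if another thread won
    }

    // Returns the most recently pushed value, or null if the stack is empty.
    T pop() {
        Node<T> oldTop;
        do {
            oldTop = top.get();
            if (oldTop == null) {
                return null;
            }
        } while (!top.compareAndSet(oldTop, oldTop.next));
        return oldTop.value;
    }

    // Two threads push concurrently; the caller then counts the elements.
    static int pushFromTwoThreadsThenCount(int perThread) {
        CasStack<Integer> stack = new CasStack<>();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                stack.push(i);
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int count = 0;
        while (stack.pop() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(pushFromTwoThreadsThenCount(10_000));
    }
}
```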
AQS, the non-blocking data structures and the atomic variable classes (the classes in the java.util.concurrent.atomic package), which are the base classes of the concurrent package, are all implemented with this pattern, and the high-level classes in the concurrent package in turn depend on these base classes. In other words, the concurrent package is built in layers: volatile variable reads/writes and CAS at the bottom, the base classes above them, and the high-level classes on top.