First, an introduction to optimistic and pessimistic locking:
Pessimistic lock: always assume the worst case. Every time you read the data, you assume someone else will modify it, so you lock the data on every access, and anyone else who wants that data blocks until they can acquire the lock. Traditional relational databases use this kind of locking mechanism heavily, such as row locks, table locks, read locks, and write locks, all acquired before the operation. Java's synchronized keyword is likewise an implementation of pessimistic locking.
Optimistic lock: as the name implies, it is optimistic. Every time you read the data, you assume others will not modify it, so you do not lock; instead, at update time you check whether anyone else updated the data in the meantime, typically using a version number or similar mechanism. Optimistic locking suits read-heavy application types and can improve throughput; the write_condition mechanism provided by some databases is in fact an optimistic lock. In Java, the atomic variable classes under the java.util.concurrent.atomic package are implemented with an optimistic-locking technique: CAS.
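A minimal sketch of the version-number mechanism over JDBC; the table and column names (account, id, balance, version) are illustrative, not from any particular schema:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OptimisticUpdateDao {
    // Returns true if the update won; false means another transaction
    // changed the row since we read it (the version no longer matches),
    // so the caller should re-read and retry.
    public boolean updateBalance(Connection conn, long id,
                                 long newBalance, int expectedVersion) throws SQLException {
        String sql = "UPDATE account SET balance = ?, version = version + 1 "
                   + "WHERE id = ? AND version = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, newBalance);
            ps.setLong(2, id);
            ps.setInt(3, expectedVersion);
            return ps.executeUpdate() == 1;   // 0 rows => concurrent update detected
        }
    }
}
```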
An implementation of optimistic locking: CAS (compare-and-swap):
Problems with Locks:
Before JDK 1.5, Java guaranteed synchronization through the synchronized keyword, which uses a consistent locking protocol to coordinate access to shared state, ensuring that whichever thread holds the lock has exclusive access to the shared variables. This is an exclusive lock, and an exclusive lock is in fact a pessimistic lock, so synchronized can be said to be a pessimistic lock.
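For later contrast with the CAS-based classes, a minimal pessimistic counter using synchronized:

```java
// Pessimistic version of ++i: every thread must take the exclusive
// monitor lock, even when no other thread is competing.
public class SynchronizedCounter {
    private int value = 0;

    public synchronized int getAndIncrement() {
        return value++;   // the read-modify-write is safe only while holding the lock
    }

    public synchronized int get() {
        return value;
    }
}
```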
The pessimistic locking mechanism has the following problems:
- Under multi-threaded contention, acquiring and releasing locks causes a lot of context switching and scheduling delay, which hurts performance.
- A thread holding a lock causes every other thread that needs that lock to hang.
- If a high-priority thread has to wait for a lower-priority thread to release a lock, priority inversion occurs, which is a performance risk.
In contrast to these problems of pessimistic locking, a more effective kind of lock is the optimistic lock. An optimistic lock essentially means: complete each operation without locking, on the assumption that there is no concurrency conflict; if the operation fails because of a conflict, retry until it succeeds.
Optimistic Lock:
Optimistic locking has been described above; it is really an idea rather than a concrete lock. Relative to pessimistic locking, the optimistic assumption is that the data does not generally suffer concurrency conflicts, so conflicts are only detected when the data update is committed; if a conflict is found, an error is returned to the user, who decides what to do next.
The concept of optimistic locking above already spells out its implementation details. There are two main steps: conflict detection and data update. One of the more typical implementations is compare-and-swap (CAS).
CAS: CAS is an optimistic-locking technique. When multiple threads try to update the same variable simultaneously using CAS, only one of them succeeds in updating the variable's value; the others fail. A failed thread is not suspended, however; it is simply told that it lost the race and may try again.
A CAS operation involves three operands: the memory location to read and write (V), the expected original value for the comparison (A), and the new value to write (B). If the value at memory location V matches the expected value A, the processor atomically updates that location to the new value B; otherwise it does nothing. In either case, it returns the value that was at the location before the CAS instruction. (Some special CAS variants return only whether the CAS succeeded, not the current value.) CAS effectively says: "I think position V should contain the value A; if it does, put B in that position; otherwise, do not change it, just tell me what the value there is now." This is exactly the same principle as optimistic locking's conflict detection plus data update.
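This V/A/B semantics is visible directly in AtomicInteger's public API; a small demonstration:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasSemanticsDemo {
    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(10);   // position V currently holds 10

        // "I think V holds A = 10; if so, put B = 42 there."
        boolean swapped = v.compareAndSet(10, 42);
        System.out.println(swapped + " -> " + v.get());   // true -> 42

        // The expected value no longer matches, so the CAS fails and V is untouched.
        swapped = v.compareAndSet(10, 7);
        System.out.println(swapped + " -> " + v.get());   // false -> 42
    }
}
```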
To repeat: optimistic locking is an idea, and CAS is one way of realizing that idea.
Java's support for CAS: the java.util.concurrent (J.U.C) package added in JDK 1.5 is built on top of CAS. Compared with the blocking algorithm of synchronized, CAS is a common implementation technique for non-blocking algorithms, which is why J.U.C brought a great improvement in performance.
Take AtomicInteger in java.util.concurrent as an example of how thread safety is ensured without using locks. The main thing to understand is the getAndIncrement method, whose function is equivalent to the ++i operation.
In the absence of a lock mechanism, the value field relies on the volatile primitive to ensure that data is visible between threads, so getting the variable's value is a plain direct read. Then let's see how ++i is done: getAndIncrement performs a CAS operation in a loop, reading the data from memory each time and CASing it against that data plus one; if the CAS succeeds it returns the result, otherwise it retries until it succeeds. compareAndSet, in turn, uses JNI (Java Native Interface) to reach the underlying CPU instruction.
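A self-contained sketch of that spin-and-retry loop, rebuilt here on AtomicInteger's public compareAndSet; the real JDK 7-era getAndIncrement is essentially identical, with compareAndSet delegating to unsafe.compareAndSwapInt:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Rebuilding getAndIncrement from plain compareAndSet to expose the
// spin-and-retry structure described above.
public class GetAndIncrementDemo {
    private final AtomicInteger value = new AtomicInteger(0);

    public int getAndIncrement() {
        for (;;) {
            int current = value.get();          // volatile read: latest value
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return current;                  // CAS won: no other thread interfered
            }
            // CAS lost: another thread updated value in between; retry
        }
    }
}
```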
So how does compareAndSwapInt make these two steps, the comparison (value == expect) and the replacement (value = update), atomic? See the CAS principle below.
CAS principle: CAS is implemented by invoking JNI code; compareAndSwapInt is implemented in native code that invokes the CPU's underlying instruction. The following analysis takes a common CPU (Intel x86) as the example to explain how CAS is implemented. First, the compareAndSwapInt() method of the sun.misc.Unsafe class:
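On the Java side the method is only a native declaration (sketched from memory of the JDK 7/8 sources); its body lives in the JVM, where HotSpot ultimately implements it with an Atomic::cmpxchg call in C++:

```java
// In sun.misc.Unsafe (JDK 7/8): native, so the actual compare-and-swap
// runs in JVM/C++ code and ends up as a single CPU instruction
// (CMPXCHG on x86).
public final native boolean compareAndSwapInt(Object o, long offset,
                                              int expected, int x);
```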
Inside that native implementation, the program decides, based on the current processor type, whether to add the lock prefix to the CMPXCHG instruction. If the program is running on a multiprocessor, the lock prefix is added (lock cmpxchg); conversely, on a single processor the lock prefix is omitted (a single processor maintains sequential consistency within itself and does not need the memory-barrier effect the lock prefix provides).
Drawbacks of CAS:
- ABA problem: for example, thread one reads the value A from memory location V; meanwhile thread two also reads A, changes it to B, and then changes the value at V back to A. When thread one performs its CAS, it finds the memory still holds A, so its operation succeeds. Although thread one's CAS succeeded, there may be a hidden problem, as shown below:
Suppose a stack is implemented as a singly linked list with A on top, and thread T1 already knows that A.next is B and wants to use CAS to replace the top of the stack with B: head.compareAndSet(A, B). Before T1 executes that instruction, thread T2 intervenes: it pops A and B, then pushes D, C, and A back, leaving the stack as A -> C -> D, with object B now in a free state:
At this point thread T1 performs its CAS; the check finds the top of the stack is still A, so the CAS succeeds and the top of the stack becomes B. But B.next is in fact null, so the situation becomes:
Now only the element B is in the stack; the chain of C and D no longer hangs off the stack, and C and D have been silently discarded. Starting with Java 1.5, the JDK's atomic package provides the class AtomicStampedReference to solve the ABA problem. Its compareAndSet method first checks whether the current reference equals the expected reference and whether the current stamp equals the expected stamp; only if both are equal does it atomically set the reference and the stamp to the given update values.
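A minimal demonstration: after another thread performs an A -> B -> A history, the stamp has moved, so the stale CAS correctly fails:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        AtomicStampedReference<String> top =
                new AtomicStampedReference<>("A", 0);   // value "A", stamp 0

        int[] stampHolder = new int[1];
        String seen = top.get(stampHolder);             // reads "A", stamp 0
        int seenStamp = stampHolder[0];

        // Another thread performs A -> B -> A, bumping the stamp each time.
        top.compareAndSet("A", "B", 0, 1);
        top.compareAndSet("B", "A", 1, 2);

        // The value is "A" again, but the stamp moved from 0 to 2,
        // so this CAS fails instead of succeeding on stale state.
        boolean ok = top.compareAndSet(seen, "C", seenStamp, seenStamp + 1);
        System.out.println(ok);   // false
    }
}
```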
- High cost when the spin runs long:
A spinning CAS (looping until it succeeds) that stays unsuccessful for a long time imposes a very large execution cost on the CPU. If the JVM can use the pause instruction provided by the processor, efficiency improves somewhat. The pause instruction has two effects: first, it delays the pipelined instructions (de-pipelines) so that the CPU does not consume excessive execution resources; the delay depends on the specific implementation version, and on some processors it is zero. Second, it prevents the CPU pipeline from being flushed (CPU pipeline flush) when exiting the loop because of a memory-order violation, which improves CPU execution efficiency.
- Only atomic operations on a single shared variable can be guaranteed:
When operating on one shared variable, we can use a CAS loop to guarantee atomicity, but for multiple shared variables a CAS loop cannot make the combined operation atomic. At that point you can use a lock, or use the trick of merging several shared variables into a single shared variable to operate on. For example, with two shared variables i = 2 and j = a, merge them into ij = 2a and then use CAS to manipulate ij. Starting with Java 1.5 the JDK provides the AtomicReference class to guarantee atomicity between reference objects, so you can put multiple variables inside one object and perform CAS operations on it, as sketched below.
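A sketch of that merge trick with AtomicReference, using the i = 2, j = "a" pair from the text; the Pair class is illustrative:

```java
import java.util.concurrent.atomic.AtomicReference;

// CAS covers only one memory location, so we pack both fields into
// one immutable object and swap the whole object atomically.
public class PairUpdateDemo {
    static final class Pair {
        final int i;
        final String j;
        Pair(int i, String j) { this.i = i; this.j = j; }
    }

    private final AtomicReference<Pair> pair =
            new AtomicReference<>(new Pair(2, "a"));

    // Atomically replace (i, j) together; a change to either field
    // by another thread makes the CAS fail, and we retry.
    public void update(int newI, String newJ) {
        for (;;) {
            Pair current = pair.get();
            if (pair.compareAndSet(current, new Pair(newI, newJ))) {
                return;
            }
        }
    }
}
```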
Usage scenarios for CAS and synchronized:
1. With light resource contention (few thread conflicts), synchronized wastes extra CPU on thread blocking, wake-up switching, and user-mode/kernel-mode transitions, whereas CAS is based on a hardware implementation, needs no kernel entry and no thread switching, and spins only a few times, so it achieves higher performance.
2. With heavy resource contention (severe thread conflicts), the probability of a CAS spin is high, wasting more CPU resources, and its efficiency falls below that of synchronized.
Addendum: synchronized has been optimized since JDK 1.6. Its underlying implementation mainly relies on a lock-free queue; the basic idea is spin-then-block, and to keep competing for the lock after waking up, sacrificing a little fairness but gaining high throughput. With few thread conflicts it achieves performance similar to CAS, and under severe thread conflict its performance is far higher than CAS.
Implementation of the concurrent package:
Because CAS in Java has both volatile-read and volatile-write memory semantics, communication between Java threads now has the following four ways:
- Thread A writes a volatile variable, and then thread B reads that volatile variable.
- Thread A writes a volatile variable, and then thread B updates that volatile variable with CAS.
- Thread A updates a volatile variable with CAS, and then thread B updates that volatile variable with CAS.
- Thread A updates a volatile variable with CAS, and then thread B reads that volatile variable.
CAS in Java uses the efficient machine-level atomic instructions available on modern processors, which perform read-modify-write operations on memory atomically; this is the key to achieving synchronization on a multiprocessor. (Essentially, a machine that supports atomic read-modify-write instructions is an asynchronous equivalent of a sequential Turing machine, so any modern multiprocessor supports some atomic instruction that performs an atomic read-modify-write on memory.) At the same time, volatile reads/writes and CAS can implement communication between threads. Taken together, these features form the cornerstone of the entire concurrent package. If we carefully analyze the source code of the concurrent package, we find a generalized implementation pattern:
- First, declare the shared variable volatile;
- Then, use CAS's atomic conditional update to implement synchronization between threads;
- At the same time, use volatile reads/writes together with the volatile memory semantics of CAS to implement communication between threads (a minimal sketch of this pattern follows the list).
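One of the smallest complete instances of this pattern is a Treiber stack: its only shared state is a single volatile reference (inside AtomicReference), and all synchronization is a CAS retry loop:

```java
import java.util.concurrent.atomic.AtomicReference;

// Lock-free stack: one volatile reference updated purely by CAS.
public class ConcurrentStack<E> {
    private static final class Node<E> {
        final E item;
        Node<E> next;
        Node(E item) { this.item = item; }
    }

    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    public void push(E item) {
        Node<E> newHead = new Node<>(item);
        for (;;) {
            Node<E> oldHead = top.get();        // volatile read
            newHead.next = oldHead;
            if (top.compareAndSet(oldHead, newHead)) return;   // CAS update
        }
    }

    public E pop() {
        for (;;) {
            Node<E> oldHead = top.get();
            if (oldHead == null) return null;   // empty stack
            if (top.compareAndSet(oldHead, oldHead.next)) return oldHead.item;
        }
    }
}
```

Note that this is exactly the stack structure from the ABA discussion above; because each push allocates a fresh node and the garbage collector prevents node reuse while a reference is still held, this particular sketch does not hit the ABA problem.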
AQS, non-blocking data structures, and the atomic variable classes (the classes in the java.util.concurrent.atomic package), which are the base classes of the concurrent package, are all implemented with this pattern, and the high-level classes in the concurrent package in turn depend on these base classes. Overall, then, the concurrent package is layered: volatile reads/writes and CAS at the bottom, those base classes in the middle, and the high-level concurrency classes on top.
CAS in the JVM (allocation of objects on the heap):
A Java call to new Object() creates an object that is allocated on the JVM's heap. So how is this object stored in the heap?
First, by the time new Object() executes, how much space the object needs is already determined, because each data type in Java occupies a fixed amount of space (if the principle behind this is unclear to you, please google it). The next job, then, is to find a space in the heap to hold the object.
In a single-threaded scenario, there are generally two allocation strategies:
- Pointer collision: this generally applies when memory is perfectly regular (whether memory is regular depends on the memory-reclamation policy); allocating space is simply moving the pointer toward the free side of memory by a distance equal to the object's size.
- Free list: this applies when memory is irregular. In this case the JVM maintains a memory list that records which memory regions are free and how large they are; when allocating space for an object, it searches the free list for a suitable region and assigns it.
But the JVM cannot always run single-threaded, and doing so would be inefficient. Since allocating memory to an object is not an atomic operation, the steps of finding a spot in the free list, allocating the memory, and modifying the free list are not thread-safe on their own. There are likewise two strategies for resolving this concurrency safety problem:
- CAS: in practice the virtual machine uses CAS with retry-on-failure to make the update operation atomic, the same principle as above (see the toy sketch after this list).
- TLAB: since using CAS does have some impact on performance, the JVM offers a more advanced optimization strategy: each thread is given a small private chunk of the Java heap, called a Thread-Local Allocation Buffer (TLAB), and the thread allocates memory directly inside its TLAB, avoiding thread conflicts. Only when the buffer is used up and a new one must be allocated is a CAS operation performed to claim a larger memory region.
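A toy model of the pointer-collision strategy guarded by CAS with retry, written in plain Java; this mimics the idea only and is not HotSpot code (names and the offset-based "heap" are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy bump-the-pointer allocator: the free-space pointer is advanced
// by CAS, so two threads can never be handed the same region.
public class BumpAllocator {
    private final AtomicLong freePointer = new AtomicLong(0);
    private final long heapEnd;

    public BumpAllocator(long heapSize) { this.heapEnd = heapSize; }

    // Returns the start offset of the allocated block, or -1 if full.
    public long allocate(long size) {
        for (;;) {
            long current = freePointer.get();
            long next = current + size;
            if (next > heapEnd) return -1;             // out of space
            if (freePointer.compareAndSet(current, next)) {
                return current;                         // we own [current, next)
            }
            // another thread allocated first; retry with the new pointer
        }
    }
}
```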
Whether the virtual machine uses TLABs can be configured with the -XX:+/-UseTLAB parameter (TLAB is enabled by default in JDK 5 and later versions).