http://blog.csdn.net/chen77716/article/details/6618779
Java has two locking mechanisms: synchronized and Lock. The Lock interface and its implementation classes were added in JDK 5, and their author is the well-known concurrency expert Doug Lea. This article does not argue about whether synchronized or Lock is better; it only introduces how each is implemented.
Data synchronization relies on locks, so what does the lock itself rely on? The answer given by synchronized is that it relies on the JVM at the software level, while Lock relies on special CPU instructions at the hardware level. You may then ask: how does the JVM implement synchronized underneath?
The JVM discussed in this article is the 6u23 version of HotSpot. We start with the implementation of synchronized.
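For orientation, the three application-level uses of synchronized that the next paragraph describes might look like the minimal sketch below. The class name `SyncForms`, its fields, and the `run` helper are hypothetical names chosen for illustration; they do not come from the original article.

```java
import java.util.ArrayList;
import java.util.List;

class SyncForms {
    private static int staticCount = 0;          // guarded by SyncForms.class
    private int instanceCount = 0;               // guarded by this
    private int blockCount = 0;                  // guarded by guard
    private final Object guard = new Object();   // any non-null object can serve as the lock

    // Instance method: the lock is the object instance (this).
    synchronized void incInstance() { instanceCount++; }

    // Static method: the lock is the Class object, SyncForms.class.
    static synchronized void incStatic() { staticCount++; }

    // Block: the lock is whatever object is named in the parentheses.
    void incBlock() { synchronized (guard) { blockCount++; } }

    synchronized int instanceCount() { return instanceCount; }
    static synchronized int staticCount() { return staticCount; }
    int blockCount() { synchronized (guard) { return blockCount; } }

    // Hypothetical demo helper: each thread calls all three increments perThread times.
    static SyncForms run(int threads, int perThread) {
        SyncForms f = new SyncForms();
        List<Thread> ts = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    f.incInstance();
                    incStatic();
                    f.incBlock();
                }
            });
            ts.add(t);
            t.start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return f;
    }

    public static void main(String[] args) {
        SyncForms f = run(4, 10_000);
        System.out.println(f.instanceCount() + " " + staticCount() + " " + f.blockCount());
    }
}
```

Because the three counters are guarded by three different locks, the three methods never block one another; only calls that share the same lock object are serialized.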
The synchronized keyword is concise, clear, and semantically unambiguous, so it is still very widely used even though the Lock interface exists. At the application level, any non-null object can serve as the "lock". When synchronized is applied to an instance method, the lock is the object instance (this). When it is applied to a static method, the lock is the Class instance corresponding to the object; because class data lives in the permanent generation, a static method lock is effectively a global lock for that class. When synchronized is applied to an object instance, it locks the corresponding code block. In the HotSpot JVM implementation, the lock has a special name: the object monitor.

1. Thread states and state transitions
When multiple threads request an object monitor at the same time, the object monitor uses several states to distinguish the requesting threads:
- ContentionList: all threads that request the lock are first placed in this contention queue
- EntryList: threads in the ContentionList that qualify as candidates are moved to the EntryList
- WaitSet: threads that are blocked after calling the wait method are placed in the WaitSet
- OnDeck: at any time, at most one thread is competing for the lock; that thread is called OnDeck
- Owner: the thread that has acquired the lock is called the Owner
- !Owner: the thread that has released the lock
The state transitions work as follows. A thread that newly requests the lock is first added to the ContentionList. When the thread that holds the lock (the Owner) calls unlock and finds the EntryList empty, it moves threads from the ContentionList to the EntryList. The following sections describe how the ContentionList and the EntryList are implemented.

1.1 The ContentionList virtual queue
The ContentionList is not a real queue but a virtual one: it consists only of Node objects linked logically through their next pointers, and there is no separate queue data structure. The ContentionList is a last-in, first-out (LIFO) queue. Each new Node is added at the head of the queue: a CAS changes the head pointer to the new node, and the new node's next pointer is set to the node that follows it. Removal happens at the tail of the queue. Clearly, this structure is a lock-free queue.
Because only the Owner thread can take elements from the tail, the dequeue operation is uncontended, which also avoids the ABA problem of CAS.
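The head insertion via CAS described above can be sketched in Java with `AtomicReference`. This is only an illustration under stated assumptions, not HotSpot's code (which is written in C++): the class name `ContentionListSketch` and its methods are hypothetical, and the owner-side removal is simplified to detaching the whole chain at once and reading it oldest-first, rather than unlinking a single node from the tail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class ContentionListSketch<T> {
    // A node is just a value plus a next pointer; there is no separate queue object.
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    // Any thread may push: point the new node at the current head,
    // then CAS the head to the new node, retrying if another thread won the race.
    void push(T value) {
        Node<T> node = new Node<>(value);
        Node<T> current;
        do {
            current = head.get();
            node.next = current;
        } while (!head.compareAndSet(current, node));
    }

    // Assumed to be called by a single "owner" thread only.
    // Simplification: detach the whole chain atomically and return it oldest-first.
    List<T> drainOldestFirst() {
        Node<T> chain = head.getAndSet(null);
        List<T> out = new ArrayList<>();
        for (Node<T> n = chain; n != null; n = n.next) {
            out.add(n.value);          // newest-first while walking from the head
        }
        Collections.reverse(out);      // flip to oldest-first (tail end first)
        return out;
    }

    public static void main(String[] args) {
        ContentionListSketch<Integer> list = new ContentionListSketch<>();
        list.push(1);
        list.push(2);
        list.push(3);
        System.out.println(list.drainOldestFirst()); // prints [1, 2, 3]
    }
}
```

Since pushes only ever touch the head and only one thread removes nodes, no thread can observe a node being removed and re-inserted under it, which is why the ABA concern does not arise in this arrangement.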
1.2 EntryList

The EntryList and the ContentionList both logically belong to the waiting queue. The ContentionList is accessed concurrently by many threads, so the EntryList was introduced to reduce contention on the tail of the ContentionList. When the Owner thread unlocks, it migrates threads from the ContentionList to the EntryList and designates one thread in the EntryList (usually the head) as the Ready (OnDeck) thread. The Owner does not pass the lock to the OnDeck thread; it only hands over the right to compete for the lock, and the OnDeck thread must compete for the lock again. This sacrifices some fairness but greatly improves overall throughput. HotSpot calls this way of choosing OnDeck "competitive switching".

If the OnDeck thread obtains the lock, it becomes the Owner. If it fails, it stays in the EntryList, and for the sake of fairness its position in the EntryList does not change (it is still at the head). If the Owner thread is blocked by the wait method, it is moved to the WaitSet queue; if it is later woken by notify/notifyAll, it is moved to the EntryList again.

2. Spin locks

Threads in the ContentionList, EntryList, and WaitSet are all in a blocked state, and the blocking is done by the operating system (through the pthread_mutex_lock function on Linux). A blocked thread enters the kernel (Linux) scheduling state, which makes the system switch back and forth between user mode and kernel mode and seriously hurts lock performance. The way to alleviate this problem is spinning. The principle is: when contention occurs, if the Owner thread can release the lock within a short time, the competing threads can wait a little (spin); once the Owner releases the lock, a competing thread may acquire it immediately and thus avoid being blocked by the system.
However, the Owner may run longer than this threshold. If a competing thread still cannot acquire the lock after spinning for a while, it stops spinning and enters the blocked state (it backs off). The basic idea is to spin first and block only if spinning fails, minimizing the likelihood of blocking. This gives a very significant performance boost for code blocks with short execution times. A more fitting name for the spin lock is spin-exponential-backoff lock, also known as a compound lock. Clearly, spinning is meaningful only on multiprocessors.

Another question is what threads do while they spin. In fact they do nothing useful: they can run a few for loops or execute a few empty assembly instructions, and the only purpose is to keep the CPU while waiting for a chance to acquire the lock. So spinning is a double-edged sword: if the spin time is too long it hurts overall performance, and if it is too short it fails to delay blocking. Choosing the spin period is therefore very important, but it depends on the operating system, the hardware, the system load, and many other factors, so it is hard to choose. A poor choice not only fails to improve performance but may reduce it, which is why spin locks are generally considered not to scale.

As for the choice of spin period, HotSpot considers the best time to be the duration of one thread context switch, but it has not achieved this yet. From what I found, it currently just pauses for a few CPU cycles through assembly. Besides the choice of spin period, HotSpot also applies many other spin optimization strategies, as follows:
- If the average load is less than the number of CPUs, always spin
- If more than (CPUs/2) threads are already spinning, later threads block directly
- If a spinning thread discovers that the Owner has changed, it delays the spin time (spin count) or goes into blocking
- If the CPU is in power-saving mode, stop spinning
- The worst case for the spin time is the CPU's store latency (the time difference between CPU A storing a value and CPU B learning of that value)
- While spinning, differences between thread priorities are set aside to an appropriate degree
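The basic "spin first, block only if spinning fails" idea can be sketched as a small user-level lock. This is a hedged illustration, not HotSpot's monitor: the class name `SpinThenParkLock`, the fixed `SPIN_LIMIT` of 1000 iterations, and the `countWithLock` helper are all assumptions made for the example, and none of the adaptive strategies listed above are implemented. Blocking is done with `LockSupport.park` rather than the native mutex the article mentions.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class SpinThenParkLock {
    private static final int SPIN_LIMIT = 1000;  // hypothetical fixed spin budget
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final Queue<Thread> waiters = new ConcurrentLinkedQueue<>();

    void lock() {
        // Phase 1: bounded spin. Read first and CAS only when the lock looks free,
        // so failed attempts do not generate extra CAS traffic.
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (!locked.get() && locked.compareAndSet(false, true)) {
                return;
            }
        }
        // Phase 2: spinning failed, so enqueue and block until woken, then retry.
        Thread me = Thread.currentThread();
        waiters.add(me);
        while (!locked.compareAndSet(false, true)) {
            LockSupport.park(this);
        }
        waiters.remove(me);
    }

    void unlock() {
        locked.set(false);
        // Wake the head waiter; it must still compete for the lock again
        // and may lose to a thread that is currently spinning.
        Thread next = waiters.peek();
        if (next != null) {
            LockSupport.unpark(next);
        }
    }

    // Hypothetical demo helper: several threads increment a plain int guarded by the lock.
    static int countWithLock(int threads, int perThread) {
        SpinThenParkLock lock = new SpinThenParkLock();
        int[] counter = {0};
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try { counter[0]++; } finally { lock.unlock(); }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithLock(4, 10_000));
    }
}
```

Note that a woken waiter is not handed the lock; it only gets another chance to compete, so a spinning newcomer can overtake it. That trades fairness for throughput in the same spirit as the monitor described in this article.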
When does the synchronized implementation use spinning? The answer is when a thread is about to enter the ContentionList, that is, before the first step. A thread first spins in an attempt to obtain the lock, and only if that fails does it enter the waiting queue. This is slightly unfair to the threads already waiting in the queue. Another unfair aspect is that a spinning thread may grab the lock ahead of the Ready thread. Spin state is maintained per monitor object, one for each monitor object.

3. Biased locking

Biased locking was introduced in Java 6, mainly to improve lock performance when there is no contention. First let us look at what the problem with uncontended locks is. Almost all locks today are reentrant, meaning a thread that already holds the lock can lock and unlock the monitor object multiple times. Under the previous HotSpot design, every lock/unlock involves some CAS operations (such as the CAS on the waiting queue), and CAS operations introduce local latency. The idea of biased locking is that once a thread acquires the monitor object for the first time, the monitor object is "biased" toward that thread, and subsequent calls can avoid CAS operations. Put bluntly, it sets a flag; if the flag is found to be true, there is no need to go through the full lock/unlock process. But many concepts need explaining, and many of the problems this introduces need solving.

3.1 CAS and the SMP architecture

Why does CAS introduce local latency? To answer this we start from the SMP (symmetric multiprocessor) architecture. In SMP, all CPUs share one system bus, and main memory is reached through this bus. Each core has its own level-1 cache, and the cores are arranged symmetrically with respect to the bus, which is why this structure is called a "symmetric multiprocessor".
CAS stands for Compare-And-Swap. It is an atomic CPU instruction that lets the CPU compare and then atomically update the value at a memory location. On investigation, its implementation is based on assembly instructions of the hardware platform; in other words, CAS is implemented in hardware, the JVM merely wraps the assembly calls, and classes such as AtomicInteger use these wrapped interfaces.

Core1 and Core2 may both load the value at some main-memory location into their own L1 caches. When Core1 modifies that value in its L1 cache, it uses the bus to "invalidate" the corresponding value in Core2's L1 cache. Once Core2 finds that the value in its L1 cache is invalid (a cache miss), it loads the latest value for that address from memory over the bus. This back-and-forth communication over the bus is called "cache coherence traffic". Because the bus is designed with a fixed communication capacity, the bus becomes a bottleneck if cache coherence traffic is too heavy. When the values in Core1 and Core2 agree again, the caches are said to be coherent. At this level, the ultimate goal of lock design is to reduce cache coherence traffic. CAS happens to cause cache coherence traffic: if many threads share the same object, a successful CAS on one core inevitably triggers a bus storm. This is the so-called local latency, and biased locking is essentially about eliminating CAS to reduce cache coherence traffic.
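At the Java level, the compare-then-update behaviour is visible through `AtomicInteger.compareAndSet`. The sketch below shows the usual retry loop built on it; the class name `CasCounter` and the `runConcurrently` helper are hypothetical names used only for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Read the current value, compute the new one, and publish it only if
    // nobody changed the value in between; otherwise retry with a fresh read.
    int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // CAS failed: another thread updated the value first, so loop again.
        }
    }

    int get() { return value.get(); }

    // Hypothetical demo helper: several threads increment the same counter.
    static int runConcurrently(int threads, int perThread) {
        CasCounter counter = new CasCounter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 10_000));
    }
}
```

Every thread in this example hammers the same memory location, which is exactly the sharing pattern the paragraph above describes as generating cache coherence traffic.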
Cache coherence: The cache coherence mentioned above is backed by a protocol. The protocol in common use today is MESI (first supported by Intel); for details see http://en.wikipedia.org/wiki/MESI_protocol. I will explain this part in detail in a later article.
Exceptions to cache coherence traffic: In fact, not every CAS causes a bus storm; this depends on the cache coherence protocol. For details see http://blogs.oracle.com/dave/entry/biased_locking_in_hotspot
NUMA (Non-Uniform Memory Access) architecture: In contrast to SMP there is also the non-uniform multiprocessor architecture, which is currently used mainly on some high-end processors. Its main feature is that there is no single shared bus and no single shared main memory; each node has its own local memory. This structure is not discussed here.

3.2 Bias revocation

An important problem introduced by biased locking is that, under contention, if another thread competes for a biased object, the owner has to release the bias. The release process adds some performance overhead, but overall the benefit of biased locking still outweighs the cost of CAS.

4. Summary

Regarding locks, the JVM has also introduced some other techniques, such as lock inflation. Compared with spin locks and biased locks, their impact is not very large, so they are not covered here. As the discussion above shows, synchronized is built mainly on a lock-free queue. Its basic idea is to spin first and then block, and to have threads keep competing for the lock after a competitive switch. It sacrifices a little fairness but achieves high throughput. The next article continues with Lock in the JVM (Deep JVM Lock Mechanism 2: Lock).
Deep JVM lock Mechanism 1-synchronized