In-depth JVM lock mechanisms (1): synchronized

Author: chen77716
Document directory
  • 1. Thread States and State Transitions
  • 2. Spin Locks
  • 3. Biased Locks
  • 4. Summary

Java currently has two lock mechanisms: synchronized and Lock. The Lock interface and its implementation classes were added in JDK 5; their author is Doug Lea, the well-known concurrency expert. This article does not compare synchronized with Lock; it only introduces their implementation principles.

Data synchronization depends on locks, but what does lock synchronization itself depend on? synchronized gives one answer: the JVM, at the software level. Lock gives another: special CPU instructions, at the hardware level. You may then ask further: how does the JVM implement synchronized under the hood?

The JVM discussed in this article is HotSpot, version 6u23. The implementation of synchronized is described below.

synchronized is a simple keyword with clear semantics, so it remains widely used even with the Lock interface available. At the application level, the semantics are: any non-null object can serve as a "lock". When synchronized is applied to an instance method, the lock is the object instance (this); when applied to a static method, the lock is the Class object of that class, and because class data lives in the permanent generation, a static-method lock is effectively a global lock for the class. When synchronized is applied to a code block, the lock is the object instance named in the parentheses. In the HotSpot JVM implementation, the lock has a special name: the object monitor.
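To make the three lock targets concrete, here is a minimal sketch (the class and field names are made up for illustration):

    public class Counter {
        private static int classWide;
        private int perInstance;

        // Lock is the Counter instance (this).
        public synchronized void incrementInstance() {
            perInstance++;
        }

        // Lock is the Class object Counter.class, shared by all instances.
        public static synchronized void incrementClassWide() {
            classWide++;
        }

        // Lock is whatever non-null object appears in the parentheses.
        private final Object guard = new Object();
        public void incrementWithBlock() {
            synchronized (guard) {
                perInstance++;
            }
        }
    }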

1. Thread States and State Transitions

When multiple threads request an object's monitor at the same time, the monitor uses several states to distinguish the requesting threads:

  • Contention List: all threads requesting the lock are first placed in this contention queue.
  • Entry List: threads from the Contention List that qualify as candidates are moved to the Entry List.
  • Wait Set: threads blocked by calling the wait method are placed in the Wait Set (a minimal example follows this list).
  • OnDeck: at most one thread can be competing for the lock at any given time; that thread is called OnDeck.
  • Owner: the thread that currently holds the lock.
  • !Owner: a thread that has released the lock.
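To make the Wait Set concrete, here is a minimal wait/notify example (the class is made up for illustration): the taking thread enters the Wait Set when it calls wait, and once notified it moves back to the Entry List to recompete for the monitor.

    class Mailbox {
        private String message;

        // The caller releases the monitor and enters the wait set.
        public synchronized String take() throws InterruptedException {
            while (message == null) {
                wait();               // parked in the wait set until notified
            }
            String m = message;
            message = null;
            return m;
        }

        // Wakes waiters, moving them from the wait set back to the entry list.
        public synchronized void put(String m) {
            message = m;
            notifyAll();
        }
    }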
The state transitions work as follows: a thread newly requesting the lock is first added to the Contention List. When the thread holding the lock (the Owner) calls unlock and finds the Entry List empty, it moves threads from the Contention List to the Entry List. The implementations of the Contention List and Entry List are described below.

1.1 The Contention List: a Virtual Queue

The Contention List is not a true queue but a virtual one: it consists of nodes linked by next pointers, with no separate queue data structure. It is a last-in-first-out (LIFO) queue: a newly added node is always placed at the head. A CAS operation swings the head pointer to the new node while the new node's next pointer is set to the former head; dequeue operations take place at the tail. This structure is, in effect, a lock-free queue.
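The head-insertion CAS just described can be sketched in Java with an AtomicReference (an illustration of the technique, not HotSpot's actual C++ code):

    import java.util.concurrent.atomic.AtomicReference;

    class ContentionStack<T> {
        private static final class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }

        private final AtomicReference<Node<T>> head = new AtomicReference<>();

        // Any thread may enqueue at the head with a CAS retry loop;
        // dequeuing happens at the tail and is done by the Owner only.
        void push(T value) {
            Node<T> node = new Node<>(value);
            Node<T> oldHead;
            do {
                oldHead = head.get();
                node.next = oldHead;  // new node points at the former head
            } while (!head.compareAndSet(oldHead, node)); // retry if another thread won
        }
    }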

Because only the Owner thread can take elements from the tail of the queue, dequeue operations are free of contention, which also avoids the ABA problem of CAS.

1.2 The Entry List

The Entry List and the Contention List logically belong to the same waiting queue. The Contention List is accessed by threads concurrently; to reduce contention on its tail, the Entry List was introduced. During unlock, the Owner thread migrates threads from the Contention List to the Entry List and designates one of them (usually the head) as the ready, or OnDeck, thread. The Owner does not pass the lock to the OnDeck thread directly; it merely grants OnDeck the right to compete for the lock, and OnDeck must still win that competition. Although this sacrifices some fairness, it greatly improves overall throughput; in HotSpot, this choice is called "competitive handoff".

Once the OnDeck thread acquires the lock, it becomes the Owner thread. If it fails to acquire the lock, it remains in the Entry List, and for fairness its position there does not change (it stays at the head). If the Owner thread is blocked by the wait method, it is moved to the Wait Set; if it is later woken by notify/notifyAll, it is moved back to the Entry List.

2. Spin Locks

Threads in the Contention List, Entry List, and Wait Set are all in the blocked state, and the blocking is performed by the operating system (on Linux, via the pthread_mutex_lock function). Once blocked, a thread enters kernel scheduling, which forces the system to switch back and forth between user mode and kernel mode and seriously hurts lock performance.

The way to mitigate this is spinning. The principle: if the Owner can release the lock within a very short time, the competing threads can wait a little (spin); after the Owner releases the lock, a spinning thread may acquire it at once, avoiding a system-level block. Of course, the Owner may run past that threshold; a contending thread that has spun for a while without obtaining the lock then stops spinning and falls back to blocking. The basic idea is: spin first, block on failure, reducing the chance of blocking as much as possible; this matters a great deal for code blocks with short execution times. A sketch of this spin-then-block pattern follows below. Such a spin lock has a more fitting name: a spin-exponential-backoff lock, that is, a composite lock. Obviously, spinning only makes sense on a multiprocessor.

What does a thread actually do while spinning? Nothing useful: it may run a few empty loops or a few no-op assembly instructions, occupying the CPU while waiting for a chance to take the lock. Spinning is therefore a double-edged sword: spin too long and overall performance suffers; spin too briefly and the deferred-blocking effect is lost. The choice of spin duration clearly matters, but it depends on the operating system, the hardware, the system load, and many other factors; it is hard to pick well, and a poor choice can degrade performance rather than improve it, which is why spin locks are generally considered non-scalable. As for the spin duration, HotSpot's view is that the ideal value would be the time of one thread context switch, but it has not implemented that; by inspection, it simply pauses for a few CPU cycles using assembly.
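A minimal sketch of the spin-then-block idea, built from AtomicBoolean and LockSupport (the names, structure, and spin budget are illustrative; HotSpot implements the real thing in C++ inside the monitor code):

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.locks.LockSupport;

    class SpinThenBlockLock {
        private static final int SPIN_LIMIT = 64;  // made-up spin budget
        private final AtomicBoolean locked = new AtomicBoolean(false);
        private final ConcurrentLinkedQueue<Thread> waiters = new ConcurrentLinkedQueue<>();

        void lock() {
            // Phase 1: spin briefly, betting that the Owner releases the lock soon.
            for (int i = 0; i < SPIN_LIMIT; i++) {
                if (locked.compareAndSet(false, true)) {
                    return;                        // won the lock while spinning
                }
                Thread.onSpinWait();               // CPU pause hint (JDK 9+)
            }
            // Phase 2: give up spinning and block until a release wakes us.
            Thread self = Thread.currentThread();
            waiters.add(self);
            while (!locked.compareAndSet(false, true)) {
                LockSupport.park(this);
            }
            waiters.remove(self);
        }

        void unlock() {
            locked.set(false);
            Thread next = waiters.peek();          // the "OnDeck" candidate
            if (next != null) {
                LockSupport.unpark(next);          // grant the right to compete, not the lock itself
            }
        }
    }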
In addition to the choice of spin duration, HotSpot implements a number of other spin optimization policies, listed below:
  • If the average system load is lower than the number of CPUs, keep spinning.
  • If more than (number of CPUs / 2) threads are already spinning, block the thread directly.
  • If a spinning thread finds that the Owner has changed, defer the spin (adjust the spin count) or block the thread (a toy sketch of this adaptive feedback follows the list).
  • If the CPU is in power-saving mode, stop spinning.
  • The worst case for the spin duration is the CPU's store latency (the time between CPU A storing a value and CPU B observing it).
  • Differences in thread priority are ignored while spinning.
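Such adaptive adjustment can be pictured as a per-lock spin budget that grows on success and shrinks on failure (a toy sketch, not HotSpot's actual policy; all names and numbers here are made up):

    import java.util.concurrent.atomic.AtomicBoolean;

    class AdaptiveSpinLock {
        private final AtomicBoolean locked = new AtomicBoolean(false);
        private volatile int spinBudget = 100;     // made-up starting budget

        // Returns true if the lock was won by spinning; false means "go block".
        boolean tryLockWithSpin() {
            for (int i = 0; i < spinBudget; i++) {
                if (locked.compareAndSet(false, true)) {
                    spinBudget = Math.min(spinBudget * 2, 10_000); // spinning paid off: spin longer next time
                    return true;
                }
                Thread.onSpinWait();
            }
            spinBudget = Math.max(spinBudget / 2, 1);              // spinning failed: shrink the budget
            return false;
        }

        void unlock() {
            locked.set(false);
        }
    }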
When does synchronized use the spin lock? The answer: when a thread is about to enter the Contention List, that is, before the first step of the queuing procedure described above. A thread first tries to acquire the lock by spinning as it enters the waiting queue; only if that fails does it actually join the queue. This is slightly unfair to the threads already waiting in the queue; another unfairness is that a spinning thread may seize the lock from the ready (OnDeck) thread. The spin lock is maintained per monitored object: each monitored object has one.

3. Biased Locks

Biased locking was introduced in JVM 1.6. It mainly addresses the performance of uncontended locks. First, consider the problem with uncontended locks: almost all locks are reentrant, that is, a thread that holds the lock may lock and unlock the monitored object multiple times. Under the earlier HotSpot design, every lock/unlock involved some CAS operations (such as the CAS on the waiting queue), and CAS operations incur local latency. Hence the idea of biased locking: once a thread acquires the monitored object for the first time, the object becomes "biased" toward that thread, and subsequent lock/unlock calls by the same thread can avoid the CAS. Bluntly put, a flag is set; if it is true, the various lock/unlock rituals are skipped. But several concepts still need explaining, and several problems this introduces still need solving.

3.1 CAS and the SMP Architecture

Why does CAS introduce local latency? Start with the SMP (symmetric multiprocessing) architecture, which can be described roughly as follows: all CPUs share a single system bus and connect to main memory through it; each core has its own L1 cache, and the cores are distributed symmetrically with respect to the bus, hence the name "symmetric multiprocessor".

CAS stands for compare-and-swap. It is an atomic CPU instruction that lets the CPU compare and then update the value at a location atomically. It is implemented by assembly instructions of the hardware platform, that is, CAS is provided by the hardware; the JVM merely wraps the assembly call, and classes such as AtomicInteger use those wrappers.
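As an illustration, a CAS retry loop using the JDK's AtomicInteger (the class here is a made-up example of how the wrapped hardware CAS is typically used):

    import java.util.concurrent.atomic.AtomicInteger;

    class CasCounter {
        private final AtomicInteger value = new AtomicInteger(0);

        int increment() {
            int current;
            do {
                current = value.get();
                // compareAndSet maps to the platform's atomic CAS instruction
                // (e.g. LOCK CMPXCHG on x86): it succeeds only if no other
                // core changed the value in the meantime.
            } while (!value.compareAndSet(current, current + 1));
            return current + 1;
        }
    }

(In practice one would simply call incrementAndGet; the explicit loop just makes the compare-and-swap retry visible.)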
Core1 and Core2 may both have loaded the value at some location in main memory into their L1 caches. When Core1 modifies that value in its own L1 cache, it signals over the bus to invalidate the corresponding line in Core2's L1 cache; once Core2 finds its cached value invalid (a cache miss), it loads the latest value for that address from memory over the bus. This back-and-forth communication over the bus is called "cache coherency traffic". Because the bus is designed with a fixed communication capacity, excessive coherency traffic turns the bus into a bottleneck. When the values in Core1 and Core2 agree again, that is called "cache coherence", and at this level the ultimate goal of lock design is to reduce coherency traffic.

CAS generates coherency traffic: if many threads share the same object, every successful CAS on one core inevitably invalidates the corresponding cache lines on the other cores, causing a bus storm. This is the local latency mentioned above. In essence, the biased lock exists to eliminate the CAS and thereby reduce coherency traffic.

Cache coherence: as mentioned above, cache coherence is maintained by a protocol. The common protocol today is MESI (first supported by Intel); for details see http://en.wikipedia.org/wiki/mesi_protocol.

Exceptions to coherency traffic: in fact, not every CAS causes a bus storm; this depends on the cache coherence protocol. For details see http://blogs.oracle.com/dave/entry/biased_locking_in_hotspot.

The NUMA (Non-Uniform Memory Access) architecture: corresponding to SMP, there is also a non-uniform multiprocessor architecture, currently used mainly in some high-end processors. Its main characteristics are that there is no shared bus and no common main memory; each core has its own memory. We do not discuss this structure here.

3.2 A Problem Introduced by Biased Locking

In a scenario with real contention, if an object biased toward one thread is then locked by another thread, the owner must revoke the bias, and the revocation brings some performance overhead. In general, though, the benefit of biasing outweighs the cost of the CAS operations it avoids.

4. Summary

Besides the techniques above, the JVM introduces others, such as lock inflation; they have little bearing on spin locks and biased locks, so we do not cover them here. From the discussion above, we can see that the underlying implementation of synchronized relies mainly on a lock-free queue; the basic strategy is spin-then-block, with competitive handoff deciding who competes for the lock next. It slightly sacrifices fairness but achieves high throughput. The next article continues with Lock (In-depth JVM locks (2): Lock).
