How the Java synchronized keyword is implemented

Last Update:2015-05-20 Source: Internet

Author: User

Data synchronization relies on locks, but what does the lock itself rely on? For synchronized, the answer is the JVM at the software level; for the Lock interface, the answer is special CPU instructions at the hardware level. That raises a further question: how does the JVM implement synchronized under the hood?

The JVM referred to in this article is HotSpot version 6u23. We begin with how synchronized is implemented:

The synchronized keyword is concise and has clear semantics, so it remains widely used even though the Lock interface exists. At the application level, any non-null object can serve as a "lock". When synchronized is applied to an instance method, the lock is the object instance (this). When it is applied to a static method, the lock is the Class instance corresponding to the object; because class data lives in the permanent generation, a static method lock is effectively a global lock for that class. When synchronized is applied to an object instance in a synchronized block, that object is the lock and the corresponding code block is guarded by it. In the HotSpot JVM implementation, the lock has a special name: object monitor.
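The three forms can be sketched as follows. This is a minimal illustration, not code from the original article: the class name SyncForms and the field guard are made up for the example, and Thread.holdsLock is used only to show which object's monitor is held in each case.

```java
final class SyncForms {
    private final Object guard = new Object();

    // Instance method: the monitor is the receiver object (this).
    synchronized boolean instanceMethodLocksThis() {
        return Thread.holdsLock(this);
    }

    // Static method: the monitor is the Class object, SyncForms.class.
    static synchronized boolean staticMethodLocksClass() {
        return Thread.holdsLock(SyncForms.class);
    }

    // Block: the monitor is whatever non-null object is named in parentheses.
    boolean blockLocksGuard() {
        synchronized (guard) {
            // Holds guard's monitor; does not take this object's monitor.
            return Thread.holdsLock(guard) && !Thread.holdsLock(this);
        }
    }

    public static void main(String[] args) {
        SyncForms s = new SyncForms();
        System.out.println(s.instanceMethodLocksThis());
        System.out.println(SyncForms.staticMethodLocksClass());
        System.out.println(s.blockLocksGuard());
    }
}
```

Because the static form locks the Class object, all static synchronized methods of one class exclude each other, which is why the text calls it a global lock for that class.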

1. Thread states and state transitions

When multiple threads request an object monitor at the same time, the object monitor uses several states to distinguish the requesting threads:

ContentionList: all threads requesting the lock are first placed in this contention queue

EntryList: threads from the ContentionList that qualify as candidates are moved to the EntryList

WaitSet: threads that block by calling the wait method are placed in the WaitSet

OnDeck: at any moment at most one thread is actively competing for the lock; that thread is called OnDeck

Owner: the thread that has acquired the lock is called the Owner

!Owner: the thread that has released the lock

The state transitions work as follows:

A thread newly requesting the lock is first added to the ContentionList. When the thread holding the lock (in the Owner state) calls unlock and finds the EntryList empty, it moves threads from the ContentionList to the EntryList. The following describes how the ContentionList and EntryList are implemented:

1.1 ContentionList virtual queue

The ContentionList is not a real queue but a virtual one: it consists only of Node objects logically linked through their next pointers, and there is no dedicated queue data structure. The ContentionList is a last-in, first-out (LIFO) queue. Each new Node is added at the head: a CAS swings the head pointer to the new node, while the new node's next pointer is set to the previous head. Removal happens at the tail. The structure is, in effect, a lock-free queue.

Because only the Owner thread can remove elements from the tail of the queue, dequeue operations are uncontended, which also avoids the ABA problem of CAS.
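The head insertion described above can be sketched in Java with an AtomicReference. This is only an illustration under stated assumptions, not HotSpot's code (which is written in C++): the class and method names are hypothetical, waiters are represented by strings rather than threads, and the owner-side removal is simplified to detaching the whole chain in one atomic step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class ContentionListSketch {
    private static final class Node {
        final String waiter;
        Node next;
        Node(String waiter) { this.waiter = waiter; }
    }

    // The only shared state: a pointer to the most recently pushed node.
    private final AtomicReference<Node> head = new AtomicReference<>();

    // Any thread may call this; contention is resolved by retrying the CAS.
    void push(String waiter) {
        Node node = new Node(waiter);
        Node oldHead;
        do {
            oldHead = head.get();
            node.next = oldHead;            // link the new node to the old head
        } while (!head.compareAndSet(oldHead, node));
    }

    // Assumed to be called only by the lock owner, so it never races with
    // another remover. Returns the waiters newest first (LIFO order).
    List<String> detachAll() {
        List<String> newestFirst = new ArrayList<>();
        for (Node n = head.getAndSet(null); n != null; n = n.next) {
            newestFirst.add(n.waiter);
        }
        return newestFirst;
    }
}
```

Since pushes only ever replace the head and a single thread removes nodes, no remover can observe a node that was popped and pushed back in between, which is the situation that causes ABA.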

1.2 EntryList

Both the EntryList and the ContentionList are logically waiting queues. The ContentionList is accessed concurrently by many threads, so the EntryList was introduced to reduce contention on the tail of the ContentionList. When the Owner thread unlocks, it migrates threads from the ContentionList to the EntryList and designates one thread in the EntryList (typically the head) as the ready (OnDeck) thread. The Owner does not hand the lock itself to the OnDeck thread; it only hands over the right to compete for the lock, and the OnDeck thread must compete for the lock again. This sacrifices some fairness but greatly improves overall throughput. HotSpot calls this way of choosing OnDeck "competitive handoff".

If the OnDeck thread obtains the lock, it becomes the Owner. If it fails, it remains in the EntryList, and for the sake of fairness its position in the EntryList does not change (it stays at the head). If the Owner thread blocks in the wait method, it is moved to the WaitSet; when it is later woken by notify/notifyAll, it is moved to the EntryList again.
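From the application side, the WaitSet path looks like the following sketch. The class WaitSetDemo and its methods are hypothetical names for illustration; the queue movements themselves happen inside the JVM and are not visible in Java code.

```java
final class WaitSetDemo {
    private final Object monitor = new Object();
    private boolean ready;              // guarded by monitor

    // The caller owns the monitor, then gives it up inside wait() and is
    // parked until another thread notifies it.
    void awaitReady() throws InterruptedException {
        synchronized (monitor) {
            while (!ready) {            // loop guards against spurious wakeups
                monitor.wait();
            }
        }
    }

    // Wakes the waiters; each must re-acquire the monitor before returning
    // from wait().
    void signal() {
        synchronized (monitor) {
            ready = true;
            monitor.notifyAll();
        }
    }

    // Starts one waiter, signals it, and reports whether it finished.
    static boolean runOnce() {
        WaitSetDemo demo = new WaitSetDemo();
        Thread waiter = new Thread(() -> {
            try {
                demo.awaitReady();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        demo.signal();
        try {
            waiter.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !waiter.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

A notified thread does not resume immediately: it has to compete for the monitor again, which matches the move back to the EntryList described above.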

2. Spin lock

Threads in the ContentionList, EntryList, and WaitSet are all blocked, and blocking is carried out by the operating system (the pthread_mutex_lock function on Linux). Once a thread is blocked it enters kernel scheduling (on Linux), so the system switches back and forth between user mode and kernel mode, which seriously hurts lock performance.

The remedy is spinning. The principle is this: when contention occurs, if the Owner thread can release the lock within a short time, the competing threads can wait briefly (spin); once the Owner releases the lock, a competing thread may acquire it immediately, avoiding a blocking system call. However, the Owner may run longer than the threshold, so a competing thread that still cannot acquire the lock after spinning for a while stops spinning and blocks (backs off). The basic idea is to spin first and block only if spinning fails, minimizing the likelihood of blocking; this gives a significant performance boost for code blocks with short execution times. The spin lock has a more fitting name: spin-exponential-backoff lock, also known as a compound lock. Clearly, spinning is only meaningful on multiprocessors.

Another question: what does a thread do while it spins? Essentially nothing. It can run a few for loops or execute a few empty assembly instructions; the purpose is to keep hold of the CPU while waiting for a chance to acquire the lock. Spinning is therefore a double-edged sword: if the spin lasts too long it hurts overall performance, and if it is too short it fails to defer blocking. Choosing the spin period is clearly important, but it depends on the operating system, the hardware, the system load, and many other factors, so it is hard to choose well; a poor choice may lower performance rather than improve it. For this reason spin locks are generally considered not scalable.
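The spin-then-block idea can be sketched as a small user-level lock. This is an illustration under assumptions, not how HotSpot implements monitors: the class name, SPIN_LIMIT, and PARK_NANOS are invented for the example, the lock is not reentrant, and the blocking phase uses a timed park instead of a proper waiter queue.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

final class SpinThenParkLock {
    private static final int SPIN_LIMIT = 1_000;     // assumed value
    private static final long PARK_NANOS = 50_000L;  // assumed back-off interval

    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        // Phase 1: spin, hoping the owner releases the lock soon.
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (!held.get() && held.compareAndSet(false, true)) {
                return;
            }
            Thread.onSpinWait();        // busy-wait hint to the CPU (Java 9+)
        }
        // Phase 2: spinning failed, so give up the CPU between attempts.
        while (!held.compareAndSet(false, true)) {
            LockSupport.parkNanos(this, PARK_NANOS);
        }
    }

    void unlock() {
        held.set(false);
    }

    // Runs several workers that each increment a shared counter under the lock.
    static int contendedCount(int threads, int perThread) {
        SpinThenParkLock lock = new SpinThenParkLock();
        int[] counter = {0};            // guarded by lock
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

With a short critical section such as counter[0]++, most acquisitions should succeed during the spin phase; the park phase only matters when the owner holds the lock for longer than the spin budget.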

Spin optimization strategy

For the spin period, HotSpot considers the ideal to be the duration of one thread context switch, but it does not achieve this at present. Investigation shows that it currently just pauses for a few CPU cycles via assembly. Besides the choice of spin period, HotSpot applies many other spin optimization strategies, as follows:

If the average load is less than the number of CPUs, keep spinning

If more than (CPUs/2) threads are already spinning, later threads block directly

If a spinning thread finds that the Owner has changed, it delays the spin time (spin count) or enters blocking

Stop spinning if the CPU is in power-saving mode

The worst-case spin time is the CPU's memory latency (the delay between CPU A storing a value and CPU B learning of it)

Differences in thread priority are appropriately disregarded while spinning

When does synchronized use the spin lock? When a thread is about to enter the ContentionList, that is, before the first step. Before entering the waiting queue, a thread first spins trying to acquire the lock, and it enters the waiting queue only if that fails. This is slightly unfair to threads already waiting in the queue. Another source of unfairness is that a spinning thread may grab the lock ahead of the ready (OnDeck) thread. Spin locks are maintained per monitor object: each monitor object has its own.
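Three of the rules above reduce to a yes/no decision and can be written as a single predicate. This is a hypothetical sketch: the class, method, and parameter names are invented, and it encodes only what the list states, not HotSpot's actual heuristics.

```java
final class SpinPolicySketch {
    // Returns true if a contending thread should spin rather than block.
    static boolean shouldSpin(double loadAverage, int cpus,
                              int spinningThreads, boolean powerSaving) {
        if (powerSaving) {
            return false;               // stop spinning in power-saving mode
        }
        if (spinningThreads > cpus / 2) {
            return false;               // too many spinners already: block directly
        }
        return loadAverage < cpus;      // spin only while the machine is not saturated
    }
}
```

In a Java program the inputs could come from Runtime.getRuntime().availableProcessors() and OperatingSystemMXBean.getSystemLoadAverage(); the JVM itself obtains such values natively.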

3. JVM 1.6 biased locking

JVM 1.6 introduced biased locking, mainly to improve lock performance when there is no contention. First, let us look at what the problem is with uncontended locking:

Almost all locks today are reentrant: a thread that already holds the lock can lock/unlock the monitor object multiple times. Under the earlier HotSpot design, each lock/unlock involves some CAS operations (such as the CAS on the waiting queue), and CAS operations introduce local latency. The idea of biased locking is therefore that once a thread first acquires the monitor object, the monitor becomes "biased" toward that thread, so that later acquisitions can avoid CAS. Put plainly, it sets a variable, and if the variable is found to be true, none of the lock/unlock procedures are needed. Many concepts need explaining here, and many of the problems this introduces need to be addressed:
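The "set a variable and check it" idea can be illustrated with a toy class. This is a deliberately simplified sketch, not HotSpot's mechanism (which stores the bias in the object header): BiasSketch and its members are hypothetical, the counter exists only to make the CAS attempts visible, and a false return stands for the case where the JVM would have to revoke the bias.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

final class BiasSketch {
    private final AtomicReference<Thread> biasedTo = new AtomicReference<>();
    private final AtomicInteger casAttempts = new AtomicInteger(); // instrumentation only

    // Returns true if the calling thread may proceed on the biased path.
    boolean enter() {
        Thread me = Thread.currentThread();
        if (biasedTo.get() == me) {
            return true;                // fast path: a read and a compare, no CAS
        }
        casAttempts.incrementAndGet();
        // First acquisition installs the bias; fails if another thread owns it.
        return biasedTo.compareAndSet(null, me);
    }

    int casAttempts() {
        return casAttempts.get();
    }
}
```

A thread that enters repeatedly pays for one CAS on its first call and none afterwards, which is the saving biased locking is after.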

3.1 CAS and SMP architecture

Why does CAS introduce local latency? The explanation starts with the SMP (symmetric multiprocessor) architecture, whose structure is roughly as follows:

All CPUs share one system bus and are connected to main memory through it. Each core has its own L1 cache, and the cores are arranged symmetrically relative to the bus, hence the name "symmetric multiprocessor".

CAS stands for compare-and-swap. It is an atomic CPU instruction that lets the CPU update the value at a memory location atomically after comparing it with an expected value. Investigation shows that it is implemented with assembly instructions of the hardware platform; that is, CAS is implemented in hardware, the JVM merely wraps the assembly calls, and classes such as AtomicInteger use these wrapped interfaces.

Core1 and Core2 may both load the value at some main-memory location into their own L1 caches. When Core1 modifies the value in its L1 cache, it "invalidates" the copy in Core2's L1 cache via the bus. Once Core2 finds the value in its L1 cache invalid (a cache miss), it loads the current value of that address from memory over the bus. This back-and-forth communication over the bus is called "cache coherence traffic". Because the bus is designed with a fixed communication capacity, it becomes a bottleneck if cache coherence traffic is too heavy. When the values in Core1 and Core2 agree again, this is called "cache coherence". At this level, the ultimate goal of lock design is to reduce cache coherence traffic.

CAS happens to cause cache coherence traffic. If many threads share the same object, a successful CAS on one core inevitably triggers a bus storm; this is what is meant by local latency. In essence, biased locking exists to eliminate CAS and reduce cache coherence traffic.
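At the Java level, the wrapped instruction is reachable through AtomicInteger.compareAndSet. The sketch below shows the usual retry loop built on it; the class CasCounter is a hypothetical example, not part of the JDK.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // CAS retry loop: read, compute, and publish only if nobody interfered.
    int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // Another thread changed the value first; retry with the fresh value.
        }
    }

    int get() {
        return value.get();
    }

    public static void main(String[] args) {
        AtomicInteger x = new AtomicInteger(5);
        System.out.println(x.compareAndSet(5, 6)); // true: expected value matched
        System.out.println(x.compareAndSet(5, 7)); // false: the value is now 6
        System.out.println(x.get());               // 6
    }
}
```

Each failed compareAndSet costs another round of reading and retrying, so heavy contention on one location translates directly into repeated traffic on that cache line.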

Cache coherence:

The cache coherence mentioned above is supported by a protocol. The common protocol today is MESI (first supported by Intel); for details see Http://en.wikipedia.org/wiki/MESI_protocol. This part will be explained in detail later.

Exceptions to cache coherence traffic:

In fact, not every CAS causes a bus storm; this depends on the cache coherence protocol. For details see: Http://blogs.oracle.com/dave/entry/biased_locking_in_hotspot

NUMA (Non-Uniform Memory Access) architecture:

In contrast to SMP there is also the asymmetric multiprocessor architecture, currently used mainly on some high-end processors. Its main characteristic is that there is no shared bus and no single common memory; each core has its own memory. This architecture is not discussed here.

3.2 Bias revocation

An important issue with biased locking is that under contention, if another thread competes for a biased object, the owner must give up the biased lock. This revocation introduces some performance overhead, but overall the benefit of biased locking outweighs the CAS cost.

4. Summary

The JVM also introduces other lock techniques, such as lock inflation. Their impact is not very significant compared with spin locks and biased locks, so they are not covered here.

As the discussion above shows, the underlying implementation of synchronized relies mainly on lock-free queues. The basic ideas are to spin before blocking and to have the thread chosen at handoff compete for the lock again, sacrificing a little fairness in exchange for high throughput.

Reprinted from Http://www.open-open.com/lib/view/open1352431526366.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
