Multi-Thread: The principle of synchronized lock implementation

Last Update:2016-06-07 Source: Internet

Author: User

Tags cas

One: What kinds of locks does Java provide for synchronization?

---> There are two kinds of locking mechanisms in Java: synchronized and Lock.
---> The Lock interface and its implementation classes were added in JDK 5; their author is the well-known concurrency expert Doug Lea. This article does not judge whether synchronized or Lock is better; it only introduces how both are implemented.

Two: What does each Java lock type depend on?

---> synchronized relies on the JVM, at the software level.
---> Lock relies on special CPU instructions, at the hardware level.

Three: What does synchronized lock on?

---> For an ordinary synchronized method, the lock is the current instance object.
---> For a static synchronized method, the lock is the Class object of the current class.
---> For a synchronized block, the lock is the object given in the parentheses after synchronized.
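
The three cases above can be checked with a small sketch. The class name LockTargets and the field guard are made up for illustration; Thread.holdsLock is a standard JDK method that reports whether the calling thread currently holds a given object's monitor.

```java
class LockTargets {
    // Hypothetical dedicated lock object used by the synchronized block below.
    private final Object guard = new Object();

    // Ordinary synchronized method: the monitor is the current instance (this).
    synchronized boolean instanceMethodHoldsThis() {
        return Thread.holdsLock(this);
    }

    // Static synchronized method: the monitor is the Class object LockTargets.class.
    static synchronized boolean staticMethodHoldsClass() {
        return Thread.holdsLock(LockTargets.class);
    }

    // Synchronized block: the monitor is the object named in the parentheses.
    // Entering this block does not lock `this`.
    boolean blockHoldsGuard() {
        synchronized (guard) {
            return Thread.holdsLock(guard) && !Thread.holdsLock(this);
        }
    }

    public static void main(String[] args) {
        LockTargets t = new LockTargets();
        System.out.println(t.instanceMethodHoldsThis());
        System.out.println(staticMethodHoldsClass());
        System.out.println(t.blockHoldsGuard());
    }
}
```

Each of the three calls is expected to report true, since each check runs while the corresponding monitor is held.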

Four: Thread states and state transitions

When several threads request an object monitor at the same time, the object monitor uses the following states to distinguish the requesting threads:

---> ContentionList: every thread that requests the lock is first placed in this contention queue
---> EntryList: threads in the ContentionList that qualify as candidates are moved to the EntryList
---> WaitSet: threads that block by calling the wait method are placed in the WaitSet
---> OnDeck: at any moment at most one thread is actively competing for the lock; that thread is called OnDeck
---> Owner: the thread that currently holds the lock is called the Owner
---> !Owner: the thread that has released the lock

Five: Thread state transition diagram

---> A thread that newly requests the lock is first added to the ContentionList. When the thread that owns the lock (Owner state) calls unlock and finds that the EntryList is empty, it moves threads from the ContentionList to the EntryList.

1.1 The ContentionList virtual queue
---> The ContentionList is not a real queue but a virtual one: it is formed logically from Nodes and their next pointers, and there is no separate queue data structure. The ContentionList is a last-in, first-out (LIFO) queue. Each new Node is added at the head: a CAS changes the head pointer to the new Node, and the new Node's next pointer is set to the previous head. Removal takes place at the tail. Clearly, this structure is in effect a lock-free queue.
---> Because only the Owner thread can take elements from the tail of the queue, the dequeue operation is free of contention, which also avoids the ABA problem of CAS.
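
The head insertion described above can be sketched in Java with AtomicReference. This is an illustration under stated assumptions, not HotSpot's code (HotSpot implements the list in C++ inside the object monitor): the class name ContentionListSketch and the method drainByOwner are hypothetical, and for brevity the single owner detaches the whole list at once instead of unlinking one node from the tail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class ContentionListSketch<T> {
    private static final class Node<T> {
        final T item;
        Node<T> next;
        Node(T item) { this.item = item; }
    }

    // The "queue" is only a head pointer plus the nodes' next links.
    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    // Any thread may push: link the new node to the current head,
    // then CAS the head pointer to the new node; retry if another push won.
    void push(T item) {
        Node<T> node = new Node<>(item);
        Node<T> current;
        do {
            current = head.get();
            node.next = current;
        } while (!head.compareAndSet(current, node));
    }

    // Assumed to be called by one owner thread only, so removal never races
    // with another removal. Simplification: detach the whole list in one step
    // and return the items newest first.
    List<T> drainByOwner() {
        List<T> out = new ArrayList<>();
        for (Node<T> n = head.getAndSet(null); n != null; n = n.next) {
            out.add(n.item);
        }
        return out;
    }

    public static void main(String[] args) {
        ContentionListSketch<String> list = new ContentionListSketch<>();
        list.push("t1");
        list.push("t2");
        list.push("t3");
        System.out.println(list.drainByOwner());
    }
}
```

Because only one thread ever removes nodes, a pusher can never see a head node that was removed and then re-inserted behind its back, which is the situation the ABA problem needs.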

1.2 EntryList

---> The EntryList and the ContentionList both logically belong to the waiting queue. The ContentionList is accessed concurrently by many threads, so the EntryList was introduced to reduce contention on the tail of the ContentionList. When the Owner thread unlocks, it migrates threads from the ContentionList to the EntryList and designates one thread in the EntryList (usually the head) as the ready (OnDeck) thread. The Owner thread does not pass the lock itself to the OnDeck thread; it only hands over the right to compete for the lock, and the OnDeck thread must compete for the lock again. Although this sacrifices some fairness, it greatly improves overall throughput. HotSpot calls this way of selecting OnDeck "competitive handoff".
---> If the OnDeck thread obtains the lock, it becomes the Owner thread. If it fails, it stays in the EntryList, and for the sake of fairness its position there does not change (it remains at the head). If the Owner thread is blocked by the wait method, it is transferred to the WaitSet queue; when it is later awakened by notify/notifyAll, it is transferred to the EntryList again.
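
From the Java side, the WaitSet transition described above is driven by wait and notify/notifyAll. A minimal sketch follows; the class WaitSetDemo and its methods are hypothetical names, and the queue movements themselves happen inside the JVM and are not visible from this code.

```java
class WaitSetDemo {
    private final Object monitor = new Object();
    private boolean ready = false;

    // The caller owns the monitor, then wait() releases it and parks the
    // thread in the monitor's wait set until it is notified.
    void awaitReady() throws InterruptedException {
        synchronized (monitor) {
            while (!ready) {      // loop guards against spurious wakeups
                monitor.wait();
            }
        }
    }

    // notifyAll() wakes the waiting threads; each must reacquire the monitor
    // before it can return from wait().
    void signalReady() {
        synchronized (monitor) {
            ready = true;
            monitor.notifyAll();
        }
    }

    // Runs one waiter and one notifier; returns true if the waiter woke up.
    static boolean runOnce() {
        WaitSetDemo demo = new WaitSetDemo();
        boolean[] woke = new boolean[1];
        Thread waiter = new Thread(() -> {
            try {
                demo.awaitReady();
                woke[0] = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        demo.signalReady();
        try {
            waiter.join();        // join makes the write to woke[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return woke[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

The ready flag is checked under the monitor, so the sketch also works when signalReady happens to run before the waiter reaches wait().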

Six: Spin lock

1.1 Overview of spin locks

---> Threads in the ContentionList, EntryList and WaitSet are all in the blocked state, and blocking is carried out by the operating system (the pthread_mutex_lock function under Linux). Once blocked, a thread enters the kernel (Linux) scheduling state, which makes the system switch back and forth between user mode and kernel mode and seriously hurts lock performance.

---> The way to mitigate this problem is spinning. The principle: when contention occurs, if the Owner thread can release the lock within a short time, the competing threads can wait a little (spin); once the Owner releases the lock, a competing thread may obtain it immediately and so avoid being blocked by the system. However, the Owner may run longer than the threshold, so a competing thread that still cannot acquire the lock after spinning for some time stops spinning and enters the blocked state (backs off). The basic idea is to spin first and block only if spinning does not succeed, which minimizes the likelihood of blocking and gives a very important performance boost for code blocks with short execution times. A more precise name for the spin lock is spin-exponential backoff lock, also known as a compound lock. Clearly, spinning only makes sense on multiprocessors.

---> Another question is what a thread does while it spins. In fact it does nothing useful: it can run a few for loops or execute a few empty assembly instructions, simply to keep hold of the CPU while waiting for a chance to acquire the lock. Spinning is therefore a double-edged sword: if the spin time is too long it hurts overall performance, and if it is too short it fails to delay blocking. Choosing the spin period is obviously very important, but it depends on the operating system, the hardware, the system load and many other factors, so it is hard to choose well. A poor choice not only fails to improve performance but may reduce it, which is why spin locks are generally considered not to scale.
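
The spin-first, block-second idea can be sketched at the Java level. This is not how the JVM implements synchronized: a ReentrantLock stands in for the monitor, the class name SpinThenBlock and the bound MAX_SPINS are hypothetical, and Thread.onSpinWait requires Java 9 or later.

```java
import java.util.concurrent.locks.ReentrantLock;

class SpinThenBlock {
    // Hypothetical spin bound; as the text notes, a good value depends on
    // the hardware, the operating system and the load.
    static final int MAX_SPINS = 1_000;

    // Tries to take the lock by spinning and falls back to a blocking
    // acquire if spinning fails. Returns the number of failed spin attempts.
    static int acquire(ReentrantLock lock) {
        for (int spins = 0; spins < MAX_SPINS; spins++) {
            if (lock.tryLock()) {
                return spins;      // got the lock without blocking
            }
            Thread.onSpinWait();   // hint to the CPU that this is a busy-wait loop
        }
        lock.lock();               // back off: block until the lock is free
        return MAX_SPINS;
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        int failedSpins = acquire(lock);
        try {
            System.out.println("acquired after " + failedSpins + " failed spins");
        } finally {
            lock.unlock();
        }
    }
}
```

In both branches the caller ends up holding the lock and must release it with unlock().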

1.2 Spin optimization strategy

---> As for the choice of spin period, HotSpot considers the ideal to be the time of one thread context switch, but it does not achieve this at present. On inspection, it currently only pauses for a few CPU cycles through assembly instructions. Besides the choice of spin period, HotSpot also applies several other spin optimization strategies, specifically:

(1) If the average load is less than the number of CPUs, always spin

(2) If more than (CPUs/2) threads are already spinning, later threads block directly

(3) If a spinning thread discovers that the Owner has changed, delay the spin time (spin count) or enter blocking

(4) If the CPU is in power-saving mode, stop spinning

---> The worst case for the spin time is the CPU's memory latency (the time between CPU A storing a piece of data and CPU B learning of that data directly)

---> While spinning, differences between thread priorities are appropriately ignored
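
Rules (1), (2) and (4) above can be written as a small decision function. This is only a sketch: the class and method names are hypothetical, the order in which the rules are checked is an assumption (the text does not say which rule wins when they conflict), not spinning when the load is at or above the CPU count is also an assumption, and rule (3) is left out because it depends on state observed during the spin.

```java
import java.lang.management.ManagementFactory;

class SpinPolicySketch {
    // Decides whether a thread should spin before blocking.
    // Assumed precedence: power saving, then spinner count, then load.
    static boolean shouldSpin(double loadAverage, int cpus,
                              int spinningThreads, boolean powerSaving) {
        if (powerSaving) {
            return false;                 // rule (4): no spinning in power-saving mode
        }
        if (spinningThreads > cpus / 2) {
            return false;                 // rule (2): enough spinners already, block directly
        }
        return loadAverage < cpus;        // rule (1): spin while load is below the CPU count
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // getSystemLoadAverage() returns a negative value where it is unsupported.
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        System.out.println(shouldSpin(load, cpus, 0, false));
    }
}
```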

---> So when does the synchronized implementation use spinning? The answer is when a thread is about to enter the ContentionList, that is, before the first step. Before joining the waiting queue, a thread first spins in an attempt to obtain the lock, and enters the waiting queue only if that fails. This is slightly unfair to the threads already waiting in the queue. A second source of unfairness is that a spinning thread may grab the lock ahead of the ready (OnDeck) thread. Spin state is maintained per monitor object: each monitor object has its own.

Seven: Biased locking in JVM 1.6

---> Biased locking was introduced in JVM 1.6. It mainly addresses lock performance when there is no contention. First, consider the problem with uncontended locking:

---> Almost all locks today are reentrant: a thread that has already acquired the lock can lock and unlock the monitor object many times. Under the earlier HotSpot design, every lock or unlock involves some CAS operations (for example, the CAS on the waiting queue), and CAS operations introduce local latency. The idea of biased locking is therefore that once a thread first acquires the monitor object, the monitor object becomes "biased" toward that thread, so that later acquisitions by the same thread can avoid the CAS. Put plainly, a variable is set, and if it is found to be true there is no need to go through the usual lock/unlock process. Several concepts still need explaining, however, and several problems that biasing introduces need to be addressed:
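
The "set a variable, then skip the CAS" idea can be illustrated with a toy class. This is not HotSpot's mechanism (HotSpot records the bias in the object header, and revocation is far more involved); the class BiasSketch and its members are hypothetical, and the counter exists only to make the saved CAS operations visible.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class BiasSketch {
    // The thread this toy lock is biased toward, or null if unbiased.
    private final AtomicReference<Thread> biasedTo = new AtomicReference<>();
    // Instrumentation only: counts how often the CAS path was taken.
    private final AtomicInteger casAttempts = new AtomicInteger();

    // Returns true if the calling thread holds the bias after the call.
    boolean enter() {
        Thread me = Thread.currentThread();
        if (biasedTo.get() == me) {
            return true;                          // fast path: plain read, no CAS
        }
        casAttempts.incrementAndGet();
        return biasedTo.compareAndSet(null, me);  // first acquisition: one CAS
    }

    int casCount() {
        return casAttempts.get();
    }

    public static void main(String[] args) {
        BiasSketch lock = new BiasSketch();
        for (int i = 0; i < 3; i++) {
            lock.enter();
        }
        System.out.println("CAS attempts: " + lock.casCount());
    }
}
```

When one thread enters repeatedly, only the first call takes the CAS path. A second thread calling enter() simply fails here, which is where real biased locking would have to revoke the bias.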

1.1 CAS and SMP Architecture

Why does CAS introduce local latency? The explanation starts with the SMP (symmetric multiprocessor) architecture, whose structure is roughly as follows.

All CPUs share one system bus, through which they are connected to main memory. Each core has its own first-level (L1) cache, and the cores are arranged symmetrically relative to the bus, which is why this structure is called a "symmetric multiprocessor."

CAS stands for compare-and-swap. It is an atomic CPU instruction that lets the CPU compare the value at a memory location and then update it, all as one atomic step. On inspection, its implementation is based on the assembly instructions of the hardware platform; that is, CAS is implemented in hardware. The JVM merely wraps the assembly calls, and classes such as AtomicInteger use these wrapped interfaces.
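
At the Java level, the wrapped instruction is reachable through AtomicInteger.compareAndSet. The sketch below shows the usual retry loop built on it; the class name CasDemo and the method doubleValue are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasDemo {
    // Atomically doubles the value: read it, compute the new value, and
    // publish it only if no other thread changed it in between; else retry.
    static int doubleValue(AtomicInteger value) {
        while (true) {
            int current = value.get();
            int next = current * 2;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(21);
        System.out.println(v.compareAndSet(20, 99)); // false: expected value does not match
        System.out.println(doubleValue(v));          // 42
    }
}
```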

Core1 and Core2 may both load the value of some location in main memory into their own L1 caches. When Core1 modifies the value in its L1 cache, it "invalidates" the copy in Core2's L1 cache through the bus. Once Core2 finds that its cached value is invalid (a cache miss), it loads the current value of that address from memory over the bus. All of this back-and-forth communication over the bus is called "cache coherence traffic". Because the bus has a fixed communication capacity, it becomes a bottleneck if the cache coherence traffic is too heavy. The state in which the values in Core1 and Core2 agree again is called "cache coherence". At this level, the ultimate goal of lock design is to reduce cache coherence traffic.

CAS happens to cause cache coherence traffic. If many threads share the same object, then whenever a CAS succeeds on one core it inevitably causes a bus storm; this is the so-called local latency. In essence, biased locking exists to eliminate this CAS and reduce cache coherence traffic.

Cache coherence:

The cache coherence mentioned above is in fact supported by a protocol. The common one today is MESI (first supported by Intel); for details see http://en.wikipedia.org/wiki/MESI_protocol. This part will be explained in detail later.

Exceptions to cache coherence traffic:

In fact, not every CAS causes a bus storm; this depends on the cache coherence protocol. For details see http://blogs.oracle.com/dave/entry/biased_locking_in_hotspot

NUMA (Non-Uniform Memory Access) architecture:

In contrast to SMP there are also non-symmetric multiprocessor architectures, currently used mainly on some high-end processors. Their main feature is that there is no shared bus and no common memory: each core has its own memory. This structure is not discussed here.

1.2 Bias revocation

An important problem with biased locking is that, under contention, if another thread competes for a biased object, the owner has to give up the bias. The revocation process brings some performance overhead, but overall the benefit of biased locking outweighs the cost of the CAS it avoids.

Eight: Summary

The JVM has also introduced other lock techniques, such as lock inflation. Compared with spin locks and biased locks their impact is not very significant, so they are not covered here.

As the introduction above shows, the underlying implementation of synchronized relies mainly on a lock-free queue. The basic idea is to spin first and block afterwards, and after the handoff the chosen thread still has to compete for the lock. This sacrifices a little fairness but achieves high throughput.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
