Java Locks --- biased locks, lightweight locks, spin locks, and heavyweight locks


Before we start, a test I ran repeatedly produced the same results every time:
1. Under a single thread, synchronized is the most efficient (beforehand I expected it to be the worst);
2. AtomicInteger is the least stable: its performance varies with the level of concurrency. Under short, low contention it beats synchronized and sometimes even LongAdder, but under heavy contention it falls behind synchronized;
3. LongAdder is stable: it performs well at every concurrency level, has the best overall performance, is roughly on par with AtomicInteger under short, low contention, and is the clear winner under sustained, heavy contention.

Understanding the basics of locks

If you want a thorough understanding of the ins and outs of Java locks, you first need to understand the following background.

Basics, part 1: types of locks

At the macro level, locks fall into two categories: pessimistic locks and optimistic locks.

Optimistic lock

An optimistic lock embodies an optimistic assumption: reads dominate writes, so concurrent writes are unlikely. A thread reading the data assumes nobody else will modify it and therefore takes no lock. Only when it writes does it check whether anyone updated the data in the meantime: it reads the current version number at write time and performs the update only if that version still matches the one it read earlier; if the check fails, it repeats the read-compare-write cycle.

Optimistic locks in Java are mostly implemented with CAS (compare-and-swap), an atomic update operation that compares the current value with an expected value and applies the update only if they match; otherwise the update fails.
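
To make the read-compare-write loop concrete, here is a minimal sketch using java.util.concurrent.atomic.AtomicInteger (the class and method names are illustrative, not from the original article):

    import java.util.concurrent.atomic.AtomicInteger;

    public class OptimisticCounter {
        private final AtomicInteger value = new AtomicInteger(0);

        // Optimistic update: read the current value, compute the new one,
        // and retry if another thread changed the value in the meantime.
        public int increment() {
            while (true) {
                int current = value.get();                 // read
                int next = current + 1;                    // compute
                if (value.compareAndSet(current, next)) {  // compare-and-swap
                    return next;                           // success, no lock taken
                }
                // CAS failed: another thread won the race; loop and retry
            }
        }
    }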

Pessimistic lock

A pessimistic lock embodies the opposite assumption: writes dominate, so concurrent writes are likely. A thread assumes others will modify the data, so it locks on every read and write, forcing any other thread that wants to read or write the data to block until the lock is released. In Java, synchronized is a pessimistic lock; the locks built on the AQS framework (such as ReentrantLock) first try an optimistic CAS to acquire the lock and fall back to a pessimistic lock if that fails.
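
For contrast, a pessimistic version of the same counter, sketched with the ReentrantLock mentioned above (names are illustrative): every access takes the lock first, so competing threads block.

    import java.util.concurrent.locks.ReentrantLock;

    public class PessimisticCounter {
        private final ReentrantLock lock = new ReentrantLock();
        private int value = 0;

        // Pessimistic update: always acquire the lock before touching the
        // data; any other thread must block until we release it.
        public int increment() {
            lock.lock();
            try {
                return ++value;
            } finally {
                lock.unlock(); // always release, even on exception
            }
        }
    }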

Basics, part 2: the cost of blocking a Java thread

Java threads map onto native operating-system threads, so blocking or waking a thread requires the operating system's intervention, which means switching between user mode and kernel mode. That switch consumes significant system resources: user mode and kernel mode each have their own memory space and dedicated registers, so switching into the kernel means passing many variables and parameters across, and the kernel must also save user-mode register values and state so it can restore them when the call returns to user mode.

    1. If thread state switches are a high-frequency operation, they consume a great deal of CPU time;
    2. For a simple code block that needs synchronization, this strategy is clearly a poor fit: the lock/suspend bookkeeping can take longer than the user code it protects.

synchronized puts threads that fail to acquire the lock into the blocked state, which makes it a heavyweight synchronization operation in the Java language, hence the name heavyweight lock. To ease the performance problem described above, the JVM introduced lightweight locks and biased locks (biased locking arrived in JDK 6); both are optimistic locks, and spinning is enabled by default alongside them.

Understanding the cost of a Java thread switch is one of the keys to weighing the pros and cons of the various Java locks.

Basics, part 3: the Mark Word

Before introducing the Java locks themselves, we need to describe the Mark Word. The Mark Word is part of a Java object's header; only the Mark Word is covered in detail here, because it is closely tied to every kind of Java lock.

The Mark Word is 32 bits long on a 32-bit VM and 64 bits long on a 64-bit VM (with compressed pointers disabled). Its last 2 bits are the lock flag, which marks the current object's state; that state determines what the rest of the Mark Word stores, as the following table shows:

State                          Flag bits   Stored content
Unlocked                       01          object hash code, GC generational age
Lightweight locked             00          pointer to the lock record
Heavyweight locked (inflated)  10          pointer to the heavyweight lock (monitor)
GC mark                        11          empty (no information recorded)
Biasable                       01          biased thread ID, bias epoch (timestamp), GC generational age

(Figure: the Mark Word structure of a 32-bit VM in each of these states.)

Understanding the Mark Word structure helps in following the lock and unlock sequences of the Java locks below.
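
If you want to watch these Mark Word transitions yourself, the OpenJDK JOL tool can print an object's header. A minimal sketch, assuming the org.openjdk.jol:jol-core dependency is on the classpath (this tool is not part of the original article):

    import org.openjdk.jol.info.ClassLayout;

    public class MarkWordDemo {
        public static void main(String[] args) {
            Object lock = new Object();
            // Freshly allocated: the header shows the unlocked (01) state
            System.out.println(ClassLayout.parseInstance(lock).toPrintable());

            synchronized (lock) {
                // Inside the block the header shows a thin (lightweight) or
                // fat (heavyweight) lock, depending on contention and flags
                System.out.println(ClassLayout.parseInstance(lock).toPrintable());
            }
        }
    }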

Summary

Java has four lock states: heavyweight, spin, lightweight, and biased.
Different locks have different characteristics; each performs well only in its particular scenario. No single Java lock is efficient in all cases; so many were introduced precisely to handle different situations.

The heavyweight lock above is a pessimistic lock, while spin locks, lightweight locks, and biased locks are optimistic. That gives a rough idea of where each applies, but to use them well we need the detailed analysis of each one's behavior below.

Spin locks in Java

The principle of the spin lock is simple: if the thread holding the lock will release it in a short time, the threads competing for the lock need not switch between kernel mode and user mode to block and suspend themselves. They just wait (spin), and take the lock immediately after the holder releases it, avoiding the cost of switching between user threads and the kernel.

But spinning consumes CPU; bluntly, it makes the CPU do useless work. If a thread never manages to acquire the lock, it cannot be allowed to occupy the CPU spinning forever, so a maximum spin-wait time must be set.

If the lock holder runs longer than the maximum spin time without releasing the lock, the contending threads fail to acquire it within their spin budget; at that point they stop spinning and enter the blocked state.
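
A minimal hand-rolled spin lock, to make the idea concrete (an illustrative sketch, not the JVM's implementation; Thread.onSpinWait() is a Java 9+ hint and can be omitted on older JDKs):

    import java.util.concurrent.atomic.AtomicBoolean;

    public class SpinLock {
        private final AtomicBoolean locked = new AtomicBoolean(false);

        public void lock() {
            // Busy-wait (spin) until the CAS flips false -> true;
            // no user/kernel mode switch, but the CPU stays busy.
            while (!locked.compareAndSet(false, true)) {
                Thread.onSpinWait();
            }
        }

        public void unlock() {
            locked.set(false);
        }
    }

A real implementation would add the maximum spin time discussed above and fall back to blocking once it is exceeded.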

Advantages and disadvantages of spin locks

Spin locks reduce thread blocking as much as possible. For lightly contended locks protecting code blocks that execute very quickly, they can boost performance markedly, because the cost of spinning is lower than the cost of blocking and then waking a thread, which forces two context switches.

But if lock contention is fierce, or the lock holder occupies the lock for a long time while executing its synchronized block, a spin lock is a poor fit: the spinning threads hog the CPU while accomplishing nothing, and with many threads contending for one lock, prolonged spinning costs more than blocking and suspending would, while threads that actually need the CPU cannot get it. The result is wasted CPU. In that case, spinning should be turned off.

The spin time threshold

The purpose of spinning is to hold on to the CPU so the thread can proceed the instant the lock becomes available. But how long should it spin? If the spin time is too long, large numbers of threads sit in the spin state consuming CPU resources and overall system performance suffers, so choosing the spin duration is especially important.

On the JVM's choice of spin duration: in JDK 1.5 the limit was fixed (hard-coded); JDK 1.6 introduced adaptive spinning, under which the spin time is no longer fixed but is decided by the previous spin times on the same lock and the state of the lock's owner. Roughly, one thread context-switch time is considered the ideal spin duration, and the JVM also optimizes for the current CPU load:

    1. If the average load is less than the number of CPUs, keep spinning;
    2. If more than (CPUs / 2) threads are already spinning, block the thread directly;
    3. If a spinning thread finds that the owner has changed, extend the spin (the spin count) or enter the blocked state;
    4. If the CPU is in power-saving mode, stop spinning;
    5. The worst-case spin time is the CPU's memory latency (the gap between CPU A storing a value and CPU B learning of it);
    6. Differences in thread priority are appropriately ignored while spinning.

Enabling spin locks

In JDK 1.6, spinning is enabled with -XX:+UseSpinning;
-XX:PreBlockSpin=10 sets the spin count.
From JDK 1.7 on, these parameters were removed, and spinning is controlled by the JVM itself.

The heavyweight lock: synchronized

Before JDK 1.5, the synchronized keyword was the means of ensuring synchronization in Java, and its role should already be familiar to everyone:

It can treat any non-null object as a lock.

    1. Applied to an instance method, the lock is the object instance (this);
    2. Applied to a static method, the lock is the Class object. Because class data lives in the permanent generation, PermGen (Metaspace in JDK 1.8), which is globally shared, a static synchronized method is effectively a global lock for the class, excluding every thread that calls the method;
    3. Applied to an object instance (a synchronized block), it locks every code block synchronized on that same object. The sketch below shows all three forms.
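
The three forms side by side, as a sketch (names are illustrative):

    public class SyncForms {
        // 1. Instance method: the lock is the receiver object (this)
        public synchronized void instanceMethod() { /* ... */ }

        // 2. Static method: the lock is the Class object (SyncForms.class),
        //    shared by every thread in the JVM that calls it
        public static synchronized void staticMethod() { /* ... */ }

        private final Object monitor = new Object();

        // 3. Synchronized block: the lock is whatever non-null object is named
        public void blockForm() {
            synchronized (monitor) {
                // mutually exclusive with every block locked on 'monitor'
            }
        }
    }
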
How synchronized is implemented

(Figure: the internal structure of the monitor, whose queues are described below.)

A monitor maintains multiple queues; when several threads access the object's monitor together, it stores them in different containers:

    1. Contention List: the contention queue; every thread requesting the lock is placed here first;
    2. Entry List: threads from the Contention List that qualify as candidates are moved here;
    3. Wait Set: threads blocked by calling wait() are placed here;
    4. OnDeck: at any moment, at most one thread is actively competing for the lock; that thread is called OnDeck;
    5. Owner: the thread currently holding the lock;
    6. !Owner: the thread currently releasing the lock.

The JVM picks one candidate (OnDeck) from the tail of the queue at a time, but under concurrency the ContentionList is hammered by CAS traffic from a large number of threads. To reduce contention on the tail element, the JVM moves a subset of threads into the EntryList as candidates. On unlock, the owner thread migrates some ContentionList threads into the EntryList and designates one EntryList thread (typically the one that entered first) as OnDeck. The owner does not hand the lock to OnDeck directly; it hands over only the right to compete for the lock, and OnDeck must re-contend for it. This sacrifices some fairness but greatly improves throughput; in the JVM this is called "competitive switching".

Once the OnDeck thread acquires the lock it becomes the owner; threads that fail to acquire it remain in the EntryList. If the owner is blocked by wait(), it moves to the WaitSet until it is woken by notify() or notifyAll() at some point, after which it re-enters the EntryList.

Threads in the ContentionList, EntryList, and WaitSet are all in the blocked state; the blocking is managed by the operating system (on Linux, via the pthread_mutex_lock kernel function).

synchronized is an unfair lock. A thread does not have to enter the ContentionList to wait: an arriving thread first tries to spin for the lock, and only enters the ContentionList if that fails, which is obviously unfair to the threads already queued. Another unfairness: a spinning thread may also snatch the lock straight from the OnDeck thread.

Biased lock

Java's biased lock (biased locking) is a multithreading optimization introduced in Java 6.
As its name suggests, a biased lock is biased toward the first thread that acquires it. If, throughout execution, the lock is only ever accessed by one thread and there is no multithreaded contention, that thread never needs to trigger real synchronization; in this case, a biased lock is placed on the object for that thread.
If another thread tries to preempt the lock while the program runs, the thread holding the bias is suspended, the JVM revokes the biased lock, and the lock is restored to a standard lightweight lock.

By eliminating synchronization primitives when the resource is uncontended, biased locking further improves program performance.

Biased-lock acquisition proceeds as follows:
    1. Inspect the Mark Word: if the biased-lock bit is 1 and the lock flag bits are 01, the object is in the biasable state.
    2. If it is biasable, test whether the stored thread ID points to the current thread; if so, go to step 5, otherwise go to step 3.
    3. If the thread ID does not point to the current thread, compete for the lock with CAS. If the CAS succeeds, the Mark Word's thread ID is set to the current thread's ID and step 5 executes; if it fails, step 4 executes.
    4. A failed CAS indicates contention. When the global safepoint (SafePoint) is reached, the thread that owns the biased lock is suspended, the biased lock is upgraded to a lightweight lock, and the thread blocked at the safepoint then continues into the synchronized code. (Revoking a biased lock causes a stop-the-world pause.)
    5. Execute the synchronized code.

Note: reaching the safepoint in step 4 causes a stop-the-world pause, though a very short one.

Releasing a biased lock:

Biased-lock revocation was mentioned in the steps above. A thread holding a biased lock never releases it voluntarily; it gives it up only when some other thread tries to compete for the bias. Revocation waits for the global safepoint (a point at which no bytecode is executing), then suspends the thread that owns the bias and checks whether the lock object is still locked: if not, it reverts to the unlocked state (flag bits "01"); if so, it becomes a lightweight lock (flag bits "00").

Where biased locks apply

Biased locks suit the case where only one thread ever executes the synchronized block, and no other thread runs it until the first has finished and released the lock; that is, the uncontended case. Once contention appears, the lock is upgraded to a lightweight lock, and that upgrade must revoke the bias, which triggers a stop-the-world operation.
Under lock contention, biased locking adds many extra steps; in particular, bias revocation forces a trip to the safepoint, and the safepoint means STW and degraded performance. In that situation biased locking should be disabled.

Viewing pauses: the safepoint log

To view safepoint pauses, enable the safepoint log. Set the JVM flag -XX:+PrintGCApplicationStoppedTime to print the application's stopped time, and add -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 to print the details. You can then see the pauses caused by biased locking; each is very short, but under heavy contention they become very numerous.

Note: the safepoint log cannot be left on all the time:
1. By default the log goes to stdout; that pollutes stdout, and if stdout is redirected to a file not on /dev/shm, writing it may block.
2. For very short pauses, such as revoking a biased lock, printing the log costs more than the pause itself.
3. The log is printed at the safepoint, so logging itself lengthens the safepoint pause.

So the safepoint log should be enabled only while troubleshooting.
To enable it on a production system, add the following four flags:
-XX:+UnlockDiagnosticVMOptions -XX:-DisplayVMOutput -XX:+LogVMOutput -XX:LogFile=/dev/shm/vm.log
These unlock the diagnostic flags (only making more flags available, without activating any), stop the VM log going to stdout, and write it to a standalone file under /dev/shm (a memory-backed filesystem).

The log has three parts:
The first part is the timestamp and the type of VM operation.
The second part, enclosed in brackets, is the thread overview:
total: the total number of threads in the JVM at the safepoint
initially_running: the number of threads still running when the safepoint began
wait_to_block: the number of threads that had to be paused before the VM operation could start

The third part is the time spent in each phase of reaching the safepoint and performing the operation; the most important is vmop:

    • spin: time waiting for threads to respond to the safepoint call;
    • block: time taken to pause all threads;
    • sync: spin + block, the elapsed time from the start until the safepoint is reached; useful for judging how long entering the safepoint takes;
    • cleanup: time spent cleaning up;
    • vmop: time actually spent executing the VM operation.

Seeing many very short safepoints, all of them RevokeBias, indicates a highly concurrent application that should disable biased locking.

Turning biased locking on and off in the JVM
    • Enable: -XX:+UseBiasedLocking -XX:BiasedLockingStartupDelay=0
    • Disable: -XX:-UseBiasedLocking
Lightweight lock

A lightweight lock is what a biased lock is promoted into. The biased lock handles the case of a single thread entering the synchronized block; when a second thread joins the contention, the biased lock is upgraded to a lightweight lock.
The locking sequence for a lightweight lock:

    1. When the code enters the synchronized block, if the lock object is unlocked (lock flag bits "01", biased bit "0"), the VM first creates a space called the lock record in the current thread's stack frame, to store a copy of the lock object's current Mark Word, officially called the Displaced Mark Word. (Figure: the thread stack and object header at this point.)

    2. Copy the Mark Word from the object header to the lock record;

    3. Once the copy succeeds, the VM uses a CAS operation to try to update the object's Mark Word to a pointer to the lock record, and points the owner pointer in the lock record at the object's Mark Word. If the update succeeds, continue with step 4; otherwise, step 5.

    4. If the update succeeds, this thread owns the object's lock, and the object's Mark Word lock flag bits are set to "00", meaning the object is in the lightweight-locked state. (Figure: the thread stack and object header after the successful CAS.)

    5. If the update fails, the VM first checks whether the object's Mark Word already points into the current thread's stack frame. If it does, the current thread already holds the lock and can enter the synchronized block directly. Otherwise multiple threads are competing: the lightweight lock inflates into a heavyweight lock, the lock flag changes to "10", the Mark Word stores a pointer to the heavyweight lock (the mutex), and the threads waiting for the lock go into the blocked state. Before that, the current thread tries to acquire the lock by spinning, looping on the acquisition attempt so the thread avoids blocking.

Release of lightweight locks

From the releasing thread's perspective: the switch from a lightweight lock to a heavyweight lock happens while the lightweight lock is being released. When the thread acquired the lock, it copied the lock object's Mark Word; if another thread tried to acquire the lock while it was held, that thread modified the Mark Word, and on release the holder finds its copy inconsistent with the current Mark Word, so it switches to the heavyweight path.

In other words, because the Mark Word was modified for the heavyweight lock, the Displaced Mark Word no longer matches the object's current Mark Word.

The remedy is to compare the object's Mark Word state again before entering the mutex, to verify whether the lock is still held by another thread. If the other thread has already released it, the lock can be taken directly via CAS, without entering the mutex at all; that is the point of the re-check.

From the acquiring thread's perspective: if a thread tries to acquire the lightweight lock while another thread occupies it, it modifies the Mark Word to indicate inflation to a heavyweight lock, signalling that lock entry now goes through the heavyweight path.

One more note: a thread waiting on a lightweight lock does not block; it keeps spinning for the lock, modifying the Mark Word as described above.

This is the spin lock in action: a thread that fails to acquire the lock does not suspend immediately; instead it executes an empty loop, the spin. After some number of spins, if the lock has been acquired the code executes; if not, the thread is suspended.

Summary

The execution process of synchronized:
1. Check whether the Mark Word stores the current thread's ID; if so, the current thread already holds the biased lock.
2. If not, use CAS to install the current thread's ID into the Mark Word; if that succeeds, the current thread has acquired the biased lock and the bias flag is set to 1.
3. If the CAS fails, there is contention: the bias is revoked and the lock is upgraded to a lightweight lock.
4. The current thread uses CAS to replace the object header's Mark Word with a pointer to its lock record; if that succeeds, the current thread holds the lock.
5. If it fails, another thread is competing for the lock, and the current thread tries to acquire it by spinning.
6. If the spin succeeds, the lock remains in the lightweight state.
7. If the spin fails, the lock is upgraded to a heavyweight lock.

All of the locks above are implemented internally by the JVM itself. When we execute a synchronized block, the JVM decides how to carry out the synchronization based on which locks are enabled and on the current threads' contention.

With all lock types enabled, a thread entering the critical section first acquires a biased lock. If a bias already exists, it tries to acquire a lightweight lock; with spinning enabled, it spins, and if spinning fails to acquire the lock it falls back to a heavyweight lock. Threads that fail to get the lock block and suspend until the lock holder wakes them.

Biased locks are used when there is no lock contention at all: before the current thread finishes the synchronized block, no other thread executes it. As soon as a second thread contends, the biased lock is upgraded to a lightweight lock; if the lightweight lock's spinning hits its threshold without acquiring the lock, it is upgraded to a heavyweight lock.

If threads contend for the lock fiercely, biased locking should be disabled.

Lock optimization

The behavior described above is not something our code controls directly, but by borrowing its ideas we can optimize our own locking:

Reduce the time a lock is held

Code that does not need to execute under synchronization should be kept out of the synchronized block, so the lock can be released as early as possible, as in the sketch below.
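
A sketch of the idea (the method and the "expensive work" are illustrative): do the thread-confined work outside the block and hold the lock only for the shared-state update.

    public class ShortenLockTime {
        private final Object lock = new Object();
        private long total;

        public void record(int raw) {
            // Expensive, thread-confined work happens outside the lock...
            long processed = expensiveTransform(raw);

            // ...and the lock is held only for the shared-state update.
            synchronized (lock) {
                total += processed;
            }
        }

        private long expensiveTransform(int raw) {
            return (long) raw * raw; // stand-in for real work
        }
    }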

Reduce the granularity of locks

The idea is to split one physical lock into several logical locks, increasing parallelism and thereby reducing lock contention; it trades space for time.

Many data structures in Java use this approach to improve the efficiency of concurrent operations:

ConcurrentHashMap

Before JDK 1.8, ConcurrentHashMap used a Segment array:

Segment<K,V>[] segments;

Segment extends ReentrantLock, so each segment is a reentrant lock, and each segment holds a HashEntry<K,V> array for its data. A put first determines which segment the entry belongs to, locks only that segment, and performs the put; the other segments are not locked. So the number of segments in the array is the number of threads that can store data at the same time, which raises the concurrency level.
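
A minimal lock-striping sketch in the same spirit (illustrative, not the JDK's actual Segment code): threads that hash to different stripes never contend.

    import java.util.concurrent.locks.ReentrantLock;

    public class StripedCounters {
        private static final int STRIPES = 16;
        private final ReentrantLock[] locks = new ReentrantLock[STRIPES];
        private final long[] counts = new long[STRIPES];

        public StripedCounters() {
            for (int i = 0; i < STRIPES; i++) {
                locks[i] = new ReentrantLock();
            }
        }

        public void add(Object key, long delta) {
            // Lock only the stripe this key maps to; other stripes stay free
            int stripe = (key.hashCode() & 0x7fffffff) % STRIPES;
            locks[stripe].lock();
            try {
                counts[stripe] += delta;
            } finally {
                locks[stripe].unlock();
            }
        }
    }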

LongAdder

LongAdder's implementation resembles ConcurrentHashMap's. LongAdder keeps a Cell array that grows dynamically with the current contention, and each Cell object stores a value in a long field.
While there is no contention, or while the cells array is still being initialized, values are added with CAS to the base member variable. Under contention, LongAdder initializes the cells array and each thread picks one cell in the array to update, so the number of cells bounds how many threads can modify the value simultaneously. The final value is the sum of every cell in the array plus base. The cell array also expands with thread contention: it starts at length 2 and doubles on each expansion, and it stops expanding once its length reaches or exceeds the number of CPUs. This is why LongAdder is more efficient than plain CAS and AtomicInteger: the latter is implemented with volatile + CAS and has a contention dimension of 1, while LongAdder's contention dimension is "number of cells + 1". Why +1? Because it also has base: a thread that fails to win a cell may still try to add its value to base.
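
A usage sketch contrasting the two contention dimensions (class names are illustrative):

    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.atomic.LongAdder;

    public class CounterComparison {
        // One shared slot: every contending thread CASes the same word
        private final AtomicLong atomic = new AtomicLong();

        // base + cells: contending threads spread across separate cells
        private final LongAdder adder = new LongAdder();

        public void recordHit() {
            atomic.incrementAndGet(); // contention dimension = 1
            adder.increment();        // contention dimension = cells + base
        }

        public long hits() {
            // sum() folds base plus every cell; note it is not an
            // atomic snapshot while writers are still active
            return adder.sum();
        }
    }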

LinkedBlockingQueue

LinkedBlockingQueue embodies the same idea: it enqueues at the tail of the queue and dequeues at the head, and enqueue and dequeue use different locks, making it more efficient than ArrayBlockingQueue, which has only one lock.

Lock granularity cannot be split indefinitely: at most, one lock can usefully be split into as many locks as the machine has CPUs.

Lock coarsening

In most cases we want to make lock granularity as small as possible; coarsening means enlarging it instead.
Granularity needs to be coarsened in scenarios like this:
if the body of a loop must be locked, we should put the lock outside the loop; otherwise every iteration enters and exits the critical section, which is very inefficient. Both shapes are sketched below.
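
A sketch of both shapes (illustrative names):

    public class CoarsenLoopLock {
        private final Object lock = new Object();
        private long sum;

        // Fine-grained: enters and exits the critical section every iteration
        public void addAllFine(int[] values) {
            for (int v : values) {
                synchronized (lock) {
                    sum += v;
                }
            }
        }

        // Coarsened: one lock acquisition around the whole loop
        public void addAllCoarse(int[] values) {
            synchronized (lock) {
                for (int v : values) {
                    sum += v;
                }
            }
        }
    }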

Using read-write locks

ReentrantReadWriteLock is a read-write lock: read operations take the read lock and can proceed concurrently, while write operations take the write lock and proceed single-threaded.
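
A minimal read-mostly cache sketch using ReentrantReadWriteLock (the map and method names are illustrative):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class ReadMostlyCache {
        private final Map<String, String> map = new HashMap<>();
        private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

        public String get(String key) {
            rw.readLock().lock();   // many readers may hold this at once
            try {
                return map.get(key);
            } finally {
                rw.readLock().unlock();
            }
        }

        public void put(String key, String value) {
            rw.writeLock().lock();  // exclusive: blocks readers and writers
            try {
                map.put(key, value);
            } finally {
                rw.writeLock().unlock();
            }
        }
    }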

Read/write separation

CopyOnWriteArrayList, CopyOnWriteArraySet
A CopyOnWrite container is one that is copied on write. In plain terms, when we add an element, instead of adding it to the current container directly, we copy the current container into a new one, add the element to the copy, and then point the container reference at the new container. The benefit is that the CopyOnWrite container can be read concurrently without any locking, because the container being read never gains elements. CopyOnWrite is thus also a read/write-separation idea: reads and writes happen on different containers.
CopyOnWrite containers suit read-heavy, write-light concurrent scenarios: reads take no lock, but modifications do, since otherwise multiple threads would copy the container at the same time and each apply its own changes.
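
A typical fit is a listener list that is iterated often and changed rarely (an illustrative sketch):

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    public class ListenerRegistry {
        // Iteration takes no lock; each write copies the backing array
        private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

        public void register(Runnable listener) {
            listeners.add(listener); // locks internally, copies the array
        }

        public void fire() {
            // Iterates over a snapshot: concurrent adds are not seen and
            // no ConcurrentModificationException can occur
            for (Runnable l : listeners) {
                l.run();
            }
        }
    }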

Using CAS

If the operations that need synchronizing execute very quickly and thread contention is light, CAS is more efficient, because locking causes thread context switches; when a context switch costs more than the synchronized work itself and contention for the resource is light, a volatile + CAS operation can be a very efficient choice.

Eliminating false sharing of cache lines

Besides the synchronization locks we write in code and the JVM's own built-in locks, there is a hidden "lock": the cache line, also known as a performance killer.
In a multi-core processor, each core has its own private L1 and L2 caches, and possibly a shared L3 cache. For performance, the CPU reads and writes in units of a cache line: 32 bytes on a typical 32-bit CPU and 64 bytes on a 64-bit CPU. This leads to a problem.
Suppose several variables that need no synchronization happen to sit in the same contiguous 32- or 64-byte block. When one of them is needed, they are all loaded into core 1's private cache as one cache line (only one variable was wanted, but the CPU reads at cache-line granularity, so its neighbors come along). The variables read into the cache are copies of main memory, which effectively adds one lock to every variable in that cache line: if any variable in the line changes, then before core 2 can read the line, core 1 must write the entire line back to main memory (even though the other variables never changed), and only then can core 2 read it; core 2 may in turn modify a different variable in the same line than the one core 1 changed. The net effect is a synchronization lock spanning several unrelated variables.
To prevent false sharing, different JDK versions take different approaches:
1. Before JDK 1.7, you added a group of long fields before and after a variable that needed an exclusive cache line; the padding of these meaningless fields gave the variable a cache line to itself;
2. In JDK 1.7 the JVM optimizes such unused fields away, so the padding is achieved by inheriting from a class that declares many long fields;
3. JDK 1.8 solves the problem with the sun.misc.Contended annotation; for the annotation to take effect, the JVM must be started with the following parameter:
-XX:-RestrictContended

The sun.misc.Contended annotation adds 128 bytes of padding around the annotated variable, isolating it from other variables, as in the sketch below.
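
A sketch of the JDK 8 annotation approach (compile on JDK 8 and run with -XX:-RestrictContended; on JDK 9+ the annotation moved to jdk.internal.vm.annotation.Contended):

    import sun.misc.Contended;

    public class PaddedCounters {
        // Without padding, a and b would likely share one 64-byte cache
        // line, so a write to either would invalidate the other's copy
        // in every other core's cache.
        @Contended
        volatile long a;

        @Contended
        volatile long b;
    }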
