Java source profiling: Object memory layout, JVM lock, and optimization

Last Update: 2017-04-20 Source: Internet

Author: User

Tags: cas, mutex, semaphore

First, Contents

1. Background knowledge: the CAS principle and the JVM object header memory layout

2. JVM lock optimization: lock coarsening, lock elimination, biased locking, lightweight locking, spin locking.

3. Summary: the advantages and disadvantages of biased, lightweight, and heavyweight locks.

Second, Background Knowledge

Two concepts need to be introduced before we begin.

2.1. CAS operation

To improve performance, many JVM operations rely on CAS, which is an implementation of optimistic locking. Because lock optimization uses CAS, we first look at how CAS is implemented.

CAS: Compare and Swap.

The JDK reaches the CPU instruction through JNI:

unsafe.compareAndSwapInt(this, valueOffset, expect, update);

CAS has 3 operands: the memory value V, the old expected value A, and the new value B to be written. If and only if the expected value A equals the memory value V, the memory value V is changed to B; otherwise nothing is done.
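The V / A / B semantics above can be tried from plain Java without touching Unsafe. The following is a minimal sketch that uses java.util.concurrent.atomic.AtomicInteger, whose compareAndSet method exposes the same compare-and-swap behaviour. The class name CasDemo and the helper casIncrement are made up for this illustration and do not come from the original article.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasDemo {
    // Hypothetical helper: increments the value with an explicit CAS retry loop.
    static int casIncrement(AtomicInteger value) {
        while (true) {
            int expected = value.get();      // A: the value we believe is in memory
            int updated = expected + 1;      // B: the new value we want to store
            if (value.compareAndSet(expected, updated)) {
                return updated;              // V equalled A, so V was replaced by B
            }
            // Another thread changed V in the meantime: re-read and try again.
        }
    }

    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(5);
        System.out.println(v.compareAndSet(5, 6)); // true: V matched the expected value
        System.out.println(v.compareAndSet(5, 7)); // false: V is 6 now, nothing is written
        System.out.println(casIncrement(v));       // 7
    }
}
```

The retry loop is what makes CAS an optimistic scheme: a thread assumes no one else interferes and only repeats the work when that assumption turns out to be wrong.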

In the OpenJDK source, see openjdk\hotspot\src\os_cpu\windows_x86\vm\atomic_windows_x86.inline.hpp (the source screenshot is not reproduced here).

os::is_MP() is declared in runtime/os.hpp; it returns whether the system is a multiprocessor (the source screenshot is not reproduced here).

In that source (see the first int parameter), LOCK_IF_MP decides, based on the current processor type, whether to add the lock prefix to the cmpxchg instruction. If the program is running on a multiprocessor, the lock prefix is added (lock cmpxchg). If the program is running on a single processor, the lock prefix is omitted (a single processor maintains sequential consistency within itself and does not need the memory barrier effect that the lock prefix provides).

2.2. Object header

In the HotSpot virtual machine, an object's layout in memory is divided into three areas: the object header (Header), instance data (Instance Data), and alignment padding (Padding). This article covers only the object header.

The object header of the HotSpot virtual machine contains two pieces of information:

The first part, the "Mark Word", stores the object's own run-time data, such as the hash code (hashCode), GC generational age, lock state flag, the lock held by a thread, the biased thread ID, the bias timestamp, and so on.

The second part, the "type pointer", is a pointer to the object's class metadata, which the virtual machine uses to determine which class the object is an instance of. (For an array, the object header must also hold a field recording the array length, because the virtual machine can determine the size of an ordinary Java object from its metadata, but cannot determine the size of an array from the array's metadata.)

Object header layout of the 32-bit HotSpot virtual machine (taken from the web):

Figure 1: the 32-bit HotSpot virtual machine object header

To verify this, look at markOop.hpp in the OpenJDK HotSpot source, which documents the object header layout:

Figure 2: comments in the HotSpot source file markOop.hpp

In the source code, the lock flag bit is enumerated like this:

enum { locked_value        = 0, // 00  lightweight lock
       unlocked_value      = 1, // 01  no lock
       monitor_value       = 2, // 10  monitor lock, also called inflated lock or heavyweight lock
       marked_value        = 3, // 11  GC mark
       biased_lock_pattern = 5  // 101 biased lock
};

Here is the source code comment:

Figure 3: lock flag bit comments in the HotSpot source file markOop.hpp

Figure 3 shows that on both 32-bit and 64-bit JVMs the layout uses a 1-bit biased-lock flag plus a 2-bit lock flag. The upper red box marks the biased lock (either explicitly biased toward a given thread, or anonymously biased), which corresponds to the enumeration value biased_lock_pattern. The lower red box marks the four object header layouts, which correspond to the first four enumeration values above. We can also see that lock flag 11 is used by the GC's MarkSweep (mark-sweep algorithm). (We do not go into that here.)
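To make the bit patterns concrete, here is a small sketch that classifies a mark word by its low three bits, using the enumeration values quoted above. It only models the bit arithmetic: the class MarkWordBits, its constants, and the describe method are invented for this example, and the code does not read real object headers.

```java
class MarkWordBits {
    // Values mirror the enum quoted from markOop.hpp above.
    static final long LOCKED_VALUE = 0b00;
    static final long UNLOCKED_VALUE = 0b01;
    static final long MONITOR_VALUE = 0b10;
    static final long MARKED_VALUE = 0b11;
    static final long BIASED_LOCK_PATTERN = 0b101;

    static final long LOCK_MASK = 0b11;         // low 2 bits: lock flag
    static final long BIASED_LOCK_MASK = 0b111; // low 3 bits: biased bit + lock flag

    // Hypothetical helper: names the lock state encoded in the low bits of a mark word.
    static String describe(long markWord) {
        // Check the 3-bit biased pattern first: 101 also ends in 01 ("unlocked").
        if ((markWord & BIASED_LOCK_MASK) == BIASED_LOCK_PATTERN) {
            return "biased";
        }
        long lockBits = markWord & LOCK_MASK;
        if (lockBits == LOCKED_VALUE) return "lightweight-locked";
        if (lockBits == UNLOCKED_VALUE) return "unlocked";
        if (lockBits == MONITOR_VALUE) return "heavyweight";
        return "gc-marked"; // the only remaining pattern is MARKED_VALUE (11)
    }

    public static void main(String[] args) {
        System.out.println(describe(0b001L)); // unlocked
        System.out.println(describe(0b101L)); // biased
        System.out.println(describe(0b000L)); // lightweight-locked
        System.out.println(describe(0b010L)); // heavyweight
        System.out.println(describe(0b011L)); // gc-marked
    }
}
```

The order of the checks matters: because the biased pattern 101 shares its last two bits with the unlocked value 01, the biased bit has to be examined before the 2-bit lock flag.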

==================================================================

Third, JVM Lock Optimization

It is commonly said that in Java, synchronized locks perform poorly because contending threads block.

The lock implementation in JDK 1.6 introduced a number of optimizations to reduce the overhead of lock operations:

Lock coarsening: removes unnecessary consecutive unlock and lock operations by extending several successive locked regions into one larger locked region.

Lock elimination: using the JIT compiler's run-time escape analysis, removes lock protection from data that is not shared with other threads outside the current synchronized block. Escape analysis also allows objects to be allocated on the thread-local stack (which reduces garbage collection overhead on the heap as well).

Lightweight locking: this lock implementation is based on the assumption that in real programs most synchronized code runs without lock contention (that is, effectively in a single-threaded execution environment). Without contention, the JVM can avoid the heavyweight mutex at the operating system level and instead acquire and release the lock with a single CAS atomic instruction in monitorenter and monitorexit.

Biased locking: when there is no lock contention, skips even the lightweight lock, that is, does not execute the CAS atomic instruction.

Adaptive spinning: when a thread's CAS fails while it is acquiring a lightweight lock, it busy-waits (spins) and retries before falling back to the operating system heavyweight lock (mutex semaphore) associated with the monitor. If it still has not succeeded after a certain number of attempts, it calls the semaphore (that is, the mutex) associated with the monitor and enters the blocked state.
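As an illustration of the first two optimizations, the sketch below shows the kind of code they target. StringBuffer.append is a synchronized method, so each call locks the buffer. Whether the JIT actually coarsens or eliminates these locks depends on the JVM version, its flags, and how hot the code is, so this is only an example of candidate code, not proof of what the compiler does. The class and method names are invented for this example.

```java
class LockOptimizationCandidates {
    // Coarsening candidate: three back-to-back synchronized append() calls lock and
    // unlock the same StringBuffer; the JIT may merge them into one lock/unlock pair.
    static String coarseningCandidate(StringBuffer shared) {
        shared.append("a");
        shared.append("b");
        shared.append("c");
        return shared.toString();
    }

    // Elimination candidate: the StringBuffer is created here and never escapes the
    // method, so no other thread can see it; escape analysis may let the JIT drop
    // the locking inside append() altogether.
    static String eliminationCandidate(String x, String y) {
        StringBuffer local = new StringBuffer();
        local.append(x);
        local.append(y);
        return local.toString();
    }

    public static void main(String[] args) {
        System.out.println(coarseningCandidate(new StringBuffer())); // abc
        System.out.println(eliminationCandidate("foo", "bar"));      // foobar
    }
}
```

Both methods return the same result with or without the optimizations; only the number of lock operations executed changes.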

3.1. Biased lock

In HotSpot's earlier design, every lock and unlock involved some CAS operations (for example, CAS operations on the wait queue), and CAS operations add latency to local calls. The idea of biased locking is that once a thread first acquires the monitor object, the monitor object is "biased" toward that thread, and subsequent acquisitions by the same thread avoid CAS operations.

In a nutshell, the object header of the lock object (in the Mark Word, the first part of the object header described above) has a thread ID field. If the field is empty when the lock is first acquired, the thread writes its own thread ID into it and sets the biased-lock bit to 1. The next time the lock is acquired, the thread only checks whether the stored thread ID matches its own. If it does, the current thread is considered to already hold the lock, so it does not need to acquire it again, and the locking phases of the lightweight and heavyweight locks are skipped. This improves efficiency.
Note: once there is contention for the lock, the bias must be revoked and the lock moves to the lightweight state.
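The check described above can be sketched as a toy model. This is not how HotSpot implements biased locking (the real logic lives in the VM and works on the mark word directly); it only mirrors the three outcomes: the first acquisition records the thread ID with one CAS, later acquisitions by the same thread take a fast path with no CAS, and an acquisition by a different thread means the bias has to be revoked. All names here (BiasedLockModel, Result, enter) are hypothetical, and thread IDs are passed in as plain non-zero numbers to keep the example deterministic.

```java
import java.util.concurrent.atomic.AtomicLong;

class BiasedLockModel {
    enum Result { BIAS_ACQUIRED, FAST_PATH, NEEDS_REVOCATION }

    private static final long NO_OWNER = 0L;

    // Stands in for the thread ID field of the mark word; 0 means "not biased yet".
    private final AtomicLong biasedThreadId = new AtomicLong(NO_OWNER);

    // Assumes threadId is non-zero, since 0 is reserved for "no owner".
    Result enter(long threadId) {
        if (biasedThreadId.get() == threadId) {
            return Result.FAST_PATH;        // already biased to this thread: no CAS
        }
        if (biasedThreadId.compareAndSet(NO_OWNER, threadId)) {
            return Result.BIAS_ACQUIRED;    // first acquisition: one CAS records the ID
        }
        return Result.NEEDS_REVOCATION;     // biased to another thread: contention
    }

    public static void main(String[] args) {
        BiasedLockModel lock = new BiasedLockModel();
        System.out.println(lock.enter(1L)); // BIAS_ACQUIRED
        System.out.println(lock.enter(1L)); // FAST_PATH
        System.out.println(lock.enter(2L)); // NEEDS_REVOCATION
    }
}
```

The model deliberately stops at NEEDS_REVOCATION; what happens after that point is the lightweight and heavyweight path.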

When a thread is about to access a shared resource:

Step 1: check whether the mark word holds the thread's own thread ID. If it does, the current thread already holds the biased lock, so it skips the lightweight lock and executes the synchronized body directly.

Acquiring a biased lock (the flow diagram is not reproduced here).

3.2. Lightweight and heavyweight locks

The remaining steps are as follows (the flow diagram is not reproduced here):

Step 2: if the mark word does not hold the thread's own thread ID, the lock is upgraded. The new thread uses CAS to attempt the switch; using the thread ID already stored in the mark word, it notifies the previous thread to pause, and the previous thread clears the contents of the mark word.

Step 3: both threads copy the lock object's mark word contents (including the hashCode) into a newly created lock record space of their own, and then compete for the mark word through CAS, each trying to change the shared object's mark word to the address of its own lock record.

Step 4: the thread whose CAS in step 3 succeeds obtains the resource; the thread whose CAS fails starts spinning.

Step 5: if the spinning thread obtains the resource while spinning (that is, the thread that held the resource has finished and released it), the whole state remains a lightweight lock.

Step 6: if spinning fails, the lock enters the heavyweight state. The spinning thread then blocks and waits for the previous thread to finish and wake it up.
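Steps 3 to 6 can be approximated in user code with a lock that first tries CAS, then spins a bounded number of times, and only then gives up the CPU. The sketch below is a simplification under several assumptions: the spin budget SPIN_LIMIT is an arbitrary number, the fallback uses LockSupport.parkNanos in a polling loop instead of a real wait queue with wake-ups, and the lock is not reentrant. It illustrates the spin-then-block idea and does not reproduce the JVM's monitor implementation. All names are made up for this example.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

class SpinThenParkLock {
    private static final int SPIN_LIMIT = 1000; // hypothetical spin budget

    // null means unlocked; otherwise holds the owning thread.
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    void lock() {
        Thread me = Thread.currentThread();
        int spins = 0;
        while (!owner.compareAndSet(null, me)) {   // CAS attempt (steps 3 and 4)
            if (spins < SPIN_LIMIT) {
                spins++;                           // busy-wait and retry (step 5)
            } else {
                LockSupport.parkNanos(100_000L);   // stop burning CPU (step 6, simplified)
            }
        }
    }

    void unlock() {
        if (!owner.compareAndSet(Thread.currentThread(), null)) {
            throw new IllegalMonitorStateException("current thread does not hold the lock");
        }
    }

    // Demo: several threads increment a plain int under the lock; returns the final count.
    static int runCounterDemo(int threads, int incrementsPerThread) {
        SpinThenParkLock lock = new SpinThenParkLock();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(runCounterDemo(4, 10_000)); // 40000 if mutual exclusion holds
    }
}
```

The trade-off is the one the steps describe: spinning avoids the cost of blocking when the lock is released quickly, while the fallback keeps a long wait from consuming the CPU.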

Note: the JVM lock upgrade path

Biased lock -> lightweight lock -> heavyweight lock

Locks can be upgraded from left to right, but cannot be downgraded from right to left.

Fourth, Summary

This article has focused on the JVM's optimizations of synchronized. When contention is heavy, however, these optimizations can reduce efficiency rather than improve it, because of the lock upgrade process. In that case, biased locking should be disabled with -XX:-UseBiasedLocking. The following table compares these lock types:


Lock	Advantages	Disadvantages	Applicable scenarios
Biased lock	Locking and unlocking incur no extra cost; compared with executing a non-synchronized method, the difference is only at the nanosecond level.	If threads contend for the lock, revoking the bias adds extra cost.	Only one thread accesses the synchronized block.
Lightweight lock	Competing threads do not block, which improves the program's responsiveness.	A thread that never wins the lock keeps spinning and consumes CPU.	Response time matters; the synchronized block executes very quickly.
Heavyweight lock	Contending threads do not spin and so do not consume CPU.	Threads block, and response time is slow.	Throughput matters; the synchronized block takes longer to execute.
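The -XX:-UseBiasedLocking option mentioned in the summary is passed on the java command line, for example java -XX:-UseBiasedLocking BiasedLockingFlagCheck, on JDK releases that still support the flag. As a small sketch, the program below looks for that option among the arguments the JVM was started with, using the standard RuntimeMXBean.getInputArguments() call. It only detects whether the option was passed explicitly; it does not report the JVM's default setting. The class name and the helper method are invented for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class BiasedLockingFlagCheck {
    // Hypothetical helper: true if the JVM arguments explicitly disable biased locking.
    static boolean biasedLockingDisabled(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:-UseBiasedLocking");
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not the program arguments in args).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Biased locking explicitly disabled: "
                + biasedLockingDisabled(jvmArgs));
    }
}
```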

==========================

Reference:

"In-Depth Understanding of the Java Virtual Machine: Advanced JVM Features and Best Practices", second edition, based on JDK 1.7 (this article uses JDK 1.8; the content does not conflict).

This article is an English version of an article originally published in Chinese on aliyun.com.
