JVM principles explained and tuned

Last Update: 2016-10-26 Source: Internet

Author: User


I. What is the JVM

JVM stands for Java Virtual Machine. It is a specification for a computing device: an imaginary computer that is implemented by emulating various computer functions on a real computer.

A key feature of the Java language is platform independence, and the Java Virtual Machine is what makes it possible. A program in a typical high-level language must be compiled into different target code for each platform it runs on. With the JVM, Java programs do not need to be recompiled to run on different platforms. The JVM hides platform-specific details, so the Java compiler only has to generate target code (bytecode) for the JVM, and that bytecode runs unmodified on many platforms. When the JVM executes bytecode, it translates it into machine instructions for the specific platform. This is why Java can "compile once, run anywhere".

The logical structure of the Java platform gives a first picture of where the JVM fits:

A view of the logical modules that make up the Java platform, and of the difference between the JDK and the JRE, gives a bird's-eye view of the JVM's own physical structure:

II. Java code compilation and execution process

Java code compilation is done by the Java source compiler, and the flowchart is as follows:

The execution of Java bytecode is done by the JVM execution engine, and the flowchart is as follows:

The entire process of compiling and executing Java code involves the following three important mechanisms:

Java source code compilation mechanism
Class loading mechanism
Class execution mechanism

Java source code compilation mechanism

Java source code compilation consists of the following three steps:

Parsing and entering symbols into the symbol table
Annotation processing
Semantic analysis and class file generation

The flowchart is as follows:

The resulting class file consists of the following parts:

Structural information. Includes the class file format version number and the number and size of each part.
Metadata. Corresponds to the declarations and constants in the Java source code: the declarations of the class, its superclass and implemented interfaces, field and method declarations, and the constant pool.
Method information. Corresponds to the statements and expressions in the Java source code: bytecode, the exception handler table, the operand stack and local variable sizes, type records for the operand stack, and debug symbol information.
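The "structural information" part starts with a fixed layout: a 4-byte magic number (0xCAFEBABE), a 2-byte minor version, a 2-byte major version, and the 2-byte constant pool count, after which the variable-length sections begin. The sketch below reads just those leading fields. It uses java.lang.Object's own class file as input because that resource is available in every JDK; the class name ClassFileHeader is illustrative.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassFileHeader {
    /** Reads the fixed-layout fields at the very start of a class file. */
    static int[] readHeader(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int magic = in.readInt();               // u4 magic, always 0xCAFEBABE
        int minor = in.readUnsignedShort();     // u2 minor_version
        int major = in.readUnsignedShort();     // u2 major_version (52 = Java 8)
        int cpCount = in.readUnsignedShort();   // u2 constant_pool_count
        return new int[] {magic, minor, major, cpCount};
    }

    public static void main(String[] args) throws IOException {
        // java.lang.Object's class file can be read as a resource on any JDK.
        try (InputStream raw = Object.class.getResourceAsStream("Object.class")) {
            int[] h = readHeader(raw);
            System.out.printf("magic=%08X version=%d.%d constant_pool_count=%d%n",
                    h[0], h[2], h[1], h[3]);
        }
    }
}
```

The exact version and constant pool count depend on which JDK runs the program; only the magic number is fixed.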

Class loading mechanism

Class loading in the JVM is performed by ClassLoader and its subclasses. The class loader hierarchy and loading order can be described as follows:

1) Bootstrap ClassLoader

Loads all classes in jre/lib/rt.jar under $JAVA_HOME. It is implemented in C++ and is not a subclass of ClassLoader.

2) Extension ClassLoader

Loads the jar packages that extend the Java platform, i.e. the jars in $JAVA_HOME/jre/lib/ext/*.jar or in the directories specified by -Djava.ext.dirs.

3) App ClassLoader

Loads the classes and jar packages in the directories and jar files specified on the classpath.

4) Custom ClassLoader

Class loaders that applications implement for their own needs. For example, Tomcat and JBoss implement their own ClassLoaders according to the Java EE specification.

When loading a class, the JVM first checks whether the class has already been loaded. This check is bottom-up: it proceeds level by level from the custom ClassLoader up to the Bootstrap ClassLoader, and if any ClassLoader has already loaded the class, it is treated as loaded. This ensures that each class is loaded only once across all ClassLoaders. The actual loading is top-down: each parent gets the first attempt, and a ClassLoader tries to load the class itself only if the levels above it cannot.
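The delegation order can be observed with the standard ClassLoader API. This sketch prints the parent chain of the application class loader (on JDK 8 this is AppClassLoader then ExtClassLoader; on JDK 9+ the extension loader was replaced by a platform loader), shows that a bootstrap-loaded class reports a null loader, and uses a hypothetical LoggingLoader whose findClass is only reached when every parent has failed. The name com.example.DoesNotExist is a made-up class used to force that fallback.

```java
class ClassLoaderDemo {
    // Hypothetical custom loader: it cannot define classes, it only logs when
    // the delegation chain falls back to it.
    static class LoggingLoader extends ClassLoader {
        LoggingLoader(ClassLoader parent) { super(parent); }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            System.out.println("LoggingLoader.findClass(" + name + ")");
            throw new ClassNotFoundException(name);
        }
    }

    public static void main(String[] args) throws Exception {
        // Walk the parent chain upward from the loader of this class.
        for (ClassLoader cl = ClassLoaderDemo.class.getClassLoader(); cl != null; cl = cl.getParent()) {
            System.out.println(cl);
        }
        // Classes loaded by the bootstrap loader report a null class loader.
        System.out.println("String loaded by: " + String.class.getClassLoader());

        LoggingLoader custom = new LoggingLoader(ClassLoaderDemo.class.getClassLoader());
        // loadClass asks the parents first; they find java.lang.String, so findClass is not called.
        System.out.println(custom.loadClass("java.lang.String") == String.class);
        try {
            // No parent can find this hypothetical name, so the request falls back to findClass.
            custom.loadClass("com.example.DoesNotExist");
        } catch (ClassNotFoundException e) {
            System.out.println("not found: " + e.getMessage());
        }
    }
}
```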

Class execution mechanism

The JVM executes class bytecode using a stack-based architecture. When a thread is created, it gets a program counter (PC) and a stack. The program counter holds the offset, within the current method, of the next instruction to execute. The stack holds stack frames: each frame corresponds to one method invocation and consists of a local variable area and an operand stack. The local variable area stores the method's local variables and parameters, and the operand stack stores intermediate results produced while the method executes. The structure of the stack is as follows:
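To make the roles of the program counter, local variable area and operand stack concrete, here is a toy interpreter for a single frame. The opcode names mirror real JVM instructions (iload, imul, iadd, ireturn), and the instruction sequence is roughly what javac emits for a static method returning a + b * c, but the interpreter itself is an illustrative model, not how HotSpot executes bytecode.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of one stack frame: a local variable array plus an operand stack.
class ToyFrame {
    static final int ILOAD = 0, IMUL = 1, IADD = 2, IRETURN = 3;

    /** Each instruction is {opcode, operand}; the operand is only used by ILOAD. */
    static int execute(int[][] code, int[] locals) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        int pc = 0; // program counter: index of the next instruction to execute
        while (true) {
            int[] insn = code[pc++];
            switch (insn[0]) {
                case ILOAD:   operandStack.push(locals[insn[1]]); break;
                case IMUL:    operandStack.push(operandStack.pop() * operandStack.pop()); break;
                case IADD:    operandStack.push(operandStack.pop() + operandStack.pop()); break;
                case IRETURN: return operandStack.pop();
                default: throw new IllegalStateException("unknown opcode " + insn[0]);
            }
        }
    }

    public static void main(String[] args) {
        // Roughly the bytecode for: static int f(int a, int b, int c) { return a + b * c; }
        // Local variable slots 0, 1, 2 hold the parameters a, b, c.
        int[][] code = { {ILOAD, 0}, {ILOAD, 1}, {ILOAD, 2}, {IMUL, 0}, {IADD, 0}, {IRETURN, 0} };
        System.out.println(execute(code, new int[] {2, 3, 4})); // 14
    }
}
```

The three loads push 2, 3 and 4; imul replaces the top two with 12, iadd replaces 12 and 2 with 14, and ireturn pops the result.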

III. JVM memory management and garbage collection

JVM memory structure

JVM memory consists of the heap, the stacks, the native method stacks, the method area, and so on, as shown in the following chart:

1) Heap

The memory for all objects created with new is allocated on the heap, whose size can be controlled with -Xmx and -Xms. The heap is divided into the young generation and the old generation; the young generation is further divided into Eden and the Survivor area, and the Survivor area consists of a From space and a To space. The structure is shown below:

Young generation. New objects are allocated in the young generation. When Eden runs out of space, a minor GC moves the surviving objects into a Survivor space. The size of the young generation can be controlled with -Xmn, and the ratio of Eden to Survivor with -XX:SurvivorRatio.
Old generation. Stores objects that are still alive after multiple garbage collections in the young generation.
Permanent generation (Permanent space). Implements the method area and mainly holds loaded class information, method information, the constant pool and so on. -XX:PermSize and -XX:MaxPermSize specify its initial and maximum sizes. The permanent generation is not the same thing as the method area: HotSpot uses it to implement the method area, while some other virtual machines have no permanent generation and implement the method area by other means. (Starting with JDK 8, HotSpot itself replaced the permanent generation with Metaspace, sized with -XX:MetaspaceSize and -XX:MaxMetaspaceSize.)

-Xmx: maximum heap size, e.g. -Xmx512m

-Xms: initial heap size, e.g. -Xms256m

-XX:MaxNewSize: maximum young generation size

-XX:NewSize: initial young generation size, typically 1/3 or 1/4 of -Xmx. Young generation = Eden + 2 Survivor spaces; usable space = Eden + 1 Survivor space, i.e. 90% of the young generation with the default SurvivorRatio of 8.

-XX:MaxPermSize: maximum permanent generation size

-XX:PermSize: initial permanent generation size

-XX:+PrintGCDetails: print detailed GC information

-XX:NewRatio: ratio of the old generation to the young generation. For example, with -XX:NewRatio=2 the young generation takes 1/3 of the heap and the old generation 2/3.

-XX:SurvivorRatio: ratio of Eden to one Survivor space in the young generation. The default is 8, so Eden takes 8/10 of the young generation and each of the two Survivor spaces takes 1/10.
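The ratio arithmetic above can be checked with a small worked example. GenerationSizes.split is a hypothetical helper, not a JVM API; it applies the NewRatio and SurvivorRatio definitions exactly, whereas a real JVM aligns and rounds generation sizes, so actual values will differ slightly.

```java
// Hypothetical helper: derives generation sizes (in MB) from -Xmx, -XX:NewRatio and
// -XX:SurvivorRatio using the ratio definitions above, without the JVM's alignment.
class GenerationSizes {
    static long[] split(long heapMb, int newRatio, int survivorRatio) {
        long young = heapMb / (newRatio + 1);         // young : old = 1 : newRatio
        long old = heapMb - young;
        long survivor = young / (survivorRatio + 2);  // Eden : S0 : S1 = ratio : 1 : 1
        long eden = young - 2 * survivor;
        return new long[] {young, old, eden, survivor};
    }

    public static void main(String[] args) {
        long[] s = split(1200, 2, 8); // e.g. -Xmx1200m -XX:NewRatio=2 -XX:SurvivorRatio=8
        System.out.printf("young=%dMB old=%dMB eden=%dMB survivor=%dMB each%n",
                s[0], s[1], s[2], s[3]);
        // Usable young space is Eden plus one Survivor; the other Survivor stays empty.
        System.out.printf("usable young=%dMB (%.0f%% of young)%n",
                s[2] + s[3], 100.0 * (s[2] + s[3]) / s[0]);
    }
}
```

This prints young=400MB old=800MB eden=320MB survivor=40MB each, then usable young=360MB (90% of young), matching the "Eden + 1 Survivor, or 90%" rule.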

2) Stack

Each thread pushes a stack frame onto its stack for every method it executes. Each frame contains a local variable area and an operand stack, which store temporary variables, parameters and intermediate results during the method call.

-Xss: sets the stack size of each thread. Since JDK 1.5 the default thread stack size is typically 1 MB; unless the call stack is very deep, 1 MB is generally enough.
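A quick way to see the effect of stack size is to count how deep a recursive method gets before StackOverflowError. This sketch uses the Thread constructor's stackSize argument, which plays a role similar to -Xss for a single thread; per its Javadoc the JVM may ignore that value on some platforms, and the exact depths vary with the JVM, JIT state and frame size, so no specific numbers are claimed.

```java
// Measures recursion depth before StackOverflowError for a requested thread stack size.
class StackDepth {
    private static int depth;

    private static void recurse() {
        depth++;        // each call adds one more stack frame
        recurse();
    }

    static int measure(long stackSizeBytes) throws InterruptedException {
        int[] result = new int[1];
        Thread t = new Thread(null, () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError e) {
                result[0] = depth; // frames pushed before the stack ran out
            }
        }, "depth-probe", stackSizeBytes);
        t.start();
        t.join();       // join makes result[0] visible to the calling thread
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("256 KB stack: ~" + measure(256 * 1024) + " frames");
        System.out.println("4 MB stack:   ~" + measure(4 * 1024 * 1024) + " frames");
    }
}
```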

3) Native method stack

Supports the execution of native methods and stores the state of each native method call.

4) Method Area

Holds loaded class information, static variables, final constants, and field and method information. HotSpot uses the permanent generation (Permanent Generation) to implement the method area; its minimum and maximum sizes can be set with -XX:PermSize and -XX:MaxPermSize.

Basic garbage collection algorithms

Reference counting:

An older collection algorithm. Each object keeps a count that is incremented when a reference to it is added and decremented when a reference is removed; during garbage collection, only objects whose count is 0 are collected. The fatal weakness of this algorithm is that it cannot handle circular references.
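The circular-reference problem is easy to reproduce in a toy model. The RefCounting classes below are hypothetical illustrations, not anything the JVM provides: an object is freed when its count drops to zero, and freeing it releases the references it holds. Two objects that point at each other keep each other's counts above zero after the last outside reference disappears.

```java
import java.util.ArrayList;
import java.util.List;

// Toy reference counting: each object tracks how many references point at it.
class RefCounting {
    static final class Obj {
        final String name;
        final List<Obj> fields = new ArrayList<>();
        int refCount;
        boolean freed;
        Obj(String name) { this.name = name; }
    }

    static void addRef(Obj from, Obj to) {   // models: from.field = to
        from.fields.add(to);
        to.refCount++;
    }

    static void release(Obj o) {             // drop one reference to o
        if (--o.refCount == 0) {
            o.freed = true;
            for (Obj child : o.fields) release(child); // freeing o drops its outgoing references
        }
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b");
        a.refCount = 1;   // one reference from a local variable (a root)
        addRef(a, b);     // a -> b
        addRef(b, a);     // b -> a: a cycle
        release(a);       // the local variable goes away
        // a is still referenced by b and b by a, so neither count reaches zero.
        System.out.println("a freed: " + a.freed + ", b freed: " + b.freed); // a freed: false, b freed: false
    }
}
```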

Mark-sweep:

This algorithm runs in two phases. The first phase marks all referenced objects, starting from the root references; the second phase traverses the entire heap and clears the unmarked objects. The algorithm has to pause the entire application and leaves the memory fragmented.
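Both phases, and the resulting fragmentation, can be shown with a toy heap modelled as an array of slots. This is a hypothetical sketch, not the JVM's implementation: marking follows references from the roots, sweeping visits every slot and turns dead ones into free holes (null) without moving survivors. The unreachable cycle c1/c2 is collected, which reference counting could not do.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy mark-sweep over a heap of slots.
class MarkSweep {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    static void collect(Obj[] heap, List<Obj> roots) {
        // Mark phase: worklist traversal of everything reachable from the roots.
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (o.marked) continue;
            o.marked = true;
            work.addAll(o.refs);
        }
        // Sweep phase: visit every slot; survivors stay where they are, so the
        // freed slots end up scattered between them (fragmentation).
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;
            if (heap[i].marked) heap[i].marked = false; // reset for the next cycle
            else heap[i] = null;                         // reclaim the slot
        }
    }

    static String layout(Obj[] heap) {
        StringBuilder sb = new StringBuilder();
        for (Obj o : heap) sb.append(o == null ? "_" : o.name).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Obj root = new Obj("root"), live = new Obj("live");
        Obj c1 = new Obj("c1"), c2 = new Obj("c2");
        root.refs.add(live);
        c1.refs.add(c2);
        c2.refs.add(c1); // an unreachable cycle
        Obj[] heap = {root, c1, live, c2};
        collect(heap, List.of(root));
        System.out.println(layout(heap)); // root _ live _
    }
}
```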

Copying:

This algorithm divides memory into two equal regions and uses only one of them at a time. During garbage collection, it traverses the region in use and copies the live objects into the other region. Because it only processes live objects, the copying cost is small when few objects survive, and because copying packs the objects together, there is no fragmentation. The obvious drawback is that it needs twice the memory.
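A minimal sketch of a semispace copying collector, assuming a toy heap whose cells refer to each other by index. It follows Cheney's scheme (a specific, well-known variant, not necessarily what any JVM does): roots are copied first, then to-space itself serves as the scan queue, and a forwarding table ensures shared objects are copied only once. The to-space array is as large as from-space, which is the 2x memory cost mentioned above.

```java
import java.util.Arrays;

// Toy semispace copying collector over index-addressed cells.
class SemispaceCopy {
    static final class Cell {
        final String name;
        final int[] refs; // indices of referenced cells in the current semispace
        Cell(String name, int... refs) { this.name = name; this.refs = refs; }
    }

    private final Cell[] from;
    private final Cell[] to;      // the second, equally sized semispace
    private final int[] forward;  // from-index -> to-index once copied, else -1
    private int free = 0;         // next free slot in to-space

    SemispaceCopy(Cell[] from) {
        this.from = from;
        this.to = new Cell[from.length];
        this.forward = new int[from.length];
        Arrays.fill(forward, -1);
    }

    /** Copies one cell into to-space (only once) and returns its new index. */
    private int copy(int i) {
        if (forward[i] < 0) {
            to[free] = new Cell(from[i].name, from[i].refs.clone());
            forward[i] = free++;
        }
        return forward[i];
    }

    /** Copies everything reachable from the roots and rewrites the roots in place. */
    Cell[] collect(int[] roots) {
        for (int r = 0; r < roots.length; r++) roots[r] = copy(roots[r]);
        // Cheney scan: cells between scan and free still hold from-space indices.
        for (int scan = 0; scan < free; scan++) {
            int[] refs = to[scan].refs;
            for (int k = 0; k < refs.length; k++) refs[k] = copy(refs[k]);
        }
        return Arrays.copyOf(to, free); // the live cells, packed at the start of to-space
    }

    public static void main(String[] args) {
        // from-space: 0 = root -> 2 = live; cells 1 and 3 are garbage
        Cell[] from = { new Cell("root", 2), new Cell("junk1"), new Cell("live"), new Cell("junk2", 1) };
        int[] roots = {0};
        for (Cell c : new SemispaceCopy(from).collect(roots)) {
            System.out.println(c.name + " -> " + Arrays.toString(c.refs));
        }
        // root -> [1]
        // live -> []
    }
}
```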

Mark-compact:

This algorithm combines the advantages of mark-sweep and copying. It also has two phases: the first marks all referenced objects starting from the roots; the second traverses the entire heap, clears the unmarked objects, and compacts the surviving objects into one end of the heap, laying them out contiguously. It avoids the fragmentation problem of mark-sweep and the space cost of copying.
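A minimal in-place mark-compact sketch on a toy index-addressed heap (hypothetical model, not a JVM implementation). After marking, it computes each live cell's new address, rewrites references and roots, then slides live cells toward slot 0, so free space ends up as one contiguous block without needing a second heap.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy mark-compact: mark, compute new addresses, update references, slide.
class MarkCompact {
    static final class Cell {
        final String name;
        final int[] refs; // indices of referenced cells
        boolean marked;
        Cell(String name, int... refs) { this.name = name; this.refs = refs; }
    }

    /** Returns the first free slot; everything from there to the end is free. */
    static int collect(Cell[] heap, int[] roots) {
        // Phase 1: mark everything reachable from the roots.
        Deque<Integer> work = new ArrayDeque<>();
        for (int r : roots) work.push(r);
        while (!work.isEmpty()) {
            Cell c = heap[work.pop()];
            if (c.marked) continue;
            c.marked = true;
            for (int ref : c.refs) work.push(ref);
        }
        // Phase 2a: a live cell's new address is its rank among live cells.
        int[] newAddr = new int[heap.length];
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && heap[i].marked) newAddr[i] = free++;
        }
        // Phase 2b: rewrite references and roots to the new addresses.
        for (Cell c : heap) {
            if (c != null && c.marked) {
                for (int k = 0; k < c.refs.length; k++) c.refs[k] = newAddr[c.refs[k]];
            }
        }
        for (int r = 0; r < roots.length; r++) roots[r] = newAddr[roots[r]];
        // Phase 2c: slide live cells down (a destination never passes its source).
        for (int i = 0; i < heap.length; i++) {
            Cell c = heap[i];
            if (c != null && c.marked) {
                c.marked = false;
                heap[newAddr[i]] = c;
            }
        }
        Arrays.fill(heap, free, heap.length, null); // one contiguous free block
        return free;
    }

    public static void main(String[] args) {
        // Slot 1 ("a", the root) and slot 3 ("b", referenced by a) are live.
        Cell[] heap = { new Cell("g1"), new Cell("a", 3), new Cell("g2"), new Cell("b") };
        int[] roots = {1};
        int free = collect(heap, roots);
        StringBuilder sb = new StringBuilder();
        for (Cell c : heap) sb.append(c == null ? "_" : c.name).append(' ');
        System.out.println(sb.toString().trim()); // a b _ _
        System.out.println("free from slot " + free + ", root at slot " + roots[0]
                + ", a.refs = " + Arrays.toString(heap[0].refs)); // free from slot 2, root at slot 0, a.refs = [1]
    }
}
```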

The JVM uses different garbage collection mechanisms for the young and old generations.

Young generation GC:

Objects in the young generation usually have short lifetimes, so it is collected with a copying algorithm: the collector scans for live objects and copies them into a completely unused space. In the young generation this means copying the live objects from Eden and the From space into the To space. The young generation uses a free pointer to decide when to trigger GC: the pointer marks the end of the last allocated object in the young generation, and each new allocation checks whether enough space remains after it; if not, a GC is triggered. As objects keep being allocated, they gradually move from Eden to the Survivor spaces and finally to the old generation.
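The free-pointer mechanism amounts to bump-pointer allocation. In this hypothetical EdenAllocator, allocating only advances a pointer; when a request would pass the end of Eden, a minor GC is triggered. To keep the sketch short, the "GC" assumes everything in Eden is dead and just resets the pointer, whereas a real collector first copies live objects to a Survivor space.

```java
// Toy bump-pointer ("free pointer") allocator for Eden.
class EdenAllocator {
    private final long capacity;
    private long top = 0;   // free pointer: end of the last allocated object
    int minorGcCount = 0;

    EdenAllocator(long capacity) { this.capacity = capacity; }

    /** Returns the start offset of the new object within Eden. */
    long allocate(long size) {
        // A real JVM may place such objects directly in the old generation instead.
        if (size > capacity) throw new IllegalArgumentException("too large for Eden");
        if (top + size > capacity) {
            minorGc();      // not enough room after the free pointer
        }
        long start = top;
        top += size;        // bump the pointer past the new object
        return start;
    }

    private void minorGc() {
        minorGcCount++;
        top = 0;            // toy assumption: nothing in Eden survives
    }

    public static void main(String[] args) {
        EdenAllocator eden = new EdenAllocator(100);
        for (int i = 0; i < 10; i++) {
            System.out.println("object " + i + " at offset " + eden.allocate(30));
        }
        System.out.println("minor GCs: " + eden.minorGcCount); // minor GCs: 3
    }
}
```

With a 100-unit Eden and 30-unit objects, three objects fit (offsets 0, 30, 60), so every fourth allocation triggers a minor GC.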

For the young generation, the JVM provides three execution mechanisms: serial GC (Serial), parallel scavenge GC (Parallel Scavenge), and parallel GC (ParNew).

1) Serial GC

Scans and copies with a single thread. It suits single-CPU machines, small young generations, and applications without strict pause-time requirements. It is the default GC for client-class machines and can be forced with -XX:+UseSerialGC.

2) Parallel scavenge GC

Scans and copies with multiple threads. It suits multi-CPU machines and applications that need short pause times. It is the default GC for server-class machines and can be forced with -XX:+UseParallelGC; -XX:ParallelGCThreads=4 sets the number of threads.

3) Parallel GC (ParNew)

Used together with the concurrent GC (CMS) for the old generation.

Old generation GC:

Compared with the young generation, objects in the old generation live longer and are more stable, so the old generation is collected with mark-based algorithms: the collector scans for and marks the live objects and then reclaims the unmarked ones. The freed space is then either compacted or recorded for later allocation; either way, the goal is to reduce the efficiency loss caused by memory fragmentation. For the old generation the JVM provides serial GC (Serial MSC), parallel GC (Parallel MSC), and concurrent GC (CMS); the details of these algorithms deserve further study.

These GC mechanisms must be combined, as specified in the following table:

Flag(s)	Young generation GC	Old generation GC
-XX:+UseSerialGC	Serial GC	Serial GC
-XX:+UseParallelGC	Parallel scavenge GC	Parallel GC
-XX:+UseConcMarkSweepGC	Parallel GC (ParNew)	Concurrent GC
-XX:+UseParNewGC	Parallel GC (ParNew)	Serial GC
-XX:+UseParallelOldGC	Parallel scavenge GC	Parallel GC
-XX:+UseConcMarkSweepGC -XX:-UseParNewGC	Serial GC	Concurrent GC
Unsupported combinations	1. -XX:+UseParNewGC -XX:+UseParallelOldGC; 2. -XX:+UseParNewGC -XX:+UseSerialGC

IV. JVM memory tuning

Note first that when tuning JVM memory, you should not rely only on the memory usage of the Java process as reported by the OS. That figure does not reflect actual heap usage, because it does not shrink after a GC. Use the memory monitoring tools that ship with the JDK instead, such as JConsole and Java VisualVM.
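The heap and GC figures that JConsole and VisualVM display are also available programmatically through the standard java.lang.management API. This sketch prints heap used/committed/max and each collector's cumulative count and time; collector names depend on which GC is active (for example "PS Scavenge" and "PS MarkSweep" with the parallel collector), and the class name HeapReport is illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Reads heap usage and per-collector GC statistics via the platform MXBeans.
class HeapReport {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024 * 1024;
        // used: occupied heap; committed: heap memory committed for the JVM's use;
        // max: roughly the -Xmx limit (getMax() returns -1 if it is undefined).
        System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                heap.getUsed() / mb, heap.getCommitted() / mb, heap.getMax() / mb);
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```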

System-level JVM memory tuning mainly aims to reduce how often GC runs and how many Full GCs occur, because excessive GC and Full GC consume a lot of system resources (mainly CPU) and hurt system throughput. Pay particular attention to Full GC, since it processes the entire heap. A Full GC is generally caused by one of the following situations:

Insufficient old generation space
When tuning, let objects be collected by young generation GC as much as possible, let them stay in the young generation for a while, and avoid creating overly large objects and arrays, since they are allocated directly in the old generation.

Insufficient permanent generation space
Increase the Perm Gen space and avoid too many static objects.

The average size of objects promoted to the old generation per GC, as measured by statistics, is larger than the remaining old generation space
Control the proportions of the young and old generations.

System.gc() is called explicitly
Do not trigger garbage collection manually; rely on the JVM's own mechanisms.

Tuning is done by controlling the proportions of the heap regions and the GC strategy. The following are the consequences of poor settings for each part:

1) Young generation too small

First, young GCs become very frequent, increasing system overhead; second, large objects go directly into the old generation, filling its remaining space and inducing Full GC.

2) Young generation too large

First, because the total heap size is fixed, an oversized young generation leaves the old generation too small, which induces Full GC; second, young GC takes significantly longer.

In general, a young generation of about 1/3 of the whole heap is appropriate.

3) Survivor space too small

Objects go directly from Eden to the old generation, shortening their time in the young generation.

4) Survivor space too large

Eden becomes too small, increasing GC frequency.

In addition, -XX:MaxTenuringThreshold=n controls how long objects stay in the young generation (how many young GCs they survive before promotion), so that as many objects as possible are collected there.
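The effect of the tenuring threshold can be illustrated with a simplified simulation. The Tenuring class is hypothetical and deliberately ignores that HotSpot also adjusts the threshold dynamically based on Survivor occupancy: each minor GC an object survives increments its age, and once the age reaches the threshold the object is promoted. With a low threshold, medium-lived objects are promoted and later become old-generation garbage; with a higher one they die in the young generation.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of -XX:MaxTenuringThreshold: age-based promotion only.
class Tenuring {
    static final class Obj {
        final int lifetime; // number of minor GCs the object stays reachable for
        int age;
        Obj(int lifetime) { this.lifetime = lifetime; }
    }

    /** Returns how many objects are promoted to the old generation. */
    static int simulate(List<Obj> young, int maxTenuringThreshold, int minorGcs) {
        int promoted = 0;
        for (int gc = 0; gc < minorGcs; gc++) {
            List<Obj> survivors = new ArrayList<>();
            for (Obj o : young) {
                if (o.lifetime <= gc) continue;               // dead: reclaimed in the young generation
                o.age++;                                      // survived another minor GC
                if (o.age >= maxTenuringThreshold) promoted++; // tenured into the old generation
                else survivors.add(o);                        // stays in a Survivor space
            }
            young = survivors;
        }
        return promoted;
    }

    static List<Obj> workload() {
        List<Obj> objs = new ArrayList<>();
        for (int lifetime : new int[] {1, 2, 3, 5, 20}) objs.add(new Obj(lifetime));
        return objs;
    }

    public static void main(String[] args) {
        System.out.println("threshold 2:  promoted " + simulate(workload(), 2, 20));  // 4
        System.out.println("threshold 15: promoted " + simulate(workload(), 15, 20)); // 1
    }
}
```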

Given the many GC strategies and combinations available for the young and old generations, choosing among them is a challenge for developers, so the JVM provides two simpler GC policy settings:

1) Throughput first

The JVM chooses the GC strategy and controls the size ratio of the young and old generations to meet a throughput target, which can be set with -XX:GCTimeRatio=n.

2) Pause time first

The JVM treats pause time as the target: it chooses the GC strategy and controls the size ratio of the young and old generations so that the application pause caused by each GC stays within the specified limit, which can be set with -XX:MaxGCPauseMillis=n.

Finally, a summary of common JVM configuration options

Heap settings

-Xms: initial heap size

-Xmx: maximum heap size

-XX:NewSize=n: sets the young generation size

-XX:NewRatio=n: sets the ratio of the old generation to the young generation. For example, 3 means young:old = 1:3, so the young generation takes 1/4 of the young and old generations combined.

-XX:SurvivorRatio=n: ratio of Eden to the Survivor spaces in the young generation. Note that there are two Survivor spaces. For example, 3 means Eden:Survivor = 3:2, so one Survivor space takes 1/5 of the young generation.

-XX:MaxPermSize=n: sets the maximum permanent generation size

Collector settings

-XX:+UseSerialGC: use the serial collector

-XX:+UseParallelGC: use the parallel collector

-XX:+UseParallelOldGC: use the parallel old generation collector

-XX:+UseConcMarkSweepGC: use the concurrent collector

Garbage collection statistics

-XX:+PrintGC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-Xloggc:filename

Parallel collector settings

-XX:ParallelGCThreads=n: sets the number of CPUs used by the parallel collector, i.e. the number of parallel collection threads.

-XX:MaxGCPauseMillis=n: sets the maximum pause time for parallel collection

-XX:GCTimeRatio=n: sets the target share of run time spent on garbage collection, computed as 1/(1+n)
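The 1/(1+n) formula is easier to read with concrete values. This small worked example (the class name GcTimeRatio is illustrative) converts a few -XX:GCTimeRatio settings into the corresponding GC time goal: n=19 means GC should take at most 1/20 = 5% of total time, and n=99 at most 1/100 = 1%.

```java
// Worked arithmetic for -XX:GCTimeRatio=n: target GC share of total time = 1 / (1 + n).
class GcTimeRatio {
    static double maxGcFraction(int n) {
        return 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {19, 99}) {
            System.out.printf("GCTimeRatio=%d -> GC time goal <= %.0f%%%n", n, 100 * maxGcFraction(n));
        }
        // GCTimeRatio=19 -> GC time goal <= 5%
        // GCTimeRatio=99 -> GC time goal <= 1%
    }
}
```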

Concurrent collector settings

-XX:+CMSIncrementalMode: enables incremental mode, intended for single-CPU machines.

-XX:ParallelGCThreads=n: when the concurrent collector collects the young generation in parallel, sets the number of CPUs used, i.e. the number of parallel collection threads.

This article is from the "Little Water Drop" blog; please keep this source: http://wangzan18.blog.51cto.com/8021085/1692220


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
