JVM principle, memory model and GC mechanism

1. JVM Memory Model

The JVM memory model consists of the program counter, native method stack, method area, Java stack (virtual machine stack), Java heap, and other implicit registers.

1.1 Program Counter:

The program counter is a small memory area that can be viewed as a line number indicator for the bytecode executed by the current thread. Basic functions such as branching, looping, jumping, exception handling, and thread recovery all rely on this counter.

Because multithreading in the Java virtual machine is implemented by switching between threads and allocating processor execution time to them in turn, a processor (or one core of a multi-core processor) executes instructions from only one thread at any given moment. Therefore, in order to return to the correct execution position after a thread switch, each thread needs its own program counter. The counters of different threads do not affect each other and are stored in isolation. Memory areas of this kind are called "thread-private" memory.

If the thread is executing a Java method, this counter records the address of the virtual machine bytecode instruction being executed. If a native method is executing, the counter value is empty (undefined).

1.2 Native method stack:

The native method stack (Native Method Stacks) plays a role very similar to that of the virtual machine stack. The difference is that the virtual machine stack serves the Java methods (that is, bytecode) executed by the virtual machine, while the native method stack serves the native methods used by the virtual machine. The virtual machine specification does not mandate the language, usage, or data structures of the methods in the native method stack, so a specific virtual machine is free to implement it as it sees fit. Some virtual machines (such as the Sun HotSpot virtual machine) even combine the native method stack with the virtual machine stack.

As with the virtual machine stack, the native method stack can also throw StackOverflowError and OutOfMemoryError.

1.3 Method Area:

Inside a JVM instance, type information is stored in a logical memory area called the method area. The class loader extracts the type information from the class file when the class is loaded. Class (static) variables are also stored in the method area.

Simply put, the method area stores the metadata of types. A .class file is the representation of a class before it is used by the Java virtual machine. Once the class is used, the .class file is loaded, linked (verified, prepared, resolved), and initialized. The result of loading is that the .class file is transformed into a specific data structure in the method area. This data structure stores the following information:

1.3.1 Type Information

Fully qualified name of this type

The fully qualified name of the direct superclass of this type

Whether this type is a class type or an interface type

Access modifiers of this type

Ordered list of the fully qualified names of any direct superinterfaces

1.3.2 Field Information

Field name

Field type

Modifier for field

1.3.3 Method Information

Method name

Method return type

Number and type of method arguments (in order)

Modifiers for the method

1.3.4 Other Information

All class (static) variables except constants

A pointer to ClassLoader

A pointer to the Class object

Constant pool (constant data and symbolic references to other types)
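Much of the information listed above can be inspected from running code through reflection. The following is a minimal sketch, not a view into the method area itself: `TypeInfoDemo`, `Sample`, and `describe` are hypothetical names introduced only for this illustration, and the output order of fields and methods is not guaranteed.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class TypeInfoDemo {
    // Hypothetical sample type used only for this illustration.
    static class Sample extends ArrayList<String> implements Runnable {
        static int counter = 0;   // class (static) variable
        private String label;     // instance field

        @Override
        public void run() { counter++; }
    }

    // Collects some of the type, field, and method information for a class.
    static List<String> describe(Class<?> type) {
        List<String> lines = new ArrayList<>();
        Class<?> superclass = type.getSuperclass();
        lines.add("name: " + type.getName());
        lines.add("superclass: " + (superclass == null ? "none" : superclass.getName()));
        lines.add("kind: " + (type.isInterface() ? "interface" : "class"));
        lines.add("modifiers: " + Modifier.toString(type.getModifiers()));
        for (Class<?> itf : type.getInterfaces()) {
            lines.add("interface: " + itf.getName());
        }
        for (Field f : type.getDeclaredFields()) {
            lines.add("field: " + Modifier.toString(f.getModifiers())
                    + " " + f.getType().getSimpleName() + " " + f.getName());
        }
        for (Method m : type.getDeclaredMethods()) {
            lines.add("method: " + m.getReturnType().getSimpleName()
                    + " " + m.getName() + " (" + m.getParameterCount() + " parameters)");
        }
        lines.add("class loader: " + type.getClassLoader());
        return lines;
    }

    public static void main(String[] args) {
        describe(Sample.class).forEach(System.out::println);
    }
}
```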

The method area has the following main characteristics:

1. The method area is thread-safe. Because all threads share the method area, access to its data must be designed to be thread-safe. For example, if two threads attempt to access the same class in the method area and the class has not yet been loaded into the JVM, only one thread is allowed to load it while the other threads must wait.

2. The size of the method area does not have to be fixed; the JVM can adjust it dynamically according to the application's needs. The method area also need not be contiguous, and it can be allocated freely in a heap (even the JVM's own heap).

3. The method area can also be garbage collected. When a class is no longer in use (unreachable), the JVM can unload it as part of garbage collection.

You can limit the size of the method area (implemented as the permanent generation in HotSpot before Java 8) with the -XX:PermSize and -XX:MaxPermSize parameters.
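The first characteristic can be observed indirectly from Java code: when several threads touch a class for the first time at about the same moment, its static initializer still runs only once. The sketch below is an illustration under that assumption; `ClassInitOnceDemo`, `Lazy`, and `initializationsSeenBy` are hypothetical names, and the example shows class initialization (which follows loading) rather than the internals of the method area.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class ClassInitOnceDemo {
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    // Hypothetical class whose static initializer records each time it runs.
    static class Lazy {
        static final long LOADED_AT;
        static {
            INIT_COUNT.incrementAndGet();
            LOADED_AT = System.nanoTime();
        }
    }

    // Starts several threads that all touch Lazy at about the same time
    // and returns how many times the static initializer has run.
    static int initializationsSeenBy(int threadCount) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long ignored = Lazy.LOADED_AT; // first access triggers initialization
            });
            threads[i].start();
        }
        start.countDown();
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return INIT_COUNT.get();
    }

    public static void main(String[] args) {
        System.out.println("static initializer ran " + initializationsSeenBy(8) + " time(s)");
    }
}
```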

1.4 Virtual machine stack (Java stack)

The virtual machine stack is thread-private, and its life cycle is the same as the thread's. It describes the memory model of Java method execution: each time a method is executed, a stack frame is created to store information such as the local variable table, operand stack, dynamic link, and method exit.

For more about stack frames, refer to the detailed description of the JVM runtime stack frame structure.

An animation is produced by switching continuously through a sequence of frames. The operation of the virtual machine is similar: every program running in the virtual machine is also the result of many frames being switched, except that these frames store a method's local variables, operand stack, dynamic link, return address, and some additional information. The process from a method being invoked to its completion corresponds to a stack frame being pushed onto and popped off the virtual machine stack.

For the execution engine, in the active thread only the stack frame at the top of the stack is valid. It is called the current stack frame, and the method associated with it is called the current method. All bytecode instructions run by the execution engine operate only on the current stack frame.
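Because every method call pushes one frame, unbounded recursion eventually exhausts the thread's stack. The sketch below counts how many calls succeed before a StackOverflowError is thrown; `StackDepthDemo` and its methods are hypothetical names, and the number printed depends on the JVM, the frame size, and the stack size setting.

```java
class StackDepthDemo {
    private static int depth;

    private static void recurse() {
        depth++;     // one more frame is now on this thread's stack
        recurse();   // no base case: frames pile up until the stack is full
    }

    // Returns how many nested calls were made before the stack overflowed.
    static int maxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames were popped while the error propagated back here.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("calls before StackOverflowError: " + maxDepth());
    }
}
```

Running the same class with a smaller thread stack size (the -Xss option) should lower the reported depth.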

1.5 Heap

The heap is the largest block of memory managed by the Java virtual machine. The Java heap is a memory area shared by all threads and is created when the virtual machine starts. The sole purpose of this area is to hold object instances, and almost all object instances are allocated here. However, with the development of JIT compilers and the gradual maturation of escape analysis, optimizations such as stack allocation and scalar replacement bring some subtle changes, so the statement that all objects are allocated on the heap is gradually becoming less "absolute".

The heap is the main area managed by the garbage collector and is therefore often called the "GC heap".

1.5.1 Notes on heap memory and stack memory:

Local variables of primitive types are allocated directly in stack space. The formal parameters of a method are also allocated directly in stack space and are reclaimed from it when the method call completes. Reference data types must be created with new: a reference (an address) is allocated in stack space, while the object's instance variables are allocated in heap space. Reference parameters of a method are given an address slot in stack space that points to the object area in heap space, and the slot is reclaimed from stack space when the method call completes. When a local variable is created with new, space is allocated in both stack space and heap space. When the local variable's life cycle ends, the stack space is reclaimed immediately, while the heap space waits for the GC to reclaim it. Literal arguments passed in a method call are first allocated in stack space and released from it after the call completes. String constants and static variables are allocated in the data area, while this is allocated in heap space. For an array, the array name (the reference) is allocated in stack space and the actual array storage is allocated in heap space.
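The difference between a value held in a stack frame and an object held on the heap shows up when both are passed to a method. In this small sketch (`StackVsHeapDemo`, `Counter`, and `bump` are hypothetical names), the callee gets its own copy of the primitive but shares the single heap object through a copied reference.

```java
class StackVsHeapDemo {
    static class Counter {
        int value;   // instance variable, stored with the object on the heap
    }

    static void bump(int primitive, Counter shared) {
        primitive++;      // changes only this frame's copy
        shared.value++;   // changes the one heap object both frames refer to
    }

    // Returns {local, counter.value} after the call to bump.
    static int[] run() {
        int local = 1;                     // primitive local variable in the stack frame
        Counter counter = new Counter();   // reference in the frame, object on the heap
        bump(local, counter);
        return new int[] { local, counter.value };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("local = " + result[0] + ", counter.value = " + result[1]);
        // prints "local = 1, counter.value = 1"
    }
}
```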

Summary

Name	Characteristics	Function	Configuration parameters	Exception
Program counter	Small memory footprint; thread-private; life cycle is the same as the thread's	Roughly a bytecode line number indicator	None	None
Virtual machine stack	Thread-private; life cycle is the same as the thread's; uses contiguous memory space	Memory model of Java method execution; stores the local variable table, operand stack, dynamic link, method exit, and other information	-Xss	StackOverflowError, OutOfMemoryError
Java heap	Shared by threads; life cycle is the same as the virtual machine's; need not use contiguous memory addresses	Stores object instances; all object instances (including arrays) are allocated on the heap	-Xms -Xmx -Xmn	OutOfMemoryError
Method area	Shared by threads; life cycle is the same as the virtual machine's; need not use contiguous memory addresses	Stores data loaded by the virtual machine, such as class information, constants, static variables, and code compiled by the just-in-time compiler	-XX:PermSize=16M -XX:MaxPermSize=64M	OutOfMemoryError
Runtime constant pool	Part of the method area; dynamic	Stores literals and symbolic references
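To relate the table to a running JVM, the standard java.lang.management API can list the memory pools the virtual machine exposes. This is only a sketch: `MemoryAreasDemo` and `describePools` are hypothetical names, and the pool names that appear depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class MemoryAreasDemo {
    // Returns one line per memory pool: type, name, used bytes, and max bytes.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(pool.getType() + " | " + pool.getName()
                    + " | used=" + usage.getUsed()
                    + " | max=" + usage.getMax()); // -1 means no defined maximum
        }
        return lines;
    }

    public static void main(String[] args) {
        // maxMemory() reflects the heap limit, which -Xmx controls.
        System.out.println("max heap bytes: " + Runtime.getRuntime().maxMemory());
        describePools().forEach(System.out::println);
    }
}
```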

2. GC mechanism

The garbage collector usually has to do two things: detect garbage and reclaim garbage. How is garbage detected? In general, there are the following methods:

2.1 Reference counting method:

Add a reference counter to each object. Whenever a reference to the object is created, the counter is incremented by 1; whenever a reference expires, the counter is decremented by 1.

Here is the problem: suppose two objects A and B reference each other, and no other object references them. These two objects are in fact already inaccessible, which makes them the garbage objects we are talking about. But because they reference each other, their counts are not 0, so they cannot be reclaimed. Hence there is another approach:
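The cycle problem can be reproduced with a toy model of reference counting. The JVM does not expose reference counts, so `RefCountDemo`, `Node`, and `addReference` below are hypothetical names that simulate the bookkeeping by hand.

```java
import java.util.ArrayList;
import java.util.List;

class RefCountDemo {
    // Hypothetical object model: each node tracks how many references point at it.
    static class Node {
        final String name;
        int refCount;
        final List<Node> fields = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static void addReference(Node from, Node to) {
        from.fields.add(to);
        to.refCount++;
    }

    // Returns {count of A, count of B} after the program drops its own references.
    static int[] cycleCounts() {
        Node a = new Node("A");
        Node b = new Node("B");
        a.refCount++;          // reference held by a local variable
        b.refCount++;
        addReference(a, b);    // A's field points to B
        addReference(b, a);    // B's field points to A
        a.refCount--;          // the local variables go out of scope
        b.refCount--;
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        System.out.println("A=" + counts[0] + ", B=" + counts[1]);
        // prints "A=1, B=1": neither count reaches 0, so neither would be reclaimed
    }
}
```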

2.2 Reachability analysis algorithm:

The search starts from the root set objects, and any object that cannot be reached is a garbage object. The root set typically includes objects referenced from the Java stack, objects referenced from the constant pool in the method area, objects referenced from native methods, and so on.

In short, when the JVM performs garbage collection, it checks whether each object in the heap is reachable from the root set objects; objects that cannot be reached are reclaimed by the garbage collector.
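A reachability search can be sketched as a graph traversal from the roots. The model below is a simplification with hypothetical names (`ReachabilityDemo`, `Obj`, `reachable`); it shows that two objects referencing each other are still identified as garbage when no root leads to them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReachabilityDemo {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Marks every object that can be reached by following references from the roots.
    static Set<Obj> reachable(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (marked.add(current)) {       // true only the first time we see it
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    // Builds a tiny heap and returns the names of the unreachable objects.
    static List<String> garbageNames() {
        Obj root = new Obj("root");
        Obj live = new Obj("live");
        Obj a = new Obj("A");
        Obj b = new Obj("B");
        root.refs.add(live);
        a.refs.add(b);   // A and B reference each other,
        b.refs.add(a);   // but nothing reachable from the root points to them
        List<Obj> heap = Arrays.asList(root, live, a, b);
        Set<Obj> marked = reachable(Arrays.asList(root));
        List<String> garbage = new ArrayList<>();
        for (Obj o : heap) {
            if (!marked.contains(o)) {
                garbage.add(o.name);
            }
        }
        return garbage;
    }

    public static void main(String[] args) {
        System.out.println("garbage: " + garbageNames()); // prints "garbage: [A, B]"
    }
}
```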

2.3 Common collection algorithms:

2.3.1 By basic collection strategy

(1) Mark-sweep

As the name suggests, the algorithm is divided into two phases: marking and sweeping. It first marks all objects that need to be reclaimed and then reclaims them all together. This is the most basic algorithm, and subsequent collection algorithms extend it.

Drawbacks: it is inefficient, and sweeping leaves a lot of memory fragmentation.
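The fragmentation drawback can be seen in a toy heap modeled as an array of slots. This sketch assumes the marking phase has already produced the set of surviving objects; `MarkSweepDemo`, `sweep`, and `largestFreeRun` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class MarkSweepDemo {
    // Returns a swept copy of the heap: slots holding unmarked objects become free (null).
    static String[] sweep(String[] heap, Set<String> marked) {
        String[] after = heap.clone();
        for (int i = 0; i < after.length; i++) {
            if (after[i] != null && !marked.contains(after[i])) {
                after[i] = null;
            }
        }
        return after;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e", "f" };
        Set<String> marked = new HashSet<>(Arrays.asList("a", "c", "e"));
        String[] after = sweep(heap, marked);
        System.out.println(Arrays.toString(after));
        // prints "[a, null, c, null, e, null]"
        System.out.println("largest free run: " + largestFreeRun(after));
        // prints "largest free run: 1"
    }
}
```

Three slots are free after the sweep, but no two of them are adjacent, so an object needing two slots could not be placed.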

(2) Copying

This algorithm divides the memory space into two equal regions and uses only one of them at a time. During garbage collection, it traverses the region currently in use and copies the objects still in use to the other region. Because the algorithm only processes objects that are in use, the copying cost is small, and memory can be tidied up as objects are copied, so there is no "fragmentation" problem. The disadvantage is also obvious: it requires twice the memory space.
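A minimal sketch of the copying step, again on a toy heap of slots and assuming the set of live objects is already known. `CopyingDemo` and `copyLive` are hypothetical names; a real collector discovers live objects while copying rather than receiving them as a set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class CopyingDemo {
    // Copies live objects from the region in use into the empty region, packed from index 0.
    static String[] copyLive(String[] fromSpace, Set<String> live) {
        String[] toSpace = new String[fromSpace.length];
        int next = 0;
        for (String obj : fromSpace) {
            if (obj != null && live.contains(obj)) {
                toSpace[next++] = obj;
            }
        }
        return toSpace; // the old region can now be discarded as a whole
    }

    public static void main(String[] args) {
        String[] fromSpace = { "a", "b", "c", "d" };
        Set<String> live = new HashSet<>(Arrays.asList("a", "c"));
        System.out.println(Arrays.toString(copyLive(fromSpace, live)));
        // prints "[a, c, null, null]"
    }
}
```

The survivors end up contiguous, at the price of keeping a second region of the same size idle.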

(3) Mark-compact

This algorithm combines the advantages of the "mark-sweep" and "copying" algorithms. It also has two phases: the first phase marks all referenced objects starting from the root nodes; the second phase traverses the entire heap, clears the unmarked objects, and "compacts" the surviving objects into one part of the heap, laying them out in order. This avoids both the fragmentation problem of "mark-sweep" and the space problem of "copying".
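The second phase can be sketched as sliding the marked objects toward one end of the same array, so no second region is needed. `MarkCompactDemo` and `compact` are hypothetical names, and the marked set is assumed to come from the first phase.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class MarkCompactDemo {
    // Slides marked objects toward index 0 in place and frees the tail.
    // Returns the index of the first free slot after compaction.
    static int compact(String[] heap, Set<String> marked) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            String obj = heap[i];
            if (obj != null && marked.contains(obj)) {
                heap[next++] = obj;   // move the survivor down to the next packed slot
            }
        }
        Arrays.fill(heap, next, heap.length, null);
        return next;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e", "f" };
        Set<String> marked = new HashSet<>(Arrays.asList("a", "c", "e"));
        int firstFree = compact(heap, marked);
        System.out.println(Arrays.toString(heap) + ", first free slot: " + firstFree);
        // prints "[a, c, e, null, null, null], first free slot: 3"
    }
}
```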

2.3.2 By partitioning approach

(1) Incremental collection: a real-time garbage collection algorithm, that is, garbage collection is performed while the application is running. It is unclear why the collectors in JDK 5.0 do not use this algorithm.

(2) Generational collection: a garbage collection algorithm based on analysis of object life cycles. Objects are divided into the young generation, the old generation, and the permanent generation, and objects with different life cycles are collected with different algorithms (chosen from the methods above). Current garbage collectors (starting from J2SE 1.2) all use this algorithm.

2.3.3 By system threads

(1) Serial collection: serial collection uses a single thread for all garbage collection work. Because no multithreaded interaction is needed, it is easy to implement and efficient. However, its limitation is also obvious: it cannot take advantage of multiple processors, so it is suited to single-processor machines. It can also be used on multiprocessor machines with small amounts of data (around 100 MB).

(2) Parallel collection: parallel collection uses multiple threads for garbage collection work, so it is fast and efficient. In theory, the more CPUs there are, the more the advantages of a parallel collector show.

(3) Concurrent collection: serial and parallel collection both need to pause the entire runtime environment while garbage collection is in progress, so that only the garbage collector is running. The system therefore pauses noticeably during garbage collection, and the pause grows longer as the heap grows larger. Concurrent collection, by contrast, performs most of its work while the application continues to run.

(Note: referenced from http://pengjiaheng.iteye.com/blog/520228)
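Which collectors a running JVM is actually using can be checked with the standard java.lang.management API. This is a small sketch with hypothetical names (`CollectorInfoDemo`, `collectors`); the collector names reported differ between JVM versions and settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class CollectorInfoDemo {
    // Returns one line per active collector with its collection count and total time.
    static List<String> collectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        collectors().forEach(System.out::println);
    }
}
```

On HotSpot, starting the program with -XX:+UseSerialGC or -XX:+UseParallelGC should change the names that are listed.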

The following section on generational GC is mainly based on http://blog.csdn.net/suifeng3051/article/details/48292193. Thanks to the original author.

2.4 Virtual machine GC process

2.4.1 Why collect by generation

At the outset, the JVM's GC used mark-sweep-compact, which is not very efficient: as more and more objects are allocated, the list of objects grows, and scanning and moving them takes more and more time, so memory reclamation becomes slower and slower. However, analysis of Java applications shows that most objects survive for a very short time and only a small number live relatively long, as statistics on the survival time of Java objects in memory show.

According to these statistics, most objects live for very short periods of time, and fewer and fewer objects remain allocated as time passes.

2.4.2 GC process of the virtual machine

Now that we know why the JVM collects by generation, let's take a look at the entire collection process.

In the initial phase, newly created objects are allocated in the Eden area, and both survivor spaces are empty.

When Eden is full, a minor GC is triggered.

After scanning and marking, the surviving objects are copied to S0, and the objects that did not survive are reclaimed.

In the next minor GC, Eden is handled as above: objects that are no longer referenced are reclaimed, and surviving objects are copied to a survivor space. This time, however, all surviving data in S0 is copied to S1. Note that the objects moved to S0 during the previous minor GC have their age incremented by 1 when they are copied to S1. At this point Eden and S0 are emptied, all surviving data has been copied to S1, and S1 contains objects of different ages.

The next minor GC repeats the process, but this time the two survivor spaces swap roles: the surviving objects are copied to S0, their ages are incremented by 1, and Eden and the other survivor space are emptied.

Next comes promotion. After several minor GCs, when the age of a surviving object reaches a threshold (configurable with -XX:MaxTenuringThreshold; 8 in this example, while the usual HotSpot default is 15), it is promoted from the young generation to the old generation.

As minor GCs occur again and again, new objects keep being promoted to the old generation.

This basically covers the entire collection process in the young generation. Eventually, a major GC occurs in the old generation, and the old generation space is cleared and compacted.

As the process above shows, Eden is a contiguous space and one of the survivor spaces is always empty. After a GC and copy, one survivor space holds the currently live objects, while the contents of Eden and the other survivor space are no longer needed and can be emptied directly. At the next GC, the two survivor spaces swap roles. This way of allocating and cleaning up memory is therefore extremely efficient. This kind of garbage collection is the well-known "stop-and-copy" method (copying the objects still alive in Eden and one survivor space to the other survivor space). This does not mean that stop-and-copy is always efficient; it is efficient only in this situation (based on the fact that most objects have a very short lifetime). Using stop-and-copy in the old generation would be very inappropriate.
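The young generation process described above can be modeled in a few lines. This is a simplified simulation, not how HotSpot is implemented: `YoungGenDemo` and its members are hypothetical names, the `live` flag stands in for reachability from the root set, and the tenuring threshold of 3 is chosen only to keep the run short.

```java
import java.util.ArrayList;
import java.util.List;

class YoungGenDemo {
    static class Obj {
        final String name;
        int age;
        boolean live = true;   // stands in for "still reachable from the root set"
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();   // survivor space currently holding objects
    List<Obj> to = new ArrayList<>();     // survivor space kept empty
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;

    YoungGenDemo(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);           // new objects start in Eden
        return o;
    }

    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) {
                continue;      // dead objects are simply not copied
            }
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o);    // promotion to the old generation
            } else {
                to.add(o);     // copy to the empty survivor space
            }
        }
        eden.clear();          // Eden and the old "from" space are emptied wholesale
        from.clear();
        List<Obj> tmp = from;  // the two survivor spaces swap roles
        from = to;
        to = tmp;
    }

    static String demo() {
        YoungGenDemo heap = new YoungGenDemo(3);
        heap.allocate("longLived");
        heap.allocate("shortLived").live = false;
        StringBuilder log = new StringBuilder();
        for (int gc = 1; gc <= 3; gc++) {
            heap.minorGc();
            log.append("gc").append(gc)
               .append(" survivor=").append(heap.from.size())
               .append(" old=").append(heap.old.size()).append('\n');
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

In this run the short-lived object disappears at the first minor GC, while the long-lived one moves between the survivor spaces twice and is promoted on the third collection.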

The old generation stores many more objects than the young generation, including large objects, so cleaning up old generation memory with the stop-and-copy algorithm would be very inefficient. In general, the old generation uses the mark-compact algorithm, which marks the surviving objects (those that are referenced) and moves all of them to one end to keep memory contiguous. When a minor GC occurs, the virtual machine checks whether the size of the objects promoted to the old generation on each minor GC is greater than the space remaining in the old generation. If it is, a full GC is triggered directly. Otherwise, the virtual machine checks whether -XX:+HandlePromotionFailure is set (that is, whether the promotion guarantee is allowed to fail). If it is allowed, only the minor GC is performed and a memory allocation failure can be tolerated; if it is not allowed, a full GC is still performed. This means that when promotion failure handling is not allowed, every minor GC also triggers a full GC, even if the old generation still has plenty of free memory, so it is best not to configure it that way.
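The decision described in this paragraph can be written down as a small function. The sketch below only mirrors the wording of the paragraph and is not taken from any JVM implementation; `PromotionGuaranteeDemo`, `Action`, and `decide` are hypothetical names.

```java
class PromotionGuaranteeDemo {
    enum Action { MINOR_GC_ONLY, FULL_GC }

    // promotedSize: size of objects promoted per minor GC (as described in the text)
    // oldGenFree: space remaining in the old generation
    // handlePromotionFailure: whether the promotion guarantee is allowed to fail
    static Action decide(long promotedSize, long oldGenFree, boolean handlePromotionFailure) {
        if (promotedSize > oldGenFree) {
            return Action.FULL_GC;   // not enough room in the old generation
        }
        return handlePromotionFailure ? Action.MINOR_GC_ONLY : Action.FULL_GC;
    }

    public static void main(String[] args) {
        System.out.println(decide(64, 32, true));    // prints "FULL_GC"
        System.out.println(decide(16, 32, true));    // prints "MINOR_GC_ONLY"
        System.out.println(decide(16, 32, false));   // prints "FULL_GC"
    }
}
```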

As for collection of the method area, that is, the permanent generation: two kinds of content are collected there, constants in the constant pool and useless class information. Collecting constants is simple: a constant can be reclaimed when it is no longer referenced. For a useless class to be reclaimed, three conditions must be met:

1. All instances of the class have been reclaimed
2. The ClassLoader that loaded the class has been reclaimed
3. The Class object of the class is not referenced anywhere (that is, the class is not accessed through reflection anywhere)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
