Java Virtual Machine: an analysis of the GC garbage collection mechanism

Source: Internet
Author: User

Java garbage collection (Garbage Collection, GC)

Java supports dynamic memory allocation together with automatic garbage collection, whereas C++ has no built-in garbage collector. I suspect that is one of the reasons Java grew out of C++.

History of the GC

GC is much older than Java: as early as 1960, Lisp at MIT was the first language to actually use dynamic memory allocation and garbage collection.

3 things GC has to consider
    • Which memory needs to be recycled?
    • When should it be recycled?
    • How should it be recycled?

Let's start with these three questions and take a deeper look at what the JVM GC does for us.

1. What memory needs to be recycled?

As we all know, the memory used by the JVM stack and the heap is managed by the JVM. The difference is that allocation and reclamation of the elements on the stack are handled entirely by the JVM, whereas the allocation of objects on the heap is controlled by us Java programmers, and reclaiming that memory is the responsibility of the JVM GC.

The lifetime of an element on the JVM stack ends when its method returns or its thread ends, and the amount of memory each stack frame needs is determined when the Java code is compiled into a class bytecode file. Compared with the heap, the JVM's management of stack memory is therefore somewhat simpler. This article does not dig into the JVM stack; that is a topic for another time. This article only deals with objects in heap memory.

Put plainly, every object in the JVM heap is a candidate for GC, but only objects that have been confirmed dead are actually recycled. Otherwise the JVM would descend into chaos: an object would be happily going about its work and suddenly be wiped out by an invisible hand, without even a body left behind. Its relatives would be frantic: the object is missing, and nobody can say whether it is dead or alive.

The Java world is a society under the rule of law, where every action needs a justification, which raises a question:

How do we judge whether an object in heap memory is dead or alive?

Only objects that have been tagged as "dead" are recycled by GC. So the question "which objects will be recycled by GC?" can be rephrased as "how do we tell that an object is dead and give it the 'dead' tag?"

First, we introduce two algorithms for "sentencing an object to death":

    • Reference counting algorithm
    • Reachability analysis algorithm

Reference counting algorithm (Reference Counting)

Principle: when an object is created, a "reference counter" is attached to it. The counter is incremented by 1 whenever a new reference to the object is made and decremented by 1 whenever a reference to it becomes invalid. At any moment, an object whose counter is 0 can no longer be used, that is, it is dead.
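
The counter described above can be sketched as a toy Java class. This is only an illustration of the principle: the names CountedObject, retain and release are made up for this example, and a real runtime would maintain the counter automatically rather than through explicit calls.

```java
// Toy model of reference counting; not how any real JVM manages memory.
class CountedObject {
    private final String name;
    private int refCount = 0;

    CountedObject(String name) {
        this.name = name;
    }

    // Called whenever a new reference to this object is created.
    void retain() {
        refCount++;
    }

    // Called whenever a reference to this object becomes invalid.
    void release() {
        if (refCount == 0) {
            throw new IllegalStateException(name + " has no references left");
        }
        refCount--;
    }

    int refCount() {
        return refCount;
    }

    // A counter of 0 means nothing refers to the object any more.
    boolean isDead() {
        return refCount == 0;
    }
}

class RefCountDemo {
    public static void main(String[] args) {
        CountedObject o = new CountedObject("o");
        o.retain();  // first reference
        o.retain();  // second reference
        o.release(); // one reference becomes invalid
        System.out.println(o.refCount() + " " + o.isDead()); // 1 false
        o.release(); // the last reference becomes invalid
        System.out.println(o.refCount() + " " + o.isDead()); // 0 true
    }
}
```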

The principle of reference counting is very simple, and deciding that an object is dead is very efficient; in most cases it is a good algorithm. Well-known users include Microsoft's COM (Component Object Model) technology and the Python language.

But mainstream JVMs do not use reference counting to manage memory. The main reason is that it has difficulty handling circular references between objects.

A simple example:

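    public class Test {
        public Object obj = null;

        public static void main(String[] args) {
            // Create and initialize two Test objects
            Test a = new Test();
            Test b = new Test();
            // Make the two objects reference each other
            a.obj = b;
            b.obj = a;

            // The key step: invalidate both references
            a = null;
            b = null;

            // Assume a GC runs here (System.gc() is only a request to the JVM)
            System.gc();
        }
    }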

The question is: after System.gc(); executes, will the two objects that a and b referred to be recycled?

We can find the answer by configuring eclipse.ini so that Eclipse prints GC logs, and then comparing the heap usage before the GC with the heap usage after it: the two objects are recycled. If the JVM relied on reference counting, their counters would never drop to 0 and they would stay in the heap, so we can conclude that the JVM does not use reference counting to decide whether an object is alive.
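
To see why a pure counter would get stuck on the Test example, here is a small simulation. It is a sketch under assumptions: RcNode and its manual refCount bookkeeping are invented for illustration and do not correspond to anything inside a JVM.

```java
// Toy reference-counted node, used only to illustrate the cycle problem.
class RcNode {
    int refCount = 0;
    RcNode obj = null; // mirrors the "obj" field of the Test class

    // Point this.obj at target, adjusting counters the way a
    // reference-counting runtime would.
    void setObj(RcNode target) {
        if (obj != null) {
            obj.refCount--;
        }
        obj = target;
        if (target != null) {
            target.refCount++;
        }
    }
}

class CycleDemo {
    // Replays the Test example and returns both counters at the end.
    static int[] run() {
        RcNode a = new RcNode();
        a.refCount++; // the local variable "a" refers to the object
        RcNode b = new RcNode();
        b.refCount++; // the local variable "b" refers to the object
        a.setObj(b);  // a.obj = b, so b's counter becomes 2
        b.setObj(a);  // b.obj = a, so a's counter becomes 2
        a.refCount--; // simulates "a = null"
        b.refCount--; // simulates "b = null"
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = run();
        // Both counters are stuck at 1, so a pure counter would never free them.
        System.out.println(counts[0] + " " + counts[1]); // 1 1
    }
}
```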

Extension: how to enable GC log printing in Eclipse

Reachability analysis algorithm (Reachability Analysis)

Mainstream commercial programming languages (Java, C#, and the venerable Lisp mentioned earlier) all rely on reachability analysis to decide whether an object is alive.

Basic idea: take a set of objects called "GC Roots" as starting points and search downward from these nodes. The paths traversed by the search are called reference chains (Reference Chain). When no reference chain connects an object to any GC Root (in graph-theory terms, the object is unreachable from the GC Roots), the object is proven to be unusable.

[Figure: GC Roots and reference chains]

In the figure, object 5, object 6, and object 7 are connected to one another, but they are not reachable from the GC Roots, so all three are judged to be recyclable.
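
The search from the GC Roots is an ordinary graph traversal. The sketch below uses a breadth-first search over integer object ids; the exact edges among objects 1 to 4 are an assumption made for this example, and only the fact that 5, 6 and 7 are linked to each other but cut off from the root comes from the text.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ReachabilityDemo {
    // Returns every object id reachable from the roots by following references.
    static Set<Integer> reachable(Map<Integer, List<Integer>> refs, Collection<Integer> roots) {
        Set<Integer> seen = new HashSet<>(roots);
        Deque<Integer> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            int current = queue.poll();
            for (int next : refs.getOrDefault(current, List.of())) {
                if (seen.add(next)) { // add() is false if already visited
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    // Hypothetical object graph: key = object id, value = ids it references.
    static Map<Integer, List<Integer>> sampleGraph() {
        Map<Integer, List<Integer>> refs = new HashMap<>();
        refs.put(1, List.of(2, 3)); // object 1 is referenced by a GC Root
        refs.put(2, List.of(4));
        refs.put(5, List.of(6, 7)); // 5, 6 and 7 are linked to each other...
        refs.put(6, List.of(5));    // ...but no live object points at them
        return refs;
    }

    public static void main(String[] args) {
        Set<Integer> live = reachable(sampleGraph(), List.of(1));
        Set<Integer> garbage = new TreeSet<>(List.of(1, 2, 3, 4, 5, 6, 7));
        garbage.removeAll(live);
        System.out.println("recyclable: " + garbage); // recyclable: [5, 6, 7]
    }
}
```

Note that the cycle between objects 5 and 6 causes no trouble here, which is exactly where reference counting struggled.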

In Java, the following kinds of objects can serve as GC Roots:

    • Objects referenced from stack frames in the JVM stack
    • Objects referenced by static fields of classes
    • Objects referenced by constants
    • Objects referenced from the native method stack (Native methods)

A simple summary of the reachability analysis algorithm: once an object in the Java heap loses its connection to the last GC Root, it becomes a target for GC.

2. When to recycle?

Because the creation of objects in the Java heap is controlled by us Java programmers:

    • The time of creation is uncertain
    • The memory required for creation is uncertain

Since neither the creation time nor the required memory is known in advance, the JVM cannot know when to have enough space ready for new objects. To avoid the embarrassing situation where memory must be allocated for a new object but no free memory is left, the JVM GC quietly works on the heap behind the scenes and reclaims, in a timely manner, the memory occupied by objects that are already dead.

So when is "timely"? We all know that the timing of GC execution is not deterministic, but that does not mean GC acts arbitrarily. Let's look at when GC happens:

In Java, an object that reachability analysis has identified as a GC target is not executed on the spot; it first receives a stay of execution.

To truly declare an object dead, it must go through at least two marking passes:

    1. First mark: the object has lost its reference chain to the GC Roots. At this point the object still has one chance to come back to life.
    2. Second mark: if the object has still not rejoined a reference chain after the first mark, it is marked a second time and its recycling is confirmed.

If reachability analysis finds that an object has no reference chain connecting it to the GC Roots, the object is marked for the first time and then filtered. The filter criterion is whether it is necessary to execute the object's finalize() method.

If the object does not override finalize(), or if its finalize() method has already been called by the JVM, the JVM considers it unnecessary to execute finalize(), and the object loses its chance to come back to life.

If the object is judged to need finalize(), it is placed in a queue called the F-Queue, and a low-priority Finalizer thread later executes its finalize() method. If, while finalize() runs, the object re-associates itself with any object on a reference chain, it is removed from the "about to be recycled" set at the second mark. If it still has no connection by then, it is truly reclaimed.

An object's finalize() method is called by the system at most once, which means the resurrection skill can be used only once.
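
The two-mark process can be walked through with a toy model. This is a simplified simulation written for this article, not JVM code: the Obj class and its flags (overridesFinalize, resurrectsInFinalize) are hypothetical stand-ins for what a real object's finalize() method would do, and the Finalizer thread is replaced by a plain loop.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two-mark process; real collectors are far more involved.
class FinalizationModel {
    static class Obj {
        final String name;
        final boolean overridesFinalize;    // does the class override finalize()?
        final boolean resurrectsInFinalize; // does finalize() re-attach the object to a root?
        boolean reachable = true;
        boolean finalizeCalled = false;

        Obj(String name, boolean overridesFinalize, boolean resurrectsInFinalize) {
            this.name = name;
            this.overridesFinalize = overridesFinalize;
            this.resurrectsInFinalize = resurrectsInFinalize;
        }
    }

    // Runs one simplified collection and returns the names of the reclaimed objects.
    static List<String> collect(List<Obj> heap) {
        List<Obj> fQueue = new ArrayList<>();
        List<Obj> condemned = new ArrayList<>();
        // First mark: unreachable objects are filtered by "is finalize() necessary?"
        for (Obj o : heap) {
            if (o.reachable) {
                continue;
            }
            if (o.overridesFinalize && !o.finalizeCalled) {
                fQueue.add(o);
            } else {
                condemned.add(o);
            }
        }
        // Stand-in for the Finalizer thread: each queued finalize() runs once.
        for (Obj o : fQueue) {
            o.finalizeCalled = true;
            if (o.resurrectsInFinalize) {
                o.reachable = true;
            }
        }
        // Second mark: queued objects that are still unreachable are condemned too.
        for (Obj o : fQueue) {
            if (!o.reachable) {
                condemned.add(o);
            }
        }
        heap.removeAll(condemned);
        List<String> names = new ArrayList<>();
        for (Obj o : condemned) {
            names.add(o.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Obj plain = new Obj("plain", false, false);
        Obj phoenix = new Obj("phoenix", true, true);
        List<Obj> heap = new ArrayList<>(List.of(plain, phoenix));

        plain.reachable = false;
        phoenix.reachable = false;
        System.out.println(collect(heap)); // [plain]   (phoenix saved itself)

        phoenix.reachable = false;         // its reference is dropped again
        System.out.println(collect(heap)); // [phoenix] (finalize() is not run twice)
    }
}
```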

The finalize() method is not recommended: its behavior is too uncertain, and the order in which objects are finalized cannot be guaranteed. Anything finalize() can do, try-finally can do better and more promptly.
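
As a minimal sketch of the try-finally alternative, the example below releases a resource at a well-defined point instead of waiting for a finalizer. TrackedResource is a made-up class for illustration; the second method uses try-with-resources, which is not mentioned in the text above but is the compact form of the same pattern.

```java
// Hypothetical resource that records whether it has been closed.
class TrackedResource implements AutoCloseable {
    private boolean closed = false;

    boolean isClosed() {
        return closed;
    }

    String read() {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
        return "data";
    }

    @Override
    public void close() {
        closed = true;
    }
}

class CleanupDemo {
    // Explicit try-finally: cleanup runs when the block exits, even on an exception.
    static TrackedResource useWithFinally() {
        TrackedResource r = new TrackedResource();
        try {
            r.read();
        } finally {
            r.close();
        }
        return r;
    }

    // try-with-resources calls close() automatically at the end of the block.
    static TrackedResource useWithResources() {
        TrackedResource r = new TrackedResource();
        try (TrackedResource res = r) {
            res.read();
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(useWithFinally().isClosed());   // true
        System.out.println(useWithResources().isClosed()); // true
    }
}
```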

To sum up in more popular terms: the first mark is like serving the object with the verdict of a first trial. The object can either appeal, with some hope of winning, or do nothing and await its sentence. The second mark carries out immediate execution on those that did not win their appeal.

Extension: Reference

Both algorithms above, whether reference counting, which counts the references to an object, or reachability analysis, which checks whether a reference chain reaches the object, hinge on "references" when deciding whether an object is alive. So here is a short extension on the concept of a reference.

Before JDK 1.2, Java defined references in the traditional way:

If the value stored in data of type reference represents the starting address of another block of memory, that data is said to represent a reference.

This is the notion of a reference that most of us learned as Java beginners. Under this definition an object has only two states: referenced or not referenced. Starting with JDK 1.2, Java extended the concept and divided references into four kinds whose strength decreases in this order: strong references, soft references, weak references, and phantom references.

  • Strong reference (Strong Reference): the kind of reference found everywhere in program code, such as Object obj = new Object();. As long as a strong reference exists, the JVM GC will not reclaim the referenced object. When memory runs short, the JVM would rather throw an OutOfMemoryError and terminate the program than reclaim strongly referenced objects to relieve the shortage.
  • Soft reference (Soft Reference): describes objects that are useful but not essential. While memory is sufficient, softly referenced objects are not reclaimed and the program can keep using them. Only when the system is about to throw an out-of-memory error are these objects added to the collection scope and reclaimed.
  • Weak reference (Weak Reference): also describes non-essential objects, but it is weaker than a soft reference and the object's life is shorter. When a GC thread scans the memory area it is responsible for and finds an object that has only weak references, the object is reclaimed regardless of whether memory is currently sufficient.
  • Phantom reference (Phantom Reference): a phantom reference has no effect on an object's lifetime. An object that has only phantom references can be reclaimed by GC at any time, just as if it had no references at all, and an object instance cannot be obtained through a phantom reference. The only purpose of attaching a phantom reference to an object is to receive a system notification when the object is reclaimed by GC.
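
The four kinds of references map onto classes in the java.lang.ref package. The sketch below only demonstrates behavior that does not depend on GC timing; what weak.get() returns after the System.gc() request is deliberately left open, because the JVM decides whether and when to collect.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {
    // Checks what each reference kind returns while its referent
    // is still strongly reachable through a local variable.
    static boolean[] whileStronglyHeld() {
        Object a = new Object(); // a, b and c are strong references
        Object b = new Object();
        Object c = new Object();
        SoftReference<Object> soft = new SoftReference<>(a);
        WeakReference<Object> weak = new WeakReference<>(b);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(c, queue);
        return new boolean[] {
            soft.get() == a,      // the soft reference still yields the object
            weak.get() == b,      // so does the weak reference
            phantom.get() == null // a phantom reference never yields its referent
        };
    }

    public static void main(String[] args) {
        boolean[] r = whileStronglyHeld();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true true true

        // Here the referent has no strong reference at all.
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc(); // only a request; the outcome below is not guaranteed
        System.out.println("weak referent after gc request: " + weak.get());
    }
}
```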

3. How to recycle?

Because of space constraints, this article only analyzes the ideas behind garbage collection algorithms and does not go into implementation details.

Garbage collection today commonly relies on three algorithmic ideas, plus a fourth that combines them:

    • Mark-sweep algorithm
    • Copying algorithm
    • Mark-compact algorithm
    • Generational collection algorithm (a combination of the three ideas above)

① Mark-sweep (Mark-Sweep) algorithm

The mark-sweep algorithm is the most basic collection algorithm. It has two phases, "mark" and "sweep":

    1. First, mark all objects that need to be recycled
    2. After marking is complete, reclaim all marked objects in one pass

It is called the most basic algorithm because the later collection algorithms are all built on this idea and improve on its shortcomings.

The mark-sweep algorithm has two obvious drawbacks:

    • Efficiency. Neither the marking phase nor the sweeping phase is very efficient.
    • Space. After sweeping, memory is left with many non-contiguous fragments. With too much fragmentation, a later allocation of a large object may fail to find enough contiguous memory and force another garbage collection ahead of time. Garbage collection is resource-intensive, so running it too often hurts the overall performance of the program.
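
The fragmentation problem is easy to reproduce in a toy model. In this sketch the heap is just an array of equally sized slots and the result of the mark phase is passed in as a boolean array (here it flags the survivors, the complement of the objects to recycle); all names are invented for the example.

```java
class MarkSweepDemo {
    // Sweep phase: a slot stays occupied only if it was in use and marked live.
    static boolean[] sweep(boolean[] used, boolean[] live) {
        boolean[] after = new boolean[used.length];
        for (int i = 0; i < used.length; i++) {
            after[i] = used[i] && live[i];
        }
        return after;
    }

    // Total number of free slots.
    static int freeSlots(boolean[] used) {
        int free = 0;
        for (boolean u : used) {
            if (!u) {
                free++;
            }
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(boolean[] used) {
        int best = 0;
        int run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] used = { true, true, true, true, true, true, true, true };
        boolean[] live = { true, false, true, false, true, false, true, false };
        boolean[] after = sweep(used, live);
        // Half of the heap is free, yet no two free slots are adjacent.
        System.out.println("free=" + freeSlots(after) + " largestRun=" + largestFreeRun(after));
        // prints: free=4 largestRun=1
    }
}
```

An object that needs two adjacent slots could not be allocated here even though four slots are free, which is the situation the space drawback describes.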

[Figure: memory before and after a mark-sweep collection]

② Copying (Copying) algorithm

To solve the efficiency problem of the mark-sweep algorithm, the copying algorithm was born.

The idea of the copying algorithm: divide the available memory into two blocks of equal size and use only one of them at a time. When that block is exhausted, copy the surviving objects to the other block and then clean up the used block in one go.

This way each GC works on one half of the memory as a whole, allocation does not have to deal with complications such as fragmentation, and each allocation only needs to move the heap-top pointer and hand out memory in order. It is simple to implement and efficient to run.
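
A minimal sketch of this idea, assuming a heap made of two string arrays of equal size: allocation only advances an index (the "heap-top pointer"), and a collection copies the survivors to the other half and swaps the roles. The class and its methods are hypothetical and ignore everything a real collector must handle, such as updating references to moved objects.

```java
import java.util.Arrays;
import java.util.Set;

class SemiSpaceDemo {
    private String[] from; // the half currently used for allocation
    private String[] to;   // the idle half
    private int top = 0;   // index of the next free slot in "from"

    SemiSpaceDemo(int halfSize) {
        from = new String[halfSize];
        to = new String[halfSize];
    }

    // Allocation only moves the pointer; returns the slot index, or -1 if full.
    int allocate(String obj) {
        if (top == from.length) {
            return -1;
        }
        from[top] = obj;
        return top++;
    }

    // Copy survivors to the other half, wipe the old half, then swap roles.
    void collect(Set<String> live) {
        int newTop = 0;
        for (int i = 0; i < top; i++) {
            if (live.contains(from[i])) {
                to[newTop++] = from[i];
            }
        }
        Arrays.fill(from, null);
        String[] tmp = from;
        from = to;
        to = tmp;
        top = newTop;
    }

    int used() {
        return top;
    }

    String slot(int i) {
        return from[i];
    }

    public static void main(String[] args) {
        SemiSpaceDemo heap = new SemiSpaceDemo(4);
        for (String s : new String[] { "a", "b", "c", "d" }) {
            heap.allocate(s);
        }
        System.out.println(heap.allocate("e")); // -1: the active half is full
        heap.collect(Set.of("b", "d"));
        System.out.println(heap.used());        // 2
        System.out.println(heap.allocate("e")); // 2: allocation resumes after the survivors
    }
}
```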

The drawback is just as obvious: shrinking the usable memory to half of its original size is too high a price.

Nevertheless, today's mainstream virtual machines use the copying algorithm to collect the young generation. Why use an algorithm with such an obvious drawback? That brings us to another topic: generations in JVM heap memory.

Here is a brief introduction to the concept of heap generations:

JVM heap memory is not one big undifferentiated pot. It is divided into generations (young generation, old generation) in order to optimize GC performance, much like partitioning a hard disk and creating folders makes files easier to find and manage.

The HotSpot JVM divides the young generation into three parts: one larger Eden space and two smaller Survivor spaces (named From and To). By default the size ratio of Eden to each Survivor space is 8:1, which means the usable space in the young generation at any time is 90% (80% + 10%) of its total capacity, and only 10% is "wasted".
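
The 8:1 arithmetic can be checked with a few lines of Java. This is a back-of-the-envelope sketch: the 100 MB young generation is an arbitrary example value, and a real JVM may align or adjust the actual sizes, so the numbers only illustrate the ratio.

```java
class YoungGenLayout {
    // Splits a young generation into Eden plus two Survivor spaces
    // for a given Eden:Survivor ratio. Returns { eden, survivor, survivor }.
    static long[] split(long youngBytes, int survivorRatio) {
        long survivor = youngBytes / (survivorRatio + 2);
        long eden = youngBytes - 2 * survivor;
        return new long[] { eden, survivor, survivor };
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] parts = split(100 * mb, 8); // hypothetical 100 MB young generation
        long usable = parts[0] + parts[1]; // Eden plus one Survivor space
        System.out.println("eden=" + parts[0] / mb + "MB survivor=" + parts[1] / mb
                + "MB usable=" + usable / mb + "MB");
        // prints: eden=80MB survivor=10MB usable=90MB
    }
}
```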

The memory actually usable in the young generation is therefore only Eden plus one of the Survivor spaces (From at first; when From fills up, its objects are transferred to To).

In general, newly created objects are allocated in Eden (Eden comes first, although some larger objects may be treated differently). Objects in Eden that are still alive after their first Minor GC are moved to a Survivor space. Each time an object in a Survivor space lives through another Minor GC, its age increases by 1, and once its age reaches a certain threshold it is moved to the old generation.
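
The aging rule can be sketched with three lists standing in for Eden, a Survivor space and the old generation. This is a simplified model written for this article: tenuringThreshold is a hypothetical parameter (HotSpot exposes a comparable setting as -XX:MaxTenuringThreshold), the two Survivor spaces are merged into one list, and liveness is supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class AgingDemo {
    static class Obj {
        final String name;
        int age = 0; // number of Minor GCs survived

        Obj(String name) {
            this.name = name;
        }
    }

    final List<Obj> eden = new ArrayList<>();
    final List<Obj> survivor = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;

    AgingDemo(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    // New objects start out in Eden.
    void allocate(String name) {
        eden.add(new Obj(name));
    }

    // One Minor GC: dead objects vanish, survivors age by 1 and are
    // either kept in the survivor space or promoted to the old generation.
    void minorGc(Set<String> liveNames) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(survivor);
        eden.clear();
        survivor.clear();
        for (Obj o : candidates) {
            if (!liveNames.contains(o.name)) {
                continue; // reclaimed
            }
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o);
            } else {
                survivor.add(o);
            }
        }
    }

    public static void main(String[] args) {
        AgingDemo heap = new AgingDemo(3);
        heap.allocate("cache");
        heap.allocate("temp");
        for (int i = 1; i <= 3; i++) {
            heap.minorGc(Set.of("cache")); // "temp" dies in the first collection
            System.out.println("after GC " + i + ": survivor=" + heap.survivor.size()
                    + " old=" + heap.old.size());
        }
        // after GC 1: survivor=1 old=0
        // after GC 2: survivor=1 old=0
        // after GC 3: survivor=0 old=1
    }
}
```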

Extension: Eden and the two Survivor spaces in the young generation

In addition, a few more words on the young generation and the old generation:

Young generation and old generation:

  • Young generation: newly created, short-lived objects generally live in the young generation area of the heap
  • Old generation: objects that survive in the young generation up to a certain age are transferred to the old generation area of the heap

Young generation GC and old generation GC:

  • Young generation GC (Minor GC): garbage collection that takes place in the young generation. Because most Java objects die shortly after they are created, Minor GC is very frequent and usually fast.
  • Old generation GC (Major GC / Full GC): garbage collection that takes place in the old generation. A Major GC is typically about 10 times slower than a Minor GC.

③ Mark-compact (Mark-Compact) algorithm

With the copying algorithm, a high object survival rate means more copying, and efficiency drops accordingly. In addition, if more than half of the memory is needed overall, for example in the extreme case where 100% of the objects survive, the copying algorithm cannot cope. This is especially true in the old generation, which therefore cannot use the copying algorithm directly. That gave rise to another algorithm tailored to the old generation: the mark-compact algorithm.

The idea of the algorithm: the marking phase is the same as in the "mark-sweep" algorithm, but the next step does not clean up the recyclable objects directly. Instead, all surviving objects are moved toward one end of the memory space, and then the memory beyond the boundary is cleaned up in one go.

[Figure: memory before and after a mark-compact collection]
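
The compaction step can be sketched on an array of slots. As in the other examples this is a toy model under assumptions: objects are plain strings, the result of the marking phase is passed in as a set of live names, and nothing is done about updating references to the moved objects.

```java
import java.util.Arrays;
import java.util.Set;

class MarkCompactDemo {
    // Slides every live object toward index 0 and clears everything past
    // the boundary. Returns the boundary, i.e. the first free slot.
    static int compact(String[] heap, Set<String> live) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live.contains(heap[i])) {
                heap[boundary++] = heap[i];
            }
        }
        Arrays.fill(heap, boundary, heap.length, null);
        return boundary;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e", "f" };
        int boundary = compact(heap, Set.of("a", "c", "f"));
        System.out.println(boundary);               // 3
        System.out.println(Arrays.toString(heap));  // [a, c, f, null, null, null]
    }
}
```

After compaction all free memory is one contiguous block starting at the boundary, so the fragmentation seen with mark-sweep does not occur.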

④ Generational collection (Generational Collection) algorithm

Generational collection is the main algorithm used for garbage collection in today's commercial virtual machines.

It does not actually introduce any new idea. It simply divides memory into a young generation and an old generation according to object lifetimes and then applies, in each region, the collection algorithm that suits that region's characteristics. For example, in the young generation every GC finds that a large number of objects have died and only a few survive, so the copying algorithm reclaims memory efficiently. In the old generation the object survival rate is high and there is no extra space to fall back on, so the "mark-sweep" or "mark-compact" algorithm must be used.

That concludes this overview of garbage collection in the Java virtual machine. Thank you.

Reference: "In-Depth Understanding of the Java Virtual Machine: JVM Advanced Features and Best Practices", Zhou Zhiming
