Java garbage collection mechanism

Source: Internet
Author: User

This article summarizes Chapter 3 of "In-Depth Understanding of the Java Virtual Machine".

Earlier articles described the Java memory structure and how the HotSpot virtual machine manages objects in heap memory. Java programs create objects very frequently, but memory is limited, so memory must be reused by garbage-collecting objects that are no longer needed. This is also one of the differences between Java and C++: Java collects garbage automatically, whereas C and C++ require the programmer to manually free objects that are no longer used.

In Java, garbage collection is the virtual machine's responsibility. So, from the virtual machine's point of view, what issues must be considered in order to collect garbage from its memory?

    • The memory of the Java virtual machine is divided into the program counter, the virtual machine stack, the native method stack, the Java heap, the method area, and so on. Which of these areas need to be reclaimed?
    • Once the areas are determined, they will contain a great deal of content. How do we decide which content is garbage that is no longer needed?
    • The program keeps running, and garbage collection does not run continuously alongside it. So when does garbage collection run?
    • The most important question: how is the memory reclaimed?

Java's garbage collection mechanism is a complex process involving many topics. The questions above are answered one by one below.

1. Recycling Area

As described in the previous articles, the program counter, the virtual machine stack, and the native method stack are thread-private: they are created with the thread and destroyed with it. The program counter indicates the next instruction to execute, and stack frames are pushed onto and popped off the stack as methods are entered and exited. The size of each stack frame is essentially determined at compile time. These areas therefore need no special consideration for memory reclamation, because their memory is reclaimed naturally when the method ends or the thread terminates.

Unlike the three areas above, the Java heap and the method area are shared by all threads. The Java heap holds the objects that threads create at run time, and the method area stores class metadata. Only while the program is running can we know which classes' metadata must be loaded into the method area and which objects are created in the heap; in other words, allocation and reclamation in these two areas are dynamic. That is why these two areas are what the garbage collector is concerned with.

2. Which objects are garbage?

First consider the Java heap, which holds objects.

The vast majority of objects created by a program live in the Java heap, and a running program creates a large number of them. These objects are not all used forever, so some become garbage that will never be used again. This garbage occupies valuable memory and needs to be reclaimed. But how can we tell which objects in the heap are garbage?

A common algorithm is reference counting. It adds a reference counter to each object: whenever a reference to the object is created, the counter is incremented by 1, and whenever a reference becomes invalid, the counter is decremented by 1. When the counter reaches 0, the object can no longer be used.

Reference counting is simple to implement and efficient at making this judgment. However, mainstream Java virtual machines do not use it to manage memory, because it has difficulty handling circular references between objects.
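To make the counter bookkeeping concrete, here is a minimal toy model of reference counting. It is only a sketch: the `Node` class and the `addRef`/`release` helpers are hypothetical names invented for this illustration, and no real JVM manages its heap this way. It assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of reference counting; not how any real JVM manages memory. */
final class RefCountSketch {
    /** A hypothetical heap object that tracks how many references point at it. */
    static final class Node {
        final String name;
        int refCount;                                  // number of references to this node
        final List<Node> fields = new ArrayList<>();   // references held by this node
        Node(String name) { this.name = name; }
    }

    /** Creating a reference increments the target's counter. */
    static void addRef(Node target) { target.refCount++; }

    /** Dropping a reference decrements the counter; at zero the object is
     *  freed, which in turn drops the references held in its fields. */
    static void release(Node target, List<String> freed) {
        if (--target.refCount == 0) {
            freed.add(target.name);
            for (Node child : target.fields) release(child, freed);
        }
    }

    /** local -> a -> b: dropping the local reference frees both objects. */
    static List<String> chainDemo() {
        Node a = new Node("a");
        Node b = new Node("b");
        addRef(a);                      // a local variable refers to a
        a.fields.add(b); addRef(b);     // a's field refers to b
        List<String> freed = new ArrayList<>();
        release(a, freed);              // the local variable goes away
        return freed;
    }

    /** a and b refer to each other: dropping both locals frees nothing. */
    static List<String> cycleDemo() {
        Node a = new Node("a");
        Node b = new Node("b");
        addRef(a); addRef(b);           // two local variables
        a.fields.add(b); addRef(b);     // a's field refers to b
        b.fields.add(a); addRef(a);     // b's field refers to a
        List<String> freed = new ArrayList<>();
        release(a, freed);              // first local set to null
        release(b, freed);              // second local set to null
        return freed;                   // both counters are stuck at 1
    }

    public static void main(String[] args) {
        System.out.println(chainDemo()); // [a, b]
        System.out.println(cycleDemo()); // []
    }
}
```

In the cycle case both counters stop at 1, so a pure reference-counting collector would keep the two objects forever even though the program can no longer reach them.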

Consider the following code:

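    public class ReferenceCountingGC {
        public Object instance = null;

        private static final int _1MB = 1024 * 1024;

        // Exists only to occupy memory, so that a GC log shows whether the objects were reclaimed.
        @SuppressWarnings("unused")
        private byte[] bigSize = new byte[2 * _1MB];

        public static void testGC() {
            ReferenceCountingGC objA = new ReferenceCountingGC();
            ReferenceCountingGC objB = new ReferenceCountingGC();
            // The two objects reference each other through their instance fields.
            objA.instance = objB;
            objB.instance = objA;

            objA = null;
            objB = null;

            // Under pure reference counting, neither object could be reclaimed here.
            System.gc();
        }

        public static void main(String[] args) {
            ReferenceCountingGC.testGC();
        }
    }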

The ReferenceCountingGC class has an instance field, and the two objects created in the test code, objA and objB, reference each other through it. Each object's reference count is therefore 2. After both variables are set to null, each count drops by 1 to 1, so under reference counting neither object is garbage. Yet neither object can be accessed any longer, so both clearly are garbage. In fact, when this code is run, the Java virtual machine does reclaim both objects.

So what method does Java use? Reachability analysis.

The basic idea of reachability analysis is to take a set of objects called "GC Roots" as starting points and search downward from them. The paths traversed by the search are called reference chains (Reference Chain). When no reference chain connects an object to any GC Root, that is, when the object is unreachable from the GC Roots, the object is no longer usable. In the original figure, the four objects on the left each have a reference chain to the GC Roots, so they are usable, while the three objects on the right are unreachable from the GC Roots, so they are not.
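The search can be sketched as an ordinary graph traversal. The following is a minimal illustration, assuming the heap is modelled as a map from an object name to the names it references; the names `gcRoot` and `obj1` to `obj7` are made up for this example, and Java 9 or later is assumed for `List.of` and `Map.of`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy reachability analysis over a hand-written object graph. */
final class ReachabilitySketch {
    /** Returns every node reachable from the roots by following reference edges. */
    static Set<String> reachable(List<String> roots, Map<String, List<String>> refs) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.poll();
            for (String target : refs.getOrDefault(obj, List.of())) {
                if (seen.add(target)) work.add(target);   // first visit: keep searching from it
            }
        }
        return seen;
    }

    /** Four objects hang off the root; three others only reference each other. */
    static Map<String, List<String>> sampleGraph() {
        return Map.of(
            "gcRoot", List.of("obj1"),
            "obj1", List.of("obj2", "obj3"),
            "obj3", List.of("obj4"),
            "obj5", List.of("obj6"),
            "obj6", List.of("obj5", "obj7"));
    }

    public static void main(String[] args) {
        Set<String> live = reachable(List.of("gcRoot"), sampleGraph());
        System.out.println(live.contains("obj4")); // true
        System.out.println(live.contains("obj5")); // false
    }
}
```

Note that `obj5` and `obj6` reference each other, yet neither is reachable from the root, so both count as garbage. This is exactly the case that reference counting cannot handle.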


In Java, the following kinds of objects can serve as GC Roots:

    • Objects referenced from the virtual machine stack (the local variable table in a stack frame);
    • Objects referenced by class static fields in the method area;
    • Objects referenced by constants in the method area;
    • Objects referenced by JNI (that is, native methods) in the native method stack.

Both methods rely on references to decide whether an object is garbage, so it is worth understanding references in Java thoroughly.

Java has four kinds of references. Since JDK 1.2 the concept has been extended into strong references (Strong Reference), soft references (Soft Reference), weak references (Weak Reference), and phantom references (Phantom Reference), in decreasing order of strength.

(1) Strong references

Strong references are the ordinary references found throughout programs, such as "Object obj = new Object()". As long as a strong reference exists, the garbage collector will never reclaim the referenced object.

(2) Soft references

Soft references describe objects that are useful but not essential. Before the system is about to throw an out-of-memory error, objects that are only softly referenced are included in a second round of collection. Only if that collection still does not free enough memory is the error thrown. The SoftReference class implements soft references.

(3) Weak references

Weak references also describe non-essential objects, but they are weaker than soft references: an object that is only weakly referenced survives only until the next garbage collection. When the garbage collector runs, it reclaims such objects whether or not memory is sufficient. The WeakReference class implements weak references.
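A short sketch of how soft and weak references are used through the standard `java.lang.ref` classes. The class and method names (`WeakSoftSketch`, `resolvesWhileStronglyHeld`) are invented for this example. Because `System.gc()` is only a request, the lines printed after the collection can differ between runs and JVMs, so no fixed output is claimed for them.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

/** Illustrates SoftReference and WeakReference; GC timing is not guaranteed. */
final class WeakSoftSketch {
    /** While a strong reference is held, the weak reference keeps resolving. */
    static boolean resolvesWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc();                      // the object is still strongly reachable
        return weak.get() == strong;      // using 'strong' here keeps it alive
    }

    public static void main(String[] args) {
        System.out.println(resolvesWhileStronglyHeld()); // true

        // Only the reference objects point at these referents.
        WeakReference<Object> weak = new WeakReference<>(new Object());
        SoftReference<byte[]> soft = new SoftReference<>(new byte[1024]);

        System.gc();                      // a request; the JVM may ignore it

        // The weak referent is usually cleared by now; the soft referent
        // usually survives because memory is not under pressure.
        System.out.println("weak referent: " + weak.get());
        System.out.println("soft referent kept: " + (soft.get() != null));
    }
}
```

This difference is why soft references are often used for memory-sensitive caches, while weak references suit associations that should never keep an object alive.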

(4) Phantom references

A phantom reference (sometimes translated as virtual or ghost reference) is the weakest kind. Whether an object has a phantom reference has no effect on its lifetime, and an object instance cannot be obtained through a phantom reference. Its only purpose is to receive a system notification when the associated object is reclaimed by the collector. The PhantomReference class implements phantom references.
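The notification mechanism works through a `ReferenceQueue`. The sketch below uses only standard `java.lang.ref` classes; the class name `PhantomSketch` and its helper are made up for this example. Whether the notification arrives within the one-second wait depends on the JVM actually running a collection, so the second printed line is not guaranteed.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

/** Illustrates PhantomReference: get() is always null, notification comes via a queue. */
final class PhantomSketch {
    /** get() returns null even while the referent is still strongly reachable. */
    static boolean getAlwaysReturnsNull() {
        Object referent = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);
        return phantom.get() == null && referent != null;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(getAlwaysReturnsNull()); // true

        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(new Object(), queue);

        System.gc();   // a request; the JVM may ignore it

        // remove(timeout) blocks for up to the given number of milliseconds
        // until a reference is enqueued, and returns null on timeout.
        boolean notified = queue.remove(1000) == phantom;
        System.out.println(notified ? "collector notified us" : "no notification yet");
    }
}
```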

3. Garbage can still be rescued

Objects judged to be garbage may still get a second chance. Even an object that reachability analysis finds unreachable is not yet "dead". Before an object is truly declared dead, it goes through at least two marking passes. If reachability analysis finds the object unreachable, it is marked for the first time and then filtered; the filter criterion is whether it is necessary to execute the object's finalize() method. If the object does not override finalize(), or if the virtual machine has already called its finalize() method, the virtual machine treats execution as "not necessary".

If the object is judged to need finalize(), it is placed in a queue called the F-Queue, and finalize() is later executed by a low-priority Finalizer thread that the virtual machine creates automatically. The virtual machine only triggers the method and does not promise to wait for it to finish, because if an object's finalize() method ran slowly or entered an infinite loop, the other objects in the F-Queue could wait forever, and the entire memory reclamation system could even crash. Later, the GC performs a second, small-scale marking of the objects in the F-Queue. If an object has re-established a link to any object on a reference chain by then, it is removed from the set of objects to be collected; if it is still unreachable, it is treated as garbage and collected.

[Figure: flowchart of the two-pass marking process]


The following code shows what is described above.

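    public class FinalizeEscapeGC {
        public static FinalizeEscapeGC SAVE_HOOK = null;

        public void isAlive() {
            System.out.println("Yes,i am Still Alive");
        }

        // finalize() has been deprecated since Java 9; it is used here only for demonstration.
        @Override
        protected void finalize() throws Throwable {
            super.finalize();
            System.out.println("Finalize Method executed!");
            // Re-establish a reference to this object so that it escapes collection.
            FinalizeEscapeGC.SAVE_HOOK = this;
        }

        public static void main(String[] args) throws InterruptedException {
            SAVE_HOOK = new FinalizeEscapeGC();

            // First attempt: the object rescues itself in finalize().
            SAVE_HOOK = null;
            System.gc();
            // The Finalizer thread has low priority, so pause to give it time to run.
            Thread.sleep(500);
            if (SAVE_HOOK != null) {
                SAVE_HOOK.isAlive();
            } else {
                System.out.println("No,i am dead.");
            }

            // Second attempt: identical code, but finalize() is not called again.
            SAVE_HOOK = null;
            System.gc();
            Thread.sleep(500);
            if (SAVE_HOOK != null) {
                SAVE_HOOK.isAlive();
            } else {
                System.out.println("No,i am dead.");
            }
        }
    }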

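On a JVM with finalization enabled, a typical run prints the following:

    Finalize Method executed!
    Yes,i am Still Alive
    No,i am dead.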


The FinalizeEscapeGC class overrides finalize(), so after the GC first marks the object referenced by SAVE_HOOK as unreachable, the filter decides that finalize() must be executed. In the overridden finalize() method the object assigns itself to the class variable SAVE_HOOK, and so it rescues itself the first time and is not collected. The second time the code is identical, but because the virtual machine has already called the object's finalize() method once, it is not called again. The object cannot rescue itself and is collected as garbage.

4. Reclaiming the method area

Besides the Java heap, garbage collection also takes place in the method area, but collection there generally yields less.

In the HotSpot virtual machine the method area is known as the permanent generation (in versions before JDK 8). The GC collects two kinds of content there: obsolete constants and useless classes. Collecting obsolete constants is much like collecting objects in the Java heap. Take a literal in the constant pool as an example: if the string "abc" is in the constant pool, but no String object in the system currently is "abc", that is, no object references the "abc" constant in the pool and nothing else references the literal, then if memory reclamation occurs and it is necessary, "abc" will be removed from the constant pool. Symbolic references to other classes (interfaces), methods, and fields in the constant pool are handled similarly.

Judging whether a class is useless is much more demanding. A class is useless only if it meets all three of the following conditions:

    1. All instances of the class have been reclaimed, i.e. no instance of the class exists in the Java heap;
    2. The ClassLoader that loaded the class has been reclaimed;
    3. The java.lang.Class object for the class is not referenced anywhere, so the methods of the class cannot be accessed anywhere through reflection.

The virtual machine may reclaim a class that meets all three conditions, but it is not obliged to. For the HotSpot virtual machine, whether classes are reclaimed is controlled by the -Xnoclassgc parameter.

5. Garbage collection algorithm

Now we know where garbage is collected and how to decide whether an object is garbage. The next step is how to collect it: the garbage collection algorithm. Because implementing these algorithms involves many low-level details, this article introduces only the basic ideas of the algorithms and how they developed.

(1) Mark-sweep algorithm

The mark-sweep (Mark-Sweep) algorithm is the most basic collection algorithm. As its name indicates, collection consists of two phases: marking and sweeping. The process of judging garbage described earlier is the marking phase; after marking completes, the sweep phase reclaims the objects marked as garbage. The later collection algorithms are all improvements based on this one. It has two shortcomings. One is efficiency: neither marking nor sweeping is very efficient. The other is space: sweeping leaves a large number of discontiguous memory fragments, and too much fragmentation can make a later allocation of a large object fail to find enough contiguous memory, which triggers another garbage collection early.

[Figure: execution of the mark-sweep algorithm]
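The fragmentation problem is easy to see in a toy model. The sketch below assumes a heap of eight equal-sized slots, each holding one object named by a letter, with `.` denoting a free slot; the marking result is supplied by hand rather than computed. All names are hypothetical and the model says nothing about how HotSpot lays out memory.

```java
/** Toy mark-sweep: the heap is an array of one-object slots, '.' means free. */
final class MarkSweepSketch {
    /** Sweep phase: frees every slot whose object was not marked. */
    static char[] sweep(char[] heap, boolean[] marked) {
        char[] after = heap.clone();
        for (int i = 0; i < after.length; i++) {
            if (!marked[i]) after[i] = '.';
        }
        return after;
    }

    /** Size of the largest contiguous run of free slots. */
    static int largestFreeRun(char[] heap) {
        int best = 0;
        int run = 0;
        for (char slot : heap) {
            run = (slot == '.') ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        char[] heap = "ABCDEFGH".toCharArray();
        // Assumed result of the mark phase: A, C, E and G are reachable.
        boolean[] marked = {true, false, true, false, true, false, true, false};

        char[] after = sweep(heap, marked);
        System.out.println(new String(after));      // A.C.E.G.
        System.out.println(largestFreeRun(after));  // 1
    }
}
```

Four slots are free after the sweep, but no two of them are adjacent, so a request for an object that needs two contiguous slots would fail and force another collection.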


(2) Copying algorithm

The copying algorithm was designed to address the efficiency problem of mark-sweep. It divides the available memory into two halves of equal size and uses only one of them at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleared in one step. Each collection thus reclaims an entire half, and allocation need not consider fragmentation at all: memory is allocated sequentially simply by moving the heap-top pointer.

[Figure: execution of the copying algorithm]
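A minimal sketch of the two-half scheme, using the same slot model as before: survivors are copied contiguously into the empty half, and the returned index plays the role of the heap-top pointer from which the next allocation would proceed. The names `CopyingSketch` and `copyLive` are invented for this illustration, and the set of live objects is supplied by hand.

```java
import java.util.Arrays;

/** Toy copying collector over two equal halves, '.' means free. */
final class CopyingSketch {
    /**
     * Copies live objects from the used half into the empty half and clears
     * the used half. Returns the new heap-top index, i.e. the first free slot
     * in the destination half.
     */
    static int copyLive(char[] from, boolean[] live, char[] to) {
        int top = 0;
        for (int i = 0; i < from.length; i++) {
            if (live[i]) to[top++] = from[i];   // survivors end up contiguous
        }
        Arrays.fill(from, '.');                 // the whole half is reclaimed at once
        return top;
    }

    public static void main(String[] args) {
        char[] from = "ABCD".toCharArray();
        char[] to = "....".toCharArray();
        boolean[] live = {true, false, false, true};   // assumed marking result

        int top = copyLive(from, live, to);
        System.out.println(new String(to));    // AD..
        System.out.println(new String(from));  // ....
        System.out.println(top);               // 2
    }
}
```

The cost is visible in the model too: only one of the two arrays ever holds objects, and the work done is proportional to the number of survivors rather than to the amount of garbage.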


But this algorithm leaves only half of the memory usable, which is too high a price. Today's virtual machines use the copying approach to collect the new generation, but rather than splitting it 1:1, they divide it into one larger Eden space and two smaller Survivor spaces, and use Eden plus one Survivor space at a time. During collection, the objects still alive in Eden and the Survivor space in use are copied to the other Survivor space, and then Eden and the just-used Survivor space are cleared. The default Eden-to-Survivor ratio in the HotSpot virtual machine is 8:1, that is, Eden occupies 80% of the new generation and each Survivor space occupies 10%, so 90% of the new generation is usable at any time.

However, there is no guarantee that no more than 10% of the objects survive each collection. When the Survivor space is not large enough, other memory (the old generation) must provide an allocation guarantee: the surviving objects that do not fit in the Survivor space go directly into the old generation.
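The arithmetic behind the 8:1 ratio and the allocation guarantee can be written out as a small sketch. It simplifies heavily: sizes are abstract units, and survivors that do not fit in the Survivor space are simply counted as promoted, whereas a real collector decides object by object. The class and method names are hypothetical.

```java
/** Arithmetic sketch of the Eden/Survivor split; sizes are abstract units. */
final class YoungGenSketch {
    /** Percentage of the new generation usable at once for an Eden:Survivor ratio of ratio:1. */
    static int usablePercent(int ratio) {
        // The new generation has ratio + 2 parts: Eden (ratio) plus two Survivor spaces.
        return (ratio + 1) * 100 / (ratio + 2);
    }

    /** Units that overflow the Survivor space and go to the old generation. */
    static int promoted(int liveUnits, int survivorCapacity) {
        return Math.max(0, liveUnits - survivorCapacity);
    }

    public static void main(String[] args) {
        System.out.println(usablePercent(8));   // 90

        int survivorCapacity = 10;              // 10% of a 100-unit new generation
        System.out.println(promoted(6, survivorCapacity));   // 0
        System.out.println(promoted(15, survivorCapacity));  // 5
    }
}
```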

(3) Mark-compact algorithm

The copying algorithm has to perform more copy operations when the object survival rate is high, and its efficiency drops. More importantly, unless 50% of the space is to be wasted, extra space is needed as an allocation guarantee to handle the extreme case in which all objects in the used memory are alive. For these reasons this algorithm is generally not used in the old generation.

Given the characteristics of the old generation, a different algorithm can be used: mark-compact (Mark-Compact). Its marking phase is the same as in mark-sweep, but the next step does not reclaim the collectable objects directly. Instead, all surviving objects are moved toward one end, and the memory beyond the boundary is then cleared.

[Figure: execution of the mark-compact algorithm]

This way, no memory fragmentation is produced.
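Using the same slot model as the mark-sweep sketch, compaction can be illustrated as sliding the marked objects toward index 0. This toy builds a new array for clarity; a real collector compacts in place and must also update every reference to a moved object, which is omitted here. All names are hypothetical.

```java
import java.util.Arrays;

/** Toy mark-compact: live objects slide to one end, '.' means free. */
final class MarkCompactSketch {
    /** Moves marked objects toward index 0 and frees everything past the boundary. */
    static char[] compact(char[] heap, boolean[] marked) {
        char[] after = new char[heap.length];
        Arrays.fill(after, '.');
        int boundary = 0;                       // first slot after the last live object
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) after[boundary++] = heap[i];
        }
        return after;
    }

    public static void main(String[] args) {
        char[] heap = "ABCDEFGH".toCharArray();
        // Assumed result of the mark phase: A, C, E and G are reachable.
        boolean[] marked = {true, false, true, false, true, false, true, false};

        System.out.println(new String(compact(heap, marked)));  // ACEG....
    }
}
```

With the same input that left mark-sweep with four isolated free slots, compaction leaves one contiguous free run of four.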

(4) Generational collection algorithm

Today's virtual machines use the "generational collection" algorithm, which divides memory into several regions according to object lifetimes. The Java heap is generally divided into the new generation and the old generation, so that the most suitable collection algorithm can be chosen for the characteristics of each. In the new generation, a large number of objects die in every collection and only a few survive, so the copying algorithm is chosen: collection completes at the cost of copying only the few survivors. In the old generation, objects have a high survival rate and there is no extra space to act as an allocation guarantee, so it must be collected with the mark-sweep or mark-compact algorithm.

6. Algorithm implementation in HotSpot

The sections above covered object survival determination and garbage collection algorithms from a theoretical point of view. This section introduces how the HotSpot virtual machine implements them.

(1) Enumerating the root nodes

Reachability analysis from GC Roots is used to decide whether objects are alive. The nodes that can serve as GC Roots are found mainly in global references (such as constants and class static fields) and in execution contexts (such as the local variable tables in stack frames). But many applications today have hundreds of megabytes in the method area alone, and checking every reference there one by one would inevitably take a lot of time.

Reachability analysis is also sensitive to execution time in the form of GC pauses, because the analysis must be carried out on a snapshot that guarantees consistency. That is, throughout the analysis the whole running system must appear frozen at a single point in time; the object reference relationships must not keep changing while the analysis is under way, otherwise its accuracy cannot be guaranteed. This is an important reason why all Java execution threads must be paused while the GC is running. Sun calls this event "Stop The World".

Current mainstream Java virtual machines use exact (precise) GC, so when the system pauses there is no need to check every global reference and execution context location without exception: the virtual machine has a way of knowing directly where object references are stored. HotSpot uses a set of data structures called OopMap for this purpose. When class loading completes, HotSpot computes what type of data is at each offset within an object, and during JIT compilation it also records, at specific locations, which positions in the stack and registers hold references. The GC can then read this information directly when it scans.
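The idea of a precomputed "where are the references" table can be illustrated with a deliberately simplified sketch. This is not HotSpot's actual OopMap layout: the class names `Point` and `Order`, the slot numbering, and the `REF_OFFSETS` table are all invented for the example. The point is only that the scanner reads the recorded offsets instead of guessing which slots contain references. Java 9 or later is assumed for `Map.of`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Conceptual sketch of an offset table; not HotSpot's real OopMap format. */
final class OopMapSketch {
    /** Hypothetical table built at class-load time: slot offsets that hold references. */
    static final Map<String, int[]> REF_OFFSETS = Map.of(
        "Point", new int[] {},        // two int fields, no references
        "Order", new int[] {1, 3});   // slots 1 and 3 hold references

    /** Returns the contents of the reference slots only; other slots are never inspected. */
    static List<Long> referencesIn(String className, long[] slots) {
        List<Long> refs = new ArrayList<>();
        for (int offset : REF_OFFSETS.get(className)) {
            refs.add(slots[offset]);
        }
        return refs;
    }

    public static void main(String[] args) {
        // Slots 0 and 2 hold plain numbers that merely look like they could be addresses.
        long[] order = {42L, 0x1000L, 7L, 0x2000L};
        System.out.println(referencesIn("Order", order));  // [4096, 8192]
    }
}
```

Without such a table, the scanner would have to treat 42 and 7 as possible pointers as well, which is what a conservative (non-exact) collector does.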

(2) Safepoints

With the help of OopMap, HotSpot can enumerate the GC Roots quickly and accurately. However, a great many instructions can change reference relationships, that is, change the contents of the OopMap, and generating an OopMap for every such instruction would require a large amount of extra space.

In fact, HotSpot records OopMap information only at specific locations called safepoints (Safepoint). A program cannot stop just anywhere during execution to start a GC; it can pause only when it reaches a safepoint. There must be neither too few safepoints (the GC would wait too long) nor too many (the runtime overhead would grow), so they are chosen essentially according to "whether the location has characteristics that let the program run for a long time". Each instruction executes very quickly, so a program is unlikely to run for a long time merely because its instruction stream is long. The most obvious characteristic of long-running execution is the reuse of instruction sequences, as in method calls, loop jumps, and exception jumps, so only instructions with these functions produce safepoints.

Another safepoint question is how to make all threads (excluding threads executing JNI calls) run to the nearest safepoint and pause when a GC occurs. There are two options: preemptive suspension and voluntary suspension. Preemptive suspension needs no cooperation from the threads' code: when a GC occurs, all threads are first interrupted, and any thread found to have been interrupted somewhere other than a safepoint is resumed so that it can run on to a safepoint. Almost no virtual machine uses this approach today.

With voluntary suspension, when the GC needs to suspend threads it does not operate on them directly. It simply sets a flag, and each thread actively polls this flag as it runs and suspends itself when it finds the flag set. The polling locations coincide with the safepoints, plus the places where memory must be allocated for a new object.
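Voluntary suspension can be imitated at the application level with a shared flag and a monitor. The sketch below is only an analogy: in HotSpot the poll is emitted into compiled code and is far cheaper than this, and every name here (`SafepointPollSketch`, `poll`, `stopTheWorld`) is invented for the illustration. Worker threads poll at the end of each loop iteration, which stands in for a safepoint at a loop jump.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

/** Application-level analogy of safepoint polling; not the JVM's mechanism. */
final class SafepointPollSketch {
    private final Object lock = new Object();
    private volatile boolean pauseRequested;   // the flag the workers poll
    private int parked;                        // workers currently suspended; guarded by lock

    /** Called by workers at loop back-edges: suspends while a pause is requested. */
    void poll() throws InterruptedException {
        if (!pauseRequested) return;           // fast path: nothing to do
        synchronized (lock) {
            parked++;
            lock.notifyAll();                  // tell the coordinator we have stopped
            while (pauseRequested) lock.wait();
            parked--;
        }
    }

    /** Sets the flag, waits until all workers are suspended, runs the action, resumes them. */
    void stopTheWorld(int workers, Runnable action) throws InterruptedException {
        synchronized (lock) {
            pauseRequested = true;
            while (parked < workers) lock.wait();
            action.run();                      // the world is stopped here
            pauseRequested = false;
            lock.notifyAll();
        }
    }

    /** Returns true if the workers made no progress while the world was stopped. */
    static boolean demo() throws InterruptedException {
        SafepointPollSketch sp = new SafepointPollSketch();
        AtomicLong work = new AtomicLong();
        AtomicBoolean running = new AtomicBoolean(true);
        Thread[] workers = new Thread[2];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (running.get()) {
                        work.incrementAndGet();   // stands in for application work
                        sp.poll();                // poll at the loop back-edge
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        boolean[] frozen = new boolean[1];
        sp.stopTheWorld(workers.length, () -> {
            long before = work.get();
            LockSupport.parkNanos(20_000_000L);   // stands in for root enumeration
            frozen[0] = (before == work.get());
        });
        running.set(false);
        for (Thread t : workers) t.join();
        return frozen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());  // true
    }
}
```

The coordinator never touches the worker threads directly. It only sets the flag and waits, which mirrors the description above: each thread notices the request at its next poll and suspends itself.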

(3) Safe regions

The safepoint mechanism ensures that a running program will reach a safepoint where the GC can start within a reasonably short time. But what if the program is not running, for example because the thread has not been given CPU time, or is in the sleep or blocked state? Such a thread cannot respond to the JVM's suspension request and run to a safepoint to suspend itself, and the JVM clearly cannot wait for it to be scheduled again. This situation calls for safe regions (Safe Region).

A safe region is a stretch of code in which reference relationships do not change. Starting a GC anywhere within the region is safe, so a safe region can be seen as an extended safepoint.

When a thread executes code in a safe region, it first marks itself as having entered the safe region. If the JVM starts a GC during that time, it does not need to care about threads that are marked as being in a safe region. When a thread is about to leave the safe region, it checks whether the system has finished enumerating the GC Roots (or the entire GC). If so, the thread simply continues; otherwise it must wait until it receives a signal that it is safe to leave the safe region.
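The enter/leave protocol can also be imitated with a monitor. As before this is an analogy with invented names (`SafeRegionSketch`, `enterSafeRegion`, `leaveSafeRegion`, `beginGc`, `endGc`), not the JVM's implementation, and it assumes a single worker thread. The worker blocks on a latch that stands in for sleep or blocking I/O; the collection starts while the worker is blocked, and the worker is held at the exit of the region until the collection has finished.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Application-level analogy of a safe region with one worker thread. */
final class SafeRegionSketch {
    private final Object lock = new Object();
    private boolean gcInProgress;   // guarded by lock
    private boolean inSafeRegion;   // guarded by lock

    /** The worker marks itself as being inside a safe region. */
    void enterSafeRegion() {
        synchronized (lock) { inSafeRegion = true; lock.notifyAll(); }
    }

    /** The worker may leave only when no collection is in progress. */
    void leaveSafeRegion() throws InterruptedException {
        synchronized (lock) {
            while (gcInProgress) lock.wait();
            inSafeRegion = false;
        }
    }

    /** The collector can start as soon as the worker is inside its safe region. */
    void beginGc() throws InterruptedException {
        synchronized (lock) {
            gcInProgress = true;
            while (!inSafeRegion) lock.wait();
        }
    }

    /** Signals that it is safe to leave the safe region. */
    void endGc() {
        synchronized (lock) { gcInProgress = false; lock.notifyAll(); }
    }

    /** Returns the order in which the collector and the worker made progress. */
    static List<String> demo() throws InterruptedException {
        SafeRegionSketch vm = new SafeRegionSketch();
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch blockingCall = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                vm.enterSafeRegion();
                blockingCall.await();      // stands in for sleep or blocking I/O
                vm.leaveSafeRegion();      // waits if the collection is still running
                events.add("worker resumed");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        vm.beginGc();                      // does not need the worker to be running
        blockingCall.countDown();          // the blocking call returns in mid-collection
        events.add("gc finished");
        vm.endGc();
        worker.join();
        return events;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());  // [gc finished, worker resumed]
    }
}
```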


To be continued
