Learning the JVM: Common GC Algorithms

Source: Zuo Xiaolong's technology blog on Blog Park (http://www.cnblogs.com/zuoxiaolong), thanks for sharing.

What problems does a GC strategy need to solve?

Since garbage collection happens automatically, there must be a corresponding strategy behind it, and the problems that strategy has to solve roughly come down to the following three points.

1. Which objects can be reclaimed?

2. When should those objects be reclaimed?

3. How should they be reclaimed?

What algorithms do GC strategies use?

Of the three questions above, the key one is really the first: which objects can be reclaimed? There is a simple, intuitive and fairly efficient approach known as the reference counting algorithm. The principle: whenever a reference to an object is created, its counter is incremented by 1; whenever a reference to it is removed, the counter is decremented by 1. Only objects whose counter has dropped to 0 can be reclaimed. Its disadvantages are: (1) It cannot handle circular references. For example, objects A and B each have a field pointing at the other (a.b = b and b.a = a); once nothing else references them, the two objects are in fact unreachable, yet reference counting can never reclaim them. (2) Reference counting needs cooperation from the compiler, which has to generate extra bookkeeping code: whenever an assignment stores the object into a reference, the object's counter must be incremented, and whenever the life cycle of a reference variable ends, the counter must be decremented. Because of these significant drawbacks, the JVM does not actually use reference counting. Imagine that it did; then a programmer writing the code below would not get the behavior they expect.

public class Object {

    Object field = null;

    public static void main(String[] args) {
        Thread thread = new Thread(new Runnable() {
            public void run() {
                Object objectA = new Object();
                Object objectB = new Object();  // 1
                objectA.field = objectB;
                objectB.field = objectA;        // 2
                // do something
                objectA = null;
                objectB = null;                 // 3
            }
        });
        thread.start();
        while (true);
    }
}

This code may look contrived, but such patterns are in fact common in real programming, for example two one-to-one database objects that each keep a reference to the other. The infinite loop at the end exists only to keep the JVM from exiting and has no other meaning.

With the GC we actually use today, once the thread finishes running, objectA and objectB are both treated as objects to be reclaimed. If our GC used the reference counting algorithm described above, however, these two objects would never be reclaimed, even though we explicitly set them to null after use.

Let me walk through it briefly, using the markers 1, 2 and 3 in the code. After the statement at marker 1 executes, the reference counts of the two objects are both 1. After the statement at marker 2 executes, both counts become 2. After the statement at marker 3 executes, that is, after both variables have been set to null, both counts are still 1. Under the rules of the reference counting algorithm, an object whose count has not returned to 0 is never reclaimed.
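To see those counts concretely, here is a minimal, purely illustrative simulation; the RefCounted class and its retain/release methods are invented for this sketch, since in a real reference-counting system the bookkeeping would be generated by the compiler rather than written by hand.

// Illustrative only: simulates the counter a reference-counting collector would keep per object.
class RefCounted {
    int count = 0;
    RefCounted field;                                            // mirrors the 'field' reference above

    void retain()  { count++; }                                  // a new reference to this object appeared
    void release() { count--; }                                  // a reference to this object went away
    boolean collectible() { return count == 0; }
}

public class CycleDemo {
    public static void main(String[] args) {
        RefCounted objectA = new RefCounted(); objectA.retain(); // marker 1: counts are 1 and 1
        RefCounted objectB = new RefCounted(); objectB.retain();
        objectA.field = objectB; objectB.retain();               // marker 2: counts are 2 and 2
        objectB.field = objectA; objectA.retain();
        objectA.release();                                       // marker 3: simulate objectA = null
        objectB.release();                                       // and objectB = null
        // Both counts are stuck at 1 because of the cycle, so neither object ever becomes collectible.
        System.out.println(objectA.collectible() + " " + objectB.collectible()); // prints: false false
    }
}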

The root search algorithm

Because of the shortcomings of reference counting, the JVM instead uses a different algorithm, known as the root search algorithm. Its approach is to designate a set of root objects (GC Roots); whenever an object is unreachable from every one of these roots, it is considered reclaimable.

Take the usual figure as an example: objectD and objectE reference each other, but because they are unreachable from the GC Roots, D and E will still end up being collected; under the reference counting approach, none of the five objects A through E would be reclaimed. Speaking of GC Roots, in the Java language the following kinds of objects can serve as GC roots:

1. Objects referenced from the virtual machine stack (local variables in stack frames).

2. Objects referenced by class static fields in the method area.

3. Objects referenced by constants in the method area.

4. Objects referenced by JNI code in the native method stack.

The first and fourth are essentially the local variables of methods (Java methods and native methods respectively), the second is self-explanatory, and the third refers to constant values, such as references declared final.
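As a rough illustration of the first three kinds of roots, consider the following sketch; the class and variable names are invented for the example, and the fourth kind (JNI references) cannot be shown in pure Java source.

public class RootsDemo {

    // 2. An object referenced by a class static field in the method area.
    static Object staticRoot = new Object();

    // 3. An object referenced by a constant in the method area (a static final reference).
    static final String CONSTANT_ROOT = "I live in the constant pool";

    public static void main(String[] args) {
        // 1. An object referenced by a local variable in a stack frame on the virtual machine stack.
        Object stackRoot = new Object();

        // 4. Objects referenced from JNI code in the native method stack are also roots,
        //    but that side cannot be written in plain Java.
        System.out.println(stackRoot + " " + staticRoot + " " + CONSTANT_ROOT);
    }
}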

The root search algorithm solves the most basic problem of garbage collection, the first and most critical question listed above: which objects can be reclaimed. But garbage collection obviously also has to solve the other two questions: when to reclaim, and how. Building on the root search algorithm, modern virtual machine implementations mainly use three garbage collection algorithms: the mark/sweep algorithm, the copying algorithm, and the mark/compact algorithm. All three extend the root search algorithm, and all three are quite easy to understand.

First, recall the root search algorithm from the previous section. It solves the question of which objects we should reclaim, but it clearly cannot take on the whole job of garbage collection, because collection has to happen while the program (that is, the Java program running on the JVM) is running. The GC thread and the program's threads must cooperate so that garbage can be reclaimed without disturbing the running program.

The mark/sweep algorithm came into being to achieve exactly that. It works by stopping the entire program (also known as Stop the World) when the usable memory space in the heap is exhausted, and then doing two things: the first is marking, the second is sweeping.

(1) Marking: the marking phase traverses the object graph from all the GC Roots and marks every object reachable from them as live.

(2) Sweeping: the sweeping phase traverses all objects in the heap and clears away every object that was not marked.

These two steps really are not complicated, and they are easy to understand. In plain words, the mark/sweep algorithm works like this: while the program is running, once usable memory is exhausted, the GC thread is triggered and the program is paused; the surviving objects are then marked, every unmarked object in the heap is swept away, and finally the program is resumed.
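To make the two phases concrete, here is a minimal toy model of one mark/sweep pass. The HeapObject class is invented for the sketch and stands in for real heap objects; an actual collector works on raw memory and object headers, not on Java lists.

import java.util.ArrayList;
import java.util.List;

class HeapObject {
    boolean marked;                                   // the mark bit (0 or 1 in the figures)
    List<HeapObject> references = new ArrayList<>();  // outgoing references to other objects
}

class MarkSweep {
    // Phase 1: mark everything reachable from the GC Roots.
    static void mark(List<HeapObject> roots) {
        for (HeapObject root : roots) {
            markFrom(root);
        }
    }

    private static void markFrom(HeapObject obj) {
        if (obj == null || obj.marked) return;
        obj.marked = true;
        for (HeapObject ref : obj.references) {
            markFrom(ref);                            // recursive traversal of the object graph
        }
    }

    // Phase 2: sweep the whole heap, dropping every unmarked object and resetting the mark bits.
    static void sweep(List<HeapObject> heap) {
        heap.removeIf(obj -> !obj.marked);            // "reclaim" the garbage
        for (HeapObject survivor : heap) {
            survivor.marked = false;                  // mark bits go back to 0
        }
    }
}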

Below is a set of figures (from the original post) describing this process; let's walk through it step by step, starting with the first figure.

This figure shows the state of all the objects while the program is running: their mark bits are all 0 (that is, unmarked; below, 0 means unmarked by default and 1 means marked). Now suppose the usable memory space is exhausted. The JVM stops the application, starts the GC thread, and begins the marking work. Following the root search algorithm, the state of the objects after marking looks like the next figure.

As you can see, by the root search algorithm every object reachable from the root objects has been marked as live, and the first phase, marking, is complete. Next comes the second phase, sweeping; after it finishes, the remaining objects and their state are shown in the following figure.

As you can see, the objects that were not marked have been reclaimed and cleared away, the marked objects remain, and their mark bits have been reset to 0. After that, needless to say, the stopped program threads are woken up and the program continues to run. The whole process really is not complicated, you could even call it simple. One point is worth dwelling on, though: why must the program be stopped? That is not hard to understand either. Take the simplest example: suppose our program and the GC thread were running at the same time, and imagine the following scenario.

Say we have just finished marking the rightmost object in the figure when, a moment later, the program creates a new object B that is reachable from an already marked object A. Because A has already been marked, B's mark bit stays 0: it missed the marking phase. So when the sweep phase comes around, the freshly created object B is forcibly swept away. It is not hard to imagine the result: the GC thread would cause the program to behave incorrectly. That outcome is of course unacceptable; we just created an object, and after one GC it suddenly becomes garbage. How could anyone work with that?

Disadvantages of the mark/sweep algorithm

1. First, it is inefficient (a recursive traversal plus a scan of the whole heap), and it has to stop the application during GC, which makes for a very poor user experience; for interactive applications it is simply unacceptable. Imagine a website that hangs for five minutes out of every hour; would you keep using it?

2. The second major drawback is that the freed memory is not contiguous. This is not hard to understand: the dead objects are scattered across every corner of memory, so once they are cleared away the memory layout is naturally a mess. To cope with this, the JVM has to maintain a free list of memory blocks, which is yet another overhead, and when allocating something like an array object it is not easy to find a large enough contiguous block. A sketch of such a free list follows.
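Here is a rough sketch of what such a free list might look like; the names are invented, and a real JVM manages raw memory addresses rather than Java objects.

import java.util.Iterator;
import java.util.LinkedList;

// Toy model of a free list: each entry records the start address and size of one free block.
class FreeList {
    static class Block {
        long start, size;
        Block(long start, long size) { this.start = start; this.size = size; }
    }

    private final LinkedList<Block> free = new LinkedList<>();

    // Sweeping adds every reclaimed block to the list.
    void addFreeBlock(long start, long size) {
        free.add(new Block(start, size));
    }

    // Allocation has to search the list for a block that is big enough (first fit here).
    Long allocate(long size) {
        Iterator<Block> it = free.iterator();
        while (it.hasNext()) {
            Block b = it.next();
            if (b.size >= size) {
                long address = b.start;
                b.start += size;
                b.size -= size;
                if (b.size == 0) it.remove();
                return address;
            }
        }
        return null;  // no single contiguous block is large enough, even if the total free space is
    }
}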

The copying algorithm

Let's look at the copying algorithm first. It divides memory into two regions. At any point in time, all dynamically allocated objects live in only one of them, called the active region, while the other, called the idle region, stays empty. When the usable memory is exhausted, the JVM suspends the program and starts the copying GC thread. The GC thread copies all surviving objects from the active region into the idle region, laying them out one after another by memory address, and at the same time updates the surviving objects' references to point at their new addresses. At that point the idle region has swapped roles with the active region: the garbage is left behind in the old active region, which is now the idle region. In effect, the moment the regions swap, all the garbage has been reclaimed in one go.
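Before walking through the figures, here is a minimal sketch of this copy-and-swap process, reusing the invented HeapObject stand-in from the mark/sweep example; a real collector copies raw bytes and patches addresses, which a toy Java model can only hint at.

import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class CopyingCollector {
    List<HeapObject> activeSpace = new ArrayList<>();  // all allocation happens here
    List<HeapObject> idleSpace = new ArrayList<>();    // kept empty between collections

    void collect(List<HeapObject> roots) {
        Map<HeapObject, HeapObject> forwarded = new IdentityHashMap<>();
        // Copy every object reachable from the roots into the idle space, packed together.
        for (HeapObject root : roots) {
            copy(root, forwarded);
        }
        // (A real collector would also update the root references to the new copies; omitted here.)
        List<HeapObject> tmp = activeSpace;            // swap the roles of the two spaces:
        activeSpace = idleSpace;                       // the survivors' space becomes active,
        idleSpace = tmp;
        idleSpace.clear();                             // and the old active space is emptied, reclaiming all garbage at once
    }

    private HeapObject copy(HeapObject obj, Map<HeapObject, HeapObject> forwarded) {
        if (obj == null) return null;
        HeapObject moved = forwarded.get(obj);
        if (moved != null) return moved;               // already copied, reuse its forwarding entry
        moved = new HeapObject();
        forwarded.put(obj, moved);
        idleSpace.add(moved);                          // survivors end up laid out one after another
        for (HeapObject ref : obj.references) {
            moved.references.add(copy(ref, forwarded)); // references are updated to the new copies
        }
        return moved;
    }
}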

The first figure is still the example from the previous section, only now memory is split into the two halves the copying algorithm needs. Let's see what the two regions look like after the copying GC thread has done its work, as shown in the next figure.

As you can see, objects 1 and 4 have been cleared away, while objects 2, 3, 5 and 6 are laid out neatly in what used to be the idle region, which has now become the active region. The left half has become the idle region, and it is not hard to imagine that after the next GC the left half will become the active region again. Clearly, the copying algorithm makes up for the messy memory layout left behind by mark/sweep. But at the same time its shortcomings are just as obvious.

1. It wastes half of the memory, which is simply fatal.

2. If the survival rate of objects is high, take the extreme case and assume 100% of them survive: then we would have to copy every object and fix up every reference address. Once the survival rate reaches a certain level, the time spent copying becomes significant.

So from the description above it is not hard to see that for the copying algorithm to be worth using, the object survival rate must at the very least be low, and above all we have to accept wasting 50% of the memory.

The mark/compact algorithm

The mark/compact algorithm is very similar to mark/sweep; it also has two phases: marking and compacting.

(1) Marking: the first phase is exactly the same as in the mark/sweep algorithm: traverse from the GC Roots and mark the surviving objects.

(2) Compacting: move all surviving objects so that they sit one after another in order of memory address, then reclaim all the memory beyond the end of the last survivor in one go. That is why the second phase is called the compacting phase. A sketch follows this list.
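A toy compacting pass over the same invented HeapObject model, treating the heap as an ordered list so that sliding everything toward the front is easy to see; real compaction also has to rewrite every reference to a moved object, which this flat model hides.

import java.util.List;

class MarkCompact {
    // Phase 1 is identical to MarkSweep.mark(roots) in the earlier sketch.

    // Phase 2: slide the survivors to the front of the heap in order,
    // then reclaim everything beyond the last survivor in one step.
    static void compact(List<HeapObject> heap) {
        int next = 0;                                  // next free slot at the front of the heap
        for (int i = 0; i < heap.size(); i++) {
            HeapObject obj = heap.get(i);
            if (obj.marked) {
                obj.marked = false;                    // reset the mark bit for the next cycle
                heap.set(next++, obj);                 // move the survivor toward the front
            }
        }
        heap.subList(next, heap.size()).clear();       // everything past 'next' is garbage, freed all at once
    }

    // After compaction, allocating for a new object is just a bump at the end of the heap,
    // with no free-list search at all.
    static void allocate(List<HeapObject> heap, HeapObject obj) {
        heap.add(obj);
    }
}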

The before-and-after figures are very similar to those of the copying algorithm, except that there is no distinction between an active region and an idle region, while the process itself is very similar to mark/sweep. Let's look at the state and layout of the objects in memory before the GC, as shown in the figure.

This figure is effectively the same as the one for mark/sweep, except that a rectangle has been added to represent the memory region, to make the contiguous layout easier to see. If the GC thread starts working at this point, the marking phase begins immediately. It is the same as the marking phase of mark/sweep; the state of the objects after marking is shown in the next figure.

There is nothing special to explain there. Next comes the compacting phase; after it has run, the memory layout looks like the following figure.

As you can see, the marked surviving objects have been slid together in order of memory address, while the unmarked memory has been cleared away. So when we later need to allocate memory for a new object, the JVM only has to keep a single start address for the free memory, which is obviously far cheaper than maintaining a free list. It is not hard to see that the mark/compact algorithm not only makes up for the scattered memory left by mark/sweep, but also avoids the copying algorithm's heavy cost of giving up half the memory. Its only drawback is efficiency: it has to mark all surviving objects and then also move them and fix up all their reference addresses, so in terms of efficiency it is worse than the copying algorithm.

Here is a summary of what the three algorithms have in common and where they differ, which should make the comparison clearer. They have the following two points in common.

1. All three are based on the root search algorithm to decide whether an object should be reclaimed, and what allows root search to work correctly at all is the language's rules on variable scope. So the most fundamental way to prevent memory leaks is to have a firm grasp of variable scope, rather than the C/C++-style manual memory management mentioned in the earlier chapter on memory management.

2. When the GC thread starts, in other words when the GC process begins, they all have to pause the application (Stop the World).

Their differences are listed below along a few dimensions (> means the former is better than the latter, = means they are roughly the same).

Efficiency: copying algorithm > mark/compact algorithm > mark/sweep algorithm (efficiency here is only a rough comparison of time complexity; in practice it is not necessarily so clear-cut).

Memory tidiness: copying algorithm = mark/compact algorithm > mark/sweep algorithm.

Memory utilization: mark/compact algorithm = mark/sweep algorithm > copying algorithm.

You can see that mark/sweep is a relatively backward algorithm, but the other two are built on top of it. As the saying goes, when drinking the water, do not forget who dug the well, so do not forget the mark/sweep algorithm either; and in some situations it still comes in handy.

Now that the three algorithms are clear, you can see that the copying algorithm is the undisputed champion in efficiency, but it wastes too much memory; and in trying to balance all three of the metrics above, the mark/compact algorithm is the most even-handed, yet its efficiency is still unsatisfactory: compared with the copying algorithm it has an extra marking phase, and compared with mark/sweep it has the extra work of compacting the memory. Finally we come to the god-tier algorithm among GC algorithms: generational collection. So how does the generational collection algorithm handle GC?

Categories of objects

The previous section already said that the generational collection algorithm applies a suitable algorithm to objects with different characteristics; it does not actually introduce any new algorithm. Rather than a fourth algorithm, generational collection is a practical application of the three algorithms above. First let's explore how objects differ in character, and then pick a GC algorithm for each kind. By life cycle, the objects in memory can be roughly divided into three kinds; the names below are my own.

1. Die-young objects: objects born in the morning and dead by evening; in plain terms, objects that do not live long. Examples: local variables of a method, temporary variables inside a loop, and so on.

2. Old-but-not-dead objects: these objects generally live rather long and are still going strong at a ripe old age, but in the end almost all of them die sooner or later, with the emphasis on almost. Examples: cached objects, database connection objects, singleton objects (the singleton pattern), and so on.

3. Immortal objects: once born, such objects almost never die; they remain alive almost forever, and again, remember, only almost. Examples: objects in the string pool (the flyweight pattern), loaded class information, and so on.

Which memory areas these objects live in

Do you still remember how the JVM divides memory, from the earlier description of memory management? Mapping the three kinds of objects above onto memory regions: die-young objects and old-but-not-dead objects live in the Java heap, while immortal objects live in the method area. The previous chapter already noted that the JVM specification requires the Java heap to implement GC, so for die-young and old-but-not-dead objects death is almost inevitable, though again only almost: some objects will inevitably survive until the application ends. The specification does not require GC for the method area, however, so if a particular JVM implementation chose not to implement GC for the method area, its immortal objects would be truly immortal. Because the life cycle of immortal objects is so long, the generational collection algorithm is designed for the Java heap, that is, for die-young objects and old-but-not-dead objects.

Object reclamation in the Java heap (die-young and old-but-not-dead objects)

With the analysis above, let's look at how the generational collection algorithm handles memory reclamation in the Java heap, that is, the reclamation of die-young and old-but-not-dead objects.

Die-young objects: these live fast and die young, so their survival time is short. Remember the requirement for using the copying algorithm? The object survival rate must not be too high, so die-young objects are the best fit for the copying algorithm. A small question: what about the 50% memory waste? Answer: because die-young objects generally have a low survival rate, we do not need to set aside 50% of the memory as the idle region. In general, two regions of 10% each are used as the idle and active regions, while the remaining 80% is used to allocate memory for new objects. When a GC happens, the surviving objects in the 10% active region and the 80% allocation region are copied into the 10% idle region, and the 90% of memory they came from is then released, and so on in alternation. To let you see this GC process more clearly, here is the illustration.

The figure highlights what each of the three regions holds at every stage. Looking at the figure, the GC process should not be hard to follow. Two points are worth mentioning, though. First, with this arrangement we waste only 10% of the memory, which is acceptable, because in exchange we get neatly arranged memory and fast GC. Second, this strategy assumes that the surviving objects never occupy more than 10% of the memory; once they do, the excess objects have nowhere to be copied to.
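Here is a toy sketch of that three-region arrangement, again reusing the invented HeapObject stand-in from the earlier sketches; real survivor regions are fixed-size blocks of raw memory, which this list-based model glosses over.

import java.util.ArrayList;
import java.util.List;

// Toy model of the layout described above: one large allocation region (80%)
// plus two small regions (10% each) that take turns being active and idle.
class YoungGeneration {
    List<HeapObject> eden = new ArrayList<>();          // the 80% where new objects are allocated
    List<HeapObject> survivorFrom = new ArrayList<>();  // the 10% currently holding survivors
    List<HeapObject> survivorTo = new ArrayList<>();    // the 10% kept idle

    // One GC pass: copy the live objects of the 80% region and the active 10% region
    // into the idle 10% region, then free the other 90% in one go and swap the two.
    void minorGc(List<HeapObject> live) {
        for (HeapObject obj : live) {
            if (eden.contains(obj) || survivorFrom.contains(obj)) {
                survivorTo.add(obj);
            }
        }
        eden.clear();
        survivorFrom.clear();
        List<HeapObject> tmp = survivorFrom;             // the two 10% regions swap roles
        survivorFrom = survivorTo;
        survivorTo = tmp;
    }
}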

To handle that unexpected situation, where the surviving objects take up too much memory, the designers split the Java heap into two parts. The three regions just described make up the first part, called the new generation, or young generation; the remaining part, dedicated to storing old-but-not-dead objects, is called the old generation. An apt name, isn't it? Now let's look at how old-but-not-dead objects are handled. Old-but-not-dead objects: their survival rate is very high, because most of them came up from the new generation, like people who have lived many years and grown old without dying.

Typically, an object is moved from the new generation to the old generation in the following two cases.

1. Every object in the new generation has an age, which is the number of GCs it has survived: each GC the object lives through adds 1 to its age. When the age reaches a certain threshold, the object is moved to the old generation, and this promotion threshold can generally be configured on the JVM (see the example after this list).

2. When the surviving objects in the new generation occupy more than the 10% survivor region, the excess objects are placed directly into the old generation. In that case the old generation serves as the new generation's backup warehouse.
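On the HotSpot JVM, both the promotion age and the size of the two survivor regions correspond to real command-line flags. For example (the sizes and the class name MyApp are only illustrative, and defaults vary between JVM versions):

java -Xmn256m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=15 MyApp

Here -Xmn sets the total size of the new generation, -XX:SurvivorRatio=8 makes the allocation region eight times the size of each of the two survivor regions (the 80%/10%/10% split described above), and -XX:MaxTenuringThreshold sets the age at which a surviving object is promoted to the old generation.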

Given the characteristics of old-but-not-dead objects, the copying algorithm is clearly no longer appropriate, because their survival rate is high, and do not forget that if the old generation also used the copying algorithm, it would have no backup warehouse of its own. So in general, old-but-not-dead objects can only be collected with the mark/compact or mark/sweep algorithm.

Object reclamation in the method area (immortal objects)

The two cases above solve most of the GC problem, because the Java heap is the main target of GC, and the discussion above also covers the whole of the generational collection algorithm. The reclamation of immortal objects, which comes next, no longer belongs to generational collection. Immortal objects live in the method area, and in the HotSpot virtual machine we usually use (the default JVM of the JDK), the method area is also affectionately called the permanent generation, a rather apt name, isn't it? In fact, a long time ago there was no separate permanent generation: it was stored together with the old generation, which held both Java object instances and class information. Later it turned out that unloading of class information almost never happens, so the two were separated, and happily this did improve performance, which is how the permanent generation came to be split off. GC in this area takes roughly the same approach as in the old generation: since there is no backup warehouse here either, both can only use the mark/sweep and mark/compact algorithms.
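For reference, in HotSpot up to JDK 7 the permanent generation could be sized with flags such as -XX:PermSize and -XX:MaxPermSize; JDK 8 removed the permanent generation and moved class metadata into a native-memory Metaspace, sized with -XX:MaxMetaspaceSize. The values below are only illustrative, and MyApp is again a placeholder:

JDK 7 and earlier:  java -XX:PermSize=64m -XX:MaxPermSize=128m MyApp
JDK 8 and later:    java -XX:MaxMetaspaceSize=256m MyApp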

Timing of collection

The JVM does not collect all three of the memory regions above every time it performs a GC; most of the time only the new generation is collected. So, by the region being collected, GCs come in two types: the ordinary GC (minor GC) and the global GC (major GC or full GC), which target the following areas.

Ordinary GC (minor GC): collects only the new generation.

Global GC (major GC or full GC): collects the old generation, occasionally accompanied by a collection of the new generation and of the permanent generation.

Because GC on the old generation and the permanent generation is less effective, and their memory fills up slowly, in general several ordinary GCs happen before a single global GC is triggered.
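To watch minor and full GCs happen, you can turn on GC logging; the flags differ by JDK version, but the common HotSpot ones are shown below (MyApp is again just a placeholder):

JDK 8 and earlier:  java -verbose:gc -XX:+PrintGCDetails MyApp
JDK 9 and later:    java -Xlog:gc* MyApp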
