Java Virtual Machine: An In-Depth Analysis of GC Algorithms

Source: Internet
Author: User

Copyright notice: This is an original article by the blogger. Please credit the source when reposting. Discussion and exchange are welcome!

In the previous article we introduced the reachability analysis algorithm, which determines which objects can be reclaimed. Garbage collection algorithms build on that result. Each algorithm has its own strengths and weaknesses, so a JVM implementation usually does not rely on a single algorithm but combines several to get the best collection results. The following sections describe the ideas behind several garbage collection algorithms and how they evolved.

The Most Basic Collection Algorithm: Mark/Sweep

The mark/sweep algorithm is the most basic of the GC algorithms: the later collection algorithms are all based on its idea and improve on its shortcomings. As the name suggests, it has two phases, "mark" and "sweep": first identify, by marking, which objects need to be reclaimed; once marking is complete, reclaim those objects in one pass. In the description below, the marker flags the reachable objects, so the objects to reclaim are exactly the ones left unmarked.

Mark phase: marking is the reachability analysis described earlier. The collector traverses from all GC Roots and marks every object reachable from them, usually by recording a "reachable" flag in the object header.

Sweep phase: the collector iterates over the heap, and any object that is not marked as reachable (as determined by reading its object header) is reclaimed.

In the example figure for the mark/sweep algorithm, object B can be reached from GC Root 1 in the mark phase, and object E can be reached from B, so both B and E are reachable from GC Root 1. In the same way, objects F, G, J, and K are reachable. In the sweep phase, all unreachable objects are reclaimed.
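The two phases can be sketched in a few lines of Java. This is only a toy model under stated assumptions: the class MarkSweepSketch, the Obj type with its marked field (standing in for a flag in the object header), and the small object graph in demo() are all hypothetical and are not the graph from the figure or the way a real JVM lays out its heap.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy model of mark/sweep (hypothetical names; not real JVM internals).
final class MarkSweepSketch {

    // A simulated heap object: a name, outgoing references, and a mark bit
    // that stands in for the flag kept in a real object header.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;

        Obj(String name) { this.name = name; }
    }

    // Mark phase: flag everything reachable from the roots.
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;          // already visited (handles cycles)
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    // Sweep phase: walk the whole heap, remove unmarked objects, and reset
    // the mark bit on survivors. Returns the names of the reclaimed objects.
    static List<String> sweep(List<Obj> heap) {
        List<String> reclaimed = new ArrayList<>();
        Iterator<Obj> it = heap.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;            // ready for the next GC cycle
            } else {
                reclaimed.add(o.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    // Illustrative graph: root -> B -> E, while C and D only reference each other.
    static List<String> demo() {
        Obj root = new Obj("root"), b = new Obj("B"), c = new Obj("C"),
            d = new Obj("D"), e = new Obj("E");
        root.refs.add(b);
        b.refs.add(e);
        c.refs.add(d);
        d.refs.add(c);                       // an unreachable cycle
        List<Obj> heap = new ArrayList<>(List.of(root, b, c, d, e));
        mark(List.of(root));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("Reclaimed: " + demo());   // prints "Reclaimed: [C, D]"
    }
}
```

Note that C and D reference each other yet are still reclaimed, because neither can be reached from a root.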

While the garbage collector is performing a GC, all Java application threads must be stopped (a pause known as "Stop the World"). The reason is that during the reachability analysis in the mark phase, the reference relationships between objects must not keep changing; otherwise the accuracy of the analysis cannot be guaranteed. The application threads resume only after marking and sweeping have finished.

As mentioned earlier, the later collection algorithms improve on mark/sweep, which implies that mark/sweep has shortcomings. Once you understand how it works, they are not hard to see.

1. Efficiency. Neither marking nor sweeping is efficient, because both phases have to traverse objects in memory. The number of object instances is often very large, so this is time-consuming, and since the application must be stopped during GC, the user experience can suffer badly.

2. Space. Sweeping leaves behind a large number of discontiguous memory fragments (as the figure shows). With too much fragmentation, the program may later be unable to find enough contiguous memory when it needs to allocate a large object, and has to trigger another garbage collection early.
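The space problem can be made concrete with a small sketch. It assumes a hypothetical heap modeled as an array of fixed-size slots (true means occupied); the class name FragmentationSketch and the slot layout are invented for illustration.

```java
// Toy illustration of fragmentation after a sweep (hypothetical slot model).
final class FragmentationSketch {

    // Total number of free (false) slots.
    static int totalFree(boolean[] used) {
        int free = 0;
        for (boolean u : used) {
            if (!u) free++;
        }
        return free;
    }

    // Length of the longest run of contiguous free slots.
    static int largestFreeRun(boolean[] used) {
        int best = 0, run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // Heap after a sweep: live objects (true) scattered between holes (false).
    static boolean[] sampleHeap() {
        return new boolean[] {true, false, false, true, false,
                              true, false, false, true, false};
    }

    public static void main(String[] args) {
        boolean[] heap = sampleHeap();
        System.out.println("free slots: " + totalFree(heap));             // 6
        System.out.println("largest free run: " + largestFreeRun(heap));  // 2
    }
}
```

In this layout six slots are free in total, but an object that needs three contiguous slots cannot be placed, which is the situation that forces an early collection.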

With these drawbacks, does mark/sweep still have a reason to exist? Don't worry: when an algorithm has flaws, people find ways to improve it. The next two algorithms are both improvements on mark/sweep.

The Copying Algorithm

The copying algorithm appeared in order to solve the efficiency problem. It divides the available memory into two blocks of equal size and uses only one of them at a time. When that block is used up, the surviving objects are copied to the other block, and the used block is then cleaned up in one go. The diagrams illustrate this:

Before recycling:

After recycling:

The copying algorithm reclaims an entire half of the memory each time. This reduces the traversal time: there is no need to walk the objects in the used area one by one, because the whole area is simply emptied. In addition, surviving objects are stored in address order when they are copied to the reserved area, which solves the memory fragmentation problem. Allocation no longer has to deal with complications such as fragmentation; memory is simply handed out sequentially.
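One common way to implement this idea is a Cheney-style scan, sketched below. The article does not say which variant it has in mind, so treat this as an assumption; CopyingSketch, Obj, and the object graph in demo() are hypothetical names for illustration only. The to-space is modeled as a list, and appending to it plays the role of sequential allocation.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a copying collector (Cheney-style scan; hypothetical names).
final class CopyingSketch {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();

        Obj(String name) { this.name = name; }
    }

    // Copies everything reachable from the roots into a fresh to-space and
    // returns it. Dead objects in from-space are never visited; that space
    // is simply discarded as a whole. The roots list must be mutable.
    static List<Obj> collect(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();
        Map<Obj, Obj> forwarding = new IdentityHashMap<>();   // old -> new copy
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), toSpace, forwarding));
        }
        // Scan the copied objects in order and redirect their references;
        // objects copied along the way are appended and scanned later.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj o = toSpace.get(scan);
            for (int i = 0; i < o.refs.size(); i++) {
                o.refs.set(i, copy(o.refs.get(i), toSpace, forwarding));
            }
        }
        return toSpace;
    }

    // Returns the to-space copy of an object, creating it on first visit.
    private static Obj copy(Obj old, List<Obj> toSpace, Map<Obj, Obj> forwarding) {
        Obj moved = forwarding.get(old);
        if (moved == null) {
            moved = new Obj(old.name);
            moved.refs.addAll(old.refs);   // still from-space refs; fixed by the scan
            forwarding.put(old, moved);
            toSpace.add(moved);            // sequential allocation at the end
        }
        return moved;
    }

    // Illustrative graph: root -> a -> b, a -> root (cycle), plus one dead object.
    static List<Obj> demo() {
        Obj root = new Obj("root"), a = new Obj("a"), b = new Obj("b"),
            garbage = new Obj("garbage");
        root.refs.add(a);
        a.refs.add(b);
        a.refs.add(root);
        garbage.refs.add(a);               // pointing at a live object does not keep it alive
        return collect(new ArrayList<>(List.of(root)));
    }

    public static void main(String[] args) {
        for (Obj o : demo()) {
            System.out.println(o.name);    // prints root, a, b on separate lines
        }
    }
}
```

The cost of collect() depends only on the number of surviving objects, which is why the approach pays off when few objects survive.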

The copying algorithm is simple and efficient, and it addresses both the inefficiency and the fragmentation of mark/sweep. But its drawbacks are just as obvious:

1. Usable memory shrinks to half of the original; wasting half of the memory space is too high a price.

2. If the object survival rate is very high (in the extreme case, 100%), all surviving objects have to be copied, and the time cost cannot be ignored.

These drawbacks matter much less in the new generation, where almost all objects die young (as many as 98%), so today's commercial virtual machines use a copying algorithm to collect the new generation. Because the survival rate of new-generation objects is low, memory does not need to be split 1:1. Instead it is divided into one larger Eden space and two smaller Survivor spaces, From Survivor and To Survivor, in the ratio 8:1:1. Eden and From Survivor are in use at any time, while To Survivor is held in reserve.

When a GC starts, objects exist only in Eden and From Survivor, and To Survivor is empty. During the GC, all surviving objects in Eden are copied to To Survivor. Surviving objects in From Survivor are handled according to their age: each time an object survives a new-generation collection its age increases by 1, and objects whose age has reached the threshold (15 by default) are moved to the old generation, while those below the threshold are copied to To Survivor. Eden and From Survivor are then emptied, so all surviving new-generation objects are now in To Survivor.

Next, From Survivor and To Survivor swap roles: the new To Survivor is the From Survivor that was just emptied, and the new From Survivor is the previous To Survivor. Either way, To Survivor is guaranteed to be empty after each round of GC. If To Survivor does not have enough space to hold the objects that survived a new-generation collection, the old generation acts as an allocation guarantee and those objects are placed in the old generation.
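The Eden/Survivor procedure above can be sketched as a small simulation. It is a simplification under stated assumptions: the class YoungGenSketch, its fields, and the live flag (standing in for the result of reachability analysis) are hypothetical, capacity is counted in objects rather than bytes, and the age check is applied uniformly to Eden and From Survivor objects. The demo uses a threshold of 2 instead of 15 only to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a new-generation (minor) GC; hypothetical names and units.
final class YoungGenSketch {

    static final class Obj {
        final String name;
        boolean live = true;   // stands in for the reachability result
        int age;               // number of minor GCs survived so far

        Obj(String name) { this.name = name; }
    }

    final int tenuringThreshold;   // age at which objects are promoted
    final int survivorCapacity;    // max objects one Survivor space can hold
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    YoungGenSketch(int tenuringThreshold, int survivorCapacity) {
        this.tenuringThreshold = tenuringThreshold;
        this.survivorCapacity = survivorCapacity;
    }

    // New objects are always allocated in Eden.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) continue;             // dead objects are simply not copied
            o.age++;
            if (o.age >= tenuringThreshold || to.size() >= survivorCapacity) {
                old.add(o);                    // promoted by age, or To is full
            } else {
                to.add(o);                     // survivor copied to To
            }
        }
        eden.clear();                          // empty Eden and From in one go
        from.clear();
        List<Obj> emptied = from;              // swap roles: To becomes the new From
        from = to;
        to = emptied;
    }

    public static void main(String[] args) {
        YoungGenSketch heap = new YoungGenSketch(2, 2);
        heap.allocate("a");
        heap.allocate("b").live = false;       // b dies before the first GC
        heap.minorGc();                        // a survives into From with age 1
        heap.allocate("c");
        heap.minorGc();                        // a reaches age 2 and is promoted
        System.out.println("old generation holds " + heap.old.size() + " object(s)");
        System.out.println("From Survivor holds " + heap.from.size() + " object(s)");
    }
}
```

After every call to minorGc() the to list is empty again, matching the guarantee described above.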

The Mark/Compact Algorithm

When the object survival rate is high, the copying algorithm has to perform many copy operations and becomes very inefficient. More importantly, if you do not want to waste 50% of the memory, you need extra space as an allocation guarantee to handle the extreme case in which 100% of the objects in memory survive. Since the survival rate in the old generation is very high, the copying algorithm is not suitable there. Another algorithm was therefore proposed to fit the characteristics of the old generation: mark/compact. As the name suggests, it resembles mark/sweep. Its mark phase is in fact the same as in mark/sweep, but the next step does not reclaim the garbage objects in place. Instead, all surviving objects are moved toward one end of the memory, and everything beyond the boundary of the live objects is then cleared directly.

Before recycling:

After recycling:

As the figures show, the reclaimable objects are gone after collection and the surviving objects sit in memory in regular order. When memory is later allocated for new objects, the JVM only needs to keep track of the start address of the free memory. Mark/compact thus avoids the fragmentation of mark/sweep and also eliminates the copying algorithm's high cost of giving up half the memory, a double benefit. But every algorithm has its shortcomings, just as no one is perfect. The drawback of mark/compact is its efficiency: it not only has to mark the surviving objects but also has to move them and update every reference to them, so it is less efficient than the copying algorithm.
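A minimal sketch of the compaction step follows, assuming a hypothetical heap modeled as an array in which references are stored as array indices ("addresses"). The names MarkCompactSketch, Obj, and sampleHeap() are invented for illustration, and the three-pass sliding scheme is one simple way to do it, not necessarily what any particular JVM does.

```java
// Toy model of mark/compact with sliding compaction (hypothetical names).
final class MarkCompactSketch {

    static final class Obj {
        final String name;
        final int[] refs;      // addresses (heap indices) of referenced objects
        boolean marked;

        Obj(String name, int... refs) {
            this.name = name;
            this.refs = refs;
        }
    }

    // Mark phase: flag everything reachable from the given root addresses.
    static void mark(Obj[] heap, int... roots) {
        for (int addr : roots) {
            Obj o = heap[addr];
            if (o == null || o.marked) continue;
            o.marked = true;
            mark(heap, o.refs);
        }
    }

    // Compact phase: slide marked objects to the low end of the heap and
    // return the first free address, where the next allocation would start.
    static int compact(Obj[] heap, int[] roots) {
        int[] forward = new int[heap.length];
        int free = 0;
        for (int addr = 0; addr < heap.length; addr++) {      // pass 1: new addresses
            if (heap[addr] != null && heap[addr].marked) forward[addr] = free++;
        }
        for (Obj o : heap) {                                  // pass 2: update references
            if (o == null || !o.marked) continue;
            for (int i = 0; i < o.refs.length; i++) o.refs[i] = forward[o.refs[i]];
        }
        for (int i = 0; i < roots.length; i++) roots[i] = forward[roots[i]];
        for (int addr = 0; addr < heap.length; addr++) {      // pass 3: move objects
            Obj o = heap[addr];
            heap[addr] = null;                                // dead or vacated slot
            if (o != null && o.marked) {
                o.marked = false;
                heap[forward[addr]] = o;                      // never above addr
            }
        }
        return free;
    }

    // Illustrative heap: A(0) -> B(3) -> C(5) -> A(0); X, Y, Z are garbage.
    static Obj[] sampleHeap() {
        Obj[] heap = new Obj[8];
        heap[0] = new Obj("A", 3);
        heap[1] = new Obj("X");
        heap[3] = new Obj("B", 5);
        heap[4] = new Obj("Y", 1);
        heap[5] = new Obj("C", 0);
        heap[7] = new Obj("Z");
        return heap;
    }

    public static void main(String[] args) {
        Obj[] heap = sampleHeap();
        int[] roots = {0};
        mark(heap, roots);
        int free = compact(heap, roots);
        System.out.println("first free address: " + free);    // prints 3
    }
}
```

The extra passes over the heap and the reference updates in pass 2 are where the additional cost compared with copying comes from.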

Now that the principles of the three algorithms are clear, here is a simple ranking of them in several respects.

Efficiency: copying > mark/compact > mark/sweep (mark/sweep suffers from memory fragmentation, which may trigger another round of garbage collection when memory is allocated for a large object)

Memory regularity: copying = mark/compact > mark/sweep

Memory utilization: mark/compact = mark/sweep > copying

As this simple comparison shows, mark/sweep looks somewhat dated. But as the saying goes, when you drink the water, do not forget the one who dug the well: it is the predecessor and foundation of the other algorithms, and it is still useful in some scenarios.

The Ultimate Algorithm: Generational Collection

Current commercial virtual machines all use the generational collection algorithm. It is called the ultimate algorithm because it combines the strengths of the previous algorithms. Rather than being a new algorithm, it is the practical application of the earlier ones in combination. The idea is to divide memory into several regions according to object lifetime. Generally the Java heap is divided into the new generation and the old generation, so that the most appropriate collection algorithm can be chosen for the characteristics of each generation. (There is also a permanent generation, which is specific to the HotSpot implementation; other virtual machine implementations do not have this concept. Collecting the permanent generation yields very little, so it is rarely garbage collected.)

New generation: objects here are "born in the morning and dead by evening"; their lifetimes are very short.

Old generation: objects here have survived several minor GCs and have long lifetimes.

In the new generation, every garbage collection finds that a large number of objects have died and only a few survive. The copying algorithm is therefore used there: collection completes at the cost of copying only a small number of surviving objects. The old generation has a high survival rate, which makes copying unsuitable, and it has no extra space to serve as an allocation guarantee, so it must be collected with either mark/sweep or mark/compact.

To summarize, generational collection uses the copying algorithm for the new generation and mark/sweep or mark/compact for the old generation.
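To see how a running JVM splits its collectors across generations, the standard java.lang.management API can be queried as sketched below. The class name GenerationsProbe is hypothetical. The collector and memory pool names it prints depend on the JVM vendor, version, and the collector selected at startup, so no particular output is assumed here, and some collectors do not follow the two-generation layout described in this article.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Lists the collectors of the current JVM and the memory pools each manages.
final class GenerationsProbe {

    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + " manages " + Arrays.toString(gc.getMemoryPoolNames())
                    + ", collections so far: " + gc.getCollectionCount());
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

On a generational configuration you would typically expect one collector entry covering the Eden and Survivor pools and another that also covers the old generation, but the exact names vary.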

The sections above described the principles, advantages, and disadvantages of several collection algorithms, along with what they have in common: when the GC threads start (that is, during garbage collection), the application pauses (Stop the World). This knowledge lays the foundation for studying how the garbage collectors themselves work. The above is a summary of my personal study; discussion is welcome.
