JVM: Garbage collector

Source: Internet
Author: User

If collection algorithms are the methodology of memory reclamation, then garbage collectors are its concrete implementation. The Java Virtual Machine specification does not prescribe how a garbage collector should be implemented, so the collectors provided by different vendors and different versions of virtual machines can vary greatly, and they typically provide parameters that let users combine the collectors used for each generation according to their application's characteristics and requirements. The collectors discussed here are those of the HotSpot virtual machine after JDK 1.7 Update 14. This virtual machine contains all the collectors shown in the figure:

The figure shows 7 collectors that act on different generations. If there is a line between two collectors, they can be used in combination. The region in which a collector is located indicates whether it is a new generation collector or an old generation collector. The following sections introduce the characteristics, basic principles, and usage scenarios of these collectors, and focus on the two relatively complex collectors, CMS and G1, to examine some of their operational details.
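
To see which of these collectors a particular virtual machine is actually running, the standard java.lang.management API can be queried. The following is a minimal sketch; the class name ListCollectors is made up for illustration, and the collector names printed depend on the JVM version and on the options it was started with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original article.
final class ListCollectors {
    // Names of the collectors registered in the running JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Each bean reports the memory pools (generations) it manages.
            System.out.println(gc.getName()
                    + " pools=" + String.join(",", gc.getMemoryPoolNames())
                    + " collections=" + gc.getCollectionCount());
        }
    }
}
```

A JVM normally reports one bean for the new generation collector and one for the old generation collector, which matches the pairing of collectors described above.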

One. Serial collector

The Serial collector is the most basic collector and the one with the longest history; before JDK 1.3.1 it was the only choice for new generation collection in the virtual machine. As the name suggests, this is a single-threaded collector, but "single-threaded" means more than that it uses only one CPU or one collection thread to do its work: more importantly, while it is collecting, all other worker threads must be paused until the collection ends. In fact, to this day it remains the default new generation collector for virtual machines running in client mode. It also has advantages over other collectors: it is simple and efficient (compared with the single-threaded performance of other collectors), and in an environment limited to a single CPU the Serial collector has no thread-interaction overhead, so by concentrating on collection it naturally achieves the highest single-threaded collection efficiency. In desktop application scenarios the memory allocated to the virtual machine is generally not large. Collecting a new generation of a few tens of megabytes, or even one or two hundred megabytes (this is only the memory used by the new generation; desktop applications rarely grow much larger), can be kept to a pause of a few tens of milliseconds, or a little over one hundred milliseconds at most. As long as it does not happen frequently, such a pause is acceptable. The Serial collector is therefore a good choice for virtual machines running in client mode.

Two. ParNew collector

The ParNew collector is in fact a multi-threaded version of the Serial collector. Apart from using multiple threads for garbage collection, all its other behavior, including the control parameters available to the Serial collector (for example -XX:SurvivorRatio, -XX:PretenureSizeThreshold, -XX:HandlePromotionFailure), the collection algorithm, Stop The World, object allocation rules, and reclamation strategy, is exactly the same as the Serial collector's, and the two collectors share a good deal of code in their implementation. Note that in a single-CPU environment ParNew will never do better than Serial; because of the overhead of thread interaction, it is not even guaranteed to outperform Serial in a two-CPU environment implemented with hyper-threading. As the number of usable CPUs grows, however, it makes much more effective use of system resources during GC. By default it starts as many collection threads as there are CPUs. When there are very many CPUs (for example 32; CPUs today easily have 4 cores plus hyper-threading, and servers with more than 32 logical CPUs are increasingly common), the -XX:ParallelGCThreads parameter can be used to limit the number of garbage collection threads.

Apart from multi-threaded collection, the ParNew collector offers little innovation over the Serial collector, yet it is the preferred new generation collector for many virtual machines running in server mode. One reason, unrelated to performance but important, is that apart from the Serial collector it is currently the only one that can work with the CMS collector. In the JDK 1.5 era, HotSpot introduced a garbage collector that was almost epoch-making for strongly interactive applications: the CMS collector (Concurrent Mark Sweep, described in detail later in this article). It was the first truly concurrent collector in the HotSpot virtual machine, and for the first time it allowed the garbage collection threads to work (essentially) at the same time as user threads.

Unfortunately, as an old generation collector, CMS cannot work with Parallel Scavenge, the new generation collector that already existed in JDK 1.4.0. So when CMS is used to collect the old generation in JDK 1.5, the new generation collector can only be ParNew or Serial. ParNew is also the default new generation collector once the -XX:+UseConcMarkSweepGC option is used, and it can be selected explicitly with the -XX:+UseParNewGC option.

Three. Parallel Scavenge collector

The Parallel Scavenge collector is a new generation collector that also uses the copying algorithm and is also a parallel, multi-threaded collector. What distinguishes Parallel Scavenge is that its focus differs from that of other collectors: collectors such as CMS aim to shorten the pauses of user threads during garbage collection as much as possible, whereas Parallel Scavenge aims to reach a controllable throughput. Throughput is the ratio of the time the CPU spends running user code to the total CPU time consumed, that is, throughput = time running user code / (time running user code + garbage collection time). The shorter the pause time, the better suited the collector is to programs that interact with users, since good response times improve the user experience; high throughput makes efficient use of CPU time and completes the program's computation as soon as possible, which mainly suits background computation that needs little interaction.

The Parallel Scavenge collector provides two parameters for precise control of throughput: -XX:MaxGCPauseMillis, which controls the maximum garbage collection pause time, and -XX:GCTimeRatio, which sets the throughput directly.

The value allowed for MaxGCPauseMillis is a number of milliseconds greater than 0, and the collector will try its best to keep the time spent on memory reclamation within the set value. But do not assume that setting this parameter a little smaller makes the system's garbage collection faster: the shorter GC pause is paid for with throughput and new generation space. The system makes the new generation smaller, since collecting 300 MB is certainly faster than collecting 500 MB, and this directly causes garbage collection to happen more often: where there used to be one collection every 10 seconds with a 100-millisecond pause, there is now one every 5 seconds with a 70-millisecond pause. The pause time does go down, but so does the throughput.
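
The arithmetic behind this trade-off can be checked with a few lines of Java. This is only an illustrative sketch: the class and method names are made up, and it assumes that "one collection every 10 seconds" means one pause per 10-second interval, pause included.

```java
import java.util.Locale;

// Hypothetical helper class, not part of the original article.
final class PauseTradeOff {
    // Fraction of wall-clock time spent paused when one pause of
    // pauseMillis happens in every interval of intervalMillis.
    static double gcOverhead(double pauseMillis, double intervalMillis) {
        return pauseMillis / intervalMillis;
    }

    // Remaining fraction of time available to user code.
    static double throughput(double pauseMillis, double intervalMillis) {
        return 1.0 - gcOverhead(pauseMillis, intervalMillis);
    }

    public static void main(String[] args) {
        // Larger new generation: 100 ms pause every 10 s.
        System.out.println(String.format(Locale.ROOT,
                "before: %.1f%% throughput", 100 * throughput(100, 10_000)));
        // Smaller new generation: 70 ms pause every 5 s.
        System.out.println(String.format(Locale.ROOT,
                "after:  %.1f%% throughput", 100 * throughput(70, 5_000)));
    }
}
```

Under that assumption, the GC overhead rises from 1% to 1.4% even though each individual pause is shorter, so throughput falls from about 99.0% to about 98.6%.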

The value of the GCTimeRatio parameter should be an integer greater than 0 and less than 100. It sets the ratio of application time to garbage collection time: with a value of N, garbage collection may take at most 1/(1+N) of the total time. If this parameter is set to 19, the maximum GC time allowed is 5% of the total time (that is, 1/(1+19)); the default value is 99, which allows a maximum of 1% (that is, 1/(1+99)) of the time for garbage collection.
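
The 1/(1+N) relationship can be written out as a small helper. This is a sketch of the formula given in the text, not code taken from the virtual machine; the class and method names are made up for illustration.

```java
// Hypothetical helper class, not part of the original article.
final class GcTimeRatio {
    // Maximum share of total time allowed for garbage collection
    // for a given -XX:GCTimeRatio value N, i.e. 1 / (1 + N).
    static double maxGcFraction(int gcTimeRatio) {
        if (gcTimeRatio <= 0) {
            throw new IllegalArgumentException("GCTimeRatio must be greater than 0");
        }
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.println("N=19 -> " + maxGcFraction(19)); // 5% of total time
        System.out.println("N=99 -> " + maxGcFraction(99)); // 1% of total time
    }
}
```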

Because of its close relationship with throughput, the Parallel Scavenge collector is often called the "throughput-first" collector. Besides the two parameters above, it has another parameter worth attention, -XX:+UseAdaptiveSizePolicy. This is a switch parameter. When it is turned on, there is no need to specify by hand the size of the new generation (-Xmn), the ratio of Eden to the Survivor areas (-XX:SurvivorRatio), the size threshold for objects allocated directly in the old generation (-XX:PretenureSizeThreshold), and other such details. The virtual machine collects performance monitoring information as the system runs and adjusts these parameters dynamically to provide the most suitable pause time or the maximum throughput. This approach is called the GC adaptive tuning strategy (GC Ergonomics).

For users who do not understand collector operation well and find manual tuning difficult, using the Parallel Scavenge collector with the adaptive tuning strategy and handing the memory management tuning task to the virtual machine is a good choice. Simply set the basic memory figures (such as -Xmx for the maximum heap), then use MaxGCPauseMillis (if maximum pause time matters more) or GCTimeRatio (if throughput matters more) to give the virtual machine an optimization target; the virtual machine takes care of tuning the detailed parameters. The adaptive tuning strategy is also an important difference between the Parallel Scavenge collector and the ParNew collector.

Four. Serial Old collector

Serial Old is the old generation version of the Serial collector. It is also a single-threaded collector and uses the "mark-compact" algorithm. Its main purpose, too, is to serve virtual machines in client mode. In server mode it has two main uses: one is to be paired with the Parallel Scavenge collector in JDK 1.5 and earlier, and the other is to act as the backup plan for the CMS collector, used when a Concurrent Mode Failure occurs during concurrent collection.

Note that the Parallel Scavenge collector architecture has its own PS MarkSweep collector for old generation collection and does not use Serial Old directly. However, the implementation of PS MarkSweep is very close to that of Serial Old, so many official materials explain it in terms of Serial Old instead of PS MarkSweep, and this article does the same.

Five. Parallel Old collector

Parallel Old is the old generation version of the Parallel Scavenge collector, using multiple threads and the "mark-compact" algorithm. This collector only became available in JDK 1.6; before that, the new generation Parallel Scavenge collector had been in a rather awkward position. The reason is that if Parallel Scavenge was chosen for the new generation, there was no choice for the old generation other than the Serial Old (PS MarkSweep) collector (recall that Parallel Scavenge cannot work with CMS). Because the old generation Serial Old collector drags down the performance of server-side applications, using Parallel Scavenge did not necessarily maximize the throughput of the application as a whole: single-threaded old generation collection cannot take advantage of a server's multiple CPUs, and in environments with a large old generation and more advanced hardware, the throughput of this combination is not necessarily even as good as that of ParNew plus CMS.

Only with the arrival of the Parallel Old collector did the "throughput-first" collector finally get a combination worthy of the name. Where throughput matters and CPU resources are scarce, the combination of Parallel Scavenge and Parallel Old can be given priority.

Six. CMS collector

The CMS (Concurrent Mark Sweep) collector is a collector whose goal is the shortest collection pause time. At present a large share of Java applications are concentrated on the server side of Internet sites or B/S systems. Such applications pay particular attention to service response speed and want system pauses to be as short as possible in order to give users a better experience. The CMS collector fits the needs of such applications well. It is implemented on the "mark-sweep" algorithm. Its operation is more complex than that of the previous collectors and is divided into 4 steps:

  • Initial mark (CMS initial mark)
  • Concurrent mark (CMS concurrent mark)
  • Remark (CMS remark)
  • Concurrent sweep (CMS concurrent sweep)

Of these, the initial mark and remark steps still require "Stop The World". The initial mark only marks the objects that GC Roots can reach directly, and it is fast.

The concurrent mark phase is the process of GC Roots tracing. The remark phase corrects the mark records of those objects whose marking changed during concurrent marking because the user program kept running. The pause in this phase is generally slightly longer than in the initial mark phase, but far shorter than the time taken by concurrent marking.

Because the collector threads can work alongside user threads during concurrent marking and concurrent sweeping, the two most time-consuming phases of the whole process, the CMS collector's memory reclamation as a whole runs concurrently with user threads.
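
The four steps and their effect on user threads can be summarized as a small data model. This is only a descriptive sketch of the phases listed above; the class and enum names are made up for illustration and do not correspond to anything inside HotSpot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the four CMS steps, not part of the original article.
final class CmsPhases {
    enum Phase {
        INITIAL_MARK(true),      // marks objects directly reachable from GC Roots
        CONCURRENT_MARK(false),  // traces from GC Roots while user threads run
        REMARK(true),            // corrects marks changed during concurrent marking
        CONCURRENT_SWEEP(false); // reclaims garbage while user threads run

        final boolean stopTheWorld;

        Phase(boolean stopTheWorld) {
            this.stopTheWorld = stopTheWorld;
        }
    }

    // The phases during which user threads are paused.
    static List<Phase> pausingPhases() {
        List<Phase> result = new ArrayList<>();
        for (Phase p : Phase.values()) {
            if (p.stopTheWorld) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Phase p : Phase.values()) {
            System.out.println(p + (p.stopTheWorld
                    ? " pauses user threads"
                    : " runs alongside user threads"));
        }
    }
}
```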

CMS is an excellent collector, and its main advantages are already reflected in its name: concurrent collection and low pauses. But CMS is far from perfect. It has the following 3 obvious drawbacks:

1. The CMS collector is very sensitive to CPU resources

Programs designed for concurrency are all sensitive to CPU resources. During its concurrent phases CMS does not pause user threads, but it slows the application down and reduces total throughput because it occupies some of the threads (or CPU resources). If the CPU load is already high and half of the computing power has to be given up to run collector threads, the user program may suddenly slow down sharply, which is hard to accept. To cope with this, the virtual machine provides a CMS variant called the "Incremental Concurrent Mark Sweep" collector (i-CMS). It works like the preemption that PC operating systems used to simulate multitasking in the single-CPU era: during concurrent marking and sweeping, the GC threads and user threads run alternately so as to reduce the time the GC threads monopolize resources. The whole collection takes longer, but the impact on the user program is smaller, that is, the slowdown is less noticeable. Practice has shown that the incremental CMS collector performs only modestly; in the current version i-CMS has been declared "deprecated", meaning users are no longer encouraged to use it.

2. The CMS collector cannot handle floating garbage

The CMS collector cannot handle floating garbage, and a "Concurrent Mode Failure" may occur, leading to another Full GC. Because user threads are still running during the CMS concurrent sweep phase, new garbage is naturally produced as the program runs. This garbage appears after the marking process, so CMS cannot deal with it in the current collection and has to leave it for the next GC. This part of the garbage is called "floating garbage". Also because user threads keep running during garbage collection, enough memory must be reserved for them to use. CMS therefore cannot wait until the old generation is almost completely full before collecting, as other collectors do; it has to reserve part of the space for the program to use during concurrent collection. With the default settings of JDK 1.5, the CMS collector is activated when 68% of the old generation is in use. This is a conservative setting: if the old generation does not grow too quickly in the application, you can raise the trigger percentage with the -XX:CMSInitiatingOccupancyFraction parameter to reduce the number of collections and obtain better performance. In JDK 1.6 the startup threshold of the CMS collector was raised to 92%. If the memory reserved while CMS is running cannot meet the program's needs, a "Concurrent Mode Failure" occurs, and the virtual machine starts its fallback plan: it temporarily enables the Serial Old collector to redo the old generation collection, so the pause is very long. Setting -XX:CMSInitiatingOccupancyFraction too high therefore easily causes many "Concurrent Mode Failure" events and lowers performance instead.
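
The effect of the trigger percentage on the space left for user threads can be illustrated with simple arithmetic. The sketch below is a simplified model under stated assumptions, not HotSpot's actual trigger logic: the class and method names are made up, and the 1000 MB old generation is an arbitrary example figure.

```java
// Hypothetical helper class, not part of the original article.
final class CmsTrigger {
    // True once old generation usage reaches the given percentage of its capacity.
    static boolean shouldStartCycle(long usedMb, long capacityMb, int occupancyPercent) {
        return usedMb * 100 >= capacityMb * (long) occupancyPercent;
    }

    // Space still free when the cycle starts, which user threads can keep
    // allocating into while the concurrent collection runs.
    static long headroomMb(long capacityMb, int occupancyPercent) {
        return capacityMb - capacityMb * occupancyPercent / 100;
    }

    public static void main(String[] args) {
        long oldGenMb = 1000; // example size, chosen only for illustration
        System.out.println("68% threshold leaves " + headroomMb(oldGenMb, 68) + " MB");
        System.out.println("92% threshold leaves " + headroomMb(oldGenMb, 92) + " MB");
    }
}
```

In this example the 68% threshold leaves 320 MB of headroom and the 92% threshold only 80 MB: fewer collections are started, but there is far less room before the reserved space runs out and a "Concurrent Mode Failure" occurs.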

3. The CMS collector produces a large amount of space fragmentation

CMS is a collector based on the "mark-sweep" algorithm, which means that a large amount of space fragmentation is left at the end of a collection. When there is too much fragmentation, allocating large objects becomes troublesome: often the old generation still has plenty of free space, but no contiguous block large enough for the current object can be found, and a Full GC has to be triggered early. To solve this problem, the CMS collector provides the switch parameter -XX:+UseCMSCompactAtFullCollection (on by default), which turns on a memory compaction pass when the CMS collector can no longer cope and has to perform a Full GC. Memory compaction cannot be done concurrently: the fragmentation problem disappears, but the pause time inevitably becomes longer. The virtual machine designers also provide another parameter, -XX:CMSFullGCsBeforeCompaction, which sets how many uncompacted Full GCs are performed before one with compaction follows (the default is 0, meaning that compaction is performed on every Full GC).
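
The fragmentation problem can be shown with a toy model of the heap as an array of fixed-size slots. This is a minimal sketch for illustration only: the class and method names are made up, and real collectors manage free lists and objects of varying size rather than boolean slots.

```java
import java.util.Arrays;

// Hypothetical toy heap model, not part of the original article.
final class Fragmentation {
    // Number of free slots in the heap (true = slot in use).
    static int totalFree(boolean[] used) {
        int free = 0;
        for (boolean slot : used) {
            if (!slot) {
                free++;
            }
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(boolean[] used) {
        int best = 0;
        int run = 0;
        for (boolean slot : used) {
            run = slot ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // Compaction: slide all live slots to the start of the heap.
    static boolean[] compact(boolean[] used) {
        int live = used.length - totalFree(used);
        boolean[] result = new boolean[used.length];
        Arrays.fill(result, 0, live, true);
        return result;
    }

    public static void main(String[] args) {
        // After a mark-sweep pass: live and free slots alternate.
        boolean[] heap = {true, false, true, false, true, false, true, false};
        System.out.println("free=" + totalFree(heap)
                + " largest run=" + largestFreeRun(heap));
        boolean[] compacted = compact(heap);
        System.out.println("after compaction largest run=" + largestFreeRun(compacted));
    }
}
```

Before compaction the heap has 4 free slots but no two of them are adjacent, so an object needing 3 contiguous slots cannot be allocated; after compaction the same 4 free slots form one contiguous block.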

Seven. G1 collector

The G1 (Garbage-First) collector is one of the most cutting-edge results in the development of collector technology today. It is a garbage collector aimed at server-side applications. The mission the HotSpot development team gave it is to replace, in the (longer-term) future, the CMS collector released in JDK 1.5. Compared with other GC collectors, G1 has the following characteristics:

Parallel and concurrent: G1 can take full advantage of multi-CPU, multi-core hardware, using multiple CPUs (CPUs or CPU cores) to shorten Stop-The-World pauses. Some GC actions that other collectors must pause Java threads to perform can be carried out by G1 concurrently, while the Java program continues to run.

Generational collection: As with other collectors, the generational concept is retained in G1. Although G1 can manage the entire GC heap on its own, without cooperating with another collector, it handles newly created objects differently from old objects that have already survived for some time, in order to get better collection results.

Space compaction: Unlike CMS's "mark-sweep" algorithm, G1 as a whole is a collector based on the "mark-compact" algorithm, while locally (between two Regions) it is based on the "copying" algorithm. Either way, both algorithms mean that G1 produces no memory fragmentation while it runs, and that regular, usable memory is available after a collection. This property helps long-running applications: allocating a large object will not trigger the next GC early because no contiguous memory space can be found.

Predictable pauses: This is another major advantage of G1 over CMS. Reducing pause time is a concern shared by G1 and CMS, but besides pursuing low pauses G1 can also build a predictable pause-time model. It lets the user explicitly specify that within a time slice of M milliseconds, the time spent on garbage collection must not exceed N milliseconds, which is almost a characteristic of a Real-Time Java (RTSJ) garbage collector.

Before G1, other collectors collected the whole new generation or the whole old generation; G1 no longer does. With the G1 collector, the memory layout of the Java heap differs greatly from that of other collectors: it divides the entire Java heap into multiple independent regions (Regions) of equal size. Although the concepts of new generation and old generation are retained, the two are no longer physically separated; each is a collection of Regions (which need not be contiguous).

The G1 collector can establish a predictable pause-time model because it can, in a planned way, avoid garbage collection over the entire Java heap. G1 tracks the value of the garbage accumulated in each Region (the amount of space that would be reclaimed and an empirical estimate of the time needed to reclaim it), maintains a priority list in the background, and each time, according to the allowed collection time, reclaims the most valuable Regions first (this is the origin of the name Garbage-First). Dividing memory into Regions and reclaiming them by priority ensures that G1 obtains the highest possible collection efficiency within a limited time.
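
The idea of reclaiming the most valuable Regions first within an allowed collection time can be sketched as a greedy selection. This is a simplified illustration under stated assumptions, not G1's actual algorithm: the class, fields, and example figures are all made up, and "value" is assumed here to mean reclaimable space divided by estimated collection time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of priority-based Region selection, not HotSpot code.
final class GarbageFirstSketch {
    static final class Region {
        final String name;
        final long reclaimableMb;   // space that collecting this Region would free
        final double collectMillis; // estimated time needed to collect it

        Region(String name, long reclaimableMb, double collectMillis) {
            this.name = name;
            this.reclaimableMb = reclaimableMb;
            this.collectMillis = collectMillis;
        }

        // Assumed value measure: space reclaimed per millisecond spent.
        double value() {
            return reclaimableMb / collectMillis;
        }
    }

    // Picks Regions in descending order of value while the pause budget lasts.
    static List<Region> chooseRegions(List<Region> regions, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(regions);
        Comparator<Region> byValue = Comparator.comparingDouble(Region::value);
        sorted.sort(byValue.reversed());

        List<Region> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.collectMillis <= pauseBudgetMillis) {
                chosen.add(r);
                spent += r.collectMillis;
            }
        }
        return chosen;
    }

    // Example with made-up figures and a 7 ms pause budget.
    static List<String> demoChoice() {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("A", 8, 4.0)); // value 2.0
        regions.add(new Region("B", 2, 4.0)); // value 0.5
        regions.add(new Region("C", 6, 2.0)); // value 3.0
        regions.add(new Region("D", 1, 5.0)); // value 0.2
        List<String> names = new ArrayList<>();
        for (Region r : chooseRegions(regions, 7.0)) {
            names.add(r.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("collect first: " + demoChoice());
    }
}
```

With these figures, Regions C and A are selected (6 ms in total) and B and D are left for a later collection, because adding either would exceed the 7 ms budget.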

Here is a summary of all the parameters involved:

Resources:

Zhou Zhiming, Understanding the Java Virtual Machine in Depth: JVM Advanced Features and Best Practices
