Java's garbage collection mechanism

Last Update:2018-07-06 Source: Internet

Author: User

Tags garbage collection


Objective

In C++, programmers must handle every memory allocation with care and manually release memory once it is no longer needed. A memory leak occurs when memory is allocated but never freed.

Java makes programmers an attractive promise: they do not need to manage memory, because the JVM's garbage collector (GC) reclaims it automatically. In practice, things are not quite that simple:

Garbage collection does not run on demand whenever the programmer asks for it.
Garbage collection does not always reclaim memory promptly, even when the program needs more memory.
The programmer cannot directly control garbage collection.

Given these facts, it is worth understanding the JVM's automatic memory management thoroughly so that you can program with confidence. This article gives a basic overview of the JVM's memory management mechanism from two angles: garbage collection and memory allocation.

Why garbage collection?

As a program runs, instance objects, variables, and other data occupy more and more memory. Without timely garbage collection, program performance inevitably degrades, and a shortage of available memory can even cause avoidable system errors.

What garbage is collected?

Among the Java runtime data areas, the program counter, the JVM stack, and the native method stack are created and destroyed together with their thread. The memory they occupy is freed automatically when the thread ends, so garbage collection is not a concern for these three areas.

The Java heap and the method area are different. Different implementation classes of an interface may need different amounts of memory, and different branches of a method may need different amounts of memory. Only while the program is running do we know which objects are created, so allocation and reclamation of this memory are dynamic, and this is the memory the GC has to manage.

When does garbage collection take place?

Before the garbage collector reclaims the Java heap, it first determines which object instances are "alive" and which are "dead" (that is, no usable references to them remain).

Many textbooks use the reference counting algorithm to decide whether an object can be reclaimed: a reference counter is added to each object; each time the object is referenced, the counter is incremented by 1; each time a reference becomes invalid, the counter is decremented by 1; an object whose counter is 0 is considered reclaimable. This algorithm has an obvious drawback, however. When two objects reference each other but are otherwise no longer used, they should be garbage collected, yet because each keeps the other's counter above 0, neither ever becomes eligible. Circular references therefore cannot be cleaned up. For this reason, Sun's JVM does not use reference counting; it uses the reachability analysis algorithm instead.
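The circular-reference problem can be reproduced with a toy model. The sketch below is only an illustration: the `Node` class, its `refCount` field, and the helper methods are hypothetical names invented here, and no real JVM manages memory this way.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of reference counting; all names here are illustrative assumptions.
class RefCountDemo {
    static final class Node {
        final String name;
        int refCount;                               // number of incoming references
        final List<Node> fields = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Store a reference to 'target' in a field of 'holder'.
    static void addReference(Node holder, Node target) {
        holder.fields.add(target);
        target.refCount++;
    }

    // Simulate a local variable starting or ceasing to point at a node.
    static void rootAcquire(Node n) { n.refCount++; }
    static void rootRelease(Node n) { n.refCount--; }

    // Returns the counters of a and b after every outside reference is dropped.
    static int[] cycleCounts() {
        Node a = new Node("a");
        Node b = new Node("b");
        rootAcquire(a);
        rootAcquire(b);
        addReference(a, b);   // a.field -> b
        addReference(b, a);   // b.field -> a
        rootRelease(a);       // the program can no longer reach a ...
        rootRelease(b);       // ... or b
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        System.out.println("a=" + counts[0] + ", b=" + counts[1]);   // prints "a=1, b=1"
    }
}
```

Both counters stay at 1 even though nothing outside the pair refers to either object, so a pure reference-counting collector would never reclaim them.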

The basic idea of reachability analysis is to start from a set of objects called "GC Roots" and search downward along references. The path traversed is called a reference chain. When no reference chain connects an object to any GC Root, the object is unreachable and therefore unusable. For example, if object5, object6, and object7 reference one another but none of them is reachable from the GC Roots, all three are judged reclaimable.
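The search from GC Roots can be sketched as a plain graph traversal. In the sketch below, the object graph is an assumption made for illustration: only the fact that object5, object6, and object7 reference each other and are cut off from the roots comes from the text; the other edges and the single root `object1` are invented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy reachability analysis over a map of "object -> objects it references".
class ReachabilityDemo {
    // Returns the objects that cannot be reached from any GC root.
    static Set<String> unreachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (marked.add(current)) {              // first visit: follow its references
                pending.addAll(refs.getOrDefault(current, List.of()));
            }
        }
        Set<String> dead = new TreeSet<>(refs.keySet());
        dead.removeAll(marked);
        return dead;
    }

    // Hypothetical object graph; object5..object7 form an isolated cycle.
    static Map<String, List<String>> sampleGraph() {
        return Map.of(
            "object1", List.of("object2", "object3"),
            "object2", List.of("object4"),
            "object3", List.of(),
            "object4", List.of(),
            "object5", List.of("object6", "object7"),
            "object6", List.of("object5"),
            "object7", List.of("object6"));
    }

    public static void main(String[] args) {
        // prints "[object5, object6, object7]"
        System.out.println(unreachable(sampleGraph(), List.of("object1")));
    }
}
```

Unlike reference counting, the traversal never visits the isolated cycle, so mutual references do not keep those objects alive.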

Both reference counting and reachability analysis decide whether an object is alive based on references. Since JDK 1.2, Java has extended the concept of a reference to four kinds: strong, soft, weak, and phantom references, in decreasing order of strength. These reference kinds are not covered in detail here; readers can look them up separately.
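For readers who want a starting point, the four kinds of reference map onto ordinary variables and the classes in `java.lang.ref`. The sketch below is a minimal illustration, not a complete treatment; what it prints after the `System.gc()` request depends on the JVM and is deliberately not stated.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {
    // PhantomReference.get() is specified to always return null.
    static boolean phantomGetIsNull() {
        Object referent = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);
        return phantom.get() == null;
    }

    public static void main(String[] args) {
        Object strong = new Object();                                  // strong reference
        SoftReference<Object> soft = new SoftReference<>(new Object());
        WeakReference<Object> weak = new WeakReference<>(strong);

        System.out.println("weak referent present: " + (weak.get() != null));

        strong = null;   // drop the only strong reference
        System.gc();     // only a request; the JVM may or may not collect now
        System.out.println("weak referent present after gc request: " + (weak.get() != null));
        System.out.println("soft referent present: " + (soft.get() != null));
        System.out.println("phantom.get() is null: " + phantomGetIsNull());
    }
}
```

Roughly speaking, a softly referenced object is kept until memory runs short, a weakly referenced object survives only until the next collection that notices it, and a phantom reference never yields its referent and is used only to be notified of reclamation through its queue.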

In addition, an object found unreachable by reachability analysis is not necessarily dead yet. If its class overrides the finalize() method and the virtual machine has not yet called that method on the object, the virtual machine calls finalize() once to let the object do its final cleanup. If, during that call, the object re-attaches itself to any object on a reference chain, it is "reborn"; if it has not escaped by then, it is really reclaimed.

Before the garbage collector reclaims the method area, it first determines whether a class is a "useless class". A class must satisfy all of the following three conditions to be considered useless:

All instances of the class have been reclaimed.
The ClassLoader that loaded the class has been reclaimed.
The java.lang.Class object for the class is not referenced anywhere, so the class's methods cannot be accessed through reflection anywhere.

How is garbage collected?

The Java heap is divided into a new generation and an old generation, with a size ratio of 1:2. The new generation suits short-lived objects that are created and destroyed frequently; the old generation suits objects with relatively long lifetimes and large objects that need a large amount of contiguous memory.

The new generation is divided into an Eden space and a survivor area, and the survivor area is in turn divided into two spaces of equal size, From space and To space. The default ratio of Eden to one survivor space is 8:1, which can be changed with -XX:SurvivorRatio. In most cases, objects are allocated in Eden. When Eden runs low on space, the virtual machine starts a minor GC and moves the surviving objects to a survivor space. The new generation is collected with the copying algorithm.
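The two ratios above translate into region sizes by simple arithmetic. The sketch below assumes a hypothetical 3000 MB heap and ignores the alignment and rounding a real JVM applies; the method and parameter names are invented for this example.

```java
// Back-of-the-envelope heap layout from the ratios described in the text.
class HeapLayoutDemo {
    // Returns sizes in MB as {newGen, oldGen, eden, oneSurvivor}.
    // oldToNewRatio = 2 means new : old = 1 : 2.
    // survivorRatio = 8 means eden : one survivor space = 8 : 1.
    static long[] layout(long heapMb, int oldToNewRatio, int survivorRatio) {
        long newGen = heapMb / (oldToNewRatio + 1);
        long oldGen = heapMb - newGen;
        long survivor = newGen / (survivorRatio + 2);   // eden + two survivor spaces
        long eden = newGen - 2 * survivor;
        return new long[] { newGen, oldGen, eden, survivor };
    }

    public static void main(String[] args) {
        long[] s = layout(3000, 2, 8);
        // prints "new=1000 old=2000 eden=800 survivor=100 (x2)"
        System.out.println("new=" + s[0] + " old=" + s[1]
            + " eden=" + s[2] + " survivor=" + s[3] + " (x2)");
    }
}
```

With an 8:1 ratio, one survivor space is a tenth of the new generation, so at any time only about 10% of the new generation sits idle as the copy target.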

The old generation holds objects that are still alive after several garbage collections in the new generation, as well as large objects that need a large amount of contiguous memory. The JVM also determines object age dynamically: if the total size of all objects of the same age in a survivor space exceeds half the size of that survivor space, objects of that age or older go directly to the old generation. The old generation is collected with the mark-compact algorithm.
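The dynamic age rule can be written down as a small function. The sketch below implements the rule exactly as worded in this article; HotSpot's real calculation differs in its details, so treat this as an illustration only. The `maxAge` parameter (the age at which objects are promoted regardless) and all sizes are assumptions introduced for the example.

```java
// Toy version of the dynamic age rule as stated in the text.
class DynamicAgeDemo {
    // bytesByAge[i] = total size of survivor objects whose age is i (index 0 unused).
    // Returns the age from which objects are promoted to the old generation.
    static int promotionAge(long[] bytesByAge, long survivorBytes, int maxAge) {
        for (int age = 1; age < bytesByAge.length && age < maxAge; age++) {
            if (bytesByAge[age] * 2 > survivorBytes) {
                return age;    // this age group alone exceeds half of the survivor space
            }
        }
        return maxAge;         // no age group is large enough; fall back to the fixed limit
    }

    public static void main(String[] args) {
        long[] bytesByAge = { 0, 10, 60, 5 };   // hypothetical sizes for ages 1..3
        // prints "2": objects of age 2 or older would be promoted
        System.out.println(promotionAge(bytesByAge, 100, 15));
    }
}
```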

Garbage collection algorithms

The copying algorithm and the mark-compact algorithm mentioned above are two of the common GC algorithms. The common algorithms are described below.

Mark-sweep algorithm

Mark-sweep is the most basic GC algorithm. It has two phases, "mark" and "sweep": first mark all objects that need to be reclaimed, then scan the heap and reclaim every marked object. It has two shortcomings. First, neither marking nor sweeping is very efficient. Second, sweeping leaves behind many non-contiguous memory fragments, and with too much fragmentation a later allocation of a large object may fail to find enough contiguous memory and trigger another GC early.
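The fragmentation problem is easy to see in a toy heap. The sketch below models the heap as an array of equally sized slots, which is a simplification assumed here; it also tracks the set of live objects rather than the set of dead ones, which is the complement of the marking described above and leads to the same result.

```java
import java.util.Arrays;
import java.util.Set;

// Toy mark-sweep over an array of equally sized slots (null = free slot).
class MarkSweepDemo {
    // Sweep phase: free every occupied slot whose object is not in the live set.
    static String[] sweep(String[] heap, Set<String> live) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !live.contains(result[i])) {
                result[i] = null;
            }
        }
        return result;
    }

    static int freeSlots(String[] heap) {
        int free = 0;
        for (String slot : heap) {
            if (slot == null) free++;
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E", "F", "G", "H" };
        String[] swept = sweep(heap, Set.of("A", "C", "E", "G"));
        System.out.println(Arrays.toString(swept));
        // prints "free=4, largest run=1"
        System.out.println("free=" + freeSlots(swept) + ", largest run=" + largestFreeRun(swept));
    }
}
```

Half of the heap is free after the sweep, yet no object larger than one slot can be allocated, which is exactly the situation that forces an early GC.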

Copying algorithm

As mentioned earlier, the new generation consists of one Eden space and two survivor spaces, with a default Eden-to-survivor ratio of 8:1, and one of the two survivor spaces is always kept empty. During a collection, the objects still alive in Eden and in the survivor space in use are copied in one pass to the empty survivor space, and then Eden and the survivor space just used are cleared. When the target survivor space is too small to hold all surviving objects, the old generation serves as the allocation guarantee and takes the overflow. The copying algorithm suits the new generation.
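A minor collection with the copying algorithm can be sketched with three lists. In the sketch below every object is assumed to occupy one unit of space, and the method name, its parameters, and the `promoted` list standing in for the old generation are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy copying collection: Eden + the used survivor space -> the empty survivor space.
class CopyingDemo {
    // Copies live objects into a new "to" space of the given capacity.
    // Live objects that do not fit are appended to 'promoted' (the old generation).
    static List<String> minorGc(List<String> eden, List<String> from,
                                Set<String> live, int toCapacity, List<String> promoted) {
        List<String> to = new ArrayList<>();
        for (List<String> space : List.of(eden, from)) {
            for (String obj : space) {
                if (!live.contains(obj)) {
                    continue;               // garbage is simply not copied
                }
                if (to.size() < toCapacity) {
                    to.add(obj);
                } else {
                    promoted.add(obj);      // allocation guarantee: overflow goes to the old generation
                }
            }
        }
        eden.clear();                       // both source spaces are now entirely free
        from.clear();
        return to;
    }

    public static void main(String[] args) {
        List<String> eden = new ArrayList<>(List.of("a", "b", "c", "d"));
        List<String> from = new ArrayList<>(List.of("e", "f"));
        List<String> promoted = new ArrayList<>();
        List<String> to = minorGc(eden, from, Set.of("a", "c", "e"), 2, promoted);
        // prints "to=[a, c] promoted=[e]"
        System.out.println("to=" + to + " promoted=" + promoted);
    }
}
```

Because only survivors are touched and the source spaces are wiped wholesale, the cost is proportional to the amount of live data, which is why the approach fits a region where most objects die young.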

Mark-compact algorithm

The marking phase is the same as in mark-sweep, but instead of reclaiming the dead objects in place, the next step moves all surviving objects toward one end of the memory region and then frees everything beyond the boundary of that end. The mark-compact algorithm suits the old generation.
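The sliding step can be sketched on the same kind of toy heap: an array of equally sized slots, which is a simplification assumed here. A real collector would also have to update every reference to a moved object, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.Set;

// Toy mark-compact over an array of equally sized slots (null = free slot).
class MarkCompactDemo {
    static String[] compact(String[] heap, Set<String> live) {
        String[] result = heap.clone();
        int next = 0;                              // next slot to fill at the low end
        for (int i = 0; i < result.length; i++) {
            String obj = result[i];
            if (obj != null && live.contains(obj)) {
                result[next++] = obj;              // slide the survivor toward the low end
            }
        }
        Arrays.fill(result, next, result.length, null);   // free everything past the boundary
        return result;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E", "F", "G", "H" };
        // prints "[A, C, E, G, null, null, null, null]"
        System.out.println(Arrays.toString(compact(heap, Set.of("A", "C", "E", "G"))));
    }
}
```

The free space ends up as one contiguous block, so a four-slot object now fits where mark-sweep alone would have left four isolated holes.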

Generational collection algorithm

Objects differ in how long they live, so the best approach is to use the algorithm that fits each stage of an object's life. Generational collection is based on this idea: it divides memory into regions according to object lifetime and uses a different collection algorithm for each region to improve overall efficiency. The HotSpot virtual machine, for example, divides the Java heap into a new generation and an old generation so that each can use the collection algorithm that suits it best.

Garbage collector classification

The HotSpot virtual machine in JDK 1.7 Update 14 includes the collectors described below.
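To check which collectors a particular JVM is actually running with, the standard `java.lang.management` API can be queried. The sketch below is a minimal example; the names it prints depend on the JVM version and the flags used to start it, so no output is shown.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the garbage collectors active in the current JVM.
class CollectorNamesDemo {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(bean.getName()
                + ": collections=" + bean.getCollectionCount()
                + ", time(ms)=" + bean.getCollectionTime());
        }
    }
}
```

Running the same program with different collector flags is a quick way to confirm that a flag had the intended effect.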

Serial collector

The serial collector has two main characteristics: it uses a single thread for garbage collection, and it collects exclusively, stopping the application while it works.

While the serial collector is collecting, the threads of the Java application must pause and wait for the collection to finish, which hurts responsiveness. Nonetheless, the serial collector is a mature, efficient collector that has been proven in production over a long time. The new generation serial collector uses the copying algorithm; its implementation is relatively simple and its processing logic efficient, with no thread-switching overhead. On hardware with limited resources, such as a single-CPU machine or an application with a small heap, it can even outperform the parallel and concurrent collectors. In the HotSpot virtual machine, the -XX:+UseSerialGC flag selects the serial collector for both the new generation and the old generation. It is the default garbage collector when the JVM runs in client mode.

ParNew collector

The ParNew collector works in the new generation and is simply a multithreaded version of the serial collector. Its collection strategy, algorithm, and parameters are the same as those of the serial collector.

The ParNew collector is also an exclusive collector: the application is paused during collection. Because it collects with multiple threads, however, it produces shorter pauses than the serial collector on CPUs with good parallel capability. On a single-CPU system or one with weak parallel capability, it does no better than the serial collector, and because of the overhead of multithreading its actual performance may well be worse.

The ParNew collector is enabled with -XX:+UseParNewGC, which makes the new generation use the ParNew collector and the old generation use the serial collector.

Parallel Scavenge Collector

The new generation Parallel Scavenge collector also uses the copying algorithm. On the surface it is a multithreaded, exclusive collector just like ParNew. Its distinguishing feature is that it focuses on system throughput.

The Parallel Scavenge collector can be enabled with the following flags:

-XX:+UseParallelGC: the new generation uses the Parallel Scavenge collector and the old generation uses the serial collector.
-XX:+UseParallelOldGC: both the new generation and the old generation use parallel collectors.

Parallel Scavenge also differs from ParNew in that it supports an adaptive GC tuning policy, turned on with -XX:+UseAdaptiveSizePolicy. In this mode, the size of the new generation, the ratio of Eden to survivor space, the age at which objects are promoted to the old generation, and similar parameters are adjusted automatically to balance heap size, throughput, and pause time. When manual tuning is difficult, you can use this adaptive mode and specify only the maximum heap size, the throughput goal (GCTimeRatio), and the pause time goal (MaxGCPauseMillis), leaving the rest of the tuning to the virtual machine.

Serial Old Collector

The old generation serial collector (Serial Old) uses the mark-compact algorithm. Like the new generation serial collector, it is a single-threaded, exclusive collector. Because an old generation collection usually takes longer than a new generation collection, an application with a large heap may pause for several seconds or longer once the old generation serial collector starts. However, it can be paired with several new generation collectors, and it also serves as the fallback collector for CMS. To enable it, use -XX:+UseSerialGC, which makes both the new generation and the old generation use serial collectors.

Parallel Old Collector

The Parallel Old collector is the old generation counterpart of Parallel Scavenge: a multithreaded parallel collector that likewise focuses on throughput. It uses the mark-compact algorithm and has been available since JDK 1.6.

With -XX:+UseParallelOldGC, both the new generation and the old generation use parallel collectors. This combination pays close attention to throughput and is worth considering for throughput-sensitive systems. The -XX:ParallelGCThreads flag sets the number of threads used for garbage collection.

CMS (Concurrent Mark Sweep) collector

Unlike the Parallel Scavenge collector, the CMS collector focuses on pause time. CMS stands for Concurrent Mark Sweep. As the name suggests, it uses the mark-sweep algorithm, and it collects concurrently using multiple threads.

CMS works in these main steps: initial mark, concurrent mark, remark, concurrent sweep, and concurrent reset. Initial mark and remark are exclusive and pause the application, while concurrent mark, concurrent sweep, and concurrent reset run alongside the user threads. Taken as a whole, therefore, CMS collection is not exclusive, and it can collect garbage while the application is running.

In terms of the mark-sweep algorithm, initial mark, concurrent mark, and remark all serve to mark the objects to be reclaimed. Concurrent sweep performs the actual reclamation of garbage objects once marking is complete, and concurrent reset re-initializes the CMS data structures after the collection so that they are ready for the next one.

Although the CMS collector does not fully suspend the application threads during its main working phases, it runs concurrently with them and competes for the CPU, so it does reduce application throughput while a CMS cycle is in progress. By default CMS starts (ParallelGCThreads + 3) / 4 threads, where ParallelGCThreads is the number of threads of the new generation parallel collector; the number of CMS threads can also be set manually with -XX:ParallelCMSThreads. When CPU resources are tight, application performance during a collection can suffer badly from the CMS collector threads.
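The default thread count formula quoted above uses integer division, which is easy to misread. The sketch below simply evaluates it for a few hypothetical values of ParallelGCThreads; the method name is invented for this example.

```java
// Evaluates the default CMS thread count formula from the text.
class CmsThreadsDemo {
    static int defaultCmsThreads(int parallelGcThreads) {
        return (parallelGcThreads + 3) / 4;   // integer division rounds down
    }

    public static void main(String[] args) {
        for (int n : new int[] { 1, 4, 8, 13 }) {
            System.out.println(n + " parallel GC threads -> " + defaultCmsThreads(n) + " CMS threads");
        }
        // prints 1 -> 1, 4 -> 1, 8 -> 2, 13 -> 4
    }
}
```

In effect the formula is the ceiling of ParallelGCThreads / 4, so CMS uses roughly a quarter as many threads as the parallel new generation collector and never fewer than one.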

Since the CMS collector is not exclusive, the application keeps working during a CMS cycle and keeps producing garbage. This newly produced garbage cannot be reclaimed in the current cycle. And because the application is not interrupted, it must also have enough memory available while the cycle runs. The CMS collector therefore does not wait until the heap is full; it starts collecting once heap usage reaches a certain threshold, so that the application still has enough room to run during the cycle.

This threshold is set with -XX:CMSInitiatingOccupancyFraction. The default cited here is 68 (the default differs between JDK releases), meaning a CMS collection starts when old generation usage reaches 68%. If the application's memory usage grows quickly and memory runs out while a CMS cycle is in progress, the CMS collection fails and the JVM falls back to the old generation serial collector. The application is then interrupted completely until that collection finishes, which can mean a long pause. -XX:CMSInitiatingOccupancyFraction should therefore be tuned to the application. If memory usage grows slowly, a somewhat larger value is appropriate: a higher threshold lowers how often CMS is triggered, and fewer old generation collections can noticeably improve application performance. Conversely, if memory usage grows quickly, the threshold should be lowered to avoid frequently falling back to the old generation serial collector.

The mark-sweep algorithm produces a lot of memory fragmentation, and scattered free space cannot hold large objects. Even when the heap still has plenty of free space overall, a garbage collection may be forced just to obtain one contiguous block, which hurts system performance. To address this, the CMS collector provides options for compacting memory.

The -XX:+UseCMSCompactAtFullCollection flag makes CMS defragment memory after a garbage collection completes; this defragmentation is not concurrent. The -XX:CMSFullGCsBeforeCompaction flag sets how many CMS collections take place before a compaction is performed.

G1 Collector

The G1 collector is designed as a garbage collector for server applications and is expected to outperform the CMS collector in both throughput and pause control.

Unlike the CMS collector, the G1 collector is based on the mark-compact algorithm. As a result, it does not produce memory fragmentation and does not need an exclusive defragmentation pass after a collection. G1 also allows very precise pause control: the developer can specify that, within any time slice of length M, garbage collection takes no more than N. G1 is enabled with -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC, and its pause goals are set with flags such as -XX:MaxGCPauseMillis=20 and -XX:GCPauseIntervalMillis=200.

The garbage collectors above can be classified in several ways:

By number of threads, collectors are either serial or parallel. A serial collector uses only one thread at a time for garbage collection, while a parallel collector runs multiple threads at once. On a CPU with strong parallel capability, a parallel collector can shorten GC pauses.
By mode of operation, collectors are either concurrent or exclusive. A concurrent collector works alongside the application threads, interleaving with them to minimize application pauses. An exclusive (stop-the-world) collector, once it runs, stops all other threads in the application until the collection has completely finished.
By how they handle fragmentation, collectors are either compacting or non-compacting. A compacting collector compacts the surviving objects after a collection to eliminate fragmentation; a non-compacting collector skips this step.
By the memory region they work on, collectors are either new generation collectors or old generation collectors.

The following metrics can be used to evaluate a garbage collector:

Throughput: the ratio of the time spent running the application to the total running time of the system over the application's lifetime, where total running time = application time + GC time. If the system runs for 100 minutes and GC takes 1 minute, throughput is (100 - 1) / 100 = 99%.
Garbage collector load: the counterpart of throughput; the ratio of the time spent in the garbage collector to the total running time of the system.
Pause time: how long the application is paused while the garbage collector runs. For an exclusive collector, pauses may be long. With a concurrent collector, pauses are shorter because the collector and the application run interleaved, but system throughput may be lower because a concurrent collector is likely to be less efficient than an exclusive one.
Garbage collection frequency: how often the garbage collector runs. In general, for a given application, the lower the frequency the better. Increasing the heap size usually reduces collection frequency effectively, but it may lengthen the pause each collection causes.
Reaction time: how long it takes, after an object becomes garbage, for the memory it occupies to be released.
Heap allocation: different garbage collectors may lay out heap memory differently. A good garbage collector should divide the heap into regions sensibly.
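The throughput and garbage collector load definitions above are two sides of the same ratio. The sketch below reproduces the 100-minute example from the list; the method names are invented for this illustration.

```java
// Throughput and GC load as defined in the metric list above.
class GcMetricsDemo {
    // Fraction of total running time spent in application code.
    static double throughput(double totalMinutes, double gcMinutes) {
        return (totalMinutes - gcMinutes) / totalMinutes;
    }

    // Fraction of total running time spent in the garbage collector.
    static double gcLoad(double totalMinutes, double gcMinutes) {
        return gcMinutes / totalMinutes;
    }

    public static void main(String[] args) {
        double total = 100.0;   // total running time in minutes
        double gc = 1.0;        // time spent in GC in minutes
        System.out.printf("throughput=%.0f%%, gc load=%.0f%%%n",
            throughput(total, gc) * 100, gcLoad(total, gc) * 100);
        // prints "throughput=99%, gc load=1%"
    }
}
```

The two values always add up to 100%, so improving one necessarily improves the other; pause time and collection frequency, by contrast, have to be measured separately.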

Summary

This article gave a basic overview of Java's garbage collection mechanism from four angles: why, what, when, and how garbage is collected. It also covered four GC algorithms, the classification of garbage collectors, and the metrics used to evaluate them.



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
