Deep understanding of JVM garbage collection

Last Update:2016-02-29 Source: Internet

Author: User

I. The significance of garbage collection

In C++, the memory occupied by an object remains occupied until the end of the program unless it is explicitly released, and it cannot be assigned to other objects in the meantime. In Java, memory becomes garbage once no reference points to the object it was allocated for, and a system-level thread of the JVM frees that memory automatically. Garbage collection treats objects that the program no longer needs as "useless information" and discards them: when an object is no longer referenced, the space it occupies is reclaimed so that new objects can use it later.

Besides releasing useless objects, garbage collection can also remove memory fragmentation. Memory becomes fragmented as objects are created and the garbage collector frees the space of discarded objects; fragments are the free holes left between blocks of memory allocated to objects. Defragmentation moves the occupied heap memory to one end of the heap, and the JVM then allocates the compacted free memory to new objects.

Garbage collection frees memory automatically and reduces the burden of programming, which gives the Java virtual machine some advantages. First, it makes programming more efficient. Without a garbage collection mechanism, a developer may spend a lot of time tracking down hard-to-understand memory problems; with it, that time is greatly shortened. Second, it protects the integrity of the program: garbage collection is an important part of the Java language's security strategy.

Garbage collection also has potential drawbacks. First, its overhead affects program performance: the Java virtual machine must track the objects that are still in use in the running program and eventually release the useless ones, and this takes processor time. Second, garbage collection algorithms can be incomplete: some earlier algorithms could not guarantee that 100% of the discarded memory was collected. With the continuous improvement of garbage collection algorithms and of software and hardware efficiency, these problems can be solved.

In general, Java developers do not need to focus on heap allocation and garbage collection in the JVM, but a full understanding of this feature lets us use resources more efficiently. Also note that the finalize() method is a default mechanism in Java, and you can sometimes write your own finalize() method to make sure an object's resources are released explicitly.

II. Determining whether an object is garbage

Almost all object instances are stored in the Java heap. Before the garbage collector reclaims objects in the heap, it must first determine which objects are still in use and which are garbage. The following algorithms are used for this:

1. Reference counting method

Reference counting is an early garbage collection strategy. In this approach, each object (not each reference) in the heap has a reference counter. For an object A, whenever another object references A, A's counter is incremented by 1; when such a reference becomes invalid, the counter is decremented by 1. Once the counter for object A reaches 0, A can no longer be used.

Reference counting is simple, and checking a counter is very efficient. However, the algorithm has an obvious flaw: objects involved in circular references are never reclaimed. For example, if A references B and B references A, neither counter is 0. If no third object in the system references A or B, both are garbage that should be reclaimed, but because they reference each other the garbage collector cannot recognize them, and a memory leak results.
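The circular-reference problem can be made concrete with a toy model. The sketch below is only an illustration: the `Node` class, its `refCount` field, and the `setField` helper are hypothetical names invented here, and HotSpot does not manage objects this way.

```java
// Toy model of reference counting (hypothetical; not how the JVM works).
class RefCountDemo {
    /** A simulated heap object that tracks how many references point at it. */
    static final class Node {
        final String name;
        int refCount;   // number of incoming references
        Node field;     // a single outgoing reference
        Node(String name) { this.name = name; }
    }

    /** Points holder.field at target and updates the affected counters. */
    static void setField(Node holder, Node target) {
        if (holder.field != null) holder.field.refCount--;
        holder.field = target;
        if (target != null) target.refCount++;
    }

    /** Builds the A <-> B cycle, drops the outside references, and returns both counters. */
    static int[] cycleCounts() {
        Node a = new Node("A");
        Node b = new Node("B");
        a.refCount++;        // simulated local variable referring to A
        b.refCount++;        // simulated local variable referring to B
        setField(a, b);      // A references B
        setField(b, a);      // B references A
        a.refCount--;        // the local variable for A goes away
        b.refCount--;        // the local variable for B goes away
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        // Neither counter reaches 0, so a pure reference counter never frees A or B.
        System.out.println("A=" + counts[0] + ", B=" + counts[1]);
    }
}
```

Both counters stay at 1 even though nothing outside the cycle refers to A or B, which is exactly the leak described above.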

2. Root Search algorithm

The basic idea of this algorithm is to take a set of objects called "GC Roots" as starting points and search downward from these nodes. The path traversed by the search is called a reference chain. When no reference chain connects an object to any GC Root, the object is proven to be unusable. In the Java language, the objects that can serve as GC Roots include the following:

(1) Objects referenced in the virtual machine stack (the local variable table in a stack frame).

(2) Objects referenced by static class properties in the method area.

(3) Objects referenced by constants in the method area.

(4) Objects referenced by JNI (native methods) in the native method stack.

III. Types of references

Whether the reference counting algorithm counts the references to an object or the root search algorithm checks whether an object is reachable through a reference chain, deciding whether an object is alive always comes down to "references". References are divided into four types: strong references, soft references (SoftReference), weak references (WeakReference), and phantom references (PhantomReference, sometimes translated as virtual references). Their strength decreases in that order.

1. Strong references are the ones commonly found in program code, such as "Object obj = new Object()". As long as a strong reference exists, the garbage collector will never reclaim the referenced object. When memory runs short, the Java virtual machine would rather throw an OutOfMemoryError, causing the program to terminate abnormally, than reclaim strongly referenced objects arbitrarily to relieve the shortage. When an object is no longer needed, you can explicitly assign obj = null; the GC then considers the object unreferenced, and it can be reclaimed.

Strong references are very common in real-world applications. The clear() method of the collection classes is one place where they are released deliberately; consider the clear() method in HashMap.

The HashMap class defines a table array, and when clear() is called to empty it, each element of the array is assigned null. Unlike assigning table = null, this keeps the strong reference to the array itself, which avoids reallocating memory when other methods later use the array. Clearing with a method such as clear() is particularly useful when the array stores reference types, because the memory of the referenced objects can then be released promptly.
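The idea can be sketched with a small stand-in class. This is not the JDK source of HashMap; `SimpleTable` and its members are hypothetical names, used only to show slots being set to null while the array itself stays strongly referenced.

```java
// Hypothetical container illustrating the "null the slots, keep the array" pattern.
class SimpleTable {
    private final Object[] table = new Object[16]; // strong reference to the backing array
    private int size;

    /** Stores a value in the next free slot. */
    void add(Object value) {
        if (size == table.length) {
            throw new IllegalStateException("table is full");
        }
        table[size++] = value;
    }

    /** Drops the strong references held in the slots; the array is kept for reuse. */
    void clear() {
        for (int i = 0; i < table.length; i++) {
            table[i] = null;
        }
        size = 0;
    }

    int size() { return size; }

    int capacity() { return table.length; }
}
```

After clear(), the previously stored objects are no longer reachable through the table, so the collector may reclaim them, while later add() calls reuse the same array without a new allocation.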

2. Soft references describe objects that are useful but not required. For an object associated only with soft references, the garbage collector does not reclaim it while memory is sufficient, and reclaims it when memory is insufficient. Since JDK 1.2, the SoftReference class implements soft references. Soft references can be used to implement memory-sensitive caches. A soft reference can be used together with a reference queue (ReferenceQueue): if the object referenced by the soft reference is reclaimed by the garbage collector, the Java virtual machine adds the soft reference to the associated reference queue.

Soft references are mainly used in memory-sensitive caches, which are common on Android. Android apps typically use a large number of default images in many places. Reading an image from a file every time is slow because it involves hardware I/O, which lowers performance, so we consider caching the images and reading them directly from memory when needed. However, images occupy a lot of memory, and caching many of them makes an OutOfMemory exception more likely. Soft references can be used to avoid this problem. Each cached object is wrapped in a SoftReference; when the object is needed again, calling get() on the soft reference returns it directly from the cache, with no need to read the file again. When memory is about to run out, the GC clears the soft references before an OOM occurs.

Let's look at a simple example:
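One possible illustration is the following minimal sketch of a soft-reference cache (not the author's original listing). The class name `SoftCache` and the `loader` function are assumptions made for this example; in an Android app the loader might decode an image file.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical memory-sensitive cache built on SoftReference.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader; // assumed slow loader, e.g. reads a file

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, reloading it if the GC has cleared the soft reference. */
    synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);                 // cache miss or cleared entry
            map.put(key, new SoftReference<>(value));  // keep only a soft reference
        }
        return value;
    }
}
```

As long as the caller still holds a strong reference to a returned value, the entry cannot be cleared; once only the soft reference remains, the collector is free to reclaim the value under memory pressure, and the next get() simply reloads it.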

3. Weak references also describe non-essential objects. When the JVM performs garbage collection, objects associated only with weak references are reclaimed regardless of whether memory is sufficient. Since JDK 1.2, the WeakReference class implements weak references. The difference from a soft reference is that an object with only weak references has a shorter life cycle. Let's look at a simple example:
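A minimal sketch of this behavior follows (an illustration, not the author's original listing). Note that System.gc() is only a request, so the second printed value is typical for HotSpot rather than guaranteed.

```java
import java.lang.ref.WeakReference;

class WeakReferenceDemo {
    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);

        // While a strong reference exists, the weak reference still yields the object.
        System.out.println("still reachable: " + (weak.get() == strong));

        strong = null;  // only the weak reference remains
        System.gc();    // a hint; the JVM may or may not collect right away

        // Often null at this point on HotSpot, but the specification does not promise it.
        System.out.println("after gc request: " + weak.get());
    }
}
```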

4. A phantom reference (sometimes translated as "virtual reference") differs from soft and weak references in that it does not affect the life cycle of an object at all. It is represented in Java by the java.lang.ref.PhantomReference class. An object that has only phantom references can be reclaimed by the garbage collector at any time, just as if it had no references. A phantom reference must be used together with a reference queue: when the garbage collector is about to reclaim an object and finds that it has a phantom reference, it adds the phantom reference to the associated reference queue. The program can learn whether the referenced object is about to be garbage collected by checking whether the phantom reference has been added to the queue, and if so, it can take any necessary action before the object's memory is reclaimed. Let's look at a simple example:
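A minimal sketch of a phantom reference with its queue might look like this (an illustration, not the author's original listing). The `awaitEnqueue` helper and the one-second timeout are assumptions of this example, and whether the reference is enqueued within that time depends on the collector.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class PhantomReferenceDemo {
    /** Waits up to timeoutMillis for ref to appear on the queue (hypothetical helper). */
    static boolean awaitEnqueue(ReferenceQueue<Object> queue, Reference<?> ref, long timeoutMillis) {
        try {
            return queue.remove(timeoutMillis) == ref;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object resource = new Object();
        PhantomReference<Object> phantom = new PhantomReference<>(resource, queue);

        // get() on a phantom reference always returns null, so it cannot revive the object.
        System.out.println("phantom.get() = " + phantom.get());

        resource = null;  // drop the only strong reference
        System.gc();      // a hint only

        if (awaitEnqueue(queue, phantom, 1000)) {
            System.out.println("referent is being reclaimed; run cleanup here");
        } else {
            System.out.println("not enqueued within the timeout");
        }
    }
}
```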

IV. Garbage collection algorithms

1. Mark-sweep algorithm (mark-sweep)

The mark-sweep algorithm has two phases, mark and sweep: it first marks all objects that need to be reclaimed and, once marking is complete, reclaims all marked objects in one pass. The marking process is in fact the root search algorithm used to determine whether an object is alive. The algorithm has two main shortcomings. One is efficiency: neither marking nor sweeping is very efficient. The other is space: sweeping leaves a lot of discontiguous memory fragments, and too much fragmentation may mean that when the program later needs to allocate a large object, not enough contiguous memory can be found and another garbage collection has to be triggered early. The mark-sweep algorithm executes as shown in the following figure:

Figure I, "mark-sweep" algorithm
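The two phases can be sketched on a toy object graph. Everything here (`Obj`, `mark`, `sweep`, the list standing in for the heap) is a hypothetical model; the sketch marks reachable objects and sweeps the unmarked ones, which is the usual formulation of the same idea.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy mark-sweep over a simulated heap (not the JVM's implementation).
class MarkSweepDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing references
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Mark phase: flag every object reachable from the roots. */
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;   // already visited, also stops cycles
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Sweep phase: remove unmarked objects, reset marks, return the freed names. */
    static List<String> sweep(List<Obj> heap) {
        List<String> freed = new ArrayList<>();
        Iterator<Obj> it = heap.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;     // ready for the next collection
            } else {
                freed.add(o.name);
                it.remove();
            }
        }
        return freed;
    }

    /** A is a root and references B; C and D only reference each other. */
    static List<String> demo() {
        Obj a = new Obj("A"), b = new Obj("B"), c = new Obj("C"), d = new Obj("D");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        mark(Collections.singletonList(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [C, D]
    }
}
```

Because liveness is decided by reachability from the roots, the C and D cycle is reclaimed here, unlike under reference counting. The sketch does not model fragmentation, since the freed slots in a real heap stay where they were.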

2. Copying algorithm (copying)

This algorithm divides memory into two blocks of equal size and uses only one of them at a time. When garbage is collected, the surviving objects are copied to the other block, and the whole of the used block is then cleaned up at once. Because an entire half is reclaimed each time, memory allocation does not have to deal with fragmentation or other complications, so the algorithm is simple and runs efficiently, at the price of keeping half of the memory idle. This method suits short-lived objects; repeatedly copying long-lived objects reduces efficiency. The copying algorithm executes as follows:

Figure II, copying algorithm
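A toy semi-space model shows the mechanics. The class `CopyingDemo`, its two arrays, and the `isLive` predicate (standing in for a real reachability check) are all assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.function.Predicate;

// Toy copying collector over two equal halves (hypothetical model).
class CopyingDemo {
    private String[] from;  // the half currently used for allocation
    private String[] to;    // the idle half
    private int top;        // next free slot in the active half (bump pointer)

    CopyingDemo(int halfSize) {
        from = new String[halfSize];
        to = new String[halfSize];
    }

    /** Bump-pointer allocation; returns false when the active half is full. */
    boolean allocate(String obj) {
        if (top == from.length) return false;
        from[top++] = obj;
        return true;
    }

    /** Copies survivors into the idle half, wipes the old half, then swaps roles. */
    void collect(Predicate<String> isLive) {
        int newTop = 0;
        for (int i = 0; i < top; i++) {
            if (isLive.test(from[i])) {
                to[newTop++] = from[i];
            }
        }
        Arrays.fill(from, null);  // the whole old half is reclaimed at once
        String[] tmp = from;
        from = to;
        to = tmp;
        top = newTop;
    }

    int used() { return top; }

    String[] snapshot() { return Arrays.copyOf(from, top); }
}
```

After a collection the survivors sit contiguously at the start of the new active half, so allocation continues by bumping `top` and no fragmentation arises. The work done is proportional to the number of survivors, which is why the approach suits short-lived objects.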

3. Mark-compact algorithm (mark-compact)

The copying algorithm performs more copy operations when the object survival rate is high, and its efficiency drops. In the old generation, the common case is that most objects survive, so using the copying algorithm there would make copying expensive. The mark-compact algorithm is an old generation algorithm. Its marking phase is the same as in the mark-sweep algorithm, but instead of cleaning up the reclaimable objects directly, it moves all surviving objects toward one end and then clears the memory beyond the end boundary. This approach avoids fragmentation and does not need two equal-sized memory spaces, so it is relatively cost-effective. The algorithm works as follows:

Figure III, "mark-compact" algorithm
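The compaction step can be sketched on an array standing in for the heap. The `compact` method and the precomputed `marked` array are assumptions of this example; a real collector would also have to update every reference to a moved object, which is omitted here.

```java
import java.util.Arrays;

// Toy sliding compaction (hypothetical model; reference fix-up is not shown).
class MarkCompactDemo {
    /**
     * Slides marked entries toward index 0, clears everything past the
     * new boundary, and returns that boundary (the first free index).
     */
    static int compact(String[] heap, boolean[] marked) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                heap[free++] = heap[i];  // free <= i, so no live entry is overwritten
            }
        }
        Arrays.fill(heap, free, heap.length, null);
        return free;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E"};
        boolean[] marked = {true, false, true, false, true};
        int boundary = compact(heap, marked);
        // prints 3 [A, C, E, null, null]
        System.out.println(boundary + " " + Arrays.toString(heap));
    }
}
```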

4. Generational collection algorithm

Generational collection rests on the idea that the best approach is to use, for each stage of an object's life, the algorithm suited to that stage. It divides memory into areas according to the characteristics of the objects and uses a different collection algorithm for each area to improve the efficiency of garbage collection. In general, the Java heap is divided into the young generation and the old generation; the young generation uses the copying algorithm, and the old generation uses the mark-compact algorithm.
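One way to see the generations on a running JVM is to list its heap memory pools through the standard java.lang.management API. This is a small sketch; the pool names depend on the JVM and the selected collector (with HotSpot's parallel collector they are typically "PS Eden Space", "PS Survivor Space", and "PS Old Gen").

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class HeapPoolsDemo {
    /** Returns the names of the memory pools that belong to the Java heap. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Output varies by JVM and collector, so no fixed result is shown here.
        for (String name : heapPoolNames()) {
            System.out.println(name);
        }
    }
}
```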

V. Garbage collectors

A garbage collection algorithm is the theoretical basis of memory reclamation, and a garbage collector is its concrete implementation. The following looks at the garbage collectors provided by the HotSpot virtual machine (JDK 7), which lets users combine the collectors used for each generation according to their needs. The HotSpot virtual machine's garbage collectors are shown below:

Figure IV, garbage collectors of the HotSpot virtual machine

1. Serial Collector

This is a single-threaded collector that uses the copying algorithm. It suspends all worker threads until the collection is complete, and it is the default young generation collector when the virtual machine runs in client mode. Its advantage is that it is simple and efficient (compared with the single-threaded performance of other collectors): in an environment limited to a single CPU, the Serial collector has no thread-interaction overhead, so it can concentrate on garbage collection and achieve the highest single-threaded collection efficiency. It runs as follows:

Figure V, Serial/Serial Old collector run

2. ParNew Collector

The ParNew collector is in fact a multithreaded version of the Serial collector. Apart from using multiple threads for garbage collection, the rest of its behavior, including the algorithm, stop-the-world pauses, object allocation rules, and collection policy, is the same as the Serial collector's. ParNew is the preferred young generation collector in many virtual machines running in server mode; one important reason is that, apart from the Serial collector, it is the only one that can work with the CMS collector. In a single-CPU environment ParNew performs no better than Serial and may even be worse, and even with two CPUs it is not certain to outperform Serial, but as the number of CPUs increases its performance gradually improves. The ParNew collector works as follows:

Figure VI, ParNew/Serial Old collector run

3. Parallel Scavenge Collector

The Parallel Scavenge collector is a young generation collector. It is a parallel, multithreaded collector that uses the copying algorithm.

What distinguishes Parallel Scavenge from other collectors is its focus. Collectors such as CMS aim to shorten the pause of user threads during garbage collection as much as possible, whereas Parallel Scavenge aims to achieve a controllable throughput. Throughput is the ratio of the CPU time spent running user code to the total CPU time consumed: throughput = user code time / (user code time + garbage collection time).

Comparing the two strategies, high throughput and short pause time: the former emphasizes efficient use of CPU time so that the computing task completes sooner, and suits background jobs with little interaction, while the latter emphasizes the user's interactive experience.
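The throughput formula is easy to work through with numbers. In this small sketch the method name and the sample figures (99 minutes of user code, 1 minute of garbage collection) are made up for illustration.

```java
class ThroughputDemo {
    /** throughput = user code time / (user code time + garbage collection time) */
    static double throughput(double userCodeTime, double gcTime) {
        return userCodeTime / (userCodeTime + gcTime);
    }

    public static void main(String[] args) {
        // 99 minutes running user code and 1 minute collecting garbage: 99%.
        System.out.println(throughput(99, 1)); // prints 0.99
    }
}
```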

4. Serial Old Collector

Serial Old is the old generation version of the Serial collector. It is also a single-threaded collector and uses the "mark-compact" algorithm. It is mainly used in client mode. In server mode it has two main uses: one is to pair with the Parallel Scavenge collector in JDK 1.5 and earlier versions, and the other is to serve as the fallback for the CMS collector when a Concurrent Mode Failure occurs during a concurrent collection.

5. Parallel Old Collector

Parallel Old is the old generation version of the Parallel Scavenge collector. It uses multiple threads and the "mark-compact" algorithm. The Parallel Old collector works as follows:

Figure VII, Parallel Scavenge/Parallel Old collector run

6. CMS Collector

The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause time. It is based on the "mark-sweep" algorithm, and the whole collection process is broadly divided into 4 steps:

(1) Initial mark (CMS initial mark): marks only the objects that GC Roots can reach directly; it is fast.

(2) Concurrent mark (CMS concurrent mark): performs the GC Roots tracing (the root search algorithm) to determine whether objects are alive.

(3) Remark (CMS remark): corrects the mark records of objects whose marks changed because the user program kept running during concurrent marking.

(4) Concurrent sweep (CMS concurrent sweep)

The initial mark and remark phases still require stop-the-world pauses. The concurrent mark and concurrent sweep phases, which take the longest, run alongside the user threads. So overall, the memory reclamation of the CMS collector runs concurrently with the user threads.

The advantages of the CMS collector are concurrent collection and low pauses, but CMS is still far from perfect. It has three significant shortcomings:

(1) The CMS collector is very sensitive to CPU resources. In the concurrent phases it does not pause the user threads, but it takes up part of the CPU, which slows the application down and lowers total throughput. The default number of collection threads that CMS starts is (number of CPUs + 3) / 4.

(2) The CMS collector cannot handle floating garbage, that is, garbage produced by the user threads while a concurrent collection is running. A "Concurrent Mode Failure" may therefore occur, and the failure triggers another Full GC.

(3) Finally, CMS is based on the "mark-sweep" algorithm, so a collection leaves a lot of fragmentation. Too much fragmentation makes object allocation difficult: when a large object cannot find a contiguous block of memory, a Full GC has to be triggered early.
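The default thread count from shortcoming (1) can be worked through with integer arithmetic. The method name below is hypothetical; the formula is the one quoted in the text, and the division truncates as Java integer division does.

```java
class CmsThreadsDemo {
    /** Default number of CMS collection threads: (number of CPUs + 3) / 4, truncated. */
    static int defaultCmsThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 2, 4, 8, 16}) {
            System.out.println(cpus + " CPUs -> " + defaultCmsThreads(cpus) + " thread(s)");
        }
    }
}
```

With 1, 2, or 4 CPUs the result is 1 thread, so on a 2-CPU machine the collector can occupy about half of the processing capacity during its concurrent phases. With 8 or 16 CPUs the result is 2 or 4 threads, roughly a quarter of the CPUs, which is why the impact on the application is most visible on small machines.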

Figure VIII, CMS collector run

7. G1 Collector

The G1 collector is a garbage collector for server-side applications and is intended to replace the CMS collector. Compared with other GC collectors, G1 has the following characteristics:

(1) Parallelism and concurrency: G1 takes full advantage of multi-CPU, multi-core hardware, using multiple CPUs to shorten stop-the-world pauses, and lets Java threads continue to execute concurrently during the collection process.

(2) Generational collection: G1 keeps the generational concept. It can manage the whole GC heap on its own without cooperating with other collectors, and it handles newly created objects differently from old objects that have existed for a while and survived multiple GCs, to get better collection results.

(3) Space integration: viewed as a whole, G1 is based on the "mark-compact" algorithm; viewed locally (between two Regions), it is based on the "copying" algorithm. Neither produces memory fragmentation during operation, so a long-running program can allocate large objects without triggering the next GC early for lack of contiguous memory.

(4) Predictable pauses: besides pursuing low pauses, G1 can also build a predictable pause-time model.

The G1 collector operation can be broadly divided into the following steps:

(1) Initial mark: marks only the objects that GC Roots can reach directly and updates the TAMS (Next Top at Mark Start) value so that, in the next phase, the user program running concurrently creates new objects in the correct available Regions. This phase requires pausing the user threads.

(2) Concurrent mark: starting from GC Roots, analyzes the reachability of objects in the heap to find the surviving objects. This phase takes a long time but runs concurrently with the user threads.

(3) Final mark: corrects the mark records that changed during concurrent marking. This phase requires pausing the user threads but can be executed in parallel.

(4) Screening and reclamation: sorts the Regions by their reclamation value and cost, builds a collection plan according to the GC pause time the user expects, and performs the collection.
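The planning idea in step (4) can be sketched as a simple greedy selection. This is not G1's actual implementation: the `Region` fields, the cost estimates, and the greedy rule are all assumptions made to illustrate "sort by value and cost, then pick what fits the pause goal".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical region selection under a pause-time budget.
class RegionSelectionDemo {
    static final class Region {
        final String name;
        final long reclaimableBytes; // estimated garbage in the region (made-up figure)
        final double costMillis;     // estimated time to collect it (made-up figure)

        Region(String name, long reclaimableBytes, double costMillis) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.costMillis = costMillis;
        }

        /** Reclaimed bytes per millisecond of pause. */
        double efficiency() {
            return reclaimableBytes / costMillis;
        }
    }

    /** Picks regions in order of efficiency while the total cost fits the budget. */
    static List<String> select(List<Region> regions, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((x, y) -> Double.compare(y.efficiency(), x.efficiency()));
        List<String> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.costMillis <= pauseBudgetMillis) {
                chosen.add(r.name);
                spent += r.costMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("R1", 80, 4)); // efficiency 20
        regions.add(new Region("R2", 90, 9)); // efficiency 10
        regions.add(new Region("R3", 30, 1)); // efficiency 30
        regions.add(new Region("R4", 10, 5)); // efficiency 2
        System.out.println(select(regions, 6)); // prints [R3, R1]
    }
}
```

With a 6 ms budget, R3 and R1 together cost 5 ms and are selected; R2 and R4 would exceed the budget and are left for a later collection.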


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
