Repost: In-depth Understanding of the Java G1 Garbage Collector

Source: Internet
Author: User

One, History of the Java Garbage Collector

First stage, the Serial collector

Before JDK 1.3.1, the Java virtual machine could only use the Serial collector. The Serial collector is single-threaded, but "single-threaded" does not merely mean that it uses only one CPU or one collection thread to do its work. More importantly, while it collects garbage, all other worker threads must be paused until the collection finishes.

PS: How to enable the Serial collector

-XX:+UseSerialGC

Second stage, the Parallel collector

The Parallel collector, also known as the throughput collector, has one main advantage over the Serial collector: it uses multiple threads to do garbage collection, so it can exploit multicore processors and dramatically reduce GC time.

PS: How to enable the Parallel collector

-XX:+UseParallelGC -XX:+UseParallelOldGC

Third stage, the CMS (concurrent) collector

The CMS collector pauses all application threads during a minor GC and collects the young generation with multiple threads. For the old generation, instead of pausing application threads for the whole collection, it uses background threads that periodically scan the old generation and reclaim objects that are no longer in use.

PS: How to enable the CMS collector

-XX:+UseParNewGC -XX:+UseConcMarkSweepGC

Fourth stage, the G1 (concurrent) collector

The G1 (Garbage-First) collector is designed to minimize the pauses that occur when handling large heaps (larger than 4 GB). Compared with CMS, it greatly reduces memory fragmentation.

PS: How to enable the G1 collector

-XX:+UseG1GC

Two, Learning G1

The first G1 paper (Appendix 1) was published in 2004, but G1 only became available in JDK 7u4, in 2012. Oracle plans to make G1 the default garbage collector in JDK 9 and position it as the replacement for CMS. Why does Oracle recommend G1 so strongly, and what are its advantages?

First, G1 is designed to make performance tuning simple and practical.

Developers only need to specify the following parameters:

-XX:+UseG1GC -Xmx32g -XX:MaxGCPauseMillis=200

Here -XX:+UseG1GC enables the G1 garbage collector, -Xmx32g sets the maximum heap size to 32 GB, and -XX:MaxGCPauseMillis=200 sets the maximum GC pause time target to 200 ms. Once the heap size is fixed, tuning only requires adjusting the maximum pause time.
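To check which collector and pause target a running JVM actually picked up, a small program can ask the JVM itself. This sketch is not from the original article: it assumes a HotSpot-based JDK, where com.sun.management.HotSpotDiagnosticMXBean is available, and the class name GcFlagsCheck is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Reports the collectors and the GC flags the running JVM is using (assumes HotSpot).
class GcFlagsCheck {
    // Names of the collectors registered by the JVM (G1 reports names starting with "G1").
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Current value of a HotSpot -XX flag, e.g. "UseG1GC" or "MaxGCPauseMillis".
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean hs =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return hs.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("Collectors:       " + collectorNames());
        System.out.println("UseG1GC:          " + vmOption("UseG1GC"));
        System.out.println("MaxGCPauseMillis: " + vmOption("MaxGCPauseMillis"));
        System.out.println("Max heap (bytes): " + Runtime.getRuntime().maxMemory());
    }
}
```

Started with the flags above, for example java -XX:+UseG1GC -Xmx32g -XX:MaxGCPauseMillis=200 GcFlagsCheck, it should report UseG1GC as true and MaxGCPauseMillis as 200; the exact collector names printed depend on the JDK version.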

Second, G1 removes the physical separation between the young and old generations.

This means we no longer have to size each generation as a separate space, or worry about whether each generation has enough memory.

Instead, G1 divides the heap into a number of regions, and it is still a generational collector. Some of these regions form the young generation, and young-generation collection still pauses all application threads, copying surviving objects to the old generation or to survivor space. The old generation is also split into many regions, and G1 cleans it up by copying objects from one region to another. This means that during normal operation G1 compacts the heap (at least partially), so it does not suffer from the memory fragmentation problem of CMS.
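To make the region idea concrete, here is a toy model (hypothetical names, not HotSpot code) of a heap split into equal-size regions, each tagged with its current role. It shows the two properties described above: the region holding an address is found by simple division, and a region that has been emptied can later serve any generation.

```java
import java.util.Arrays;

// Toy model of a region-based heap (illustrative only, not HotSpot code).
class RegionedHeap {
    enum Role { FREE, EDEN, SURVIVOR, OLD, HUMONGOUS }

    final long heapStart;
    final long regionSize;
    final Role[] roles;

    RegionedHeap(long heapStart, long heapSize, long regionSize) {
        this.heapStart = heapStart;
        this.regionSize = regionSize;
        this.roles = new Role[(int) (heapSize / regionSize)];
        Arrays.fill(roles, Role.FREE);
    }

    // The region holding an address is found by division alone;
    // there are no fixed young/old boundaries to consult.
    int regionIndexOf(long address) {
        return (int) ((address - heapStart) / regionSize);
    }

    // Give the first free region a role; -1 means no free region is left.
    int claim(Role role) {
        for (int i = 0; i < roles.length; i++) {
            if (roles[i] == Role.FREE) {
                roles[i] = role;
                return i;
            }
        }
        return -1;
    }

    // Once its live objects have been copied out, a region is free again
    // and may later serve as Eden, Survivor, Old or Humongous.
    void release(int index) {
        roles[index] = Role.FREE;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        RegionedHeap heap = new RegionedHeap(0x1000_0000L, 64 * mb, mb);
        int eden = heap.claim(Role.EDEN);
        int old = heap.claim(Role.OLD);
        System.out.println("regions=" + heap.roles.length + " eden=" + eden + " old=" + old);
        System.out.println("0x10345678 is in region " + heap.regionIndexOf(0x1034_5678L));
        heap.release(eden);
        int survivor = heap.claim(Role.SURVIVOR);
        System.out.println("freed region " + eden + ", survivor now uses region " + survivor);
    }
}
```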

G1 also has a special kind of region called a humongous region. If an object occupies more than 50% of a region's capacity, G1 treats it as a humongous object. By default, such objects would be allocated directly in the old generation, but if a humongous object is short-lived, this can hurt the garbage collector. To address this, G1 sets aside humongous (H) regions to store humongous objects. If a single H region cannot hold a humongous object, G1 looks for contiguous H regions to store it. Finding contiguous free regions sometimes requires starting a full GC.
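The humongous rules above can be sketched as follows. This is an illustrative model under the text's definition (an object is humongous when it needs more than half a region); the class and method names are hypothetical, and the boolean array stands in for G1's real free-region bookkeeping.

```java
// Toy sketch of the humongous-object rules (illustrative only, not HotSpot code).
class Humongous {
    // An object is humongous when it takes more than half of a region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Number of contiguous regions needed to hold the object (rounded up).
    static int regionsNeeded(long objectBytes, long regionBytes) {
        return (int) ((objectBytes + regionBytes - 1) / regionBytes);
    }

    // First index of a run of `count` contiguous free regions, or -1 if there is none.
    static int findContiguousFree(boolean[] free, int count) {
        int runStart = 0;
        int runLength = 0;
        for (int i = 0; i < free.length; i++) {
            if (free[i]) {
                if (runLength == 0) runStart = i;
                runLength++;
                if (runLength == count) return runStart;
            } else {
                runLength = 0;   // the run is broken by an occupied region
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long region = 4 * mb;
        long obj = 9 * mb;
        boolean[] free = {true, false, true, true, false, true, true, true};
        int n = regionsNeeded(obj, region);
        System.out.println("humongous=" + isHumongous(obj, region)
            + " regions=" + n + " start=" + findContiguousFree(free, n));
    }
}
```

With 4 MB regions, the 9 MB object in main is humongous, needs three contiguous regions, and the first run of three free regions starts at index 5. When no such run exists the method returns -1, which corresponds to the case where the text says G1 may need a full GC.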

PS: In Java 8, the permanent generation was removed; class metadata now lives in Metaspace, which is allocated in native memory rather than in the Java heap.

Object Allocation Policy

Speaking of allocating large objects, we also need to discuss the object allocation policy. It consists of three stages:

    1. TLAB (Thread Local Allocation Buffer)
    2. Allocation in Eden
    3. Allocation in humongous regions

A TLAB is a buffer allocated locally to a thread, and its purpose is to allocate objects as quickly as possible. If objects were allocated in a shared space, some synchronization mechanism would be needed to manage that space's free-space pointer. In Eden, each thread has its own fixed chunk for allocating objects, which is its TLAB, so no synchronization between threads is needed when allocating.

For objects that cannot be allocated in a TLAB, the JVM tries to allocate them in Eden. If Eden cannot accommodate the object either, it can only be allocated in the old generation.
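The three-stage allocation path can be sketched as a toy allocator. Everything here is an assumption made for illustration: the sizes (1 MB regions, fixed 64 KB TLABs, 8 MB of Eden), the string results, and the names. HotSpot sizes TLABs adaptively and does far more bookkeeping. What the sketch does show is why TLABs are fast: the TLAB bump touches only thread-local fields, while the shared Eden pointer needs a compare-and-set loop.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy allocation path (illustrative only): TLAB first, then shared Eden, then humongous.
class AllocationPath {
    static final long REGION = 1L << 20;        // assumed 1 MB regions
    static final long TLAB_SIZE = 64 * 1024;    // assumed fixed 64 KB TLABs
    static final long EDEN_END = 8 * REGION;    // assumed 8 MB of Eden

    // Shared Eden top pointer: every thread bumps it, so updates need a CAS.
    static final AtomicLong edenTop = new AtomicLong(0);

    // Per-thread buffer [top, end): only its owner touches it, so no synchronization.
    static final class Tlab { long top; long end; }
    static final ThreadLocal<Tlab> TLAB = ThreadLocal.withInitial(Tlab::new);

    static String allocate(long size) {
        if (size > REGION / 2) {
            return "humongous";                  // more than half a region
        }
        Tlab t = TLAB.get();
        if (t.end - t.top >= size) {             // fast path: plain pointer bump
            t.top += size;
            return "tlab";
        }
        if (size < TLAB_SIZE) {                  // small object: retire the TLAB, carve a new one
            long start = bumpEden(TLAB_SIZE);
            if (start >= 0) {
                t.top = start + size;
                t.end = start + TLAB_SIZE;
                return "tlab";
            }
        } else if (bumpEden(size) >= 0) {        // too big for a TLAB: allocate in Eden directly
            return "eden";
        }
        // Eden exhausted: the text says the object then has to go to the old generation;
        // this toy only reports the situation.
        return "eden-full";
    }

    // Atomically reserve `size` bytes of Eden; returns the start offset, or -1 if Eden is full.
    static long bumpEden(long size) {
        while (true) {
            long top = edenTop.get();
            if (top + size > EDEN_END) return -1;
            if (edenTop.compareAndSet(top, top + size)) return top;
        }
    }

    static void reset() {
        edenTop.set(0);
        TLAB.remove();
    }

    public static void main(String[] args) {
        reset();
        System.out.println(allocate(100));              // tlab: first TLAB is carved out
        System.out.println(allocate(100));              // tlab: pointer bump in the same TLAB
        System.out.println(allocate(200 * 1024));       // eden: larger than a TLAB
        System.out.println(allocate(3 * REGION / 4));   // humongous
    }
}
```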

Finally, G1 provides two GC modes, young GC and mixed GC, both of which are stop-the-world (STW). Let's look at each mode in turn.

Three, G1 Young GC

Young GC mainly collects Eden and is triggered when Eden space is exhausted. Live objects in Eden are moved to survivor space; if survivor space is insufficient, some of them are promoted directly to the old generation. Live objects in the current survivor regions are moved to new survivor regions, and some are promoted to the old generation. Eden ends up empty, the GC stops, and the application threads resume.
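The young GC movement described above can be modeled with plain lists. In this sketch liveness is given up front (in G1 it comes from tracing from the roots and the RSets), survivor capacity is counted in objects, and the tenuring threshold is a fixed, assumed value of 2; HotSpot adjusts its threshold at runtime. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy young GC (illustrative only): copy live young objects, promote the rest.
class YoungGcSketch {
    static final class Obj {
        final String name;
        final boolean live;
        int age;
        Obj(String name, boolean live, int age) { this.name = name; this.live = live; this.age = age; }
    }

    static final int TENURING_THRESHOLD = 2;   // assumed fixed; HotSpot adapts it at runtime

    List<Obj> eden = new ArrayList<>();
    List<Obj> survivor = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    final int survivorCapacity;                // max objects the new survivor space holds

    YoungGcSketch(int survivorCapacity) { this.survivorCapacity = survivorCapacity; }

    void youngGc() {
        List<Obj> toSurvivor = new ArrayList<>();
        // Live survivor objects age by one; old-enough ones, or ones that do not fit, are promoted.
        for (Obj o : survivor) {
            if (!o.live) continue;             // dead objects are simply never copied
            o.age++;
            if (o.age >= TENURING_THRESHOLD || toSurvivor.size() >= survivorCapacity) old.add(o);
            else toSurvivor.add(o);
        }
        // Live Eden objects go to the new survivor space while it has room, otherwise to old.
        for (Obj o : eden) {
            if (!o.live) continue;
            o.age = 1;
            if (toSurvivor.size() < survivorCapacity) toSurvivor.add(o);
            else old.add(o);
        }
        survivor = toSurvivor;
        eden = new ArrayList<>();              // Eden is empty when the pause ends
    }

    static List<String> names(List<Obj> objs) {
        List<String> out = new ArrayList<>();
        for (Obj o : objs) out.add(o.name);
        return out;
    }

    static YoungGcSketch demo() {
        YoungGcSketch heap = new YoungGcSketch(2);
        heap.survivor.add(new Obj("s", true, 1));
        heap.eden.add(new Obj("a", true, 0));
        heap.eden.add(new Obj("b", false, 0));
        heap.eden.add(new Obj("c", true, 0));
        heap.eden.add(new Obj("d", true, 0));
        heap.youngGc();
        return heap;
    }

    public static void main(String[] args) {
        YoungGcSketch heap = demo();
        System.out.println("eden=" + names(heap.eden) + " survivor=" + names(heap.survivor)
            + " old=" + names(heap.old));
    }
}
```

In the demo, the dead object b is never copied, s reaches the threshold and is promoted, a and c fill the two survivor slots, and d overflows into the old generation.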

This raises a question: if we only collect young-generation objects, how do we find all the roots? Should every old-generation object be treated as a root? Scanning all of them would take a long time. So G1 introduces the concept of the RSet, short for Remembered Set, which tracks references pointing into a region.

CMS also has a remembered set of sorts: an area of the old generation records references into the young generation. This is a point-out design: during a young GC, only this area has to be scanned as roots, instead of the entire old generation.

G1, however, does not use point-out, because its regions are small and numerous. With point-out, a lot of scanning would be wasted, including references in regions that do not need to be collected. So G1 uses point-in instead. Point-in means recording which regions reference objects in the current region. Scanning only those references as roots avoids useless scans. Since there are many young regions, do we also need to record references between young regions? No, because all young regions are scanned in every GC, so only references coming from old regions need to be recorded.

Note that if a large number of references are created, the mutator would have to process each one, which is very expensive. To reduce this cost, G1 introduces another concept, the card table. A card table logically divides a region into fixed-size contiguous chunks, each called a card. Cards are small, typically between 128 and 512 bytes. The card table is usually a byte array in which the index of a card (the array subscript) identifies its address range. By default, every card is clean. When a reference is stored into an address range, the array entry for that card is set to "0", which marks it dirty, and the RSet also records the array index. In general, an RSet is a hash table: the key is the starting address of another region, and the value is a set whose elements are card table indexes.
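A toy version of the card table and point-in RSet described above might look like this. It assumes 512-byte cards and, as the text describes, uses 0 for dirty; the class and method names are hypothetical. For simplicity it updates the RSet directly at store time, whereas real G1 queues dirty cards and applies them later (the Update RS step below), and it does not skip stores from young regions, which the text says need not be recorded.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy card table plus point-in remembered sets (illustrative, not HotSpot's layout).
class CardTableSketch {
    static final int CARD_SHIFT = 9;               // 2^9 = 512-byte cards
    static final byte CLEAN = -1;
    static final byte DIRTY = 0;                   // dirty is marked as 0, as in the text

    final long heapStart;
    final long regionSize;
    final byte[] cards;
    // rsets.get(r): key = start address of a region that references region r,
    // value = indexes of the cards in that region holding such references.
    final Map<Integer, Map<Long, Set<Integer>>> rsets = new HashMap<>();

    CardTableSketch(long heapStart, long heapSize, long regionSize) {
        this.heapStart = heapStart;
        this.regionSize = regionSize;
        this.cards = new byte[(int) (heapSize >>> CARD_SHIFT)];
        Arrays.fill(cards, CLEAN);
    }

    int cardIndex(long address) { return (int) ((address - heapStart) >>> CARD_SHIFT); }

    int regionOf(long address) { return (int) ((address - heapStart) / regionSize); }

    long regionStart(int region) { return heapStart + region * regionSize; }

    // Called after "the field at fieldAddr now points to targetAddr".
    void recordStore(long fieldAddr, long targetAddr) {
        int from = regionOf(fieldAddr);
        int to = regionOf(targetAddr);
        if (from == to) return;                    // references within one region are not recorded
        int card = cardIndex(fieldAddr);
        cards[card] = DIRTY;
        rsets.computeIfAbsent(to, r -> new HashMap<>())
             .computeIfAbsent(regionStart(from), k -> new TreeSet<>())
             .add(card);
    }

    public static void main(String[] args) {
        CardTableSketch heap = new CardTableSketch(0, 4L << 20, 1L << 20);  // four 1 MB regions
        heap.recordStore(0x300010, 0x000100);   // region 3 -> region 0
        heap.recordStore(0x300400, 0x000200);   // region 3 -> region 0, another card
        heap.recordStore(0x100000, 0x100200);   // inside region 1: ignored
        System.out.println("RSet of region 0: " + heap.rsets.get(0));
    }
}
```

Region 0's RSet ends up with one key (the start address of region 3) mapped to the two dirty card indexes, so a collection of region 0 only needs to scan those two cards rather than all of region 3.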

Young GC phases:

    • Phase 1: Root scanning
      Static and local objects are scanned
    • Phase 2: Update RS
      Process the dirty card queue to update the RSets
    • Phase 3: Process RS
      Detect objects in the young generation that are referenced from the old generation
    • Phase 4: Object copy
      Copy live objects to survivor or old regions
    • Phase 5: Reference processing
      Process soft, weak, and phantom references

Four, G1 Mixed GC

Mixed GC not only performs normal young-generation collection, but also reclaims some of the old regions marked by the background marking threads.

It consists of two steps:

    1. Global concurrent marking
    2. Copying live objects (evacuation)

Before a mixed GC runs, global concurrent marking is performed first. How does global concurrent marking work?

In G1 GC, global concurrent marking mainly serves mixed GC and is not a mandatory part of every GC cycle. It is divided into five steps:

    • Initial mark (STW)
      In this phase, G1 GC marks the roots. This phase piggybacks on a regular (STW) young-generation collection.
    • Root region scan
      G1 GC scans the survivor regions produced by the initial mark for references into the old generation and marks the referenced objects. This phase runs concurrently with the application (not STW), and it must finish before the next STW young-generation collection can start.
    • Concurrent marking
      G1 GC finds reachable (live) objects across the entire heap. This phase runs concurrently with the application and can be interrupted by STW young-generation collections.
    • Remark (STW)
      This STW phase helps complete the marking cycle. G1 GC drains the SATB buffers, traces unvisited live objects, and performs reference processing.
    • Cleanup (STW)
      In this final phase, G1 GC performs the STW operations of accounting and RSet scrubbing. During accounting, G1 GC identifies completely free regions and candidate regions for mixed garbage collection. The cleanup phase is partially concurrent when it resets empty regions and returns them to the free list.

Tri-Color Marking Algorithm

Since we are discussing concurrent marking, we need to understand the tri-color marking algorithm behind it. It is a useful way to describe a tracing collector and to reason about the collector's correctness. First, we divide objects into three colors:

    • Black: a root object, or an object that has been scanned along with its children
    • Gray: the object itself has been scanned, but its children have not
    • White: the object has not been scanned; once scanning finishes, the remaining white objects are unreachable, that is, garbage

When the GC starts scanning, it proceeds as follows:

Roots are set to black, and their children are set to gray.

Traversal continues from the gray objects; once all of an object's children have been scanned, the object is set to black.

After all reachable objects have been traversed, every reachable object is black. Unreachable objects remain white and need to be reclaimed.
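The marking steps above can be written as a short worklist algorithm. In this sketch roots start gray and turn black as soon as their children have been grayed, which ends in the same state the text describes. It assumes a stop-the-world setting (no application threads running) and Java 9 or later for Map.of and List.of; the class name and the graph are made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Tri-color marking over a small object graph (stop-the-world, no mutator running).
class TriColorMark {
    enum Color { WHITE, GRAY, BLACK }

    static Map<String, Color> mark(Map<String, List<String>> graph, List<String> roots) {
        Map<String, Color> color = new HashMap<>();
        for (String obj : graph.keySet()) color.put(obj, Color.WHITE);   // nothing scanned yet
        Deque<String> grayStack = new ArrayDeque<>();
        for (String root : roots) {
            color.put(root, Color.GRAY);
            grayStack.push(root);
        }
        while (!grayStack.isEmpty()) {
            String obj = grayStack.pop();
            for (String child : graph.get(obj)) {
                if (color.get(child) == Color.WHITE) {   // first time this object is reached
                    color.put(child, Color.GRAY);
                    grayStack.push(child);
                }
            }
            color.put(obj, Color.BLACK);                  // all children are now at least gray
        }
        return color;
    }

    // Whatever is still white after marking is unreachable, i.e. garbage.
    static Set<String> garbage(Map<String, Color> color) {
        Set<String> white = new TreeSet<>();
        for (Map.Entry<String, Color> e : color.entrySet()) {
            if (e.getValue() == Color.WHITE) white.add(e.getKey());
        }
        return white;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
            "R", List.of("A", "B"),
            "A", List.of("C"),
            "B", List.of(),
            "C", List.of(),
            "D", List.of("E"),     // D and E reference each other,
            "E", List.of("D"));    // but nothing reachable points at them
        System.out.println("garbage = " + garbage(mark(graph, List.of("R"))));
    }
}
```

Here D and E reference each other but are not reachable from the root R, so they stay white and are reported as garbage.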

This looks fine, but if the application keeps running during marking, references between objects may change. In that case we run into a problem: lost objects.

Consider the following state reached by the collector during marking: A is black, B is gray, and C is white and referenced only by B.

At this point the application performs the following actions:

A.c = C

B.c = null

After these two statements, A references C, and B no longer does.

When the collector then resumes marking, B has nothing left to scan and turns black, while A, already black, is never scanned again.
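The interleaving above can be reproduced in a few lines. This is a toy model with hypothetical names: fields are modeled as sets of referenced names, the marker has already reached the state described (A black, B gray, C white), and there is no write barrier at all. It assumes Java 9 or later for Set.of.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Reproduces the lost-object problem: concurrent marking with NO write barrier.
class LostObjectDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final Map<String, Set<String>> refs = new HashMap<>();
    static final Map<String, Color> color = new HashMap<>();
    static final Deque<String> grayStack = new ArrayDeque<>();

    static void finishMarking() {
        while (!grayStack.isEmpty()) {
            String obj = grayStack.pop();
            for (String child : refs.get(obj)) {
                if (color.get(child) == Color.WHITE) {
                    color.put(child, Color.GRAY);
                    grayStack.push(child);
                }
            }
            color.put(obj, Color.BLACK);
        }
    }

    static Color runScenario() {
        refs.clear(); color.clear(); grayStack.clear();
        // Marker state at the interleaving: A black, B gray, C white and reachable only via B.
        refs.put("A", new HashSet<>());
        refs.put("B", new HashSet<>(Set.of("C")));
        refs.put("C", new HashSet<>());
        color.put("A", Color.BLACK);
        color.put("B", Color.GRAY);
        color.put("C", Color.WHITE);
        grayStack.push("B");

        // The application runs concurrently with the marker:
        refs.get("A").add("C");       // A.c = C     (a black object now points to a white one)
        refs.get("B").remove("C");    // B.c = null  (the only gray path to C is gone)

        finishMarking();              // B turns black; A is black, so it is never rescanned
        return color.get("C");
    }

    public static void main(String[] args) {
        System.out.println("C is " + runScenario() + " although A still references it");
    }
}
```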

So C is still white and is treated as garbage to be reclaimed, which is clearly wrong. How can we make sure that marked objects are not lost while the application keeps running? There are two possible approaches:

    1. Record an object when a reference to it is inserted
    2. Record an object when a reference to it is deleted

These correspond to the two different implementations used by CMS and G1:

CMS uses incremental update: whenever the write barrier finds that a reference to a white object is being stored into a field of a black object, the white object is turned gray. That is, the object is recorded on insertion.
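Applied to the lost-object scenario described above, an insertion barrier of the kind described here keeps C alive. This toy sketch shades the white object gray directly at the store, following the text; the names are hypothetical, and Java 9 or later is assumed for Set.of.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Lost-object scenario with an insertion (incremental-update) write barrier.
class IncrementalUpdateDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final Map<String, Set<String>> refs = new HashMap<>();
    static final Map<String, Color> color = new HashMap<>();
    static final Deque<String> grayStack = new ArrayDeque<>();

    // Every reference store goes through here: holder's field changes from oldValue to newValue.
    static void writeRef(String holder, String oldValue, String newValue) {
        if (oldValue != null) refs.get(holder).remove(oldValue);
        if (newValue != null) {
            refs.get(holder).add(newValue);
            // A black object gained a reference to a white one: shade the white object gray.
            if (color.get(holder) == Color.BLACK && color.get(newValue) == Color.WHITE) {
                color.put(newValue, Color.GRAY);
                grayStack.push(newValue);
            }
        }
    }

    static void finishMarking() {
        while (!grayStack.isEmpty()) {
            String obj = grayStack.pop();
            for (String child : refs.get(obj)) {
                if (color.get(child) == Color.WHITE) {
                    color.put(child, Color.GRAY);
                    grayStack.push(child);
                }
            }
            color.put(obj, Color.BLACK);
        }
    }

    static Color runScenario() {
        refs.clear(); color.clear(); grayStack.clear();
        // Marker state at the interleaving: A black, B gray, C white and reachable only via B.
        refs.put("A", new HashSet<>());
        refs.put("B", new HashSet<>(Set.of("C")));
        refs.put("C", new HashSet<>());
        color.put("A", Color.BLACK);
        color.put("B", Color.GRAY);
        color.put("C", Color.WHITE);
        grayStack.push("B");

        writeRef("A", null, "C");     // A.c = C: the barrier grays C
        writeRef("B", "C", null);     // B.c = null
        finishMarking();
        return color.get("C");        // BLACK: C survives
    }

    public static void main(String[] args) {
        System.out.println("C is " + runScenario());
    }
}
```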

G1 uses SATB (snapshot-at-the-beginning), which records objects on deletion. It works in three steps:

1. At the start of marking, take a snapshot of the object graph to determine the live objects.

2. During concurrent marking, queue every changed object (the write barrier records the old referent of each overwritten reference, so that it is treated as non-white).

3. This can leave floating garbage, which will be collected in the next cycle.
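A simplified SATB sketch for the same lost-object scenario follows. The write barrier logs the old referent before a reference is overwritten or cleared, and remark treats every logged object as live. It is deliberately simplified: it logs on every store, while G1 only does so while marking is active, and it ignores objects allocated during marking. The second scenario illustrates step 3: C becomes unreachable during marking but still survives this cycle as floating garbage. Names are hypothetical; Java 9 or later is assumed for Set.of.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// SATB (deletion) barrier on the lost-object scenario, plus the floating-garbage case.
class SatbDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final Map<String, Set<String>> refs = new HashMap<>();
    static final Map<String, Color> color = new HashMap<>();
    static final Deque<String> grayStack = new ArrayDeque<>();
    static final Deque<String> satbQueue = new ArrayDeque<>();   // referents logged by the barrier

    // holder's field changes from oldValue to newValue; the barrier logs the OLD referent first.
    static void writeRef(String holder, String oldValue, String newValue) {
        if (oldValue != null) {
            satbQueue.push(oldValue);
            refs.get(holder).remove(oldValue);
        }
        if (newValue != null) refs.get(holder).add(newValue);
    }

    static void finishMarking() {
        while (!grayStack.isEmpty()) {
            String obj = grayStack.pop();
            for (String child : refs.get(obj)) {
                if (color.get(child) == Color.WHITE) {
                    color.put(child, Color.GRAY);
                    grayStack.push(child);
                }
            }
            color.put(obj, Color.BLACK);
        }
    }

    // Remark: every logged object is treated as live, then tracing finishes.
    static void remark() {
        while (!satbQueue.isEmpty()) {
            String obj = satbQueue.pop();
            if (color.get(obj) == Color.WHITE) {
                color.put(obj, Color.GRAY);
                grayStack.push(obj);
            }
        }
        finishMarking();
    }

    // Marker state at the interleaving: A black, B gray, C white and reachable only via B.
    static void setUp() {
        refs.clear(); color.clear(); grayStack.clear(); satbQueue.clear();
        refs.put("A", new HashSet<>());
        refs.put("B", new HashSet<>(Set.of("C")));
        refs.put("C", new HashSet<>());
        color.put("A", Color.BLACK);
        color.put("B", Color.GRAY);
        color.put("C", Color.WHITE);
        grayStack.push("B");
    }

    // A.c = C; B.c = null: C was live in the snapshot, so it is kept.
    static Color lostObjectScenario() {
        setUp();
        writeRef("A", null, "C");
        writeRef("B", "C", null);      // the barrier logs C
        finishMarking();               // concurrent marking ends with C still white
        remark();                      // the logged C is marked here
        return color.get("C");
    }

    // Only B.c = null: C is now unreachable, yet it is still marked (floating garbage).
    static Color floatingGarbageScenario() {
        setUp();
        writeRef("B", "C", null);
        finishMarking();
        remark();
        return color.get("C");
    }

    public static void main(String[] args) {
        System.out.println("lost-object case: C is " + lostObjectScenario());
        System.out.println("floating garbage: C is " + floatingGarbageScenario());
    }
}
```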

In this way, G1 learns which old regions will yield the most garbage. Once global concurrent marking is complete, a mixed GC is started at some point. These collections are called "mixed" because they not only perform normal young-generation collection but also reclaim some of the regions marked by the background marking threads.

A mixed GC also uses a copying (evacuation) strategy, and when it completes, the reclaimed space is freed again.

That completes our look at mixed GC. In the next section we turn to tuning practice.

Five, Tuning Practice

MaxGCPauseMillis Tuning

The most basic parameters for using G1 were described earlier:

-XX:+UseG1GC -Xmx32g -XX:MaxGCPauseMillis=200

The first two parameters are easy to understand, but how should MaxGCPauseMillis be configured? Literally, it is the maximum allowed GC pause time, and G1 tries to keep every GC pause within the configured MaxGCPauseMillis. How does G1 achieve this? It relies on another concept, the CSet (collection set): the set of regions collected in a single garbage collection.

    • Young GC: selects all young regions. G1 controls the cost of a young GC by controlling the number of young regions.
    • Mixed GC: selects all young regions, plus a number of old regions with high collection yield based on the global concurrent marking statistics, picking as many high-yield old regions as possible within the user-specified cost target.
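The mixed GC selection rule can be illustrated with a greedy sketch. The cost model (0.5 ms per region plus 2 ms per MB of live data) and the region numbers are invented for illustration; real G1 derives its predictions from measurements of earlier pauses and applies further limits. The sketch always takes every young region, then adds old regions in order of reclaimed space per predicted millisecond until the next one would exceed the pause budget.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy collection-set choice for a mixed GC under a pause-time budget (illustrative only).
class CollectionSetSketch {
    static final class Region {
        final int id;
        final boolean young;
        final double liveMb;      // data that must be copied out
        final double garbageMb;   // space reclaimed by collecting the region
        Region(int id, boolean young, double liveMb, double garbageMb) {
            this.id = id; this.young = young; this.liveMb = liveMb; this.garbageMb = garbageMb;
        }
    }

    // Hypothetical cost model: 0.5 ms fixed per region + 2 ms per MB of live data copied.
    static double predictMillis(Region r) {
        return 0.5 + 2.0 * r.liveMb;
    }

    static List<Integer> chooseCSet(List<Region> regions, double pauseTargetMillis) {
        List<Integer> cset = new ArrayList<>();
        double budget = pauseTargetMillis;
        List<Region> oldCandidates = new ArrayList<>();
        for (Region r : regions) {
            if (r.young) {                      // every young region is always collected
                cset.add(r.id);
                budget -= predictMillis(r);
            } else {
                oldCandidates.add(r);
            }
        }
        // Best yield first: reclaimed MB per predicted millisecond.
        oldCandidates.sort(Comparator.comparingDouble((Region r) -> r.garbageMb / predictMillis(r)).reversed());
        for (Region r : oldCandidates) {
            double cost = predictMillis(r);
            if (cost > budget) break;           // stop once the next region would exceed the target
            cset.add(r.id);
            budget -= cost;
        }
        return cset;
    }

    static List<Region> sampleHeap() {
        return Arrays.asList(
            new Region(0, true, 1.0, 0.0),
            new Region(1, true, 1.0, 0.0),
            new Region(2, false, 0.0, 1.0),
            new Region(3, false, 0.5, 0.5),
            new Region(4, false, 0.1, 0.9),
            new Region(5, false, 0.9, 0.1));
    }

    public static void main(String[] args) {
        System.out.println("target 7 ms -> " + chooseCSet(sampleHeap(), 7.0));
        System.out.println("target 5 ms -> " + chooseCSet(sampleHeap(), 5.0));
    }
}
```

With a 7 ms target the two young regions use 5 ms, leaving room for old regions 2 and 4; with a 5 ms target no old region fits, which is the throughput-versus-pause trade-off discussed next.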

With this in mind, we can set the maximum pause time. First, the longest pause we can tolerate is a hard limit, and the setting must stay within it. But what value should we choose? We need to strike a balance between throughput and MaxGCPauseMillis. If MaxGCPauseMillis is set too low, GC will run frequently and throughput will drop. If it is set too high, application pauses will grow longer. G1's default pause time target is 200 ms; we can start there and adjust as needed.

Other tuning parameters

-XX:G1HeapRegionSize=n

Sets the size of a G1 region. The value is a power of 2 and can range from 1 MB to 32 MB. The goal is to have about 2048 regions based on the minimum Java heap size.
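The sizing rule can be approximated as follows: divide the heap by the 2048-region target, round down to a power of two, and clamp to the 1 MB to 32 MB range. This is a simplified sketch with a hypothetical class name: it takes a single heap size as input, and the exact input HotSpot uses (and the upper limit in newer JDKs) may differ, so treat the result as an estimate.

```java
// Estimate of the default G1 region size for a heap (simplified sketch).
class RegionSizeSketch {
    static final long MB = 1L << 20;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final long TARGET_REGIONS = 2048;

    static long regionSizeFor(long heapBytes) {
        long size = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        // Round down to a power of two; the result stays >= 1 MB because 1 MB is a power of two.
        size = Long.highestOneBit(size);
        return Math.min(size, MAX_REGION);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        for (long heap : new long[] {1 * gb, 4 * gb, 32 * gb, 100 * gb}) {
            System.out.println(heap / gb + " GB heap -> " + regionSizeFor(heap) / MB + " MB regions");
        }
    }
}
```

For the 32 GB heap used earlier this gives 16 MB regions (2048 of them); a 1 GB heap would compute 512 KB and is raised to the 1 MB minimum.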

-XX:ParallelGCThreads=n

Sets the number of STW worker threads. By default, n equals the number of logical processors, up to 8.

If there are more than eight logical processors, n is set to about 5/8 of the number of logical processors. This works in most cases, except on larger SPARC systems, where n can be about 5/16 of the number of logical processors.
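In the HotSpot source, as we read it, the default is not literally 5/8 of all processors: every CPU up to 8 gets a thread, and beyond that 5 threads are added for every 8 extra CPUs, which approaches 5/8 on large machines. A sketch of that formula (hypothetical class name; the SPARC 5/16 variant is left out):

```java
// Default ParallelGCThreads estimate for a given number of logical CPUs (sketch).
class ParallelGcThreadsSketch {
    // One thread per CPU up to 8, then 5 more threads for every 8 extra CPUs.
    static int parallelGcThreads(int logicalCpus) {
        if (logicalCpus <= 8) return logicalCpus;
        return 8 + (logicalCpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " logical CPUs on this machine -> " + parallelGcThreads(cpus));
        for (int n : new int[] {4, 8, 16, 64}) {
            System.out.println(n + " logical CPUs -> " + parallelGcThreads(n));
        }
    }
}
```

On a 64-CPU machine this yields 43 threads, somewhat above a flat 5/8 (40), because the first 8 CPUs each get a full thread.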

-XX:ConcGCThreads=n

Sets the number of parallel marking threads, that is, the threads used for concurrent marking. Set n to about 1/4 of the number of parallel garbage collection threads (ParallelGCThreads).
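Likewise, our reading of the G1 source is that the default ConcGCThreads is (ParallelGCThreads + 2) / 4 with a minimum of 1, i.e. roughly a quarter rounded to the nearest integer. Treat the exact rounding as an assumption; the sketch (hypothetical class name) just evaluates that expression.

```java
// Default ConcGCThreads estimate from ParallelGCThreads (sketch).
class ConcGcThreadsSketch {
    // About a quarter of the parallel GC threads, rounded, and never fewer than one.
    static int concGcThreads(int parallelGcThreads) {
        return Math.max((parallelGcThreads + 2) / 4, 1);
    }

    public static void main(String[] args) {
        for (int p : new int[] {1, 8, 13, 43}) {
            System.out.println("ParallelGCThreads " + p + " -> ConcGCThreads " + concGcThreads(p));
        }
    }
}
```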

-XX:InitiatingHeapOccupancyPercent=45

Sets the Java heap occupancy threshold that triggers a marking cycle. The default is 45% of the entire Java heap.
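The threshold check itself is simple arithmetic. The sketch below (hypothetical class name) assumes a fixed threshold and a strict greater-than comparison; JDK 9 and later can also adapt the threshold at runtime via -XX:+G1UseAdaptiveIHOP, which the sketch ignores.

```java
// When would a marking cycle be requested for a given occupancy? (fixed-threshold sketch)
class IhopSketch {
    // Byte threshold corresponding to -XX:InitiatingHeapOccupancyPercent.
    static long thresholdBytes(long heapBytes, int ihopPercent) {
        return heapBytes * ihopPercent / 100;
    }

    // A marking cycle is requested once occupancy goes above the threshold.
    static boolean shouldStartMarking(long usedBytes, long heapBytes, int ihopPercent) {
        return usedBytes > thresholdBytes(heapBytes, ihopPercent);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        long heap = 32 * gb;
        System.out.println("threshold: " + thresholdBytes(heap, 45) / (1L << 20) + " MB");
        System.out.println("14 GB used -> start marking? " + shouldStartMarking(14 * gb, heap, 45));
        System.out.println("15 GB used -> start marking? " + shouldStartMarking(15 * gb, heap, 45));
    }
}
```

For a 32 GB heap at the default 45%, marking starts once occupancy passes about 14.4 GB.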

Avoid using the following parameters:

Avoid explicitly setting the young generation size with the -Xmn option or related options such as -XX:NewRatio. A fixed young generation size overrides the pause time target.

Conditions That Trigger Full GC

In some cases G1 triggers a full GC. G1 then falls back to serial collection, using a single thread to do the GC work, and the pause can last for seconds (JDK 10 later made G1's full GC parallel). The whole application appears frozen and cannot process any requests, which is something we certainly want to avoid. So when does a full GC happen?

    • Concurrent mode failure

G1 starts a marking cycle, but the old generation fills up before the mixed GC runs, so G1 abandons the marking cycle. In this case, increase the heap size or adjust the marking cycle (for example, increase the number of marking threads with -XX:ConcGCThreads).

    • Promotion failure or evacuation failure

During a GC, G1 does not have enough memory for surviving or promoted objects, which triggers a full GC. In the log you will see (to-space exhausted) or (to-space overflow). Ways to solve this problem:

a. Increase the value of the -XX:G1ReservePercent option (and increase the total heap size accordingly) to increase the amount of memory reserved for to-space.

b. Start the marking cycle earlier by reducing -XX:InitiatingHeapOccupancyPercent.

c. Increase the number of parallel marking threads by increasing the value of the -XX:ConcGCThreads option.

    • Humongous object allocation failure

When a humongous object cannot find suitable space, G1 starts a full GC to free up space. In this case, avoid allocating large objects, increase the heap size, or increase -XX:G1HeapRegionSize so that these objects are no longer humongous.
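To estimate how large -XX:G1HeapRegionSize must be for a given object to stop being humongous, one can search the allowed power-of-two sizes for the first region at least twice as large as the object. This sketch uses the text's more-than-half-a-region definition and the 1 MB to 32 MB range; the class and method names are hypothetical.

```java
// Smallest region size at which an object is no longer humongous (sketch).
class AvoidHumongousSketch {
    static final long MB = 1L << 20;

    // Power-of-two region sizes from 1 MB to 32 MB; an object is NOT humongous
    // when it takes at most half a region. Returns -1 if no size in range works.
    static long minRegionSizeFor(long objectBytes) {
        for (long region = 1 * MB; region <= 32 * MB; region <<= 1) {
            if (objectBytes <= region / 2) return region;
        }
        return -1;
    }

    public static void main(String[] args) {
        for (long obj : new long[] {512 * 1024, 3 * MB, 4 * MB, 17 * MB}) {
            long r = minRegionSizeFor(obj);
            System.out.println(obj / 1024 + " KB object -> "
                + (r < 0 ? "humongous at every allowed size" : r / MB + " MB regions"));
        }
    }
}
```

A 3 MB object needs 8 MB regions; a 17 MB object is humongous at every size in this range, so only the other remedies above apply to it.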

As space is limited, we cannot list the many other G1 tuning practices here; they are left to explore in everyday practice. Finally, Java 9 is expected to be officially released with G1 as the default garbage collector. Will Java performance improve as a result?

