Full text reproduced: http://pengjiaheng.iteye.com/blog/548472
The Bottleneck of Garbage Collection
Traditional generational garbage collection has, to some extent, reduced the burden that garbage collection places on applications, and it has pushed application throughput close to its limit. One thing it cannot solve, however, is the application pause caused by a full GC. In scenarios with strict real-time requirements, the request backlog and request failures caused by GC pauses are unacceptable. Such applications may require response times within a few hundred or even a few tens of milliseconds. To meet that target with generational collection, the maximum heap size has to be kept relatively small, but that limits the application's own processing capacity, which is also unacceptable.
Generational garbage collection does take real-time requirements into account: it provides a concurrent collector that supports setting a maximum pause time. It is constrained, however, by the memory partitioning model of generational collection, and the results are not ideal.
To meet real-time requirements (the Java language was, in fact, originally designed for embedded systems), a new garbage collection method has appeared that supports both short pause times and large heaps. It is a good answer to the problems of the traditional generational approach.
Evolution of Incremental Collection
In theory, incremental collection can solve the problems of the traditional generational approach. It divides the heap into a series of memory blocks and uses only some of them at a time (never all of them). During a collection, the surviving objects in the blocks that were in use are moved into unused blocks, so memory is collected while it is being used. This avoids the generational pattern of pausing to collect only once the whole space has been used up.
Of course, the traditional generational approach also offers concurrent collection, but it has a fatal weakness: it treats the entire heap as a single block of memory. On the one hand this causes fragmentation (the space cannot be compacted). On the other hand, every collection covers the whole heap, with no way to choose what to collect, so control over pause time remains weak. The incremental approach, by splitting memory into blocks, solves exactly these problems.
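The block-by-block idea can be sketched in a few lines of Java. This is only a toy model under stated assumptions: the class IncrementalHeapSketch, the Obj type and the collectBlock method are hypothetical names made up for illustration, and liveness is given as a flag rather than discovered by marking.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of block-based incremental collection; not how a real JVM lays out memory. */
final class IncrementalHeapSketch {
    /** A heap object reduced to a name and a liveness flag (hypothetical type). */
    static final class Obj {
        final String name;
        final boolean live;
        Obj(String name, boolean live) { this.name = name; this.live = live; }
    }

    /** Each inner list is one memory block; an empty list stands for an unused block. */
    final List<List<Obj>> blocks = new ArrayList<>();

    IncrementalHeapSketch(int blockCount) {
        for (int i = 0; i < blockCount; i++) blocks.add(new ArrayList<>());
    }

    /** Copies the survivors of one used block into an unused block and frees the source. */
    int collectBlock(int from, int to) {
        List<Obj> source = blocks.get(from);
        List<Obj> target = blocks.get(to);
        if (from == to || !target.isEmpty()) {
            throw new IllegalArgumentException("target must be a different, unused block");
        }
        int reclaimed = 0;
        for (Obj o : source) {
            if (o.live) target.add(o); else reclaimed++;
        }
        source.clear(); // the whole source block is reusable again, so nothing is fragmented
        return reclaimed;
    }
}
```

Only one block is touched per call, which is why the pause for a single step can stay short no matter how large the heap is.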
Garbage First (G1)
This section is based mainly on the article referenced here, which can be read as an interpretation of the G1 algorithm paper. I have not added anything of my own.
Goals
Judging from its design goals, G1 is intended entirely for large-scale applications.
Support for very large heaps
High throughput
--Support for multiple CPUs and multiple garbage collection threads
--Parallel collection while the application threads are paused
--Concurrent collection while the application threads are running
Real-time target: configurable so that garbage collection takes at most M milliseconds in any N milliseconds
Of course, to meet its real-time requirements, G1 gives up some performance compared with traditional generational collection.
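The real-time target in the list above (at most M milliseconds of collection in any N milliseconds) can be made concrete with a small check over a recorded pause history. This is a sketch, not G1's scheduler: PauseBudgetSketch and its methods are hypothetical, and pauses are given as {start, duration} pairs in milliseconds.

```java
/** Checks a pause history against an "at most m ms of GC in any n ms" goal (illustrative only). */
final class PauseBudgetSketch {
    /** Total GC time in ms that overlaps the window [windowStart, windowStart + n). */
    static long gcTimeInWindow(long[][] pauses, long windowStart, long n) {
        long windowEnd = windowStart + n;
        long total = 0;
        for (long[] p : pauses) {                     // p[0] = start, p[1] = duration
            long start = Math.max(p[0], windowStart);
            long end = Math.min(p[0] + p[1], windowEnd);
            if (end > start) total += end - start;
        }
        return total;
    }

    /** True if no window of n ms contains more than m ms of GC time. */
    static boolean meetsGoal(long[][] pauses, long m, long n) {
        // The worst window either starts where a pause starts or ends where a pause ends,
        // so checking those candidate windows is enough.
        for (long[] p : pauses) {
            if (gcTimeInWindow(pauses, p[0], n) > m) return false;
            if (gcTimeInWindow(pauses, p[0] + p[1] - n, n) > m) return false;
        }
        return true;
    }
}
```

For example, two 5 ms pauses starting at 0 ms and 8 ms both fall within one 20 ms window, so they satisfy a goal of M = 10, N = 20 but violate M = 9, N = 20.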
The Algorithm in Detail
G1 draws on the strengths of several earlier approaches. From incremental collection it takes the idea of dividing the entire heap into equal-sized regions; memory is allocated and collected region by region. From CMS it takes the idea of splitting garbage collection into several phases, so that the work of one collection is spread out. G1 also accepts the idea behind generational collection, namely that objects have different lifetimes and can be collected in different ways, so it supports generational collection as well. To make collection time predictable, G1 scans the regions and sorts them by the amount of live data they contain, then collects the regions with the least live data first. This reclaims space quickly, because there are fewer live objects to copy, and a region with little live data can be assumed to be mostly garbage. Hence the name Garbage First (G1): garbage is collected first.
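The "least live data first" ordering can be sketched as follows. This is a simplified illustration: GarbageFirstSketch, Region and chooseCollectionSet are hypothetical names, and the copy budget in bytes is an assumed stand-in for whatever cost model the collector really uses to stay within its pause time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Picks regions to collect, emptiest first (illustrative only). */
final class GarbageFirstSketch {
    /** A region reduced to an id and the number of live bytes found by marking. */
    static final class Region {
        final int id;
        final long liveBytes;
        Region(int id, long liveBytes) { this.id = id; this.liveBytes = liveBytes; }
    }

    /** Returns region ids in collection order until the copy budget would be exceeded. */
    static List<Integer> chooseCollectionSet(List<Region> regions, long maxBytesToCopy) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingLong((Region r) -> r.liveBytes)); // least live data first
        List<Integer> chosen = new ArrayList<>();
        long toCopy = 0;
        for (Region r : sorted) {
            if (toCopy + r.liveBytes > maxBytesToCopy) break; // copying more would exceed the budget
            toCopy += r.liveBytes;
            chosen.add(r.id);
        }
        return chosen;
    }
}
```

A region with no live data costs nothing to copy and is always chosen first, which is the cheapest possible way to get a whole region back.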
Collection steps:
Initial marking
For each region, G1 keeps two marking bitmaps: the previous marking bitmap and the next marking bitmap. Each bit in a bitmap corresponds to an address that may be the start of an object.
Before initial marking begins, the next marking bitmap is cleared concurrently. Then all application threads are stopped, the objects in each region that are directly reachable from the roots are scanned and marked, and each region's current top value is stored as its next top at mark start (TAMS). After that, all application threads are resumed.
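The bookkeeping described above (a next marking bitmap, top, and next TAMS per region) can be modelled roughly like this. RegionMarkingSketch and its members are hypothetical, addresses are simplified to word offsets within one region, and everything runs on one thread, so the pause itself is not modelled.

```java
import java.util.BitSet;

/** One region with a next marking bitmap and a next top-at-mark-start value (illustrative only). */
final class RegionMarkingSketch {
    final int capacity;          // region size in words
    int top = 0;                 // next free word; objects occupy [0, top)
    int nextTams = 0;            // value of top recorded at initial marking
    final BitSet nextBitmap;     // one bit per word; a set bit marks the start of a marked object

    RegionMarkingSketch(int capacity) {
        this.capacity = capacity;
        this.nextBitmap = new BitSet(capacity);
    }

    /** Bump-pointer allocation; returns the start offset of the new object. */
    int allocate(int words) {
        if (top + words > capacity) throw new IllegalStateException("region full");
        int start = top;
        top += words;
        return start;
    }

    /** Clears the bitmap, marks the objects directly reachable from roots, records TAMS. */
    void initialMark(int[] rootObjectStarts) {
        nextBitmap.clear();      // in the text this happens concurrently, before the pause
        for (int start : rootObjectStarts) nextBitmap.set(start);
        nextTams = top;
    }

    /** Objects allocated at or above TAMS count as live without needing a mark bit. */
    boolean isConsideredLive(int objectStart) {
        return objectStart >= nextTams || nextBitmap.get(objectStart);
    }
}
```

Recording top as TAMS gives the collector a cheap way to tell objects that existed when marking started from objects allocated afterwards.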
This step is triggered under the following conditions:
G1 defines a threshold h, expressed as a percentage of the JVM heap size, and a value H equal to (1-h)*Heap Size. At present h is fixed; later versions of G1 may adjust it dynamically according to how the JVM is running. In generational mode, G1 also defines a value u and a soft limit equal to H-u*Heap Size. When the memory used in the heap exceeds the soft limit, this step is run as soon as possible after a clean up has completed, within the GC pause time the application allows;
In pure mode, G1 arranges marking and clean up in a cycle so that clean up can make full use of the marking information. When clean up begins reclaiming, it first reclaims the regions that free the most memory. Once clean ups are down to regions that yield little space, G1 starts a new marking and clean up cycle.
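The limits in the first condition are simple arithmetic. The sketch below just evaluates H = (1-h)*Heap Size and soft limit = H-u*Heap Size; the class MarkingTriggerSketch is hypothetical, and the values used for h and u in the example are made up, since the text does not give them.

```java
/** Evaluates the hard limit H and the soft limit from the formulas in the text (illustrative only). */
final class MarkingTriggerSketch {
    /** H = (1 - h) * heapSize, where h is a fraction of the heap. */
    static long hardLimit(long heapSize, double h) {
        return (long) ((1.0 - h) * heapSize);
    }

    /** soft limit = H - u * heapSize, where u is also a fraction of the heap. */
    static long softLimit(long heapSize, double h, double u) {
        return hardLimit(heapSize, h) - (long) (u * heapSize);
    }

    /** Marking becomes due once used memory exceeds the soft limit. */
    static boolean shouldStartMarking(long usedBytes, long heapSize, double h, double u) {
        return usedBytes > softLimit(heapSize, h, u);
    }
}
```

With an assumed heap of 1000 units, h = 0.25 and u = 0.125, H is 750 and the soft limit is 625, so marking becomes due once more than 625 units are in use.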
Concurrent marking
Starting from the objects found during initial marking, the collector traverses the objects they reference to determine which of them are live. Reference changes that application threads make concurrently during this period are recorded in the remembered set logs. Newly created objects are placed in the address range above the recorded top value; they are treated as live by default, and top is updated as they are allocated.
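The traversal part of this phase is an ordinary worklist walk over the object graph. The sketch below shows only that walk, on a single thread; it does not model the concurrency with application threads or the remembered set logs. MarkTraversalSketch and markFrom are hypothetical names, and objects are identified by strings for readability.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Marks everything reachable from the initially marked objects (illustrative only). */
final class MarkTraversalSketch {
    /** references maps each object to the objects it points to. */
    static Set<String> markFrom(Set<String> initiallyMarked, Map<String, List<String>> references) {
        Set<String> marked = new HashSet<>(initiallyMarked);
        Deque<String> worklist = new ArrayDeque<>(initiallyMarked);
        while (!worklist.isEmpty()) {
            String obj = worklist.pop();
            List<String> children = references.getOrDefault(obj, Collections.<String>emptyList());
            for (String child : children) {
                if (marked.add(child)) worklist.push(child); // visit each object only once
            }
        }
        return marked;
    }
}
```

An object that nothing marked points to is never added to the result, which is what later allows its space to be reclaimed.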
Final marking pause
As long as an application thread's remembered set log is not full, it is not placed among the filled RS buffers, so the card modifications recorded in it have not yet been applied. That is why this step is needed: it processes the contents of the remembered set logs still held by application threads and updates the corresponding remembered sets. This step requires pausing the application and runs in parallel.
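The reason a final pause is needed can be seen in a small model of the log buffers. This is a sketch under assumptions: RsLogSketch, ThreadLog and the buffer capacity of 4 are invented for illustration, and a single global set stands in for the remembered sets, which is much simpler than the real structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Per-thread remembered set logs that are handed over only when full (illustrative only). */
final class RsLogSketch {
    static final int BUFFER_CAPACITY = 4;                 // hypothetical buffer size
    final List<List<Integer>> filledBuffers = new ArrayList<>();
    final Set<Integer> rememberedSet = new TreeSet<>();   // simplified: one global set of cards

    /** The log of one application thread. */
    final class ThreadLog {
        List<Integer> current = new ArrayList<>();

        void recordCard(int card) {
            current.add(card);
            if (current.size() == BUFFER_CAPACITY) {      // only full logs are handed over
                filledBuffers.add(current);
                current = new ArrayList<>();
            }
        }
    }

    ThreadLog newThreadLog() { return new ThreadLog(); }

    /** Normal processing: sees only the buffers that were handed over as full. */
    void processFilledBuffers() {
        for (List<Integer> buffer : filledBuffers) rememberedSet.addAll(buffer);
        filledBuffers.clear();
    }

    /** Final marking pause: also drains the partially filled per-thread logs. */
    void finalMarkingDrain(List<ThreadLog> threads) {
        processFilledBuffers();
        for (ThreadLog t : threads) {
            rememberedSet.addAll(t.current);
            t.current.clear();
        }
    }
}
```

If a thread records six cards, the first four reach the remembered set through a filled buffer, while the last two stay in its partial log until the final marking drain picks them up.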
Live data counting and cleanup
Note that in G1, cleanup does not necessarily run as soon as the final marking pause has finished. Because cleanup pauses the application, G1 must, in order to meet its soft real-time goal, plan when to run cleanup based on the maximum GC pause time specified by the user. Several other situations also trigger this step:
G1 collects by copying, so it must guarantee that there is always enough "to space". Its strategy is therefore to run cleanup when the memory in use reaches H;
In G1's generational modes, full-young and partially-young, there are further triggers. In full-young mode, G1 estimates a number of young regions from the pause time the application can accept and the time needed to collect young regions; cleanup runs when the number of young regions allocated in the JVM reaches that number. In partially-young mode, cleanup runs as often as possible within the application's acceptable pause time, and it cleans up as many non-young regions as it can.
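The full-young estimate can be illustrated with a deliberately simple linear cost model. This is an assumption made for the sketch, not the formula G1 uses: YoungSizingSketch, the fixed per-pause cost and the cost per young region are all hypothetical parameters.

```java
/** Estimates how many young regions fit into one pause (illustrative only). */
final class YoungSizingSketch {
    /** Assumes pause time = fixedCostMs + regions * costPerRegionMs. */
    static int maxYoungRegions(double pauseTargetMs, double fixedCostMs, double costPerRegionMs) {
        if (costPerRegionMs <= 0) throw new IllegalArgumentException("cost per region must be positive");
        double available = pauseTargetMs - fixedCostMs;
        return available <= 0 ? 0 : (int) Math.floor(available / costPerRegionMs);
    }

    /** Cleanup is due once the allocated young regions reach the estimated number. */
    static boolean cleanupDue(int allocatedYoungRegions, int estimatedLimit) {
        return allocatedYoungRegions >= estimatedLimit;
    }
}
```

With an assumed 50 ms pause target, 10 ms of fixed cost and 2 ms per region, the estimate is 20 young regions, so cleanup would be due when the twentieth young region is allocated.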
JVM garbage Collection Mechanism summary (4): A new generation of garbage collection algorithms