Deep understanding of JVM (6): the Java garbage collection mechanism (GC)
C/C++ developers have full control over memory management, but they also bear the full burden of maintaining it. Thanks to the memory management mechanism of the JVM (Java Virtual Machine), Java programmers seldom have to deal with memory leaks and memory overflow by hand. To use that mechanism well it helps to understand it, so this article discusses in depth how the JVM's garbage collection works internally.
I. How to determine whether an object is dead
When the GC wants to reclaim an object, how does it determine that the object is dead, that is, that it can never be used again? Only an object that the program can no longer use may be reclaimed.
(1) Reference counting algorithm
Reference counting is an early strategy used in garbage collectors. In this method, each object in the heap (each object, not each reference) has a reference count. When an object is created and assigned to a variable, its count is set to 1. Whenever another variable is assigned a reference to the object, the count is incremented by 1 (for a = b, the count of the object referenced by b goes up by 1). When a reference to the object goes out of scope or is assigned a new value, the object's count is decremented by 1. Any object whose count is 0 can be collected as garbage. When an object is collected, the count of every object it references is decremented by 1.
The reference counting algorithm is simple to implement, easy to understand, and decides liveness very efficiently. In most cases it is a good algorithm. However, mainstream Java virtual machines do not use reference counting, mainly because it has difficulty handling circular references between objects. A simple example:
public class Main {
    public static void main(String[] args) {
        MyObject object1 = new MyObject();
        MyObject object2 = new MyObject();
        // The two objects now reference each other
        object1.object = object2;
        object2.object = object1;
        // Drop the only references held from outside
        object1 = null;
        object2 = null;
    }
}

class MyObject {
    public Object object = null;
}
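In this code, object1 and object2 reference each other. After both variables are set to null, neither object can be reached by the program, yet each still holds a reference to the other, so both counts stay at 1. A collector based on reference counting would therefore never reclaim them, even though the mutual reference no longer serves any purpose.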
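The problem can be reproduced with a toy model of reference counting. The sketch below is only an illustration: Node, acquire and release are made-up names, and the JVM itself does not count references this way.

```java
// Toy model of reference counting; all names here are illustrative only.
class RefCountSketch {
    static class Node {
        final String name;
        int refCount = 0;      // number of references pointing at this object
        Node field;            // the single reference this object holds
        boolean freed = false;

        Node(String name) { this.name = name; }
    }

    // Models assigning a reference: the target gains one count.
    static Node acquire(Node target) {
        if (target != null) target.refCount++;
        return target;
    }

    // Models dropping a reference: at count 0 the object is freed and
    // the object it referenced loses one count in turn.
    static void release(Node target) {
        if (target == null) return;
        target.refCount--;
        if (target.refCount == 0) {
            target.freed = true;
            Node child = target.field;
            target.field = null;
            release(child);
        }
    }

    // The example from the text: two objects that reference each other.
    static int[] cycleCounts() {
        Node object1 = acquire(new Node("object1"));
        Node object2 = acquire(new Node("object2"));
        object1.field = acquire(object2);
        object2.field = acquire(object1);
        release(object1); // models object1 = null
        release(object2); // models object2 = null
        return new int[] { object1.refCount, object2.refCount };
    }

    // Contrast: a one-way chain is freed completely.
    static boolean chainIsFreed() {
        Node head = acquire(new Node("head"));
        Node tail = new Node("tail");
        head.field = acquire(tail);
        release(head); // models head = null
        return head.freed && tail.freed;
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        System.out.println("cycle counts: " + counts[0] + ", " + counts[1]); // 1, 1
        System.out.println("chain freed: " + chainIsFreed());                // true
    }
}
```

Both objects in the cycle end with a count of 1, so neither is ever freed, while the one-way chain is released completely.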
(2) Reachability analysis algorithm
A set of objects called "GC Roots" serves as the starting point, and the search proceeds downward from these nodes. The paths traversed are called reference chains. When no reference chain connects an object to any GC Root (in graph-theory terms, the object is unreachable from the GC Roots), the object is proven to be unusable.
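The search from the GC Roots can be sketched as a plain graph traversal. This is a simplified illustration, not HotSpot's implementation: Obj, mark and the sample object graph are assumptions made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ReachabilitySketch {
    // A simulated heap object with outgoing references.
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Returns every object connected to a GC root by some reference chain.
    static Set<Obj> mark(List<Obj> gcRoots) {
        Set<Obj> marked = Collections.newSetFromMap(new IdentityHashMap<Obj, Boolean>());
        Deque<Obj> pending = new ArrayDeque<>(gcRoots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (marked.add(current)) {          // first visit only
                pending.addAll(current.refs);   // follow its references
            }
        }
        return marked;
    }

    // Builds root -> a -> b plus an isolated cycle c <-> d,
    // and returns the names of the objects that were not marked.
    static List<String> unreachableNames() {
        Obj root = new Obj("root"), a = new Obj("a"), b = new Obj("b");
        Obj c = new Obj("c"), d = new Obj("d");
        root.refs.add(a);
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);

        Set<Obj> marked = mark(Arrays.asList(root));
        List<String> dead = new ArrayList<>();
        for (Obj o : Arrays.asList(root, a, b, c, d)) {
            if (!marked.contains(o)) dead.add(o.name);
        }
        return dead;
    }

    public static void main(String[] args) {
        System.out.println("unreachable: " + unreachableNames()); // [c, d]
    }
}
```

Unlike reference counting, the traversal never reaches c and d, so the circular reference between them does not keep them alive.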
Note, however, that the JVM does not reclaim an object the moment it becomes unreachable. An object found to be unreachable must go through at least two marking passes before it is actually reclaimed. If it has not escaped (become reachable again) by the end of those two passes, it will be reclaimed.
[Figure: reachability analysis. Blue: surviving objects. White: objects identified as recyclable.]
II. Garbage Collection Algorithms
Once it has been determined which objects are garbage, the garbage collector has to reclaim them, and the question becomes how to do so efficiently. The Java Virtual Machine Specification does not prescribe how a garbage collector must be implemented, so each vendor's virtual machine can implement it differently. Here we only discuss the core ideas behind several common garbage collection algorithms.
(1) Mark-sweep algorithm
The most basic collection algorithm is mark-sweep. As its name suggests, it has two phases, mark and sweep: first mark all objects that need to be reclaimed, then reclaim all marked objects once marking is complete. How objects are marked was described in the previous section. It is called the most basic algorithm because the later algorithms all build on this idea and improve on its shortcomings. It has two main drawbacks. The first is efficiency: neither the marking nor the sweeping phase is very efficient. The second is space: sweeping leaves behind a large number of non-contiguous memory fragments. With too much fragmentation, the program may later fail to find enough contiguous memory when it needs to allocate a large object, and has to trigger another garbage collection early.
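The fragmentation problem shows up even in a tiny model. The sketch below assumes a heap of fixed-size slots stored in an int array (0 means free) and takes the marking result as an input flag per slot; it flags the survivors rather than the garbage, which selects the same objects. All names are hypothetical.

```java
class MarkSweepSketch {
    // Sweep phase: frees every occupied slot that marking did not flag as alive.
    static void sweep(int[] heap, boolean[] alive) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != 0 && !alive[i]) {
                heap[i] = 0; // reclaimed in place, nothing is moved
            }
        }
    }

    // Total number of free slots.
    static int freeSlots(int[] heap) {
        int free = 0;
        for (int slot : heap) {
            if (slot == 0) free++;
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(int[] heap) {
        int best = 0, run = 0;
        for (int slot : heap) {
            run = (slot == 0) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = { 1, 2, 3, 4, 5, 6, 7, 8 };
        boolean[] alive = { true, false, true, false, true, false, true, false };
        sweep(heap, alive);
        System.out.println("free slots: " + freeSlots(heap));            // 4
        System.out.println("largest free run: " + largestFreeRun(heap)); // 1
    }
}
```

Half of the heap is free after the sweep, yet an object that needs two adjacent slots still cannot be allocated.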
(2) Copying algorithm
To address the efficiency problem, the copying algorithm was introduced. It divides the available memory into two blocks of equal size and uses only one of them at a time. When that block is used up, the objects that are still alive are copied to the other block, and the used block is then cleared in one go. Each collection thus reclaims a whole half of the memory, and allocation never has to deal with fragmentation: moving the heap-top pointer allocates memory sequentially, which is simple to implement and efficient to run. The price is that the usable memory is cut to half of its original size, which is rather high. Commercial virtual machines today generally use this algorithm to collect the new generation.
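A minimal sketch of the two-half scheme, assuming objects are just integer ids and liveness is supplied by the caller. The class and method names are hypothetical and the model ignores object sizes and reference updating.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class CopyingSketch {
    private int[] from;   // the half currently used for allocation
    private int[] to;     // the reserved, empty half
    private int top = 0;  // heap-top (bump) pointer into the active half

    CopyingSketch(int halfSize) {
        from = new int[halfSize];
        to = new int[halfSize];
    }

    // Allocation only moves the pointer; returns false when the half is full.
    boolean allocate(int id) {
        if (top == from.length) return false;
        from[top++] = id;
        return true;
    }

    // Copies the survivors to the other half, clears the used half, swaps roles.
    void collect(Set<Integer> liveIds) {
        int newTop = 0;
        for (int i = 0; i < top; i++) {
            if (liveIds.contains(from[i])) {
                to[newTop++] = from[i];
            }
        }
        Arrays.fill(from, 0); // the whole half is reclaimed at once
        int[] tmp = from;
        from = to;
        to = tmp;
        top = newTop;
    }

    int used() { return top; }

    int[] activeHalf() { return Arrays.copyOf(from, top); }

    public static void main(String[] args) {
        CopyingSketch heap = new CopyingSketch(4);
        for (int id = 1; id <= 4; id++) heap.allocate(id);
        System.out.println("allocate when full: " + heap.allocate(5)); // false
        heap.collect(new HashSet<>(Arrays.asList(2, 4)));
        System.out.println("after GC: " + Arrays.toString(heap.activeHalf())); // [2, 4]
        System.out.println("allocate after GC: " + heap.allocate(5));  // true
    }
}
```

The survivors end up packed at the start of the new half, so the free space is always one contiguous region, at the cost of keeping half of the memory idle.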
(3) Mark-compact algorithm
The copying algorithm has to perform many copy operations when the object survival rate is high, which reduces its efficiency. More importantly, if you do not want to waste 50% of the space, you need extra space as an allocation guarantee to cope with the extreme case in which 100% of the objects in the used memory survive. For these reasons the copying algorithm generally cannot be used directly in the old generation.
Based on the characteristics of the old generation, the mark-compact algorithm was proposed. Its marking phase is the same as in mark-sweep, but instead of reclaiming the dead objects in place, it moves all surviving objects toward one end and then clears all memory beyond the end boundary.
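The compaction step can be sketched on a slot-array heap (0 means free), with the marking result passed in as one flag per slot. This is an assumed, simplified model with hypothetical names; a real collector also has to update every reference to a moved object.

```java
import java.util.Arrays;

class MarkCompactSketch {
    // Slides every live object toward index 0 and clears everything past
    // the boundary. Returns the boundary (index of the first free slot).
    static int compact(int[] heap, boolean[] alive) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != 0 && alive[i]) {
                heap[boundary++] = heap[i]; // boundary <= i, so nothing live is overwritten
            }
        }
        for (int i = boundary; i < heap.length; i++) {
            heap[i] = 0; // memory beyond the end boundary is cleared
        }
        return boundary;
    }

    public static void main(String[] args) {
        int[] heap = { 1, 2, 3, 4, 5, 6, 7, 8 };
        boolean[] alive = { true, false, true, false, true, false, true, false };
        int boundary = compact(heap, alive);
        System.out.println(Arrays.toString(heap) + ", boundary = " + boundary);
        // [1, 3, 5, 7, 0, 0, 0, 0], boundary = 4
    }
}
```

The same survivors that would leave four isolated holes under mark-sweep now leave a single free region of four slots.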
(4) Generational collection algorithm
Commercial virtual machines currently use generational collection. It introduces no new idea; it simply divides memory into several regions according to object lifetime. The Java heap is generally divided into the new generation and the old generation, so that the most suitable algorithm can be chosen for each. In the new generation, most objects die at each collection and only a few survive, so the copying algorithm is used: a collection costs only the copying of a small number of survivors. In the old generation, objects have a high survival rate and there is no extra space to act as an allocation guarantee, so the mark-sweep or mark-compact algorithm must be used.
III. Garbage Collectors
If collection algorithms are the methodology of memory reclamation, garbage collectors are its concrete implementation. The Java Virtual Machine Specification does not prescribe how garbage collectors should be implemented, so the collectors provided by different vendors and by different versions of virtual machines can differ considerably. The collectors discussed here are those of the Sun HotSpot VM 1.6 Update 22.
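To see which collectors a running JVM actually uses, the standard java.lang.management API can be queried. The sketch below only relies on ManagementFactory.getGarbageCollectorMXBeans(); the class name is made up, and the collector names printed depend on the JVM vendor, version and flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class CollectorListSketch {
    // Names of the collectors the current JVM reports (JVM-specific strings).
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(bean.getName()
                    + ": collections=" + bean.getCollectionCount()
                    + ", time=" + bean.getCollectionTime() + " ms");
        }
    }
}
```

Running it with different collector switches is a quick way to confirm which new-generation and old-generation collectors have been paired.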
Serial collector: a new-generation collector that uses the stop-and-copy algorithm. A single thread performs GC while all other worker threads are suspended. Use -XX:+UseSerialGC to collect memory with the Serial + Serial Old combination (this is also the default when the virtual machine runs in Client mode).
ParNew collector: a new-generation collector that uses the stop-and-copy algorithm. It is the multi-threaded version of the Serial collector: several threads perform GC while the other worker threads are paused, which shortens the collection time. Use the -XX:+UseParNewGC switch to collect memory with the ParNew + Serial Old combination, and -XX:ParallelGCThreads to set the number of collection threads.
Parallel Scavenge collector: a new-generation collector that uses the stop-and-copy algorithm and focuses on CPU throughput, that is, time spent running user code / total time. For example, if the JVM runs for 100 minutes, of which user code runs for 99 minutes and garbage collection takes 1 minute, the throughput is 99%. This collector makes the most efficient use of the CPU and suits background computation (collectors that focus on shortening pause time, such as CMS, keep waiting short and therefore suit user-facing applications where responsiveness matters). Use the -XX:+UseParallelGC switch to collect with the Parallel Scavenge + Serial Old combination (this is also the default in Server mode). Use -XX:GCTimeRatio to set the ratio of user-code time to GC time; the default is 99, which means at most 1% of the time is spent on garbage collection. Use -XX:MaxGCPauseMillis to set the maximum GC pause time (this parameter is only effective for Parallel Scavenge).
Serial Old collector: an old-generation collector. It is single-threaded and uses the mark-compact algorithm, which here consists of Sweep and Compact: sweeping removes the dead objects so that only survivors remain, and compaction moves the survivors to fill the gaps so that memory ends up in two parts, one holding all objects and one free. A single thread performs GC while the other worker threads are paused (note that mark-compact collection in the old generation also requires pausing the other threads). In JDK 1.5 and earlier, the Serial Old collector was used together with Parallel Scavenge.
Parallel Old collector: an old-generation collector. It is multi-threaded, with a threading mechanism much like that of Parallel Scavenge, and uses the mark-compact algorithm. Unlike Serial Old, the compaction here consists of Summary and Compact, which means surviving objects are copied to areas prepared in advance rather than dead objects being swept away as in Sweep. Other threads still have to be paused while Parallel Old runs. Parallel Old is useful on multi-core machines. Since it appeared (JDK 1.6), it has paired well with Parallel Scavenge and lets the throughput-first design of Parallel Scavenge pay off fully. Use the -XX:+UseParallelOldGC switch to collect with the Parallel Scavenge + Parallel Old combination.
CMS (Concurrent Mark Sweep) collector: an old-generation collector designed to achieve the shortest possible collection pause. It is multi-threaded and uses the mark-sweep algorithm. Its advantage is concurrent collection (user threads can run at the same time as the GC threads) with very short pauses. Use -XX:+UseConcMarkSweepGC to collect with the ParNew + CMS + Serial Old combination: ParNew + CMS is used first (the reason is given below), and when the memory reserved for user threads runs out, Serial Old is used as the fallback.
CMS performs three marking phases and then sweeps: initial mark, concurrent mark, and remark. Initial mark and remark both stop the world. Initial mark only marks the objects directly reachable from the GC Roots, so its pause is short. Concurrent mark traces references from the GC Roots while the user threads keep running. Remark fixes up the marks for objects whose references changed during concurrent marking; its pause is somewhat longer than that of initial mark but much shorter than the concurrent mark phase. Once marking is complete, concurrent sweep begins, again without pausing the user threads.
Thus, over a whole CMS cycle, only initial mark and remark require short pauses, while concurrent mark and concurrent sweep run alongside the user threads. This makes CMS efficient and well suited to highly interactive scenarios.
CMS also has drawbacks. It consumes extra CPU and memory, so when CPU and memory are tight, or when the machine has few CPUs, it adds to the load on the system (by default CMS starts (number of CPUs + 3) / 4 collection threads).
In addition, user threads keep running during concurrent collection and keep producing garbage, so "floating garbage" may appear. It cannot be cleared in the current cycle and has to wait for the next collection. Enough memory must therefore be reserved for the user threads during GC. For this reason the CMS collector does not wait until the old generation is full; it starts collecting once occupancy exceeds a threshold (68% by default, roughly 2/3, set with -XX:CMSInitiatingOccupancyFraction). If the user threads do not consume much memory, you can raise -XX:CMSInitiatingOccupancyFraction moderately to reduce the number of collections and improve performance. If the memory reserved for the user threads turns out to be insufficient, a Concurrent Mode Failure occurs and the fallback kicks in: the Serial Old collector performs the collection, with a long pause. So -XX:CMSInitiatingOccupancyFraction should not be set too high either.
Finally, CMS uses the mark-sweep algorithm, which produces memory fragmentation. Use -XX:+UseCMSCompactAtFullCollection to enable defragmentation (compaction) after a Full GC, and -XX:CMSFullGCsBeforeCompaction to set how many Full GCs run without compaction before one with compaction is performed.
G1 collector: officially released in JDK 1.7. It departs considerably from the existing new generation / old generation concepts. It is still rarely used and is not covered here.
IV. Memory Allocation and Collection Policy
Java Virtual Machine Structure
Java Memory Allocation
Distribution of generations in the Java heap
(1) Young (young generation): stores newly created objects. When an object is created, memory is first allocated in the young generation (large objects may be created directly in the old generation). Most objects are no longer used soon after creation and quickly become unreachable, so they are cleaned up by the young generation's GC (IBM research shows that 98% of objects die quickly). This GC is called Minor GC or Young GC. Note that a Minor GC does not mean that the whole young generation has run out of memory; it only means that the Eden area has filled up and is being collected.
Memory in the young generation is allocated as follows. The young generation is divided into three areas: one Eden area (where memory is first allocated) and two Survivor areas (Survivor0 and Survivor1).
Most newly created objects are allocated in the Eden area, and most of them die quickly. The Eden area is a contiguous memory space, so allocation in it is extremely fast.
When the Eden area is full, a Minor GC is executed: dead objects are cleared and the surviving objects are copied to Survivor0 (Survivor1 is empty at this point; one of the two Survivor areas is always empty).
At each later Minor GC, the objects still alive in Eden and in the Survivor area in use are copied to the empty Survivor area, and Eden and the previously used Survivor area are cleared. The two Survivor areas thus swap roles after each collection.
After an object has been moved between the two Survivor areas a number of times (controlled in the HotSpot virtual machine by -XX:MaxTenuringThreshold, 15 by default), an object that is still alive (in practice only a small fraction of objects, such as long-lived objects we define ourselves) is copied to the old generation.
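The aging process can be sketched with a small simulation. It is a deliberate simplification with hypothetical names: spaces are unbounded lists, liveness is supplied by the caller, and an object is promoted once its age exceeds the threshold, which may differ in detail from HotSpot's exact rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class YoungGenSketch {
    static class Obj {
        final int id;
        int age = 0; // number of Minor GCs survived
        Obj(int id) { this.id = id; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>();
    List<Obj> toSurvivor = new ArrayList<>(); // always empty between collections
    final List<Obj> old = new ArrayList<>();
    final int maxTenuringThreshold;           // plays the role of -XX:MaxTenuringThreshold

    YoungGenSketch(int maxTenuringThreshold) {
        this.maxTenuringThreshold = maxTenuringThreshold;
    }

    void allocate(int id) { eden.add(new Obj(id)); }

    // Copies survivors of Eden and the used Survivor area into the empty
    // Survivor area (or the old generation), then swaps the Survivor areas.
    void minorGc(Set<Integer> liveIds) {
        copySurvivors(eden, liveIds);
        copySurvivors(fromSurvivor, liveIds);
        eden.clear();
        fromSurvivor.clear();
        List<Obj> tmp = fromSurvivor;
        fromSurvivor = toSurvivor;
        toSurvivor = tmp;
    }

    private void copySurvivors(List<Obj> space, Set<Integer> liveIds) {
        for (Obj o : space) {
            if (!liveIds.contains(o.id)) continue; // dead: simply not copied
            o.age++;
            if (o.age > maxTenuringThreshold) old.add(o);
            else toSurvivor.add(o);
        }
    }

    static List<Integer> ids(List<Obj> space) {
        List<Integer> result = new ArrayList<>();
        for (Obj o : space) result.add(o.id);
        return result;
    }

    // Three collections with a threshold of 2 (an arbitrary demo value).
    static YoungGenSketch demo() {
        YoungGenSketch young = new YoungGenSketch(2);
        young.allocate(1); young.allocate(2); young.allocate(3);
        young.minorGc(new HashSet<>(Arrays.asList(1, 2))); // 3 dies
        young.allocate(4);
        young.minorGc(new HashSet<>(Arrays.asList(1, 4))); // 2 dies
        young.minorGc(new HashSet<>(Arrays.asList(1, 4))); // 1 reaches age 3
        return young;
    }

    public static void main(String[] args) {
        YoungGenSketch young = demo();
        System.out.println("old: " + ids(young.old));               // [1]
        System.out.println("survivor: " + ids(young.fromSurvivor)); // [4]
    }
}
```

Object 1 survives three collections and is promoted, object 4 is still aging in a Survivor area, and the short-lived objects 2 and 3 never leave the young generation.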
(2) Old (old generation): mainly stores objects with long lifetimes in the application.
If an object stays alive in the young generation long enough without being cleared (that is, it survives several Young GCs), it is copied to the old generation. The old generation is generally larger than the young generation and can hold more objects, and GC happens there less often. When the old generation runs out of memory, a Major GC, also commonly called Full GC, is executed.
Use the -XX:+UseAdaptiveSizePolicy switch to control whether the adaptive sizing policy is used. When it is enabled, the size of each region in the Java heap and the age at which objects are promoted to the old generation are adjusted dynamically.
If an object is large (such as a long string or a large array) and the young generation does not have enough space, it is allocated directly in the old generation (large objects can trigger GC early, so use them sparingly and especially avoid short-lived large objects). -XX:PretenureSizeThreshold controls the size above which objects are allocated directly in the old generation.
Objects in the old generation may reference objects in the new generation. Without extra bookkeeping, a Young GC would have to scan the entire old generation to find out which young objects are still referenced, which is clearly inefficient. The solution is a "card table" maintained for the old generation: it divides the old generation into 512-byte blocks (cards) and records which cards contain references to new-generation objects. A Young GC then only needs to check the recorded cards instead of the whole old generation, which improves performance considerably.
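The bookkeeping can be sketched as a byte array with one entry per 512-byte card. This is an assumed, simplified model: the class, its methods and the addresses are hypothetical, and the encoding (1 = dirty) is chosen for readability rather than taken from HotSpot.

```java
import java.util.ArrayList;
import java.util.List;

class CardTableSketch {
    static final int CARD_SHIFT = 9;   // 2^9 = 512 bytes per card

    private final long oldGenStart;    // simulated start address of the old generation
    private final byte[] cards;        // 0 = clean, 1 = dirty

    CardTableSketch(long oldGenStart, long oldGenSizeBytes) {
        this.oldGenStart = oldGenStart;
        this.cards = new byte[(int) ((oldGenSizeBytes + 511) >> CARD_SHIFT)];
    }

    // Maps an address inside the old generation to its card index.
    int cardIndex(long address) {
        return (int) ((address - oldGenStart) >> CARD_SHIFT);
    }

    // Called when a field at this address is set to reference a young object.
    void markDirty(long fieldAddress) {
        cards[cardIndex(fieldAddress)] = 1;
    }

    // The only cards a Young GC has to scan.
    List<Integer> dirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == 1) dirty.add(i);
        }
        return dirty;
    }

    int cardCount() { return cards.length; }

    public static void main(String[] args) {
        long base = 0x10000L;
        CardTableSketch table = new CardTableSketch(base, 4096L); // 8 cards
        table.markDirty(base + 700);   // card 1
        table.markDirty(base + 1023);  // card 1 again
        table.markDirty(base + 4095);  // card 7
        System.out.println(table.dirtyCards() + " of " + table.cardCount() + " cards"); // [1, 7] of 8 cards
    }
}
```

In this example a Young GC would scan two cards (1 KB) instead of the whole 4 KB old generation.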
(3) Permanent (permanent generation): the permanent storage area of memory, which implements the method area and stores class information and metadata. When a class is loaded, it is placed in the PermGen space. Unlike the heap regions that hold instances, PermGen is rarely cleaned by GC while the main program is running, so if your application loads a large number of classes, a PermGen space error (java.lang.OutOfMemoryError: PermGen space) may occur.
Permanent generation collection covers two kinds of content: constants in the constant pool and class information that is no longer needed. Reclaiming constants is simple: a constant with no references can be reclaimed. A class can be reclaimed only if all three of the following hold:
All instances of the class have been reclaimed; the ClassLoader that loaded the class has been reclaimed; and the java.lang.Class object of the class is not referenced anywhere (that is, the class cannot be accessed through reflection).
Collecting the permanent generation is not mandatory, and a parameter controls whether classes are reclaimed: HotSpot provides -Xnoclassgc for this.
Use -verbose:class, -XX:+TraceClassLoading, and -XX:+TraceClassUnloading to view class loading and unloading information. -verbose:class and -XX:+TraceClassLoading can be used in the product build of HotSpot; -XX:+TraceClassUnloading requires a fastdebug build of HotSpot.
V. GC Parameters
Heap settings
-Xms: initial heap size
-Xmx: Maximum heap size
-XX:NewSize=n: sets the young generation size
-XX:NewRatio=n: sets the ratio between the young generation and the old generation. For example, a value of 3 means the ratio of the young generation to the old generation is 1:3, so the young generation accounts for 1/4 of the young and old generations combined.
-XX:SurvivorRatio=n: sets the ratio of the Eden area to the Survivor areas in the young generation. Note that there are two Survivor areas. For example, a value of 3 means Eden : Survivor = 3:2 (both Survivor areas together), so one Survivor area accounts for 1/5 of the young generation.
-XX:MaxPermSize=n: sets the permanent generation size
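The two ratios translate into region sizes with simple arithmetic. The sketch below works that out under the definitions given above; the method names are made up, and a real JVM rounds and aligns these sizes, so actual values will differ slightly.

```java
class HeapSizingSketch {
    // NewRatio = old / young, so young = heap / (NewRatio + 1).
    static long youngSize(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    // SurvivorRatio = Eden / one Survivor area, and
    // young = Eden + 2 Survivor areas, so one Survivor = young / (SurvivorRatio + 2).
    static long survivorSize(long young, int survivorRatio) {
        return young / (survivorRatio + 2);
    }

    static long edenSize(long young, int survivorRatio) {
        return young - 2 * survivorSize(young, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMb = 1000;                      // arbitrary example heap size
        long young = youngSize(heapMb, 3);       // 250 (1/4 of the heap)
        long survivor = survivorSize(young, 3);  // 50  (1/5 of the young generation)
        long eden = edenSize(young, 3);          // 150 (Eden : two Survivors = 3:2)
        System.out.println("young=" + young + " old=" + (heapMb - young)
                + " eden=" + eden + " survivor=" + survivor);
    }
}
```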
Collector settings
-XX:+UseSerialGC: selects the serial collector
-XX:+UseParallelGC: selects the parallel collector
-XX:+UseParallelOldGC: selects the parallel old-generation collector
-XX:+UseConcMarkSweepGC: selects the concurrent collector
Garbage collection statistics
-XX:+PrintHeapAtGC: prints heap details at each GC
-XX:+PrintGCDetails: prints GC details
-XX:+PrintGCTimeStamps: prints GC timestamps
-XX:+PrintTenuringDistribution: prints object age information
-XX:+HandlePromotionFailure: old-generation allocation guarantee (true or false)
Parallel collector settings
-XX:ParallelGCThreads=n: sets the number of threads (CPUs) the parallel collector uses for collection
-XX:MaxGCPauseMillis=n: sets the maximum pause time for parallel collection
-XX:GCTimeRatio=n: sets the share of running time spent on garbage collection, computed as 1/(1+n)
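The 1/(1+n) formula and the throughput definition used for Parallel Scavenge can be checked with a few lines of arithmetic. The helper names below are hypothetical.

```java
class ThroughputSketch {
    // Throughput = time running user code / (user code time + GC time).
    static double throughput(double userMinutes, double gcMinutes) {
        return userMinutes / (userMinutes + gcMinutes);
    }

    // Share of total time allowed for GC under -XX:GCTimeRatio=n.
    static double gcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // 99 minutes of user code and 1 minute of GC: 99% throughput.
        System.out.println("throughput: " + throughput(99, 1));
        // The default n = 99 allows about 1% GC time; n = 19 allows about 5%.
        System.out.println("n=99 -> " + gcFraction(99));
        System.out.println("n=19 -> " + gcFraction(19));
    }
}
```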
Concurrent collector settings
-XX:+CMSIncrementalMode: enables incremental mode, suitable for machines with a single CPU
-XX:ParallelGCThreads=n: sets the number of threads (CPUs) used for parallel collection of the young generation when the concurrent collector is in use