Java Garbage Collection Overview
Garbage collection (GC) is one of the main differences between Java and C/C++. Java developers generally do not need to write code to free memory or clean up garbage, and they do not need to worry about memory leaks and overflow as much as C programmers do, because the Java Virtual Machine (JVM) manages memory and reclaims garbage automatically. In a nutshell, this mechanism marks the memory in the JVM, determines which memory needs to be reclaimed, and reclaims it automatically according to a collection strategy. It runs continuously ("never stops") to protect the JVM's memory space and help prevent memory leaks and overflow.
The Java GC mechanism has to settle three things: which memory needs to be reclaimed, when to perform GC, and how to perform GC.
Learning the Java GC mechanism can help us troubleshoot various memory overflow or leak issues in our daily work, address performance bottlenecks, achieve higher concurrency, and write more efficient programs.
Phases of GC
For each object, garbage collection has two phases: finalization and reclamation.
- Finalization: runs the object's finalize method.
- Reclamation: reclaims the memory used by the object.
Basic steps of the GC process
First, the GC confirms that the object is unreachable and can therefore be reclaimed. Second, if the object has a finalize method, the object is added to the finalization queue, and at some later point its finalize method is called to release the resources it holds. Finally, the memory occupied by the object is reclaimed.
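The first step, deciding whether an object is still reachable, can be observed with a weak reference. The sketch below is only an illustration: the class and method names are made up for this example, and System.gc() is a hint that the JVM is free to ignore, so only the "still reachable" case is guaranteed.

```java
import java.lang.ref.WeakReference;

public class ReachabilityDemo {
    // Returns true when an object that is still strongly referenced survives a GC request.
    public static boolean survivesWhileStronglyReferenced() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc(); // only a request; the JVM may ignore it
        // 'strong' is used in the comparison below, so the object stays reachable until here.
        return weak.get() == strong;
    }

    public static void main(String[] args) {
        System.out.println("reachable object kept: " + survivesWhileStronglyReferenced());

        // No strong reference is kept here, so the object is eligible for collection.
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();
        // Often true, but the timing is entirely up to the collector.
        System.out.println("unreachable object cleared: " + (weak.get() == null));
    }
}
```

The second print line may differ between runs and JVMs, which is exactly the point: when an unreachable object is actually reclaimed is the collector's decision.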
Problems with the finalize method
- The finalize method gives the GC process more work to do and increases its load.
- If one object's finalize method runs for too long, the finalize methods of other objects are delayed.
- If finalize creates a strong reference to another object, that object can no longer be collected.
- finalize methods may run in an indeterminate order, so avoid relying on them where strict ordering or safety is required.
- There is no guarantee that finalize is called promptly; the program may exit before the method has been invoked at all.
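Because the timing of finalize is not guaranteed, resources are usually released explicitly instead. The following is a minimal sketch of that idea using try-with-resources; TrackedResource is a hypothetical class invented for this example and is not part of the text above.

```java
public class ExplicitCleanupDemo {
    // Hypothetical resource used only for illustration.
    static final class TrackedResource implements AutoCloseable {
        private boolean closed = false;

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true; // release the underlying resource here
        }
    }

    // Returns true if the resource was closed by the time the try block ended.
    public static boolean closedAfterTryBlock() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            if (r.isClosed()) {
                throw new IllegalStateException("resource closed too early");
            }
        }
        // close() has run deterministically, with no help from the GC.
        return resource.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("closed after try block: " + closedAfterTryBlock());
    }
}
```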
Metrics for measuring GC (GC metrics)
- Throughput: the percentage of total elapsed time that is not spent in GC.
- Pauses: the number of times the GC pauses the program while it runs, or the average and maximum duration of those pauses.
- Footprint: the amount of heap memory currently in use.
- Promptness: how long it takes, once an object is no longer used, for it to be collected and its memory freed.
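The throughput and pause metrics are simple arithmetic over measured times. The sketch below computes them from a hand-written list of pause durations; the numbers and the class name are illustrative assumptions, not measurements.

```java
import java.util.Arrays;

public class GcMetricsDemo {
    // Throughput: percentage of elapsed time that was not spent in GC.
    public static double throughputPercent(long totalMillis, long gcMillis) {
        if (totalMillis <= 0) {
            throw new IllegalArgumentException("totalMillis must be positive");
        }
        return 100.0 * (totalMillis - gcMillis) / totalMillis;
    }

    // Longest single pause, or 0 if there were no pauses.
    public static long maxPause(long[] pausesMillis) {
        return Arrays.stream(pausesMillis).max().orElse(0L);
    }

    // Mean pause duration, or 0.0 if there were no pauses.
    public static double averagePause(long[] pausesMillis) {
        return Arrays.stream(pausesMillis).average().orElse(0.0);
    }

    public static void main(String[] args) {
        long[] pauses = {20, 35, 15, 30};          // made-up pause durations in ms
        long gcTotal = Arrays.stream(pauses).sum(); // 100 ms in GC
        long elapsed = 10_000;                      // 10 s of wall-clock time

        System.out.println("throughput %: " + throughputPercent(elapsed, gcTotal)); // 99.0
        System.out.println("pause count : " + pauses.length);                       // 4
        System.out.println("max pause   : " + maxPause(pauses));                    // 35
        System.out.println("avg pause   : " + averagePause(pauses));                // 25.0
    }
}
```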
Generational GC algorithm
All GC algorithms used in Java are variants of the generational GC concept.
Assumptions of the generational GC algorithm:
Recently created objects are likely to become unreachable (and therefore collectable) soon. A local variable declared inside a method is a typical example: once execution leaves the variable's scope, the object it referenced quickly becomes unreachable. The longer an object has remained reachable, the less likely it is to be collected.
In Java GC, objects are grouped into generations, also called spaces. Java divides objects into young (young generation), tenured (old generation), and perm (permanent generation). During GC, objects move from one space to another.
Object spaces
- Young: newly created objects are kept in the young generation. Objects in this generation can be reclaimed in either "minor" or "major" collections.
- Tenured: holds objects that have survived collections in the young generation. They can only be reclaimed in a "major" collection.
- Perm: the permanent generation holds the objects the JVM itself needs, such as class and method objects, their bytecode, and interned strings. Collecting objects in perm means unloading classes.
The size of each space depends on the current memory size and can change at run time. The young space is subdivided further, as described next.
Young spaces
- Eden space: stores the objects created since the last GC completed, except for objects that belong in perm. During a minor collection, objects in Eden are either reclaimed or moved to a survivor space.
- Survivor spaces: store the young objects that survived the last GC. During a minor GC, these objects are either reclaimed or moved to the other survivor space.
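The spaces listed above can be inspected on a running JVM through the standard java.lang.management API. This is a small sketch; the pool names it prints depend on the JVM version and the collector in use, so they may not match the young/tenured/perm names used in this article.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class MemoryPoolsDemo {
    // Names of the memory pools reported by the running JVM.
    public static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            System.out.printf("%-32s %-16s used=%d KB%n",
                    pool.getName(), pool.getType(), usedKb);
        }
    }
}
```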
Minor collections and major collections
- A minor collection runs when the young space is full. It is faster than a major collection because it only examines a subset of the objects that a major collection examines. Minor collections happen more often than major collections.
- A major collection runs when the tenured space is full. It cleans up both tenured and young.
Three ways to run the GC
In Java 5 and Java 6 there are four garbage collection algorithms. One of them will no longer be supported; the remaining three are serial, throughput, and concurrent low pause.
- Stop the world: while the GC is running, no application thread in the JVM is allowed to run until the GC completes. The serial collector performs minor and major collections this way, and the throughput collector performs major collections this way.
- Incremental: no Java GC algorithm currently supports this mode. In this mode the GC lets the program work for a short time, then does some garbage collection work, and so on.
- Concurrent: the throughput collector performs minor collections this way, and the concurrent low pause collector performs minor and major collections this way. The GC and the program run at the same time, so the program is paused only briefly.
GC algorithms
- Serial: enabled with -XX:+UseSerialGC. The GC uses a single thread for both minor and major collections.
- Throughput: enabled with -XX:+UseParallelGC. The GC uses multiple threads for minor collections to shorten the time the program is stopped, but a single thread for major collections. It works best when a multi-core CPU is available, the program creates many short-lived objects, and there is no strict limit on pause time.
- Concurrent low pause: enabled with -XX:+UseConcMarkSweepGC. The GC uses multiple threads for minor and major collections. It works best when a multi-core CPU is available, the program has many long-lived objects, and pause time is limited.
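To check which collector a given set of flags actually selected, the running JVM can be asked for its collector beans. The sketch below uses the standard GarbageCollectorMXBean interface; the reported names vary by JVM version and are not fixed by this article.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ActiveCollectorsDemo {
    // Names of the collectors the running JVM has registered.
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "undefined".
            System.out.printf("%s: collections=%d, time=%d ms, pools=%s%n",
                    gc.getName(),
                    gc.getCollectionCount(),
                    gc.getCollectionTime(),
                    Arrays.toString(gc.getMemoryPoolNames()));
        }
    }
}
```

Running the same class with different collector flags and comparing the printed names is a quick way to confirm that a flag took effect.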
When does GC occur?
When GC occurs depends on the heap size. A small heap is collected quickly but also fills up quickly, so GC runs more often. A large heap takes longer to collect but fills up more slowly, so GC runs less often.
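The heap sizes that drive this trade-off can be read from inside the program with the Runtime class. A minimal sketch follows; the class and method names are chosen for this example.

```java
public class HeapSizeDemo {
    // Returns {max, total, free} heap sizes in bytes as seen by the running JVM.
    public static long[] heapSnapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.maxMemory(), rt.totalMemory(), rt.freeMemory() };
    }

    public static void main(String[] args) {
        long[] s = heapSnapshot();
        long mb = 1024 * 1024;
        System.out.println("max heap   (MB): " + s[0] / mb); // upper limit the heap may grow to
        System.out.println("total heap (MB): " + s[1] / mb); // currently reserved
        System.out.println("free heap  (MB): " + s[2] / mb); // unused part of the reserved heap
        System.out.println("used heap  (MB): " + (s[1] - s[2]) / mb);
    }
}
```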
Basic GC tuning
- Throughput goal, -XX:GCTimeRatio=N: sets the ratio of program time to GC time; the goal is to spend no more than 1/(1+N) of the total running time in GC.
- Maximum pause time goal, -XX:MaxGCPauseMillis=N: the maximum number of milliseconds the program should pause for each GC.
- Footprint goal: once the other goals are met, the heap is shrunk until one of the first two goals can no longer be met, and then grown slowly until both are met again.
- -XmsN (initial) and -XmxN (maximum) heap size: these two familiar parameters set the minimum and maximum heap memory used by the JVM.
- -XX:MinHeapFreeRatio=N, -XX:MaxHeapFreeRatio=N: the minimum and maximum percentage of the heap that should be free. When the free percentage drops below MinHeapFreeRatio, the heap expands. When it rises above MaxHeapFreeRatio, the heap shrinks.
- -XX:NewSize=N, -XX:MaxNewSize=N: the initial and maximum size of the young space (Eden + Survivor 1 + Survivor 2).
- -XX:NewRatio=N: the ratio between tenured and young (tenured is N times the size of young).
- -XX:SurvivorRatio=N: the ratio between Eden and each survivor space.
- -XX:MaxPermSize=N: the maximum size of perm.
- -XX:TargetSurvivorRatio=N: the target percentage of survivor space in use after each GC.
- -XX:+DisableExplicitGC: when enabled, calls to System.gc() in the program have no effect. Off by default.
- -XX:+ScavengeBeforeFullGC: when enabled, a minor collection runs before each major collection. On by default.
- -XX:+UseGCOverheadLimit: when enabled, an OutOfMemoryError is thrown if 98% of the total running time is being spent in GC. On by default.
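Two of these parameters are easier to read as arithmetic. The sketch below models the GC time goal implied by GCTimeRatio and the expand/shrink decision described for MinHeapFreeRatio and MaxHeapFreeRatio. It is a simplified model, not HotSpot's actual resizing code, and the 40/70 values in main are illustrative inputs.

```java
public class GcTuningMathDemo {
    public enum HeapAction { EXPAND, SHRINK, KEEP }

    // With -XX:GCTimeRatio=N the goal is to spend at most 1 / (1 + N) of the time in GC.
    public static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    // Simplified resize decision based on the percentage of the heap that is free.
    public static HeapAction heapAction(int freePercent, int minHeapFreeRatio, int maxHeapFreeRatio) {
        if (freePercent < minHeapFreeRatio) {
            return HeapAction.EXPAND; // too little free space: grow the heap
        }
        if (freePercent > maxHeapFreeRatio) {
            return HeapAction.SHRINK; // too much free space: give memory back
        }
        return HeapAction.KEEP;
    }

    public static void main(String[] args) {
        System.out.println("GCTimeRatio=99 -> GC fraction " + gcTimeFraction(99)); // 0.01
        System.out.println("GCTimeRatio=19 -> GC fraction " + gcTimeFraction(19)); // 0.05
        System.out.println("30% free, 40/70 -> " + heapAction(30, 40, 70));        // EXPAND
        System.out.println("55% free, 40/70 -> " + heapAction(55, 40, 70));        // KEEP
        System.out.println("85% free, 40/70 -> " + heapAction(85, 40, 70));        // SHRINK
    }
}
```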
Four aspects of learning the Java GC mechanism
1. how memory is allocated;
2. how to ensure that memory is not reclaimed by mistake (that is, which memory needs to be reclaimed);
3. under what circumstances GC is triggered and how it is executed;
4. how to monitor and optimize the GC mechanism.
The basic algorithm of the GC mechanism: generational collection
The collection methods for each generation are described below.
Young generation:
As introduced earlier, the young generation is cleaned with the stop-and-copy algorithm. Its memory is divided into a larger Eden area and a smaller survivor area, and the survivor area is split into two equal parts. On each collection, the objects still alive in Eden and in one survivor space are copied to the other survivor space, and then Eden and the survivor space just used are cleared.
Note that the two sides of this stop-and-copy scheme are not equal in size. The traditional stop-and-copy algorithm splits memory into two equal halves and leaves one half idle; the young generation avoids this by using one large Eden area and two small survivor areas.
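The copy step described above can be sketched as a toy model. The code below is not how HotSpot is implemented: objects are plain Java records in lists, liveness is a flag set by hand, the survivor capacity is counted in objects, and every name is invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyingCollectorDemo {
    // Toy heap object: 'live' stands in for "still reachable".
    private static final class Obj {
        final String name;
        final boolean live;
        int age;

        Obj(String name, boolean live) {
            this.name = name;
            this.live = live;
        }
    }

    private final List<Obj> eden = new ArrayList<>();
    private List<Obj> from = new ArrayList<>();
    private List<Obj> to = new ArrayList<>();
    private final List<Obj> old = new ArrayList<>();
    private final int survivorCapacity;

    public CopyingCollectorDemo(int survivorCapacity) {
        this.survivorCapacity = survivorCapacity;
    }

    // New objects always start in Eden.
    public void allocate(String name, boolean live) {
        eden.add(new Obj(name, live));
    }

    // Copy live objects from Eden and the used survivor space, then clear both.
    public void minorGc() {
        copyLive(eden);
        copyLive(from);
        eden.clear();
        from.clear();
        List<Obj> tmp = from; // the two survivor spaces swap roles
        from = to;
        to = tmp;
    }

    private void copyLive(List<Obj> space) {
        for (Obj o : space) {
            if (!o.live) {
                continue; // dead objects are simply not copied
            }
            o.age++;
            if (to.size() < survivorCapacity) {
                to.add(o);
            } else {
                old.add(o); // survivor space is full: promote to the old generation
            }
        }
    }

    public int edenCount() { return eden.size(); }
    public int survivorCount() { return from.size(); }
    public int oldCount() { return old.size(); }

    public static void main(String[] args) {
        CopyingCollectorDemo heap = new CopyingCollectorDemo(2);
        heap.allocate("a", true);
        heap.allocate("b", false);
        heap.allocate("c", true);
        heap.allocate("d", true);
        heap.minorGc();
        // prints: eden=0 survivor=2 old=1
        System.out.println("eden=" + heap.edenCount()
                + " survivor=" + heap.survivorCount()
                + " old=" + heap.oldCount());
    }
}
```

In the run above, "b" is dropped, "a" and "c" fit into the survivor space, and "d" overflows into the old generation.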
Because most objects are short-lived and few of them survive even one collection, Eden is made much larger than a survivor space. The HotSpot default ratio is 8:1, so Eden and the two survivor spaces take 80%, 10%, and 10% of the young generation respectively. If more than 10% of the young generation's memory survives a single collection of Eden plus one survivor space, some of the objects have to be allocated in the old generation. Use the -XX:SurvivorRatio parameter to set the capacity ratio between Eden and a survivor space; the default of 8 means Eden:Survivor1:Survivor2 = 8:1:1.
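The 8:1:1 split follows directly from the ratio: with SurvivorRatio = R, the young generation is divided into R + 2 equal shares, of which Eden takes R. A minimal sketch of that arithmetic follows; a real JVM also aligns and rounds these sizes, which is ignored here.

```java
public class YoungGenLayoutDemo {
    // Size of one survivor space for a given young generation size and SurvivorRatio.
    public static long survivorSize(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    // Eden takes whatever the two survivor spaces leave over.
    public static long edenSize(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorSize(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long young = 100 * mb; // example young generation of 100 MB
        int ratio = 8;         // -XX:SurvivorRatio=8

        System.out.println("eden     (MB): " + edenSize(young, ratio) / mb);     // 80
        System.out.println("survivor (MB): " + survivorSize(young, ratio) / mb); // 10
    }
}
```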
Old generation:
The old generation holds more objects than the young generation, including large ones, so cleaning it with the stop-and-copy algorithm would be quite inefficient. It generally uses the mark-compact algorithm instead: mark the surviving (still referenced) objects, then move all of them to one end so that the free memory is contiguous. When a minor GC is about to happen, the virtual machine checks whether the average size promoted into the old generation by previous collections is larger than the old generation's remaining space. If it is larger, a full GC is triggered directly. Otherwise the virtual machine checks whether -XX:+HandlePromotionFailure is set (that is, whether a failed promotion guarantee is allowed). If it is allowed, only the minor GC runs and a failed promotion is tolerated; if it is not allowed, a full GC is performed anyway. In other words, with -XX:-HandlePromotionFailure every minor GC also triggers a full GC, even when the old generation has plenty of free memory, so it is best not to disable it.
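The compaction step of mark-compact can be shown on a toy heap. In the sketch below the heap is an array of slots and the mark bits are supplied by hand; a real collector computes the marks by tracing references and also has to update every pointer to a moved object, which this example leaves out.

```java
import java.util.Arrays;

public class MarkCompactDemo {
    // Slides every marked (live) entry toward index 0, preserving order.
    // Returns the number of live entries; everything after them is free.
    public static int compact(String[] heap, boolean[] marked) {
        int free = 0; // next slot a live object can be moved into
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                if (free != i) {
                    heap[free] = heap[i]; // move the survivor down
                    heap[i] = null;       // its old slot becomes free
                }
                free++;
            } else {
                heap[i] = null; // unmarked objects are discarded
            }
        }
        Arrays.fill(marked, false); // reset mark bits for the next cycle
        return free;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E"};
        boolean[] marked = {true, false, true, false, true};

        int live = compact(heap, marked);

        System.out.println(Arrays.toString(heap)); // [A, C, E, null, null]
        System.out.println("live objects: " + live); // 3
    }
}
```

After compaction the live objects sit at one end and the free space forms one contiguous block, which is what lets later allocations stay cheap.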
Method area (permanent generation):
The permanent generation reclaims two kinds of things: constants in the constant pool and useless class information. Reclaiming constants is simple: a constant with no references can be reclaimed. For a class to count as useless, all three of the following must hold:
- All instances of the class have been reclaimed
- The ClassLoader that loaded the class has been reclaimed
- The Class object is not referenced anywhere (so the class cannot be reached through reflection)
Collecting the permanent generation is not mandatory, and a parameter controls whether classes are reclaimed: HotSpot provides -Xnoclassgc for this. Use -verbose:class, -XX:+TraceClassLoading, and -XX:+TraceClassUnloading to view class loading and unloading information. -verbose:class and -XX:+TraceClassLoading can be used in the product build of HotSpot; -XX:+TraceClassUnloading requires a fastdebug build of HotSpot.
The garbage collector plays the key role in the GC mechanism: it is the concrete implementation of GC. The Java Virtual Machine specification places no requirements on the garbage collector, so the collectors implemented by different vendors differ.
Before introducing the collectors, one point needs to be clear: in the young generation's stop-and-copy algorithm, "stop" (stop-the-world) means suspending all other threads while memory is reclaimed. This is inefficient, and the various young generation collectors keep optimizing it, but so far they have only shortened the pause, not eliminated it.
- Serial collector: a young generation collector that uses the stop-and-copy algorithm. It collects with a single thread while all other worker threads are paused. Use -XX:+UseSerialGC to reclaim memory with the Serial + Serial Old combination (this is also the default when the virtual machine runs in client mode).
- ParNew collector: a young generation collector that uses the stop-and-copy algorithm. It is the multi-threaded version of the Serial collector: several threads perform GC while other worker threads are paused, with the focus on shortening garbage collection time. Use -XX:+UseParNewGC to reclaim memory with the ParNew + Serial Old combination, and -XX:ParallelGCThreads to set the number of threads that perform memory reclamation.
- Parallel Scavenge collector: a young generation collector that uses the stop-and-copy algorithm and focuses on CPU throughput, that is, time spent running user code / total time. For example, if the JVM runs for 100 minutes, of which 99 minutes run user code and 1 minute is garbage collection, the throughput is 99%. This collector makes the most efficient use of the CPU and suits background computation (collectors that focus on shortening pauses, such as CMS, keep waiting time very short and so suit interactive applications, where they improve the user experience). Use -XX:+UseParallelGC to collect with the Parallel Scavenge + Serial Old combination (this is also the default in server mode). Use -XX:GCTimeRatio to set the ratio of user execution time to GC time; the default of 99 leaves 1% of the time for garbage collection. Use -XX:MaxGCPauseMillis to set the maximum GC pause time (this parameter is only effective with Parallel Scavenge).
- Serial Old collector: an old generation collector, single-threaded, using the mark-compact algorithm, which consists of mark, sweep, and compact. Sweep discards dead objects and keeps only the survivors; compact moves objects to fill the gaps, so that memory ends up in two blocks, one holding objects and one free. It collects with a single thread while other worker threads are paused (the mark-based algorithms of the old generation also need to pause other threads). Before JDK 1.5, the Serial Old collector was used together with Parallel Scavenge.
- Parallel Old collector: an old generation collector, multi-threaded, with a threading mechanism much like that of Parallel Scavenge. It uses a mark-compact algorithm that, unlike Serial Old, consists of mark, summary, and compact: summary copies the surviving objects to a prepared area instead of sweeping away the dead ones. Other threads still need to be paused while Parallel Old runs. Parallel Old is useful on multi-core machines. It appeared in JDK 1.6 and works well with Parallel Scavenge, fully delivering that collector's throughput-first behavior. Use -XX:+UseParallelOldGC to collect with the Parallel Scavenge + Parallel Old combination.
- CMS (Concurrent Mark Sweep) collector: an old generation collector that aims for the shortest collection pauses. It uses the mark-sweep algorithm with multiple threads; its advantage is concurrent collection (user threads can work at the same time as GC threads), so pauses are short. Use -XX:+UseConcMarkSweepGC to reclaim memory with ParNew + CMS + Serial Old: ParNew + CMS is used first (for reasons explained below), and Serial Old is the fallback when user threads run out of memory.
CMS collects as follows: three marking phases, then one sweep. Two of the three marking phases, the initial mark and the remark, still need to stop the world. The initial mark marks the objects directly associated with the GC roots (that is, directly referenced objects), and its pause is short. The concurrent mark traces references starting from the GC roots and does not pause user threads. The remark handles the marks that changed while user threads kept running during concurrent marking and marks that part again; its pause is much shorter than the concurrent mark phase but slightly longer than the initial mark. After marking is complete, the concurrent sweep begins, again without pausing user threads. So during a CMS collection only the initial mark and the remark need a short pause, while the concurrent mark and concurrent sweep do not pause user threads; this makes CMS efficient and well suited to highly interactive applications. CMS also has drawbacks. It consumes extra CPU and memory, so when those resources are tight or there are few CPUs it adds to the system load (the default number of CMS threads is (number of CPUs + 3) / 4). In addition, user threads keep running during the concurrent phases and keep producing garbage. This "floating garbage" cannot be cleaned up in the current cycle and has to wait for the next GC, so enough memory must be reserved for user threads while the GC is running.
So a collector using CMS does not wait until the old generation is full before collecting. It starts once usage passes a threshold (68% by default, roughly 2/3, set with -XX:CMSInitiatingOccupancyFraction). If user threads do not consume much memory, you can raise -XX:CMSInitiatingOccupancyFraction to reduce the number of collections and improve performance. If the memory reserved for user threads is not enough, a Concurrent Mode Failure occurs and the fallback is triggered: the Serial Old collector takes over, with a long pause, so -XX:CMSInitiatingOccupancyFraction should not be set too high. Also, because CMS uses the mark-sweep algorithm, it causes memory fragmentation. Use -XX:+UseCMSCompactAtFullCollection to set whether to defragment after a full GC, and -XX:CMSFullGCsBeforeCompaction to set how many uncompacted full GCs are performed before one with compaction.
- G1 collector: officially released in JDK 1.7. Its design differs considerably from the current young/old generation concept, and it is not yet widely used, so it is not covered here.
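The two CMS figures quoted above, the default thread count and the occupancy threshold, are easy to check with a few lines of arithmetic. The sketch below models them exactly as stated in this article, using integer division for the thread formula; it does not call any JVM internals, and the method names are made up.

```java
public class CmsTriggerDemo {
    // Default number of CMS threads as quoted in the text: (CPU count + 3) / 4.
    public static int defaultCmsThreads(int cpuCount) {
        return (cpuCount + 3) / 4; // integer division
    }

    // True when old generation usage has reached the initiating occupancy percentage.
    public static boolean shouldStartCollection(long usedBytes, long capacityBytes,
                                                int initiatingOccupancyPercent) {
        return usedBytes * 100 >= capacityBytes * (long) initiatingOccupancyPercent;
    }

    public static void main(String[] args) {
        System.out.println("4 CPUs  -> " + defaultCmsThreads(4) + " CMS thread(s)");  // 1
        System.out.println("8 CPUs  -> " + defaultCmsThreads(8) + " CMS thread(s)");  // 2
        System.out.println("16 CPUs -> " + defaultCmsThreads(16) + " CMS thread(s)"); // 4

        // Old generation of 1000 units with the 68% threshold from the text.
        System.out.println("600/1000 used: " + shouldStartCollection(600, 1000, 68)); // false
        System.out.println("700/1000 used: " + shouldStartCollection(700, 1000, 68)); // true
    }
}
```

Raising the threshold delays the start of a collection and leaves less headroom for objects allocated while CMS is running, which is the trade-off behind the Concurrent Mode Failure described above.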
Note the difference between concurrent and parallel:
Concurrent means that user threads execute at the same time as the GC thread (not necessarily in parallel; they may alternate, but overall they run together), so user threads do not need to be paused (in practice user threads under CMS still have to pause, only very briefly, while the GC thread runs on another CPU);
Parallel collection means that multiple GC threads work in parallel while user threads are paused. So ParNew and the Parallel collectors are parallel, and the CMS collector is concurrent.