What are the common garbage collectors in Java?
In practice, the garbage collector (GC, Garbage Collector) is closely tied to the specific JVM implementation: different vendors (IBM, Oracle) and different JVM versions offer different choices. Below I cover the collectors in the most mainstream one, the Oracle JDK.
- Serial GC
It is the oldest garbage collector. "Serial" refers to the fact that its collection work is single-threaded, and the JVM enters the notorious "Stop-the-World" state for the duration of a collection. On the other hand, the single-threaded design keeps the GC implementation lean: there are no complex data structures to maintain and initialization is simple, so it has long been the default option for the JVM in Client mode.
Its old-generation implementation is usually referred to separately as Serial Old. It uses the mark-compact algorithm, unlike the copying algorithm used for the young generation.
The corresponding JVM parameter for the Serial GC is:
-XX:+UseSerialGC
- ParNew GC
As the name suggests, this is a young-generation GC implementation; it is in effect a multithreaded version of the Serial GC. Its most common use is working together with the CMS GC in the old generation. The corresponding parameters are:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
- CMS (Concurrent Mark Sweep) GC
It is based on the mark-sweep algorithm and is designed to minimize pause times, which matters for response-time-sensitive applications such as web services; many systems still use the CMS GC today. However, because mark-sweep does not move objects, CMS suffers from memory fragmentation, so a long-running process can hardly avoid an eventual full GC with a very long pause. In addition, because it emphasizes concurrency, CMS consumes more CPU and competes with user threads for it.
- Parallel GC
In JDK 8 and other earlier versions it was the default GC for the server-mode JVM, and it is also known as the throughput-first GC. Its algorithms are similar to those of the Serial GC, although the implementation is far more complex; its distinguishing feature is that both young-generation and old-generation collections run in parallel, which makes it more efficient in typical server environments.
The option to enable it is:
-XX:+UseParallelGC
In addition, the Parallel GC introduces developer-friendly options that let us set goals such as pause time or throughput directly; the JVM then adapts itself automatically. For example:
-XX:MaxGCPauseMillis=value
-XX:GCTimeRatio=N // ratio of GC time to application time = 1 / (N + 1)
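To make the GCTimeRatio formula concrete, here is a small sketch that evaluates 1 / (N + 1) for a few values of N. The class GcTimeRatioDemo and the method gcShare are hypothetical names used only for illustration; they are not part of the JDK, and the sketch only does the arithmetic, it does not read any JVM setting.

```java
// Hypothetical helper, not part of the JDK: evaluates the goal formula
// quoted above for -XX:GCTimeRatio=N.
class GcTimeRatioDemo {
    // Returns 1 / (N + 1), the GC time goal expressed as a fraction.
    static double gcShare(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("N must be non-negative");
        }
        return 1.0 / (n + 1);
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 19, 99}) {
            System.out.println("GCTimeRatio=" + n + " -> goal " + (gcShare(n) * 100) + " %");
        }
    }
}
```

A larger N therefore means a stricter throughput goal: N = 19 corresponds to a goal of 5 %, and N = 99 to a goal of 1 %.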
- G1 GC
This is a GC implementation that balances throughput and pause time, and it has been the default GC since Oracle JDK 9. G1 lets you set a pause-time target directly. Compared with the CMS GC, G1 may not match CMS's best-case pause latency, but its worst case is much better.
G1 still has the concept of generations, but its memory layout is not a simple set of contiguous bands; it resembles a chessboard of regions. Between regions it uses the copying algorithm, while viewed as a whole it can be regarded as a mark-compact algorithm, which effectively avoids memory fragmentation. The advantages of G1 are especially pronounced when the Java heap is very large.
G1's throughput and pause behavior are both very good and still improving, whereas CMS was marked as deprecated in JDK 9, so the G1 GC is well worth mastering in depth.
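To check which collector a given set of flags actually selected, one option is to ask the running JVM through the standard java.lang.management API. The following is a minimal sketch; the class name ActiveCollectors is an assumption for illustration, and the bean names it prints are implementation-specific, so they should be read as hints rather than a stable contract.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: lists the collectors the current JVM registered.
class ActiveCollectors {
    // Collects the names reported by the JVM's garbage collector beans.
    static List<String> names() {
        List<String> result = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.add(bean.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "undefined".
            System.out.println(bean.getName()
                    + ": collections=" + bean.getCollectionCount()
                    + ", time(ms)=" + bean.getCollectionTime());
        }
    }
}
```

Running it once with -XX:+UseSerialGC and once with -XX:+UseParallelGC should show different bean names, usually one for the young generation and one for the old generation.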
Principles and basic concepts of garbage collection
A prerequisite for automatic garbage collection is knowing which memory can be released, that is, how to determine whether something can be reclaimed.
There are two main aspects. The most important is object instances, all of which are stored on the heap. The other is metadata and similar information in the method area: for example, when a type is no longer used, unloading the Java class seems perfectly reasonable.
For collecting object instances there are two basic algorithms: reference counting and reachability analysis.
The reference counting algorithm, as its name implies, attaches a reference count to each object that records how many references point to it; when the count drops to 0 the object can be reclaimed. Many languages manage resources this way. Python, whose popularity has surged with AI, supports both reference counting and a garbage collector. Which works best depends on the scenario: there have been large-scale industry attempts to keep only reference counting in order to improve throughput.
Java did not choose reference counting because of one fundamental problem: it has difficulty handling circular references.
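The circular-reference problem can be shown with a toy model. The sketch below simulates reference counting by hand; the Counted class and the addRef, link, and release helpers are hypothetical and exist only in this example, since Java itself keeps no such counters.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of reference counting (hypothetical; not how the JVM works).
class RefCountCycleDemo {
    static final class Counted {
        final String name;
        int refCount = 0;
        final List<Counted> fields = new ArrayList<>();
        Counted(String name) { this.name = name; }
    }

    static void addRef(Counted target) { target.refCount++; }

    // Stores a reference to 'to' inside 'from' and counts it.
    static void link(Counted from, Counted to) {
        from.fields.add(to);
        addRef(to);
    }

    // Drops one reference; at zero the object is "freed" and releases its fields.
    static void release(Counted target, List<String> freed) {
        target.refCount--;
        if (target.refCount == 0) {
            freed.add(target.name);
            for (Counted child : target.fields) {
                release(child, freed);
            }
            target.fields.clear();
        }
    }

    // Returns the names of the objects that reference counting managed to free.
    static List<String> run() {
        List<String> freed = new ArrayList<>();
        Counted a = new Counted("a");
        Counted b = new Counted("b");
        Counted c = new Counted("c");
        addRef(a); addRef(b); addRef(c); // one reference each from a local variable
        link(a, b);
        link(b, a);                      // a and b now form a cycle
        release(a, freed);               // the local variables go away
        release(b, freed);
        release(c, freed);
        return freed;
    }

    public static void main(String[] args) {
        System.out.println("freed: " + run());
    }
}
```

Only c is freed. The objects a and b are no longer reachable from anywhere, yet each still holds a count of 1 because of the other, so a pure reference-counting scheme would leak them.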
Instead, Java chose reachability analysis, which Java's various reference types complicate further to some degree. This kind of garbage collection is usually called tracing garbage collection. The principle is simple: treat objects and the references between them as a graph, select a set of active objects as GC Roots, and follow the reference chains from them. If an object is unreachable from the GC Roots, that is, no reference chain leads to it, it can be considered reclaimable. The JVM uses as GC Roots the objects referenced from the virtual machine stack and the native method stack, objects referenced by static fields, and constants.
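The tracing idea can be sketched as a plain graph traversal. In the example below the object graph is modeled as a map from an object name to the names it references; this representation, the class ReachabilityDemo, and the method reachable are assumptions for illustration and say nothing about how HotSpot stores objects internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy reachability analysis over a hypothetical object graph.
class ReachabilityDemo {
    // Returns every object reachable from the roots by following reference edges.
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> marked = new HashSet<>(roots);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (String next : refs.getOrDefault(current, List.of())) {
                if (marked.add(next)) {   // add() is false if already marked
                    pending.push(next);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
                "root", List.of("a"),
                "a", List.of("b"),
                "x", List.of("y"),   // x and y reference each other
                "y", List.of("x"));  // but nothing reachable points to them
        Set<String> live = reachable(refs, List.of("root"));
        System.out.println("b is live: " + live.contains("b"));
        System.out.println("x is live: " + live.contains("x"));
    }
}
```

Here x and y form a cycle but are not reachable from the root, so they are not marked and would be reclaimed. This is why tracing collection handles circular references without special treatment.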
Garbage collection in the method area is more complicated, so let me briefly sort it out. In general, types loaded by the bootstrap class loader are never unloaded, while ordinary types are usually unloaded only when their custom ClassLoader is itself reclaimed. So if you make heavy use of dynamically generated types, you need to guard against OOM in the metaspace (or, earlier, the permanent generation). In JDK 8u40 and later the following parameter is already enabled by default:
-XX:+ClassUnloadingWithConcurrentMark
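One way to observe whether class unloading is happening in a running application is the standard ClassLoadingMXBean. The sketch below only reads its counters; the class name ClassUnloadStats and the snapshot method are hypothetical names for this example.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical demo class: prints the JVM's class loading counters.
class ClassUnloadStats {
    // Returns {currently loaded, total loaded since start, unloaded since start}.
    static long[] snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return new long[] {
            bean.getLoadedClassCount(),
            bean.getTotalLoadedClassCount(),
            bean.getUnloadedClassCount()
        };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.println("currently loaded: " + s[0]);
        System.out.println("total loaded:     " + s[1]);
        System.out.println("unloaded so far:  " + s[2]);
    }
}
```

If an application generates many dynamic types and the unloaded count stays at zero while the loaded count keeps growing, that is a hint that the class loaders involved are still being referenced somewhere.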
Common garbage collection algorithms
They fall mainly into three categories.
- Copying algorithm
The young-generation GCs mentioned earlier are basically all based on the copying algorithm: live objects are copied to the "to" region, and during the copy they are placed in order, which avoids memory fragmentation.
The cost is that, because a copy has to be made, memory must be reserved in advance, so some space is wasted. Moreover, for a GC such as G1 that splits the heap into a large number of regions, copying rather than moving means the GC must maintain the object references between regions, which is no small overhead in either memory or time.
- Mark-sweep algorithm
First a marking phase identifies all the objects to be reclaimed, then they are cleared. Besides the limited efficiency of the marking and sweeping passes, fragmentation is unavoidable, which makes the algorithm unsuitable for very large heaps: once a full GC occurs, the pause may be unacceptably long.
- Mark-compact algorithm
Similar to mark-sweep, but to avoid memory fragmentation it moves objects during cleanup so that the surviving objects occupy contiguous memory.
Note that these are only the basic algorithmic ideas. Real GC implementations are far more complex, and the cutting-edge GCs under development today are all hybrid algorithms that are both parallel and concurrent.
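The difference between mark-sweep and mark-compact is easiest to see on a toy heap. The sketch below models the heap as an array of slots and assumes the marking phase has already produced the set of live objects; the class SweepVsCompactDemo and all of its methods are hypothetical and heavily simplified (every object occupies exactly one slot).

```java
import java.util.Arrays;
import java.util.Set;

// Toy comparison of mark-sweep and mark-compact on a slot-array "heap".
class SweepVsCompactDemo {
    // Mark-sweep: dead slots are cleared in place, survivors do not move.
    static String[] sweep(String[] heap, Set<String> live) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !live.contains(result[i])) {
                result[i] = null;
            }
        }
        return result;
    }

    // Mark-compact: survivors slide toward index 0 and keep their order.
    static String[] compact(String[] heap, Set<String> live) {
        String[] result = new String[heap.length];
        int next = 0;
        for (String obj : heap) {
            if (obj != null && live.contains(obj)) {
                result[next++] = obj;
            }
        }
        return result;
    }

    // Longest run of free slots, i.e. the largest allocation that still fits.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"a", "b", "c", "d", "e", "f"};
        Set<String> live = Set.of("a", "c", "e");
        String[] swept = sweep(heap, live);
        String[] compacted = compact(heap, live);
        System.out.println(Arrays.toString(swept) + " largest free run = " + largestFreeRun(swept));
        System.out.println(Arrays.toString(compacted) + " largest free run = " + largestFreeRun(compacted));
    }
}
```

Both variants free three slots, but after the sweep the largest contiguous free run is 1 slot, while after compaction it is 3. That gap is the fragmentation problem described above.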
If you are interested in the algorithms themselves, a fairly interesting book is "Garbage Collection Algorithm and Implementation". It does not focus on Java garbage collection, but it explains the general algorithms quite vividly.
Garbage collection process
First, a Java application keeps creating objects, which are usually allocated in the Eden area. When Eden's usage reaches a certain threshold, a minor GC is triggered. Objects that are still referenced (green squares) survive and are copied to the Survivor area chosen by the JVM, while unreferenced objects (yellow squares) are reclaimed. Note that I labeled the surviving objects with the number "1" to indicate how many collections each object has survived.
Second, after a minor GC Eden is empty again. When the minor GC trigger condition is next reached, the other Survivor area becomes the "to" area; live objects from Eden and from the "from" area are copied to the "to" area, and their age counts are incremented by 1.
Third, a process like the second step repeats many times until an object's age count reaches the threshold. At that point the so-called promotion takes place: objects past the threshold are promoted to the old generation. This threshold can be specified with the parameter:
-XX:MaxTenuringThreshold=<N>
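The three steps above can be sketched as a small simulation. Everything in it is hypothetical: the class TenuringDemo, its fields, and the exact promotion rule (promote once the incremented age reaches the threshold) are simplifications for illustration. A real JVM also adjusts the threshold dynamically and may promote early when a Survivor area overflows, none of which is modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model of Eden, two Survivor areas, and promotion to the old generation.
class TenuringDemo {
    static final class Obj {
        final String name;
        int age = 0;
        Obj(String name) { this.name = name; }
    }

    final int maxTenuringThreshold;
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    TenuringDemo(int maxTenuringThreshold) {
        this.maxTenuringThreshold = maxTenuringThreshold;
    }

    // New objects start in Eden with age 0.
    Obj allocate(String name) {
        Obj obj = new Obj(name);
        eden.add(obj);
        return obj;
    }

    // One minor GC: live objects from Eden and "from" are copied to "to" with
    // age + 1, or promoted if that age reaches the threshold; the two
    // Survivor areas then swap roles.
    void minorGc(Set<Obj> live) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj obj : candidates) {
            if (!live.contains(obj)) {
                continue; // unreferenced objects are simply not copied
            }
            obj.age++;
            if (obj.age >= maxTenuringThreshold) {
                old.add(obj);
            } else {
                to.add(obj);
            }
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        TenuringDemo heap = new TenuringDemo(3);
        Obj keep = heap.allocate("keep");
        heap.allocate("garbage");
        Set<Obj> live = Set.of(keep);
        for (int gc = 1; gc <= 3; gc++) {
            heap.minorGc(live);
            System.out.println("after GC " + gc + ": age=" + keep.age
                    + ", inOld=" + heap.old.contains(keep));
        }
    }
}
```

With a threshold of 3 in this model, the surviving object moves between the Survivor areas twice and is promoted on the third minor GC, while the unreferenced object disappears at the first one.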
Next comes old-generation GC, whose algorithm depends on the GC option selected. With a simple mark-compact collector, for example, useless objects in the old generation are cleared and the GC then compacts the survivors to prevent memory fragmentation.
We usually call an old-generation GC a major GC and a cleanup of the whole heap a full GC, but the distinction is not absolute, because different old-generation GC algorithms behave quite differently. In CMS, for example, "concurrent" means that the cleanup work runs concurrently with the application threads.
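The areas mentioned in this walkthrough (Eden, Survivor, old generation) can be inspected in a running JVM through the standard MemoryPoolMXBean interface. The following is a minimal sketch; the class name HeapPools is an assumption, and the pool names printed depend on the collector in use, so not every JVM will report pools with exactly these three roles.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical demo class: lists the heap memory pools of the current JVM.
class HeapPools {
    // Counts the pools that the JVM classifies as heap memory.
    static int heapPoolCount() {
        int count = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, and similar pools
            }
            MemoryUsage usage = pool.getUsage(); // may be null for an invalid pool
            long used = (usage == null) ? -1 : usage.getUsed();
            System.out.println(pool.getName() + ": used bytes = " + used);
        }
    }
}
```

Printing the pools before and after allocating a batch of short-lived objects gives a rough view of objects appearing in Eden and, after enough collections, in the old generation.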
GC is still evolving rapidly. The current default, the G1 GC, keeps improving, and many of the shortcomings originally attributed to it, such as the serial full GC and inefficient card table scanning, have been greatly improved. For example, since JDK 10 its full GC runs in parallel, and in many scenarios it performs slightly better than the parallel full GC implementation of the Parallel GC.
Even the Serial GC, old as it is, is not necessarily obsolete thanks to its simple design and implementation. Its own overhead, whether in GC-related data structures or in threads, is very small, so with the rise of cloud computing the Serial GC has found a new stage in newer application scenarios such as serverless.
Unfortunately, the CMS GC has been marked as deprecated because of the theoretical shortcomings of its algorithm, among other reasons, even though it still has a very large user base. Unless some organization actively takes over its maintenance, it is likely to be removed in a future version.
If you follow JDK 11, which is still under development, you will find that it adds two new GCs:
- Epsilon GC
Simply put, it is a GC that performs no garbage collection. That may sound strange, but it has its uses: in performance testing, for example, you may need to pin down exactly how much overhead the GC itself introduces, and that is a typical scenario for it.
- ZGC
This is a very powerful GC implementation open-sourced by Oracle, with remarkable scalability: it supports heaps at the terabyte level while keeping latency under 10 ms in most cases. Although it is still experimental and supports only 64-bit Linux, its capabilities and potential are highly anticipated.
Of course, other vendors also offer distinctive GC implementations, such as the well-known low-latency GCs Zing and Shenandoah.
Resources:
27th Lecture | What are the common garbage collectors in Java?
"JVM" JVM garbage collector, garbage collection algorithm, useless objects