"In-depth understanding of Java Virtual machines" garbage collection mechanism

Source: Internet
Author: User

This article is based on the book "In-Depth Understanding of the Java Virtual Machine", which I highly recommend reading. Other articles in this series:
"In-Depth Understanding of the Java Virtual Machine": Java memory area model, object creation process, common OOM errors

1. Problems That Garbage Collection Solves
To design garbage collection (Garbage Collection, GC), three questions need to be answered: (1) Which memory needs to be reclaimed? (2) When should it be reclaimed? (3) How should it be reclaimed?
Which memory needs to be reclaimed?
According to the Java memory model described in "Java memory area model, object creation process, common OOM errors", the program counter, the virtual machine stack, and the native method stack are created and destroyed together with their thread, and the frames on a stack are pushed and popped in an orderly way as methods are entered and exited. The amount of memory each stack frame needs is essentially known once the class structure is determined, so allocation and reclamation in these areas are deterministic and need little attention: the memory is reclaimed naturally when the method returns or the thread ends.
The Java heap and the method area are different. The Java heap is where object instances are stored, and only at run time do we know which objects will be created. Allocation and reclamation of this memory are dynamic, and this is the part the garbage collector is concerned with.
In the method area (the permanent generation in the HotSpot virtual machine), garbage collection mainly reclaims two things: obsolete constants and useless classes. A constant is obsolete when no object anywhere in the system still references it. A class is useless only when all three of the following conditions hold:
(1) All instances of the class have been reclaimed, that is, no instance of the class exists in the heap;
(2) The ClassLoader that loaded the class has been reclaimed;
(3) The java.lang.Class object of the class is not referenced anywhere, so the methods of the class cannot be reached through reflection.
A class that meets all three conditions only "can" be reclaimed; whether it actually is reclaimed also depends on HotSpot configuration parameters.
When should memory be reclaimed?
Before reclaiming the heap, the garbage collector must first determine which objects are still "alive" and which are "dead"; the dead objects are the ones to reclaim. There are two algorithms for deciding whether an object is alive.
(1) Reference counting algorithm
The algorithm works as follows: "Add a reference counter to each object. Whenever a reference to the object is created, the counter is incremented by 1; when a reference becomes invalid, the counter is decremented by 1. An object whose counter is 0 can never be used again." Reference counting is simple to implement and very efficient at making the decision, and in most cases it is a good algorithm. It has one important drawback, though: it has difficulty handling circular references between objects. For example, suppose the variables objA and objB refer to two instances of some class, objA has a member that points to objB, and objB has a member that points to objA. Even after objA and objB are both set to null, neither object can be reclaimed: the variables are null, but each object is still referenced by the other's member, so each reference count is still 1.
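The circular-reference problem can be reproduced with a toy counter model. This is only a sketch: the class RefCountDemo and its helpers are hypothetical names made up for illustration, the counters are maintained by hand, and HotSpot does not manage objects this way.

```java
/** Toy model of reference counting; all names here are hypothetical. */
class RefCountDemo {
    /** A counted object with a single reference field. */
    static final class Obj {
        int refCount = 0; // number of references currently pointing at this object
        Obj field;        // one outgoing reference to another object
    }

    /** Creates an object that is referenced by one (simulated) local variable. */
    static Obj newReferenced() {
        Obj o = new Obj();
        o.refCount = 1;
        return o;
    }

    /** Stores target into holder.field and updates the counters accordingly. */
    static void setField(Obj holder, Obj target) {
        if (holder.field != null) {
            holder.field.refCount--;
        }
        holder.field = target;
        if (target != null) {
            target.refCount++;
        }
    }

    /** Builds the objA <-> objB cycle, drops both variables, returns the final counts. */
    static int[] countsAfterDroppingCycle() {
        Obj objA = newReferenced();
        Obj objB = newReferenced();
        setField(objA, objB); // objB.refCount == 2
        setField(objB, objA); // objA.refCount == 2

        // Simulate "objA = null; objB = null;": each variable stops referencing its object.
        objA.refCount--;
        objB.refCount--;

        return new int[] {objA.refCount, objB.refCount};
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingCycle();
        // Both counters stay at 1, so a pure counting collector would never free the pair.
        System.out.println("objA=" + counts[0] + ", objB=" + counts[1]);
    }
}
```

Neither counter ever reaches 0, even though nothing outside the pair can reach the two objects any more.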
(2) Reachability analysis algorithm
Today's mainstream virtual machines, including HotSpot, the default Java virtual machine, use this approach. The basic idea is: "Take a set of objects called 'GC Roots' as starting points and search downward from these nodes. The paths the search follows are called reference chains. When no reference chain connects an object to any GC Root (that is, the object is unreachable from the GC Roots), the object is proven to be unusable." The objects that can serve as GC Roots include:
1) objects referenced in the virtual machine stack (the local variable table in a stack frame);
2) objects referenced by static class fields in the method area;
3) objects referenced by constants in the method area;
4) objects referenced by JNI (what are generally called native methods) in the native method stack.
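The search from the GC Roots is an ordinary graph traversal. The sketch below assumes a simplified heap in which objects are just names and references are edges in a map; ReachabilityDemo and its method names are hypothetical and not part of any JVM API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ReachabilityDemo {
    /** Returns every object reachable from the roots by following reference edges. */
    static Set<String> reachable(Map<String, List<String>> references, List<String> gcRoots) {
        Set<String> marked = new TreeSet<>(gcRoots);
        Deque<String> pending = new ArrayDeque<>(gcRoots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (String target : references.getOrDefault(current, List.of())) {
                if (marked.add(target)) { // first visit: follow its references later
                    pending.push(target);
                }
            }
        }
        return marked;
    }

    /** Example heap: root -> a -> b, plus an unreachable cycle c <-> d. */
    static Set<String> demo() {
        Map<String, List<String>> references = Map.of(
                "root", List.of("a"),
                "a", List.of("b"),
                "c", List.of("d"),
                "d", List.of("c"));
        return reachable(references, List.of("root"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b, root]
    }
}
```

The objects c and d reference each other but are never marked, so this approach has no trouble with the cycle that defeats reference counting.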
Note that an object found unreachable by reachability analysis is not necessarily doomed. To truly declare an object dead, it has to go through at least two marking passes. If reachability analysis finds that the object has no reference chain to the GC Roots, the object is marked for the first time and filtered once. The filter criterion is whether the object needs to have its finalize() method executed. If the object does not override finalize(), or the virtual machine has already called its finalize() method (the finalize() method of an object is called at most once), the virtual machine treats both cases as "no need to execute". If the object is judged to need finalize(), it is placed in a queue called the F-Queue, and a low-priority Finalizer thread created automatically by the virtual machine executes it later. "Execute" here means that the virtual machine triggers the method but does not promise to wait for it to finish. This prevents a slow or endlessly looping finalize() method from keeping the other objects in the queue waiting forever and bringing down the whole memory reclamation system. The finalize() method is an object's last chance to escape death. Later the GC performs a second, small-scale marking of the objects in the F-Queue. To save itself in finalize(), an object only needs to re-establish a link with any object on a reference chain, for example by assigning itself (this) to a class variable or to a member variable of another object. It is then removed from the "about to be reclaimed" set during the second marking. If the object has not escaped by then, it is essentially certain to be reclaimed.
The process that decides whether an unreachable object is truly dead can be summarized as follows:
(1) The GC performs the first marking and filters once, selecting the objects that override finalize() and whose finalize() method has not been called yet;
(2) A low-priority thread calls the finalize() method of the selected objects;
(3) The GC performs the second marking. Objects that were not selected, and objects that were selected but did not save themselves in finalize(), are reclaimed.
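The timing of a real finalize() call is not deterministic, so the sketch below models the decision rules instead of calling finalize() itself. Everything in it is an assumption made for illustration: the class TwoPhaseMarkDemo, its flags, and the idea of collapsing one GC cycle into a single method call.

```java
class TwoPhaseMarkDemo {
    /** Hypothetical stand-in for a heap object and its finalization state. */
    static final class Obj {
        final boolean overridesFinalize; // does the class override finalize()?
        final boolean rescuesItself;     // does finalize() re-link the object to a root?
        boolean finalizeCalled = false;  // the VM calls finalize() at most once
        boolean reachable = false;       // is the object on a reference chain from GC Roots?

        Obj(boolean overridesFinalize, boolean rescuesItself) {
            this.overridesFinalize = overridesFinalize;
            this.rescuesItself = rescuesItself;
        }
    }

    /** Simulates one GC cycle for an object; returns true if the object is reclaimed. */
    static boolean collect(Obj o) {
        if (o.reachable) {
            return false; // live objects are never candidates
        }
        // First mark plus filter: is finalize() "necessary to execute"?
        boolean mustFinalize = o.overridesFinalize && !o.finalizeCalled;
        if (!mustFinalize) {
            return true;
        }
        // The object goes to the F-Queue and the Finalizer thread runs finalize().
        o.finalizeCalled = true;
        if (o.rescuesItself) {
            o.reachable = true; // e.g. finalize() assigned this to a static field
        }
        // Second mark: only objects that are still unreachable are reclaimed.
        return !o.reachable;
    }

    public static void main(String[] args) {
        Obj o = new Obj(true, true);
        System.out.println(collect(o)); // false: finalize() rescued the object
        o.reachable = false;            // the rescuing reference is dropped again
        System.out.println(collect(o)); // true: finalize() is not called a second time
    }
}
```

The second call shows why self-rescue works only once: the filter skips objects whose finalize() has already run.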
2. Garbage Collection Algorithms
2.1 Mark-Sweep
This is the most basic collection algorithm. It has two phases, "mark" and "sweep": first all objects that need to be reclaimed are marked, and after marking is complete all marked objects are reclaimed together. The marking process is the two-pass marking described above under the reachability analysis algorithm.
[Figure: heap state before and after a mark-sweep collection]
Disadvantages:
(1) Efficiency: neither the mark phase nor the sweep phase is very efficient;
(2) Space: sweeping leaves a large number of discontiguous memory fragments. With too much fragmentation, a later allocation of a large object may fail to find enough contiguous memory and force another garbage collection to start early.
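The fragmentation problem is easy to see on a toy heap. In this sketch the heap is an array of slots holding one object each and the set of live objects is given; MarkSweepDemo and its helper methods are hypothetical names.

```java
import java.util.Set;

class MarkSweepDemo {
    /** Sweep phase: frees (sets to null) every slot whose object is not in the live set. */
    static void sweep(String[] heap, Set<String> live) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !live.contains(heap[i])) {
                heap[i] = null;
            }
        }
    }

    /** Total number of free slots. */
    static int totalFree(String[] heap) {
        int free = 0;
        for (String slot : heap) {
            if (slot == null) {
                free++;
            }
        }
        return free;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    /** Sweeps a heap in which every second object is dead; returns {totalFree, largestFreeRun}. */
    static int[] demo() {
        String[] heap = {"A", "B", "C", "D", "E", "F", "G", "H"};
        sweep(heap, Set.of("A", "C", "E", "G"));
        return new int[] {totalFree(heap), largestFreeRun(heap)};
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("free=" + result[0] + ", largest contiguous=" + result[1]);
    }
}
```

Four slots are free after the sweep, but no two of them are adjacent, so an object that needs two contiguous slots cannot be allocated.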
2.2 Copying Algorithm
The copying algorithm was introduced to solve the efficiency problem of mark-sweep. It divides the available memory into two blocks of equal size and uses only one of them at a time. When that block is used up, the surviving objects are copied to the other block, and the used block is then cleared in one go.
Advantages: (1) each collection reclaims one entire half of the memory, which is simple to do and efficient; (2) allocation does not have to deal with fragmentation: memory is handed out sequentially just by moving the heap-top pointer.
Disadvantage: the usable memory is cut to half of the original, which is a high price.
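A minimal sketch of one copying collection, assuming a toy heap of one-object slots and a given set of live objects. SemispaceCopyDemo, fromSpace, and toSpace are hypothetical names chosen for the illustration.

```java
import java.util.Arrays;
import java.util.Set;

class SemispaceCopyDemo {
    /**
     * Copies live objects from fromSpace into toSpace, then clears fromSpace.
     * Returns the bump pointer: the index of the next free slot in toSpace.
     */
    static int copyLive(String[] fromSpace, String[] toSpace, Set<String> live) {
        int top = 0;
        for (String obj : fromSpace) {
            if (obj != null && live.contains(obj)) {
                toSpace[top++] = obj; // survivors are packed next to each other
            }
        }
        Arrays.fill(fromSpace, null); // the whole half is reclaimed in one step
        return top;
    }

    /** Collects a four-slot half in which only A and C are alive. */
    static String demo() {
        String[] from = {"A", "B", "C", "D"};
        String[] to = new String[4];
        int top = copyLive(from, to, Set.of("A", "C"));
        return Arrays.toString(to) + " top=" + top;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [A, C, null, null] top=2
    }
}
```

After the copy, free memory is a single contiguous range starting at the returned pointer, so the next allocation only has to advance that pointer.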
[Figure: heap state before and after a copying collection]

About 98% of the objects in the young generation are short-lived ("born in the morning, dead by evening"). The copying algorithm can therefore be improved, and today's commercial virtual machines use the improved version to collect the young generation.
Improved collection algorithm: given how short-lived young objects are, memory does not need to be split 1:1. Instead it is divided into one larger Eden space and two smaller Survivor spaces, and at any time Eden and one of the Survivor spaces are in use. During a collection, the objects still alive in Eden and in the Survivor space in use are copied in one pass into the other Survivor space, and then Eden and the Survivor space just used are cleared. After the collection, the freshly cleared Eden and the Survivor space that received the surviving objects become the memory in use, and the freshly cleared Survivor space is held in reserve for the next collection.
The improved algorithm has one problem: during a collection, the empty Survivor space must hold every object still alive in Eden and in the Survivor space in use. If it is too small to hold them, memory has to be "borrowed" from the old generation: the objects that do not fit enter the old generation through the allocation guarantee mechanism.
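The overflow case can be sketched as follows. The model is deliberately simplified and its names (MinorGcDemo, Result, minorGc) are hypothetical: every object occupies one slot, only the already-live objects are passed in, and object age plays no role.

```java
import java.util.ArrayList;
import java.util.List;

class MinorGcDemo {
    /** Survivors of a simulated young-generation collection, split by destination. */
    static final class Result {
        final List<String> toSurvivor = new ArrayList<>();
        final List<String> promotedToOld = new ArrayList<>();
    }

    /**
     * Copies the live objects of Eden and of the Survivor space in use into the
     * empty Survivor space; whatever does not fit goes to the old generation.
     */
    static Result minorGc(List<String> liveInEden, List<String> liveInFromSurvivor,
                          int toSurvivorCapacity) {
        Result result = new Result();
        List<String> survivors = new ArrayList<>(liveInFromSurvivor);
        survivors.addAll(liveInEden);
        for (String obj : survivors) {
            if (result.toSurvivor.size() < toSurvivorCapacity) {
                result.toSurvivor.add(obj);
            } else {
                result.promotedToOld.add(obj); // allocation guarantee: old generation takes the overflow
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = minorGc(List.of("e1", "e2"), List.of("s1"), 2);
        System.out.println("survivor=" + r.toSurvivor + ", old=" + r.promotedToOld);
        // prints survivor=[s1, e1], old=[e2]
    }
}
```

With three live objects and room for only two, the third one ends up in the old generation instead of making the collection fail.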
2.3 Mark-Compact Algorithm
When the object survival rate is high, the copying algorithm has to perform many copy operations and its efficiency drops. Survival rates in the old generation are generally high, so a different collection algorithm is needed there: mark-compact. Marking is the same as in mark-sweep, but after marking the reclaimable objects are not swept directly. Instead all surviving objects are moved toward one end, and the memory beyond the end boundary is then cleared directly.
[Figure: heap state before and after a mark-compact collection]
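The compaction step of mark-compact can be sketched on a toy heap of one-object slots, assuming the set of live objects has already been determined by marking. MarkCompactDemo and compact are hypothetical names.

```java
import java.util.Arrays;
import java.util.Set;

class MarkCompactDemo {
    /**
     * Slides every live object toward index 0 and clears everything behind them.
     * Returns the end boundary: the index of the first free slot.
     */
    static int compact(String[] heap, Set<String> live) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            String obj = heap[i];
            if (obj != null && live.contains(obj)) {
                heap[free++] = obj; // free <= i, so no unscanned slot is overwritten
            }
        }
        Arrays.fill(heap, free, heap.length, null); // clear memory beyond the boundary
        return free;
    }

    /** Compacts a six-slot heap in which B, D and E are alive. */
    static String demo() {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        int boundary = compact(heap, Set.of("B", "D", "E"));
        return Arrays.toString(heap) + " boundary=" + boundary;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [B, D, E, null, null, null] boundary=3
    }
}
```

Unlike the copying algorithm, no second memory block is needed; unlike mark-sweep, the free space ends up contiguous.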

2.4 Generational Collection Algorithm
Today's commercial virtual machines use "generational collection" (Generational Collection). It divides memory into several blocks according to object lifetime, generally splitting the Java heap into a young generation and an old generation, and chooses a collection algorithm that suits each generation. In the young generation every collection finds that most objects have died and only a few survive, so the copying algorithm is the right choice: only the cost of copying the few survivors has to be paid. In the old generation objects have a high survival rate and there is no extra space to act as an allocation guarantee, so the mark-sweep or mark-compact algorithm must be used.
3. Garbage Collectors
If the collection algorithms described above are the methodology of memory reclamation, garbage collectors are its concrete implementation. As discussed above, current garbage collectors are basically generational, so a garbage collector generally uses more than one collection algorithm. The collectors offered by different virtual machines differ considerably. The figure shows all the garbage collectors in the HotSpot virtual machine as of JDK 1.7.
[Figure: garbage collectors in the HotSpot virtual machine, JDK 1.7]
HotSpot has 7 different garbage collectors. A line between two collectors in the figure means that they can be used together. Serial, ParNew, and Parallel Scavenge are young-generation collectors; CMS, Serial Old, and Parallel Old are old-generation collectors; G1 is the newest collector and can be used for both the young and the old generation.
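To see which collectors a running JVM has actually selected, the standard java.lang.management API can be queried. The class name ListCollectors is hypothetical; the reported collector names depend on the JVM version and on the GC flags it was started with, so no particular output is assumed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListCollectors {
    /** Returns the names of the collectors the running JVM reports. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Each bean describes one collector and the memory pools it manages.
            System.out.println(gc.getName()
                    + " manages " + Arrays.toString(gc.getMemoryPoolNames())
                    + ", collections so far: " + gc.getCollectionCount());
        }
    }
}
```

A JVM that uses a generational collector pair usually reports two beans, one for the young generation and one for the old generation.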
3.1 Serial Collector
This is the most basic collector and the one with the longest history. As the name suggests, it is a single-threaded collector: it uses only one CPU, or one collection thread, to do the garbage collection work. More importantly, while it is collecting, all other worker threads must be paused until the collection finishes. Despite this shortcoming, it is still the default young-generation collector when the virtual machine runs in client mode. Its advantage is that it is simple and efficient, with no thread-interaction overhead.
[Figure: Serial / Serial Old collector run process]
The young generation uses the copying algorithm and the old generation uses the mark-compact algorithm.
3.2 ParNew Collector
The ParNew collector is in fact the multithreaded version of the Serial collector. Apart from using multiple threads for garbage collection, it behaves the same as the Serial collector. ParNew is the preferred young-generation collector for many virtual machines running in server mode. One important reason has nothing to do with performance: besides the Serial collector, only ParNew can work together with the old-generation CMS collector.
ParNew is a parallel collector. In garbage collection, parallel means that several garbage collection threads work at the same time while the user threads wait. Concurrent means that the user threads and the garbage collection threads run at the same time (not necessarily in parallel; they may alternate).
3.3 Parallel Scavenge Collector
The Parallel Scavenge collector also uses the copying algorithm and is also a parallel, multithreaded collector. It resembles ParNew, but its focus is different from that of other collectors. Collectors such as CMS try to keep the pauses of user threads during garbage collection as short as possible, whereas the goal of Parallel Scavenge is to reach a controllable throughput, where throughput = time running user code / (time running user code + garbage collection time).
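The throughput formula is simple enough to check with a few lines of code. ThroughputDemo is a hypothetical helper, and the figures of 99 minutes of user code and 1 minute of garbage collection are only sample inputs.

```java
class ThroughputDemo {
    /** throughput = user code time / (user code time + garbage collection time) */
    static double throughput(double userCodeTime, double gcTime) {
        return userCodeTime / (userCodeTime + gcTime);
    }

    public static void main(String[] args) {
        // 99 minutes of user code and 1 minute of garbage collection out of 100 minutes.
        System.out.println(throughput(99.0, 1.0)); // prints 0.99
    }
}
```

Both arguments must use the same time unit; the result is a fraction between 0 and 1.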
The three collectors above are young-generation collectors; the following are old-generation collectors.
3.4 Serial Old Collector
The Serial Old collector is the old-generation version of the Serial collector. It is also single-threaded and uses the mark-compact algorithm. Its main purpose is to be used by the virtual machine in client mode.
3.5 Parallel Old Collector
Parallel Old is the old-generation version of the Parallel Scavenge collector. It uses multiple threads and the mark-compact algorithm.
[Figure: Parallel Scavenge / Parallel Old collector run process]

3.6 CMS Collector
The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pauses. For the server side of websites and B/S systems, where response time matters, CMS is a good choice. As "Mark Sweep" in the name shows, CMS is based on the mark-sweep algorithm. It works in four steps:
(1) Initial mark (CMS initial mark): marks only the objects that GC Roots reference directly. This step requires a "Stop The World" pause;
(2) Concurrent mark (CMS concurrent mark): traces from the GC Roots (the reachability analysis phase). It runs concurrently with user threads;
(3) Remark (CMS remark): corrects the marks of objects that changed during concurrent marking. This step requires a "Stop The World" pause;
(4) Concurrent sweep (CMS concurrent sweep): performs the sweep.
[Figure: CMS collector run process]
As the figure shows, the initial mark and remark phases run in parallel and require the user threads to be paused (only briefly), while the concurrent mark and concurrent sweep phases are concurrent and run together with the user threads.
Advantages of CMS: concurrent collection and low pauses.
Disadvantages of CMS:
(1) It is very sensitive to CPU resources, a problem shared by all concurrently designed programs. It does not pause user threads, but it slows the application down and lowers throughput;
(2) It cannot clean up "floating garbage". User threads keep running during the concurrent sweep phase, so new garbage keeps appearing. This garbage appears after marking has finished, so CMS cannot deal with it in the current collection and has to leave it for the next GC;
(3) It produces a lot of space fragmentation. CMS is based on mark-sweep, and the biggest drawback of that algorithm is fragmentation, which makes large objects hard to allocate and forces a Full GC to be triggered early. To address this, CMS provides the switch "-XX:+UseCMSCompactAtFullCollection" (enabled by default), which turns on a compaction pass that merges memory fragments when the CMS collector can no longer cope and has to perform a Full GC.
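Floating garbage can be illustrated with a toy model of one concurrent cycle. The sketch makes strong simplifying assumptions: objects are names, marking is a snapshot of the live set, and the "user thread" is a single line that drops a reference after marking. FloatingGarbageDemo is a hypothetical name and the code does not reflect how CMS is implemented internally.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class FloatingGarbageDemo {
    /** Runs one simulated concurrent cycle and returns the garbage it leaves behind. */
    static Set<String> floatingGarbageAfterOneCycle() {
        Set<String> heap = new TreeSet<>(List.of("A", "B", "C"));
        Set<String> reachable = new HashSet<>(List.of("A", "B")); // C is already garbage

        // Marking records what is live at that moment.
        Set<String> marked = new HashSet<>(reachable);

        // While the collector works concurrently, the user thread drops its reference to B.
        reachable.remove("B");

        // Concurrent sweep: only unmarked objects are freed, so C goes and B stays.
        heap.retainAll(marked);

        // Whatever is still in the heap but no longer reachable is floating garbage.
        Set<String> floating = new TreeSet<>(heap);
        floating.removeAll(reachable);
        return floating;
    }

    public static void main(String[] args) {
        System.out.println(floatingGarbageAfterOneCycle()); // prints [B]
    }
}
```

B is dead by the time the sweep runs, but it was marked as live earlier, so it survives until the next collection.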
3.7 G1 Collector
G1 is the newest collector, released with JDK 1.7. It is a garbage collector for server-side applications and has the following features:
(1) Parallelism and concurrency: G1 makes full use of the hardware in multi-CPU, multi-core environments, using multiple CPUs (or CPU cores) to shorten Stop-The-World pauses;
(2) Generational collection: the generational concept is kept in G1. Although G1 can manage the whole GC heap on its own, without cooperating with another collector, it treats newly created objects differently from old objects that have already survived GC for some time in order to get better results;
(3) Space compaction: unlike the mark-sweep algorithm of CMS, G1 as a whole is based on the mark-compact algorithm, while locally (between two regions) it is based on the copying algorithm. Either way, G1 does not produce memory fragmentation while it runs and leaves regular free memory after a collection;
(4) Predictable pause times.
With G1, the memory layout of the Java heap is very different from that of the other collectors. G1 divides the whole Java heap into independent regions (Region) of equal size. The concepts of young and old generation are kept, but the two generations are no longer physically separated: each is a collection of regions, which need not be contiguous.
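Splitting the heap into regions is what makes pause times predictable: the collector can reclaim only as many regions as fit into the allowed pause, starting with those that free the most memory for the least work. The sketch below is a rough illustration of that idea only. The class G1SelectionDemo, the per-region figures, and the greedy selection by garbage per millisecond are all assumptions made for the example, not G1's actual cost model.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class G1SelectionDemo {
    /** Hypothetical per-region bookkeeping: reclaimable garbage and estimated collection cost. */
    static final class Region {
        final String name;
        final int garbageKb;
        final int costMs;

        Region(String name, int garbageKb, int costMs) {
            this.name = name;
            this.garbageKb = garbageKb;
            this.costMs = costMs;
        }

        /** Reclaimed kilobytes per millisecond of pause. */
        double value() {
            return (double) garbageKb / costMs;
        }
    }

    /** Greedily picks the most valuable regions whose total cost fits the pause budget. */
    static List<String> chooseRegions(List<Region> regions, int pauseBudgetMs) {
        List<Region> byValue = new ArrayList<>(regions);
        byValue.sort(Comparator.comparingDouble(Region::value).reversed());
        List<String> chosen = new ArrayList<>();
        int spentMs = 0;
        for (Region r : byValue) {
            if (spentMs + r.costMs <= pauseBudgetMs) {
                chosen.add(r.name);
                spentMs += r.costMs;
            }
        }
        return chosen;
    }

    /** Four sample regions and a pause budget of 5 ms. */
    static List<String> demo() {
        List<Region> regions = List.of(
                new Region("R1", 900, 3),
                new Region("R2", 100, 2),
                new Region("R3", 500, 2),
                new Region("R4", 50, 1));
        return chooseRegions(regions, 5);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [R1, R3]
    }
}
```

R1 and R3 use up the whole 5 ms budget, so the less valuable regions R2 and R4 are left for a later collection.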

G1 collection consists of the following steps:
(1) Initial marking (Initial Marking)
(2) Concurrent marking (Concurrent Marking)
(3) Final marking (Final Marking)
(4) Screening and reclamation (Live Data Counting and Evacuation)
The first steps have much in common with CMS.
[Figure: G1 collector run process]

(Images from: http://blog.csdn.net/ns_code/article/details/18076173
http://blog.csdn.net/zq602316498/article/details/38757423)