I recently studied Java's garbage collection mechanism. This is a general summary of the main points:
1. What is a garbage collection mechanism
Java GC (garbage collection, performed by the garbage collector) is one of Java's defining features. As a Java developer, you generally do not need to write memory reclamation and cleanup code yourself, because the Java Virtual Machine provides automatic memory management and garbage collection, which saves developers a lot of time.
In a nutshell, this mechanism marks the memory in the JVM (Java Virtual Machine), determines which memory needs to be reclaimed, and automatically reclaims it according to a certain collection strategy. It keeps running for the life of the program (it never stops) to keep memory available in the JVM and to prevent memory leaks and overflow.
Before looking at the collection strategies of the Java garbage collection mechanism, you first need to understand the JVM memory model.
2. JVM memory model
① Program counter: The program counter is a small area of memory that indicates which bytecode instruction the current thread has reached; it can be thought of as the line number indicator of the current thread. When the bytecode interpreter works, it selects the next instruction to execute by changing the value of this counter. Each program counter records the position of only one thread, so it is thread-private. Put simply, when the CPU is partway through executing one thread and switches to another, it must continue from where it left off when it switches back. The program counter is what records that position.
② Virtual machine stack: Each time a thread executes a method, a stack frame is created. The stack frame stores the local variable table, the operand stack, dynamic linking information, the method exit, and so on. When the method is called, its stack frame is pushed onto the JVM stack; when the method finishes, the stack frame is popped.
The local variable table stores the method's local variables, including the primitive data types, object references, and return addresses. In the local variable table, only the long and double types occupy 2 local variable slots (on a 32-bit machine, one slot is 32 bits); all other types occupy 1 slot. Note that the size of the local variable table is determined at compile time, so the space a method needs in its stack frame is fully known in advance and does not change while the method runs.
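As a small illustration of the slot rule above, the following sketch counts how many local variable slots a method's parameters would occupy. The names `SlotCount` and `slotsFor` are hypothetical helpers, not JVM APIs. The count assumes the rule stated above, plus one slot for the implicit `this` reference in instance methods.

```java
final class SlotCount {
    // Returns the number of local variable slots taken by the given parameter types.
    // Assumption: long and double take 2 slots, every other type takes 1.
    // For instance methods (isStatic == false), one extra slot holds `this`.
    static int slotsFor(boolean isStatic, Class<?>... paramTypes) {
        int slots = isStatic ? 0 : 1;
        for (Class<?> type : paramTypes) {
            slots += (type == long.class || type == double.class) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // static void f(int, long, Object): 1 + 2 + 1
        System.out.println(slotsFor(true, int.class, long.class, Object.class)); // prints 4
        // void g(double) on an instance: 1 (this) + 2
        System.out.println(slotsFor(false, double.class)); // prints 3
    }
}
```

This only covers parameters; variables declared inside the method body take slots by the same rule.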
The virtual machine stack defines two exceptions. A StackOverflowError is thrown if a thread's call depth exceeds the maximum depth the virtual machine allows. However, most Java virtual machines allow the virtual machine stack to grow dynamically (a few use a fixed size), so a thread can keep requesting more stack until there is not enough memory, at which point an OutOfMemoryError is thrown.
Each thread corresponds to a virtual machine stack, so the virtual machine stack is also thread-private.
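The stack overflow case is easy to observe. The sketch below recurses without a base case until the JVM throws StackOverflowError, then reports roughly how deep it got. The class name `StackDepthDemo` is made up for this illustration, and the depth reached varies between JVMs and runs.

```java
final class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes one more stack frame; there is deliberately no base case.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until the thread's stack is exhausted and returns the depth reached.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames have been unwound by the time we get here, so it is safe to continue.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack overflowed after about " + measureDepth() + " nested calls");
    }
}
```

The number depends on the size of each frame and on the thread's stack size, which HotSpot lets you set with the -Xss option.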
③ Native method stack: The native method stack has the same role, operating mechanism, and exception types as the virtual machine stack. The only difference is that the virtual machine stack serves Java methods, while the native method stack serves native methods. In many virtual machines (such as HotSpot, the default virtual machine in Sun's JDK), the native method stack and the virtual machine stack are combined.
The native method stack is also thread-private.
④ Heap area: The heap is the most important area for understanding the Java GC mechanism, bar none. Of the memory managed by the JVM, the heap is the largest block and the main area managed by the garbage collector. It is shared by all threads and is created when the virtual machine starts. The heap exists to store object instances: in principle, all objects are allocated on the heap (although with modern optimizations this is no longer absolute, and some objects can be allocated directly on the stack).
⑤ Method area: The method area is shared by all threads. It stores the class information loaded by the virtual machine (that is, the information needed when a class is loaded, such as its version, fields, methods, and interfaces), final constants, static variables, code compiled by the just-in-time compiler, and so on.
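To see how a running JVM divides its memory, you can list its memory pools through the standard java.lang.management API. This is only a sketch: the class name `MemoryPools` is made up, and the pool names printed depend on the JVM version and the collector in use, so no particular output is assumed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

final class MemoryPools {
    // Returns the names of the memory pools of the given type (HEAP or NON_HEAP).
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Heap pools hold object instances; non-heap pools hold class data, compiled code, etc.
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```

Thread-private areas such as the program counter and the stacks do not appear in this list; it only covers the shared, managed pools.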
3. Garbage collection algorithms
How can garbage collection be performed efficiently? The Java Virtual Machine specification does not stipulate how a garbage collector must be implemented, so each vendor's virtual machine can implement it differently. This section therefore only discusses the core ideas of several common garbage collection algorithms.
①. Mark-Sweep algorithm
This is the most basic garbage collection algorithm: it is the easiest to implement and its idea is the simplest. The Mark-Sweep algorithm has two phases, a mark phase and a sweep phase. The mark phase identifies all objects that need to be reclaimed (in practice, by marking the objects that are still reachable and treating the rest as garbage), and the sweep phase reclaims the space those objects occupy.
The Mark-Sweep algorithm is easy to implement, but it has a serious problem: it tends to fragment memory. With too many fragments, a later allocation of a large object may fail to find enough contiguous space, which triggers another garbage collection earlier than necessary.
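The two phases and the resulting fragmentation can be sketched with a toy heap. This is an illustration only, not how any real JVM is implemented: the heap is an array of slots, `Obj` and `MarkSweepDemo` are made-up names, and marking follows references from a given root.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class MarkSweepDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: flag every object reachable from the roots.
    static void mark(Obj... roots) {
        Deque<Obj> pending = new ArrayDeque<>();
        for (Obj root : roots) pending.push(root);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;          // already visited (handles cycles)
            o.marked = true;
            for (Obj ref : o.refs) pending.push(ref);
        }
    }

    // Sweep phase: free the slot of every unmarked object and reset the marks.
    static int sweep(Obj[] heap) {
        int reclaimed = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;
            if (heap[i].marked) {
                heap[i].marked = false;
            } else {
                heap[i] = null;
                reclaimed++;
            }
        }
        return reclaimed;
    }

    static String layout(Obj[] heap) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < heap.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(heap[i] == null ? "-" : heap[i].name);
        }
        return sb.append("]").toString();
    }

    static String demo() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);                       // a -> b, reachable from the root a
        c.refs.add(d);                       // c <-> d form a cycle that no root reaches
        d.refs.add(c);
        Obj[] heap = {a, c, b, d};
        mark(a);
        sweep(heap);
        return layout(heap);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, -, b, -]
    }
}
```

The two free slots are not adjacent, so an object needing two contiguous slots could not be placed even though two slots are free. That is the fragmentation problem in miniature.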
②. Copying algorithm
The Copying algorithm was proposed to address the shortcomings of Mark-Sweep. It divides the available memory into two blocks of equal size and uses only one of them at a time. When that block is used up, the surviving objects are copied to the other block, and the used block is then cleaned up in one pass, so memory fragmentation is unlikely.
This algorithm is simple, efficient, and not prone to fragmentation, but it is expensive in memory: only half of the original space is usable at any time.
The efficiency of the Copying algorithm also depends heavily on the number of surviving objects. If many objects survive, its efficiency drops sharply.
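A minimal sketch of the copying idea, under simplifying assumptions: the two half-spaces are plain arrays, and the set of live objects is handed in directly instead of being found by tracing references. `CopyingDemo` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class CopyingDemo {
    // Copies the live entries of fromSpace into toSpace, packed from index 0,
    // then wipes fromSpace in one pass. Returns the number of survivors copied.
    static int collect(String[] fromSpace, String[] toSpace, Set<String> live) {
        int next = 0;
        for (String obj : fromSpace) {
            if (obj != null && live.contains(obj)) {
                toSpace[next++] = obj;
            }
        }
        Arrays.fill(fromSpace, null);
        return next;
    }

    static String demo() {
        String[] from = {"a", "b", "c", "d"};
        String[] to = new String[from.length];   // the idle half of the memory
        Set<String> live = new HashSet<>(Arrays.asList("a", "c"));
        collect(from, to, live);
        return Arrays.toString(to);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, c, null, null]
    }
}
```

The survivors end up contiguous, so there is no fragmentation, but `to` sits unused until a collection happens. The copy work grows with the number of survivors, which is why the algorithm suits regions where most objects die.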
③. Mark-Compact algorithm
The Mark-Compact algorithm was proposed to address the shortcomings of the Copying algorithm and make full use of memory. Its mark phase is the same as in Mark-Sweep, but after marking it does not reclaim the garbage objects in place. Instead, it moves the surviving objects toward one end and then cleans up the memory beyond the end boundary.
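The compaction step can be sketched as sliding the survivors toward the start of a single array. As a simplification, the result of the mark phase is passed in as a set of live names; `MarkCompactDemo` is a made-up name and this is not a real collector.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MarkCompactDemo {
    // Slides every live object toward index 0, preserving order, then clears
    // everything past the last survivor. Returns the index of the first free slot.
    static int compact(String[] heap, Set<String> live) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            String obj = heap[i];
            if (obj != null && live.contains(obj)) {
                heap[free++] = obj;          // free <= i, so nothing unread is overwritten
            }
        }
        Arrays.fill(heap, free, heap.length, null);
        return free;
    }

    static String demo() {
        String[] heap = {"a", "x", "b", "y", "c"};
        Set<String> live = new HashSet<>(Arrays.asList("a", "b", "c"));
        compact(heap, live);
        return Arrays.toString(heap);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b, c, null, null]
    }
}
```

Unlike the copying approach, no second half-space is needed, and the free memory ends up as one contiguous block.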
④. Generational Collection algorithm
Generational collection is the algorithm used by most JVM garbage collectors today. Its core idea is to divide memory into several regions according to object lifetime. In general, the heap is divided into the old generation (Tenured Generation) and the young generation (Young Generation). In the old generation, only a small number of objects need to be reclaimed in each collection, while in the young generation a large number of objects need to be reclaimed each time. The most suitable collection algorithm can then be chosen for each generation according to its characteristics.
Young generation: The vast majority of newly created objects are allocated here. Since most objects become unreachable soon after they are created, many objects are created in the young generation and then disappear. The process by which objects disappear from this area is called "minor GC".
Old generation: Objects that did not become unreachable and survived the young generation are copied here. It occupies more space than the young generation, and because of that, GC happens much less often here than in the young generation. The process by which objects disappear from the old generation is called "major GC" (or "full GC").
At present, most garbage collectors use the Copying algorithm for the young generation, because each collection there reclaims most of the objects, which means few objects need to be copied. In practice, however, the young generation is not split 1:1. It is generally divided into one larger Eden space and two smaller Survivor spaces. Eden and one Survivor space are in use at any time. During a collection, the objects still alive in Eden and in the Survivor space in use are copied to the other Survivor space, and then Eden and the Survivor space just used are cleaned up.
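The Eden and Survivor scheme can be sketched as a toy model. Several things here are assumptions for illustration: liveness is passed in as a set, spaces are unbounded lists, and an object is promoted to the old generation once it has survived `TENURE_AGE` minor collections (a hypothetical threshold chosen for this example). `YoungGenDemo` is a made-up name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class YoungGenDemo {
    static final int TENURE_AGE = 2;         // hypothetical promotion threshold

    final List<String> eden = new ArrayList<>();
    List<String> survivorFrom = new ArrayList<>();   // survivor space currently in use
    List<String> survivorTo = new ArrayList<>();     // empty survivor space
    final List<String> old = new ArrayList<>();
    final Map<String, Integer> age = new HashMap<>();

    void allocate(String name) {
        eden.add(name);                      // new objects start in Eden
        age.put(name, 0);
    }

    // Minor GC: copy live objects from Eden and the used survivor space into the
    // other survivor space (or promote them), then clear both and swap the survivors.
    void minorGc(Set<String> live) {
        List<String> candidates = new ArrayList<>(eden);
        candidates.addAll(survivorFrom);
        for (String obj : candidates) {
            if (!live.contains(obj)) {
                age.remove(obj);             // dead: simply not copied
                continue;
            }
            int newAge = age.get(obj) + 1;
            age.put(obj, newAge);
            if (newAge >= TENURE_AGE) old.add(obj); else survivorTo.add(obj);
        }
        eden.clear();
        survivorFrom.clear();
        List<String> tmp = survivorFrom;
        survivorFrom = survivorTo;
        survivorTo = tmp;
    }

    static String demo() {
        YoungGenDemo heap = new YoungGenDemo();
        heap.allocate("a");
        heap.allocate("b");
        heap.allocate("c");
        heap.minorGc(new HashSet<>(Arrays.asList("a", "b")));   // c dies young
        heap.allocate("d");
        heap.minorGc(new HashSet<>(Arrays.asList("a", "d")));   // b dies, a is promoted
        return "eden=" + heap.eden + " survivor=" + heap.survivorFrom + " old=" + heap.old;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints eden=[] survivor=[d] old=[a]
    }
}
```

Only the survivors are ever copied, so when most young objects die quickly, each minor GC does very little work.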
Because only a small number of objects are reclaimed from the old generation in each collection, it generally uses the Mark-Compact algorithm.
Note that there is another generation outside the heap, the permanent generation (Permanent Generation), which stores classes, constants, method descriptions, and so on. Collection of the permanent generation mainly reclaims two things: obsolete constants and classes that are no longer used.
4. Typical garbage collectors
JDK 7 has five GC types in total:
Serial GC
Parallel GC
Parallel Old GC (Parallel compacting GC)
Concurrent Mark & Sweep GC (or "CMS")
Garbage First (G1) GC
Of these, the Serial GC should not be used on a server. This GC type dates from the era of desktops with a single-core CPU, and using it can significantly reduce your application's performance.
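To check which collectors a running JVM is actually using, you can query the standard GarbageCollectorMXBean interface. This is a sketch: the class name `ActiveCollectors` is made up, and the collector names reported vary with the JVM version and the GC option selected, so no specific output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectors {
    // Returns the names of the collectors registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and accumulated time (ms) of collections performed so far.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```

Typically one bean covers the young generation and another the old generation, so running this with different GC options is a quick way to confirm which type is active.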
Now, let's learn each GC type together.
①. Serial GC (-XX:+UseSerialGC)
The GC for the young generation was introduced above. For the old generation, this collector uses an algorithm called "mark-sweep-compact".
The first step of the algorithm is to mark the surviving objects in the old generation (mark).
The second step is to check the heap from the beginning and keep only the objects that are still alive (sweep).
The final step is to fill the heap sequentially from the beginning, so that it is divided into two parts: one that holds objects and one that is empty (compact).
②. Parallel GC (-XX:+UseParallelGC)
Figure 1: The difference between Serial GC and Parallel GC
The figure shows the difference between Serial GC and Parallel GC: Serial GC uses only one thread to perform GC, while Parallel GC uses multiple threads and is therefore faster. This GC is useful when there is plenty of memory and multiple cores, and it is also called the "throughput GC".
③. Parallel Old GC (-XX:+UseParallelOldGC)
Parallel Old GC has been available since JDK 5. Compared with Parallel GC, the only difference is the GC algorithm for the old generation. Parallel Old GC works in three steps: mark, summary, and compaction. The summary step differs from the sweep step in that it identifies the surviving objects separately for the areas on which GC has already been performed, so it is slightly more complex than sweeping.
④. CMS GC (-XX:+UseConcMarkSweepGC)
Figure 2: Serial GC and CMS GC
As the figure shows, CMS GC is much more complex than the GC types described so far. The first step, initial mark, is simple: it only looks for the surviving objects closest to the class loader, so the pause is very short. In the concurrent mark step, the objects referenced by the surviving objects just confirmed are traced and checked. What distinguishes this step is that other threads keep running while it proceeds. In the remark step, the objects that were newly added or stopped being referenced during the concurrent mark step are checked again. Finally, in the concurrent sweep step, the garbage is actually collected, again while other threads keep running. Because of this design, the pause time caused by GC is very short. CMS GC is also known as the low-latency GC and is often used in applications where response time is critical.
Of course, alongside its short stop-the-world time, this GC type has drawbacks, so you need to think carefully before using it. If compaction has to be performed because memory has become too fragmented, the stop-the-world time can be longer than with any other GC type. You need to consider how often compaction occurs and how long it takes.
⑤. G1 GC
Finally, let's look at the Garbage First (G1) GC type.
Figure 3: The structure of G1 GC
To understand G1, first forget what you have learned about the young generation and the old generation. As the figure shows, each object is allocated to one cell of a grid, and GC is then performed on those cells. When one area fills up, objects are allocated to another area and GC is executed. The steps in which data moves through the three spaces of the young generation and then into the old generation are not found in this GC type. G1 was created to replace CMS GC, which has caused many problems when run over long periods.
Reference articles: http://www.cnblogs.com/hnrainll/archive/2013/11/06/3410042.html
http://www.cnblogs.com/dolphin0520/p/3783345.html