Translator's notes:
In this translation, "Garbage Collection" is rendered as garbage collection and "garbage collector" as garbage collector;
"garbage collection" and "garbage collecting" are generally treated as synonyms.
Minor GC is rendered as "small GC" rather than "secondary GC".
Major GC is rendered as "large GC" rather than "primary GC".
The reason is that Minor GC in the young generation is, in most cases, the most frequent kind of collection, so calling it "secondary" would clearly be misleading.
Full GC is rendered as "complete GC"; for clarity it is usually left as Full GC, which readers will understand. A large (Major) GC is similar to a Full GC. These terms come from the official analysis tools and garbage collection logs, and they are not used very consistently.
1. Introduction to garbage collection
As the name suggests, garbage collection means finding the garbage and cleaning it up. Existing implementations, however, do exactly the opposite: the garbage collector tracks every object that is still in use and treats everything else as garbage. With this in mind, let's look at the principles of automatic memory reclamation and at how garbage collection is implemented in the JVM.
Rather than diving into details right away, we start from the foundations: the general characteristics of garbage collection, its core concepts, and the algorithms that implement it.
Disclaimer: This article mainly describes the behavior of Oracle HotSpot and OpenJDK. Other JVMs, such as JRockit or IBM J9, may behave slightly differently in some respects.
Manual memory management
Today's automatic garbage collection algorithms are very advanced, but let's first look at what manual memory management was like. Back then, if you wanted to store shared data, you had to allocate the memory explicitly and free it explicitly. If you forgot to free it, that block of memory could not be reused: it stayed occupied but was no longer used. This is called a memory leak.
The following sample program manages memory manually in C:
    /* read_size() and read_elements() are assumed to be defined elsewhere. */
    int send_request() {
        size_t n = read_size();
        int *elements = malloc(n * sizeof(int));

        if (read_elements(n, elements) < n) {
            /* elements not freed! */
            return -1;
        }

        /* ... */

        free(elements);
        return 0;
    }
As you can see, in a long program, or one with a more complex structure, it is easy to forget to free memory. Memory leaks used to be a very common problem, and the only remedy was to fix the code. The industry therefore wanted a better approach: reclaim memory that is no longer in use automatically and remove this source of human error. This automatic mechanism is called garbage collection (GC).
Smart pointers
The first generation of automatic garbage collection algorithms used reference counting. For each object, the runtime simply records how many references point to it; when the count drops to 0, the object can be safely reclaimed. A well-known example is the shared pointer of C++:
    int send_request() {
        size_t n = read_size();
        shared_ptr<vector<int>> elements
                  = make_shared<vector<int>>();

        if (read_elements(n, elements) < n) {
            return -1;
        }

        return 0;
    }
shared_ptr keeps track of the number of references. The count is incremented by 1 when the pointer is passed as a parameter (by value) and decremented by 1 when a copy leaves its scope. When the count drops to 0, shared_ptr automatically deletes the underlying vector. Note that this style is not common in real-world programs and is used here only for demonstration.
Automatic memory management
In the C++ code above, we had to state explicitly where memory management was needed. But why not give every object this behavior? That would be convenient: developers would no longer have to think about where to clean up memory. The runtime would work out by itself which memory is no longer in use and release it. In other words, it would collect garbage automatically. The first garbage collector was developed in 1959 for Lisp, and garbage collection technology has kept advancing since then.
Reference counting
The approach just demonstrated with C++ shared pointers can be applied to all objects. Many languages take this route, including Perl, Python, and PHP. The following illustration shows it well:
The green cloud (GC roots) in the figure represents the objects that the program is still using. Technically, these may be local variables of the currently executing method, static variables of a class, and so on. Other programming languages may use different names for them, so there is no need to dwell on the terminology.
The blue circles are objects that are still referenced, and the number inside each one is its reference count. The gray circles are objects that are no longer referenced from any scope. Gray objects are garbage and can be cleaned up by the garbage collector at any time.
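To make the counts in the figure concrete, here is a minimal sketch using std::shared_ptr, whose use_count() member reports the current number of owners. The function names observe_counts and count_inside_call are made up for this illustration, and the reported values are only exact because the code is single-threaded.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical helper: taking the pointer by value copies it,
// so the count is one higher for the duration of the call.
long count_inside_call(std::shared_ptr<std::vector<int>> p) {
    assert(p != nullptr);
    return p.use_count();
}

// Records the reference count at four points in the object's lifetime.
std::vector<long> observe_counts() {
    std::vector<long> seen;
    auto elements = std::make_shared<std::vector<int>>(3, 0);
    seen.push_back(elements.use_count());         // a single owner
    {
        auto alias = elements;                    // a second owner appears
        seen.push_back(elements.use_count());
    }                                             // alias leaves its scope
    seen.push_back(count_inside_call(elements));  // by-value parameter
    seen.push_back(elements.use_count());         // back to a single owner
    return seen;
}
```

When elements itself goes out of scope at the end of observe_counts, the count reaches 0 and the vector is deleted, which corresponds to a circle turning gray in the figure.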
It looks great, doesn't it? But this approach has a big pitfall: it is easily defeated by circular references (detached cycles). No scope holds a reference to any of these objects, yet because they reference one another, their reference counts never drop to 0. As shown in the following illustration:
See it? The red objects are in fact garbage, but because of this limitation of reference counting they are never reclaimed, and the result is a memory leak.
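The leak can be reproduced with C++ shared pointers. The sketch below is only an illustration: Node, live_nodes and survivors_after_cycle are hypothetical names, not part of any library. It builds a two-node cycle, drops the local handles, and reports how many nodes are still alive. For contrast, it can also build the back link as a std::weak_ptr, which does not contribute to the reference count.

```cpp
#include <cassert>
#include <memory>

static int live_nodes = 0;  // number of Node objects currently alive

struct Node {
    std::shared_ptr<Node> strong_next;  // owning link: adds to the count
    std::weak_ptr<Node> weak_next;      // non-owning link: does not
    Node() { ++live_nodes; }
    ~Node() { --live_nodes; }
};

// Builds a two-node cycle, drops both local handles, and returns the
// number of nodes that survived.
int survivors_after_cycle(bool use_weak_back_link) {
    live_nodes = 0;
    std::weak_ptr<Node> observer;  // non-owning handle kept for cleanup
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strong_next = b;
        if (use_weak_back_link) {
            b->weak_next = a;    // a's count stays at 1
        } else {
            b->strong_next = a;  // a and b now keep each other alive
        }
        observer = a;
    }  // the handles a and b are gone from here on
    int survivors = live_nodes;
    // Break the cycle by hand so that the demo itself does not leak.
    if (auto a = observer.lock()) {
        a->strong_next.reset();
    }
    return survivors;
}
```

With the strong back link both nodes survive, although nothing outside the cycle owns them; with the weak back link both are destroyed as soon as the local handles go away.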
There are of course ways to handle this, such as using "weak" references or running a separate algorithm to detect and collect cycles. The languages mentioned above (Perl, Python, and PHP) each deal with circular references in some way, but that is outside the scope of this article. The following describes the garbage collection approach used in the JVM.
Mark and Sweep
First, the JVM is precise about what it means for an object to be reachable. Instead of the vaguely defined green cloud we saw earlier, the JVM has a very specific set of objects called the garbage collection roots (GC roots), which include:
Local variables
Active threads
Static fields
JNI references
Others (discussed later)
The JVM uses the mark and sweep algorithm to track all reachable (live) objects and to make sure that the memory of all unreachable objects can be reused. It consists of two steps:
Marking: traverse all reachable objects, starting from the GC roots, and record them in native memory.
Sweeping: make sure that the memory occupied by unreachable objects can be reused by later allocations.
The JVM includes several GC algorithms, such as Parallel Scavenge, Parallel Mark+Copy, and CMS. Their implementations differ slightly, but conceptually they all follow the two steps above.
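The two steps can be sketched on a toy object graph. This is not how HotSpot implements any of its collectors; Object, Heap, collect and demo_unreachable_cycle are hypothetical names chosen for this illustration, and the roots are simply passed in as a list.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Object {
    std::vector<Object*> refs;  // outgoing references to other objects
    bool marked = false;
};

class Heap {
public:
    Object* allocate() {
        objects_.push_back(std::make_unique<Object>());
        return objects_.back().get();
    }

    // Marks everything reachable from the roots, then frees the rest.
    // Returns the number of objects that were freed.
    std::size_t collect(const std::vector<Object*>& roots) {
        // Marking: depth-first traversal starting from the roots.
        std::vector<Object*> pending(roots.begin(), roots.end());
        while (!pending.empty()) {
            Object* obj = pending.back();
            pending.pop_back();
            if (obj == nullptr || obj->marked) continue;
            obj->marked = true;
            for (Object* ref : obj->refs) pending.push_back(ref);
        }
        // Sweeping: keep marked objects, drop everything else.
        std::size_t freed = 0;
        std::vector<std::unique_ptr<Object>> survivors;
        for (auto& obj : objects_) {
            if (obj->marked) {
                obj->marked = false;  // reset for the next collection
                survivors.push_back(std::move(obj));
            } else {
                ++freed;  // destroyed when objects_ is replaced below
            }
        }
        objects_ = std::move(survivors);
        return freed;
    }

    std::size_t size() const { return objects_.size(); }

private:
    std::vector<std::unique_ptr<Object>> objects_;
};

// Demo graph: root -> a -> b, plus a cycle c <-> d that no root reaches.
std::size_t demo_unreachable_cycle() {
    Heap heap;
    Object* root = heap.allocate();
    Object* a = heap.allocate();
    Object* b = heap.allocate();
    Object* c = heap.allocate();
    Object* d = heap.allocate();
    root->refs.push_back(a);
    a->refs.push_back(b);
    c->refs.push_back(d);
    d->refs.push_back(c);
    return heap.collect({root});
}
```

Because the traversal only follows references outward from the roots, c and d are never marked even though they reference each other, so the sweep frees both.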
The most important advantage of the mark and sweep algorithm is that circular references no longer cause memory leaks:
The downside is that all application threads must be paused while garbage is collected. If they kept running, the references between objects would keep changing, and the collector could not reliably determine which objects are reachable. This situation is called a stop-the-world (STW) pause: the application is temporarily halted so that the JVM can clean up memory. Many things can trigger an STW pause, and garbage collection is the main one.
In this manual, we describe how garbage collection is implemented in the JVM and how to make efficient use of GC.
Next chapter: 2. Garbage collection in Java - GC reference manual
Original article: What is Garbage Collection?
Translator: Anchor http://blog.csdn.net/renfufei
Translation date: October 26, 2015