Overview
When garbage collection (GC) is mentioned, most people think of it as a by-product of the Java language. In fact, GC has a much longer history than Java: Lisp, born at MIT in 1960, was the first language to actually use dynamic memory allocation and garbage collection. While Lisp was still in its infancy, people were already thinking about the three questions a GC has to answer:
Which memory needs to be reclaimed?
When should it be reclaimed?
How should it be reclaimed?
After half a century of development, dynamic memory allocation and reclamation techniques are quite mature, and everything seems to have entered an era of "automation". Why, then, do we still need to understand GC and memory allocation? The answer is simple: when we need to troubleshoot memory overflows and memory leaks, and when garbage collection becomes the bottleneck that keeps a system from reaching higher concurrency, we need to monitor and tune these "automated" techniques as necessary.
Let us return from half a century ago to the present and to the familiar Java language. Chapter 2 introduced the parts of the Java runtime memory area. Three of them, the program counter, the virtual machine stack, and the native method stack, are created and destroyed together with their thread, and the stack frames in a stack are pushed and popped in an orderly way as methods are entered and exited. How much memory each stack frame needs is essentially known once the class structure is determined (the JIT compiler may apply some optimizations at run time, but in the conceptual model discussed in this chapter it can be regarded as known at compile time). Memory allocation and reclamation in these areas are therefore deterministic, and there is little need to think about reclamation there, because the memory is naturally reclaimed when the method or the thread ends. The Java heap and the method area are different: the implementation classes of one interface may need different amounts of memory, and the branches of one method may need different amounts of memory. Only while the program is running do we know which objects will be created, so allocation and reclamation of this memory are dynamic. This is the memory the garbage collector is concerned with, and "memory" allocation and reclamation in the rest of this book refer only to this part.
Is the object dead?
The heap holds almost all object instances in the Java world. Before reclaiming the heap, the first thing the garbage collector must do is determine which of these objects are still "alive" and which are "dead" (that is, objects that can no longer be used by any means).
Reference counting algorithm
Many textbooks describe the algorithm for deciding whether an object is alive like this: add a reference counter to the object; each time something references it, the counter is incremented by 1; when a reference becomes invalid, the counter is decremented by 1; an object whose counter is 0 at any moment can never be used again. I have interviewed many fresh graduates and some developers with years of experience, and this is the answer they gave to this question.
Objectively speaking, the reference counting algorithm is simple to implement and very efficient at making the decision, and in most cases it is a good algorithm. There are some well-known applications: Microsoft's COM (Component Object Model) technology, Flash Player with ActionScript 3, the Python language, and Squirrel, which is widely used in game scripting, all use reference counting for memory management. However, the Java language does not use reference counting to manage memory, chiefly because it has difficulty handling circular references between objects.
As a simple example, look at the testGC() method in the code below. The objects objA and objB each have a field named instance. After the assignments objA.instance = objB and objB.instance = objA, nothing else references the two objects, so in practice they can no longer be accessed. But because they reference each other, their reference counts never drop to 0, so a reference counting algorithm could not tell the GC collector to reclaim them. The code below shows this flaw of the reference counting algorithm:

public class ReferenceCountingGC {
    public Object instance = null;

    private static final int _1MB = 1024 * 1024;

    /**
     * The only purpose of this field is to occupy some memory so that
     * the GC log shows clearly whether the object was reclaimed.
     */
    private byte[] bigSize = new byte[2 * _1MB];

    public static void testGC() {
        ReferenceCountingGC objA = new ReferenceCountingGC();
        ReferenceCountingGC objB = new ReferenceCountingGC();
        objA.instance = objB;
        objB.instance = objA;

        objA = null;
        objB = null;

        // Suppose a GC happens on this line: can objA and objB be reclaimed?
        System.gc();
    }

    public static void main(String[] args) {
        testGC();
    }
}
Output:

[Full GC (System) [Tenured: 0K->210K(10240K), 0.0149142 secs] 4603K->210K(19456K), [Perm : 2999K->2999K(21248K)], 0.0150007 secs] [Times: user=0.01 sys=0.00, real=0.02 secs]
Heap
 def new generation   total 9216K, used 82K [0x00000000055e0000, 0x0000000005fe0000, 0x0000000005fe0000)
  eden space 8192K,   1% used [0x00000000055e0000, 0x00000000055f4850, 0x0000000005de0000)
  from space 1024K,   0% used [0x0000000005de0000, 0x0000000005de0000, 0x0000000005ee0000)
  to   space 1024K,   0% used [0x0000000005ee0000, 0x0000000005ee0000, 0x0000000005fe0000)
 tenured generation   total 10240K, used 210K [0x0000000005fe0000, 0x00000000069e0000, 0x00000000069e0000)
   the space 10240K,   2% used [0x0000000005fe0000, 0x0000000006014a18, 0x0000000006014c00, 0x00000000069e0000)
 compacting perm gen  total 21248K, used 3016K [0x00000000069e0000, 0x0000000007ea0000, 0x000000000bde0000)
   the space 21248K,  14% used [0x00000000069e0000, 0x0000000006cd2398, 0x0000000006cd2400, 0x0000000007ea0000)
No shared spaces configured.
The GC log contains "4603K->210K", which shows that the virtual machine did reclaim the two objects even though they reference each other. This indirectly shows that the virtual machine does not use reference counting to decide whether an object is alive.
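To make the flaw concrete, here is a minimal sketch of a toy reference counter. It is only a model for illustration and not how any JVM manages memory; the names RefCountDemo, Node, newLocal, setField, and release are all invented for this example.

```java
// Toy model of reference counting; NOT how the JVM manages memory.
// All names here (RefCountDemo, Node, newLocal, setField, release) are hypothetical.
final class RefCountDemo {
    static final class Node {
        int refCount = 0;   // number of references currently pointing at this node
        Node instance;      // single outgoing reference, like the `instance` field
    }

    // Simulates `Node n = new Node()`: one local variable now points at the node.
    static Node newLocal() {
        Node n = new Node();
        n.refCount = 1;
        return n;
    }

    // Simulates `holder.instance = target`, adjusting the counters involved.
    static void setField(Node holder, Node target) {
        if (target != null) target.refCount++;
        if (holder.instance != null) release(holder.instance);
        holder.instance = target;
    }

    // Drops one reference; a node whose count reaches zero releases what it points to.
    static void release(Node n) {
        if (--n.refCount == 0 && n.instance != null) {
            Node child = n.instance;
            n.instance = null;
            release(child);
        }
    }

    // Two nodes referencing each other: returns {countA, countB} after both locals are cleared.
    static int[] cycleCounts() {
        Node a = newLocal();
        Node b = newLocal();
        setField(a, b);   // b.refCount == 2
        setField(b, a);   // a.refCount == 2
        release(a);       // models objA = null
        release(b);       // models objB = null
        return new int[] { a.refCount, b.refCount };
    }

    // Same steps without the back reference, for comparison.
    static int[] chainCounts() {
        Node a = newLocal();
        Node b = newLocal();
        setField(a, b);   // b.refCount == 2
        release(a);       // a reaches 0 and releases b
        release(b);       // b reaches 0
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] cycle = cycleCounts();
        int[] chain = chainCounts();
        System.out.println("cycle: a=" + cycle[0] + " b=" + cycle[1]); // cycle: a=1 b=1
        System.out.println("chain: a=" + chain[0] + " b=" + chain[1]); // chain: a=0 b=0
    }
}
```

In the cyclic case both counters stay at 1 after the local variables are cleared, so a pure counting scheme would never free the pair, while the simple chain drops to 0 as expected.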
Root search algorithm
In mainstream commercial programming languages (Java and C#, and even the venerable Lisp mentioned earlier), the root search algorithm (GC Roots Tracing) is used to determine whether an object is alive. The basic idea is to take a set of objects called "GC Roots" as starting points and search downward from them. The paths travelled by the search are called reference chains. When no reference chain connects an object to the GC Roots (in graph theory terms, the object is unreachable from the GC Roots), the object is proven unusable. As shown in Figure 3-1, object 5, object 6, and object 7 are associated with each other, but they are unreachable from the GC Roots, so they are judged to be reclaimable.
In the Java language, the following can serve as GC Roots:
Objects referenced from the virtual machine stack (the local variable table in a stack frame).
Objects referenced by static class fields in the method area.
Objects referenced by constants in the method area.
Objects referenced by JNI (that is, native methods in the usual sense) in the native method stack.
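The search itself can be sketched as an ordinary graph traversal. The example below is a toy model only: ReachabilityDemo, Obj, and the hand-built graph are hypothetical, and the exact edges are an assumption based loosely on the description of objects 5, 6, and 7 above. A real collector walks actual heap references rather than explicit lists.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy reachability analysis over a hand-built object graph (hypothetical names).
final class ReachabilityDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Returns the sorted names of every object reachable from the given roots.
    static Set<String> reachableNames(List<Obj> roots) {
        Set<Obj> seen = new HashSet<>();
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (seen.add(current)) {          // first visit: follow its references
                pending.addAll(current.refs);
            }
        }
        Set<String> names = new TreeSet<>();
        for (Obj o : seen) names.add(o.name);
        return names;
    }

    // Builds an assumed graph: object1 is the only root; objects 5, 6, 7 form a cycle.
    static Set<String> demo() {
        Obj[] o = new Obj[8];
        for (int i = 1; i <= 7; i++) o[i] = new Obj("object" + i);
        o[1].refs.add(o[2]);
        o[1].refs.add(o[3]);
        o[3].refs.add(o[4]);
        o[5].refs.add(o[6]);
        o[6].refs.add(o[7]);
        o[7].refs.add(o[5]);
        return reachableNames(Collections.singletonList(o[1]));
    }

    public static void main(String[] args) {
        System.out.println("reachable: " + demo());
        // reachable: [object1, object2, object3, object4]
    }
}
```

Objects 5, 6, and 7 reference one another but never appear in the result, which is exactly why a tracing collector is not confused by cycles.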
References revisited
Whether reference counting is used to count an object's references or the root search algorithm is used to decide whether a reference chain reaches the object, deciding whether an object is alive always comes down to "references". Before JDK 1.2, Java defined references in the traditional way: if the value stored in a piece of reference-typed data is the starting address of another block of memory, this piece of memory is said to represent a reference. This definition is pure but too narrow: under it an object is either referenced or not referenced, and it cannot describe objects that are of little use to keep yet a pity to discard. We would like to describe a class of objects that are kept in memory while space is sufficient but can be discarded if memory is still tight after a garbage collection. Many cache features in systems fit this scenario.
Since JDK 1.2, Java has extended the concept of a reference into four kinds: strong references, soft references, weak references, and phantom references. Their strength decreases in that order.
A strong reference is the kind that is ubiquitous in program code, such as "Object obj = new Object()". As long as a strong reference exists, the garbage collector will never reclaim the referenced object.
Soft references describe objects that are still useful but not essential. Before the system is about to throw an out-of-memory error, objects associated only with soft references are included in the scope of a second collection. If that collection still does not free enough memory, the out-of-memory error is thrown. Since JDK 1.2, the SoftReference class has been provided to implement soft references.
Weak references also describe non-essential objects, but they are weaker than soft references: an object associated only with weak references survives only until the next garbage collection. When the garbage collector runs, such objects are reclaimed whether or not memory is currently sufficient. Since JDK 1.2, the WeakReference class has been provided to implement weak references.
A phantom reference, also called a ghost reference, is the weakest reference relationship. Whether an object has a phantom reference has no effect at all on its lifetime, and an object instance cannot be obtained through a phantom reference. The only purpose of associating a phantom reference with an object is to receive a system notification when the object is reclaimed by the collector. Since JDK 1.2, the PhantomReference class has been provided to implement phantom references.
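The three reference classes live in the java.lang.ref package. The sketch below shows only what get() returns while a strong reference still exists, which is deterministic; what happens after a collection depends on the collector, and System.gc() is only a request. The class name ReferenceKindsDemo and its method are invented for this example.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Hypothetical demo class; only the java.lang.ref types are real JDK API.
final class ReferenceKindsDemo {
    // Returns {soft.get() == target, weak.get() == target, phantom.get() == null}
    // while `target` is still strongly referenced by this method.
    static boolean[] whileStronglyHeld() {
        Object target = new Object();                       // strong reference
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(target);
        WeakReference<Object> weak = new WeakReference<>(target);
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);
        return new boolean[] {
            soft.get() == target,    // soft reference still yields the object
            weak.get() == target,    // weak reference still yields the object
            phantom.get() == null    // a phantom reference never yields its referent
        };
    }

    public static void main(String[] args) {
        boolean[] r = whileStronglyHeld();
        System.out.println("soft=" + r[0] + " weak=" + r[1] + " phantomIsNull=" + r[2]);

        // Dropping the strong reference makes the object eligible for collection.
        // Whether the weak reference is already cleared here depends on the VM.
        Object target = new Object();
        WeakReference<Object> weak = new WeakReference<>(target);
        target = null;
        System.gc();
        System.out.println("weak referent after gc request: " + weak.get());
    }
}
```

A cache built on SoftReference or WeakReference must therefore always handle get() returning null.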
To survive or to die?
Objects that are unreachable in the root search algorithm are not yet doomed; for the moment they are on "probation". To truly declare an object dead, it must go through at least two marking passes. If the root search finds that an object has no reference chain connecting it to the GC Roots, it is marked for the first time and filtered once, the criterion being whether it is necessary to run the object's finalize() method. If the object does not override finalize(), or finalize() has already been called by the virtual machine, the virtual machine treats both cases as "not necessary to run".
If the object is judged to need its finalize() method run, it is placed in a queue called the F-Queue and later executed by a low-priority Finalizer thread that the virtual machine creates automatically. "Executed" here means that the virtual machine will trigger the method, but it does not promise to wait for it to finish. The reason is that if an object's finalize() method runs slowly or, in a more extreme case, loops forever, the other objects in the F-Queue could wait indefinitely, and the whole memory reclamation system could even collapse. The finalize() method is an object's last chance to escape death. Later the GC performs a second, small-scale marking of the objects in the F-Queue. If an object wants to save itself in finalize(), it only has to re-associate itself with any object on a reference chain, for example by assigning itself (the this keyword) to a class variable or to a member variable of some object. At the second marking it is then removed from the "about to be reclaimed" set. If the object has not escaped by then, it really is about to be reclaimed. Code listing 3-2 shows that an object's finalize() can be executed and the object can still survive.
public class FinalizeEscapeGC {
    public static FinalizeEscapeGC SAVE_HOOK = null;

    public void isAlive() {
        System.out.println("Yes, I am still alive!");
    }

    @Override
    protected void finalize() throws Throwable {
        super.finalize();
        System.out.println("Finalize method executed!");
        // Self-rescue: re-attach this object to a reference chain.
        FinalizeEscapeGC.SAVE_HOOK = this;
    }

    public static void main(String[] args) throws Throwable {
        SAVE_HOOK = new FinalizeEscapeGC();

        // The object rescues itself successfully the first time.
        SAVE_HOOK = null;
        System.gc();
        // The Finalizer thread has low priority, so pause 0.5 seconds to wait for it.
        Thread.sleep(500);
        if (SAVE_HOOK != null) {
            SAVE_HOOK.isAlive();
        } else {
            System.out.println("No, I am dead!");
        }

        // The code below is identical to the code above, but this time the rescue fails.
        SAVE_HOOK = null;
        System.gc();
        // The Finalizer thread has low priority, so pause 0.5 seconds to wait for it.
        Thread.sleep(500);
        if (SAVE_HOOK != null) {
            SAVE_HOOK.isAlive();
        } else {
            System.out.println("No, I am dead!");
        }
    }
}

Output:

Finalize method executed!
Yes, I am still alive!
No, I am dead!

As the output of code listing 3-2 shows, the finalize() method of the object referenced by SAVE_HOOK was indeed triggered by the GC collector, and the object escaped before being collected.
Another point worth noting is that the code contains two identical fragments, yet the first escapes and the second fails. This is because the system automatically calls any object's finalize() method at most once; when the object faces reclamation a second time, its finalize() method is not run again, so the self-rescue in the second fragment fails. It should be stressed that the description above of finalize() at an object's death may carry a tragic artistic flavor, but I do not encourage using this method to save objects. On the contrary, I suggest avoiding it as much as possible, because it is not a destructor as in C++ but a compromise made when Java was born to make the language easier for C++ programmers to accept. It is expensive to run, highly nondeterministic, and gives no guarantee about the order in which different objects are finalized. Some textbooks say it is suitable for work such as "closing external resources", but that is merely a way of consoling oneself about the method's purpose. Anything finalize() can do, try-finally or other approaches can do better and more promptly. You can forget entirely that this method exists in the Java language.
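As an illustration of the try-finally alternative, here is a minimal sketch. TrackedResource is a hypothetical stand-in for a file, socket, or connection; the second method uses try-with-resources, the Java 7 language feature built on the AutoCloseable interface, which is one of the "other approaches" one might choose.

```java
// CloseResourceDemo and TrackedResource are hypothetical names for illustration.
final class CloseResourceDemo {
    // Stand-in for an external resource such as a file or a socket.
    static final class TrackedResource implements AutoCloseable {
        boolean closed = false;

        String read() {
            if (closed) throw new IllegalStateException("already closed");
            return "data";
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Classic try-finally: close() runs as soon as the block exits,
    // whether it exits normally or through an exception.
    static boolean closedAfterTryFinally() {
        TrackedResource resource = new TrackedResource();
        try {
            resource.read();
        } finally {
            resource.close();
        }
        return resource.closed;
    }

    // try-with-resources: the compiler generates the equivalent close() call.
    static boolean closedAfterTryWithResources() {
        TrackedResource tracked = new TrackedResource();
        try (TrackedResource resource = tracked) {
            resource.read();
        }
        return tracked.closed;
    }

    public static void main(String[] args) {
        System.out.println("try-finally closed: " + closedAfterTryFinally());               // true
        System.out.println("try-with-resources closed: " + closedAfterTryWithResources()); // true
    }
}
```

In both forms the resource is released at a known point in the program, with no dependence on when or whether the collector runs.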
Reclaiming the method area
Many people believe the method area (or the permanent generation in the HotSpot virtual machine) is not garbage collected. The Java Virtual Machine Specification does say that a virtual machine is not required to implement garbage collection in the method area, and the "cost-effectiveness" of collecting the method area is generally low: in the heap, especially in the young generation, one garbage collection in a typical application can usually reclaim 70% to 95% of the space, whereas collection of the permanent generation is far less efficient.
Garbage collection of the permanent generation mainly reclaims two things: obsolete constants and useless classes. Reclaiming obsolete constants is very similar to reclaiming objects in the Java heap. Take the reclamation of a literal in the constant pool as an example: suppose the string "abc" has entered the constant pool, but no String object in the system currently has the value "abc"; in other words, no String object references the "abc" constant in the constant pool, and nothing else references this literal either. If memory reclamation happens at this point and it is necessary, the "abc" constant will be "asked" to leave the constant pool. Symbolic references to other classes (interfaces), methods, and fields in the constant pool are handled similarly. Deciding whether a constant is "obsolete" is fairly simple, but the conditions for deciding that a class is "useless" are much stricter. A class must satisfy all of the following 3 conditions to be considered a "useless class":
All instances of the class have been reclaimed; that is, no instance of the class exists in the Java heap.
The ClassLoader that loaded the class has been reclaimed.
The java.lang.Class object corresponding to the class is not referenced anywhere, so the class's methods cannot be accessed anywhere through reflection.
A virtual machine may reclaim useless classes that meet the 3 conditions above, but this is only "may": unlike objects, classes are not necessarily reclaimed once they are no longer used. Whether classes are reclaimed is controlled in the HotSpot virtual machine by the -Xnoclassgc option, and -verbose:class, -XX:+TraceClassLoading, and -XX:+TraceClassUnloading can be used to view class loading and unloading information.
In scenarios that make heavy use of reflection, dynamic proxies, and bytecode frameworks such as CGLib, or that dynamically generate JSPs or use frequently customized ClassLoaders as OSGi does, the virtual machine needs to be able to unload classes so that the permanent generation does not overflow.
Garbage collection algorithms
Because implementing garbage collection algorithms involves a great many program details, and each platform's virtual machine manipulates memory differently, this section does not discuss algorithm implementation in depth; it only introduces the ideas behind several algorithms and how they developed.
Mark-sweep algorithm
The most basic collection algorithm is the "mark-sweep" algorithm. As its name suggests, it has two phases, "mark" and "sweep": first all objects that need to be reclaimed are marked, and once marking is complete all marked objects are reclaimed together. The marking process was essentially covered in the previous section on determining whether an object is dead. It is called the most basic algorithm because the later collection algorithms are all based on this idea and improve on its shortcomings. It has two main shortcomings. One is efficiency: neither marking nor sweeping is very efficient. The other is space: sweeping leaves behind a large number of non-contiguous memory fragments, and too much fragmentation may mean that when the program later needs to allocate a large object, no sufficiently large contiguous block can be found and another garbage collection has to be triggered early. The mark-sweep process is shown in Figure 3-2.
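The fragmentation problem can be seen in a toy model. In the sketch below the heap is a string in which each character is one memory cell owned by the object with that letter and '.' is a free cell; the result of the mark phase is simply passed in as the set of live letters. MarkSweepDemo and this string representation are assumptions made for illustration, not how a real collector stores its heap.

```java
// Toy sweep phase over a string-encoded heap (hypothetical model).
final class MarkSweepDemo {
    // Frees every cell whose owner is not in `live`; surviving cells stay where they are.
    static String sweep(String heap, String live) {
        StringBuilder out = new StringBuilder();
        for (char cell : heap.toCharArray()) {
            out.append(live.indexOf(cell) >= 0 ? cell : '.');
        }
        return out.toString();
    }

    // Total number of free cells.
    static int totalFree(String heap) {
        int free = 0;
        for (char cell : heap.toCharArray()) {
            if (cell == '.') free++;
        }
        return free;
    }

    // Length of the longest run of contiguous free cells.
    static int largestFreeRun(String heap) {
        int best = 0;
        int run = 0;
        for (char cell : heap.toCharArray()) {
            run = (cell == '.') ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String after = sweep("AABBBCCDDDEE", "ACE");   // B and D were found dead
        System.out.println(after);                      // AA...CC...EE
        System.out.println(totalFree(after));           // 6
        System.out.println(largestFreeRun(after));      // 3
    }
}
```

Six cells are free after the sweep, yet an object needing four contiguous cells cannot be placed, which is the space problem described above.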
Copying algorithm
To solve the efficiency problem, a collection algorithm called "copying" appeared. It divides the available memory by capacity into two blocks of equal size and uses only one of them at a time. When that block is used up, the surviving objects are copied to the other block and the used block is then cleaned up in one go. Each collection therefore reclaims one whole half of memory, and allocation does not have to deal with fragmentation or other complications: it only has to move the heap-top pointer and allocate memory in order, which is simple to implement and efficient to run. The cost of this algorithm is that usable memory shrinks to half of the original, which is rather high. The copying process is shown in Figure 3-3.
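A toy version of the copy step might look like the sketch below. Each character of the string is one memory cell owned by the object with that letter, '.' is a free cell, and the set of live letters is given as input. CopyingDemo and this encoding are hypothetical and exist only to illustrate the idea.

```java
import java.util.Arrays;

// Toy semi-space copy over a string-encoded half of the heap (hypothetical model).
final class CopyingDemo {
    // Copies live cells from fromSpace into an empty toSpace of the same size,
    // packing them from the start with a bump pointer.
    static String copyLive(String fromSpace, String live) {
        char[] toSpace = new char[fromSpace.length()];
        Arrays.fill(toSpace, '.');
        int top = 0;                               // bump pointer into toSpace
        for (char cell : fromSpace.toCharArray()) {
            if (cell != '.' && live.indexOf(cell) >= 0) {
                toSpace[top++] = cell;
            }
        }
        return new String(toSpace);
    }

    // Position where the next allocation would go: the first free cell.
    static int allocationPointer(String space) {
        int top = space.indexOf('.');
        return top < 0 ? space.length() : top;
    }

    public static void main(String[] args) {
        String toSpace = copyLive("AABBBC", "AC");       // B was found dead
        System.out.println(toSpace);                      // AAC...
        System.out.println(allocationPointer(toSpace));   // 3
    }
}
```

After the copy, the old half can be wiped as a whole and all free memory in the new half is one contiguous run, so allocation is just a matter of advancing the pointer.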
Today's commercial virtual machines all use the copying algorithm to collect the young generation. Dedicated research by IBM shows that 98% of young-generation objects die soon after they are created, so memory need not be divided 1:1. Instead, it is divided into one larger Eden space and two smaller Survivor spaces, and Eden plus one of the Survivor spaces is used at a time. During collection, the objects still alive in Eden and the Survivor space in use are copied in one pass into the other Survivor space, and then Eden and the Survivor space just used are cleaned up. The HotSpot virtual machine's default Eden to Survivor size ratio is 8:1, so the usable space in the young generation is 90% (80% + 10%) of its total capacity, and only 10% of the memory is "wasted". Of course, the 98% figure comes from typical scenarios; there is no guarantee that no more than 10% of objects survive each collection. When the Survivor space is not enough, other memory (here the old generation) must be relied on as an allocation guarantee (Handle Promotion).
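The 8:1 arithmetic can be checked with a short calculation. The sketch below assumes a young generation of 10240K and ignores the alignment adjustments a real virtual machine applies; YoungGenLayoutDemo and its method are hypothetical names. The ratio parameter plays the role of HotSpot's -XX:SurvivorRatio option.

```java
// Back-of-the-envelope young generation layout (hypothetical helper, no alignment).
final class YoungGenLayoutDemo {
    // Splits youngKb into Eden plus two Survivor spaces with Eden:Survivor = ratio:1.
    // Returns {edenKb, survivorKb, usableKb}, where usable = Eden + one Survivor.
    static long[] layout(long youngKb, int survivorRatio) {
        long survivor = youngKb / (survivorRatio + 2);   // ratio parts Eden + 2 parts Survivor
        long eden = youngKb - 2 * survivor;
        long usable = eden + survivor;
        return new long[] { eden, survivor, usable };
    }

    public static void main(String[] args) {
        long[] l = layout(10240, 8);
        System.out.println("eden=" + l[0] + "K survivor=" + l[1] + "K usable=" + l[2] + "K");
        // eden=8192K survivor=1024K usable=9216K
        System.out.println("usable share: " + (100 * l[2] / 10240) + "%");   // usable share: 90%
    }
}
```

These numbers agree with the heap summary in the GC log shown earlier in this chapter, which reports an eden space of 8192K, a from space of 1024K, and a young generation total of 9216K.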
The memory allocation guarantee is like borrowing money from a bank: if we have a good reputation and repay on time in 98% of cases, the bank may assume we will repay the next loan on time too. All it needs is a guarantor who promises that if we cannot repay, the money can be deducted from his account, and then the bank considers the loan risk-free. The allocation guarantee works the same way: if the other Survivor space does not have enough room for the objects that survived the previous young-generation collection, those objects enter the old generation directly through the allocation guarantee mechanism. The allocation guarantee for the young generation is explained in detail later in this chapter, in the discussion of the garbage collector's execution rules.
Mark-compact algorithm
The copying algorithm has to perform more copy operations when the object survival rate is high, and its efficiency drops. More importantly, if we do not want to waste 50% of the space, extra space is needed as an allocation guarantee to cope with the extreme case in which 100% of the objects in the used memory survive. For this reason the old generation generally cannot use this algorithm directly.
Based on the characteristics of the old generation, another algorithm was proposed, "mark-compact". The marking process is the same as in the "mark-sweep" algorithm, but the next step is not to clean up the reclaimable objects directly. Instead, all surviving objects are moved toward one end, and then the memory beyond the end boundary is cleaned up directly. The "mark-compact" algorithm is shown in Figure 3-4.
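The compaction step can be sketched with a toy model in which the heap is a string, each character is one memory cell owned by the object with that letter, '.' is a free cell, and the mark result is given as the set of live letters. MarkCompactDemo and this encoding are hypothetical and serve only as an illustration.

```java
// Toy compaction phase over a string-encoded heap (hypothetical model).
final class MarkCompactDemo {
    // Slides every live cell toward the start of the heap, keeping the original order,
    // then treats everything past the last live cell as free.
    static String compact(String heap, String live) {
        StringBuilder out = new StringBuilder();
        for (char cell : heap.toCharArray()) {
            if (cell != '.' && live.indexOf(cell) >= 0) {
                out.append(cell);
            }
        }
        while (out.length() < heap.length()) {
            out.append('.');
        }
        return out.toString();
    }

    // Length of the longest run of contiguous free cells.
    static int largestFreeRun(String heap) {
        int best = 0;
        int run = 0;
        for (char cell : heap.toCharArray()) {
            run = (cell == '.') ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String after = compact("AABBBCCDDDEE", "ACE");   // B and D were found dead
        System.out.println(after);                        // AACCEE......
        System.out.println(largestFreeRun(after));        // 6
    }
}
```

All six free cells now form a single contiguous block, at the price of moving the surviving objects.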
Generational collection algorithm
Garbage collection in current commercial virtual machines uses the "generational collection" algorithm. It contains no new idea; it simply divides memory into several blocks according to how long objects live. The Java heap is generally divided into the young generation and the old generation, so that the most suitable collection algorithm can be chosen for the characteristics of each. In the young generation, each garbage collection finds that a large number of objects have died and only a few survive, so the copying algorithm is chosen: the collection completes at the cost of copying only a few surviving objects. In the old generation, the object survival rate is high and there is no extra space to serve as an allocation guarantee, so the "mark-sweep" or "mark-compact" algorithm must be used.
Garbage collector
If collection algorithms are the methodology of memory reclamation, garbage collectors are its concrete implementation. The Java Virtual Machine Specification does not prescribe how a garbage collector should be implemented, so the collectors offered by different vendors and different versions of virtual machines may differ greatly, and they generally provide parameters that let users combine collectors for each generation according to the characteristics and requirements of their applications. The collectors discussed here are those of the Sun HotSpot virtual machine version 1.6 Update 22, all of which are shown in Figure 3-5.
Java Garbage collector standard details and uses