JVM: On the Garbage Collection Mechanism

GC is probably one of the most exciting topics for Java programmers, and I believe every Java programmer has felt the urge to explore how it really works.

This blog post revolves around three questions:

1. Which memory needs to be recycled?
2. When is it recycled?
3. How is it recycled?

Which Memory Needs to Be Recycled

To answer the first question: objects that are no longer used need to be recycled, and classes that are no longer used may be recycled.

So how do we tell whether an object is no longer used? There are two main approaches.

Reference Counting Algorithm

Definition: attach a reference counter to each object. Whenever a new reference to the object is created, the counter is incremented by 1; whenever a reference becomes invalid, the counter is decremented by 1. An object whose counter is 0 at any moment can no longer be used.

As we can see, reference counting is simple to implement, and some garbage collectors do use it. Java virtual machines, however, do not use it for memory management, because one problem is hard to solve: circular references between objects.
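To make the circular-reference problem concrete, here is a minimal sketch that simulates a reference counter by hand. It is a toy model only, not how any JVM works; the names RefCountedNode, setNext, and RefCountDemo are made up for this illustration.

```java
// Toy model of reference counting. This is NOT how the JVM manages memory;
// all class and method names here are hypothetical.
class RefCountedNode {
    int refCount = 0;        // number of references currently pointing at this node
    RefCountedNode next;     // outgoing reference to another node

    // Point this node's `next` at `target`, adjusting the counters involved.
    void setNext(RefCountedNode target) {
        if (next != null) next.refCount--;
        next = target;
        if (target != null) target.refCount++;
    }
}

class RefCountDemo {
    // Builds two nodes that point at each other, drops the two simulated
    // local-variable references, and returns the counts that remain.
    static int[] countsAfterDroppingLocals() {
        RefCountedNode a = new RefCountedNode();
        RefCountedNode b = new RefCountedNode();
        a.refCount++;        // counts the local variable a
        b.refCount++;        // counts the local variable b

        a.setNext(b);        // b.refCount is now 2
        b.setNext(a);        // a.refCount is now 2

        a.refCount--;        // simulates "a = null"
        b.refCount--;        // simulates "b = null"
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingLocals();
        // Both counters stay at 1, so a pure reference counter never frees the pair.
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

Neither counter reaches 0 even though nothing outside the pair can reach the two nodes any more.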

Consider the following example:

class Node {
    Node next;
}

// Two nodes that reference each other
Node a = new Node();
Node b = new Node();

a.next = b;
b.next = a;

// Drop the only outside references to both nodes
a = null;
b = null;

In the code above, after the last two lines run, the two objects on the heap still reference each other. Their reference counts therefore never drop to 0, and a reference-counting GC cannot reclaim the memory of either object.

Reachability Analysis Algorithm

Languages such as Java and C# use this algorithm to decide whether an object is still alive.

Basic idea:

Start from a set of objects called "GC Roots" and search downward from these nodes. The paths followed by the search are called reference chains. When no reference chain connects an object to any GC Root, the object is unreachable and therefore no longer usable.
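The basic idea can be sketched as a plain graph traversal. This is a simplified model under stated assumptions: HeapObject and Reachability are hypothetical names, the "heap" is just a handful of Java objects, and a real collector works on raw memory rather than on lists like these.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a heap object: a name plus its outgoing references.
class HeapObject {
    final String name;
    final List<HeapObject> refs = new ArrayList<>();
    HeapObject(String name) { this.name = name; }
}

class Reachability {
    // Returns every object connected to at least one root by a reference chain.
    static Set<HeapObject> reachableFrom(List<HeapObject> roots) {
        Set<HeapObject> seen = new HashSet<>();
        Deque<HeapObject> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            HeapObject current = pending.pop();
            if (seen.add(current)) {            // first visit: follow its references
                pending.addAll(current.refs);
            }
        }
        return seen;
    }

    // Builds root -> a -> b plus an isolated cycle c <-> d, then traces from root.
    static Set<String> demoReachableNames() {
        HeapObject root = new HeapObject("root");
        HeapObject a = new HeapObject("a");
        HeapObject b = new HeapObject("b");
        HeapObject c = new HeapObject("c");
        HeapObject d = new HeapObject("d");
        root.refs.add(a);
        a.refs.add(b);
        c.refs.add(d);                          // c and d only reference each other
        d.refs.add(c);

        Set<String> names = new TreeSet<>();
        for (HeapObject o : reachableFrom(List.of(root))) {
            names.add(o.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demoReachableNames()); // prints [a, b, root]
    }
}
```

The cycle between c and d does not keep either of them alive, because no chain from the root leads to them. This is exactly the case that reference counting cannot handle.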

In the Java language, the objects that can serve as GC Roots include the following:

1. Objects referenced from the virtual machine stack (the local variable table in a stack frame).
2. Objects referenced by static fields or constants (final) in the method area.
3. Objects referenced by JNI (that is, native methods in general) in the native method stack.

Reclaiming the Method Area

The Java Virtual Machine Specification says that a virtual machine is not required to implement garbage collection in the method area, so many people believe that the method area is never garbage collected.

The main reason the specification does not require it is the low payoff. In the heap, and especially in the young generation, a single collection typically reclaims 70% to 95% of the space, whereas the yield from collecting the method area is far lower.

Even so, collecting the method area is not unnecessary. Scenarios that make heavy use of reflection, dynamic proxies, or frequently defined custom ClassLoaders need the virtual machine to be able to unload classes, so that the method area does not overflow.

Garbage collection in the method area mainly reclaims two things: obsolete constants and useless classes.

Determining and reclaiming obsolete constants is simple. Take the literal "abc" as an example: if no object in the system references this constant, and nothing else references the literal either (the blogger guesses that "elsewhere" means places in .class files that refer to the constant), then the constant is removed from the constant pool when a memory reclamation occurs. Symbolic references to other classes, interfaces, methods, and fields in the constant pool are handled in a similar way.

A useless class must satisfy all three of the following conditions:

1. No instances of the class exist.
2. The ClassLoader that loaded the class has been reclaimed (a demanding condition).
3. The Class object for the class is not referenced anywhere, so the class cannot be reached through reflection.

The virtual machine may reclaim useless classes that satisfy these three conditions.

Enumerating Root Nodes

In this section we consider a question about the reachability analysis algorithm: how do we find the GC Roots?

Many applications today have a method area of several hundred megabytes alone. Checking every reference in it one by one would inevitably take a lot of time.

To solve this problem, we first need the concept of exact memory management: the virtual machine knows what type of data is stored at each memory location. Building on this, HotSpot uses a set of data structures called OopMap to record where object references are stored in memory.

Typically, when a class finishes loading, HotSpot works out what type of data sits at each offset within the object. During JIT compilation (runtime optimization), it also records, at specific locations, which stack slots and registers hold references.

This lets the GC obtain the information directly when it scans.

References Revisited

References fall into four kinds: strong references, soft references (SoftReference), weak references (WeakReference), and phantom references (PhantomReference, also translated as "virtual references"). Their strength decreases in that order.
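The three non-strong kinds correspond to classes in java.lang.ref. The sketch below only demonstrates behavior that is deterministic: get() on a soft or weak reference returns the referent while a strong reference is still held, and get() on a phantom reference always returns null. The class and method names are made up for this example, and the System.gc() call at the end is only a request, so its outcome is deliberately not asserted.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKinds {
    // While a strong reference is held, soft and weak references still return the referent.
    static boolean referentVisibleWhileStronglyHeld() {
        Object strong = new Object();                        // ordinary strong reference
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        return soft.get() == strong && weak.get() == strong;
    }

    // A phantom reference never hands out its referent, even while the object is alive.
    static boolean phantomGetIsNull() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);
        return phantom.get() == null && strong != null;
    }

    public static void main(String[] args) {
        System.out.println(referentVisibleWhileStronglyHeld());
        System.out.println(phantomGetIsNull());

        // The referent below is only weakly reachable from the start.
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc(); // only a request; the JVM is free to ignore it
        // Often null after a collection, but this is not guaranteed.
        System.out.println("weak referent after GC request: " + weak.get());
    }
}
```

Note that each reference in main gets its own referent. If the same object were also held by a SoftReference, it would remain softly reachable and the weak reference would not be cleared.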

1. A strong reference is the ordinary kind of reference found in program code, such as "Object obj = new Object()". As long as a strong reference exists, the garbage collector never reclaims the referenced object.

2. Soft references describe objects that are useful but not essential. Before the system is about to throw an out-of-memory error, objects associated with soft references are added to the collection scope for a second collection. If that collection still does not free enough memory, the out-of-memory error is thrown. Since JDK 1.2, the SoftReference class has been provided to implement soft references.

3. Weak references also describe non-essential objects, but they are weaker than soft references: an object associated only with a weak reference survives only until the next garbage collection. When the garbage collector runs, such objects are reclaimed regardless of whether memory is currently sufficient. Since JDK 1.2, the WeakReference class has been provided to implement weak references.

4. A phantom reference, also called a virtual or ghost reference, is the weakest kind of reference. Whether an object has a phantom reference has no effect at all on its lifetime, and an object instance cannot be obtained through a phantom reference. The only purpose of associating a phantom reference with an object is to receive a system notification when the object is reclaimed by the collector. Since JDK 1.2, the PhantomReference class has been provided to implement phantom references.

When to Recycle

The finalize() Method

With the algorithms above, the virtual machine knows which objects in memory can be reclaimed. But when does it actually reclaim them? To answer that, we need to talk about the finalize() method.

When the reachability analysis algorithm judges an object to be garbage, the JVM does not reclaim it immediately. One reason is that garbage collection does not run in real time; the other is that the JVM must decide whether to run the object's finalize() method before actually reclaiming it.

An object judged to be garbage is first marked and then filtered. The filter criterion is whether the object needs to have its finalize() method executed.

How does the virtual machine decide whether finalize() needs to be executed?

In either of the following two cases, the virtual machine considers execution unnecessary:

1. The object does not override finalize().
2. finalize() has already been called by the virtual machine (it is called at most once per object).

If the object is judged to need finalize(), it is placed in a queue called the F-Queue. Later, a low-priority Finalizer thread created by the virtual machine executes the object's finalize() method. If an object's finalize() runs slowly or enters an infinite loop, the other objects in the F-Queue may wait forever, and the whole memory reclamation system may even collapse. The GC then marks the objects in the F-Queue a second time. If an object rescues itself in its own finalize() method before this second mark (by re-associating itself with any object on a reference chain), it survives and is removed from the set that is about to be reclaimed. If it has not escaped by then, it really is reclaimed.
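The two marking passes can be illustrated with a small deterministic simulation. This is a toy model and not the JVM's implementation: it does not call the real finalize() at all, the names ToyObject, TwoPassMarking, and rescuesItself are hypothetical, and the Finalizer thread is replaced by a simple loop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of an object that the reachability analysis found unreachable.
class ToyObject {
    final String name;
    final boolean overridesFinalize;  // does the class override finalize()?
    final boolean rescuesItself;      // does finalize() re-attach the object to a reference chain?
    boolean finalizeCalled = false;   // finalize() runs at most once per object
    boolean reachable = false;

    ToyObject(String name, boolean overridesFinalize, boolean rescuesItself) {
        this.name = name;
        this.overridesFinalize = overridesFinalize;
        this.rescuesItself = rescuesItself;
    }
}

class TwoPassMarking {
    // Simulates one collection over objects already judged unreachable.
    // Returns the names of the objects that end up reclaimed.
    static List<String> collect(List<ToyObject> garbageCandidates) {
        List<String> reclaimed = new ArrayList<>();
        Queue<ToyObject> fQueue = new ArrayDeque<>();

        // First mark and filter: does finalize() still need to run?
        for (ToyObject o : garbageCandidates) {
            if (o.overridesFinalize && !o.finalizeCalled) {
                fQueue.add(o);
            } else {
                reclaimed.add(o.name);
            }
        }

        // Stand-in for the Finalizer thread draining the F-Queue.
        List<ToyObject> finalized = new ArrayList<>();
        while (!fQueue.isEmpty()) {
            ToyObject o = fQueue.poll();
            o.finalizeCalled = true;
            if (o.rescuesItself) {
                o.reachable = true;   // the object saved itself
            }
            finalized.add(o);
        }

        // Second mark: anything still unreachable is reclaimed.
        for (ToyObject o : finalized) {
            if (!o.reachable) {
                reclaimed.add(o.name);
            }
        }
        return reclaimed;
    }
}
```

An object that rescues itself survives the first collection. If it later becomes unreachable again, finalizeCalled is already true, so the filter skips the F-Queue and the object is reclaimed without a second chance.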

Note: finalize() is expensive to run, its behavior is nondeterministic, and the order in which it is called on different objects is not guaranteed. The blogger recommends forgetting that this method exists in the Java language.

How to Recycle

Mark-Sweep Algorithm

The algorithm has two phases, mark and sweep: first mark all the objects that need to be reclaimed, then reclaim all marked objects in one pass once marking is complete. The marking phase uses the reachability analysis algorithm.

Drawbacks:
Efficiency: neither the mark phase nor the sweep phase is very efficient.
Space: mark-sweep leaves behind a large number of discontiguous memory fragments. With too much fragmentation, a later allocation of a large object may fail to find enough contiguous memory and trigger another garbage collection early.
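The fragmentation problem is easy to see in a toy heap made of fixed-size slots. This is a sketch under simplifying assumptions: every object occupies exactly one slot, the marking phase is assumed to have already produced the set of live objects, and MarkSweepHeap is a hypothetical name.

```java
import java.util.Set;

class MarkSweepHeap {
    // Each slot holds at most one unit-sized object; null means the slot is free.
    final String[] slots;

    MarkSweepHeap(String... objects) {
        this.slots = objects.clone();
    }

    // Sweep phase: free every slot whose object is not in the live set.
    // Survivors stay where they are; nothing is moved.
    void sweep(Set<String> live) {
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != null && !live.contains(slots[i])) {
                slots[i] = null;
            }
        }
    }

    int freeSlots() {
        int free = 0;
        for (String s : slots) {
            if (s == null) free++;
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    int largestFreeRun() {
        int best = 0;
        int run = 0;
        for (String s : slots) {
            run = (s == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        MarkSweepHeap heap = new MarkSweepHeap("A", "B", "C", "D", "E", "F", "G", "H");
        heap.sweep(Set.of("B", "D", "F", "H"));
        // prints "4 free, largest run 1"
        System.out.println(heap.freeSlots() + " free, largest run " + heap.largestFreeRun());
    }
}
```

Half of the heap is free after the sweep, yet an object that needs two adjacent slots cannot be allocated.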

(Figure: the mark and sweep process.)

Copying Algorithm

Divide the available memory into two blocks and use only one of them at a time. At garbage collection, copy the surviving objects from the block in use to the unused block, clear everything in the block that was in use, and swap the roles of the two blocks. That completes the collection.

If the system contains many garbage objects, the number of surviving objects that need to be copied is relatively small, so the copying algorithm is very efficient exactly when garbage collection is really needed. And because objects are copied together into the new memory space, the reclaimed memory is free of fragmentation. The drawback of this algorithm is that it cuts the usable memory of the system in half.
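A semispace collector along these lines might look like the following sketch. It is a toy model with unit-sized objects; SemiSpaceHeap and its fields are hypothetical names, and the set of live objects is passed in rather than computed by tracing.

```java
import java.util.Arrays;
import java.util.Set;

class SemiSpaceHeap {
    String[] fromSpace;   // the half currently used for allocation
    String[] toSpace;     // the idle half that receives survivors
    int top = 0;          // bump pointer: index of the next free slot in fromSpace

    SemiSpaceHeap(int halfSize) {
        fromSpace = new String[halfSize];
        toSpace = new String[halfSize];
    }

    // Allocation just bumps a pointer; it fails when the active half is full.
    boolean allocate(String name) {
        if (top == fromSpace.length) {
            return false;
        }
        fromSpace[top++] = name;
        return true;
    }

    // Copies survivors compactly into the idle half, clears the old half,
    // then swaps the roles of the two halves.
    void collect(Set<String> live) {
        int next = 0;
        for (int i = 0; i < top; i++) {
            if (live.contains(fromSpace[i])) {
                toSpace[next++] = fromSpace[i];
            }
        }
        Arrays.fill(fromSpace, null);
        String[] tmp = fromSpace;
        fromSpace = toSpace;
        toSpace = tmp;
        top = next;
    }
}
```

After collect, the survivors sit next to each other at the start of the active half, so the free space is one contiguous block. The cost is that only halfSize of the 2 * halfSize slots can ever hold objects.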

Today's commercial virtual machines use this algorithm to collect the young generation. We obviously cannot tolerate losing half of the memory. Fortunately, IBM research shows that 98% of the objects in the young generation die shortly after they are created, so memory does not need to be split in a 1:1 ratio.

Instead, memory is divided into one larger Eden space and two smaller Survivor spaces, and at any time Eden and one of the Survivor spaces are in use. In the HotSpot virtual machine, the default ratio of Eden to one Survivor space is 8:1.
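The arithmetic behind the 8:1 ratio can be worked through with a few lines of code. The sketch assumes the young generation consists of exactly one Eden and two equal Survivor spaces and ignores any rounding or alignment a real virtual machine may apply; YoungGenLayout and its method names are made up for this example. The ratio corresponds to HotSpot's -XX:SurvivorRatio option.

```java
class YoungGenLayout {
    // survivorRatio = size of Eden / size of ONE Survivor space.
    // The young generation is Eden plus two Survivor spaces,
    // so it is split into (survivorRatio + 2) equal parts.
    static long survivorSize(long youngSize, int survivorRatio) {
        return youngSize / (survivorRatio + 2);
    }

    static long edenSize(long youngSize, int survivorRatio) {
        return youngSize * survivorRatio / (survivorRatio + 2);
    }

    // Space available to the program at any moment: Eden plus one Survivor space.
    static long usableSize(long youngSize, int survivorRatio) {
        return edenSize(youngSize, survivorRatio) + survivorSize(youngSize, survivorRatio);
    }

    public static void main(String[] args) {
        long young = 100;  // any unit, for example megabytes
        System.out.println("Eden:     " + edenSize(young, 8));      // 80
        System.out.println("Survivor: " + survivorSize(young, 8));  // 10 (each of two)
        System.out.println("Usable:   " + usableSize(young, 8));    // 90
    }
}
```

With an 8:1 ratio only 10% of the young generation sits idle, compared with 50% for a plain 1:1 split.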

On each collection, the objects still alive in Eden and in the Survivor space in use are copied in one pass to the other Survivor space, and then Eden and the Survivor space that was just used are cleared.

Of course, the other Survivor space may turn out to be too small, in which case other memory has to provide an allocation guarantee. The allocation guarantee is discussed later.

Memory Allocation Strategy and Generational Collection

You may be wondering what the young generation is and what the allocation guarantee is.

The Java heap is divided into the young generation and the old generation according to the lifetimes of objects. The young generation is further divided into three regions: Eden, From Survivor, and To Survivor.

(Figure: rough layout of the heap memory.)

An object is born in Eden (or lives in one of the Survivor areas, assumed here to be the From area). After a Minor GC, if the object is still alive and fits in the other Survivor area (the To area, which must have enough space for the live objects from both Eden and From), the copying algorithm copies the surviving objects to the To area, Eden and the From area are then cleared, and the age of each copied object is set to 1. Every further Minor GC that an object survives in a Survivor area increases its age by 1. When its age reaches a certain value (15 by default, configurable through -XX:MaxTenuringThreshold), the object moves to the old generation (long-lived objects enter the old generation). This is not the only route, though: larger objects, meaning those that need a large contiguous block of memory, go directly into the old generation (controlled by the -XX:PretenureSizeThreshold parameter).
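The aging rule described above can be written down as two small decision functions. This is a sketch of the rule as the text states it, not HotSpot's code: TenuringPolicy and its methods are hypothetical, a threshold of 0 for large objects is treated here as "no limit", and the case where the To area overflows is ignored.

```java
class TenuringPolicy {
    enum Space { EDEN, SURVIVOR, OLD }

    // Where a freshly allocated object goes.
    // pretenureSizeThreshold == 0 is treated as "no limit" in this sketch.
    static Space placeNewObject(long sizeBytes, long pretenureSizeThreshold) {
        if (pretenureSizeThreshold > 0 && sizeBytes > pretenureSizeThreshold) {
            return Space.OLD;       // large object: allocated directly in the old generation
        }
        return Space.EDEN;
    }

    // Where an object lives after surviving the given number of Minor GCs.
    static Space afterMinorGcs(int survivedMinorGcs, int maxTenuringThreshold) {
        if (survivedMinorGcs == 0) {
            return Space.EDEN;      // not collected yet, still where it was born
        }
        int age = survivedMinorGcs; // age is 1 after the first Minor GC, then +1 per GC
        return age >= maxTenuringThreshold ? Space.OLD : Space.SURVIVOR;
    }

    public static void main(String[] args) {
        System.out.println(afterMinorGcs(1, 15));    // SURVIVOR
        System.out.println(afterMinorGcs(14, 15));   // SURVIVOR
        System.out.println(afterMinorGcs(15, 15));   // OLD
        System.out.println(placeNewObject(4L << 20, 3L << 20)); // OLD: 4 MB exceeds a 3 MB threshold
    }
}
```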

Minor GC (young generation GC): in the young generation, every collection finds that most objects have died and only a few survive, so the copying algorithm is chosen; the collection completes for just the cost of copying the few survivors. Minor GCs are very frequent and very fast.

Full GC / Major GC (old generation GC): objects in the old generation have a high survival rate and there is no extra space to act as an allocation guarantee for it, so it must be collected with the "mark-sweep" or "mark-compact" algorithm. It is much slower than a Minor GC and happens far less often.

A side note: dynamic object age determination

To adapt better to the memory behavior of different programs, the virtual machine does not always require an object's age to reach MaxTenuringThreshold before promoting it to the old generation. If the total size of the objects of the same age in Eden and the From Survivor space is greater than half of the To Survivor space, objects of that age or older can be promoted directly, without waiting for the age required by MaxTenuringThreshold.
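As a sketch, the rule as stated here can be turned into a function that computes the effective promotion age. This follows the wording of the text literally and is a simplification; the actual computation inside a real collector may differ in detail. DynamicAge, bytesByAge, and effectiveThreshold are hypothetical names.

```java
class DynamicAge {
    // bytesByAge[n] = total size of surviving objects whose age is n.
    // Returns the age at or above which objects are promoted in this collection.
    static int effectiveThreshold(long[] bytesByAge, long toSurvivorCapacity, int maxTenuringThreshold) {
        for (int age = 1; age < bytesByAge.length && age < maxTenuringThreshold; age++) {
            // "greater than half of the To Survivor space", written without integer division
            if (bytesByAge[age] * 2 > toSurvivorCapacity) {
                return age;
            }
        }
        return maxTenuringThreshold;  // no age group is large enough: the fixed threshold applies
    }

    public static void main(String[] args) {
        long[] sizes = { 0, 100, 600, 50 };   // objects of age 2 occupy 600 units
        // With a To Survivor space of 1000 units, 600 exceeds half of it,
        // so objects of age 2 and older are promoted: prints 2
        System.out.println(effectiveThreshold(sizes, 1000, 15));
    }
}
```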

(The behavior described here was verified in a JDK 1.6 environment.)

Space Allocation Guarantee

As mentioned earlier, during a young generation GC the objects that survive in Eden and in the Survivor space in use are copied in one pass to the other Survivor space, and then Eden and the Survivor space that was just used are cleared.

But the other Survivor space may not be large enough, and then a guarantee is needed.

In fact, before a Minor GC occurs, the virtual machine checks whether the largest contiguous free space in the old generation is greater than the total size of all objects in the young generation. If it is, the Minor GC is guaranteed to be safe. If it is not, the virtual machine checks whether the HandlePromotionFailure setting allows the allocation guarantee mechanism to be used. If HandlePromotionFailure=true, meaning the mechanism is enabled, the virtual machine goes on to check whether the largest contiguous free space in the old generation is greater than the average size of the objects promoted to the old generation in previous collections. If it is, a Minor GC is attempted, even though this Minor GC is risky. If it is smaller, or if HandlePromotionFailure=false, a Full GC is performed instead.
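The checks in the paragraph above form a small decision tree, sketched below. The class, enum, and parameter names are made up for this illustration, and the function only mirrors the description in the text rather than any actual virtual machine source.

```java
class PromotionGuarantee {
    enum Decision { SAFE_MINOR_GC, RISKY_MINOR_GC, FULL_GC }

    static Decision beforeMinorGc(long oldGenMaxContiguousFree,
                                  long youngGenTotalObjectSize,
                                  long averagePromotedSize,
                                  boolean handlePromotionFailure) {
        // Even if every young object survives, the old generation can absorb all of them.
        if (oldGenMaxContiguousFree > youngGenTotalObjectSize) {
            return Decision.SAFE_MINOR_GC;
        }
        // Otherwise gamble on history: compare against the average promoted size.
        if (handlePromotionFailure && oldGenMaxContiguousFree > averagePromotedSize) {
            return Decision.RISKY_MINOR_GC;
        }
        // Not enough room even for the average, or the guarantee is disabled.
        return Decision.FULL_GC;
    }

    public static void main(String[] args) {
        System.out.println(beforeMinorGc(500, 300, 50, false)); // SAFE_MINOR_GC
        System.out.println(beforeMinorGc(200, 300, 50, true));  // RISKY_MINOR_GC
        System.out.println(beforeMinorGc(200, 300, 50, false)); // FULL_GC
        System.out.println(beforeMinorGc(40, 300, 50, true));   // FULL_GC
    }
}
```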

The Minor GC remains risky because the young generation uses the copying algorithm. If a large number of objects survive a Minor GC (in the extreme case, every object in the young generation is still alive after the collection) while the Survivor space is relatively small, the old generation has to act as guarantor and take in the objects that the Survivor space cannot hold. The old generation can only do so if it has enough room for them, but how many objects will survive a collection cannot be known in advance. So the average size of the objects promoted to the old generation in previous collections is taken as a reference value and compared with the space remaining in the old generation, to decide whether a Full GC is needed to free more space there.

Using an average is still a matter of probability, though. If the number of surviving objects spikes after some Minor GC, far above the average, the guarantee fails. When that happens, a Full GC has to be performed after the failure. Although this can happen, the guarantee succeeds most of the time, which avoids running Full GCs too frequently.

Further Reading

"In-Depth Understanding of the Java Virtual Machine", Zhou Zhiming

Reading notes on "In-Depth Understanding of the Java Virtual Machine": garbage collection algorithms

Java GC, young generation, old generation

Space allocation guarantee
