How Java GC works

Source: Internet
Author: User

Basic Principles of GC
What is GC? Why does GC exist?
GC stands for garbage collection. Memory management is an area where programmers are prone to mistakes: forgetting to release memory, or releasing it incorrectly, can make a program or system unstable or even crash it. The GC facility provided by Java automatically monitors whether an object has gone out of scope and reclaims its memory when it has. The Java language provides no explicit operation for releasing allocated memory.
Therefore, Java memory management is really object management, covering both the allocation and the release of objects.
For programmers, the new keyword is used to allocate objects. To release an object, they only need to set every reference to it to null so that the program can no longer access it; we call such an object "unreachable". GC reclaims the memory of all unreachable objects.
For GC, once a programmer creates an object, GC starts to track its address, size, and usage. Generally, GC records and manages all objects in the heap as a directed graph, and uses this graph to determine which objects are "reachable" and which are "unreachable". When GC determines that some objects are unreachable, it is responsible for reclaiming their memory. However, so that GC can be implemented on different platforms, many GC behaviors are not strictly defined in the Java specification. For example, there are no firm rules about which collection algorithm to use or when to collect. As a result, different JVM implementers often choose different algorithms, which brings a good deal of uncertainty to Java development. This article examines several GC-related issues and tries to reduce the negative impact of that uncertainty on Java programs.
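
The directed-graph view of reachability can be illustrated with a toy model. This is only a sketch of the idea, not how any real JVM stores its object graph; the `Node` class and the `reachable` and `demo` helpers are hypothetical names invented for the illustration. It also shows why two objects that reference each other are still garbage when no root leads to them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ReachabilitySketch {
    // A toy "object": a name plus outgoing references (edges of the directed graph).
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Return the sorted names of every node that can be reached from the given roots.
    static Set<String> reachable(List<Node> roots) {
        Set<Node> seen = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (seen.add(n)) {          // first visit: follow its outgoing references
                pending.addAll(n.refs);
            }
        }
        Set<String> names = new TreeSet<>();
        for (Node n : seen) names.add(n.name);
        return names;
    }

    // Build a small graph: root -> a -> b, plus an isolated cycle c <-> d.
    static Set<String> demo() {
        Node root = new Node("root"), a = new Node("a"), b = new Node("b");
        Node c = new Node("c"), d = new Node("d");
        root.refs.add(a);
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c); // c and d reference each other, but no root reaches them
        return reachable(Collections.singletonList(root));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b, root]
    }
}
```

In this model, `c` and `d` never appear in the result, so a collector working from the graph would treat both as unreachable even though each still holds a reference to the other.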

Incremental GC

GC is usually implemented by one thread or a group of threads inside the JVM. Like the user program, it occupies heap space and consumes CPU time while it runs. The application stops while the GC is running, so when a GC run takes a long time the user can feel the Java program pause. On the other hand, if GC runs only briefly, the recovery rate may be too low: many objects that should be reclaimed are not, and they keep occupying a large amount of memory. A GC design therefore has to trade off pause time against recovery rate. A good GC implementation lets users choose the settings they need. For example, devices with limited memory are very sensitive to memory usage and want GC to reclaim memory precisely, without caring much about slowdown, whereas real-time online games cannot tolerate long interruptions. Incremental GC uses its collection algorithm to split one long pause into many short ones, which reduces the impact of GC on the user program. Although the overall throughput of incremental GC may be lower than that of ordinary GC, it reduces the maximum pause time of the program.
The HotSpot JVM provided with the Sun JDK supports incremental GC, but does not use it by default. To enable it, add the -Xincgc option when running the Java program. HotSpot's incremental GC is implemented with the train algorithm (Train GC). The basic idea is to group (stratify) all objects in the heap by creation time and usage, to put frequently used and mutually related objects in the same queue, and to keep adjusting the groups as the program runs. When GC runs, it always reclaims the oldest (least recently accessed) objects first; if an entire group is garbage, GC reclaims the whole group. In this way, each GC run reclaims only a portion of the unreachable objects, which keeps the program running smoothly.

Why divide into generations?

The generational garbage collection strategy is based on the fact that different objects have different lifecycles. Objects with different lifecycles can therefore be collected in different ways to improve collection efficiency.
While a Java program runs, it creates a large number of objects. Some of them are tied to business information, such as HTTP session objects, threads, and socket connections; because they are directly linked to the business, their lifecycle is relatively long. Others are mainly temporary variables created as the program runs, and their lifecycle is short. String objects are an example: because String is immutable, the system creates a large number of them, and some are used only once before they can be reclaimed.
Imagine that objects were not distinguished by survival time and every garbage collection processed the entire heap. Each collection would take a long time, and because each one would have to traverse all surviving objects, the traversal would be wasted on long-lived objects: they may already have been traversed many times and are still alive. Generational garbage collection therefore takes a divide-and-conquer approach: it places objects with different lifecycles in different generations and collects each generation with the method that suits it best.

Generation Division

The virtual machine's memory is divided into three generations: the young generation, the old generation, and the permanent generation. The permanent generation mainly stores the class information of Java classes and has little to do with the Java objects that garbage collection reclaims. The division between the young and old generations has a large impact on garbage collection.

Young generation:

All newly created objects are first placed in the young generation. The goal of the young generation is to collect short-lived objects as quickly as possible. The young generation is divided into three zones: one Eden zone and two Survivor zones (in the usual configuration). Most objects are created in the Eden zone. When Eden is full, the surviving objects are copied to one of the two Survivor zones. When that Survivor zone is full, its surviving objects are copied to the other Survivor zone. When the second Survivor zone is also full, the objects that were copied there from the first Survivor zone and are still alive are copied to the old ("tenured") generation. Note that the two Survivor zones are symmetric and have no fixed order, so one zone may hold objects copied from Eden alongside objects copied from the other Survivor zone, but only objects that came from the first Survivor zone are copied to the old generation. In addition, one of the Survivor zones is always empty. Depending on the program's needs, more than two Survivor zones can be configured, which lengthens the time objects stay in the young generation and reduces the chance that they are promoted to the old generation.

Old generation:

Objects that are still alive after N garbage collections in the young generation are moved into the old generation. The old generation can therefore be regarded as the place where long-lived objects are stored.

Permanent generation:

Used to store static data, currently Java classes and methods. The permanent generation has no significant impact on garbage collection, but some applications, such as those using Hibernate, generate or load classes dynamically. In that case a larger permanent generation is needed to hold the classes added at run time. The permanent generation size is set with -XX:MaxPermSize=<size>.
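
To see which generations a particular JVM actually exposes, the standard `java.lang.management` API can list its memory pools. The sketch below is illustrative only; the class name `MemoryPoolsSketch` and the helper `printPools` are made up for this example, and the pool names printed depend on the JVM version and the collector in use. On HotSpot from JDK 8 onward, for instance, the permanent generation no longer exists and a Metaspace pool is reported instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class MemoryPoolsSketch {
    // Print each memory pool the running JVM reports, with its type and current usage.
    // Returns the number of pools so callers can check that something was listed.
    static int printPools() {
        int count = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            System.out.println(pool.getName() + " [" + pool.getType() + "] used=" + usedKb + " KB");
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        printPools();
    }
}
```

Pools whose names mention Eden, Survivor, or Old/Tenured correspond to the young and old generations described above, although the exact naming is collector-specific.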

Under what circumstances is garbage collection triggered?

Because objects are handled by generation, the region and timing of garbage collection differ as well. There are two types of GC: Scavenge GC and Full GC.

Scavenge GC

Generally, when a new object is created and space for it cannot be allocated in Eden, a Scavenge GC is triggered. It collects the Eden zone, clears the dead objects, moves the surviving objects to a Survivor zone, and then tidies up the two Survivor zones. This kind of GC works only on the young generation and does not affect the old generation. Because most objects start out in Eden and Eden is not allocated very large, GC of the Eden zone happens frequently. It therefore generally needs a fast, efficient algorithm so that Eden becomes free again as soon as possible.

Full GC

A Full GC cleans up the entire heap, including the young, tenured, and perm regions. Because it has to collect the whole heap, Full GC is slower than Scavenge GC, so the number of Full GCs should be kept as low as possible. A large part of JVM tuning is about adjusting Full GC behavior. A Full GC may occur for the following reasons:

· The tenured generation is full

· The perm generation is full

· System.gc() is called explicitly

· The heap allocation policy for the various regions changed dynamically after the last GC
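
One way to observe these collections from inside a program is the standard `GarbageCollectorMXBean` interface. The following is a rough sketch under the assumption that the JVM exposes at least one collector bean; `GcCountSketch` and `totalCollections` are hypothetical names. Which bean corresponds to young-generation collections and which to full collections depends on the collector, and `System.gc()` is only a request, so the counts may or may not change.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcCountSketch {
    // Sum the collection counts of all collectors; a count of -1 means "undefined" and is skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc(); // explicit request for a collection; the JVM is free to ignore it
        long after = totalCollections();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("collections before=" + before + ", after=" + after);
    }
}
```

Watching the per-collector counts and accumulated times over a longer run gives a first impression of how often full collections happen and how much pause time they cost.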


Detailed Description of the finalize Method

finalize is a method of the Object class, declared with the protected access modifier. Because every class is a subclass of Object, user classes can easily access this method. finalize calls are not chained automatically, so we must chain them by hand: the last statement of a finalize method is usually super.finalize(). This gives finalization from the bottom up, that is, a class releases its own resources first and then the parent class releases its resources.
According to the Java language specification, the JVM ensures that an object is unreachable before its finalize method is called, but it does not guarantee that the method will ever be called. In addition, finalize runs at most once per object.
Many Java beginners assume that this method is like a destructor in C++ and put the release of many objects and resources in it. This is not a good approach, for three reasons. First, to support finalize, GC has to do a lot of extra work for objects that override it. Second, after finalize has run, the object may have become reachable again, so GC has to check reachability once more; using finalize therefore lowers GC performance. Third, the time at which GC calls finalize is not determined, so releasing resources this way is not deterministic either.
In general, finalize is used for resources that are very important and not easily controlled, such as I/O handles and database connections, whose release is critical to the whole application. Even then, programmers should manage these resources (including releasing them) primarily through the program's own logic, and use finalize only as a supplement. This forms a "double insurance" mechanism rather than relying solely on finalize to release resources.
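
The "double insurance" idea can be sketched as follows: the program releases the resource explicitly, and finalize acts only as a backstop that chains to super.finalize() last. The class `ManagedResource` and its `releaseCount` bookkeeping are hypothetical, invented to make the behavior observable. Note also that finalize has been deprecated since Java 9, so on current JDKs this compiles with a warning and is shown only to illustrate the pattern described above.

```java
class ManagedResource implements AutoCloseable {
    private boolean closed = false;
    private int releaseCount = 0;

    // Primary path: the program releases the resource itself. Safe to call more than once.
    @Override
    public synchronized void close() {
        if (!closed) {
            closed = true;
            releaseCount++; // stand-in for really closing a file, socket, or connection
        }
    }

    synchronized boolean isClosed() { return closed; }

    synchronized int releaseCount() { return releaseCount; }

    // Safety net only: runs at an unpredictable time, or possibly never.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            close();           // release our own resources first
        } finally {
            super.finalize();  // then let the parent class release its own
        }
    }

    public static void main(String[] args) {
        try (ManagedResource r = new ManagedResource()) {
            System.out.println("using resource, closed=" + r.isClosed()); // prints "using resource, closed=false"
        } // close() runs here, deterministically, when the block exits
    }
}
```

Because `close()` is idempotent, it does no harm if the finalizer later runs on an object the program has already closed.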
The following example shows that an object may become reachable again after its finalize method has been called, and that the finalize method of an object runs at most once.

    class MyObject {
        Test main; // the Test object, used to restore reachability in finalize

        public MyObject(Test t) {
            main = t; // save the Test object
        }

        @Override
        protected void finalize() {
            main.ref = this; // resurrect this object by making it reachable again
            System.out.println("This is finalize"); // shows that finalize runs only once
        }
    }

    class Test {
        MyObject ref;

        public static void main(String[] args) {
            Test test = new Test();
            test.ref = new MyObject(test);
            test.ref = null; // the MyObject is now unreachable, so finalize may be called
            System.gc();
            System.runFinalization(); // ask the JVM to run pending finalizers (still not guaranteed)
            if (test.ref != null) System.out.println("MyObject is still alive");
        }
    }

Running result:
    This is finalize
    MyObject is still alive

In this example, note that although the MyObject object becomes reachable again inside finalize, finalize will not be called again the next time the object is collected, because finalize is called at most once per object.

How the program interacts with GC
Java 2 enhanced memory management and added the java.lang.ref package, which defines three reference classes: SoftReference, WeakReference, and PhantomReference. By using these classes, programmers can interact with GC to a certain extent and improve its efficiency. The strength of these references lies between that of reachable objects and unreachable objects.
Creating a reference object is easy. For example, to create a SoftReference, first create an object and hold it through an ordinary reference (a reachable object); then create a SoftReference that refers to the object, and set the ordinary reference to null. The object is now held only by the SoftReference, and we call it a softly referenced object.
A soft reference is the strongest of the three reference types. Softly referenced objects are reclaimed only when memory is insufficient, so while memory is plentiful they are usually not reclaimed. In addition, these references are guaranteed to be cleared before Java throws an OutOfMemoryError. Soft references can be used to cache frequently used images, giving a cache that uses as much memory as is available without causing an OutOfMemoryError. The following is pseudo code for this reference type:

    // Create an image object (Image stands for any large, reloadable object)
    Image image = new Image();
    // Use image
    // ...
    // When done with the image, wrap it in a soft reference and release the strong reference
    SoftReference<Image> sr = new SoftReference<>(image);
    image = null;
    // ...
    // Next time the image is needed
    image = sr.get();
    if (image == null) {
        // GC released the image because memory was low, so it must be reloaded
        image = new Image();
        sr = new SoftReference<>(image);
    }
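
A runnable variant of the same idea is sketched below as a small cache keyed by name. `SoftCache`, its `loader` function, and the `loads` counter are hypothetical names introduced for this illustration; only `java.lang.ref.SoftReference` and the collection classes are real library APIs. The sketch does not remove map entries whose referents have been cleared, which a production cache would also need to handle.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader;
    private int loads = 0;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Return the cached value, reloading it if it was never loaded or GC has cleared it.
    synchronized V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            loads++;
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    // How many times the loader has been invoked so far.
    synchronized int loads() {
        return loads;
    }

    public static void main(String[] args) {
        // The loader here just allocates a buffer; a real one might decode an image file.
        SoftCache<String, byte[]> cache = new SoftCache<>(name -> new byte[1024]);
        byte[] first = cache.get("logo.png");
        byte[] second = cache.get("logo.png");
        // 'first' is still strongly referenced here, so the soft reference cannot have been cleared.
        System.out.println("same instance: " + (first == second) + ", loads: " + cache.loads());
        // prints "same instance: true, loads: 1"
    }
}
```

The important detail, compared with checking whether the SoftReference itself is null, is that the value returned by `get()` is tested, since the reference object survives even after its referent has been reclaimed.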
The biggest difference between weakly referenced and softly referenced objects is that, during collection, GC uses an algorithm to decide whether to reclaim a softly referenced object, whereas it always reclaims a weakly referenced object. Weakly referenced objects are therefore reclaimed more easily and more quickly. Although a weakly referenced object is certain to be reclaimed when GC runs, a group of weakly referenced objects with complex relationships often needs several GC runs before all of them are reclaimed. Weak references are often used in Map structures to refer to objects that hold a large amount of data: once the strong reference to such an object is set to null, GC can quickly reclaim its space.
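
The behavior of a weak reference can be sketched with `java.lang.ref.WeakReference`. The class and method names below are hypothetical. Only the first method has a guaranteed result; what the second one observes after `System.gc()` depends on the JVM, so no particular output is claimed for it.

```java
import java.lang.ref.WeakReference;

class WeakReferenceSketch {
    // While a strong reference exists, the weak reference still yields the object,
    // even if a collection happens in between.
    static boolean visibleWhileStronglyHeld() {
        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);
        System.gc();
        return weak.get() == payload; // 'payload' is still in use here, so it cannot be collected
    }

    // After the strong reference is dropped, the object becomes eligible for collection.
    static WeakReference<Object> dropStrongReference() {
        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);
        payload = null; // only the weak reference remains
        System.gc();    // a request; the referent is usually cleared afterwards, but not guaranteed
        return weak;
    }

    public static void main(String[] args) {
        System.out.println("held strongly: " + visibleWhileStronglyHeld());
        System.out.println("cleared after drop: " + (dropStrongReference().get() == null));
    }
}
```

The standard `java.util.WeakHashMap` applies the same mechanism to map keys, which is one common way to use weak references in a Map structure.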
Phantom references are rarely used and mainly serve to assist the finalize mechanism. A phantom-referenced object is one whose finalize method has already run and which is unreachable, but which has not yet been reclaimed by GC. Such objects can help with follow-up cleanup work after finalize. We can make the resource reclamation mechanism more flexible by overriding the clear() method of Reference.
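
A minimal sketch of how a phantom reference is typically paired with a `ReferenceQueue` follows. The class name and the printed messages are made up for the example, and whether the reference is enqueued within the one-second wait depends on the JVM, so the sketch handles both outcomes rather than assuming one.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class PhantomReferenceSketch {
    // A phantom reference never hands back its referent: get() always returns null.
    static boolean getAlwaysReturnsNull() {
        Object resource = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(resource, queue);
        return phantom.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object resource = new Object();
        PhantomReference<Object> phantom = new PhantomReference<>(resource, queue);

        resource = null; // drop the only strong reference
        System.gc();     // request a collection

        // Wait up to one second for the collector to enqueue the reference.
        Reference<?> enqueued = queue.remove(1000);
        if (enqueued == phantom) {
            System.out.println("object is gone; follow-up cleanup can run here");
        } else {
            System.out.println("not enqueued yet; GC timing is not guaranteed");
        }
    }
}
```

Because the referent can never be retrieved through the phantom reference, this kind of cleanup cannot accidentally make the object reachable again, unlike the finalize example earlier.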

Some Java programming suggestions
Based on how GC works, we can use some techniques to make GC run more efficiently and better meet the needs of the application. Some suggestions on program design:
1. The most basic suggestion is to release references to useless objects as early as possible. With temporary variables, the reference is dropped automatically once the variable leaves its scope. Beyond that, pay special attention to complex object graphs such as arrays, queues, trees, and graphs, whose members have more complicated reference relationships. GC is generally less efficient at reclaiming such objects. If the program permits, set unused references to null as early as possible, which can speed up GC.
2. Use finalize as little as possible. The finalize method is an opportunity Java gives programmers to release objects or resources, but it increases the GC workload, so avoid reclaiming resources through finalize where you can.
3. If you need to keep frequently used images available, use the soft reference type. It keeps the images in memory for as long as possible without causing an OutOfMemoryError.
4. Pay attention to collection data types, including arrays, trees, graphs, and linked lists, which are more complex for GC to reclaim. Also pay attention to global variables and static variables, which easily lead to dangling references and wasted memory.
5. When the program has some idle waiting time, the programmer can call System.gc() manually to ask the GC to run, although the Java language specification does not guarantee that GC will actually run. Using incremental GC can shorten the pause time of Java programs.
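
Suggestions 1 and 4 can be illustrated with an array-backed stack, a common place where stale references linger. This is a sketch with hypothetical names (`ArrayStack`, `retainedSlots`); the point is the single line in `pop()` that clears the vacated slot, without which the backing array would keep popped objects reachable.

```java
import java.util.Arrays;

class ArrayStack {
    private Object[] slots = new Object[4];
    private int size = 0;

    void push(Object item) {
        if (size == slots.length) {
            slots = Arrays.copyOf(slots, size * 2); // grow the backing array when full
        }
        slots[size++] = item;
    }

    Object pop() {
        if (size == 0) throw new IllegalStateException("stack is empty");
        Object item = slots[--size];
        slots[size] = null; // clear the stale slot so the array no longer keeps the object alive
        return item;
    }

    // Number of objects the backing array still references (for illustration only).
    int retainedSlots() {
        int n = 0;
        for (Object o : slots) {
            if (o != null) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ArrayStack stack = new ArrayStack();
        stack.push("a");
        stack.push("b");
        stack.push("c");
        stack.pop();
        stack.pop();
        System.out.println(stack.retainedSlots()); // prints 1
    }
}
```

Without the `slots[size] = null` assignment, the same run would report 3 retained slots, and the two popped objects could not be reclaimed until the slots were overwritten.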


Java garbage collection mechanism

A. Stop-and-copy: first pause the running program, then copy all surviving objects from the current heap to another heap. Everything that is not copied is garbage. As objects are copied to the new heap they are packed together compactly. This approach is inefficient for two reasons: first, it needs two heaps, so it takes 200% of the space; second, when there is little garbage, copying a large number of live objects is a great waste.

B. Mark-and-sweep: starting from the stack and the static storage area, follow all references to find every surviving object, and mark each one that is alive. Cleanup starts only after all marking is finished. During cleanup, unmarked objects are released and no copying takes place. However, the remaining heap space is then not contiguous; if the garbage collector wants contiguous space, it has to rearrange the remaining objects.

C. Note: the "stop" in "stop-and-copy" means that this kind of garbage collection does not run in the background; on the contrary, the program is suspended while the collection takes place. Some people regard garbage collection as a low-priority background process, but that is not the case here. When available memory runs low, the Sun garbage collector suspends the program. Likewise, "mark-and-sweep" can only be carried out while the program is suspended.

D. In the Java virtual machine, memory is allocated in large blocks. Each block has a generation count that records whether it is still alive, and the count is incremented when the block is referenced. Normally the garbage collector compacts only the blocks allocated since the last collection, which helps a great deal with large numbers of short-lived temporary objects. The garbage collector also performs a complete cleanup periodically: large objects are still not copied (only their generation count is incremented), while blocks containing small objects are copied and compacted. The Java virtual machine monitors the situation: if all objects are stable and the garbage collector is becoming inefficient, it switches to the "mark-and-sweep" method. Likewise, it tracks how well "mark-and-sweep" is working, and if the heap becomes heavily fragmented it switches back to "stop-and-copy". This is the "adaptive" technique.

Conclusion: the Java garbage collector is an adaptive, generational, stop-and-copy, mark-and-sweep garbage collector.
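
The difference between the two strategies in items A and B can be shown on a toy heap. This is a deliberately simplified model, assuming the set of live objects has already been determined by marking; the heap is just an array of names, and `CollectorSketch`, `sweep`, and `copy` are hypothetical names for the illustration.

```java
import java.util.Arrays;

class CollectorSketch {
    // Mark-and-sweep: free dead slots in place. Survivors keep their positions,
    // so the free space ends up scattered (fragmented).
    static String[] sweep(String[] heap, boolean[] live) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (!live[i]) result[i] = null;
        }
        return result;
    }

    // Stop-and-copy: copy survivors into a second heap of the same size,
    // packed from index 0, so the free space is one contiguous run.
    static String[] copy(String[] heap, boolean[] live) {
        String[] toSpace = new String[heap.length];
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) toSpace[next++] = heap[i];
        }
        return toSpace;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E"};
        boolean[] live = {true, false, true, false, true}; // result of a prior marking pass
        System.out.println(Arrays.toString(sweep(heap, live))); // prints [A, null, C, null, E]
        System.out.println(Arrays.toString(copy(heap, live)));  // prints [A, C, E, null, null]
    }
}
```

The sweep result leaves holes between survivors, while the copy result is compact but required a second array of the same size, which mirrors the trade-off that the adaptive scheme in item D switches between.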

