Java Memory Management: Garbage Collector


 

Overview

 

When people talk about garbage collection (GC), most of them regard it as a by-product of the Java language. In fact, GC has a much longer history than Java. Lisp, born at MIT in 1960, was the first language to truly use dynamic memory allocation and garbage collection. While Lisp was still in its infancy, people were already thinking about the three questions GC must answer:

Which memory needs to be reclaimed?

When should it be reclaimed?

How should it be reclaimed?

After half a century of development, dynamic memory allocation and garbage collection have become quite mature, and everything seems to have entered an "automated" era. So why should we still learn about GC and memory allocation? The answer is simple. When we need to troubleshoot memory overflows and memory leaks, and when garbage collection becomes the bottleneck that keeps a system from reaching higher concurrency, we must be able to monitor and tune these "automated" technologies.

Now let us come back from half a century ago to the present and to the familiar Java language. Chapter 2 described the various parts of the Java runtime memory areas. The program counter, the virtual machine stack, and the native method stack are created and destroyed together with their thread. The stack frames in a stack are pushed and popped in an orderly way as methods are entered and exited. How much memory each stack frame needs is essentially known once the class structure is determined. The JIT compiler may apply some optimizations at run time, but for this chapter's conceptual discussion the size can generally be treated as known at compile time. Memory allocation and reclamation in these areas are therefore deterministic, and we need not worry much about reclaiming them: when a method or thread ends, its memory is reclaimed naturally. The Java heap and the method area are different. Several classes implementing the same interface may need different amounts of memory, and different branches of one method may also need different amounts of memory. Only while the program is running do we know which objects will be created. Allocation and reclamation of this part of memory are dynamic, and this is the memory the garbage collector focuses on. In the rest of this book, "memory" allocation and reclamation refer only to this part of memory.

 

Reclamation Policies

 

Garbage collection has two jobs. It frees memory by clearing away objects that are no longer in use, and it eliminates fragmentation of the heap.

1. Reference Counting

Many textbooks use the following algorithm to decide whether an object is alive. Each object gets a reference counter. Whenever a new reference to the object is created, the counter is incremented, and whenever a reference is dropped, it is decremented. An object whose counter is 0 can no longer be used. Because freeing an object also decrements the counters of everything it references, reclaiming one object may trigger a chain reaction.

Objectively speaking, reference counting is simple to implement and highly efficient. In most cases it is a good algorithm, and it has some well-known applications. Microsoft's COM (Component Object Model), Flash Player with ActionScript 3, the Python language, and Squirrel, which is widely used for game scripting, all use reference counting for memory management. Java, however, does not use reference counting to manage memory. The main reason is that it has difficulty handling circular references between objects.

Advantages: simple and fast

Disadvantage: circular references cannot be detected. For example, if object a references object A and A references a, neither counter can drop to 0, even after the rest of the program can no longer reach either object, so A and a are never reclaimed. This flaw is fatal for a general-purpose collector, which is why Java does not use this policy.
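The following minimal Java sketch simulates reference counting by hand, which makes both the chain reaction and the cycle problem concrete. The RcObj class, the addRef and release helpers, and the freedLog list are hypothetical. The JVM does not count references this way.

```java
import java.util.ArrayList;
import java.util.List;

public class RefCountSketch {
    // Toy object with an explicit reference counter (hypothetical model, not how the JVM works).
    static final class RcObj {
        final String name;
        int count = 0;
        boolean freed = false;
        final List<RcObj> fields = new ArrayList<>();
        RcObj(String name) { this.name = name; }
    }

    static final List<String> freedLog = new ArrayList<>();

    // Storing a reference in a field increments the target's counter.
    static void addRef(RcObj from, RcObj to) {
        from.fields.add(to);
        to.count++;
    }

    // Dropping a reference decrements the counter. Reaching 0 frees the object,
    // which in turn releases everything it references (the "chain reaction").
    static void release(RcObj obj) {
        if (--obj.count == 0) {
            obj.freed = true;
            freedLog.add(obj.name);
            for (RcObj child : obj.fields) release(child);
            obj.fields.clear();
        }
    }

    public static void main(String[] args) {
        // Chain: a local variable holds X, and X -> Y -> Z.
        RcObj x = new RcObj("X"), y = new RcObj("Y"), z = new RcObj("Z");
        x.count++;                        // the local variable's reference
        addRef(x, y);
        addRef(y, z);
        release(x);                       // dropping the local frees X, then Y, then Z
        System.out.println(freedLog);     // [X, Y, Z]

        // Cycle: a <-> b, each also held by one local variable.
        RcObj a = new RcObj("a"), b = new RcObj("b");
        a.count++;
        b.count++;                        // the two local variables
        addRef(a, b);
        addRef(b, a);
        release(a);
        release(b);                       // both counters stay at 1, so neither is ever freed
        System.out.println(a.freed + " " + b.freed); // false false
    }
}
```

In the cycle case each counter stops at 1, because the only remaining reference to a comes from b and vice versa. Detecting that such objects are unreachable needs a different approach, such as the root search algorithm.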

2. Root Search Algorithm

Mainstream commercial programming languages (Java, C#, and even the old Lisp mentioned above) use the root search algorithm (GC Roots Tracing) to decide whether an object is alive. The basic idea is to treat a set of objects called "GC Roots" as starting points and search downward from these nodes. The paths traversed during the search are called reference chains. When no reference chain connects an object to any GC Root (in graph-theory terms, the object is unreachable from the GC Roots), the object is proven to be unusable. As shown in Figure 3-1, objects 5, 6, and 7 reference one another, but none of them is reachable from the GC Roots, so all three are judged recyclable.

In Java, the following kinds of objects can serve as GC Roots:

Objects referenced from the virtual machine stack (the local variable table in each stack frame).

Objects referenced by static fields of classes in the method area.

Objects referenced by constants in the method area.

Objects referenced by JNI (that is, native methods) in the native method stack.
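Root search can be sketched as a breadth-first graph traversal that starts at the roots. In this hypothetical model, object1 plays the role of a GC Root and objects 5 to 7 form a cycle, loosely following Figure 3-1. The exact edges in the figure are an assumption. Any object left unmarked is recyclable, whether or not it is part of a cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReachabilitySketch {
    // Toy heap node whose 'refs' are its outgoing references (hypothetical model).
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Marks every node reachable from the roots by following reference chains breadth-first.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new LinkedHashSet<>();
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.poll();
            if (marked.add(n)) work.addAll(n.refs);   // visit each node only once, even in cycles
        }
        return marked;
    }

    public static void main(String[] args) {
        Node o1 = new Node("object1"), o2 = new Node("object2"), o3 = new Node("object3"),
             o4 = new Node("object4"), o5 = new Node("object5"), o6 = new Node("object6"),
             o7 = new Node("object7");
        o1.refs.add(o2); o1.refs.add(o3); o3.refs.add(o4);   // chain reachable from the root
        o5.refs.add(o6); o6.refs.add(o7); o7.refs.add(o5);   // cycle with no path from any root
        List<Node> heap = List.of(o1, o2, o3, o4, o5, o6, o7);
        Set<Node> live = mark(List.of(o1));                  // o1 acts as the GC Root
        for (Node n : heap) {
            if (!live.contains(n)) System.out.println(n.name + " is recyclable");
        }
        // prints object5, object6, and object7 as recyclable
    }
}
```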

Tracing collectors usually use one of two strategies:

1. Compacting collector: when a live object is found during traversal, it is immediately slid across the free space toward one end of the heap. A large contiguous free area then forms at the other end of the heap, which eliminates heap fragmentation.

2. Copying collector: the heap is divided into two regions of equal size, and only one region is in use at any time. Objects are allocated in that region until it is exhausted. At that point program execution is suspended, the heap is traversed, and the objects marked as live are copied into the other region. This method is called "stop and copy".
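Here is a minimal sketch of the compacting strategy (strategy 1 above). It assumes a previous tracing pass has already set each object's mark bit. The heap is modeled as an array of slots, and references are stored as slot indices. Cell and compact are hypothetical names. Live objects keep their relative order as they slide toward slot 0, and every reference is rewritten to the new address. The sketch relies on live objects referencing only other live objects, which tracing guarantees.

```java
import java.util.Arrays;

public class SlidingCompactSketch {
    // Occupant of one heap slot: a name, a mark bit set by an earlier tracing pass,
    // and references stored as slot indices (hypothetical toy model).
    static final class Cell {
        final String name;
        final boolean live;
        final int[] refs;
        Cell(String name, boolean live, int... refs) { this.name = name; this.live = live; this.refs = refs; }
    }

    // Slides live cells toward index 0, preserving their order; returns the first free slot.
    static int compact(Cell[] heap) {
        int[] forward = new int[heap.length];
        int free = 0;
        // Pass 1: compute each live cell's new address.
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && heap[i].live) forward[i] = free++;
        }
        // Pass 2: rewrite references so they point at the new addresses.
        for (Cell c : heap) {
            if (c != null && c.live) {
                for (int k = 0; k < c.refs.length; k++) c.refs[k] = forward[c.refs[k]];
            }
        }
        // Pass 3: move the cells. Since forward[i] <= i, a cell never overwrites an unprocessed slot.
        for (int i = 0; i < heap.length; i++) {
            Cell c = heap[i];
            heap[i] = null;
            if (c != null && c.live) heap[forward[i]] = c;
        }
        return free;   // slots [free, heap.length) now form one contiguous free area
    }

    public static void main(String[] args) {
        Cell[] heap = new Cell[6];
        heap[0] = new Cell("A", true, 3);        // A -> C
        heap[1] = new Cell("garbage1", false);
        heap[3] = new Cell("C", true, 5);        // C -> D
        heap[4] = new Cell("garbage2", false);
        heap[5] = new Cell("D", true);
        int free = compact(heap);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < free; i++) {
            sb.append(heap[i].name).append("->").append(Arrays.toString(heap[i].refs)).append(' ');
        }
        System.out.println(sb.toString().trim() + " | free from slot " + free);
        // prints: A->[1] C->[2] D->[] | free from slot 3
    }
}
```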

The main disadvantage of this approach is that it is crude. Every live object is copied on every collection, the granularity is too coarse, and overall performance suffers. This led to the more advanced "generational" collector.
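"Stop and copy" can be sketched as a semispace collector. This version uses Cheney-style breadth-first copying with a scan pointer. That is a common way to implement the idea, but the text above does not specify it. Java lists stand in for the two memory regions, and Obj, collect, and forward are hypothetical names. Note that every live object is copied on every collection, which is the coarse granularity criticized above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SemiSpaceSketch {
    // Toy heap object: a name plus outgoing references (hypothetical model).
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Copies everything reachable from the roots into a fresh to-space.
    // Unreachable objects are never touched; they are abandoned with the old from-space.
    static List<Obj> collect(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();
        Map<Obj, Obj> forwarding = new HashMap<>();   // from-space object -> its to-space copy
        for (Obj root : roots) forward(root, toSpace, forwarding);
        // Scan pointer: fix up each copied object's references, copying their targets on demand.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj copy = toSpace.get(scan);
            for (int i = 0; i < copy.refs.size(); i++) {
                copy.refs.set(i, forward(copy.refs.get(i), toSpace, forwarding));
            }
        }
        return toSpace;
    }

    // Returns the to-space copy of an object, copying it first if this is its first visit.
    static Obj forward(Obj old, List<Obj> toSpace, Map<Obj, Obj> forwarding) {
        Obj copy = forwarding.get(old);
        if (copy == null) {
            copy = new Obj(old.name);
            copy.refs.addAll(old.refs);   // still points into from-space until scanned
            forwarding.put(old, copy);
            toSpace.add(copy);
        }
        return copy;
    }

    public static void main(String[] args) {
        Obj a = new Obj("A"), b = new Obj("B"), c = new Obj("C"), d = new Obj("D");
        a.refs.add(b);
        b.refs.add(a);   // A <-> B, and A is a root
        c.refs.add(d);   // C -> D, but nothing reachable points at C
        StringBuilder survivors = new StringBuilder();
        for (Obj o : collect(List.of(a))) survivors.append(o.name);
        System.out.println(survivors); // AB
    }
}
```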

3. Generational Collection

It is based on two facts:

1) Most objects created by most programs have short lifetimes.

2) Most programs also create some objects with very long lifetimes.

The generational strategy therefore builds on "stop and copy" but groups objects by lifetime. It divides the heap into several sub-heaps, each holding one "generation" of objects. The youngest generation is collected most frequently. An object that survives several garbage collections is promoted ("grows up") into an older generation. The older a generation, the fewer objects it holds and the more stable they are, so it can be handled economically by collecting it only rarely. As a result, overall garbage collection efficiency is higher than with simple, crude "stop and copy".
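Below is a minimal sketch of the generational idea. It assumes two generations and a promotion threshold of two survived collections, and both numbers are illustrative assumptions. The reachable flag stands in for a real reachability check. The sketch also ignores references from old objects to young ones, which real generational collectors must track. GenerationalSketch, minorGc, and majorGc are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class GenerationalSketch {
    // Toy object: 'reachable' stands in for a real reachability check (hypothetical model).
    static final class Obj {
        final String name;
        boolean reachable = true;
        int age = 0;                          // number of young collections survived
        Obj(String name) { this.name = name; }
    }

    static final int PROMOTION_AGE = 2;       // assumed tenuring threshold
    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o);                         // new objects always start in the young generation
        return o;
    }

    // Frequent, cheap collection: only the young generation is examined.
    void minorGc() {
        for (Iterator<Obj> it = young.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (!o.reachable) { it.remove(); continue; }   // most objects die young
            if (++o.age >= PROMOTION_AGE) {                 // a survivor "grows up"
                it.remove();
                old.add(o);
            }
        }
    }

    // Rare, more expensive collection of the old generation.
    void majorGc() {
        old.removeIf(o -> !o.reachable);
    }

    public static void main(String[] args) {
        GenerationalSketch heap = new GenerationalSketch();
        Obj config = heap.allocate("config");          // long-lived object
        for (int i = 0; i < 3; i++) {
            Obj temp = heap.allocate("temp" + i);      // short-lived object
            temp.reachable = false;                    // dropped right after use
            heap.minorGc();
        }
        System.out.println("young=" + heap.young.size() + " old=" + heap.old.size()
                + " promoted=" + heap.old.contains(config)); // young=0 old=1 promoted=true
    }
}
```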

4. Train Algorithm

Is the train algorithm meant to replace generational collection? No. It is better described as a powerful complement to it. Generational collection divides the heap into several "generations", and a maximum size can be specified for each generation except the "mature object space". The mature object space cannot be given a maximum size because it holds the oldest objects and is their final destination: these "old-timers" have nowhere else to go. Yet there is no way to know in advance how many old objects a system will eventually squeeze into the mature object space.

The train algorithm specifies how a generational collector organizes its mature object space. Its purpose is to provide time-bounded, incremental collection of the mature object space.

The train algorithm divides the mature object space into fixed-size memory blocks, and each run of the algorithm works on a single block. Why is it called the "train algorithm"? The name comes from the way the algorithm organizes these blocks:

Each block is like a carriage.

Each block belongs to a set, and the blocks in a set are ordered. A set is therefore like a train.

The mature object space contains multiple sets, so it is like having multiple trains, and the mature object space itself is like a railway station.

  

Figure 14 shows several numbered trains, each made up of numbered carriages. In this example there are two trains. Each carriage can hold at most three objects, and each train can contain any number of carriages.

A train's remembered set is the union of the remembered sets of all its carriages, excluding references from within the train itself. In Figure 14, object E is in the remembered set of carriage 1.1 but not in the remembered set of train 1. Because the algorithm always processes the lowest-numbered carriage first, only references from higher-numbered carriages need to be recorded when remembered sets are updated. Therefore object E belongs to the remembered set of carriage 1.1, while object C is not in the remembered set of carriage 1.2.

When the garbage collector collects the first carriage, object A must be retained because a root reference points to it, so A is copied to a completely new train. Object B is referenced only by A, so it is copied to the same train as A. This is important, because in this way a self-referencing (cyclic) garbage structure eventually ends up in a single, separate train. Object C is referenced by an object in the same train, so it is copied to the end of that train. The first carriage is now empty and can be released. After the first collection, the station looks as shown in Figure 15.

  

The remembered sets are updated accordingly. The first train is no longer referenced from outside itself, so the whole train will be released during the next collection, as shown in Figure 16.

  

A cyclic garbage structure that lies entirely within the first train is never copied to another train. Once every object outside that structure has been copied to other trains, the train is released. That part is easy to understand. But can we be sure every cyclic structure eventually ends up in the first train? Suppose a cyclic structure is spread across several trains. After a series of iterations, the lowest-numbered of those trains becomes the first train. All of the structure's objects in it are then copied into the other trains the structure occupies (that is, the trains the structure spans, apart from the first train). The number of trains containing the structure therefore drops by one. When that number reaches 1, the remaining train holds every object in the cyclic structure, so the garbage structure can be reclaimed correctly.
Figure 17 shows a cyclic structure composed of four objects.

  

Overall algorithm flow

1. Select the lowest-numbered train.

2. If the train's remembered set is empty, release the entire train and stop. Otherwise, go to step 3.

3. Select the lowest-numbered carriage in the train.

4. For each element of the carriage's remembered set:

If it is an object referenced by a root, copy it to a new train. If it is an object referenced from another train, copy it to the train that references it.

Once some objects have been retained, copy the objects reachable from them into the same train.

During this step, update the affected remembered sets accordingly. If an object is referenced by objects from several trains, it may be copied to any of them.

5. Release the now-empty carriage and stop.
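The flow above can be sketched in Java as follows. This is a simplified illustration, not a production implementation. Remembered sets are recomputed by scanning every object instead of being maintained by a write barrier, and each carriage holds at most three objects, as in Figure 14. Following the walkthrough for object C, an object referenced from a later carriage of the same train moves to the end of that train. TrainSketch, place, referrers, and step are hypothetical names. The demo builds a live chain (root, A, B) next to a garbage cycle (X and Y) that spans two carriages.

```java
import java.util.*;

public class TrainSketch {
    static final int CAR_CAPACITY = 3;                  // as in the Figure 14 example

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Car car;                                        // carriage currently holding this object
        Obj(String name) { this.name = name; }
    }
    static final class Car {
        final Train train;
        final List<Obj> objects = new ArrayList<>();
        Car(Train train) { this.train = train; }
    }
    static final class Train { final List<Car> cars = new ArrayList<>(); }  // index 0 = lowest carriage

    final List<Train> trains = new ArrayList<>();       // index 0 = lowest-numbered train
    final Set<Obj> roots = new HashSet<>();

    Train newTrain() {
        Train t = new Train();
        trains.add(t);                                  // new trains join the end of the station
        return t;
    }

    // Appends an object to a train's last carriage, coupling a new carriage when it is full.
    void place(Obj o, Train t) {
        Car last = t.cars.isEmpty() ? null : t.cars.get(t.cars.size() - 1);
        if (last == null || last.objects.size() >= CAR_CAPACITY) {
            last = new Car(t);
            t.cars.add(last);
        }
        last.objects.add(o);
        o.car = last;
    }

    // Simplification: referrers are found by scanning the whole station instead of
    // reading remembered sets maintained by a write barrier.
    List<Obj> referrers(Obj target) {
        List<Obj> result = new ArrayList<>();
        for (Train t : trains)
            for (Car c : t.cars)
                for (Obj o : c.objects)
                    if (o.refs.contains(target)) result.add(o);
        return result;
    }

    void step() {
        if (trains.isEmpty()) return;
        Train first = trains.get(0);                                // step 1
        boolean referencedFromOutside = false;
        for (Car c : first.cars)
            for (Obj o : c.objects) {
                if (roots.contains(o)) referencedFromOutside = true;
                for (Obj r : referrers(o)) if (r.car.train != first) referencedFromOutside = true;
            }
        if (!referencedFromOutside) { trains.remove(0); return; }   // step 2
        Car car = first.cars.get(0);                                // step 3
        Deque<Obj> work = new ArrayDeque<>();
        for (Obj o : new ArrayList<>(car.objects)) {                // step 4
            if (o.car != car) continue;                             // already evacuated
            Train dest = null;
            if (roots.contains(o)) dest = newTrain();
            else for (Obj r : referrers(o)) if (r.car != car) { dest = r.car.train; break; }
            if (dest == null) continue;                 // referenced only from inside this carriage
            move(o, dest, work);
            while (!work.isEmpty())                     // retained objects pull along what they reach
                for (Obj t : work.poll().refs) if (t.car == car) move(t, dest, work);
        }
        first.cars.remove(car);                         // step 5: anything left behind is garbage
        if (first.cars.isEmpty()) trains.remove(first);
    }

    private void move(Obj o, Train dest, Deque<Obj> work) {
        o.car.objects.remove(o);
        place(o, dest);
        work.add(o);
    }

    public static void main(String[] args) {
        TrainSketch station = new TrainSketch();
        Train t1 = station.newTrain();
        Obj a = new Obj("A"), b = new Obj("B"), x = new Obj("X"), y = new Obj("Y");
        a.refs.add(b);                                  // root -> A -> B: live
        x.refs.add(y); y.refs.add(x);                   // X <-> Y: a garbage cycle
        station.roots.add(a);
        for (Obj o : List.of(a, b, x, y)) station.place(o, t1);  // carriage 1.1 = A,B,X; 1.2 = Y
        station.step();                                 // A and B move to a new train; X joins Y
        station.step();                                 // train 1 has no outside references: freed
        StringBuilder sb = new StringBuilder();
        for (Train t : station.trains) for (Car c : t.cars) for (Obj o : c.objects) sb.append(o.name);
        System.out.println(station.trains.size() + " train(s) left, holding " + sb); // 1 train(s) left, holding AB
    }
}
```

After the first step, A and B ride in a new train and X has been moved next to Y, so the entire garbage cycle now sits in train 1. The second step finds no outside references into train 1 and releases it in one go, which mirrors the situation described for Figures 15 and 16.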

 

  Revisiting References

 

Whether we count an object's references with reference counting or check whether its reference chain is reachable with root search, deciding whether an object is alive comes down to "references". Before JDK 1.2, Java defined a reference in a very traditional way: if the value stored in reference-type data represents the starting address of another block of memory, that value is a reference. This definition is pure but too narrow. Under it, an object is either referenced or not referenced, and there is no way to describe objects that are "of little value, yet a pity to throw away". We would like to describe objects that can stay in memory while memory is sufficient and can be discarded if memory is still tight after garbage collection. The caching features of many systems fit this scenario.

Since JDK 1.2, Java has expanded the concept of references into strong references, soft references, weak references, and phantom references, listed here from strongest to weakest.

Strong references are the ordinary references found throughout program code, such as "Object obj = new Object()". As long as a strong reference exists, the garbage collector will never reclaim the referenced object.

Soft references describe objects that are useful but not essential. Before the system would otherwise throw an out-of-memory error, objects reachable only through soft references are included in the collection scope and reclaimed in a second collection. If that collection still does not free enough memory, an out-of-memory error is thrown. Since JDK 1.2, the SoftReference class has been provided to implement soft references.
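Soft references fit the caching scenario described earlier. The sketch below is a minimal cache whose values the garbage collector may reclaim under memory pressure. SoftCache and its loader function are hypothetical. A real cache would also purge map entries whose references have been cleared.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader;    // recomputes a value when it is missing

    public SoftCache(Function<K, V> loader) { this.loader = loader; }

    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {                 // never cached, or cleared by the GC under memory pressure
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCache<Integer, byte[]> cache = new SoftCache<>(n -> new byte[n]);
        byte[] first = cache.get(1024);
        byte[] second = cache.get(1024);
        // 'first' keeps the array strongly reachable, so the soft reference cannot be cleared yet.
        System.out.println(first == second); // true
    }
}
```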

Weak references also describe non-essential objects, but they are weaker than soft references: an object reachable only through weak references survives only until the next garbage collection. When the garbage collector runs, it reclaims objects that are only weakly referenced, regardless of whether memory is currently sufficient. Since JDK 1.2, the WeakReference class has been provided to implement weak references.

A phantom reference, also called a ghost reference, is the weakest kind of reference. Whether an object has a phantom reference has no effect on its lifetime, and an object instance cannot be obtained through a phantom reference. The only purpose of associating a phantom reference with an object is to receive a system notification when the collector reclaims it. Since JDK 1.2, the PhantomReference class has been provided to implement phantom references.
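The following sketch contrasts weak and phantom references through the java.lang.ref API. Only the first two printed lines are guaranteed. System.gc() is merely a request, so whether the weak reference is cleared and the phantom reference enqueued within the one-second wait depends on the JVM and collector in use.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class ReferenceKindsDemo {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object strong = new Object();                               // strong reference
        WeakReference<Object> weak = new WeakReference<>(strong);
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        System.out.println(weak.get() == strong);   // true: the object is still strongly reachable
        System.out.println(phantom.get());          // null: a phantom reference never yields its referent

        strong = null;                               // drop the only strong reference
        System.gc();                                 // a request, not a guarantee
        Reference<?> notified = queue.remove(1000);  // wait up to 1 s for the reclamation notification
        System.out.println("weak cleared: " + (weak.get() == null));
        System.out.println("phantom enqueued: " + (notified == phantom));
    }
}
```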

 

Survival or death?  

Even objects that are unreachable under the root search algorithm are not necessarily dead. At that point they are only on "probation". To actually declare an object dead, the collector must mark it at least twice. If root search finds no reference chain connecting the object to the GC Roots, the object is marked for the first time and filtered. The filter condition is whether the object needs to have its finalize() method executed. If the object does not override finalize(), or the virtual machine has already called its finalize() method, the virtual machine treats both cases as "no need to execute".

If the object is judged to need its finalize() method executed, it is placed in a queue called F-Queue. Later, a low-priority Finalizer thread that the virtual machine creates automatically executes it. "Executes" here means the virtual machine triggers the method but does not promise to wait for it to finish. The reason is that if an object's finalize() method runs slowly, or in an extreme case loops forever, the other objects in the F-Queue could wait indefinitely, which might even bring down the whole memory reclamation system. The finalize() method is an object's last chance to escape death. Afterwards, the GC performs a second, small-scale marking of the objects in the F-Queue. To save itself in finalize(), an object only has to re-link itself to any object on a reference chain, for example by assigning itself (the this keyword) to a class variable or to a member variable of some object. During the second marking it will then be removed from the "to be reclaimed" set. If the object has not escaped by then, it is effectively dead. Code listing 3-2 shows that an object's finalize() method can be executed and the object can still survive.

The running result of code listing 3-2 shows that the finalize() method of the SAVE_HOOK object was indeed triggered by the garbage collector, and that the object escaped before being collected.

public class FinalizeEscapeGC {
    public static FinalizeEscapeGC SAVE_HOOK = null;

    public void isAlive() {
        System.out.println("yes, i am still alive!");
    }

    @Override
    protected void finalize() throws Throwable {
        super.finalize();
        System.out.println("finalize method executed!");
        // Self-rescue: re-link this object to a class variable so it is reachable again
        FinalizeEscapeGC.SAVE_HOOK = this;
    }

    public static void main(String[] args) throws Throwable {
        SAVE_HOOK = new FinalizeEscapeGC();

        // The first time, the object successfully saves itself
        SAVE_HOOK = null;
        System.gc();
        // The Finalizer thread has a low priority, so pause 0.5 seconds to wait for it
        Thread.sleep(500);
        if (SAVE_HOOK != null) {
            SAVE_HOOK.isAlive();
        } else {
            System.out.println("no, i am dead!");
        }

        // This code is identical to the code above, but this self-rescue fails
        SAVE_HOOK = null;
        System.gc();
        // The Finalizer thread has a low priority, so pause 0.5 seconds to wait for it
        Thread.sleep(500);
        if (SAVE_HOOK != null) {
            SAVE_HOOK.isAlive();
        } else {
            System.out.println("no, i am dead!");
        }
    }
}

Running result:

finalize method executed!
yes, i am still alive!
no, i am dead!

Note also that the code contains two identical segments, yet one escape succeeds and the other fails. The reason is that the system calls any object's finalize() method automatically at most once. If the object faces collection again, its finalize() method is not run a second time, so the second self-rescue fails. I should stress that the description above of finalize() and object death has a somewhat tragic, artistic flavor, and I do not encourage using this method to save objects. On the contrary, I recommend avoiding it as much as possible. It is not a C/C++ destructor but a compromise Java made to make the language easier for C/C++ programmers to accept. It is expensive to run, highly unpredictable, and cannot guarantee the order in which objects' finalizers are called. Some textbooks say it is suitable for work such as "closing external resources", but that is merely a comforting rationalization for the method's existence. Everything finalize() can do can be done better and more promptly with try-finally or other techniques, so you can forget this method exists in Java.
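The example below sketches the alternative recommended above. It uses try-with-resources (Java 7 and later), which the compiler expands into a try-finally that calls close(). Handle is a hypothetical resource type. Unlike finalize(), close() runs at a predictable point, even when the body throws. For reference, finalize() has been deprecated since Java 9, and java.lang.ref.Cleaner is the JDK's suggested safety net for resources that callers forget to close.

```java
public class ResourceCleanupDemo {
    // Hypothetical resource standing in for a file, socket, or native handle.
    static final class Handle implements AutoCloseable {
        static int openCount = 0;
        final String name;
        Handle(String name) { this.name = name; openCount++; }
        @Override public void close() { openCount--; System.out.println("closed " + name); }
    }

    public static void main(String[] args) {
        try (Handle h = new Handle("h1")) {
            System.out.println("using " + h.name);
        } // close() runs here, at a predictable point, unlike finalize()

        try (Handle h = new Handle("h2")) {
            throw new IllegalStateException("boom");
        } catch (IllegalStateException e) {
            // The resource has already been closed before this catch block runs.
            System.out.println("caught " + e.getMessage() + ", open handles: " + Handle.openCount);
        }
    }
}
```

Running it prints "using h1", "closed h1", "closed h2", and finally "caught boom, open handles: 0". The resource is released whether or not the body completes normally.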

 

 

