JVM series (II): The JVM garbage collection mechanism

Source: Internet
Author: User


As a programmer, knowing how to use a tool is far from enough. You should at least know why it works, which is what we usually call the underlying layer.

So what is the underlying layer? I don't think there is one answer for everyone. As far as I currently understand: for Web developers, TCP/IP, HTTP, and other protocols may be the underlying layer. For C and C++ programmers, memory, pointers, and so on may be the underlying layer. For Java developers, the JVM that runs your Java code is probably what you need to understand.

Over the coming period I will study the JVM together with you. The main reference is "In-Depth Understanding of the Java Virtual Machine: JVM Advanced Features and Best Practices" (2nd edition). Thank you.


This is the second article in the series. It covers the JVM garbage collection mechanism, that is, how the virtual machine decides which memory should be reclaimed and how it reclaims it.

If you are not familiar with the Java memory areas, we recommend reading the first article in the series first: JVM series (1): Java memory area analysis.


I. Determining whether an object can be reclaimed
Three of the Java memory areas, the program counter, the virtual machine stack, and the native method stack, are created with a thread and destroyed with it; stack frames are pushed and popped in order as methods are entered and exited. How much memory each stack frame needs is basically known once the class structure is determined, so memory allocation and reclamation in these areas are deterministic: the memory is reclaimed naturally when the method or the thread ends. The Java heap and the method area are different. Only at run time do we know which objects will be created, and this is the part of memory the garbage collector focuses on.
Before the garbage collector reclaims the heap, it must first determine which objects can be reclaimed, that is, which objects are "dead" and can never be used again. There are two algorithms for this:
1. Reference counting. A reference counter is added to each object. Whenever a reference to the object is created somewhere, the counter is incremented by 1; when a reference becomes invalid, the counter is decremented by 1. An object whose counter is 0 can no longer be used. This algorithm is easy to implement and efficient, but it cannot handle circular references between objects.
2. Reachability analysis. The basic idea is to take a set of objects called "GC Roots" as starting points and search downward from them. The paths traversed are called reference chains. When no reference chain connects an object to any GC Root, the object is proven to be unusable.
Objects that can serve as GC Roots include: objects referenced from the virtual machine stack (the local variable table in stack frames); objects referenced by static class fields in the method area; objects referenced by constants in the method area; and objects referenced by JNI (native methods) in the native method stack.
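The difference between the two approaches can be sketched on a toy object graph. The example below is only an illustration, not how a real JVM works: the `Node` class, its `refCount` field, and the `reachableFrom` helper are hypothetical names made up for this sketch. Two nodes that reference only each other keep a nonzero count, yet a search from the root never reaches them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ReachabilityDemo {
    /** Toy heap object: a name, outgoing references, and a reference counter. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        int refCount; // how many references currently point at this node

        Node(String name) {
            this.name = name;
        }

        void pointTo(Node target) {
            refs.add(target);
            target.refCount++;
        }
    }

    /** Returns every node reachable from the given roots by following references. */
    static Set<Node> reachableFrom(List<Node> roots) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (seen.add(n)) {
                pending.addAll(n.refs);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node root = new Node("root"); // stands in for a GC Root
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        root.pointTo(a);
        b.pointTo(c);
        c.pointTo(b); // b and c reference only each other

        Set<Node> live = reachableFrom(List.of(root));
        for (Node n : List.of(a, b, c)) {
            System.out.println(n.name + ": refCount=" + n.refCount
                    + ", reachable=" + live.contains(n));
        }
        // prints:
        // a: refCount=1, reachable=true
        // b: refCount=1, reachable=false
        // c: refCount=1, reachable=false
    }
}
```

A pure reference counter would never free `b` and `c`, because their counts never drop to 0, while the search from the root correctly treats them as garbage.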
Both algorithms depend on the notion of a "reference". What is a reference? If the value stored in data of a reference type is the starting address of another block of memory, that data is said to be a reference to that memory. Under this definition an object is either referenced or not referenced, which is not enough to describe objects that are nice to keep but not essential. We would like to describe objects that stay in memory while space is sufficient and are discarded if memory is still tight after a garbage collection. Many system caches fit this scenario.
Since JDK 1.2, Java references are divided into four types: strong, soft, weak, and phantom references. The strength of these four types decreases in that order.
A strong reference is an ordinary reference such as "Object obj = new Object()". As long as a strong reference exists, the garbage collector never reclaims the referenced object. Soft references describe objects that are useful but not essential. Before the system throws an out-of-memory error, these objects are included in a second round of collection. They are implemented by the SoftReference class. Weak references also describe non-essential objects, but these survive only until the next garbage collection. They are implemented by the WeakReference class. A phantom reference is also called a ghost reference. The only purpose of setting a phantom reference on an object is to receive a system notification when the object is reclaimed. It is implemented by the PhantomReference class.
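A minimal sketch of the reference classes from `java.lang.ref` follows. The class name `ReferenceKindsDemo` is made up for this example. Only the first three printed lines are deterministic; whether the weak reference is already cleared after `System.gc()` depends on the JVM, so no output is claimed for the last line.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceKindsDemo {
    public static void main(String[] args) {
        Object strong = new Object(); // ordinary strong reference
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        // While a strong reference exists, soft and weak references still return the object.
        System.out.println("soft.get() == strong: " + (soft.get() == strong));
        System.out.println("weak.get() == strong: " + (weak.get() == strong));
        // A phantom reference never hands out its referent; it only reports reclamation via the queue.
        System.out.println("phantom.get() == null: " + (phantom.get() == null));

        strong = null; // drop the only strong reference
        System.gc();   // only a request; the JVM is free to ignore it
        // Usually null after a collection, but this is not guaranteed.
        System.out.println("weak.get() after gc: " + weak.get());
    }
}
```

The soft reference would normally keep its referent until memory runs low, which is why SoftReference is the usual choice for memory-sensitive caches.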
Even an object that reachability analysis finds unreachable is not necessarily reclaimed right away. An unreachable object is marked a first time and then filtered; the filter condition is whether the finalize() method needs to be executed for it. If the object does not override finalize(), or the virtual machine has already called it, execution is considered unnecessary. Otherwise the object is placed in a queue called the F-Queue, and finalize() is later executed by a low-priority Finalizer thread that the virtual machine creates automatically. If the object re-attaches itself to the reference chain inside finalize() (for example by writing XXX.xx = this), it escapes being reclaimed. Note that an object has only one such chance to save itself, because the system automatically calls an object's finalize() method at most once.

Reclaiming the method area mainly means reclaiming two things: obsolete constants and unused classes. Constants: for example, if no String object references the "abc" entry in the constant pool and nothing else references it either, "abc" is removed from the constant pool. Classes: a class can be considered for unloading only when all three of the following conditions hold: all instances of the class have been reclaimed; the ClassLoader that loaded the class has been reclaimed; and the java.lang.Class object of the class is not referenced anywhere, so the class cannot be accessed through reflection anywhere.

II. Garbage collection algorithms
1. Mark-sweep. First all objects to be reclaimed are marked; once marking is complete, all marked objects are reclaimed. It has two drawbacks. One is efficiency: neither marking nor sweeping is efficient. The other is space: sweeping leaves a large number of discontinuous memory fragments.
2. Copying. Memory is divided into two equal halves, and only one half is used at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is cleared in one pass.
3. Mark-compact. Marking is the same as in mark-sweep, but instead of reclaiming the dead objects in place, all surviving objects are moved toward one end, and the memory beyond the end boundary is then cleared directly.
4. Generational collection. Memory is divided into several regions according to object lifetimes. The Java heap is generally divided into the new generation and the old generation, so that the most suitable algorithm can be used for each. In the new generation, most objects die in each collection and only a few survive, so the copying algorithm is used and a collection costs only the copying of the few survivors. In the old generation the survival rate is high, so mark-sweep or mark-compact is used.
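The fragmentation left by mark-sweep, and the way mark-compact avoids it, can be shown on a toy heap. This is only a sketch under simplifying assumptions: the heap is an `int[]` in which each slot holds an object id or 0 for free space, the set of live ids is given in advance instead of being computed by marking, and all names (`SweepVsCompactDemo`, `markSweep`, `markCompact`, `largestFreeRun`) are made up for the example.

```java
import java.util.Arrays;
import java.util.Set;

public class SweepVsCompactDemo {
    static final int FREE = 0;

    /** Mark-sweep: free every slot whose object id is not live; survivors stay where they are. */
    static int[] markSweep(int[] heap, Set<Integer> live) {
        int[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (!live.contains(result[i])) {
                result[i] = FREE;
            }
        }
        return result;
    }

    /** Mark-compact: slide survivors toward the start; everything past them is free. */
    static int[] markCompact(int[] heap, Set<Integer> live) {
        int[] result = new int[heap.length]; // initially all FREE
        int next = 0;
        for (int id : heap) {
            if (live.contains(id)) {
                result[next++] = id;
            }
        }
        return result;
    }

    /** Length of the longest run of contiguous free slots. */
    static int largestFreeRun(int[] heap) {
        int best = 0;
        int current = 0;
        for (int slot : heap) {
            current = (slot == FREE) ? current + 1 : 0;
            best = Math.max(best, current);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6, 7, 8};
        Set<Integer> live = Set.of(2, 5, 7);

        int[] swept = markSweep(heap, live);
        int[] compacted = markCompact(heap, live);
        System.out.println(Arrays.toString(swept) + " largest free run = " + largestFreeRun(swept));
        System.out.println(Arrays.toString(compacted) + " largest free run = " + largestFreeRun(compacted));
        // prints:
        // [0, 2, 0, 0, 5, 0, 7, 0] largest free run = 2
        // [2, 5, 7, 0, 0, 0, 0, 0] largest free run = 5
    }
}
```

Both results have five free slots, but after mark-sweep an object needing three contiguous slots could not be allocated, while after mark-compact it could.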

III. Safepoints and safe regions
1. Safepoints. When a GC is required, other threads can be stopped only at a safepoint. There must not be too few safepoints, or the GC waits too long, and not too many, or the run-time overhead grows. Generally, only instructions such as method calls, loop jumps, and exception jumps produce safepoints. There are two ways to stop the threads: preemptive suspension and voluntary suspension. Preemptive suspension: when a GC occurs, all threads are interrupted first. If a thread is found not to be at a safepoint, it is resumed and allowed to run to a safepoint. (Almost no virtual machine uses this.) Voluntary suspension: when the GC needs to stop the threads, it simply sets a flag. Each thread polls the flag while it executes and suspends itself when it finds the flag set. The polling locations coincide with the safepoints, plus the places where memory must be allocated to create an object.
2. Safe regions. Safepoints alone run into a problem when a thread is in the Sleep or Blocked state: the thread cannot respond to the JVM's request and "walk" to a safepoint to suspend itself, and the JVM can hardly wait until the thread is given CPU time again. This is where safe regions come in. A safe region is a stretch of code in which reference relationships do not change, so it is safe to start a GC anywhere in that region. It can be regarded as an extended safepoint. When a thread executes code in a safe region, it first marks itself as having entered the region. When the JVM initiates a GC, it does not need to worry about threads in that state. When a thread is about to leave a safe region, it checks whether root enumeration (or the entire GC) has finished. If it has, the thread continues; otherwise it waits.
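The flag-and-poll idea behind voluntary suspension can be imitated at user level. The sketch below is only an analogy: a real JVM places its polls inside generated code, and everything here (`SafepointPollDemo`, `pauseRequested`, `parkUntilResumed`, and so on) is hypothetical. A worker polls a flag at the end of each loop iteration, and the "collector" waits until the worker has parked before looking at its state.

```java
public class SafepointPollDemo {
    private final Object lock = new Object();
    private volatile boolean pauseRequested; // flag set by the toy "collector"
    private volatile boolean stopped;
    private boolean parked;                  // guarded by lock
    private long counter;                    // written by the worker only

    /** Worker: one unit of work per iteration, then a poll at the loop back-edge. */
    void workerLoop() {
        try {
            while (!stopped) {
                counter++;
                if (pauseRequested) {
                    parkUntilResumed();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void parkUntilResumed() throws InterruptedException {
        synchronized (lock) {
            parked = true;
            lock.notifyAll(); // tell the collector the poll was reached
            try {
                while (pauseRequested) {
                    lock.wait();
                }
            } finally {
                parked = false;
            }
        }
    }

    /** Collector side: raise the flag, wait for the worker to park, return its counter. */
    long pauseAndReadCounter() throws InterruptedException {
        synchronized (lock) {
            pauseRequested = true;
            while (!parked) {
                lock.wait();
            }
            return counter;
        }
    }

    /** Only meaningful while the worker is parked. */
    long readCounter() {
        synchronized (lock) {
            return counter;
        }
    }

    void resume() {
        synchronized (lock) {
            pauseRequested = false;
            lock.notifyAll();
        }
    }

    void stop() {
        stopped = true;
    }

    public static void main(String[] args) throws InterruptedException {
        SafepointPollDemo demo = new SafepointPollDemo();
        Thread worker = new Thread(demo::workerLoop, "worker");
        worker.start();

        long atPause = demo.pauseAndReadCounter();
        Thread.sleep(50); // the "GC" would run here
        long afterGc = demo.readCounter();
        System.out.println("frozen while paused: " + (atPause == afterGc)); // true

        demo.resume();
        demo.stop();
        worker.join();
    }
}
```

The worker never stops in the middle of `counter++`; it stops only at the poll, which is the property a safepoint is meant to provide.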

IV. Garbage collectors
1. Serial collector. A new-generation collector that uses the copying algorithm and a single thread: it uses only one CPU or one collection thread to do the garbage collection, and while it collects, all other worker threads must be paused until the collection ends.
2. ParNew collector. The multi-threaded version of Serial. It uses multiple threads for garbage collection.
3. Parallel Scavenge collector. A new-generation collector that uses the copying algorithm with multiple threads. Its focus differs from that of other collectors: the others try to minimize the pause time of user threads during garbage collection, while its goal is a controllable throughput (CPU time spent running user code / total CPU time consumed). A shorter pause time does not mean higher throughput, because it may be achieved by shrinking the new generation. For example, a 500 MB new generation might be collected every 10 s with a 100 ms pause each time; shrunk to 300 MB, it is collected every 5 s with a 70 ms pause each time. Each pause is shorter, but throughput also falls.
4. Serial Old collector. The old-generation version of Serial: single-threaded, using the mark-compact algorithm.
5. Parallel Old collector. The old-generation version of Parallel Scavenge. It uses multiple threads and the mark-compact algorithm.
6. CMS collector. The Concurrent Mark Sweep collector is designed to achieve the shortest possible collection pause. It is typically used for applications deployed on servers. The collection process has four steps. Initial mark: marks only the objects directly reachable from GC Roots. Concurrent mark: performs GC Roots tracing, following the reference chains. Remark: corrects the marks of objects whose state changed because the user program kept running during concurrent marking; this pause is generally longer than the initial mark but far shorter than concurrent marking. Concurrent sweep: reclaims the marked objects.
Concurrent mark and concurrent sweep, the two longest steps, run together with the user threads. Advantages: concurrent collection and low pauses. Disadvantages: it is sensitive to CPU resources (as concurrent programs generally are), and it cannot handle floating garbage (new garbage produced by user threads while sweeping is in progress).
7. G1 collector. The Garbage-First collector, aimed at server applications. Features: parallelism and concurrency, generational collection, space compaction, and predictable pauses.
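The pause-time versus throughput trade-off in the Parallel Scavenge item can be checked with a little arithmetic. The sketch below assumes the simplest possible model, one pause per collection interval and nothing else stealing time; `ThroughputDemo` and `throughput` are names made up for this example.

```java
import java.util.Locale;

public class ThroughputDemo {
    /**
     * Fraction of time spent in user code when every interval of
     * intervalMillis contains exactly one GC pause of pauseMillis.
     */
    static double throughput(double intervalMillis, double pauseMillis) {
        return (intervalMillis - pauseMillis) / intervalMillis;
    }

    public static void main(String[] args) {
        double large = throughput(10_000, 100); // 500 MB: one 100 ms pause every 10 s
        double small = throughput(5_000, 70);   // 300 MB: one 70 ms pause every 5 s
        System.out.println(String.format(Locale.ROOT, "%.1f%%", large * 100)); // 99.0%
        System.out.println(String.format(Locale.ROOT, "%.1f%%", small * 100)); // 98.6%
    }
}
```

Over the same 10 s the smaller new generation pauses twice for a total of 140 ms instead of once for 100 ms, which is why its throughput is lower even though each pause is shorter.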

V. Memory allocation
1. Objects are allocated in the Eden area of the new generation first

Besides Eden, the new generation has two Survivor areas. When Eden does not have enough space for an allocation, the virtual machine triggers a Minor GC.

New-generation GC (Minor GC): a garbage collection that occurs in the new generation. Because most Java objects are short-lived, Minor GCs are frequent and usually fast. Old-generation GC (Major GC / Full GC): a garbage collection that occurs in the old generation.
2. Large objects go directly into the old generation
Examples are a particularly long string (for ways to make strings occupy less memory, see http://www.jb51.net/article/59935.htm) or a particularly large array.
3. Long-lived objects enter the old generation
The virtual machine defines an age counter for each object. If an object is born in Eden, is still alive after its first Minor GC, and fits into a Survivor area, it is moved to the Survivor space and its age is set to 1. Each time the object survives another Minor GC in the Survivor area, its age increases by 1. When its age reaches a threshold (15 by default), the object is promoted to the old generation.
4. Dynamic object age determination
If the total size of all objects of the same age in the Survivor space is greater than half of the Survivor space, objects of that age or older enter the old generation directly, without waiting to reach the age threshold.
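The fixed age threshold and the dynamic age rule can be combined in a small calculation. This is a sketch of the rules as stated in this section, not HotSpot's actual code; `TenuringDemo`, `promotionAge`, and the byte counts are hypothetical values chosen for illustration.

```java
public class TenuringDemo {
    static final int MAX_TENURING_THRESHOLD = 15; // default age threshold quoted above

    /**
     * Returns the age at or above which objects are promoted.
     *
     * @param bytesByAge       bytesByAge[a] is the total size of Survivor objects of age a (index 0 unused)
     * @param survivorCapacity size of one Survivor space, in the same unit
     */
    static int promotionAge(long[] bytesByAge, long survivorCapacity) {
        for (int age = 1; age < bytesByAge.length && age < MAX_TENURING_THRESHOLD; age++) {
            // Dynamic rule: one age group alone fills more than half of the Survivor space.
            if (bytesByAge[age] * 2 > survivorCapacity) {
                return age;
            }
        }
        return MAX_TENURING_THRESHOLD; // otherwise only the fixed threshold applies
    }

    public static void main(String[] args) {
        long capacity = 1024;
        long[] sparse = {0, 100, 200, 50};  // no age group exceeds 512
        long[] crowded = {0, 100, 600, 50}; // age 2 occupies 600 > 512

        System.out.println(promotionAge(sparse, capacity));  // 15
        System.out.println(promotionAge(crowded, capacity)); // 2
    }
}
```

In the second case objects of age 2 and above would move to the old generation at the next Minor GC, long before reaching age 15.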
