In-Depth Understanding of the Java Virtual Machine (IV): Garbage Collection Algorithms and HotSpot Implementation

Source: Internet
Author: User

Garbage Collection Algorithms

In general, garbage collection algorithms fall into four categories:

Mark-Sweep Algorithm

The most basic algorithm is mark-sweep. It works in two phases, "mark" and "sweep": first mark all objects that need to be collected, then, once marking completes, reclaim all marked objects in one pass.

It is the simplest algorithm, but its drawbacks are just as obvious. The first is efficiency: neither the mark phase nor the sweep phase is very efficient. The second is space: sweeping leaves behind a large amount of memory fragmentation, so a later allocation of a large object may fail to find enough contiguous memory and have to trigger another garbage collection early.
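The two phases and the resulting fragmentation can be sketched on a toy heap. This is an illustrative model only, not HotSpot code: the class `MarkSweepDemo`, the `Obj` type, and the six-slot array are all assumptions made for the example. It follows the common formulation in which reachable objects are marked and every unmarked object is treated as garbage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy mark-sweep over a fixed array of heap slots (illustrative only). */
class MarkSweepDemo {
    /** Hypothetical heap object: a mark bit plus outgoing references. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Mark phase: flag everything reachable from the roots. */
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Sweep phase: free unmarked slots in place; returns the number freed. */
    static int sweep(Obj[] heap) {
        int freed = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;
            if (heap[i].marked) {
                heap[i].marked = false;   // reset for the next cycle
            } else {
                heap[i] = null;           // slot becomes a hole; survivors do not move
                freed++;
            }
        }
        return freed;
    }

    /** Builds a six-slot heap in which only A, C, and E are reachable, then collects it. */
    static Obj[] collectSample() {
        String[] names = {"A", "B", "C", "D", "E", "F"};
        Obj[] heap = new Obj[names.length];
        for (int i = 0; i < heap.length; i++) heap[i] = new Obj(names[i]);
        heap[0].refs.add(heap[2]);  // A -> C
        heap[2].refs.add(heap[4]);  // C -> E
        mark(List.of(heap[0]));     // A is the only root
        sweep(heap);
        return heap;
    }

    /** Renders the heap as one character per slot, '.' for a free slot. */
    static String layout(Obj[] heap) {
        StringBuilder sb = new StringBuilder();
        for (Obj o : heap) sb.append(o == null ? "." : o.name);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(layout(collectSample()));  // prints "A.C.E."
    }
}
```

After the sweep, three slots are free but no two of them are adjacent, so in this model an object needing two contiguous slots could not be placed even though half the heap is empty.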


Copying Algorithm

The copying algorithm divides the available memory into two halves of equal capacity and uses only one of them at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleared in one step. The advantages are that fragmentation disappears and that new objects can be allocated sequentially, which is simple and efficient. The disadvantage is that usable memory is cut in half.
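A minimal sketch of the idea, assuming a toy heap of two equal arrays and a bump pointer for allocation. `SemispaceDemo`, `Obj`, and the `forwarded` field are hypothetical names for this example. The scan loop is written in the style of Cheney's algorithm, which is one well-known way to implement copying and not necessarily what any particular virtual machine does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy semispace copying collector (illustrative only). */
class SemispaceDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj forwarded;  // set once this object has been copied to the other half
        Obj(String name) { this.name = name; }
    }

    private Obj[] from;   // the half currently in use
    private Obj[] to;     // the idle half
    private int top;      // bump pointer into the half in use
    private int free;     // next free slot in to-space during a collection

    SemispaceDemo(int halfSize) {
        from = new Obj[halfSize];
        to = new Obj[halfSize];
    }

    /** Sequential allocation: take the next slot and advance the pointer. */
    Obj allocate(String name) {
        if (top == from.length) throw new IllegalStateException("half is full; collect first");
        Obj o = new Obj(name);
        from[top++] = o;
        return o;
    }

    private Obj copy(Obj old) {
        if (old.forwarded == null) {
            Obj c = new Obj(old.name);
            c.refs.addAll(old.refs);   // still point at old copies; fixed during the scan
            old.forwarded = c;
            to[free++] = c;
        }
        return old.forwarded;
    }

    /** Copies everything reachable from the roots into the idle half, then swaps halves. */
    List<Obj> collect(List<Obj> roots) {
        free = 0;
        List<Obj> newRoots = new ArrayList<>();
        for (Obj r : roots) newRoots.add(copy(r));
        for (int scan = 0; scan < free; scan++) {   // scan the objects copied so far
            List<Obj> refs = to[scan].refs;
            for (int i = 0; i < refs.size(); i++) refs.set(i, copy(refs.get(i)));
        }
        Obj[] old = from;
        from = to;
        to = old;
        Arrays.fill(to, null);   // the previously used half is emptied wholesale
        top = free;
        return newRoots;
    }

    String layout() {
        StringBuilder sb = new StringBuilder();
        for (Obj o : from) sb.append(o == null ? "." : o.name);
        return sb.toString();
    }

    /** Allocates A..D with A -> C and A as the only root, then collects. */
    static String demo() {
        SemispaceDemo heap = new SemispaceDemo(4);
        Obj a = heap.allocate("A");
        heap.allocate("B");
        Obj c = heap.allocate("C");
        heap.allocate("D");
        a.refs.add(c);
        heap.collect(List.of(a));
        return heap.layout();
    }

    public static void main(String[] args) {
        System.out.println(demo());  // prints "AC.."
    }
}
```

The survivors end up packed at the start of the new half, so the next allocation simply takes the slot after them.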


Today's commercial virtual machines use this collection algorithm to reclaim the young generation. Because most young-generation objects die quickly, the memory does not have to be split 1:1. Instead it is divided into one larger Eden space and two smaller Survivor spaces, and at any time Eden and one Survivor are in use. During a collection, the objects still alive in Eden and the in-use Survivor are copied to the other Survivor, and then Eden and the previously used Survivor are cleared. In the HotSpot virtual machine the default Eden-to-Survivor ratio is 8:1, so only 10% of the young generation is "wasted".

The young and old areas are collected with different algorithms. When the young area fills up, the old area does not need to be collected; when the old area fills up, the entire heap is cleaned. The user can also choose whether collection of each area is single-threaded or multi-threaded. The designers therefore want objects to stay in the young area as long as possible, which makes collection more efficient. In my view, the two parallel Survivor areas, "from" and "to", exist to filter out the objects that truly belong in the old area (objects whose references are held for a long time) before they are promoted there.
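The 10% figure follows from simple arithmetic: with an Eden:Survivor:Survivor split of 8:1:1, one Survivor (one tenth of the young generation) is always idle. The sketch below works this out; `YoungGenLayout` and `split` are hypothetical names, and the calculation ignores the alignment and rounding a real virtual machine applies. In HotSpot the ratio is controlled by the `-XX:SurvivorRatio` option.

```java
/** Works out Eden and Survivor sizes for a given ratio (simplified; ignores alignment). */
class YoungGenLayout {
    /**
     * Splits the young generation as eden : survivor : survivor = ratio : 1 : 1.
     * Returns {edenBytes, oneSurvivorBytes, usableBytes}, where usable = eden + one survivor.
     */
    static long[] split(long youngBytes, int survivorRatio) {
        long survivor = youngBytes / (survivorRatio + 2);
        long eden = youngBytes - 2 * survivor;
        return new long[] {eden, survivor, eden + survivor};
    }

    public static void main(String[] args) {
        long young = 100L * 1024 * 1024;   // assume a 100 MB young generation
        long[] s = split(young, 8);
        System.out.println("eden     = " + s[0] / (1024 * 1024) + " MB");  // 80 MB
        System.out.println("survivor = " + s[1] / (1024 * 1024) + " MB");  // 10 MB each
        System.out.println("usable   = " + s[2] / (1024 * 1024) + " MB");  // 90 MB
    }
}
```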

Mark-Compact Algorithm

The copying algorithm becomes inefficient when the object survival rate is high, because many objects have to be copied, and in its basic form it also wastes 50% of the space.

To suit the characteristics of the old generation, another algorithm, mark-compact, was proposed. It also has two phases: marking and compacting. The marking phase is the same as in mark-sweep. When marking is complete, however, the recyclable objects are not swept in place; instead, all surviving objects are moved together into a contiguous block, and the remaining space is then cleared.
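A sketch of the compacting phase on a toy heap, assuming a simple sliding scheme that moves survivors toward the start of the array while keeping their order. `MarkCompactDemo` and `Obj` are hypothetical names. A real collector must also rewrite every reference to a moved object; here Java object references hide that step.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy mark-compact over a fixed array of heap slots (illustrative only). */
class MarkCompactDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Mark phase, identical to the one used by mark-sweep. */
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Compact phase: slide survivors to the front, then clear everything behind them. */
    static int compact(Obj[] heap) {
        int free = 0;                       // next slot a survivor will move into
        for (int i = 0; i < heap.length; i++) {
            Obj o = heap[i];
            if (o != null && o.marked) {
                o.marked = false;
                heap[free++] = o;           // free <= i, so no live slot is overwritten
            }
        }
        for (int i = free; i < heap.length; i++) heap[i] = null;
        return free;                        // boundary between live data and free space
    }

    /** Same sample heap as a mark-sweep example would use: only A, C, and E are reachable. */
    static Obj[] collectSample() {
        String[] names = {"A", "B", "C", "D", "E", "F"};
        Obj[] heap = new Obj[names.length];
        for (int i = 0; i < heap.length; i++) heap[i] = new Obj(names[i]);
        heap[0].refs.add(heap[2]);  // A -> C
        heap[2].refs.add(heap[4]);  // C -> E
        mark(List.of(heap[0]));
        compact(heap);
        return heap;
    }

    static String layout(Obj[] heap) {
        StringBuilder sb = new StringBuilder();
        for (Obj o : heap) sb.append(o == null ? "." : o.name);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(layout(collectSample()));  // prints "ACE..."
    }
}
```

The free space ends up as one contiguous run, at the cost of moving every survivor that was not already in place.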


Generational Collection Algorithms

Current commercial virtual machines use generational collection, which divides memory into several regions according to object lifetime. Typically the Java heap is divided into the young generation and the old generation, so that each can use the collection algorithm that best fits its characteristics. The young generation uses the copying algorithm; the old generation uses mark-sweep or mark-compact.
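One way to observe the generational split on a running JVM is through the standard `java.lang.management` API, which reports each collector together with the memory pools it manages. The sketch below only lists what the current JVM exposes; `GcBeansDemo` is a hypothetical name, and the collector and pool names depend on the JVM version and the selected collector, so no particular output should be assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Lists the collectors of the running JVM and the memory pools each one manages. */
class GcBeansDemo {
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Names vary by JVM and collector choice; they are reported as-is.
            lines.add(gc.getName() + " -> " + Arrays.toString(gc.getMemoryPoolNames()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

On a JVM running a generational collector, the young-generation collector typically lists only the Eden and Survivor pools, while the old-generation collector also lists the old pool.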



HotSpot Algorithm Implementation

Reachability analysis must be carried out on a consistent snapshot, meaning that the whole system appears frozen at a single point in time for the duration of the analysis. If the object reference relationships kept changing while the analysis was running, the result could not be trusted. This is why a GC has to pause all Java execution threads.

Mainstream Java virtual machines now use exact GC. When the executing system is paused, the virtual machine therefore does not need to inspect every execution context and every global reference location without exception; it has a way to know directly where object references are stored. In the HotSpot implementation this is accomplished with a set of data structures called OopMap. When a class finishes loading, HotSpot records what type of data sits at each offset within the object, and during JIT compilation it records which locations in the stack and registers hold references. The GC can then read this information directly when it scans.
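The idea of recording reference locations ahead of time can be illustrated with a toy model. Everything here is hypothetical (`OopMapDemo`, `ToyOopMap`, `ToyObject`, the slot layout) and bears no relation to HotSpot's actual data structures; it only shows why a map of offsets lets the collector pick out references without guessing.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an offset map that says which slots of an object hold references. */
class OopMapDemo {
    /** Hypothetical per-class record, built once when the class is loaded. */
    static final class ToyOopMap {
        final int[] referenceOffsets;
        ToyOopMap(int... referenceOffsets) { this.referenceOffsets = referenceOffsets; }
    }

    /** Hypothetical object: raw slots holding either primitive values or heap addresses. */
    static final class ToyObject {
        final ToyOopMap map;
        final long[] slots;
        ToyObject(ToyOopMap map, long... slots) { this.map = map; this.slots = slots; }
    }

    /** Exact scan: read only the slots that the map declares to be references. */
    static List<Long> referencesOf(ToyObject o) {
        List<Long> refs = new ArrayList<>();
        for (int offset : o.map.referenceOffsets) refs.add(o.slots[offset]);
        return refs;
    }

    static List<Long> demo() {
        // Assumed layout: slot 0 = int count, slot 1 = ref next, slot 2 = long id, slot 3 = ref owner.
        ToyOopMap nodeMap = new ToyOopMap(1, 3);
        // Slot 2 holds 0x2000, which looks like an address but is a plain long value.
        ToyObject node = new ToyObject(nodeMap, 42L, 0x1000L, 0x2000L, 0x3000L);
        return referencesOf(node);
    }

    public static void main(String[] args) {
        System.out.println(demo());  // prints "[4096, 12288]"
    }
}
```

A scanner without the map would have to treat 0x2000 as a possible pointer; with the map, only the two real reference slots are visited.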

Safepoint

Many instructions can change the reference relationships, and therefore the OopMap contents. HotSpot does not generate an OopMap for every instruction; it records this information only at specific locations, which are called safepoints. A running program cannot stop for GC at an arbitrary place; it pauses only when it reaches a safepoint. Safepoints are typically placed at instructions that allow the program to run for a long time, such as method calls, loop jumps, and exception jumps.

The next question is how to make all threads run to a safepoint when a GC occurs. There are two approaches: preemptive suspension and voluntary suspension.

With preemptive suspension, the virtual machine first interrupts all threads; if a thread turns out not to be at a safepoint, it is resumed so that it can run on to one. Few virtual machines use this approach.

With voluntary suspension, the virtual machine sets a GC flag that every thread polls; a thread that finds the flag set suspends itself. The polling locations coincide with the safepoints.
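Voluntary suspension can be sketched with ordinary Java threads. This is an application-level analogy under stated assumptions, not how HotSpot implements its polls: `SafepointPollDemo`, `poll`, and `pauseAll` are hypothetical names, the flag is a `volatile` field, and the poll is placed by hand at the loop back-edge of each worker.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Toy voluntary suspension: workers poll a flag and park themselves when it is set. */
class SafepointPollDemo {
    private final Object lock = new Object();
    private final int workers;
    private volatile boolean requested;   // the "GC flag" that every worker polls
    private int parked;                   // guarded by lock

    SafepointPollDemo(int workers) { this.workers = workers; }

    /** Called by workers at their poll sites: park here while a pause is requested. */
    void poll() throws InterruptedException {
        if (!requested) return;           // fast path: a single volatile read
        synchronized (lock) {
            parked++;
            lock.notifyAll();             // tell the requester one more thread has stopped
            while (requested) lock.wait();
            parked--;
        }
    }

    /** Called by the pausing thread: wait for all workers, run the action, release them. */
    void pauseAll(Runnable atSafepoint) throws InterruptedException {
        synchronized (lock) {
            requested = true;
            while (parked < workers) lock.wait();
            atSafepoint.run();            // every worker is parked at a poll site here
            requested = false;
            lock.notifyAll();
        }
    }

    /** Starts three busy workers, pauses them once, and returns how many were parked. */
    static int demo() {
        final int n = 3;
        SafepointPollDemo sp = new SafepointPollDemo(n);
        AtomicBoolean running = new AtomicBoolean(true);
        AtomicLong work = new AtomicLong();
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(() -> {
                try {
                    while (running.get()) {
                        work.incrementAndGet();   // stand-in for real work
                        sp.poll();                // poll at the loop back-edge
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        int[] parkedSeen = new int[1];
        try {
            sp.pauseAll(() -> parkedSeen[0] = sp.parked);
            running.set(false);
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return parkedSeen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());  // prints 3
    }
}
```

The pause completes only because every worker keeps reaching a poll site; a worker that never polled, for example one that is asleep, would stall `pauseAll` indefinitely in this model.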

Safe Region

The safepoint mechanism guarantees that a running program reaches a safepoint where it can enter GC within a reasonably short time. A thread that has not been allocated CPU time, for example one in the sleep or blocked state, cannot respond to the JVM's interrupt request, however. This case has to be handled with a safe region.

A safe region is a section of code in which the reference relationships do not change, so it is safe to start a GC anywhere within it. When a thread executes code in a safe region, it first marks itself as being in the safe region, so the JVM can ignore such threads when it initiates a GC. When the thread is about to leave the safe region, it checks whether the system has finished root enumeration (or the whole GC); if so, it continues, and otherwise it waits.
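The enter/leave protocol can be sketched as follows. Again this is an analogy built from ordinary Java synchronization, with hypothetical names (`SafeRegionDemo`, `enterSafeRegion`, `leaveSafeRegion`, `beginGc`, `endGc`); a latch stands in for the sleep or blocking call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Toy safe region: a blocked thread is skipped by the GC but cannot leave until it ends. */
class SafeRegionDemo {
    private final Object lock = new Object();
    private boolean gcInProgress;   // guarded by lock
    private int inSafeRegion;       // guarded by lock

    void enterSafeRegion() {
        synchronized (lock) { inSafeRegion++; }
    }

    void leaveSafeRegion() throws InterruptedException {
        synchronized (lock) {
            while (gcInProgress) lock.wait();   // references must stay untouched until the GC is done
            inSafeRegion--;
        }
    }

    /** Starts a collection; returns how many threads it may skip because they are in a safe region. */
    int beginGc() {
        synchronized (lock) { gcInProgress = true; return inSafeRegion; }
    }

    void endGc() {
        synchronized (lock) { gcInProgress = false; lock.notifyAll(); }
    }

    /** One worker blocks inside a safe region and wakes up while the collection is still running. */
    static List<String> demo() {
        SafeRegionDemo vm = new SafeRegionDemo();
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch wakeUp = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                vm.enterSafeRegion();
                events.add("entered");
                entered.countDown();
                wakeUp.await();          // stand-in for sleep or a blocking call
                vm.leaveSafeRegion();    // blocks while the collection is running
                events.add("left");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            entered.await();
            vm.beginGc();                // does not wait for the worker
            events.add("gc-start");
            wakeUp.countDown();          // the worker wakes up mid-collection
            events.add("gc-end");
            vm.endGc();
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(demo());  // prints "[entered, gc-start, gc-end, left]"
    }
}
```

The worker is woken before the collection ends, yet "left" is always recorded after "gc-end", because `leaveSafeRegion` waits until the collection has finished.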


