Understanding the Android Java garbage collection mechanism

Last Update:2016-09-17 Source: Internet

Author: User



JVM (Java Virtual Machine) memory model

Starting with the JVM memory model helps a great deal in understanding GC, but too much detail would obscure the main point, so only a brief overview is given here.

The JVM (Java Virtual Machine) primarily manages two types of memory: heap and non-heap.
The heap is the run-time data area from which memory for all class instances and arrays is allocated.
Non-heap memory is what the JVM keeps for its own use: the method area, memory required for the JVM's internal processing or optimization (such as the code cache of the JIT, just-in-time, compiler), per-class structures (such as the runtime constant pool and field and method data), and the code of methods and constructors.

In short, the memory of a Java program is mainly (note the word "mainly") divided into two parts, heap and non-heap. The objects and arrays we create with new live in the heap, and the heap is the main memory that GC reclaims.

To summarize:

Heap memory: stores Java objects
Non-heap memory: stores class-loading information and other metadata
Other: stores the JVM's own code, and so on
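
The heap and non-heap split can be observed from inside a running program. The following is a minimal sketch for a desktop JVM, using the standard java.lang.management API; the class name MemoryAreas is made up for this example, and this API is not available on Android, so treat it purely as an illustration of the two areas.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

final class MemoryAreas {
    /** Bytes currently used in the heap (class instances and arrays). */
    static long heapUsedBytes() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getHeapMemoryUsage().getUsed();
    }

    /** Bytes currently used outside the heap (class metadata, JIT code cache, and so on). */
    static long nonHeapUsedBytes() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getNonHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        System.out.println("heap used:     " + heapUsedBytes() + " bytes");
        System.out.println("non-heap used: " + nonHeapUsedBytes() + " bytes");
    }
}
```

The exact numbers depend on the JVM and on when the program is run, so no expected output is shown.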

Heap memory model

Since the focus is on heap memory, let's look at the heap's memory model in more detail.

Heap memory is reclaimed by the garbage collector, the automatic memory management system.
Heap memory is divided into two parts: the young generation and the old generation, with a default ratio of 1:2.
The old generation mainly stores long-lived objects that have survived in the application.
The young generation is divided into three parts: one Eden area and two survivor areas, with a ratio of 8:1:1.
The Eden area stores newly created objects.
The survivor areas store the objects that survive each garbage collection.
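
To make the ratios above concrete, here is a small arithmetic sketch. The class GenerationSizes and its parameter names are invented for this example; they are modelled on the HotSpot options -XX:NewRatio (old:young) and -XX:SurvivorRatio (Eden:one survivor), and the 300 MB heap is just an assumed figure.

```java
final class GenerationSizes {
    /** Young generation size when old:young = newRatio:1 (the 1:2 split means newRatio = 2). */
    static long young(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    /** Old generation size: whatever the young generation does not take. */
    static long old(long heap, int newRatio) {
        return heap - young(heap, newRatio);
    }

    /** Size of ONE survivor area when Eden:survivor = survivorRatio:1 and there are two survivors. */
    static long survivor(long young, int survivorRatio) {
        return young / (survivorRatio + 2);
    }

    /** Eden size: the young generation minus both survivor areas. */
    static long eden(long young, int survivorRatio) {
        return young - 2 * survivor(young, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMb = 300;                       // assumed heap size in MB
        long youngMb = young(heapMb, 2);         // 100
        System.out.println("young = " + youngMb + " MB, old = " + old(heapMb, 2) + " MB");
        System.out.println("eden = " + eden(youngMb, 8) + " MB, each survivor = "
                + survivor(youngMb, 8) + " MB");
    }
}
```

With these assumed inputs the split is 100 MB young and 200 MB old, and the young generation is 80 MB Eden plus two 10 MB survivor areas.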

If this looks confusing, consider the following questions:

Why is the heap divided into a young generation and an old generation?

Why is the young generation divided into one Eden area and two survivor areas?

Why is the ratio of the Eden area to the two survivor areas 8:1:1?

We cannot answer these yet, but all of them are determined by the algorithm the garbage collector uses.
So the question becomes: which algorithm is it, and why is that algorithm used?

Determination of recyclable objects

Before we talk about collection algorithms, we have to settle one question: what kind of object counts as garbage (a useless object) and needs to be reclaimed?
Two algorithms are commonly used to decide whether an object is garbage.

1. Reference counting algorithm

Add a reference counter to each object. Whenever a new reference to the object is created, the counter is incremented by 1; whenever a reference becomes invalid, the counter is decremented by 1. An object whose counter is 0 can never be used again.

The advantage is that it is simple and efficient; Objective-C uses this algorithm today.
The disadvantage is that it handles circular references poorly: for example, two objects that reference each other can never be freed.
This shortcoming is serious. Some may ask: doesn't Objective-C use it just fine?
Personally, I do not think Objective-C handles the circular reference problem properly; it actually pushes the problem onto the developer.
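
The circular reference problem is easy to reproduce with a toy counter. The sketch below is only a model written for this article (RefCountDemo, Node, retain and release are made-up names, and real runtimes do this bookkeeping internally): in the chain case both objects are freed, while in the cycle case both counters stay at 1 and nothing is ever freed.

```java
import java.util.ArrayList;
import java.util.List;

final class RefCountDemo {
    /** A toy object with a manual reference counter and one outgoing reference. */
    static final class Node {
        final String name;
        int refCount = 0;
        Node field;
        Node(String name) { this.name = name; }
    }

    /** A new reference to n was created. */
    static void retain(Node n) {
        if (n != null) n.refCount++;
    }

    /** A reference to n became invalid; at zero the object is freed and drops its own field. */
    static void release(Node n, List<String> freed) {
        if (n == null) return;
        if (--n.refCount == 0) {
            freed.add(n.name);
            release(n.field, freed);
            n.field = null;
        }
    }

    /** owner.field = target (this sketch assumes the field was null before). */
    static void setField(Node owner, Node target) {
        retain(target);
        owner.field = target;
    }

    /** a -> b, then both local references are dropped: both objects are freed. */
    static List<String> chainCase() {
        List<String> freed = new ArrayList<>();
        Node a = new Node("a"), b = new Node("b");
        retain(a); retain(b);          // two local variables
        setField(a, b);
        release(b, freed);             // local variable b goes away
        release(a, freed);             // local variable a goes away
        return freed;
    }

    /** a -> b and b -> a, then both local references are dropped: nothing is freed. */
    static List<String> cycleCase() {
        List<String> freed = new ArrayList<>();
        Node a = new Node("a"), b = new Node("b");
        retain(a); retain(b);
        setField(a, b);
        setField(b, a);
        release(a, freed);
        release(b, freed);
        return freed;
    }

    public static void main(String[] args) {
        System.out.println("chain freed: " + chainCase());   // [a, b]
        System.out.println("cycle freed: " + cycleCase());   // []
    }
}
```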

2. Reachability analysis algorithm (root search algorithm)

To solve the circular reference problem above, Java uses a different algorithm: reachability analysis.
Starting from the GC Roots (each implementation defines GC Roots differently), search downward through the objects they reference. This produces a reference tree whose nodes are the reachable objects; every object not in the tree is unreachable.

With this approach, even objects in a circular reference are reclaimed as long as no GC Root references them. Perfect!
That leaves the definition of GC Roots to be pinned down. Java treats the following as GC Roots:

Objects referenced from the virtual machine stack (the local variable table in a stack frame).
Objects referenced by static fields in the method area.
Objects referenced by constants in the method area.
Objects referenced by JNI in the native method stack.
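
Reachability analysis is essentially a graph traversal. The sketch below models the heap as a map from object name to the names it references, which is a simplification invented for this example (ReachabilityDemo and sampleHeap are made-up names, and the roots list stands for the objects that GC Roots point at). C and D reference each other but are not reachable from the root, so they are reported as garbage.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ReachabilityDemo {
    /** Returns every object reachable from the roots by following references. */
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> marked = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (marked.add(obj)) {                       // first visit only
                for (String next : refs.getOrDefault(obj, Collections.<String>emptyList())) {
                    pending.push(next);
                }
            }
        }
        return marked;
    }

    /** Everything on the heap that the traversal never reached. */
    static Set<String> garbage(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> dead = new TreeSet<>(refs.keySet());
        dead.removeAll(reachable(refs, roots));
        return dead;
    }

    /** A hypothetical heap: A -> B, plus an isolated cycle C <-> D. */
    static Map<String, List<String>> sampleHeap() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("A", Arrays.asList("B"));
        refs.put("B", Collections.<String>emptyList());
        refs.put("C", Arrays.asList("D"));
        refs.put("D", Arrays.asList("C"));
        return refs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = sampleHeap();
        List<String> roots = Arrays.asList("A");
        System.out.println("reachable: " + reachable(heap, roots));   // [A, B]
        System.out.println("garbage:   " + garbage(heap, roots));     // [C, D]
    }
}
```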

Stop the World

Now that we know how garbage objects are identified, there is one more issue to consider, so brace yourself: stop the world.
During garbage collection the whole reference graph must stay unchanged. Otherwise an object could be judged garbage and then be referenced again just before it is reclaimed, and everything would be in a mess. So while the GC runs, all other program execution is paused.
Fortunately the pause is very short (especially in the young generation) and has little effect on the program (other kinds of GC, such as concurrent GC, are not discussed here).
So the stutter caused by GC is understandable and, with this approach, unavoidable.

Several garbage collection algorithms

With these two foundations in place, the GC can get to work.
The question now is: once we know which objects are garbage, how do we reclaim them? There are several mainstream algorithms.
PS: You can probably guess which algorithm the Java virtual machine (HotSpot by default) uses. Correct, it is the generational collection algorithm, and now you may start to see why the heap was divided into a young generation and an old generation earlier. Even if you guessed right, read through the other algorithms first; otherwise, and don't say I didn't warn you, you will not understand generational collection.

1. Mark-sweep algorithm

The mark-sweep algorithm has two phases: a mark phase and a sweep phase. The mark phase marks all objects that need to be reclaimed, and the sweep phase reclaims the space those marked objects occupy.
The advantage is that it is simple and easy to implement.
The disadvantage is that it easily produces memory fragmentation. With too many fragments, a later allocation of a large object may fail to find a large enough contiguous block and trigger another garbage collection earlier than necessary.
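
The fragmentation problem can be seen in a toy model. In the sketch below the heap is an array of unit-sized slots and the set named marked holds the objects to be reclaimed, as described above; MarkSweepDemo and its helpers are made-up names for this illustration, not a real collector. After the sweep, three slots are free, but no two of them are adjacent.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MarkSweepDemo {
    static Set<String> setOf(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Sweep phase: frees every slot holding a marked (garbage) object; survivors stay in place. */
    static String[] sweep(String[] heap, Set<String> marked) {
        String[] after = heap.clone();
        for (int i = 0; i < after.length; i++) {
            if (after[i] != null && marked.contains(after[i])) {
                after[i] = null;                 // slot is free again
            }
        }
        return after;
    }

    /** Total number of free slots. */
    static int freeSlots(String[] heap) {
        int free = 0;
        for (String slot : heap) if (slot == null) free++;
        return free;
    }

    /** Longest run of adjacent free slots, i.e. the largest object that still fits. */
    static int largestFreeRun(String[] heap) {
        int best = 0, run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        String[] after = sweep(heap, setOf("B", "D", "F"));   // B, D, F were marked as garbage
        System.out.println(Arrays.toString(after));            // [A, null, C, null, E, null]
        System.out.println("free slots: " + freeSlots(after)
                + ", largest contiguous run: " + largestFreeRun(after));
    }
}
```

A two-slot object cannot be allocated here even though three slots are free, which is exactly the situation that forces an early collection.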

2. Copying algorithm

The copying algorithm divides the available memory into two blocks of equal size and uses only one block at a time. When that block is used up, the surviving objects are copied to the other block and the used block is cleared in one go, so memory fragmentation does not build up.
It is simple, efficient, and not prone to fragmentation, but it is expensive in memory, because the usable memory is cut to half of the original.
The principle also shows that the efficiency of the copying algorithm depends heavily on the number of surviving objects: if many objects survive, its efficiency drops sharply.
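
Here is a toy version of the copy step, again with unit-sized slots. CopyingDemo is a made-up name and the live set is assumed to come from an earlier reachability pass. The survivors end up packed at the start of the second block, and the amount of work equals the number of survivors rather than the size of the block.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class CopyingDemo {
    static Set<String> setOf(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Copies the live objects of the used block into a fresh block of the same size. */
    static String[] copyLive(String[] fromSpace, Set<String> live) {
        String[] toSpace = new String[fromSpace.length];
        int next = 0;                              // next free slot in the target block
        for (String obj : fromSpace) {
            if (obj != null && live.contains(obj)) {
                toSpace[next++] = obj;             // one copy per survivor
            }
        }
        return toSpace;                            // the old block can now be cleared wholesale
    }

    /** Number of objects that had to be copied. */
    static int copied(String[] toSpace) {
        int count = 0;
        for (String slot : toSpace) if (slot != null) count++;
        return count;
    }

    public static void main(String[] args) {
        String[] fromSpace = {"A", "B", "C", "D", "E", "F"};
        String[] toSpace = copyLive(fromSpace, setOf("A", "C", "E"));
        System.out.println(Arrays.toString(toSpace));    // [A, C, E, null, null, null]
        System.out.println("objects copied: " + copied(toSpace));
    }
}
```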

3. Mark-compact algorithm

The mark phase of this algorithm is the same as in mark-sweep, but after marking it does not clean up the reclaimable objects in place. Instead it moves the surviving objects toward one end and then clears the memory beyond the end boundary.
It is therefore particularly suitable when many objects survive and few need to be reclaimed.
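
The compaction step can be sketched on the same toy heap of unit-sized slots. MarkCompactDemo is a made-up name, and the survivors set is assumed to be known after the mark phase. Unlike the copying algorithm, the objects slide within the same block, so no second block is needed and the free space ends up contiguous.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MarkCompactDemo {
    static Set<String> setOf(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /**
     * Slides every surviving object toward the start of the heap, in place,
     * then clears everything beyond the end boundary. Returns that boundary.
     */
    static int compact(String[] heap, Set<String> survivors) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            String obj = heap[i];
            if (obj != null && survivors.contains(obj)) {
                heap[boundary++] = obj;            // boundary <= i, so nothing live is overwritten
            }
        }
        Arrays.fill(heap, boundary, heap.length, null);
        return boundary;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        int boundary = compact(heap, setOf("A", "C", "E"));
        System.out.println(Arrays.toString(heap));       // [A, C, E, null, null, null]
        System.out.println("free space starts at slot " + boundary);
    }
}
```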

4. Generational collection algorithm

In fact, the generational collection algorithm is not a new algorithm. It combines the copying algorithm and the mark-compact algorithm according to their characteristics, and the combination takes the characteristics of the Java language into account.
Here are the scenarios that suit the two existing algorithms:

Copying algorithm: suitable when few objects survive and many are reclaimed
Mark-compact algorithm: suitable when many objects survive and few are reclaimed

They complement each other exactly! The lifetime of the objects determines which algorithm is more appropriate.
So we divide memory into several regions based on object lifetime. In general the heap is divided into the old generation and the young generation. The old generation is characterized by only a few objects needing to be reclaimed in each collection, while the young generation is characterized by a large number of objects needing to be reclaimed in each collection. We can then pick the most suitable collection algorithm for each generation.
This is the generational collection algorithm.
Now look back at why the heap is divided into a young generation and an old generation. Isn't it clear and natural?

Let's go into a little more detail:

The young generation uses the copying algorithm. Because most objects in the young generation are reclaimed in each collection, few objects have to be copied, so the copying algorithm is the most efficient choice. In practice, however, the young generation is not split 1:1 as the plain algorithm above suggests. It is divided into a larger Eden space and two smaller survivor spaces, with a ratio of 8:1:1. Why? The next section analyzes this in depth.

Since only a few objects are reclaimed from the old generation in each collection, it generally uses the mark-compact algorithm.

In-depth understanding of the generational collection algorithm

I believe many people still have doubts about this algorithm. Let's tackle them one by one; it is simple once explained clearly.

Why two survivor spaces instead of one?

This comes down to how long objects stay in the young generation before moving to the old generation. For example, once an object has survived 15 collections (the number is for reference only) in the young generation, it can be moved to the old generation. At the first GC, we can copy the surviving objects from Eden into Survivor A. At the second GC, the survivors in Survivor A must also be handled by the copying algorithm, so they are copied to Survivor B together with the survivors from Eden, and Survivor A and Eden are then cleared. At the third GC, the survivors in Survivor B are copied back to Survivor A, and so on.
So two survivor spaces are needed to shuffle objects back and forth.

Why is the Eden space so large and the survivor spaces so small?

Newly created objects are placed in the Eden space, and this happens very frequently, especially for the many temporary objects created as local variables. Most of these objects are reclaimed almost immediately, and only a few survive to be moved to a survivor space. It is therefore reasonable to make Eden large and the survivor spaces small, which greatly improves memory utilization and mitigates the main drawback of the copying algorithm.
I think 8:1:1 works well. Of course, this ratio can be adjusted, as can the 1:2 ratio between the young generation and the old generation mentioned above.
That raises a new question: what if a survivor space is too small to hold the objects being moved out of Eden? They go straight to the old generation.

Workflow of the Eden space and the two survivor spaces

Many readers find it hard to see at first how the simple copying algorithm works once the space is split into three parts, and it is indeed hard to describe, so let me walk through the workflow of Eden and the two survivor spaces.

Assume the young generation has three spaces, Eden, Survivor A, and Survivor B, and the old generation has one space, Old.

Objects are allocated one after another
and placed in the Eden area.
Eden is full, so a GC is needed (young generation GC: minor GC).
Copy the surviving objects from Eden to Survivor A, then clear Eden (Survivor B would also be cleared, but it is already empty).
Objects are allocated one after another again
and placed in the Eden area.
Eden is full again, so another minor GC is needed.
Copy the surviving objects from Eden and Survivor A to Survivor B, then clear Eden and Survivor A.
Objects are allocated one after another again
and placed in the Eden area.
Eden is full again, so another minor GC is needed.
...
Some objects have been copied back and forth between Survivor A and Survivor B, say 15 times, and are moved to the Old area.
Some objects are too large for Eden and are allocated directly in the Old area.
Some surviving objects do not fit in the survivor space and are also moved to the Old area.
...
During a minor GC it suddenly turns out that the Old area is full, so a big GC is needed (old generation GC: major GC).
The Old area is slowly tidied up until there is enough space.
Minor GCs continue.
...

This walkthrough should give you a clear picture of the process. Of course, to illustrate the principle, it is only the most stripped-down version.
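
The walkthrough can also be turned into a small simulation. This is only a sketch under heavy assumptions: objects are unit-sized, the live set is handed in from outside instead of being computed from GC Roots, and large-object allocation and major GC are left out. GenerationalSim and all of its fields are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GenerationalSim {
    final int edenCapacity, survivorCapacity, maxAge;
    final List<String> eden = new ArrayList<>();
    final List<String> old = new ArrayList<>();
    List<String> from = new ArrayList<>();   // survivor space currently holding objects
    List<String> to = new ArrayList<>();     // the empty survivor space
    final Map<String, Integer> age = new HashMap<>();
    int minorGcs = 0;

    GenerationalSim(int edenCapacity, int survivorCapacity, int maxAge) {
        this.edenCapacity = edenCapacity;
        this.survivorCapacity = survivorCapacity;
        this.maxAge = maxAge;
    }

    static Set<String> setOf(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Allocates one object in Eden, running a minor GC first if Eden is full. */
    void allocate(String name, Set<String> live) {
        if (eden.size() == edenCapacity) minorGc(live);
        eden.add(name);
        age.put(name, 0);
    }

    /** Copies survivors of Eden and the used survivor space into the empty one. */
    void minorGc(Set<String> live) {
        minorGcs++;
        List<String> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (String obj : candidates) {
            if (!live.contains(obj)) {           // garbage is simply not copied
                age.remove(obj);
                continue;
            }
            int newAge = age.merge(obj, 1, Integer::sum);
            if (newAge >= maxAge || to.size() == survivorCapacity) {
                old.add(obj);                    // promoted by age, or survivor space is full
                age.remove(obj);
            } else {
                to.add(obj);
            }
        }
        eden.clear();
        from.clear();
        List<String> tmp = from; from = to; to = tmp;   // the two survivor spaces swap roles
    }

    /** Runs a sequence of allocations against a fixed live set. */
    static GenerationalSim run(int edenCap, int survCap, int maxAge,
                               Set<String> live, String... allocations) {
        GenerationalSim sim = new GenerationalSim(edenCap, survCap, maxAge);
        for (String name : allocations) sim.allocate(name, live);
        return sim;
    }

    public static void main(String[] args) {
        // "a" stays alive; everything else is a temporary object. Promotion age is 3 here.
        GenerationalSim sim = run(2, 1, 3, setOf("a"), "a", "t1", "t2", "t3", "t4", "t5", "t6");
        System.out.println("minor GCs: " + sim.minorGcs + ", old: " + sim.old);   // 3, [a]
    }
}
```

In the example run, "a" survives three minor GCs while bouncing between the two survivor spaces and is then promoted. Calling run(2, 1, 15, setOf("x", "y"), "x", "y", "z") shows the overflow case instead: the survivor space holds only one object, so "y" goes straight to the old generation.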

Types of GC triggers

The point of understanding all this is to solve real problems. The virtual machine prints information each time a GC is triggered to help us analyze problems, so knowing the trigger types is the basis for reading the log.

GC_FOR_MALLOC: a GC triggered because there is not enough memory when allocating an object on the heap.
GC_CONCURRENT: when the application's heap memory reaches a certain threshold, that is, when it is nearly full, the system automatically triggers a GC to free memory.
GC_EXPLICIT: a GC triggered when the application calls System.gc or VMRuntime.gc, or receives a SIGUSR1 signal.
GC_BEFORE_OOM: a last-ditch GC triggered just before an OOM exception is thrown.
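
When reading logs, the first step is usually to pull the trigger type out of each line. The sketch below does that with a regular expression. GcLogReason is a made-up helper, and the sample log line in main is only illustrative of what a Dalvik GC message may look like, not copied from a real device.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcLogReason {
    /** The four trigger types listed above, plus a fallback. */
    enum Reason { GC_FOR_MALLOC, GC_CONCURRENT, GC_EXPLICIT, GC_BEFORE_OOM, UNKNOWN }

    private static final Pattern TOKEN = Pattern.compile("\\bGC_[A-Z_]+\\b");

    /** Finds the first GC_* token in a log line and maps it to a Reason. */
    static Reason parse(String logLine) {
        Matcher m = TOKEN.matcher(logLine);
        if (!m.find()) return Reason.UNKNOWN;
        try {
            return Reason.valueOf(m.group());
        } catch (IllegalArgumentException e) {
            return Reason.UNKNOWN;               // a GC_* token this sketch does not know
        }
    }

    /** Short explanation of each trigger type, following the list above. */
    static String describe(Reason reason) {
        switch (reason) {
            case GC_FOR_MALLOC: return "not enough memory while allocating an object";
            case GC_CONCURRENT: return "heap usage reached the threshold";
            case GC_EXPLICIT:   return "explicitly requested by the application or a signal";
            case GC_BEFORE_OOM: return "last attempt before throwing an OOM exception";
            default:            return "unknown trigger";
        }
    }

    public static void main(String[] args) {
        // Illustrative line only; the exact format depends on the Android version.
        String line = "D/dalvikvm: GC_CONCURRENT freed 2049K, 65% free 3571K/9991K, paused 2ms+2ms";
        Reason reason = parse(line);
        System.out.println(reason + ": " + describe(reason));
    }
}
```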

Summary

Understanding the GC principles of the Java virtual machine should be very helpful for understanding GC in the Dalvik and ART virtual machines. As for how the three differ, that can only be worked out step by step.
I believe this knowledge will certainly be useful when optimizing GC behavior on Android.

Android Memory Structure:

Pending further study

The Android memory structure is divided into two parts:

Java object memory, which is the memory the DVM occupies
Native memory (allocated through JNI), which is Linux system memory

The DVM memory in Android is variable. When an app has just started it may take up 8M of memory, but a few moments later (after loading a few pictures) it automatically rises to 9M. If the app keeps requesting memory while it runs, it eventually reaches the actual per-app memory limit. In the earliest Android versions the recommended minimum was 16M; today the default configuration on most phones is much larger than that (typically from 50M to several hundred M).

Bitmap memory allocation: before 3.0, a picture's binary data was allocated in native memory, so the recycle() method had to be called manually to release it. From 3.0 onward the data is allocated as Java objects again, so it is managed by the GC.

DVM memory management: before the 2.3 release, memory reclamation stopped the whole VM (stop-the-world) and collected the entire heap. Since then it can run concurrently with the application and collect the heap partially (the printed log can be seen in Logcat).
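
The growing heap and its upper limit can be read with the standard Runtime API, which is available on Android as well as on a desktop JVM. HeapLimits is a made-up helper class for this sketch, and the numbers it prints depend entirely on the device or JVM configuration.

```java
final class HeapLimits {
    /** Upper limit the heap is allowed to grow to, in bytes. */
    static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap memory currently in use: what has been reserved minus what is still free. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Formats a byte count in whole megabytes, in the style used in the text (for example 16M). */
    static String toMegabytes(long bytes) {
        return (bytes / (1024 * 1024)) + "M";
    }

    public static void main(String[] args) {
        System.out.println("heap limit:   " + toMegabytes(maxBytes()));
        System.out.println("heap in use:  " + toMegabytes(usedBytes()));
    }
}
```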


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
