Memory fragmentation and garbage collection

Last Update: 2016-03-29 Source: Internet

Author: User

I. Generation of memory fragments

There are two kinds of memory allocation: static and dynamic. The size and lifetime of statically allocated memory are fixed when the program is compiled and linked. Dynamic allocation lets a running program request memory of any size from the operating system and release it again. Dynamic allocation inevitably causes memory fragmentation. What is memory fragmentation? The term describes all of the free memory in a system that cannot be used. These fragments are unusable because of the algorithm responsible for dynamic allocation: the free pieces are small and scattered across discontiguous locations, so the algorithm cannot use them to satisfy requests. How serious the problem is therefore depends on how the memory management algorithm is implemented.

Why do these small, discontiguous free fragments arise? They come in two forms: (a) internal fragments and (b) external fragments.

Internal fragments: all memory allocations must start at addresses divisible by 4, 8, or 16 (depending on the processor architecture), and the MMU paging mechanism imposes its own limits, so the allocation algorithm can only hand out blocks of certain predetermined sizes. Suppose a client requests a 43-byte block. Because no block of exactly that size is available, it may receive a slightly larger one, such as 44 or 48 bytes. The extra space created by rounding up the requested size is an internal fragment.
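The rounding in the 43-byte example can be sketched in a few lines of Java. This is a minimal sketch assuming the allocator rounds every request up to a power-of-two alignment; AlignmentDemo, roundUp, and internalFragment are hypothetical names, not part of any real allocator's API.

```java
/** Hypothetical helper showing how alignment rounding creates internal fragmentation. */
final class AlignmentDemo {

    /** Rounds size up to the next multiple of alignment (alignment must be a power of two). */
    static int roundUp(int size, int alignment) {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        // Adding (alignment - 1) and then clearing the low bits yields the next multiple.
        return (size + alignment - 1) & ~(alignment - 1);
    }

    /** Bytes handed out but never requested: the internal fragment. */
    static int internalFragment(int size, int alignment) {
        return roundUp(size, alignment) - size;
    }

    public static void main(String[] args) {
        for (int alignment : new int[] {4, 8, 16}) {
            System.out.println("43 bytes, " + alignment + "-byte alignment: block of "
                    + roundUp(43, alignment) + " bytes, " + internalFragment(43, alignment) + " wasted");
        }
    }
}
```

With 4-byte alignment the request becomes a 44-byte block; with 8- or 16-byte alignment it becomes a 48-byte block, matching the sizes in the text.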

External fragments: frequent allocation and release of physical pages leaves many small runs of free pages scattered between allocated pages; these are external fragments. For example, suppose there is a contiguous free region of 100 units, addresses 0 to 99. If you request a 10-unit block, you receive addresses 0 to 9. If you then request a 5-unit block, it occupies 10 to 14. Now release the first block and request a block larger than 10 units, say 20 units. The freed block at 0 to 9 cannot satisfy this request, so the 20 units must be allocated starting at 15, occupying 15 to 34. The whole region is now: 0 to 9 free, 10 to 14 occupied, 15 to 34 occupied, 35 to 99 free. The range 0 to 9 is a memory fragment: if 10 to 14 stays occupied and every later request is larger than 10 units, 0 to 9 can never be used again and becomes an external fragment.
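The walkthrough above can be replayed with a toy allocator. This is a hypothetical sketch: FirstFitDemo and its methods are illustrative names, memory is modeled as an array of 100 units, and a first-fit policy is assumed because the example always takes the lowest free address that fits.

```java
import java.util.Arrays;

/** Hypothetical first-fit allocator over a row of memory units (not a real allocator API). */
final class FirstFitDemo {
    private final int[] owner; // 0 = free; otherwise the id of the block occupying that unit
    private int nextId = 1;

    FirstFitDemo(int units) { owner = new int[units]; }

    /** Returns the start of the lowest free run that fits size units, or -1 if none does. */
    int allocate(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        int runStart = 0, runLength = 0;
        for (int i = 0; i < owner.length; i++) {
            if (owner[i] != 0) { runLength = 0; continue; }
            if (runLength == 0) runStart = i;
            if (++runLength == size) {
                Arrays.fill(owner, runStart, runStart + size, nextId++);
                return runStart;
            }
        }
        return -1;
    }

    /** Frees the whole block that starts at start. */
    void free(int start) {
        int id = owner[start];
        if (id == 0) return; // already free
        for (int i = start; i < owner.length && owner[i] == id; i++) owner[i] = 0;
    }

    /** Length of the largest contiguous free run. */
    int largestFreeRun() {
        int best = 0, run = 0;
        for (int unit : owner) {
            run = (unit == 0) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        FirstFitDemo memory = new FirstFitDemo(100);
        int a = memory.allocate(10); // units 0-9
        int b = memory.allocate(5);  // units 10-14
        memory.free(a);              // 0-9 becomes a 10-unit hole
        int c = memory.allocate(20); // the hole is too small, so this lands at 15-34
        System.out.println("a=" + a + ", b=" + b + ", c=" + c); // prints a=0, b=10, c=15
        // Any request over 10 units skips the hole at 0-9; this one starts at 35.
        System.out.println("next 11-unit block starts at " + memory.allocate(11));
    }
}
```

After the 20-unit request, 75 units are free in total, yet the hole at 0 to 9 can only ever serve requests of 10 units or fewer.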

II. Significance of garbage collection
In C++, the memory occupied by an object stays occupied until the program ends and cannot be given to other objects unless it is explicitly released. In Java, when no reference points to the memory originally allocated to an object any longer, that memory becomes garbage, and a system-level JVM thread releases it automatically. Garbage collection treats objects the program no longer needs as "useless information" to be discarded: once an object is no longer referenced, its space is reclaimed so that new objects can use it. Besides freeing useless objects, garbage collection can also defragment memory. As objects are created and the collector frees the space of discarded objects, fragmentation can arise; the fragments are free holes between the memory blocks allocated to objects. Defragmentation moves the occupied heap memory to one end of the heap, and the JVM allocates the consolidated free memory to new objects.
Garbage collection frees memory automatically and lightens the programming burden, which gives the Java virtual machine two advantages. First, it improves programming productivity: without garbage collection, a programmer may spend a long time tracking down an obscure storage problem, and garbage collection greatly shortens that time. Second, it protects program integrity; garbage collection is an important part of Java's security policy.
One potential drawback of garbage collection is that its overhead affects program performance. The Java virtual machine must track which objects in the running program are still useful and eventually release the useless ones, and this takes processing time. In addition, garbage collection algorithms are imperfect: some algorithms used in the past could not guarantee that 100% of the discarded memory would be collected. As the algorithms keep improving and software and hardware become faster, these problems can be resolved.
2. Garbage collection algorithm analysis
The Java Language Specification does not state which garbage collection algorithm a JVM must use, but any garbage collection algorithm generally performs two basic tasks: (1) discover useless objects; (2) reclaim the memory those objects occupy so that the program can use it again.
Most garbage collection algorithms use the concept of a root set: the set of reference variables that the running Java program can access directly (local variables, parameters, and class variables), through which the program reads object fields and calls object methods. The collector first determines which objects are reachable from the roots and which are not. Every object reachable from the root set, directly or indirectly, is live and cannot be collected as garbage. Objects that cannot be reached by any path from the root set are eligible for garbage collection and should be reclaimed. Several common algorithms are described below.
2.1. Reference counting algorithm (Reference Counting Collector)
Reference counting is the only method that does not use the root set. It uses reference counters to tell live objects from objects that are no longer in use. Typically, every object in the heap has its own reference counter. When an object is created and assigned to a variable, its counter is set to 1. Each time the object is assigned to another variable, the counter is incremented by 1. When a reference to the object goes out of scope (is no longer used), the counter is decremented by 1. Once the counter reaches 0, the object is eligible for garbage collection.
A reference-counting collector runs quickly and never interrupts the program for long, which makes it suitable for programs that must run in real time. However, the counters add overhead to program execution, because the counter must be incremented every time the object is assigned to a new variable and decremented every time an existing reference goes out of scope. Reference counting also cannot reclaim objects that refer to each other in a cycle, because their counters never drop to 0 even when the program can no longer reach them.
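The counter rules can be modeled in Java as a minimal sketch. RcObject, retain(), release(), and point() are hypothetical names; the JVM does not expose reference counts like these. The second half of main builds two objects that point at each other: after the last outside reference goes away, each counter stays at 1, so neither object is freed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reference-counted object; the JVM does not expose counters like these. */
final class RcObject {
    private int refCount;
    private boolean freed;
    private final List<RcObject> fields = new ArrayList<>(); // references this object holds

    /** A new reference to this object was stored somewhere: counter + 1. */
    void retain() { refCount++; }

    /** A reference went away: counter - 1; at 0 the object is freed and releases its targets. */
    void release() {
        if (--refCount == 0) {
            freed = true;
            for (RcObject target : fields) target.release();
            fields.clear();
        }
    }

    /** Stores a reference to target in one of this object's fields. */
    void point(RcObject target) {
        target.retain();
        fields.add(target);
    }

    boolean isFreed() { return freed; }
    int refCount() { return refCount; }

    public static void main(String[] args) {
        // Chain: a local variable refers to a, and a refers to b.
        RcObject a = new RcObject(); a.retain();
        RcObject b = new RcObject(); a.point(b);
        a.release();                                         // the variable goes out of scope
        System.out.println(a.isFreed() + " " + b.isFreed()); // prints true true

        // Cycle: x and y refer to each other.
        RcObject x = new RcObject(); x.retain();
        RcObject y = new RcObject(); x.point(y); y.point(x);
        x.release();                                         // the last outside reference disappears
        System.out.println(x.isFreed() + " " + y.isFreed()); // prints false false: both leak
    }
}
```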
2.2. Tracing algorithm (Tracing Collector)
The tracing algorithm was proposed to solve the problems of reference counting, and it uses the root set. A tracing collector scans from the root set to identify which objects are reachable and which are not, and marks the reachable ones in some way, for example by setting one or more bits in each reachable object. Because it marks reachable objects while scanning and then clears away whatever is unmarked, a tracing collector is also called a mark-and-sweep garbage collector.
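A toy mark-and-sweep collector over an explicit object graph might look like the following. It is a sketch under stated assumptions: MarkSweepDemo, Obj, and the roots list (standing in for local variables, parameters, and class variables) are hypothetical, and a boolean field plays the role of the mark bit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical toy heap illustrating mark-and-sweep from a root set. */
final class MarkSweepDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing references
        boolean marked;                           // plays the role of the mark bit
        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();  // every allocated object
    final List<Obj> roots = new ArrayList<>(); // stands in for locals, parameters, class variables

    Obj allocate(String name) {
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    /** Mark phase: set the mark on every object reachable from the roots. */
    void mark() {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue; // already visited; this also terminates cycles
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Sweep phase: drop unmarked objects, clear marks for the next cycle, return the count freed. */
    int sweep() {
        int before = heap.size();
        heap.removeIf(o -> !o.marked);
        heap.forEach(o -> o.marked = false);
        return before - heap.size();
    }

    int collect() {
        mark();
        return sweep();
    }

    public static void main(String[] args) {
        MarkSweepDemo gc = new MarkSweepDemo();
        Obj a = gc.allocate("a"), b = gc.allocate("b");
        Obj c = gc.allocate("c"), d = gc.allocate("d");
        a.refs.add(b);                // root -> a -> b: reachable
        c.refs.add(d); d.refs.add(c); // c <-> d: a cycle that no root reaches
        gc.roots.add(a);
        System.out.println("freed " + gc.collect() + " objects"); // prints freed 2 objects
    }
}
```

Because survival is decided by reachability rather than by per-object counters, the unreachable cycle between c and d is reclaimed along with everything else the roots cannot reach.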
2.3. Compacting algorithm (Compacting Collector)
To solve the heap fragmentation problem, tracing collectors adopted the idea of the compacting algorithm: during the sweep, the algorithm moves all live objects to one end of the heap, so that the other end becomes a single contiguous free region. The collector updates every reference to each object it moves, so that these references find the objects at their new locations. Implementations of compacting collectors generally add handles and a handle table for this purpose.
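Sliding compaction can be sketched on an array in which each slot holds either an object or null (free). CompactDemo, Cell, and compact() are hypothetical names; objects refer to each other by slot index, and instead of the handles and handle table mentioned above, this sketch records each object's new address in a forwarding table. Live cells slide toward address 0, so the free region collects at the high end.

```java
import java.util.Arrays;

/** Hypothetical sliding compactor over an array "heap" whose slots hold objects or null (free). */
final class CompactDemo {
    /** A toy object: refs holds the addresses (slot indices) of the objects it points to. */
    static final class Cell {
        final String name;
        final int[] refs;
        Cell(String name, int... refs) { this.name = name; this.refs = refs; }
    }

    /**
     * Slides every live cell toward address 0 and rewrites all references to it.
     * Assumes every non-null slot is live. Returns the number of free slots left at the end.
     */
    static int compact(Cell[] heap, int[] roots) {
        int[] forward = new int[heap.length]; // old address -> new address
        Arrays.fill(forward, -1);
        int next = 0;
        // Pass 1: assign each live cell its new address, preserving order.
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) forward[i] = next++;
        }
        // Pass 2: rewrite references inside the cells and in the root set.
        for (Cell cell : heap) {
            if (cell == null) continue;
            for (int k = 0; k < cell.refs.length; k++) cell.refs[k] = forward[cell.refs[k]];
        }
        for (int k = 0; k < roots.length; k++) roots[k] = forward[roots[k]];
        // Pass 3: move the cells; new address <= old address, so a left-to-right pass is safe.
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && forward[i] != i) {
                heap[forward[i]] = heap[i];
                heap[i] = null;
            }
        }
        return heap.length - next;
    }

    public static void main(String[] args) {
        Cell[] heap = new Cell[6];
        heap[1] = new Cell("a", 4); // a at address 1 points to b at address 4
        heap[4] = new Cell("b", 1); // b points back to a
        int[] roots = {1};
        int freeTail = compact(heap, roots);
        // prints a now at 0, b at 1, free slots at the end: 4
        System.out.println("a now at " + roots[0] + ", b at " + heap[0].refs[0]
                + ", free slots at the end: " + freeTail);
    }
}
```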
2.4. Copying algorithm (Copying Collector)
This algorithm was proposed to avoid the overhead of handles while still solving heap fragmentation. It starts by dividing the heap into one object area and several free areas. The program allocates objects from the object area. When the object area is full, the copying collector scans live objects from the root set and copies each one into a free area, packing them so that there are no free gaps between the memory occupied by live objects. The free area then becomes the object area, the old object area becomes a free area, and the program continues allocating in the new object area.
A typical collector based on the copying algorithm is the stop-and-copy algorithm, which divides the heap into an object area and a free area. The program is suspended while the two areas switch roles.
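The stop-and-copy idea can be sketched in Java with two lists standing in for the object area and the free area. This is a hypothetical model (CopyingDemo, Obj, and the forwarding map are illustrative names) that uses the breadth-first scan commonly attributed to Cheney, in which the objects already copied into the free area double as the work queue.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stop-and-copy collector over two lists standing in for the two areas. */
final class CopyingDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    List<Obj> fromSpace = new ArrayList<>(); // current object area: new objects go here
    List<Obj> toSpace = new ArrayList<>();   // current free area
    final List<Obj> roots = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        fromSpace.add(o);
        return o;
    }

    /** Copies everything reachable from the roots into to-space, then swaps the two areas. */
    void collect() {
        Map<Obj, Obj> forwarding = new IdentityHashMap<>(); // old object -> its copy
        for (int i = 0; i < roots.size(); i++) roots.set(i, copy(roots.get(i), forwarding));
        // Scan the copies in order; copying their targets appends to toSpace, extending the scan.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj o = toSpace.get(scan);
            for (int k = 0; k < o.refs.size(); k++) o.refs.set(k, copy(o.refs.get(k), forwarding));
        }
        // Whatever is left in from-space is garbage; the areas swap roles.
        List<Obj> oldFrom = fromSpace;
        fromSpace = toSpace;
        toSpace = oldFrom;
        toSpace.clear();
    }

    private Obj copy(Obj o, Map<Obj, Obj> forwarding) {
        Obj existing = forwarding.get(o);
        if (existing != null) return existing; // already copied: reuse the forwarding address
        Obj clone = new Obj(o.name);
        clone.refs.addAll(o.refs);              // still points at old objects until scanned
        forwarding.put(o, clone);
        toSpace.add(clone);
        return clone;
    }

    public static void main(String[] args) {
        CopyingDemo gc = new CopyingDemo();
        Obj a = gc.allocate("a"), b = gc.allocate("b");
        gc.allocate("c");                       // never referenced: garbage
        a.refs.add(b); b.refs.add(a);
        gc.roots.add(a);
        gc.collect();
        System.out.println("live after copy: " + gc.fromSpace.size()); // prints live after copy: 2
    }
}
```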
2.5. Generational algorithm (Generational Collector)
One drawback of the stop-and-copy collector is that it must copy every live object, which increases the time the program waits; this is why the copying algorithm is inefficient. Programs tend to follow a rule: most objects live for a short time, and a few live for a long time. The generational algorithm therefore divides the heap into two or more sub-heaps, each serving as one generation of objects. Because most objects are short-lived, as the program discards objects it no longer uses, the collector reclaims them from the youngest sub-heap. After a generational collection, objects that survived the previous run are moved to the sub-heap of the next older generation. Because the older generations are collected less often, this saves time.
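A minimal sketch of a two-generation heap follows. GenerationalDemo is hypothetical, objects are plain strings, and the live set passed to minorCollect() stands in for a real reachability analysis. A real generational collector must also track references from old objects into the young generation; this sketch omits that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical two-generation heap: a minor collection examines only the young list. */
final class GenerationalDemo {
    final List<String> young = new ArrayList<>();
    final List<String> old = new ArrayList<>();

    void allocate(String obj) { young.add(obj); } // new objects always start young

    /**
     * Minor collection: drops unreachable young objects and promotes the survivors.
     * The old generation is not examined at all, which is where the time saving comes from.
     * Returns the number of objects reclaimed.
     */
    int minorCollect(Set<String> live) {
        int reclaimed = 0;
        for (String obj : young) {
            if (live.contains(obj)) old.add(obj); // survived: promote to the older generation
            else reclaimed++;
        }
        young.clear();
        return reclaimed;
    }

    public static void main(String[] args) {
        GenerationalDemo heap = new GenerationalDemo();
        heap.allocate("tmp1"); heap.allocate("cache"); heap.allocate("tmp2");
        int freed = heap.minorCollect(Set.of("cache")); // only "cache" is still referenced
        // prints 2 reclaimed, old generation = [cache]
        System.out.println(freed + " reclaimed, old generation = " + heap.old);
    }
}
```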
2.6. Adaptive algorithm (Adaptive Collector)
Some garbage collection algorithms work better than others in particular situations. A collector based on the adaptive algorithm monitors current heap usage and switches to the collector whose algorithm suits that usage.

3. The System.gc() method

Viewing garbage collector activity with a command-line parameter
Calling System.gc() requests a Java garbage collection, whatever algorithm the JVM uses. The command-line parameter -verbosegc shows how the Java heap is being used. Its format is:
java -verbosegc classfile
Let's look at an example:

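class TestGC
{
    public static void main(String[] args)
    {
        new TestGC();               // never referenced, so it is unreachable immediately
        System.gc();                // request (not force) a garbage collection
        System.runFinalization();   // ask the JVM to run any pending finalize() methods
    }
}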

In this example, a new object is created and never used, so it quickly becomes unreachable. After compiling the program, running java -verbosegc TestGC prints:
[Full GC 168K->97K(1984K), 0.0253873 secs]
The test machine ran Windows 2000 with JDK 1.3.1. The values before and after the arrow, 168K and 97K, are the heap memory occupied by objects before and after the collection, so 168K - 97K = 71K of object memory was reclaimed. The value in parentheses, 1984K, is the total heap capacity, and the collection took 0.0253873 seconds (this time varies from run to run).
Note that calling System.gc() is only a request (a suggestion). When the JVM receives it, it does not necessarily collect garbage immediately; it only adjusts the weighting of its garbage collection algorithms so that a collection becomes more likely, happens earlier, or reclaims more.
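Heap usage can also be observed from inside a program through java.lang.Runtime. The sketch below (GcRequestDemo is a hypothetical name) creates some garbage, calls System.gc(), and prints used memory before and after. The numbers depend on the JVM version and collector, and HotSpot's -XX:+DisableExplicitGC option turns System.gc() into a no-op, so the output may show little or no change.

```java
/** Hypothetical demo: prints used heap before and after a System.gc() request. */
final class GcRequestDemo {
    /** Heap currently in use, as reported by the running JVM. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        byte[][] chunks = new byte[100][];
        for (int i = 0; i < chunks.length; i++) {
            chunks[i] = new byte[10_000]; // roughly 1 MB in total
        }
        chunks = null;                    // every chunk is now unreachable garbage
        long before = usedHeap();
        System.gc();                      // only a request; the JVM decides what to do
        long after = usedHeap();
        System.out.println("used before: " + before / 1024 + "K, after: " + after / 1024 + "K");
    }
}
```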
4. The finalize() method
Before the JVM garbage collector reclaims an object, the program is generally expected to call an appropriate method to release the object's resources. For the case where resources are not released explicitly, Java provides a default mechanism for finalizing the object and releasing its resources: the finalize() method. Its signature is:
protected void finalize() throws Throwable
After finalize() returns, the object is discarded and garbage collection of it proceeds. The throws Throwable clause means the method may throw any type of exception.
finalize() exists because there are things the garbage collector cannot handle. Suppose your object obtains a "special" memory region without using new. The garbage collector only knows how to release memory explicitly allocated with new, so it does not know how to release this special region. For such cases, Java allows a class to define a finalize() method.
Such special regions arise in two main ways. 1) Memory may be allocated the C way rather than with Java's usual new. This happens mainly in native methods: for example, a native method may call the C/C++ malloc() family of functions to allocate storage, and unless free() is called, that storage is never released, which can cause a memory leak. Because free() is a C/C++ function, finalize() can call it through a native method to release these special regions. 2) Open file resources, which are also outside the scope of what the garbage collector reclaims.
In other words, finalize() is mainly used to release memory obtained by other means and to perform cleanup. Java provides no destructor or similar concept, so when you need this kind of cleanup you must write an ordinary method to do it, namely by overriding the finalize() method of the Object class. For example, suppose an object draws itself on the screen when it is created. Unless the image is explicitly erased, it may never be cleared. If you put the erasing code in finalize(), the image is erased when the GC runs and calls finalize(); if GC never occurs, the image stays on the screen.
Once the garbage collector is ready to release the storage occupied by an object, it first calls the object's finalize() method to perform any necessary cleanup. The object's memory is actually reclaimed only during the next garbage collection.
To clean up an object, its user must call a cleanup method at the point where cleanup should happen. This conflicts somewhat with the C++ concept of a destructor. In C++, all objects are destroyed; or rather, all objects "should" be destroyed. If you create a C++ object as a local object, that is, on the stack (impossible in Java, where everything lives on the heap), it is cleaned up or destroyed at the closing curly brace that ends the scope in which it was created. If the object was created with new (as in Java), its destructor is called when the programmer invokes the C++ delete operator (Java has no equivalent). If the programmer forgets, the destructor is never called, the result is a memory leak, and the object's other parts are never cleaned up either.
Java, in contrast, does not let us create local objects: new is always used. Java also has no delete operator for releasing objects, because the garbage collector releases storage automatically. So, in a simplified sense, Java has no destructor because it has garbage collection. As we will see, however, the existence of the garbage collector does not fully remove the need for destructors, or for the mechanism they represent. (For the reasons, see the next section. Also note that finalize() is called when the garbage collector is about to release an object's storage; you should never call finalize() directly, so avoid relying on it.) To release storage by any other means, you must still call a method yourself in Java. That method is the equivalent of a C++ destructor, only less convenient.
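Since Java 7, an explicit cleanup method like the one described above is usually expressed through the AutoCloseable interface and the try-with-resources statement, which calls close() deterministically when the block ends; finalize() itself has been deprecated since Java 9. The sketch below revisits the on-screen image example. ScreenImage is a hypothetical class, and the drawing and erasing are simulated with a boolean flag.

```java
/** Hypothetical stand-in for the on-screen image example; drawing is simulated by a flag. */
final class ScreenImage implements AutoCloseable {
    private boolean onScreen;

    ScreenImage() {
        onScreen = true;  // pretend the constructor draws the image
    }

    boolean isOnScreen() { return onScreen; }

    /** The explicit cleanup method: erases the image right away, independent of any GC. */
    @Override
    public void close() {
        onScreen = false; // pretend to erase the image
    }

    public static void main(String[] args) {
        ScreenImage image = new ScreenImage();
        try (ScreenImage resource = image) {
            System.out.println("inside block: " + resource.isOnScreen()); // prints inside block: true
        } // close() runs here, much as a C++ destructor runs at the closing brace
        System.out.println("after block: " + image.isOnScreen());        // prints after block: false
    }
}
```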
Every object in C++ is destroyed (through delete), whereas an object in Java is not necessarily reclaimed by the garbage collector. In other words: 1) an object may never be garbage collected; 2) garbage collection is not the same as destruction; 3) garbage collection is concerned only with memory. So if an object is no longer used, should its finalize() release the other objects it contains? No. However those objects were created, the garbage collector is responsible for releasing the memory they occupy.
5. Conditions that trigger the major GC (garbage collection)
The JVM performs minor GCs very frequently, but each one takes very little time, so they have little impact on the system. What deserves attention is the trigger conditions of the major GC, because its impact on the system is significant. In general, two conditions trigger a major GC:
1) When the application is idle, that is, when no application thread is running, GC is invoked. GC runs in the lowest-priority thread, so the GC thread is not invoked while the application is busy, except under the following condition.
2) When the Java heap runs short of memory, GC is invoked. If an application thread creates a new object while running and there is not enough memory, the JVM forcibly invokes the GC thread to reclaim memory for the new allocation. If one GC still cannot satisfy the allocation, the JVM makes two more GC attempts. If those still fail, the JVM reports an "out of memory" error and the Java application stops.
Because the JVM decides whether to perform a major GC based on the system environment, and the environment keeps changing, major GCs happen nondeterministically and their timing cannot be predicted. What is certain is that a long-running application will go through major GCs repeatedly.
6. Measures to reduce GC overhead
As the GC mechanism above shows, the way a program runs directly changes the system environment and therefore affects when GC is triggered. If design and coding ignore the characteristics of GC, negative effects such as objects lingering in memory will follow. The basic principle for avoiding them is to produce as little garbage as possible and reduce GC overhead. Specific measures include the following:
(1) Do not call System.gc() explicitly
This method suggests that the JVM perform a major GC. Although it is only a suggestion, in many cases it does trigger a major GC, which raises the frequency of major GCs and thus the number of intermittent pauses.
(2) Minimize the use of temporary objects
Temporary objects become garbage as soon as the function call that used them returns. Using fewer temporary variables means producing less garbage, which delays the second trigger condition (insufficient heap memory) and reduces the chance of a major GC.
(3) Explicitly set references to null when the objects are no longer needed
In general, an object whose references have all been set to null is treated as garbage, so explicitly setting unused references to null helps the collector identify garbage and improves GC efficiency.
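Setting a reference to null matters most when a long-lived structure would otherwise keep an object reachable; a local variable that simply goes out of scope needs no help. A classic case is an array-backed stack, sketched here with a hypothetical ObjectStack class: pop() clears the vacated slot so that the popped object can be collected once the caller drops it.

```java
import java.util.Arrays;
import java.util.EmptyStackException;

/** Hypothetical array-backed stack that clears popped slots so the GC can reclaim them. */
final class ObjectStack<T> {
    private Object[] elements = new Object[16];
    private int size;

    void push(T item) {
        if (size == elements.length) elements = Arrays.copyOf(elements, size * 2);
        elements[size++] = item;
    }

    @SuppressWarnings("unchecked")
    T pop() {
        if (size == 0) throw new EmptyStackException();
        T item = (T) elements[--size];
        elements[size] = null; // without this, the array keeps the popped object reachable
        return item;
    }

    int size() { return size; }

    /** For illustration only: whether the slot just above the top holds no reference. */
    boolean slotAboveTopIsCleared() { return size >= elements.length || elements[size] == null; }

    public static void main(String[] args) {
        ObjectStack<String> stack = new ObjectStack<>();
        stack.push("first");
        stack.push("second");
        // prints second, cleared: true
        System.out.println(stack.pop() + ", cleared: " + stack.slotAboveTopIsCleared());
    }
}
```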
(4) Use StringBuffer instead of String to accumulate strings
String objects are immutable, so appending to a String does not extend the existing object; it creates a new String. For example, executing Str5 = Str1 + Str2 + Str3 + Str4 can produce several garbage objects, because each "+" creates a new String. These intermediate objects serve no purpose for the system and only add garbage. To avoid this, use StringBuffer to accumulate strings: a StringBuffer is variable-length and expands its own buffer, without producing intermediate String objects.
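The difference is clearest when strings are accumulated in a loop. In this sketch (ConcatDemo is a hypothetical class), withString() creates a new String on every iteration, while withStringBuffer() appends into a single buffer. Note that javac usually compiles a single expression such as Str1 + Str2 + Str3 + Str4 into one builder chain, so the savings show up mainly in loops; StringBuilder (Java 5 and later) is the unsynchronized counterpart of StringBuffer and is typically preferred when only one thread uses the buffer.

```java
/** Hypothetical demo: builds "0,1,2,...," two ways. */
final class ConcatDemo {
    static String withString(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s = s + i + ",";          // each pass creates a new String; the previous one becomes garbage
        }
        return s;
    }

    static String withStringBuffer(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append(i).append(','); // appends into the same buffer, growing it only when full
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withStringBuffer(5)); // prints 0,1,2,3,4,
    }
}
```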
(5) Use primitive types such as int and long instead of Integer and Long objects
Primitive variables occupy much less memory than the corresponding wrapper objects. Unless a wrapper object is needed, use primitives.
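As a small sketch (BoxingDemo is a hypothetical class), the two methods below compute the same sum. The boxed version allocates a new Long object on most iterations, because Long.valueOf() caches only the values -128 to 127, while the primitive version keeps its running total in a local variable and allocates nothing on the heap.

```java
/** Hypothetical demo: the same sum with a primitive and with a boxed accumulator. */
final class BoxingDemo {
    static long sumPrimitive(int n) {
        long total = 0;              // a plain 8-byte local, no heap allocation
        for (int i = 0; i < n; i++) total += i;
        return total;
    }

    static Long sumBoxed(int n) {
        Long total = 0L;             // each += unboxes, adds, and boxes the result via Long.valueOf
        for (int i = 0; i < n; i++) total += i;
        return total;
    }

    public static void main(String[] args) {
        // prints 499999500000 499999500000
        System.out.println(sumPrimitive(1_000_000) + " " + sumBoxed(1_000_000));
    }
}
```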
(6) Use as few static object variables as possible
Static variables are global and are not reclaimed by GC, so the objects they reference keep occupying memory.
(7) Spread out the creation and deletion of objects over time
Creating a large number of new objects, especially large ones, in a short period suddenly demands a lot of memory. The JVM then has no choice but to run a major GC to reclaim memory or consolidate fragments, which raises the frequency of major GCs. Releasing many objects at once has a similar effect: a large number of garbage objects suddenly appears, free space inevitably shrinks, and the chance that the next object creation forces a major GC rises sharply.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
