JVM Memory Management and Garbage Collection
I. JVM Memory Structure
The Java Virtual Machine divides the memory it manages into several runtime data areas. Each area has its own purpose and characteristics, and the areas that are garbage collected use different collection algorithms. The areas are:
Program Counter Register, Java Virtual Machine Stacks (JVM Stacks), Native Method Stacks, Heap, and Method Area.
Each is described in turn:
1. Program Counter (Program Counter Register)
This is a relatively small piece of memory that the programmer cannot operate on directly. Conceptually it plays the same role as a CPU's program counter: while the JVM interprets a bytecode file (.class), it records the position of the bytecode instruction being executed by the current thread. It is only a conceptual model, and different JVMs may implement it in different ways. When the bytecode interpreter works, it selects the next instruction to execute by changing the value of the program counter; basic functions such as branches, loops, and jumps all depend on it.
Multithreading is another reason it exists. Java multithreading is achieved by switching threads in turn, and at any given moment one core executes the instructions of only one thread. Each thread therefore needs its own counter to record its progress, so that it resumes from the correct place when it is scheduled again. The program counter is thus thread-private memory. If a thread is executing a Java method, the counter records the address of the bytecode instruction being executed; if it is executing a Native method, the counter value is undefined. This is the only memory area for which the Java Virtual Machine specification defines no OutOfMemoryError condition.
2. Java Virtual Machine Stacks (JVM Stacks)
The JVM stack is what we usually call "the stack" (memory is often roughly divided into heap and stack). Like the program counter, it is thread-private and has the same lifecycle as its thread. Each time a method is executed, a stack frame is created to store the local variable table, the operand stack, dynamic linking information, and the method exit. Executing a method corresponds to pushing its stack frame onto the JVM stack and popping it when the method returns. The local variable table holds the primitive data types (boolean, byte, char, and so on) and object references (the memory address pointing to each object). Its size is therefore determined at compile time and does not change while the method runs. Two errors can occur in this memory area: StackOverflowError, when a thread requests a stack depth greater than the JVM allows, and OutOfMemoryError, when the stack cannot obtain enough memory to grow.
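A minimal sketch of the StackOverflowError case, not taken from the original article: each call to the hypothetical method recurse() pushes one more stack frame until the thread's stack is exhausted. The class name StackDepthDemo is an assumption, and the depth reached varies with the JVM, the frame size, and the thread stack size setting.

```java
class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes a new stack frame; with no base case the stack eventually fills up.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many frames were pushed before the JVM threw StackOverflowError.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames are popped as the error propagates; the thread itself survives.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames pushed before StackOverflowError: " + measureDepth());
    }
}
```

The number printed is only indicative; running the same class with a different thread stack size should change it.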
3. Native Method Stacks
As the name suggests, the native method stack serves the native methods used by Java. java.lang.Object, the ancestor of every Java class, has several native methods, such as hashCode() and wait(). Their execution mostly relies on the operating system, but the JVM still needs a standard way to manage their execution. Implementations are free to handle this area differently; for example, Sun's HotSpot JVM combines the native method stack and the JVM stack into one.
4. Heap
The heap is the largest and most important memory area, and Java performance tuning mainly targets it. All object instances and arrays are allocated on the heap. With the maturing of JIT (just-in-time) compilation, that statement is no longer absolute, although it still holds in most cases: techniques such as stack allocation and scalar replacement let some objects avoid heap allocation, and as JIT compilation develops further, "all object instances and arrays are allocated on the heap" may need to be qualified.
The heap is the main area managed by garbage collection, so it is covered in detail in the garbage collection section below; only the concept is explained here. On a 32-bit system the maximum heap size is about 2 GB, while a 64-bit system has no such limit. The heap size is controlled with -Xms and -Xmx: -Xms is the minimum heap size requested when the JVM starts, and -Xmx is the maximum heap size the JVM may request.
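To see the effect of the heap size options from inside a program, the standard Runtime API can be queried. This is a small sketch, with the class name HeapSizeDemo chosen only for illustration; the values reported depend on the options the JVM was started with.

```java
class HeapSizeDemo {
    // Returns { currently reserved heap, free part of it, upper limit } in bytes.
    static long[] heapSizes() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.totalMemory(), rt.freeMemory(), rt.maxMemory() };
    }

    public static void main(String[] args) {
        long[] s = heapSizes();
        long mb = 1024L * 1024L;
        System.out.println("total (currently reserved): " + s[0] / mb + " MB");
        System.out.println("free  (within total):       " + s[1] / mb + " MB");
        System.out.println("max   (upper limit):        " + s[2] / mb + " MB");
    }
}
```

Running it twice with different -Xms and -Xmx values should show the total and max figures move accordingly.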
5. Method Area
The method area is a memory area shared by all threads. It stores data such as class information, constants, and static variables for the classes loaded by the JVM. In Sun's HotSpot VM the method area is usually equated with the permanent generation (the permanent generation, along with the young and old generations, is detailed in the GC section). The Java Virtual Machine specification describes the method area as a logical part of the heap, yet it is commonly called "non-heap" to distinguish it from the Java heap. Garbage collection in the method area is tricky, and Sun's HotSpot VM does not handle it especially well.
An important part of the method area is the runtime constant pool. It mainly stores the literals (which can simply be understood as constants) and references generated at compile time. The memory for most constants can be determined at compile time, but not for all of them: constants can also be added to the pool at run time, for example through the native method intern() of the String class.
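The run-time behaviour of intern() can be observed with reference comparison. The following sketch (class name InternDemo is hypothetical) compares a string literal with a string object built at run time, before and after interning.

```java
class InternDemo {
    // Returns { literal == built, literal == built.intern() }.
    static boolean[] compare() {
        String literal = "jvm";            // resolved through the constant pool
        String built = new String("jvm");  // a separate object created at run time
        boolean sameBeforeIntern = (literal == built);
        boolean sameAfterIntern = (literal == built.intern());
        return new boolean[] { sameBeforeIntern, sameAfterIntern };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("literal == built:          " + r[0]); // false
        System.out.println("literal == built.intern(): " + r[1]); // true
    }
}
```

The second comparison is true because intern() returns the canonical pooled instance, which is the same object the literal refers to.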
One more memory area lies outside the memory managed by the JVM: direct memory. JDK 1.4 added the NIO classes, which introduced an I/O model based on channels and buffers. NIO can use native libraries to allocate off-heap memory directly, which is what we call direct memory, and this improves program performance in some scenarios.
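As an illustration of direct memory, java.nio.ByteBuffer offers both heap-backed and direct buffers. This is a minimal sketch; the class name DirectBufferDemo and the buffer size are arbitrary choices.

```java
import java.nio.ByteBuffer;

class DirectBufferDemo {
    // Allocates a buffer whose storage lives outside the Java heap.
    static ByteBuffer newDirect(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(1024); // backed by a byte[] on the heap
        ByteBuffer direct = newDirect(1024);         // backed by native memory
        direct.putInt(0, 42);
        System.out.println("heap.isDirect()   = " + heap.isDirect());
        System.out.println("direct.isDirect() = " + direct.isDirect());
        System.out.println("value read back   = " + direct.getInt(0));
    }
}
```

Both buffers expose the same API; only where the bytes are stored differs.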
II. Garbage Collection
Between Java and C++ stands a wall built of memory allocation and garbage collection: those outside want to get in, and those inside want to get out. Readers can work out the meaning for themselves. C and C++ programmers are often troubled by memory leaks and find memory management a headache, while Java programmers envy C++ programmers for being able to control everything themselves, so that they are never helpless when memory problems arise. Indeed, as Java programmers we can hardly control how the JVM reclaims memory; we can only understand its principles, adapt to them, and try to improve program performance. The following sections describe Java garbage collection (GC) by answering these questions:
1. Why garbage collection?
As a program runs, instance objects, variables, and other data occupy more and more memory. If garbage is not collected in time, program performance will inevitably decline, and insufficient available memory may even cause avoidable system errors.
2. What "garbage" needs to be recycled?
Three of the five areas introduced above need no garbage collection: the program counter, the JVM stacks, and the native method stacks. Their lifecycle is tied to the thread, so the memory they occupy is released automatically when the thread is destroyed. Only the method area and the heap need GC. As for which objects qualify, the simple summary is: an object that is no longer referenced by anything can be recycled. Put plainly, an object that is of no further use can be collected as garbage.
3. When will garbage collection be performed?
The classic reference counting algorithm adds a reference counter to each object. The counter is incremented each time the object is referenced and decremented each time a reference is dropped; when the counter reaches 0, the object is considered recyclable. This algorithm has an obvious drawback: when two objects reference each other but are otherwise unused, they ought to be collected, yet because each keeps the other's counter above 0 they never meet the collection condition, so memory cannot be cleaned up completely. Sun's JVM therefore does not use reference counting for garbage collection. Instead, it uses a root search algorithm:
The basic idea is to start from a set of objects called GC Roots and search downward along their references. If no path connects an object to any GC Root, the object is no longer referenced and can be reclaimed. (There is a subtlety: an unreachable object is not yet completely "dead". If its class overrides finalize() and the method has not yet been called, the system calls finalize() once to do the final cleanup. If during that call the object re-associates itself with any object reachable from GC Roots, it is "reborn"; otherwise it can be completely reclaimed.) For example, Object5, Object6, and Object7 may still reference one another, but none of them can be reached from GC Roots, so all of them can be collected. This solves the problem that reference counting cannot.
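The root search can be sketched as a plain graph traversal. The code below is a toy model, not how any JVM is implemented: objects are represented by names, references by a map, and the graph layout (Root, Object1 to Object7) is an assumption made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilityDemo {
    // Marks everything reachable from the given roots by following references.
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> gcRoots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(gcRoots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (marked.add(obj)) { // first visit: follow its outgoing references
                pending.addAll(refs.getOrDefault(obj, Collections.<String>emptyList()));
            }
        }
        return marked;
    }

    // Hypothetical object graph: Object5, Object6, Object7 form a cycle no root points to.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("Root", Arrays.asList("Object1", "Object2"));
        refs.put("Object1", Arrays.asList("Object3"));
        refs.put("Object2", Arrays.asList("Object4"));
        refs.put("Object5", Arrays.asList("Object6"));
        refs.put("Object6", Arrays.asList("Object7"));
        refs.put("Object7", Arrays.asList("Object5"));
        return refs;
    }

    public static void main(String[] args) {
        Set<String> live = reachable(sampleGraph(), Arrays.asList("Root"));
        System.out.println("Object4 reachable: " + live.contains("Object4"));
        System.out.println("Object5 reachable: " + live.contains("Object5"));
    }
}
```

Under reference counting, each object in the cycle would keep a count of 1 and never be freed; the traversal never visits them, so they are treated as garbage.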
Supplementary concept: since JDK 1.2 the notion of a reference has been extended into four kinds: strong, soft, weak, and phantom. Objects referenced in these four ways are treated differently by GC:
A> Strong Reference. The ordinary reference created when a new object is assigned to a variable. As long as a strong reference to an object exists, the object will never be recycled.
B> Soft Reference. Objects held only through soft references are recyclable, but if JVM memory is not tight they are left alone; they are reclaimed only when memory runs short. Why keep objects that could be recycled? Think of caching: a cached object is dispensable, yet keeping it in memory is useful while space allows. Holding such objects through soft references makes them convenient to reuse and improves program performance.
C> Weak Reference. Objects held only through weak references are always garbage collected: whether or not memory is short, they are cleared at the next GC.
D> Phantom Reference. The weakest kind, it is practically negligible: the JVM ignores it when deciding whether an object is alive. Its only role is to keep a tracking record of collection, to assist the kind of cleanup that finalize() performs.
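The reference kinds map onto classes in java.lang.ref. The sketch below (class name ReferenceKindsDemo is hypothetical) shows the parts that are deterministic; whether a weakly or softly referenced object has actually been cleared after System.gc() depends on the JVM, so main() only prints what it observes.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {
    // While a strong reference exists, a weak reference to the same object is not cleared.
    static boolean weakSurvivesWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc();
        return weak.get() == strong; // 'strong' is still in use here, so the object is alive
    }

    // A phantom reference never hands its referent back: get() always returns null.
    static boolean phantomGetIsNull() {
        Object referent = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);
        return phantom.get() == null;
    }

    public static void main(String[] args) {
        SoftReference<byte[]> soft = new SoftReference<>(new byte[1024]);
        WeakReference<byte[]> weak = new WeakReference<>(new byte[1024]);
        System.gc(); // only a request; the outcome below is not guaranteed
        System.out.println("soft referent still present: " + (soft.get() != null));
        System.out.println("weak referent still present: " + (weak.get() != null));
        System.out.println("weak survives while strongly held: " + weakSurvivesWhileStronglyHeld());
        System.out.println("phantom.get() is null: " + phantomGetIsNull());
    }
}
```

On a typical run with plenty of free memory, the soft referent tends to survive the collection and the weak one tends not to, which matches the descriptions above.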
Finally, which classes can be recycled? A class counts as useless only when all of the following conditions are met:
1> All instances of the class have been recycled.
2> The ClassLoader that loaded the class has been recycled.
3> The java.lang.Class object corresponding to the class is not referenced anywhere, so the class cannot be reached through reflection.
4. How to recycle garbage?
This section focuses on garbage collection algorithms. For GC purposes the memory is mainly divided into three parts: the young generation, the old generation, and the permanent generation, and each generation uses a GC algorithm suited to it. The young generation suits objects with short lifecycles that are created and destroyed frequently; the old generation suits objects with relatively long lifecycles; and the permanent generation in Sun HotSpot corresponds to the method area (some JVMs have no notion of a permanent generation). First, the concepts and features of the young, old, and permanent generations:
Young Generation (also called New Generation). It is roughly divided into an Eden area and a Survivor area, and the Survivor area is further divided into two parts, FromSpace and ToSpace. Newly created objects are allocated in the young generation; when Eden runs out of space, the surviving objects are moved to a Survivor space. The size of the young generation is set with -Xmn, and the ratio of Eden to Survivor with -XX:SurvivorRatio.
Old Generation. It holds objects that are still alive after many garbage collections in the young generation, such as cached objects. The size of the old generation is the -Xmx value minus the -Xmn value.
Permanent Generation. In Sun's JVM this is the method area, although most other JVMs have no such generation. It mainly stores constants and class information. Its default minimum size is 16 MB and its default maximum is 64 MB; both can be changed with -XX:PermSize and -XX:MaxPermSize.
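The memory areas a running JVM actually uses can be listed through the standard java.lang.management API. This is a small sketch (class name MemoryPoolsDemo is hypothetical); the pool names depend on the JVM version and the collector in use, and a newer HotSpot release may report a differently named class-metadata pool in place of a permanent generation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolsDemo {
    // Collects "name (type)" for every memory pool the running JVM exposes.
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName() + " (" + pool.getType() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : poolNames()) {
            System.out.println(name);
        }
    }
}
```

Comparing the output under different collector and size options is a quick way to check which generations exist and how they are named on a given JVM.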
Common GC algorithms:
Mark-Sweep
The most basic GC algorithm: mark the objects to be recycled, then scan and reclaim the marked objects. It has two steps, mark and sweep. The algorithm is not efficient, and it leaves memory fragments behind; when a large object later needs contiguous memory, the fragmented space may not be able to satisfy it. The algorithms below improve on it.
Copying
As discussed earlier, the young generation is divided into three parts: the Eden area and two Survivor areas. Sun's JVM sets the ratio of Eden to Survivor space so that one Survivor area is always idle. During garbage collection, the objects that survive are copied into the idle Survivor area, and then Eden and the Survivor area that was in use are cleaned out completely. What if the idle Survivor area is not large enough to hold the survivors? In that case the old generation is borrowed temporarily to hold the overflow. This algorithm is applicable to the young generation.
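The copying step can be modelled with plain lists. This is a toy sketch under stated assumptions, not JVM code: objects are names, liveness is given as a set, the idle Survivor area holds a fixed number of objects, and the list called old stands for the longer-lived area that absorbs the overflow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CopyingDemo {
    final List<String> toSurvivor = new ArrayList<>(); // the idle Survivor area
    final List<String> old = new ArrayList<>();        // receives survivors that do not fit

    // Copies every live object out of Eden and the in-use Survivor area, then wipes them.
    static CopyingDemo minorGc(List<String> edenAndFrom, Set<String> live, int survivorCapacity) {
        CopyingDemo result = new CopyingDemo();
        for (String obj : edenAndFrom) {
            if (!live.contains(obj)) {
                continue; // dead objects are simply not copied
            }
            if (result.toSurvivor.size() < survivorCapacity) {
                result.toSurvivor.add(obj);
            } else {
                result.old.add(obj); // Survivor area full: hand the object over
            }
        }
        edenAndFrom.clear(); // the source areas are reclaimed in one go
        return result;
    }

    public static void main(String[] args) {
        List<String> space = new ArrayList<>(Arrays.asList("A", "B", "C", "D", "E"));
        Set<String> live = new HashSet<>(Arrays.asList("A", "C", "E"));
        CopyingDemo r = minorGc(space, live, 2);
        System.out.println("to-survivor=" + r.toSurvivor + " old=" + r.old + " eden=" + space);
        // prints: to-survivor=[A, C] old=[E] eden=[]
    }
}
```

Because only live objects are touched, the cost of a copying collection scales with the number of survivors, which is why it fits an area where most objects die young.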
Mark-Compact
The first half is the same as in Mark-Sweep. After the objects that must not be recycled are marked, they are moved together so that the occupied memory is contiguous, and everything outside the compacted region is then cleared. This algorithm applies to the long-lived generations, that is, the old and permanent generations.
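The difference between sweeping and compacting shows up in the largest contiguous free block. The following toy model (class name SweepVsCompactDemo and the slot-array heap are assumptions for illustration) applies both to the same six-slot heap.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SweepVsCompactDemo {
    // Mark-sweep: dead objects are freed in place; live objects do not move.
    static String[] sweep(String[] heap, Set<String> live) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !live.contains(result[i])) {
                result[i] = null;
            }
        }
        return result;
    }

    // Mark-compact: live objects slide to the start, leaving one free block at the end.
    static String[] compact(String[] heap, Set<String> live) {
        String[] result = new String[heap.length];
        int next = 0;
        for (String obj : heap) {
            if (obj != null && live.contains(obj)) {
                result[next++] = obj;
            }
        }
        return result;
    }

    // Length of the longest run of free (null) slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E", "F" };
        Set<String> live = new HashSet<>(Arrays.asList("A", "C", "E"));
        System.out.println("after sweep:   " + Arrays.toString(sweep(heap, live))
                + " largest free run = " + largestFreeRun(sweep(heap, live)));     // 1
        System.out.println("after compact: " + Arrays.toString(compact(heap, live))
                + " largest free run = " + largestFreeRun(compact(heap, live)));   // 3
    }
}
```

Both variants free three slots, but only the compacted heap could satisfy a request for three contiguous slots.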
Common garbage collectors:
Based on the algorithms above, different JVMs implement collection differently. Here are some common garbage collectors:
First, the three young-generation collectors: Serial GC, ParNew, and Parallel Scavenge.
1. Serial GC. The most basic and oldest collector, yet still widely used. It collects with a single thread, and its defining feature is that it stops all running application threads during collection (Stop The World). That is unacceptable for some applications, but consider: as long as the pause can be kept within N milliseconds, most applications can tolerate it, and in practice the collector does not disappoint, because pauses of a few tens of milliseconds are perfectly acceptable on a client. This collector suits applications with a single CPU, a small young generation, and modest pause-time requirements. It is the default GC in client mode and can be selected explicitly with -XX:+UseSerialGC.
2. ParNew GC. Basically the same as Serial GC; the essential difference is that it collects with multiple threads to improve efficiency, which makes it usable on the server side. It can also work together with the CMS collector, which is a further reason to use it on servers.
3. Parallel Scavenge GC. The whole scan-and-copy process is multithreaded. It suits applications that run on multiple CPUs and require short pauses, and it is the default GC in server mode. It can be selected explicitly with -XX:+UseParallelGC, and the number of GC threads can be set with, for example, -XX:ParallelGCThreads=4. These young-generation collectors are used in combination with old-generation collectors such as those described next.
4. CMS (Concurrent Mark Sweep) collector. Its goal is to avoid the long pauses of Serial GC and achieve the shortest possible collection pause. Typical B/S (browser/server) applications suit this collector because they need high concurrency and fast response. CMS is based on the mark-sweep algorithm, and a collection is roughly divided into four steps:
initial mark (CMS initial mark), concurrent mark (CMS concurrent mark), remark (CMS remark), and concurrent sweep (CMS concurrent sweep).
The initial mark and remark steps both need to pause the user threads. Initial mark only marks the objects directly reachable from GC Roots, so it is fast. Concurrent mark traces from GC Roots to determine which objects are alive. Remark corrects the marks of objects whose state changed because the user program kept running during concurrent marking; its pause is slightly longer than that of initial mark but far shorter than the concurrent mark phase. Because the collector threads work alongside the user threads during the two longest phases, concurrent mark and concurrent sweep, the CMS collection process as a whole runs concurrently with the user threads.
Advantages of the CMS collector: concurrent collection and low pauses. But CMS is far from perfect.
The CMS collector has three major disadvantages:
A>. CMS is very sensitive to CPU resources. During the concurrent phases it does not pause user threads, but it occupies CPU resources, which slows the application down and reduces total throughput. By default CMS starts (number of CPUs + 3) / 4 collection threads.
B>. CMS cannot handle floating garbage, so a "Concurrent Mode Failure" may occur and trigger another Full GC. Because user threads keep running during the concurrent sweep phase, they keep producing new garbage. This garbage appears after marking has finished, so CMS cannot deal with it in the current collection and must leave it for the next GC; it is called "floating garbage". For the same reason, since user threads run during collection, enough memory must be reserved for them. Unlike other collectors, CMS therefore cannot wait until the old generation is almost full before collecting. By default CMS starts when 68% of the old generation is in use. The trigger percentage can be raised with -XX:CMSInitiatingOccupancyFraction to reduce the number of collections and improve performance. If the memory reserved while CMS runs cannot satisfy the program, a "Concurrent Mode Failure" occurs and the VM falls back to its backup plan: it temporarily enables the Serial Old collector to redo the old-generation collection, which makes the pause long. So setting -XX:CMSInitiatingOccupancyFraction too high easily leads to "Concurrent Mode Failure" and lowers performance instead.
C>. CMS is based on the mark-sweep algorithm, so a collection leaves a large number of fragments behind. Too much fragmentation makes object allocation troublesome: for example, when no contiguous space can be found for a large object, a Full GC has to be triggered early. To address this, CMS provides the switch -XX:+UseCMSCompactAtFullCollection, which adds a defragmentation pass after a Full GC, and the parameter -XX:CMSFullGCsBeforeCompaction, which sets how many Full GCs run without compaction before one that compacts.
5. G1 collector. Compared with CMS it has two advantages: first, it is based on the mark-compact algorithm, so it does not cause memory fragmentation; second, its pauses can be controlled precisely. It is not described further here.
6. Serial Old. The old-generation version of the Serial collector. It also collects with a single thread, uses the mark-compact algorithm, and is mainly used by virtual machines in Client mode.
7. Parallel Old. The old-generation version of the Parallel Scavenge collector. It uses multiple threads and the mark-compact algorithm.
8. RTSJ garbage collector, for Java real-time programming. It will be introduced later.
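Which collectors a particular JVM is actually running with can be checked at run time through the standard java.lang.management API. This is a small sketch (class name ActiveCollectorsDemo is hypothetical); the names reported depend on the JVM version and on the collector options passed at startup.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectorsDemo {
    // Names of the collectors the running JVM reports.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

Starting the same class with a different collector option, such as the -XX:+UseSerialGC switch mentioned above, should change the names that are listed.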
III. Java Program Performance Optimization
Calling gc()
Calling the gc method suggests that the Java virtual machine make an effort to recycle unused objects, so that the memory they occupy can be reused quickly. When control returns from the call, the virtual machine has made its best effort to reclaim space from all discarded objects. Calling System.gc() is equivalent to calling Runtime.getRuntime().gc().
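A rough way to watch the effect of the call is to compare used heap before and after it. This is only a sketch (class name GcHintDemo and the allocation sizes are arbitrary); the figures vary from run to run, and a JVM is free to reclaim more or less than expected.

```java
class GcHintDemo {
    // Heap currently in use: reserved heap minus its free part.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        byte[][] garbage = new byte[64][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[256 * 1024]; // about 16 MB in total
        }
        long before = usedHeapBytes();
        garbage = null; // drop the only reference so the arrays become unreachable
        System.gc();    // same effect as Runtime.getRuntime().gc()
        long after = usedHeapBytes();
        System.out.println("used before: " + before / 1024 + " KB");
        System.out.println("used after:  " + after / 1024 + " KB");
    }
}
```

In ordinary application code the call is rarely needed; it is mostly useful for experiments like this one.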
Calling and overriding finalize()
GC can only reclaim memory allocated on the Java heap (in pure Java, every object is allocated there with new). It cannot reclaim memory allocated natively, outside the heap, which can happen when JNI is used: for example, Java calls a C program and the C program allocates memory with malloc(). GC cannot handle such memory, so finalize() is needed to release it. Concretely, memory obtained with malloc() in non-Java code (C or C++) is not released until free() is called, and because free() is a C function, GC has no way to do this. The class therefore needs to call free() through a native method from within finalize().
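The pattern can be sketched without real JNI by letting a counter stand in for natively allocated blocks. Everything here is hypothetical: the class NativeBlock, the counter, and the comments marking where malloc() and free() would be reached through native methods. The sketch also adds an explicit close() method, an addition not in the original text, because the JVM gives no guarantee about when, or whether, finalize() runs, and recent Java versions deprecate it.

```java
import java.util.concurrent.atomic.AtomicInteger;

class NativeBlock implements AutoCloseable {
    // Stand-in for the number of native blocks currently allocated.
    static final AtomicInteger OUTSTANDING = new AtomicInteger();

    private boolean released = false;

    NativeBlock() {
        OUTSTANDING.incrementAndGet(); // a real class would call a native method wrapping malloc()
    }

    // Explicit, deterministic release; safe to call more than once.
    @Override
    public synchronized void close() {
        if (!released) {
            released = true;
            OUTSTANDING.decrementAndGet(); // a real class would call a native method wrapping free()
        }
    }

    // Safety net only: runs at some unspecified time after the object becomes unreachable.
    @Override
    @SuppressWarnings({ "deprecation", "removal" })
    protected void finalize() throws Throwable {
        try {
            close();
        } finally {
            super.finalize();
        }
    }

    public static void main(String[] args) {
        try (NativeBlock block = new NativeBlock()) {
            System.out.println("allocated, outstanding = " + OUTSTANDING.get());
        }
        System.out.println("released, outstanding = " + OUTSTANDING.get());
    }
}
```

With try-with-resources the native memory is returned as soon as the block ends, and finalize() only covers callers that forget to close.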
Good programming habits
(1) Avoid creating objects inside a loop body, even if each object occupies little memory.
(2) Make objects eligible for garbage collection as early as possible.
(3) Do not use overly deep inheritance hierarchies.
(4) Prefer local variables: accessing a local variable is faster than accessing a field of a class.
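Habit (1) can be illustrated with string building, a common case of hidden allocation inside a loop. The class and method names below are hypothetical; both methods produce the same result, but the first creates a new String on every iteration while the second reuses one local StringBuilder.

```java
class LoopAllocationDemo {
    // Creates a fresh String object on each pass through the loop.
    static String joinWasteful(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // Reuses a single local builder; only the final toString() allocates the result.
    static String joinReusing(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = { "a", "b", "c" };
        System.out.println(joinWasteful(parts)); // abc
        System.out.println(joinReusing(parts));  // abc
    }
}
```

For three short strings the difference is negligible; it matters when the loop runs many thousands of times.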
IV. FAQs
1. Memory overflow
The memory your program asks the Java Virtual Machine for exceeds what the system can provide. The system cannot meet the demand, so memory overflows.
2. Memory leak
You request memory from the system (new) but do not give it back (delete) after use. You can no longer reach the memory you requested, and the system cannot hand that block to any other part of the program either. On a server, memory is consumed continuously and less and less of it remains usable; this is a leak. A leaking program gradually runs short of memory and eventually overflows.
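In Java the usual form of a leak is an object that stays reachable although it is no longer needed. The sketch below is a hypothetical example (class LeakDemo, with a static list standing in for a long-lived cache): every buffer added to the list stays reachable from a static field, so GC can never reclaim it, while the fixed variant keeps the buffer in a local variable only.

```java
import java.util.ArrayList;
import java.util.List;

class LeakDemo {
    // Long-lived static collection: everything added here stays reachable.
    private static final List<byte[]> CACHE = new ArrayList<>();

    // Leaky: the buffer is stored and never removed.
    static void handleRequestLeaky() {
        CACHE.add(new byte[1024]);
    }

    // Fixed: the buffer is only referenced locally and becomes garbage on return.
    static int handleRequestFixed() {
        byte[] scratch = new byte[1024];
        return scratch.length;
    }

    static int retained() {
        return CACHE.size();
    }

    static void clear() {
        CACHE.clear();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            handleRequestLeaky();
            handleRequestFixed();
        }
        System.out.println("buffers still retained: " + retained()); // 100
    }
}
```

If the leaky method kept being called on a server, the retained buffers would grow until the heap overflowed, which is the progression from leak to overflow described above.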
Article Source: http://blog.csdn.net/zhangerqing