First, the JVM memory structure
The Java Virtual Machine divides the memory it manages into several areas. Each area has its own purpose and, depending on the kind of data it holds, is handled by a different garbage collection algorithm. As a whole, memory is divided into the following areas:
Program counter (Program Counter Register), JVM virtual machine stack (JVM Stacks), native method stack (Native Method Stacks), heap (Heap), and method area (Method Area).
1. Program counter (Program Counter Register)
This is a relatively small area of memory that the programmer cannot manipulate directly. It is best understood as a conceptual model, analogous to a CPU's program counter rather than a region you allocate yourself. Its role is this: when the JVM interprets a bytecode file (.class), the counter acts as a line-number indicator for the bytecode that the current thread is executing. The bytecode interpreter works by changing the value of the program counter to select the next instruction to execute; branching, looping, jumping, and other basic control flow all depend on this area. There is another consideration. Java multithreading is implemented by rotating threads over the available processor time, and at any given moment one core executes instructions from only one thread. Each thread therefore needs its own counter to record how far it has progressed, so that when the thread resumes it starts from the right place. This makes the program counter thread-private memory. If a thread is executing a Java method, the counter records the address of the bytecode instruction being executed; if it is executing a native method, the counter's value is undefined. This is the only memory area for which the Java Virtual Machine Specification defines no OutOfMemoryError condition.
2. JVM Virtual machine stack (JVM Stacks)
The JVM virtual machine stack is what we usually just call the stack (we often divide memory roughly into heap and stack). Like the program counter, it is thread-private and its life cycle matches that of its thread. Each time a method is executed, a stack frame is created to store the local variable table, the operand stack, dynamic linking information, the method return address, and other data. Calling a method and returning from it corresponds to pushing a stack frame onto the JVM stack and popping it off again. The local variable table holds the primitive data types (boolean, byte, char, and so on, eight in all) and reference types (which hold the memory address of an object). It therefore has one notable property: the space it needs is determined at compile time and does not change at run time. This memory area can raise two errors: StackOverflowError and OutOfMemoryError.
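The stack-frame behaviour described above can be observed with a small experiment. The following is only a sketch: the class name StackDepthDemo and its methods are made up for illustration, and the depth reached depends on the JVM, its settings, and the size of each frame.

```java
final class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes one more stack frame; there is deliberately no base case.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until the thread's stack is exhausted and returns the depth reached.
    static int measureMaxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // By the time we get here the frames have been popped again.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames pushed before StackOverflowError: " + measureMaxDepth());
    }
}
```

Catching StackOverflowError is done here only to report the depth; production code would normally fix the recursion instead.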
3. Native method stack (Native Method Stacks)
As the name suggests, the native method stack serves the native methods used by Java. java.lang.Object, the ancestor of every Java class, has many native methods, such as hashCode() and wait(). Their execution usually relies on the operating system, but the JVM still has to do some bookkeeping to manage it. This area may be implemented in different ways; in the Sun JVM we normally use, the native method stack and the JVM virtual machine stack are one and the same.
4. Heap (Heap)
The heap is the most important memory area and the one most worth studying in depth, because Java performance tuning is aimed mainly at this part of memory. All object instances and arrays are allocated on the heap (as JIT techniques mature this statement is becoming a little too absolute, but it still holds for now). JIT development has already produced techniques such as stack allocation and scalar replacement, and as just-in-time compilation matures further, the sentence "all object instances and arrays are allocated on the heap" may need to be qualified. The heap is the main area managed by garbage collection, so the garbage collection section below concentrates on it. The size of the heap is controlled with -Xms and -Xmx: -Xms is the minimum heap size requested when the JVM starts, and -Xmx is the maximum heap size the JVM may request. On a 32-bit system the maximum is about 2 GB; on a 64-bit system there is no such limit.
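To see what the heap settings look like from inside a running program, the Runtime class can be queried. This is a minimal sketch; the class name HeapSizeDemo is made up for illustration, and the mapping of maxMemory() to -Xmx and of totalMemory() to the currently committed heap is approximate and varies between JVMs.

```java
final class HeapSizeDemo {
    // Upper bound the JVM will try to use for the heap (roughly the -Xmx value).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap currently reserved by the JVM (starts near the -Xms value and may grow).
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Part of the committed heap that is not occupied by objects right now.
    static long freeHeapBytes() {
        return Runtime.getRuntime().freeMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("max heap (MB):       " + maxHeapBytes() / mb);
        System.out.println("committed heap (MB): " + committedHeapBytes() / mb);
        System.out.println("free heap (MB):      " + freeHeapBytes() / mb);
    }
}
```

Running it once with, say, -Xms64m -Xmx256m and once without any options shows how the flags change the reported values.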
5. Method area
The method area is a memory region shared by all threads. It stores data such as the class information, constants, and static variables that the JVM has loaded. In general, the method area corresponds to the permanent generation (described in detail in the GC section; besides the permanent generation there are also the young generation and the old generation). The Java specification describes the method area as a logical part of the heap, yet it is treated as distinct from the ordinary heap. Garbage collection in the method area is tricky, and even Sun's HotSpot VM does not handle it perfectly. One important concept within the method area is the runtime constant pool. It mainly stores the literals (simply put, constants) and references generated during compilation. Memory for constants can usually be determined at compile time, but not always: constants can also be added to the pool at run time, for example through the native method intern() in the String class.
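The effect of intern() on the constant pool can be checked with reference comparisons. The sketch below uses a made-up class name, InternDemo; it compares references with == on purpose, which ordinary code should not do for strings.

```java
final class InternDemo {
    // A string assembled at run time is a separate object from the pooled literal.
    static boolean sameObjectBeforeIntern() {
        String literal = "hello";
        String built = new StringBuilder("hel").append("lo").toString();
        return built == literal;
    }

    // intern() returns the pooled instance, which is the one the literal refers to.
    static boolean sameObjectAfterIntern() {
        String literal = "hello";
        String built = new StringBuilder("hel").append("lo").toString();
        return built.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println("before intern(): " + sameObjectBeforeIntern());
        System.out.println("after intern():  " + sameObjectAfterIntern());
    }
}
```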
One more memory area, which lies outside the JVM's managed memory, is worth mentioning: direct memory. The NIO classes added in JDK 1.4 introduce an I/O approach based on channels and buffers, which can use a native library to allocate memory outside the heap directly. This is what we call direct memory, and it improves program performance in some scenarios.
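As a small illustration of direct memory, java.nio.ByteBuffer can allocate a buffer either on the heap or outside it. The class name DirectMemoryDemo and the sizes are arbitrary choices for this sketch.

```java
import java.nio.ByteBuffer;

final class DirectMemoryDemo {
    // Allocates a buffer outside the Java heap.
    static ByteBuffer newDirectBuffer(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    // Allocates an ordinary buffer backed by a byte[] on the heap.
    static ByteBuffer newHeapBuffer(int bytes) {
        return ByteBuffer.allocate(bytes);
    }

    // Writes an int into direct memory and reads it back.
    static int roundTrip(int value) {
        ByteBuffer buffer = newDirectBuffer(16);
        buffer.putInt(0, value);
        return buffer.getInt(0);
    }

    public static void main(String[] args) {
        System.out.println("direct buffer isDirect: " + newDirectBuffer(1024).isDirect());
        System.out.println("heap buffer isDirect:   " + newHeapBuffer(1024).isDirect());
        System.out.println("round trip:             " + roundTrip(42));
    }
}
```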
Second, garbage collection
There is a good saying: between Java and C++ stands a wall built of memory allocation and garbage collection; the people outside want to get in, and the people inside want to get out. The reader can work out the meaning. C and C++ programmers often suffer from memory leaks and find memory management a headache, yet Java programmers envy them for being able to control everything themselves and feel helpless about memory management. Indeed, as Java programmers we can hardly control the JVM's memory reclamation; we can only adapt to its principles and try to improve program performance. Let's now look at Java garbage collection (Garbage Collection, GC) from the following four aspects:
1. Why garbage collection?
As a program runs, instance objects, variables, and other data occupy more and more memory. Without timely garbage collection, performance inevitably degrades, and the shortage of available memory can even cause avoidable system errors.
2. Which "garbage" needs to be collected?
Of the five areas described above, three need no garbage collection: the program counter, the JVM stack, and the native method stack. Their life cycles are tied to their threads, so the memory they occupy is freed automatically when the thread is destroyed. Only the method area and the heap need GC. As for which objects: in short, an object that no longer has any references can be collected. Put plainly, an object that is no longer of any use can be reclaimed as garbage.
3. When does garbage collection take place?
The classic answer is the reference counting algorithm. Each object gets a reference counter; every time the object is referenced the counter is incremented by 1, and every time a reference is lost it is decremented by 1. When the counter has stayed at 0 for a period of time, the object is considered collectable. This algorithm has an obvious flaw: when two objects reference each other but are otherwise no longer used, they should be garbage collected, yet because each still holds a reference to the other their counters never reach 0. Reference counting therefore cannot clean up this memory. For this reason Sun's JVM does not use reference counting for garbage collection. Instead it uses what is called the root search algorithm:
The basic idea is this: start from a set of objects called GC Roots and search downward along their references. If no path connects an object to the GC Roots, the object is no longer referenced and can be garbage collected. (This is a simplification. An object that is no longer referenced is not yet completely "dead". If its class overrides the finalize() method and the system has not yet called it for this object, the system calls finalize() once to do the final cleanup. If during that call the object re-attaches itself to any object that is referenced from the GC Roots, it is "reborn"; if not, it can be reclaimed for good.) For example, suppose Object5, Object6, and Object7 still reference one another but none of them can be reached from the GC Roots: they are of no further use and can be collected. This solves the problem that reference counting cannot.
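The root search can be sketched on a toy object graph. Everything below is a simplified model written for illustration (the Node class, the object names, and the graph shape are assumptions); a real JVM walks actual heap objects starting from roots such as stack variables and static fields.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ReachabilityDemo {
    // A toy object: a unique name plus its outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Returns the names of all nodes that can be reached from the given roots.
    static Set<String> reachableFrom(List<Node> roots) {
        Set<String> seen = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (seen.add(node.name)) {      // visit each node only once
                pending.addAll(node.refs);
            }
        }
        return seen;
    }

    // Graph: Root -> Object1 -> Object2, plus the isolated cycle
    // Object5 -> Object6 -> Object7 -> Object5.
    static Set<String> demo() {
        Node root = new Node("Root");
        Node o1 = new Node("Object1");
        Node o2 = new Node("Object2");
        Node o5 = new Node("Object5");
        Node o6 = new Node("Object6");
        Node o7 = new Node("Object7");
        root.refs.add(o1);
        o1.refs.add(o2);
        // Each node in the cycle still has one incoming reference, so a
        // reference counter would never drop to zero for any of them.
        o5.refs.add(o6);
        o6.refs.add(o7);
        o7.refs.add(o5);
        return reachableFrom(Arrays.asList(root));
    }

    public static void main(String[] args) {
        System.out.println("Reachable from the root: " + demo());
    }
}
```

Anything missing from the returned set (here the three objects in the cycle) is what the collector would be allowed to reclaim.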
A supplementary note on references: since JDK 1.2 the concept of a reference has been extended into four kinds, strong, soft, weak, and phantom. Objects held through each kind are treated differently by the GC:
A> Strong reference (Strong Reference). This is the ordinary reference you get when you create an object with new. Its characteristic is that the object is never collected as long as the strong reference exists.
B> Soft reference (Soft Reference). An object held only through a soft reference may be reclaimed: if the JVM is not short of memory it can be left alone, and if memory is tight it is collected. This raises a question: if softly referenced objects can be collected, why not simply collect them? The answer is caching. A cached object may be optional right now, but if it stays in memory it does not have to be allocated again when it is needed. Such objects can be held through soft references, which keeps them convenient to use and improves program performance.
C> Weak reference (Weak Reference). An object held only through weak references is always collected: whenever a GC runs, it is cleaned up regardless of whether memory is tight.
D> Phantom reference (Phantom Reference). A phantom reference is so weak that it can be ignored; the JVM pays no attention to it when deciding whether to collect the object. Its only use is for tracking, as a complement to the finalize() method.
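The reference kinds listed above map onto classes in java.lang.ref. The following sketch (class name ReferenceKindsDemo is made up) shows only the behaviour that is deterministic; whether a weakly referenced object has actually been cleared after a GC request depends on the JVM, so main just prints what it observes.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class ReferenceKindsDemo {
    // While a strong reference exists, a weak reference is not cleared.
    static boolean weakSurvivesWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc();
        return weak.get() == strong;
    }

    // A soft reference hands back its referent while it has not been cleared.
    static boolean softReturnsReferent() {
        byte[] cached = new byte[1024];
        SoftReference<byte[]> soft = new SoftReference<>(cached);
        return soft.get() == cached;
    }

    // A phantom reference never exposes its referent; it is for tracking only.
    static boolean phantomGetIsNull() {
        Object referent = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);
        return phantom.get() == null;
    }

    public static void main(String[] args) {
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc(); // only a request; the referent is usually, not always, cleared
        System.out.println("weak referent after GC request: " + weak.get());
        System.out.println("weak survives while held: " + weakSurvivesWhileStronglyHeld());
        System.out.println("soft returns referent:    " + softReturnsReferent());
        System.out.println("phantom get() is null:    " + phantomGetIsNull());
    }
}
```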
Finally, which classes need to be collected? Useless classes. A class counts as useless only when all of the following conditions are met:
1> All instances of the class have been collected.
2> The ClassLoader that loaded the class has been collected.
3> The java.lang.Class object for the class is not referenced anywhere (so the class cannot be reached through reflection).
4. How is garbage collection carried out?
This part mainly introduces garbage collection algorithms. As mentioned earlier, memory is divided into three parts: the young generation, the old generation, and the permanent generation. The three generations have different characteristics and therefore use different GC algorithms. The young generation suits objects with short life cycles that are created and destroyed frequently; the old generation suits objects with relatively long life cycles; the permanent generation, in Sun's HotSpot, is the method area (some JVMs have no notion of a permanent generation at all). First, the concepts and characteristics of the three generations:
Young generation: New Generation or Young Generation. It is roughly divided into an Eden area and a Survivor area, and the Survivor area is further divided into two parts of equal size: FromSpace and ToSpace. Newly created objects are allocated in the young generation; when Eden runs out of space, the objects that are still alive are moved to a Survivor space. The size of the young generation is set with -Xmn, and the ratio of Eden to Survivor is set with -XX:SurvivorRatio.
Old generation: Old Generation. It stores objects that are still alive after several garbage collections in the young generation, such as cached objects. Its size is the value of -Xmx minus the value of -Xmn.
Permanent generation: Permanent Generation. In Sun's JVM this is the method area, although most other JVMs have no such generation. It mainly stores constants and class information. The default minimum size is 16 MB and the default maximum is 64 MB; both can be set with -XX:PermSize and -XX:MaxPermSize.
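The relationship between the sizing options can be worked through with a little arithmetic. The helper below is a hypothetical sketch (GenerationSizes is not a JVM API); it assumes -XX:SurvivorRatio means Eden : one Survivor space, so the young generation has ratio + 2 equal parts, and it ignores the alignment and rounding a real JVM applies.

```java
final class GenerationSizes {
    // Eden takes `survivorRatio` of the (survivorRatio + 2) parts of the young generation.
    static long edenMb(long xmnMb, int survivorRatio) {
        return xmnMb * survivorRatio / (survivorRatio + 2);
    }

    // Each of the two Survivor spaces takes one part.
    static long survivorMb(long xmnMb, int survivorRatio) {
        return xmnMb / (survivorRatio + 2);
    }

    // The old generation is what remains of the heap after the young generation.
    static long oldMb(long xmxMb, long xmnMb) {
        return xmxMb - xmnMb;
    }

    public static void main(String[] args) {
        // Example settings: -Xmx1024m -Xmn300m -XX:SurvivorRatio=8
        System.out.println("Eden (MB):          " + edenMb(300, 8));
        System.out.println("Each Survivor (MB): " + survivorMb(300, 8));
        System.out.println("Old (MB):           " + oldMb(1024, 300));
    }
}
```

With these example values the young generation splits into 240 MB of Eden and two Survivor spaces of 30 MB each, leaving 724 MB for the old generation.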
Common GC Algorithms:
Mark-sweep algorithm (Mark-Sweep)
This is the most basic GC algorithm. It has two steps, mark and sweep: first the objects to be collected are marked, then the heap is scanned and the marked objects are reclaimed. The algorithm is not very efficient, and it leaves memory fragmented once the sweep is complete. If a large object later needs a contiguous block of memory, the heap has to be defragmented first, so the algorithm needs improving.
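The fragmentation problem can be made concrete with a toy model in which the heap is an array of slots. This is only an illustrative sketch with made-up names (MarkSweepDemo, the slot contents); it tracks the set of surviving objects, which is equivalent to marking everything else for collection.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MarkSweepDemo {
    // Sweep phase: clear every slot whose object is not in the surviving set.
    static String[] sweep(String[] heap, Set<String> survivors) {
        String[] swept = heap.clone();
        for (int i = 0; i < swept.length; i++) {
            if (swept[i] != null && !survivors.contains(swept[i])) {
                swept[i] = null;
            }
        }
        return swept;
    }

    // Total number of free slots.
    static int freeSlots(String[] heap) {
        int free = 0;
        for (String slot : heap) {
            if (slot == null) {
                free++;
            }
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    // Six objects fill the heap; only A, C and E survive the collection.
    static String[] demoHeapAfterSweep() {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        Set<String> survivors = new HashSet<>(Arrays.asList("A", "C", "E"));
        return sweep(heap, survivors);
    }

    public static void main(String[] args) {
        String[] heap = demoHeapAfterSweep();
        System.out.println("heap after sweep: " + Arrays.toString(heap));
        System.out.println("free slots:       " + freeSlots(heap));
        System.out.println("largest free run: " + largestFreeRun(heap));
    }
}
```

Three slots are free after the sweep, but no two of them are adjacent, so an object needing two contiguous slots still cannot be placed.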
Copying algorithm (Copying)
As mentioned above, the young generation is divided into three parts: Eden and two Survivor spaces. Sun's JVM normally sizes Eden and each Survivor space in an 8:1 ratio and keeps one Survivor space free. During garbage collection, the objects that do not need to be reclaimed are copied into the free Survivor space, and then Eden and the other Survivor space are cleared completely. This raises a question: what if the free Survivor space is not large enough? In that case memory is borrowed temporarily from the old generation. This algorithm suits the young generation.
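The copy step can be sketched with two arrays standing in for the space being collected and the free Survivor space. The names are made up, and for simplicity the sketch gives both spaces the same size, unlike the 8:1 layout described above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class CopyingDemo {
    // Copies every surviving object into a fresh to-space, packed from index 0,
    // then reclaims the whole from-space in one step.
    static String[] copySurvivors(String[] fromSpace, Set<String> survivors) {
        String[] toSpace = new String[fromSpace.length];
        int next = 0;
        for (String obj : fromSpace) {
            if (obj != null && survivors.contains(obj)) {
                toSpace[next++] = obj;
            }
        }
        Arrays.fill(fromSpace, null); // no per-object work is needed for the garbage
        return toSpace;
    }

    // Returns "<to-space> / <from-space>" after collecting a six-slot space.
    static String demo() {
        String[] fromSpace = {"A", "B", "C", "D", "E", "F"};
        Set<String> survivors = new HashSet<>(Arrays.asList("A", "C", "E"));
        String[] toSpace = copySurvivors(fromSpace, survivors);
        return Arrays.toString(toSpace) + " / " + Arrays.toString(fromSpace);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The survivors end up contiguous and the old space is empty, which is why the algorithm works well when only a few objects survive each collection.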
Mark-compact algorithm (Mark-Compact)
The first half is the same as mark-sweep. The difference is that once the objects that do not need to be reclaimed have been marked, they are moved together so that the memory they occupy is contiguous, and then everything beyond the boundary of the marked objects is cleared. This algorithm suits long-lived regions such as the old generation and the permanent generation.
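Compaction can be sketched on the same kind of toy slot array: the surviving objects slide toward the start of the space and everything past the boundary is cleared. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MarkCompactDemo {
    // Slides every surviving object toward index 0 within the same space,
    // then clears everything beyond the boundary of the moved objects.
    static String[] compact(String[] heap, Set<String> survivors) {
        String[] result = heap.clone();
        int boundary = 0;
        for (int i = 0; i < result.length; i++) {
            String obj = result[i];
            if (obj != null && survivors.contains(obj)) {
                result[boundary++] = obj; // boundary never overtakes i
            }
        }
        Arrays.fill(result, boundary, result.length, null);
        return result;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    // Six objects fill the heap; only A, C and E survive the collection.
    static String[] demoHeapAfterCompact() {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        Set<String> survivors = new HashSet<>(Arrays.asList("A", "C", "E"));
        return compact(heap, survivors);
    }

    public static void main(String[] args) {
        String[] heap = demoHeapAfterCompact();
        System.out.println("heap after compact: " + Arrays.toString(heap));
        System.out.println("largest free run:   " + largestFreeRun(heap));
    }
}
```

Unlike the copying approach, no second space is needed, and the free memory ends up as one contiguous block.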
Common garbage collectors:
Each JVM implements the algorithms above in its own way. Let's look at some common garbage collectors.
First, three collectors: the serial GC (Serial GC), the parallel scavenge GC (Parallel Scavenge), and the parallel GC (ParNew).
1. Serial GC. This is the most basic and oldest collector, yet it is still widely used. It collects with a single thread, and more importantly it has to pause all executing threads while it collects (Stop the World). For some applications this is unacceptable. But as long as the pause can be kept within N milliseconds, most applications can tolerate it, and in practice the collector does not disappoint: a pause of a few dozen milliseconds is perfectly acceptable for a client application. This collector suits applications with a single CPU, a small young generation, and modest pause-time requirements. It is the default GC in client mode and can be selected explicitly with -XX:+UseSerialGC.
2. ParNew GC. It is basically the same as the Serial GC; the essential difference is that it collects with multiple threads, which improves efficiency and makes it usable on the server side. It can also work together with the CMS GC, which is one more reason to use it on servers.
3. Parallel Scavenge GC. The whole scan-and-copy process is multi-threaded. It suits applications that run on multiple CPUs and need short pauses. It is the default GC in server mode; it can be selected explicitly with -XX:+UseParallelGC, and the number of threads can be set with, for example, -XX:ParallelGCThreads=4.
4. CMS (Concurrent Mark Sweep) collector. Its goal is to solve the pause problem of the Serial GC and achieve the shortest possible collection pause. It suits typical B/S (browser/server) applications, which need high concurrency and fast response. CMS is based on the mark-sweep algorithm, and the collection process is broadly divided into four steps:
Initial mark (CMS initial mark), concurrent mark (CMS concurrent mark), remark (CMS remark), and concurrent sweep (CMS concurrent sweep).
Of these, the initial mark and remark steps have to pause the other user threads. The initial mark only marks the objects directly reachable from the GC Roots, so it is fast. The concurrent mark phase performs the root search from the GC Roots to determine which objects are alive. The remark phase corrects the marks for objects whose state changed because the user program kept running during concurrent marking; its pause is slightly longer than that of the initial mark but much shorter than the concurrent mark phase. Because the collector threads work alongside the user threads during the two longest phases, concurrent mark and concurrent sweep, the CMS collection as a whole can be regarded as running concurrently with the user threads.
Benefits of the CMS collector: concurrent collection and low pauses. But CMS is far from perfect.
The CMS collector has three notable drawbacks:
A> The CMS collector is very sensitive to CPU resources. During the concurrent phases it does not pause the user threads, but it takes CPU time away from them, which slows the application down and lowers total throughput. The number of collector threads CMS starts by default is (number of CPUs + 3) / 4.
B> The CMS collector cannot handle floating garbage, and a "Concurrent Mode Failure" may occur, which leads to another Full GC. Because user threads are still running during the concurrent sweep phase, new garbage keeps being produced. This garbage appears after marking has finished, so CMS cannot deal with it in the current collection and has to leave it for the next GC. It is called "floating garbage". For the same reason, enough memory must be kept in reserve for the user threads that run during collection, so CMS cannot wait until the old generation is almost full before collecting, as other collectors do. By default CMS is triggered when 68% of the old generation is in use. The trigger percentage can be raised with -XX:CMSInitiatingOccupancyFraction to reduce the number of collections and improve performance. If the memory reserved while CMS runs cannot satisfy the program's other threads, a "Concurrent Mode Failure" occurs and the virtual machine falls back to its backup plan: it temporarily uses the Serial Old collector to collect the old generation again, which makes the pause very long. Setting -XX:CMSInitiatingOccupancyFraction too high therefore easily leads to "Concurrent Mode Failure" and lower performance.
C> Finally, CMS is based on the mark-sweep algorithm, so a collection leaves a lot of fragmentation behind. Too much fragmentation makes object allocation difficult: a large object may find no contiguous space, and a Full GC has to be triggered early. To address this, CMS provides the switch -XX:+UseCMSCompactAtFullCollection, which adds a defragmentation pass after a Full GC, and the parameter -XX:CMSFullGCsBeforeCompaction, which sets how many Full GCs without compaction are performed before one with compaction.
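The two numbers quoted in the drawbacks above, the default thread count and the trigger threshold, are easy to work through. The helper below is a hypothetical sketch, not a JVM API, and it assumes the thread formula uses integer division.

```java
final class CmsTuningMath {
    // Default number of CMS collector threads: (number of CPUs + 3) / 4,
    // assuming the division truncates.
    static int defaultCmsThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    // Old-generation usage (in MB) at which a CMS cycle would start for a
    // given occupancy fraction expressed as a percentage.
    static long triggerThresholdMb(long oldGenMb, int occupancyFractionPercent) {
        return oldGenMb * occupancyFractionPercent / 100;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {2, 4, 8, 16}) {
            System.out.println(cpus + " CPUs -> " + defaultCmsThreads(cpus) + " CMS thread(s)");
        }
        System.out.println("1000 MB old generation at 68% -> collection starts at "
                + triggerThresholdMb(1000, 68) + " MB");
    }
}
```

On a 4-CPU machine this gives a single collector thread, which takes a quarter of the available CPU while it runs; that is one way to read the claim that CMS is sensitive to CPU resources.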
5. G1 collector. It improves on CMS in many ways. First, it is based on the mark-compact algorithm and so does not produce memory fragmentation. Second, it allows more precise control over pauses. It is not described in detail here.
6. Serial Old. Serial Old is the old-generation version of the Serial collector. It also collects with a single thread and uses the mark-compact algorithm. It is mainly used by the virtual machine in client mode.
7. Parallel Old. Parallel Old is the old-generation version of the Parallel Scavenge collector. It uses multiple threads and the mark-compact algorithm.
8. RTSJ garbage collector, used for real-time Java programming. It will be introduced in a later article.
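To find out which of the collectors above a given JVM is actually running with, the standard management API can be queried. This is a minimal sketch with a made-up class name; the names reported are chosen by the JVM and differ between versions and command-line flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ListCollectorsDemo {
    // Returns the names of the collectors registered in the running JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time (ms)=" + gc.getCollectionTime());
        }
    }
}
```

Running it with different flags, for example -XX:+UseSerialGC and then -XX:+UseParallelGC, shows how the selection changes the collectors in use.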
Third, Java Program performance optimization
Calling gc()
Calling the gc method suggests that the Java virtual machine make an effort to reclaim unused objects so that the memory they occupy can be reused quickly. When control returns from the call, the virtual machine has made its best effort to reclaim space from all discarded objects. Calling System.gc() is equivalent to calling Runtime.getRuntime().gc().
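The effect of a gc() request can be observed by measuring used heap before and after. The sketch below uses made-up names and an arbitrary 16 MB of throwaway data; because System.gc() is only a request, the second figure is usually smaller but this is not guaranteed.

```java
final class GcRequestDemo {
    // Heap currently occupied by objects, live or not yet collected.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Allocates some garbage, drops it, requests a collection, and returns
    // the used-heap figures measured before and after the request.
    static long[] allocateThenCollect() {
        byte[][] garbage = new byte[16][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[1024 * 1024];
        }
        long before = usedHeapBytes();
        garbage = null;   // the arrays are now unreachable
        System.gc();      // same as Runtime.getRuntime().gc()
        long after = usedHeapBytes();
        return new long[] {before, after};
    }

    public static void main(String[] args) {
        long[] used = allocateThenCollect();
        System.out.println("used before gc(): " + used[0] / 1024 + " KB");
        System.out.println("used after gc():  " + used[1] / 1024 + " KB");
    }
}
```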
Calling and overriding finalize()
The GC can only reclaim memory allocated on the Java heap (in pure Java, every object is allocated on the heap with new). It cannot reclaim memory allocated outside the heap. This can happen with JNI: when Java calls a C program, the C code may allocate memory with malloc(). The GC knows nothing about such memory, so releasing it has to rely on finalize(). For example, when Java calls a non-Java method (perhaps written in C or C++), the non-Java code may call C's malloc() to allocate memory, and that memory is not released unless free() is called (free() being a C function). The GC cannot release it, so finalize() needs to call a native method that in turn calls free().
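The pattern described above can be sketched without real JNI code. In the example below the native allocation is only simulated by a flag, and NativeBuffer and its free() method are hypothetical names. Note also that finalize() is deprecated in recent JDK releases and is not guaranteed to run promptly, so the sketch treats it as a safety net behind an explicit free() call.

```java
final class NativeBufferDemo {
    // Stands in for an object that owns memory obtained through malloc().
    static final class NativeBuffer {
        private boolean freed = false;

        // In real code this would call a native method that invokes free().
        synchronized void free() {
            if (!freed) {
                freed = true;
            }
        }

        synchronized boolean isFreed() {
            return freed;
        }

        // Safety net: release the native memory if the caller forgot to.
        @Override
        @SuppressWarnings({"deprecation", "removal"})
        protected void finalize() throws Throwable {
            try {
                free();
            } finally {
                super.finalize();
            }
        }
    }

    // Releasing explicitly works, and a second call is harmless.
    static boolean explicitFreeWorks() {
        NativeBuffer buffer = new NativeBuffer();
        buffer.free();
        buffer.free();
        return buffer.isFreed();
    }

    public static void main(String[] args) {
        System.out.println("freed after explicit free(): " + explicitFreeWorks());
    }
}
```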
Good programming habits
(1) Avoid creating objects inside a loop body, even if the object takes up little memory.
(2) Make objects eligible for garbage collection as early as possible.
(3) Do not use inheritance hierarchies that are too deep.
(4) Accessing local variables is faster than accessing fields of the class.
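Habit (1) can be illustrated by comparing a loop that allocates a helper object on every iteration with one that reuses a single instance. This is a sketch with made-up names; a modern JIT may optimize the first version, so measure before assuming a difference.

```java
final class LoopAllocationDemo {
    // Creates a new StringBuilder on every iteration.
    static int totalLengthAllocatingEachTime(String[] words) {
        int total = 0;
        for (String word : words) {
            StringBuilder sb = new StringBuilder();
            sb.append('[').append(word).append(']');
            total += sb.length();
        }
        return total;
    }

    // Creates one StringBuilder and resets it on every iteration.
    static int totalLengthReusingBuilder(String[] words) {
        int total = 0;
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            sb.setLength(0); // discard the previous contents, keep the buffer
            sb.append('[').append(word).append(']');
            total += sb.length();
        }
        return total;
    }

    public static void main(String[] args) {
        String[] words = {"a", "bb", "ccc"};
        System.out.println(totalLengthAllocatingEachTime(words));
        System.out.println(totalLengthReusingBuilder(words));
    }
}
```

Both methods return the same result; the second simply gives the garbage collector less to do.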
Fourth, frequently asked questions
1. Memory Overflow
A memory overflow occurs when you ask the Java virtual machine for more memory than the system can provide. The system cannot satisfy the request, so memory overflows.
2. Memory leaks
A memory leak occurs when you ask the system for memory (new) but do not give it back after use (delete). You can no longer reach the memory you requested, and the system cannot assign that block to anyone else. As the server keeps running, more and more memory is held but unused and cannot be handed to the programs that need it; that is the leak. If this goes on, the program gradually runs out of usable memory and eventually overflows.
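Java has no delete, so the description above does not apply literally. The closest Java analogue is an object that stays reachable although it is no longer needed, typically through a long-lived collection. The sketch below is a made-up example (LeakDemo, handleRequest, and the 1 KB payload are assumptions) of such a pattern.

```java
import java.util.ArrayList;
import java.util.List;

final class LeakDemo {
    // A static collection lives as long as the class, and so does everything in it.
    private static final List<byte[]> CACHE = new ArrayList<>();

    // Each call stores 1 KB that is never removed again.
    static void handleRequest() {
        CACHE.add(new byte[1024]);
    }

    static int retainedEntries() {
        return CACHE.size();
    }

    // Dropping the references is what makes the memory collectable again.
    static void release() {
        CACHE.clear();
    }

    // Starts from an empty cache, handles n requests, and reports what is retained.
    static int simulate(int requests) {
        release();
        for (int i = 0; i < requests; i++) {
            handleRequest();
        }
        return retainedEntries();
    }

    public static void main(String[] args) {
        System.out.println("entries retained after 1000 requests: " + simulate(1000));
        release();
        System.out.println("entries retained after release():     " + retainedEntries());
    }
}
```

Because every entry is still reachable from the static list, the garbage collector cannot reclaim any of it; if requests keep arriving, the retained memory keeps growing until the heap overflows.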
Article Source: http://blog.csdn.net/zhangerqing
JVM memory management and garbage collection