The Beauty of Java [Evolution from Novice to Master]: JVM memory management and garbage collection

Last Update:2014-10-14 Source: Internet

Author: User

Tags server memory xms


Many Java interviews include questions about garbage collection, and any discussion of garbage collection inevitably involves the JVM's memory management mechanism. C and C++ programmers have long mocked Java's execution efficiency, and in truth Java's execution efficiency is indeed lower. On the one hand, the Java language adopts an object-oriented design, which makes development efficient but execution slower. On the other hand, the Java language makes programmers an attractive promise: they do not need to manage memory, because the JVM has a garbage collector (GC) that reclaims memory automatically.

In reality, things are not that simple:

1. Garbage collection does not run when the programmer asks for it; the GC may run at any time.

2. Garbage collection does not always clean up memory promptly, even when the program needs additional memory.

3. The programmer cannot control garbage collection.

Because of these facts, when we write programs we can only arrange memory sensibly, in accordance with the rules of garbage collection. That requires a thorough understanding of the JVM's memory management mechanism; only then can we program with confidence. This chapter of the series covers JVM memory management and garbage collection. After studying it, the reader should have a basic understanding of the JVM.

This blog is updated continually. If you repost it, please cite the source: http://blog.csdn.net/zhangerqing

If you have any questions, please contact me (egg):

Email: [Email protected]

Weibo: HTTP://WEIBO.COM/XTFGGEF

I. JVM memory structure

The Java virtual machine divides memory into several management areas. Each has its own purpose, and garbage collection uses different algorithms for them according to the characteristics of their tasks. Overall, memory is divided into the following parts:

Program counter (Program Counter Register), JVM stack (JVM Stacks), native method stack (Native Method Stacks), heap (Heap), and method area (Method Area).


1. Program counter (Program Counter Register)

This is a relatively small area of memory. It is a per-thread register of the virtual machine that the programmer cannot manipulate directly. Its role is to indicate the position of the bytecode that the current thread is executing while the JVM interprets a class file (.class). This is only a conceptual model: the bytecode interpreter works by changing the value of the program counter to select the next instruction to execute, and basic functions such as branching, looping, and jumping all depend on this counter.

Multithreading is another consideration. Java multithreading is implemented by threads taking turns on the processor, and at any given moment one core executes instructions from only one thread. Each thread therefore needs its own counter to record its progress so that it resumes at the right place when it is scheduled again. That is why every thread has a separate program counter, which makes this area thread-private memory. If a thread is executing a Java method, the counter records the address of the bytecode instruction being executed; if it is executing a native method, the counter's value is undefined. This is also the only memory area for which the Java virtual machine specification defines no OutOfMemoryError condition.
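To make the role of the counter concrete, here is a minimal sketch of an interpreter loop in which a pc variable selects the next instruction. The three opcodes (ADD_CONST, DEC_JNZ, HALT) and the class name PcDemo are invented for illustration; they are not real JVM bytecodes, and a real interpreter is far more involved.

```java
// Toy interpreter: the program counter (pc) decides which instruction runs next.
// The opcodes below are hypothetical and exist only for this sketch.
final class PcDemo {
    static final int ADD_CONST = 0; // acc += operand, then continue at pc + 1
    static final int DEC_JNZ = 1;   // counter--, jump to operand if counter != 0
    static final int HALT = 2;      // stop and return acc

    // Adds 'step' to an accumulator 'n' times (n must be at least 1).
    static int run(int n, int step) {
        int[][] code = {
            {ADD_CONST, step}, // address 0
            {DEC_JNZ, 0},      // address 1: loop back to address 0
            {HALT, 0}          // address 2
        };
        int pc = 0;
        int acc = 0;
        int counter = n;
        while (true) {
            int[] insn = code[pc];
            switch (insn[0]) {
                case ADD_CONST:
                    acc += insn[1];
                    pc++;                                   // sequential execution
                    break;
                case DEC_JNZ:
                    counter--;
                    pc = (counter != 0) ? insn[1] : pc + 1; // branch or continue
                    break;
                case HALT:
                    return acc;
                default:
                    throw new IllegalStateException("unknown opcode " + insn[0]);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(3, 5)); // prints 15
    }
}
```

Saving pc, acc, and counter is all it takes to suspend this loop and resume it later, which is the reason each thread needs a counter of its own.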

2. JVM stack (JVM Stacks)

The JVM stack is what we usually just call the stack (memory is often roughly divided into heap and stack). Like the program counter, it is thread-private and has the same life cycle as its thread. Each method invocation creates a stack frame that stores the local variable table, the operand stack, dynamic linking information, the method exit, and other data. Invoking a method and returning from it correspond to pushing a stack frame onto the JVM stack and popping it off. The local variable table holds values of the eight primitive types (boolean, byte, char, and so on) and references (which hold the memory addresses of objects). It therefore has one notable property: the space it needs is determined at compile time and does not change at run time. This memory area can raise two errors: StackOverflowError and OutOfMemoryError.
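The relationship between method calls and stack frames can be observed with unbounded recursion. The following is a small sketch (the class name StackDepthDemo is made up here): every call pushes a frame, and when the thread's stack is exhausted the JVM throws StackOverflowError. The depth reached depends on the JVM, its settings, and the frame size, so no particular number should be expected.

```java
// Counts how many frames fit on the current thread's stack.
final class StackDepthDemo {
    private static int depth;

    private static void recurse() {
        depth++;   // one more stack frame has been pushed
        recurse(); // never returns normally
    }

    // Returns the number of nested calls made before the stack ran out.
    static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack has been unwound by the time we get here.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before StackOverflowError: " + measure());
    }
}
```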

3. Native method stack (Native Method Stacks)

As the name suggests, the native method stack serves native methods in Java. The root class Object has many native methods, such as hashCode() and wait(). Their execution usually relies on the operating system, but the JVM still has to do some work of its own to run them. Implementations are free to handle this area differently; in the common Sun JVM, the native method stack and the JVM stack are one and the same.

4. Heap (Heap)

Heap memory is the most important area of memory and the one most worth studying in depth, because Java performance tuning mainly targets it. All object instances and arrays are allocated on the heap (as JIT technology matures this statement is becoming a little too absolute, but it holds at least for now). JIT development has produced techniques such as stack allocation and scalar replacement, and as just-in-time compilation matures further, the statement "all object instances and arrays are allocated on the heap" may need to be qualified. The heap is the main area managed by the garbage collector, so the garbage collection section below concentrates on it. The heap size is controlled with -Xms and -Xmx: -Xms is the minimum heap the JVM requests at startup, and -Xmx is the maximum heap the JVM may request. On a 32-bit system the maximum is 2 GB; on a 64-bit system there is no such limit.
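The effect of the heap settings can be checked from inside a program through the standard Runtime API. This is a minimal sketch (the class name HeapInfo is illustrative): maxMemory() reflects the upper bound that -Xmx controls, and totalMemory() is the heap currently reserved, which starts near the -Xms value.

```java
// Reports the heap limits that the running JVM sees.
final class HeapInfo {
    // Returns { max, total, free } in bytes.
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.maxMemory(), rt.totalMemory(), rt.freeMemory() };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        long mb = 1024L * 1024L;
        System.out.println("max heap (bounded by -Xmx): " + s[0] / mb + " MB");
        System.out.println("current heap:               " + s[1] / mb + " MB");
        System.out.println("free within current heap:   " + s[2] / mb + " MB");
    }
}
```

Running it with different values, for example java -Xms64m -Xmx256m HeapInfo, shows how the reported limits follow the options.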

5. Method area (Method Area)

The method area is a region of memory shared by all threads. It stores data for classes that the JVM has loaded, such as class information, constants, and static variables. In general, the method area corresponds to the permanent generation (the permanent generation, along with the young and old generations, is described in the GC section). The Java specification describes the method area as a logical part of the heap, yet it is distinct from the heap. Garbage collection in the method area is tricky, and even Sun's HotSpot VM does not handle it perfectly. One important concept within the method area is the runtime constant pool. It mainly stores the literals (simply put, constants) and references generated at compile time. Most constants are fixed at compile time, but not all: constants can also be added to the pool at run time, for example through the native method intern() of the String class. For more about intern(), see another article: http://blog.csdn.net/zhangerqing/article/details/8093919
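The behavior of intern() can be seen by comparing object identity. The sketch below (class name InternDemo is illustrative) relies only on guarantees from the Java language: string literals are pooled, new String(...) always creates a distinct object, and intern() returns the pooled instance.

```java
// Compares a pooled literal with a string created at run time.
final class InternDemo {
    // Returns { literal == built, literal == built.intern(), literal.equals(built) }.
    static boolean[] compare() {
        String literal = "jvm";           // taken from the constant pool
        String built = new String("jvm"); // a separate object created at run time
        return new boolean[] {
            literal == built,             // false: two different objects
            literal == built.intern(),    // true: intern() returns the pooled instance
            literal.equals(built)         // true: same characters
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(compare())); // prints [false, true, true]
    }
}
```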

One more memory area, outside the memory the JVM manages, deserves a mention: direct memory. The NIO classes added in JDK 1.4 introduced an I/O approach based on channels and buffers that can use a native library to allocate memory outside the heap directly. This is what we call direct memory, and it improves performance in some scenarios.
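In the NIO API, direct memory is obtained with ByteBuffer.allocateDirect, while ByteBuffer.allocate returns an ordinary heap buffer. A minimal sketch (the class name DirectMemoryDemo and the 16-byte size are arbitrary choices for the example):

```java
import java.nio.ByteBuffer;

// Contrasts a heap buffer with a direct (off-heap) buffer.
final class DirectMemoryDemo {
    // Writes a value into off-heap memory and reads it back.
    static int roundTrip(int value) {
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // allocated outside the Java heap
        direct.putInt(0, value);
        return direct.getInt(0);
    }

    // Reports whether the chosen allocation method yields a direct buffer.
    static boolean isDirect(boolean useDirect) {
        ByteBuffer buf = useDirect ? ByteBuffer.allocateDirect(16) : ByteBuffer.allocate(16);
        return buf.isDirect();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));   // prints 42
        System.out.println(isDirect(true));  // prints true
        System.out.println(isDirect(false)); // prints false
    }
}
```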

II. Garbage collection

There is a good saying: between Java and C++ stands a wall built of memory allocation and garbage collection; those outside want to get in, and those inside want to get out. Readers can work out the meaning for themselves. C and C++ programmers sometimes suffer from memory leaks and find memory management a headache, while Java programmers envy C++ programmers for being able to control everything and never feeling helpless about memory management. That is how it is: as Java programmers we can hardly control the JVM's memory reclamation, so we can only adapt to its principles and try to improve program performance. Let's now look at Java garbage collection (Garbage Collection, GC) from the following four aspects:

1. Why garbage collection?

As a program runs, instance objects, variables, and other data occupy more and more memory. If garbage is not collected in time, performance inevitably degrades, and a shortage of available memory can even cause avoidable failures.

2. Which "garbage" needs to be collected?

Of the five areas described above, three need no garbage collection: the program counter, the JVM stack, and the native method stack. Their life cycles are tied to their threads, so the memory they occupy is released automatically when the thread is destroyed. Only the method area and the heap need GC. As for which objects are collected, a simple summary is: if no references to an object remain, it can be collected. Put plainly, an object that is no longer of any use can be reclaimed as garbage.

3. When does garbage collection take place?

In the classic reference counting algorithm, each object carries a reference counter. The counter is incremented by 1 each time the object is referenced and decremented by 1 each time a reference is lost; once the counter has stayed at 0 for a while, the object is considered collectable. The algorithm has an obvious flaw: when two objects reference each other but are otherwise no longer used, they should be collected, yet because each still holds a reference to the other, neither qualifies. This kind of garbage cannot be handled. Sun's JVM therefore does not use reference counting for garbage collection. Instead, it uses what is called the root search algorithm.

The basic idea is to start from objects called GC Roots and search downward. If an object cannot be reached from any GC Root, it is no longer referenced and can be garbage collected. (This is a simplification. An object that is no longer referenced is not immediately "dead": if its class overrides finalize() and the system has not yet called it, the system calls finalize() once to finish any final work. If, during that call, the object re-associates itself with any object reachable from the GC Roots, it is "reborn"; if not, it can be reclaimed completely.) Consider three objects, Object5, Object6, and Object7, that still reference one another but cannot be reached from the GC Roots: taken as a whole they are no longer of any use, so they are collected. This solves the problem that reference counting cannot.
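The difference between the two approaches can be shown on a small hand-built object graph. The sketch below is only a simulation with strings standing in for objects; the graph, the name gcRoot, and the class ReachabilityDemo are assumptions made for the example and say nothing about how HotSpot is implemented. The objects obj5, obj6, and obj7 form a cycle, so each keeps a reference count of 1, yet none of them is reachable from the root.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simulated object graph: each key references the objects in its list.
final class ReachabilityDemo {
    static final Map<String, List<String>> GRAPH = new HashMap<>();
    static {
        GRAPH.put("gcRoot", Arrays.asList("obj1", "obj2"));
        GRAPH.put("obj1", Arrays.asList("obj3"));
        GRAPH.put("obj5", Arrays.asList("obj6")); // obj5 -> obj6 -> obj7 -> obj5 is a cycle
        GRAPH.put("obj6", Arrays.asList("obj7"));
        GRAPH.put("obj7", Arrays.asList("obj5"));
    }

    // What a reference counter would see: the number of incoming references.
    static int referenceCount(String target) {
        int count = 0;
        for (List<String> refs : GRAPH.values()) {
            count += Collections.frequency(refs, target);
        }
        return count;
    }

    // What a root search sees: can the target be reached from gcRoot?
    static boolean isReachable(String target) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push("gcRoot");
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (seen.add(current)) {
                for (String next : GRAPH.getOrDefault(current, Collections.<String>emptyList())) {
                    pending.push(next);
                }
            }
        }
        return seen.contains(target);
    }

    public static void main(String[] args) {
        System.out.println(referenceCount("obj5")); // prints 1: never drops to 0
        System.out.println(isReachable("obj5"));    // prints false: collectable
        System.out.println(isReachable("obj3"));    // prints true: still alive
    }
}
```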

A note on references: since JDK 1.2 the notion of a reference has been extended into four kinds: strong, soft, weak, and phantom. Objects held through each kind are treated differently by the GC:

A> Strong reference (Strong Reference). This is the ordinary reference you get when you create an object with new. An object is never collected as long as a strong reference to it exists.

B> Soft reference (Soft Reference). An object held only through a soft reference may be collected: while the JVM has plenty of memory such objects are left alone, and when memory is tight they are reclaimed. This raises a question: if softly referenced objects can be collected, why not collect them right away? The answer lies in caching. A cached object may not be needed at the moment, but if it stays in memory it can be used again later without being reallocated. Such objects are good candidates for soft references, which keeps them convenient to use and improves performance.

C> Weak reference (Weak Reference). An object held only through weak references is always collected: regardless of memory pressure, it is cleared and reclaimed the next time the GC runs.

D> Phantom reference (Phantom Reference). This is the weakest kind, weak enough to be ignored: the JVM takes no account of it when deciding whether to collect an object. Its only purpose is tracking, as a complement to the finalize() method.
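The four kinds map onto classes in java.lang.ref. The sketch below (class name ReferenceKindsDemo is illustrative) shows the parts that are deterministic: while a strong reference exists, soft and weak references still return the object, and a phantom reference never returns it. What happens after the strong reference is dropped depends on when the collector runs, so the last line makes no promise about its output.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class ReferenceKindsDemo {
    // Returns { soft sees object, weak sees object, phantom.get() is null }.
    static boolean[] whileStronglyHeld() {
        Object strong = new Object(); // ordinary strong reference
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        PhantomReference<Object> phantom =
                new PhantomReference<>(strong, new ReferenceQueue<Object>());
        return new boolean[] {
            soft.get() == strong,  // true while the object is strongly reachable
            weak.get() == strong,  // true while the object is strongly reachable
            phantom.get() == null  // always true: a phantom reference never yields the object
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(whileStronglyHeld())); // prints [true, true, true]

        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc(); // only a request; the JVM decides whether and when to collect
        System.out.println("weak referent after gc request: " + weak.get());
    }
}
```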

Finally, which classes need to be collected? Useless ones. A class is useless only if all of the following conditions are met:

1> All instances of the class have been collected.

2> The ClassLoader that loaded the class has been collected.

3> The java.lang.Class object for the class is not referenced anywhere, so the class cannot be reached through reflection.

4. How is garbage collection carried out?

This part mainly introduces garbage collection algorithms. As mentioned earlier, memory is divided into three parts: the young generation, the old generation, and the permanent generation. Their different characteristics lead them to use different GC algorithms. The young generation suits objects with short life cycles that are created and destroyed frequently; the old generation suits objects with relatively long life cycles; and the permanent generation, in Sun's HotSpot, is the method area (some JVMs have no notion of a permanent generation). First, the concepts and characteristics of the three generations:

Young generation: New Generation or Young Generation. It is divided into an Eden space and a Survivor space, and the Survivor space is split into two parts of equal size: FromSpace and ToSpace. New objects are allocated in the young generation; when Eden runs out of space, surviving objects are moved to a Survivor space. The size of the young generation is set with -Xmn, and the ratio of Eden to Survivor with -XX:SurvivorRatio.

Old generation: Old Generation. It stores objects that are still alive after several garbage collections in the young generation, such as cached objects. Its size is the value of -Xmx minus the value of -Xmn.

Permanent generation: Permanent Generation. In Sun's JVM it is the method area, although most other JVMs have no such generation. It mainly stores constants and class information. By default its minimum size is 16 MB and its maximum is 64 MB; both can be set with -XX:PermSize and -XX:MaxPermSize.
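The memory areas of a running JVM can be listed through the standard java.lang.management API. The following sketch (class name MemoryPoolsDemo is illustrative) prints whatever pools the JVM reports. The names vary with the JVM version and the selected collector, so the output is not fixed; HotSpot from JDK 8 onward, for instance, reports a Metaspace pool in place of a permanent generation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

// Lists the memory pools (Eden, Survivor, old generation, and so on) of this JVM.
final class MemoryPoolsDemo {
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // getType() distinguishes heap pools from non-heap pools.
            names.add(pool.getName() + " (" + pool.getType() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : poolNames()) {
            System.out.println(name);
        }
    }
}
```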

Common GC Algorithms:

Mark-sweep algorithm (Mark-Sweep)

This is the most basic GC algorithm. It first marks the objects that need to be collected and then scans the heap and reclaims the marked objects, hence the two steps: mark and sweep. The algorithm is not efficient and leaves memory fragmented after the sweep, so a large object that needs contiguous memory may force defragmentation. The algorithm therefore needs improvement.

Copying algorithm (Copying)

As mentioned above, the young generation is divided into three parts: Eden and two Survivor spaces. Sun's JVM normally sizes Eden and each Survivor space in an 8:1 ratio and keeps one Survivor space empty. During a collection, the objects that do not need to be reclaimed are copied into the empty Survivor space, and Eden and the previously used Survivor space are then cleared completely. What if the empty Survivor space is not large enough? In that case memory is temporarily borrowed from the old generation. This algorithm suits the young generation.
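The copying step, including the overflow case, can be sketched with strings standing in for memory spaces. Everything here is a simplification invented for the example (the class CopyingDemo, one character per object, upper case meaning "live"); a real collector copies object graphs, not characters.

```java
// Simulated copying collection: upper-case letters are live objects,
// lower-case letters are garbage.
final class CopyingDemo {
    // Copies live objects from Eden and the used Survivor space into the empty
    // Survivor space; objects that do not fit overflow into the old generation.
    // Returns { contents of the target Survivor space, objects moved to the old generation }.
    static String[] collect(String eden, String fromSurvivor, int toCapacity) {
        StringBuilder to = new StringBuilder();
        StringBuilder old = new StringBuilder();
        for (char c : (eden + fromSurvivor).toCharArray()) {
            if (!Character.isUpperCase(c)) {
                continue;          // garbage is simply not copied
            }
            if (to.length() < toCapacity) {
                to.append(c);      // fits into the empty Survivor space
            } else {
                old.append(c);     // Survivor space is full: use the old generation
            }
        }
        return new String[] { to.toString(), old.toString() };
    }

    public static void main(String[] args) {
        String[] result = collect("AbcDefgH", "Xy", 3);
        System.out.println(result[0]); // prints ADH
        System.out.println(result[1]); // prints X
    }
}
```

After the copy, Eden and the source Survivor space contain nothing worth keeping and can be wiped in one step, which is what makes the algorithm attractive when most objects die young.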

Mark-compact algorithm (Mark-Compact)

The first half is the same as mark-sweep. After the objects that do not need to be reclaimed have been marked, they are moved together so that they occupy contiguous memory, and everything beyond the boundary is then cleared. This algorithm suits the old generation.
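The practical difference between mark-sweep and mark-compact is fragmentation, which a toy model makes visible. In this sketch the heap is a string with one character per slot: upper case for marked (live) objects, lower case for garbage, and a dot for free space. The representation and the class name GcAlgorithmsDemo are assumptions made for illustration only.

```java
// Toy heap: upper case = live object, lower case = garbage, '.' = free slot.
final class GcAlgorithmsDemo {
    // Mark-sweep: free the garbage in place, leaving holes between live objects.
    static String sweep(String heap) {
        StringBuilder out = new StringBuilder();
        for (char c : heap.toCharArray()) {
            out.append(Character.isUpperCase(c) ? c : '.');
        }
        return out.toString();
    }

    // Mark-compact: slide live objects to one end, then free everything past them.
    static String compact(String heap) {
        StringBuilder out = new StringBuilder();
        for (char c : heap.toCharArray()) {
            if (Character.isUpperCase(c)) {
                out.append(c);
            }
        }
        while (out.length() < heap.length()) {
            out.append('.');
        }
        return out.toString();
    }

    // Size of the largest contiguous free block, i.e. the biggest object that still fits.
    static int largestFreeRun(String heap) {
        int best = 0;
        int run = 0;
        for (char c : heap.toCharArray()) {
            run = (c == '.') ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String heap = "AbCdEf.G";
        System.out.println(sweep(heap));                   // prints A.C.E..G
        System.out.println(compact(heap));                 // prints ACEG....
        System.out.println(largestFreeRun(sweep(heap)));   // prints 2
        System.out.println(largestFreeRun(compact(heap))); // prints 4
    }
}
```

Both results have four free slots, but only the compacted heap can hold an object that needs four contiguous slots.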

Common garbage collectors:

Each JVM implements the algorithms above in its own way. Let's look at some common garbage collectors:

First, three practical garbage collectors: the serial GC (Serial GC), the parallel scavenging GC (Parallel Scavenge), and the parallel GC (ParNew).

1. Serial GC. The most basic and oldest collector, yet still widely used. It collects with a single thread, and its most significant characteristic is that all executing threads must be paused during collection (Stop the World). Some applications cannot accept that, but if the pause can be kept within a few tens of milliseconds most applications can, and in practice the Serial GC does not disappoint: a pause of a few tens of milliseconds is perfectly acceptable on a client. It suits applications with a single CPU, a small young generation, and modest pause-time requirements. It is the default GC in client mode and can be selected explicitly with -XX:+UseSerialGC.

2. ParNew GC. Essentially the same as the Serial GC, with the key difference that it collects with multiple threads, which improves efficiency and makes it usable on the server side. It can also be combined with the CMS GC, which is one more reason to use it on servers.

3. Parallel Scavenge GC. It performs the entire scan-and-copy process with multiple threads and suits applications that run on multiple CPUs and require short pause times. It is the default GC in server mode and can be selected explicitly with -XX:+UseParallelGC; the number of threads is set with -XX:ParallelGCThreads, for example -XX:ParallelGCThreads=4.

4. CMS (Concurrent Mark Sweep) collector. Its goal is to solve the pause problem of the Serial GC and achieve the shortest possible collection pause. Typical B/S (browser/server) applications suit this collector because they need high concurrency and fast response. CMS is based on the mark-sweep algorithm, and a collection consists of four steps:

Initial mark (CMS initial mark), concurrent mark (CMS concurrent mark), remark (CMS remark), and concurrent sweep (CMS concurrent sweep).

The initial mark and remark steps must pause the user threads. The initial mark only marks the objects directly reachable from the GC Roots, so it is fast. The concurrent mark phase performs the GC Roots search and determines which objects are alive. The remark phase corrects the marks of objects whose state changed because the user program kept running during concurrent marking; its pause is slightly longer than that of the initial mark but far shorter than the concurrent mark phase. Because the collector threads work alongside the user threads during the two longest phases, concurrent mark and concurrent sweep, the CMS collection as a whole can be regarded as running concurrently with the user threads.

Advantages of the CMS collector: concurrent collection and low pauses. But CMS is far from perfect.

The CMS collector has three notable drawbacks:

a>. The CMS collector is very sensitive to CPU resources. The concurrent phases do not pause the user threads, but they consume CPU resources, which slows the application down and lowers total throughput. By default CMS starts (number of CPUs + 3) / 4 collector threads.

b>. The CMS collector cannot handle floating garbage and may fail with "Concurrent Mode Failure", which leads to another full GC. User threads keep running during the concurrent sweep, so the program naturally produces new garbage. Garbage that appears after marking has finished cannot be processed in the current collection and has to wait for the next GC. This is called "floating garbage". Also, because user threads must keep running during collection, enough memory has to be reserved for them. CMS therefore cannot wait until the old generation is almost full, as other collectors do; it must start while some space remains for the program to use during the concurrent collection. By default CMS is triggered when 68% of the old generation is in use. The trigger percentage can be raised with -XX:CMSInitiatingOccupancyFraction to reduce the number of collections and improve performance. If the memory reserved while CMS runs cannot satisfy the program, a "Concurrent Mode Failure" occurs and the virtual machine falls back to a backup plan: it temporarily uses the Serial Old collector to collect the old generation again, so the pause is very long. Setting -XX:CMSInitiatingOccupancyFraction too high therefore easily causes "Concurrent Mode Failure" and reduces performance.

c>. Finally, CMS is based on the mark-sweep algorithm, so a collection leaves a lot of fragmentation. Too much fragmentation makes allocation troublesome: a large object may find no contiguous space and force a full GC early. To address this, CMS provides the switch -XX:+UseCMSCompactAtFullCollection, which adds a defragmentation step after a full GC, and the parameter -XX:CMSFullGCsBeforeCompaction, which sets how many full GCs run without compaction before one with compaction.
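The thread-count formula quoted in drawback a> uses integer division, which is easy to misread. A small sketch of the arithmetic as stated in the text (the class and method names are made up, and the formula is taken from the article rather than read from any JVM):

```java
// Evaluates (number of CPUs + 3) / 4 with integer division, as stated in the text.
final class CmsThreads {
    static int defaultCollectorThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    public static void main(String[] args) {
        System.out.println(defaultCollectorThreads(2)); // prints 1
        System.out.println(defaultCollectorThreads(4)); // prints 1
        System.out.println(defaultCollectorThreads(8)); // prints 2
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("this machine: " + defaultCollectorThreads(cpus));
    }
}
```

On a machine with two CPUs a single collector thread already takes up half of the processing capacity, which is why the impact on the application is most noticeable on small machines.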

5. G1 collector. It improves on CMS in many ways. First, it is based on the mark-compact algorithm and does not produce memory fragmentation. Second, it allows pauses to be controlled more precisely. It is not described in detail here.

6. Serial Old. Serial Old is the old-generation version of the Serial collector. It also collects with a single thread and uses the mark-compact algorithm. It is mainly used by virtual machines in client mode.

7. Parallel Old. Parallel Old is the old-generation version of the Parallel Scavenge collector. It uses multiple threads and the mark-compact algorithm.

8. RTSJ garbage collector. It is used for real-time Java programming and will be introduced in a later update.
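To find out which collectors a particular JVM is actually running, the standard management API can be queried. A minimal sketch (class name CollectorsDemo is illustrative); the names and counts in the output depend on the JVM version and on flags such as -XX:+UseSerialGC, so no specific output is implied.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the garbage collectors active in this JVM with their collection counts.
final class CollectorsDemo {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```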

III. Java program performance optimization

Calling gc()

Calling the gc method suggests that the Java virtual machine make an effort to reclaim unused objects so that the memory they occupy can be reused quickly. When control returns from the call, the virtual machine has made its best effort to reclaim space from all discarded objects. Calling System.gc() is equivalent to calling Runtime.getRuntime().gc().
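A rough way to watch the effect of such a request is to compare heap usage before and after it. This is only a sketch (class name GcRequestDemo and the 16 MB of throwaway arrays are arbitrary): because the call is a suggestion, the second number is usually smaller than the first but is not guaranteed to be.

```java
// Measures used heap before and after asking the JVM to collect.
final class GcRequestDemo {
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Returns { used bytes before the request, used bytes after the request }.
    static long[] allocateThenCollect() {
        byte[][] junk = new byte[16][];
        for (int i = 0; i < junk.length; i++) {
            junk[i] = new byte[1024 * 1024]; // 1 MB each
        }
        long before = usedHeap();
        junk = null;     // drop the only references, making the arrays collectable
        System.gc();     // same effect as Runtime.getRuntime().gc(); only a request
        long after = usedHeap();
        return new long[] { before, after };
    }

    public static void main(String[] args) {
        long[] r = allocateThenCollect();
        System.out.println("used before: " + r[0] / 1024 + " KB");
        System.out.println("used after:  " + r[1] / 1024 + " KB");
    }
}
```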

Calling and overriding finalize()

The GC can only reclaim memory allocated on the Java heap (in pure Java, every object is allocated on the heap with new). It cannot reclaim memory allocated outside the heap, which can happen when JNI is used: for example, Java calls a C program, and the C program allocates memory with malloc(). The GC is not responsible for such memory, so releasing it depends on finalize(). Concretely, when Java calls a non-Java method (possibly C or C++), the non-Java code may call C's malloc() to allocate memory, and that memory is not released until free() is called (free() being a C function). The GC cannot release it, so finalize() has to call a native method that in turn calls free().
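The pattern can be sketched without real JNI by letting a counter stand in for native allocations. Everything in this example is hypothetical: the class NativeBuffer, the counter OUTSTANDING, and the comments marking where native calls to malloc() and free() would go. One caveat: finalize() has been deprecated since Java 9 and the JVM does not promise when, or whether, it runs. The sketch therefore also offers an explicit close(), which is an addition to what the text describes, and keeps finalize() only as a fallback.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Wrapper around memory that the GC cannot see. The counter simulates native blocks.
final class NativeBuffer implements AutoCloseable {
    // Number of simulated native blocks that were allocated but not yet freed.
    static final AtomicInteger OUTSTANDING = new AtomicInteger();
    private boolean released;

    NativeBuffer() {
        OUTSTANDING.incrementAndGet(); // stands in for a native method that calls malloc()
    }

    // Explicit release; safe to call more than once.
    @Override
    public synchronized void close() {
        if (!released) {
            released = true;
            OUTSTANDING.decrementAndGet(); // stands in for a native method that calls free()
        }
    }

    // Fallback described in the text: release the native memory when the object is finalized.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            close();
        } finally {
            super.finalize();
        }
    }

    // Uses a buffer and reports how many simulated native blocks are still allocated.
    static int leakedAfterUse() {
        try (NativeBuffer buffer = new NativeBuffer()) {
            // work with the buffer here
        }
        return OUTSTANDING.get();
    }

    public static void main(String[] args) {
        System.out.println(leakedAfterUse()); // prints 0
    }
}
```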

Good programming habits

(1) Avoid creating objects inside a loop body, even if each object occupies little memory.
(2) Make objects eligible for garbage collection as early as possible.
(3) Avoid overly deep inheritance hierarchies.
(4) Prefer local variables; accessing them is faster than accessing fields of a class.
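As an illustration of habit (1), compare two ways of building a string in a loop. The class and method names are invented for this sketch. The first version creates a new String on every iteration because strings are immutable; the second reuses a single StringBuilder and produces the same result.

```java
// Two ways to concatenate the numbers 0..n-1.
final class LoopAllocationDemo {
    // Creates a fresh String object on every pass through the loop.
    static String joinWithNewObjects(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result += i;
        }
        return result;
    }

    // Reuses one StringBuilder for the whole loop.
    static String joinWithOneBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithNewObjects(5)); // prints 01234
        System.out.println(joinWithOneBuilder(5)); // prints 01234
    }
}
```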

This section is constantly updated!

IV. Frequently asked questions

1. Memory Overflow

The program asks the Java virtual machine for more memory than the system can provide; the request cannot be satisfied, so memory overflows.

2. Memory leak

The program asks the system for memory (new) but does not give it back after use (delete). The program itself can no longer reach the memory it requested, and the system cannot hand the block to anyone else. As this goes on, the server's memory is consumed, more and more of it becomes unusable, and the system cannot reassign it to programs that need it: this is a leak. If it continues, the program gradually runs out of memory and eventually overflows.
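Java has no delete, so in Java a leak usually takes a different shape: objects stay reachable, for example from a static collection, long after they have stopped being useful, and the GC is not allowed to reclaim them. A minimal sketch, with an invented class LeakDemo and an arbitrary 1 KB payload per request:

```java
import java.util.ArrayList;
import java.util.List;

// A static list that keeps every payload reachable, so the GC can never reclaim them.
final class LeakDemo {
    private static final List<byte[]> RETAINED = new ArrayList<>();

    // Each call keeps another 1 KB alive; nothing ever removes it.
    static void handleRequest() {
        RETAINED.add(new byte[1024]);
    }

    static int retainedCount() {
        return RETAINED.size();
    }

    // The fix: drop the references once the data is no longer needed.
    static void release() {
        RETAINED.clear();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            handleRequest();
        }
        System.out.println(retainedCount()); // prints 3
        release();
        System.out.println(retainedCount()); // prints 0
    }
}
```

Left unchecked, a list like this grows with every request until the heap is exhausted and the leak ends in the overflow described in item 1.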

This chapter is mostly theory. I will keep adding practical material, such as verifying the effect of garbage collection and monitoring memory. Guidance and suggestions from readers are welcome; the contact details are given at the top of the article.

If you repost this article, please cite the source (http://blog.csdn.net/zhangerqing). Thank you!

The End


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
