Deep understanding of the memory structure and GC mechanism of the JVM

Source: Internet
Author: User
I. Preface

Java's GC (Garbage Collection) mechanism is one of the features that most clearly distinguishes it from C++. C++ developers have to manage memory reclamation themselves, while Java developers can concentrate on business logic because the JVM handles this tedious work for them; in this respect Java is the more convenient of the two. That does not mean we can ignore how GC works. Without understanding it, we can easily write code that leaks memory, triggers frequent GC that makes the application stutter, or even fails with an OOM (OutOfMemoryError). Understanding the mechanism is what lets us write high-performance applications and resolve performance bottlenecks.

To understand GC, we first need to understand how the JVM manages memory. Only then can we know which objects are reclaimed, when they are reclaimed, and how.

II. JVM Memory Management

According to the JVM specification, the JVM divides memory into the following areas:

1. Method area
2. Heap area (Heap)
3. Virtual machine stack (VM Stack)
4. Native method stack (Native Method Stack)
5. Program counter (Program Counter Register)

The method area and the heap are shared by all threads.

2.1 Method Area

The method area stores information about loaded classes (class name, modifiers, and so on), static variables, constants declared final, and the fields, constructors, and methods of each class. It is shared globally and is garbage-collected under certain conditions. When the method area needs more space than it is allowed, an OutOfMemoryError: PermGen space is thrown.

In the HotSpot virtual machine this area corresponds to the permanent generation (Permanent Generation), which JDK 8 replaced with Metaspace. GC rarely runs on the method area, which is one reason it is called the permanent generation, but that does not mean it is never collected. GC there mainly reclaims entries in the constant pool and unloads loaded classes, and the conditions for doing so are strict and hard to meet.

The runtime constant pool (Runtime Constant Pool) is part of the method area and stores the constants and references generated by the compiler. Most constants are determined at compile time, but not all: constants can also be added at run time. The intern() method of the String class is an example. The String class maintains a pool of constants; if the string on which intern() is called, say "Hello", is already in the pool, the reference to the pooled string is returned directly; otherwise the string is added to the pool and its reference is returned.

2.2 Heap Area (Heap)

The heap is the area where GC runs most often, and the most important one for understanding the GC mechanism. It is shared by all threads and is created when the virtual machine starts. It mainly stores object instances and arrays; every object created with new is allocated here.

2.3 Virtual Machine Stack (VM Stack)

The virtual machine stack lives in operating system memory. Each thread has its own virtual machine stack, so the stack is thread-private and has the same lifetime as its thread. Every method invocation creates a stack frame (Stack Frame), which stores the local variable table, the operand stack, dynamic linking information, the method exit, and similar data. The frame is pushed onto the stack when the method is called and popped when the call returns.
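The push-on-call behavior can be observed with a small experiment. The sketch below is illustrative only (the class and method names are made up for this example): it recurses without a base case, so each call pushes one more frame until the JVM reports a StackOverflowError. The depth reached depends on the JVM, the size of each frame, and the configured stack size, so no particular number should be expected.

```java
// Illustrative sketch; class and method names are hypothetical.
class StackDepthDemo {
    private static int depth;

    // No base case: each call pushes one more frame onto this thread's stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurses until the stack is exhausted and returns how many calls were made. */
    static int measureMaxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames are popped as the error propagates, so execution can continue here.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames pushed before overflow: " + measureMaxDepth());
    }
}
```

Running it with a different thread stack size should change the reported depth, which is one way to see that the limit belongs to the per-thread stack rather than to the heap.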

The local variable table holds the method's local variables, including values of the primitive types and object references. Its notable property is that the space it needs is determined at compile time and does not change at run time.

The virtual machine stack defines two kinds of errors: StackOverflowError (stack overflow) and OutOfMemoryError (out of memory). StackOverflowError is thrown when a thread's call stack grows deeper than the virtual machine allows. Most virtual machines, however, allow the stack to expand dynamically, so a thread can keep requesting stack space until memory runs out, at which point OutOfMemoryError is thrown.

2.4 Native Method Stack (Native Method Stack)

The native method stack supports the execution of native methods and stores the execution state of each native method call. It works the same way as the virtual machine stack; the only difference is that the virtual machine stack serves Java methods while the native method stack serves native methods. In many virtual machines (such as HotSpot, the default virtual machine in Sun's JDK), the two stacks are implemented as one.

2.5 Program Counter (Program Counter Register)

The program counter is a very small memory area that the programmer cannot manipulate directly. When the JVM interprets bytecode (.class files), it stores the position of the bytecode instruction the current thread is executing. It is a conceptual model, and different JVMs may implement it differently. The bytecode interpreter works by changing the value of the program counter to select the next instruction to execute; basic control flow such as branching, looping, and jumping all depends on this area.

Each program counter records the position of only one thread, so it is thread-private.

If the thread is executing a Java method, the program counter holds the address of the bytecode instruction being executed; if it is executing a native method, the counter's value is undefined. This is the only memory area for which no OutOfMemoryError is specified.

III. GC Mechanism

As a program runs, object instances, variables, and other data occupy more and more memory. If that memory is not reclaimed in time, the program slows down and may eventually fail.

Of the five memory areas described above, three need no garbage collection: the native method stack, the program counter, and the virtual machine stack. Their lifetimes are tied to their thread, so the memory they occupy is released automatically when the thread is destroyed. Only the method area and the heap need garbage collection, and the objects reclaimed are those that are no longer referenced.

3.1 Lookup Algorithm

The classic approach is the reference counting algorithm: each object gets a reference counter, which is incremented each time the object is referenced and decremented each time a reference is dropped. When the counter reaches 0, the object can be reclaimed. This algorithm has an obvious flaw: when two objects reference each other but are otherwise unused, they should be reclaimed, yet their counters never reach 0, so their memory can never be freed. For this reason Sun's JVM does not use reference counting; it uses what is called the root search algorithm, as shown in the figure:

The basic idea is to start from a set of root nodes called GC Roots and search downward along references. An object that cannot be reached from any GC Root is no longer in use and can be reclaimed. Object5, Object6, and Object7 in the figure above still reference each other, but because none of them is reachable from a GC Root they can be reclaimed, which fixes the flaw in reference counting.
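The difference between the two approaches can be sketched on a toy object graph. Everything below is an assumption made for illustration: the Obj class, the helper names, and the exact shape of the graph are hypothetical and only loosely follow the Object1 to Object7 naming used above. The sketch counts incoming references the way a reference counter would, then runs a plain graph search from a single root.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, not how a real JVM represents objects.
class ReachabilitySketch {
    /** Toy object: a name plus outgoing references. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Hypothetical graph: Object1 is referenced by a root; Object5..7 form an isolated cycle. */
    static List<Obj> buildHeap() {
        List<Obj> heap = new ArrayList<>();
        for (int i = 1; i <= 7; i++) heap.add(new Obj("Object" + i));
        heap.get(0).refs.add(heap.get(1)); // Object1 -> Object2
        heap.get(0).refs.add(heap.get(2)); // Object1 -> Object3
        heap.get(2).refs.add(heap.get(3)); // Object3 -> Object4
        heap.get(4).refs.add(heap.get(5)); // Object5 -> Object6
        heap.get(5).refs.add(heap.get(6)); // Object6 -> Object7
        heap.get(6).refs.add(heap.get(4)); // Object7 -> Object5 (closes the cycle)
        return heap;
    }

    /** What a reference counter would hold: the number of references pointing at target. */
    static int referenceCount(Obj target, List<Obj> heap) {
        int count = 0;
        for (Obj o : heap) {
            for (Obj ref : o.refs) {
                if (ref == target) count++;
            }
        }
        return count;
    }

    /** Root search: collect every object reachable from the roots. */
    static Set<Obj> reachable(List<Obj> roots) {
        Set<Obj> seen = new HashSet<>(); // Obj keeps identity equality
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (seen.add(o)) pending.addAll(o.refs);
        }
        return seen;
    }

    /** Names of the objects that the root search did not reach, in heap order. */
    static List<String> garbageNames(List<Obj> heap, List<Obj> roots) {
        Set<Obj> live = reachable(roots);
        List<String> names = new ArrayList<>();
        for (Obj o : heap) {
            if (!live.contains(o)) names.add(o.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Obj> heap = buildHeap();
        List<Obj> roots = new ArrayList<>();
        roots.add(heap.get(0)); // assume only Object1 is referenced from a GC Root
        System.out.println("Reference count of Object5: " + referenceCount(heap.get(4), heap));
        System.out.println("Unreachable from roots: " + garbageNames(heap, roots));
    }
}
```

In this model the counter for each object in the cycle stays at 1, so reference counting would keep all three, while the search from the root never visits them.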

A supplementary concept: since JDK 1.2 there are four kinds of references: strong, soft, weak, and phantom.
Strong reference: an object created with new is strongly referenced. The GC never reclaims it while the reference exists, even if that means throwing an OOM error.
Soft reference: the object is reclaimed only when the JVM is running low on memory.
Weak reference: the object is reclaimed at the next GC, whether or not memory is sufficient.
Phantom reference: can mostly be ignored; the JVM pays no attention to it when deciding what to reclaim, and it is there largely to round out the set of four. Its only use is to track when an object is reclaimed, as an aid alongside the finalize method.
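The four kinds map onto classes in the java.lang.ref package. The sketch below (class and method names are hypothetical) only demonstrates the parts that are deterministic: while a strong reference is held, soft and weak references still return the object, and a phantom reference never returns it. When a softly or weakly referenced object actually disappears depends on the collector, so that part is deliberately left out.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Illustrative sketch; class and method names are hypothetical.
class ReferenceKindsDemo {
    /** While 'strong' is in use, the soft and weak references must still see the object. */
    static boolean referentVisibleWhileStronglyHeld() {
        Object strong = new Object();
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc(); // only a hint to the JVM; 'strong' keeps the object alive either way
        return soft.get() == strong && weak.get() == strong;
    }

    /** A phantom reference never hands out its referent; it is only useful with a queue. */
    static boolean phantomGetIsNull() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);
        return phantom.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("soft/weak see the object: " + referentVisibleWhileStronglyHeld());
        System.out.println("phantom.get() is null: " + phantomGetIsNull());
    }
}
```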

Finally, a class can be reclaimed only when all of the following hold:

a. All instances of the class have been reclaimed;
b. The ClassLoader that loaded the class has been reclaimed;
c. The corresponding java.lang.Class object is not referenced anywhere.
3.2 Memory Partitions

Memory is divided into three main regions: the young generation (Young Generation), the old generation (Old Generation), and the permanent generation (Permanent Generation). The three generations have different characteristics, which is why they use different GC algorithms. The young generation suits short-lived objects that are created and destroyed quickly; the old generation suits objects with longer lifetimes; the permanent generation in Sun's HotSpot virtual machine is the method area (some JVMs have no permanent generation at all).
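One way to see how a running JVM actually partitions its memory is to ask the standard java.lang.management API for its memory pools. This is a small sketch (the class and method names are hypothetical); the pool names it prints differ between JVM versions and collectors, so the output should be read as a description of the current JVM rather than of the three-generation model in general.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; class and method names are hypothetical.
class MemoryPoolsDemo {
    /** Names of the memory pools of the given type reported by the running JVM. */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) names.add(pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```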

Young generation (Young Generation): usually divided into an Eden area and a Survivor area, with the Survivor area further split into two parts of equal size, FromSpace and ToSpace. New objects are allocated from the young generation; when Eden runs out of space, the surviving objects are moved to the Survivor area. A garbage collection of the young generation is called a Minor GC (also known as a Young GC).

Old generation (Old Generation): stores objects that are still alive after surviving collections in the young generation, such as cached objects. The old generation is collected when it fills up, and a collection of the old generation is called a Major GC (also known as a Full GC).
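How often young and old collections have run can be read from the JVM's garbage collector MXBeans. The sketch below is a rough illustration (class and method names are hypothetical). It assumes nothing about which collectors are present: many HotSpot configurations report one bean for the young generation and one for the old generation, but the names and the number of beans depend on the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; class and method names are hypothetical.
class GcCountersDemo {
    /** One line per collector: its name, collection count, total time, and managed pools. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalMillis=" + gc.getCollectionTime()
                    + ", pools=" + String.join(",", gc.getMemoryPoolNames()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Allocate short-lived garbage; whether this triggers a collection is up to the JVM.
        long checksum = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
            checksum += junk.length;
        }
        System.out.println("Allocated bytes: " + checksum);
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```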

Permanent generation (Permanent Generation): in Sun's JVM this is the method area, although most JVMs have no such generation.

3.3 GC Algorithms

Common GC algorithms: copying, mark-sweep, and mark-compact.

Copying: the copying algorithm scans from the root set and moves the surviving objects to an empty area, as shown in the figure:

When few objects survive, the copying algorithm is efficient (the young generation's Eden area uses it). The cost is that it needs an extra block of free space and has to move objects.
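A toy version of the copying idea can be written in a few lines. This is a sketch under simplifying assumptions, not how HotSpot implements its young-generation collector: the "spaces" are ordinary Java lists, the Node class and all method names are hypothetical, and a map stands in for the forwarding pointers a real collector would store in object headers. The to-space list doubles as the work queue, in the style of Cheney's algorithm.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a copying collector; all names are hypothetical.
class CopyingGcSketch {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Copies everything reachable from the roots into a fresh to-space and returns it. */
    static List<Node> copyLive(List<Node> roots) {
        List<Node> toSpace = new ArrayList<>();
        Map<Node, Node> forwarding = new IdentityHashMap<>();
        for (Node root : roots) {
            forward(root, toSpace, forwarding);
        }
        // Scan the copies in order; newly copied objects are appended behind the scan index.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Node copy = toSpace.get(scan);
            for (int i = 0; i < copy.refs.size(); i++) {
                copy.refs.set(i, forward(copy.refs.get(i), toSpace, forwarding));
            }
        }
        return toSpace;
    }

    /** Returns the to-space copy of an object, creating it on first visit. */
    private static Node forward(Node old, List<Node> toSpace, Map<Node, Node> forwarding) {
        Node copy = forwarding.get(old);
        if (copy == null) {
            copy = new Node(old.name);
            copy.refs.addAll(old.refs); // still from-space references; fixed during the scan
            forwarding.put(old, copy);
            toSpace.add(copy);
        }
        return copy;
    }

    /** Hypothetical from-space: a live cycle A -> B -> C -> A plus one unreachable object. */
    static List<Node> sampleRoots() {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        Node garbage = new Node("garbage");
        a.refs.add(b);
        b.refs.add(c);
        c.refs.add(a);
        garbage.refs.add(a); // pointing at a live object does not keep 'garbage' alive
        List<Node> roots = new ArrayList<>();
        roots.add(a);
        return roots;
    }

    public static void main(String[] args) {
        for (Node n : copyLive(sampleRoots())) {
            System.out.println("copied " + n.name);
        }
    }
}
```

Only reachable objects are ever touched, which is why the work is proportional to the number of survivors rather than to the size of the space.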

Mark-sweep: this algorithm scans from the root set and marks the surviving objects, then scans the whole space and clears the objects that were not marked. The marking and sweeping process is as follows:

In the figure above, the blue parts are referenced objects and the brown parts are objects that are no longer referenced. The mark phase requires a full scan, which makes it relatively time-consuming.

The sweep phase clears the unreferenced objects and keeps the surviving ones.

Mark-sweep does not move objects and only processes the dead ones, so it is more efficient when many objects in the space survive. Because it only clears and never reorganizes, however, it causes memory fragmentation.
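The two phases and the resulting fragmentation can be sketched on a toy heap. The model is an assumption made for illustration: the heap is a fixed array of slots, references are slot indices, and the Obj class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of mark-sweep; all names are hypothetical.
class MarkSweepSketch {
    /** Toy heap object; refs holds the heap indices of the objects it references. */
    static final class Obj {
        final String name;
        final int[] refs;
        boolean marked;
        Obj(String name, int... refs) { this.name = name; this.refs = refs; }
    }

    /** Mark phase: flag everything reachable from the root indices. */
    static void mark(Obj[] heap, int... roots) {
        Deque<Integer> pending = new ArrayDeque<>();
        for (int root : roots) pending.push(root);
        while (!pending.isEmpty()) {
            int index = pending.pop();
            Obj o = heap[index];
            if (o == null || o.marked) continue;
            o.marked = true;
            for (int ref : o.refs) pending.push(ref);
        }
    }

    /** Sweep phase: free unmarked slots in place and reset the marks on survivors. */
    static void sweep(Obj[] heap) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;
            if (heap[i].marked) {
                heap[i].marked = false;
            } else {
                heap[i] = null;
            }
        }
    }

    /** One character per slot: the object's name, or '_' for a free slot. */
    static String layout(Obj[] heap) {
        StringBuilder sb = new StringBuilder();
        for (Obj o : heap) sb.append(o == null ? "_" : o.name);
        return sb.toString();
    }

    /** Hypothetical heap: A -> C, E standalone, and B <-> D as an unreachable cycle. */
    static Obj[] sampleHeap() {
        return new Obj[] {
            new Obj("A", 2), new Obj("B", 3), new Obj("C"), new Obj("D", 1), new Obj("E")
        };
    }

    public static void main(String[] args) {
        Obj[] heap = sampleHeap();
        mark(heap, 0, 4); // assume slots 0 (A) and 4 (E) are referenced from roots
        sweep(heap);
        System.out.println(layout(heap));
    }
}
```

With these assumptions the survivors stay where they were and the freed slots end up scattered between them, which is the fragmentation described above.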

Mark-compact: this algorithm resembles mark-sweep in that it first marks the surviving objects, but after clearing it moves the surviving objects toward the free space on the left and updates the pointers that reference them, as shown in the following figure:
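The move-and-update-pointers step can be illustrated on a toy heap as well. As before, this is a sketch under assumptions rather than a real collector: the heap is an array of slots, a slot index plays the role of an address, and the Obj class and method names are hypothetical.

```java
// Toy model of mark-compact; all names are hypothetical.
class MarkCompactSketch {
    /** Toy heap object; refs holds heap indices and is rewritten when objects move. */
    static final class Obj {
        final String name;
        final int[] refs;
        boolean marked;
        Obj(String name, int... refs) { this.name = name; this.refs = refs; }
    }

    /** Mark phase: flag everything reachable from the given slot. */
    static void mark(Obj[] heap, int index) {
        Obj o = heap[index];
        if (o == null || o.marked) return;
        o.marked = true;
        for (int ref : o.refs) mark(heap, ref);
    }

    /** Slides marked objects to the low end of the heap and returns the survivor count. */
    static int compact(Obj[] heap) {
        // Pass 1: compute the new slot of every survivor.
        int[] newIndex = new int[heap.length];
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && heap[i].marked) newIndex[i] = next++;
        }
        // Pass 2: rewrite references, then move each survivor left (newIndex[i] <= i).
        for (int i = 0; i < heap.length; i++) {
            Obj o = heap[i];
            heap[i] = null;
            if (o == null || !o.marked) continue; // dead object: its slot is simply freed
            for (int r = 0; r < o.refs.length; r++) {
                o.refs[r] = newIndex[o.refs[r]];
            }
            o.marked = false;
            heap[newIndex[i]] = o;
        }
        return next;
    }

    /** One character per slot: the object's name, or '_' for a free slot. */
    static String layout(Obj[] heap) {
        StringBuilder sb = new StringBuilder();
        for (Obj o : heap) sb.append(o == null ? "_" : o.name);
        return sb.toString();
    }

    /** Hypothetical heap: A -> C, E standalone, and B <-> D as an unreachable cycle. */
    static Obj[] sampleHeap() {
        return new Obj[] {
            new Obj("A", 2), new Obj("B", 3), new Obj("C"), new Obj("D", 1), new Obj("E")
        };
    }

    public static void main(String[] args) {
        Obj[] heap = sampleHeap();
        mark(heap, 0); // assume slots 0 (A) and 4 (E) are referenced from roots
        mark(heap, 4);
        int survivors = compact(heap);
        System.out.println(layout(heap) + " (" + survivors + " survivors)");
    }
}
```

After compaction the survivors are contiguous and A's reference has been rewritten to C's new slot, at the price of an extra pass over the heap and the moves themselves.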

Because it moves and compacts objects, this algorithm avoids the fragmentation problem of mark-sweep, but the moving makes it more expensive. (It is suited to the old generation.)

IV. Garbage Collectors

In the JVM, GC is carried out by a garbage collector, so in practice we need to choose the collector that suits the scenario. The collectors are introduced below.

4.1 Serial Collector (Serial GC)

The Serial GC is the oldest and most basic collector, but it is still widely used: it is the default for client-mode virtual machines in Java SE 5 and Java SE 6, and it suits systems with a single processor. In the serial collector, both Minor GC and Major GC use a single thread. Its most notable trait is that all running threads must be suspended during collection (Stop the World), which some applications cannot accept. If an application's real-time requirements are not strict, though, pauses are usually acceptable as long as they stay within N milliseconds, and in practice the collector does not disappoint: pauses of a few dozen milliseconds are entirely acceptable for a client. The collector suits single-CPU machines, small young generations, and applications without strict pause-time requirements, and it is the default GC at the client level.

4.2 ParNew GC

ParNew is basically the same as the Serial GC; the essential difference is that it collects with multiple threads, which improves efficiency enough for server use. It can also work together with the CMS GC, which makes it a reasonable choice on the server side.

4.3 Parallel Scavenge GC

The entire scan and copy process runs on multiple threads. It suits machines with multiple CPUs and applications that need short pause times, and it is the default GC at the server level.

4.4 CMS (Concurrent Mark Sweep) Collector

The goal of this collector is to solve the pause problem of the Serial GC and achieve the shortest collection pauses. It suits typical B/S (browser/server) applications, which need high concurrency and fast response. CMS is implemented with the mark-sweep algorithm.

Advantages of the CMS collector: concurrent collection and low pauses, although it is still far from perfect.

Disadvantages of the CMS collector:

a. The CMS collector is very sensitive to CPU resources. The concurrent phases do not pause user threads, but they consume CPU, which slows the application down and lowers total throughput.
b. The CMS collector cannot handle floating garbage, so a "Concurrent Mode Failure" may occur, which triggers another Full GC.
c. The CMS collector is based on the mark-sweep algorithm, so it also produces memory fragmentation.
4.5 G1 Collector

G1 improves on the CMS collector in several ways. First, it is based on the mark-compact algorithm, so it does not produce memory fragmentation; second, it gives more precise control over pauses.

4.6 Serial Old Collector

Serial Old is the old-generation version of the Serial collector. It collects with a single thread using the mark-compact algorithm and is mainly used by virtual machines in client mode.

4.7 Parallel Old Collector

Parallel Old is the old-generation version of the Parallel Scavenge collector, using multiple threads and the mark-compact algorithm.

4.8 RTSJ Garbage Collector

The RTSJ garbage collector is used for Java real-time programming.

V. Summary

A deep understanding of the JVM's memory model and GC mechanism helps us write high-performance code and gives us ideas and directions for optimization.
