Java memory area and GC mechanism

Last Update:2018-07-27 Source: Internet

Author: User

Contents

Overview of Java Garbage collection
Java Memory Area
How Java objects are accessed
Java memory allocation mechanism
Java GC Mechanism
Garbage collector
Overview of Java Garbage collection

Java GC (garbage collection) is one of the main differences between Java and C/C++. Java developers generally do not need to write memory management and cleanup code, and do not have to be as nervous about memory leaks and overflows as C programmers. This is because the Java virtual machine provides automatic memory management and garbage collection. In a nutshell, the mechanism marks memory in the JVM (Java Virtual Machine), determines which memory needs to be reclaimed, reclaims it automatically according to a collection strategy, and runs without ever stopping ("never stop") to keep memory available in the JVM and to prevent memory leaks and overflows.

A note on the JVM: in Sun's JDK, currently the most widely used, the default virtual machine is HotSpot. It first shipped in 1999 as an add-on for JDK 1.2, became the default in JDK 1.3, and has remained so through the now widely used JDK 6. Oracle's acquisition of Sun was announced in 2009; together with BEA, which it had acquired earlier, Oracle then owned two of the three major virtual machines, JRockit and HotSpot. Oracle has stated its intention to merge the two, but in the newly released JDK 7 the default virtual machine is still HotSpot. The virtual machine discussed in this article is therefore HotSpot, and the mechanisms described are mainly those of HotSpot's GC.

The Java GC mechanism does three main things: it determines which memory needs to be reclaimed, when to run a GC, and how to run it. After such a long period of development (GC actually predates the Java language, for example in Lisp), the Java GC mechanism has matured and does most of the work for us almost automatically. However, if we develop larger applications that need memory optimization, we must study the Java GC mechanism.

Learning the Java GC mechanism can help us to troubleshoot various memory overflow or leak problems in our daily work, solve performance bottlenecks, achieve higher concurrency, and write more efficient programs.

We will study the Java GC mechanism from four angles: (1) how memory is allocated; (2) how to ensure that memory is not reclaimed by mistake (that is, which memory needs to be reclaimed); (3) under what circumstances a GC runs and how it is executed; and (4) how to monitor and tune the GC mechanism.

Java Memory Area

To understand the Java GC mechanism, you must first understand how memory is partitioned in the JVM. Among the Java runtime data areas, the memory managed by the JVM is divided into the following areas:

1. Program counter (Program Counter Register): the program counter is a relatively small memory area that indicates which bytecode instruction the current thread has reached; it can be thought of as a line-number indicator for the current thread. When the bytecode interpreter works, it fetches the next instruction to execute by changing the value of this counter.

Each program counter records the position of only one thread, so it is thread-private (each thread has its own program counter).

If the thread is executing a Java method, the counter records the address of the virtual machine bytecode instruction being executed; if it is executing a native method (for example, one written in C), the counter's value is undefined. Because the program counter only records the current instruction address, it cannot run out of memory, which makes it the only JVM memory area for which no OutOfMemoryError is defined.

2. Virtual machine stack (JVM stack): each time a thread executes a method, a stack frame is created to store the local variable table, operand stack, dynamic link, method exit and so on. When the method is called, its frame is pushed onto the JVM stack; when the method completes, the frame is popped.

The local variable table stores the method's local variables, including the primitive data types, object references and return addresses. In the local variable table, only long and double values occupy two local variable slots (Slot; on a 32-bit machine one slot is 32 bits); all others occupy one. Note that the size of the local variable table is determined at compile time: the space a method needs in its stack frame is fully known before it runs and does not change during the method's lifetime.

Two exceptions are defined for the virtual machine stack. A StackOverflowError is thrown if the stack depth requested by a thread exceeds the maximum depth the virtual machine allows. However, most Java virtual machines allow the virtual machine stack to grow dynamically (a few use a fixed size), so a thread can keep requesting stack space until there is not enough memory, at which point an OutOfMemoryError is thrown.

Each thread corresponds to a virtual machine stack, so the virtual machine stack is also thread-private.
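
The first of these two exceptions is easy to provoke. The following is a minimal sketch, not taken from the original article: the class name StackDepthDemo and its methods are made up for illustration. It recurses without a base case, so every call pushes another stack frame until the virtual machine throws StackOverflowError.

```java
class StackDepthDemo {
    private static int depth;

    // Recurses without a base case; every call pushes a new stack frame.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many nested calls succeeded before the JVM stack ran out.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames have already been unwound when control reaches here.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack overflowed after " + measureDepth() + " nested calls");
    }
}
```

The depth reached is not a fixed number: it depends on the size of each frame and on the thread's stack size, which can usually be changed with the -Xss option.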

3. Native method stack (Native Method Stack): the native method stack has the same role, operating mechanism and exception types as the virtual machine stack. The only difference is that the virtual machine stack executes Java methods while the native method stack executes native methods. In many virtual machines (such as HotSpot, the default in Sun's JDK), the native method stack and the virtual machine stack are combined into one.

The native method stack is also thread-private.

4. Heap: the heap is the single most important area for understanding the Java GC mechanism. It is the largest block of memory managed by the JVM and the main area managed by the GC; it is shared by all threads and created when the virtual machine starts. The heap exists to store object instances. In principle all objects are allocated on the heap (although with modern techniques this is no longer absolute: some objects are allocated directly on the stack).

According to the Java Virtual Machine specification, heap memory needs to be logically contiguous (it need not be physically contiguous) and may be implemented as fixed-size or expandable; current mainstream virtual machines are expandable. If, after garbage collection, there is still not enough memory for an allocation and the heap cannot be expanded, an OutOfMemoryError: Java heap space is thrown.
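
To see how large the heap of a running JVM is and how far it may still expand, the standard java.lang.Runtime class can be queried. This is a small sketch added for illustration; the class name HeapInfo and its helper methods are not from the original article.

```java
class HeapInfo {
    // Upper bound the heap may grow to (Long.MAX_VALUE if there is no limit).
    static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap memory currently reserved by the JVM; it can grow up to maxBytes().
    static long committedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Part of the reserved heap that is occupied (live objects plus uncollected garbage).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        System.out.println("max heap:       " + maxBytes() / mb + " MB");
        System.out.println("committed heap: " + committedBytes() / mb + " MB");
        System.out.println("used heap:      " + usedBytes() / mb + " MB");
    }
}
```

On an expandable heap the committed value is typically smaller than the maximum at startup and grows as the application allocates more.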

There is much more to the heap area, which is described in detail in the next section, "Java Memory allocation mechanism."

5. Method area: the Java Virtual Machine specification describes the method area as a logical part of the heap, but in practice it is treated as non-heap (Non-Heap). Many blog authors divide Java's generational collection into three generations (young generation, old generation and permanent generation) and call the method area the "permanent generation". This is because earlier HotSpot implementations extended the idea of generational collection to the method area and designed it as a permanent generation. Most virtual machines other than HotSpot, however, do not treat the method area as a permanent generation, and HotSpot itself plans to drop it. Because the author mainly uses Oracle JDK 6, this article still uses the term permanent generation.

The method area is shared by all threads and stores class information loaded by the virtual machine (that is, the information loaded when a class is loaded, including version, fields, methods, interfaces and so on), final constants, static variables, code compiled by the just-in-time compiler, and so on.

The method area does not need to be physically contiguous and may be fixed-size or expandable. Compared with the heap it has one more option: an implementation may choose whether to perform garbage collection on it at all. Garbage collection in the method area is generally rare, which is one reason HotSpot's method area is called the permanent generation, but that does not mean it never happens. When it does, it mainly reclaims memory from the constant pool and unloads loaded classes.

The conditions for garbage collection in the method area are strict, it is hard to do, and the results are unsatisfying, so there is no need to think about it too much here; it can be left for deeper study later.

The method area can throw OutOfMemoryError: PermGen space when it does not have enough memory.

The runtime constant pool (Runtime Constant Pool) is part of the method area. It stores the literals and symbolic references generated at compile time, together with the direct references translated from them. (A symbolic reference is an encoded string that identifies the location of a variable, an interface and so on; a direct reference is the address resolved from a symbolic reference, and the resolution is completed in the class linking stage.) Besides compile-time constants, the runtime constant pool can also hold constants generated at run time. One example is the intern() method of the String class: String maintains a constant pool, and if the string on which intern() is called, say "ABC", is already in the pool, the address of the pooled string is returned; otherwise the new constant is added to the pool and its address is returned.
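
The behavior of intern() can be observed by comparing references. In this short sketch (the class InternDemo is made up for illustration), the literal "ABC" is already in the pool, while new String("ABC") creates a separate object on the heap; calling intern() on that object returns the pooled instance.

```java
class InternDemo {
    // A freshly constructed String is a different object from the pooled literal.
    static boolean sameBeforeIntern() {
        String literal = "ABC";
        String copy = new String("ABC");
        return copy == literal;
    }

    // intern() returns the pooled instance, which is the same object as the literal.
    static boolean sameAfterIntern() {
        String literal = "ABC";
        String copy = new String("ABC");
        return copy.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println("same object before intern(): " + sameBeforeIntern());
        System.out.println("same object after intern():  " + sameAfterIntern());
    }
}
```

The comparison deliberately uses == rather than equals(), because the point is object identity, not string content.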

6. Direct memory: direct memory is not managed by the JVM. It can be thought of as the machine memory outside the JVM: for example, if you have 4 GB of memory and the JVM occupies 1 GB, the remaining 3 GB is direct memory. The JDK provides a memory allocation method based on channels (Channel) and buffers (Buffer): a native library implemented in C allocates the memory in direct memory, and a DirectByteBuffer stored in the JVM heap refers to it. Because direct memory is limited by the machine's physical memory, OutOfMemoryError can occur here too.
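
As a brief illustration of the buffer-based allocation mentioned above, java.nio.ByteBuffer offers both a direct and an on-heap allocation method. The wrapper class DirectBufferDemo below is only a sketch for this article; the ByteBuffer calls themselves are standard JDK API.

```java
import java.nio.ByteBuffer;

class DirectBufferDemo {
    // Allocates the buffer's storage outside the Java heap; only the small
    // buffer object that refers to it lives on the heap.
    static ByteBuffer allocateDirect(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    // Allocates an ordinary buffer backed by a byte[] on the Java heap.
    static ByteBuffer allocateOnHeap(int bytes) {
        return ByteBuffer.allocate(bytes);
    }

    public static void main(String[] args) {
        System.out.println("direct buffer isDirect:  " + allocateDirect(1024).isDirect());
        System.out.println("on-heap buffer isDirect: " + allocateOnHeap(1024).isDirect());
    }
}
```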

How Java objects are accessed

In general, a Java reference access involves 3 areas of memory: The JVM stack, the heap, and the method area.

Take the simplest reference from a local variable, Object obj = new Object(), as an example:

Object obj is a local reference of reference type, stored in the local variable table of the JVM stack;
new Object() is stored in the heap as the object's instance data;
The heap also records the address of the type information of the Object class (interfaces, methods, fields, object type and so on); the data at these addresses is stored in the method area.

Under the Java Virtual Machine specification, there are two main ways to implement access to a specific object through a reference:

1. Access via handle (figure from "Deep Understanding of the Java Virtual Machine: Advanced JVM Features and Best Practices"):

With handle access, a region of the JVM heap is set aside as a handle pool. Each handle stores the addresses of the object's instance data (in the heap) and of its type data (in the method area). This approach is stable because the reference holds the address of a handle, which does not change when the object is moved.

2. Access via direct pointer (figure from "Deep Understanding of the Java Virtual Machine: Advanced JVM Features and Best Practices"):

With direct pointer access, the reference stores the actual address of the object in the heap, and the object data in the heap includes the address of the corresponding type data in the method area. The biggest advantage of this approach is speed, and it is the one HotSpot uses.
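
The trade-off between the two approaches can be shown with a toy model. The sketch below is purely illustrative: the class HandleAccessSketch, its array-based "heap" and its method names are assumptions made for this example and say nothing about how a real JVM lays out memory. It shows that with a handle pool every access takes two lookups, but moving an object only requires updating the handle pool, so the handle held by the reference stays valid.

```java
class HandleAccessSketch {
    // Toy "heap": each slot holds one object's instance data.
    private final String[] heap = new String[8];
    // Handle pool: handle id -> heap slot where the object currently lives.
    private final int[] handlePool = new int[8];
    private int nextHandle = 0;

    // Stores data in the given slot and returns a handle for it.
    int allocate(int slot, String data) {
        heap[slot] = data;
        handlePool[nextHandle] = slot;
        return nextHandle++;
    }

    // Access through a handle costs two lookups: handle pool, then heap.
    String read(int handle) {
        return heap[handlePool[handle]];
    }

    // Moves an object as a compacting GC might. Only the handle pool entry
    // changes; the handle value held by the "reference" stays the same.
    void move(int handle, int newSlot) {
        int oldSlot = handlePool[handle];
        if (oldSlot == newSlot) {
            return;
        }
        heap[newSlot] = heap[oldSlot];
        heap[oldSlot] = null;
        handlePool[handle] = newSlot;
    }

    int slotOf(int handle) {
        return handlePool[handle];
    }
}
```

With a direct pointer the reference would hold the slot number itself, saving one lookup on every access, but every reference would have to be rewritten whenever the object moves.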

Java memory allocation mechanism

The memory allocation discussed here is mainly allocation on the heap. In general, objects are allocated on the heap, but modern techniques also support breaking an object into scalars (a scalar type is an atomic type that represents a single value, such as a primitive type or a String) and allocating them on the stack. Allocation on the stack is rare, so we do not consider it here.

Java's memory allocation and collection can be summed up as generational allocation and generational collection. Objects are divided by how long they survive into the young generation (Young Generation), the old generation (Old Generation) and the permanent generation (Permanent Generation, that is, the method area), as in the following figure (from "Become a Java GC Expert, Part I", http://www.importnew.com/1993.html):

Young generation (Young Generation): when an object is created, its memory is first allocated in the young generation (large objects can be created directly in the old generation). Most objects are no longer used soon after they are created, so they quickly become unreachable and are cleared by the young generation's GC (IBM research shows that 98% of objects die young). This GC is called minor GC or young GC. Note that a minor GC does not mean the young generation as a whole is out of memory; it is really just a GC triggered on the Eden area.

The young generation is divided into three areas: the Eden area (Eden as in the garden where Adam and Eve ate the forbidden fruit), where memory is first allocated, and two survivor areas (Survivor 0 and Survivor 1). The allocation process is as follows (from "Become a Java GC Expert, Part I", http://www.importnew.com/1993.html):

Most newly created objects are allocated in the Eden area, and most of them die quickly. The Eden area is a contiguous memory space, so allocating memory in it is extremely fast;
When Eden is full, a minor GC runs, clears the dead objects and copies the remaining objects to one survivor area, Survivor0 (at this point Survivor1 is empty; one of the two survivors is always empty);
After that, each time Eden fills up, a minor GC runs and the remaining objects are added to Survivor0;
When Survivor0 is full, the objects in it that are still alive are copied to Survivor1; from then on, after each minor GC on Eden, the remaining objects are added to Survivor1 (at this point Survivor0 is empty);
After the two survivor areas have been switched a number of times (15 by default in HotSpot, controlled by -XX:MaxTenuringThreshold; objects older than this value enter the old generation), the objects still alive (in practice only a small fraction, such as objects we define ourselves) are copied to the old generation.

As the process above shows, the Eden area is a contiguous space and one of the survivors is always empty. After one GC and copy, one survivor holds the currently live objects, while the contents of Eden and the other survivor are no longer needed and can simply be emptied; at the next GC the two survivors swap roles. This way of allocating and reclaiming memory is therefore very efficient, and it is the well-known "stop-and-copy" collection (the live objects in Eden and one survivor are copied to the other survivor). This does not mean stop-and-copy is efficient in general; it is efficient only in this situation, and using it for the old generation would perform very badly.
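
The copy step summarized above (live objects from Eden and one survivor are copied to the other survivor, and objects that have survived often enough move to the old generation) can be modeled in a few lines. This is a toy simulation, not HotSpot's implementation: the class YoungGenSketch, its fields and the reachable flag are made up for illustration, space capacities are not modeled, and an object is promoted once its age exceeds the threshold, following the wording above.

```java
import java.util.ArrayList;
import java.util.List;

class YoungGenSketch {
    static final class Obj {
        final String name;
        boolean reachable = true;  // set to false to simulate an object becoming garbage
        int age = 0;               // number of minor GCs survived
        Obj(String name) { this.name = name; }
    }

    private final int tenuringThreshold;
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();  // survivor currently holding objects
    List<Obj> to = new ArrayList<>();    // survivor kept empty between GCs
    final List<Obj> old = new ArrayList<>();

    YoungGenSketch(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    // One minor GC: copy live objects out of Eden and the occupied survivor,
    // empty both, then swap the roles of the two survivors.
    void minorGc() {
        copyLive(eden);
        copyLive(from);
        eden.clear();
        from.clear();
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    private void copyLive(List<Obj> space) {
        for (Obj o : space) {
            if (!o.reachable) {
                continue;          // dead objects are simply not copied
            }
            o.age++;
            if (o.age > tenuringThreshold) {
                old.add(o);        // survived long enough: promote
            } else {
                to.add(o);
            }
        }
    }
}
```

Note that the work done by minorGc() is proportional to the number of live objects, not to the size of Eden, which is why the scheme is cheap when most objects die young.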

In the Eden area, HotSpot uses two techniques to speed up allocation: bump-the-pointer and TLAB (Thread-Local Allocation Buffers). Because the Eden area is contiguous, the core of bump-the-pointer is to track the last object created: when a new object is created, only the space after the last object needs to be checked, which makes allocation much faster. TLAB addresses multithreading: the Eden area is divided into several segments and each thread uses its own segment, so threads do not interfere with one another. Combining TLAB with bump-the-pointer ensures that each thread uses its own segment of Eden and allocates memory quickly.
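
The two techniques can be sketched with plain arithmetic on addresses. The code below is an illustrative model only: BumpPointerSketch, Region and newTlab are hypothetical names, addresses are just numbers, and real TLAB sizing and refilling in HotSpot are considerably more involved.

```java
class BumpPointerSketch {
    // A contiguous region [start, end) with a pointer to its first free byte.
    static final class Region {
        private final long end;
        private long top;

        Region(long start, long size) {
            this.top = start;
            this.end = start + size;
        }

        // Returns the address of the new object, or -1 if it does not fit.
        long allocate(long size) {
            if (top + size > end) {
                return -1;
            }
            long address = top;
            top += size;  // "bump" the pointer past the new object
            return address;
        }
    }

    // Carves a thread-local buffer out of the shared Eden region. Only this
    // step needs synchronization; allocations inside the buffer do not.
    static Region newTlab(Region eden, long tlabSize) {
        long start;
        synchronized (eden) {
            start = eden.allocate(tlabSize);
        }
        return start < 0 ? null : new Region(start, tlabSize);
    }
}
```

Each thread would keep its own Region returned by newTlab and allocate from it without locking, going back to the shared Eden only when its buffer is exhausted.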

Old generation (Old Generation): if an object survives long enough in the young generation without being collected (that is, it survives several young GCs), it is copied to the old generation. The old generation is generally larger than the young generation and can hold more objects, and GCs occur less often in it than in the young generation. When the old generation runs out of memory, a major GC, also called full GC, is executed.

The -XX:+UseAdaptiveSizePolicy switch controls whether an adaptive policy is used; if it is, the sizes of the areas in the Java heap and the age at which objects enter the old generation are adjusted dynamically.
If an object is large (such as a long string or a large array) and the young generation does not have enough space for it, it is allocated directly in the old generation (large objects may trigger a GC early, so use them sparingly and avoid short-lived large objects). -XX:PretenureSizeThreshold sets the size above which objects are allocated directly in the old generation.

Old generation objects may refer to young generation objects. If a young GC had to scan the entire old generation to decide whether an object can be collected, it would be very inefficient. The solution is to maintain a block of bytes in the old generation, the "card table", which records every reference from an old generation object to a young generation object. A young GC only needs to check the card table rather than the whole old generation, which greatly improves performance.
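
A card table can be modeled as one flag per fixed-size chunk of the old generation. The sketch below is an assumption-laden illustration: CardTableSketch and its methods are invented for this example, addresses are plain integers, and the chunk size of 512 bytes is an illustrative constant chosen here, not a value stated in this article.

```java
class CardTableSketch {
    // Bytes of old generation covered by one card (assumed value for this sketch).
    static final int CARD_SIZE = 512;

    private final boolean[] dirty;

    CardTableSketch(int oldGenBytes) {
        dirty = new boolean[(oldGenBytes + CARD_SIZE - 1) / CARD_SIZE];
    }

    // Called when an old generation object at oldAddress is made to refer to
    // a young generation object: the card covering that address is marked.
    void recordOldToYoungWrite(int oldAddress) {
        dirty[oldAddress / CARD_SIZE] = true;
    }

    // A young GC only has to scan the dirty cards for references into the
    // young generation, instead of scanning every card.
    int dirtyCards() {
        int count = 0;
        for (boolean d : dirty) {
            if (d) {
                count++;
            }
        }
        return count;
    }

    int totalCards() {
        return dirty.length;
    }
}
```

For a 4096-byte old generation this gives 8 cards; if only two of them are dirty, the young GC looks at a quarter of the old generation instead of all of it.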

Java GC Mechanism

The basic algorithm of the GC mechanism is generational collection, which needs no further explanation. The collection method for each generation is described below.

Young generation:

The previous section already introduced the main collection method for the young generation. The young generation uses the "stop-and-copy" algorithm and is divided into two parts: a larger Eden area and a smaller survivor part, which is itself split into two equal halves. On each collection, the live objects in Eden and one survivor are copied to the other survivor, and then Eden and the first survivor are cleared.

Note also that in this stop-and-copy algorithm the two parts involved in copying are not equal (the traditional stop-and-copy algorithm splits memory into two equal halves, but the young generation uses one large Eden area and two small survivor areas to avoid wasting half of the space).

Because most objects are short-lived and never even reach a survivor area, the ratio of Eden to a survivor area is large. The HotSpot default is 8:1, so the three areas take 80%, 10% and 10% of the young generation. If the live objects in Eden plus one survivor need more than that 10%, some of them have to be allocated in the old generation. The -XX:SurvivorRatio parameter configures the ratio of the Eden area to a survivor area; the default is 8, meaning Eden:Survivor1:Survivor2 = 8:1:1.
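
The 8:1:1 split follows directly from the ratio: with a ratio of Eden to one survivor of r, the young generation consists of r + 2 equal parts. The helper below works through that arithmetic; the class SurvivorRatioSketch is made up for this example, and a real JVM also aligns and rounds the sizes, which is ignored here.

```java
class SurvivorRatioSketch {
    // Size of one survivor area when Eden : survivor = ratio : 1.
    // The young generation then has ratio + 2 parts (Eden plus two survivors).
    static long survivorBytes(long youngBytes, int ratio) {
        return youngBytes / (ratio + 2);
    }

    // Eden gets whatever is left after the two survivor areas.
    static long edenBytes(long youngBytes, int ratio) {
        return youngBytes - 2 * survivorBytes(youngBytes, ratio);
    }

    public static void main(String[] args) {
        long young = 100L * 1024 * 1024;  // a 100 MB young generation
        System.out.println("eden:     " + edenBytes(young, 8) + " bytes");
        System.out.println("survivor: " + survivorBytes(young, 8) + " bytes each");
    }
}
```

With a 100 MB young generation and the default ratio of 8, each survivor gets 10 MB and Eden gets 80 MB.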

Old generation:

The old generation holds many more objects than the young generation, including large objects, so cleaning it with stop-and-copy would be very inefficient. The algorithm generally used in the old generation is mark-compact: mark the live objects (those that are still referenced), then move all live objects to one end so that memory stays contiguous.
When a minor GC occurs, the virtual machine checks whether the size promoted to the old generation on each collection is greater than the space remaining in the old generation. If it is, a full GC is triggered directly. Otherwise, the virtual machine checks whether -XX:+HandlePromotionFailure is set (allow the guarantee to fail). If it is, only the minor GC runs and a failed promotion is tolerated; if it is not, a full GC is still performed. (This means that with promotion failure handling disabled, every minor GC also triggers a full GC even when the old generation has plenty of free memory, so it is best not to disable it.)
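
The compaction step of mark-compact can be illustrated on an array. The sketch below assumes the mark phase has already run and produced a marked flag per slot; the class MarkCompactSketch and its array-based "heap" are made up for this example and are not how a real collector stores objects.

```java
import java.util.Arrays;

class MarkCompactSketch {
    // heap[i] is an object (here just a name) or null for a free slot;
    // marked[i] says whether the mark phase found the object in slot i reachable.
    // Slides every marked object toward index 0, frees the rest, and returns
    // the index of the first free slot after compaction.
    static int compact(String[] heap, boolean[] marked) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                heap[free++] = heap[i];  // move the live object to the next free slot
            }
        }
        Arrays.fill(heap, free, heap.length, null);  // everything after is free space
        return free;
    }
}
```

For the heap {a, b, free, c, d} with a, c and d marked, the result is {a, c, d, free, free}: live objects form one contiguous block and the free space another, so later allocation needs no free list.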

Method area (permanent generation):

Two kinds of content are collected in the permanent generation: constants in the constant pool and useless class information. Collecting constants is simple: a constant that is no longer referenced can be collected. For a useless class to be collected, three conditions must all hold:

All instances of the class have been collected
The ClassLoader that loaded the class has been collected
The class's Class object is not referenced anywhere (that is, the class cannot be reached through reflection)

Collecting the permanent generation is not mandatory, and a parameter controls whether classes are collected: HotSpot provides -Xnoclassgc for this.
Use -verbose:class, -XX:+TraceClassLoading and -XX:+TraceClassUnloading to view class loading and unloading information:
-verbose:class and -XX:+TraceClassLoading can be used in the product build of HotSpot;
-XX:+TraceClassUnloading requires a fastdebug build of HotSpot.
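
Besides these command-line switches, class loading and unloading counts can also be read from inside a program through the standard java.lang.management API. This is an addition for illustration, not something the original article covers; the wrapper class ClassLoadingStats is a made-up name, while ClassLoadingMXBean and its three getters are part of the JDK.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

class ClassLoadingStats {
    // Returns { currently loaded, total loaded since JVM start, total unloaded }.
    static long[] snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return new long[] {
            bean.getLoadedClassCount(),
            bean.getTotalLoadedClassCount(),
            bean.getUnloadedClassCount()
        };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.println("classes currently loaded: " + s[0]);
        System.out.println("classes loaded in total:  " + s[1]);
        System.out.println("classes unloaded:         " + s[2]);
    }
}
```

A non-zero unloaded count indicates that class unloading, the second kind of collection described above, has actually taken place.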

Garbage collector

The garbage collector plays the central role in the GC mechanism: it is the concrete implementation of GC. The Java Virtual Machine specification says nothing about how a garbage collector should be implemented, so the collectors of different vendors differ. The collectors used by HotSpot in version 1.6 are shown in the following figure (from "Deep Understanding of the Java Virtual Machine: Advanced JVM Features and Best Practices"); a line between two collectors in the figure means they can be used together:

Before introducing the collectors, one thing needs to be clear: the "stop" in the young generation's stop-and-copy algorithm means stop-the-world, that is, the execution of all other threads is suspended while memory is reclaimed. This is inefficient, and the young generation collectors keep getting better at it, but they can only shorten the pause, not eliminate it.

Serial collector: young generation collector using the stop-and-copy algorithm. It uses a single thread for GC, and the other worker threads are paused. -XX:+UseSerialGC selects Serial + Serial Old for memory collection (this is also the default when the virtual machine runs in client mode).
ParNew collector: young generation collector using the stop-and-copy algorithm. It is the multithreaded version of the Serial collector: several threads perform GC while the other worker threads are paused, with the focus on shortening garbage collection time. The -XX:+UseParNewGC switch selects ParNew + Serial Old; -XX:ParallelGCThreads sets the number of threads used for memory collection.
Parallel Scavenge collector: young generation collector using the stop-and-copy algorithm. It focuses on CPU throughput, that is, time spent running user code / total time. For example, if the JVM runs for 100 minutes, of which 99 minutes are user code and 1 minute is garbage collection, the throughput is 99%. This collector makes the most efficient use of the CPU and suits background computation (collectors that focus on shortening pause time, such as CMS, keep waiting times short and therefore suit user interaction, where they improve the user experience). The -XX:+UseParallelGC switch selects Parallel Scavenge + Serial Old (this is also the default in server mode); -XX:GCTimeRatio sets the ratio of user code time to garbage collection time (the default is 99, so 1% of the time is used for garbage collection); -XX:MaxGCPauseMillis sets the maximum pause time for a GC (this parameter applies only to Parallel Scavenge).
Serial Old collector: old generation collector, single-threaded, using the mark-compact algorithm. Here compaction consists of sweep and compact: sweep discards dead objects and keeps only the live ones, and compact moves objects to fill the gaps so that memory ends up in two pieces, one full of objects and one free. It uses a single GC thread and the other worker threads are paused (note that mark-compact collection in the old generation also needs to pause other threads). Up to JDK 1.5, the Serial Old collector was used together with Parallel Scavenge.
Parallel Old collector: old generation collector, multithreaded, with a threading mechanism similar to that of Parallel Scavenge, using the mark-compact algorithm. Unlike Serial Old, compaction here consists of summary and compact: the idea of summary is to copy the live objects into a prepared area rather than clearing dead objects as sweep does. Other threads still have to be paused while Parallel Old runs. Parallel Old is very useful on multi-core machines. Since it appeared in JDK 1.6 it has worked well together with Parallel Scavenge and fully delivers that collector's throughput-first behavior. The -XX:+UseParallelOldGC switch selects the Parallel Scavenge + Parallel Old combination.
CMS (Concurrent Mark Sweep) collector: old generation collector that aims for the shortest collection pause. It uses the mark-sweep algorithm and multiple threads; its advantages are concurrent collection (user threads can work at the same time as the GC threads) and short pauses. -XX:+UseConcMarkSweepGC selects ParNew + CMS + Serial Old for memory collection: ParNew + CMS is used by preference (the reason is given below), and when user threads run short of memory the fallback, Serial Old, takes over.
CMS collects in three marking passes followed by one sweep. The first and third marking passes, initial mark and remark, stop the world. Initial mark marks the objects that the GC roots reference directly, and its pause is very short. Concurrent mark traces references from the GC roots without pausing user threads. Remark handles the marks that changed while user threads kept running during concurrent marking and therefore need to be marked again; its pause is much shorter than the concurrent mark phase but slightly longer than initial mark. After marking is complete, concurrent sweep begins, without pausing user threads.
So during a CMS collection only initial mark and remark need a short pause; concurrent mark and concurrent sweep do not pause user threads. This makes CMS efficient and well suited to highly interactive applications.
CMS also has drawbacks. It consumes extra CPU and memory, so when CPU and memory are tight or there are few CPUs it adds to the system's load (by default CMS starts (number of CPUs + 3) / 4 threads).
In addition, user threads keep running during concurrent collection and keep producing garbage, so "floating garbage" may appear that cannot be cleaned up this time and has to wait for the next full GC. Enough memory must therefore be reserved for user threads during a GC. For this reason CMS does not wait until the old generation is full to start a collection; it starts when usage reaches a threshold (68% by default, roughly 2/3, set with -XX:CMSInitiatingOccupancyFraction). If user threads do not consume much memory, -XX:CMSInitiatingOccupancyFraction can be raised to reduce the number of GCs and improve performance. If the memory reserved for user threads is not enough, a Concurrent Mode Failure occurs and the fallback is triggered: the Serial Old collector performs the collection, with a long pause. So -XX:CMSInitiatingOccupancyFraction should not be set too high.
Finally, because CMS uses the mark-sweep algorithm, it causes memory fragmentation. -XX:+UseCMSCompactAtFullCollection sets whether to defragment after a full GC, and -XX:CMSFullGCsBeforeCompaction sets how many uncompacted full GCs are performed before one with compaction.

G1 collector: officially released in JDK 1.7. Its concepts differ considerably from the current young/old generation model; it is still little used, so it is not covered here.
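
Three of the figures quoted for the collectors above are simple arithmetic: the throughput of Parallel Scavenge, the share of time that -XX:GCTimeRatio leaves for garbage collection, and the default CMS thread count. The sketch below only works through those formulas; the class CollectorMathSketch is made up for this example, and the GCTimeRatio formula assumes the ratio is read as user code time to GC time.

```java
class CollectorMathSketch {
    // Throughput = time running user code / total time.
    static double throughput(double userMinutes, double gcMinutes) {
        return userMinutes / (userMinutes + gcMinutes);
    }

    // With user code time : GC time = gcTimeRatio : 1, GC may take
    // 1 / (1 + gcTimeRatio) of the total time.
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    // Default CMS thread count quoted above: (number of CPUs + 3) / 4,
    // using integer division.
    static int cmsThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    public static void main(String[] args) {
        System.out.println("throughput for 99 min user / 1 min GC: " + throughput(99, 1));
        System.out.println("GC share with GCTimeRatio=99:          " + gcTimeFraction(99));
        System.out.println("CMS threads on 8 CPUs:                 " + cmsThreads(8));
    }
}
```

The CMS formula also shows why few CPUs hurt: on a 2-CPU machine one GC thread takes up half of the available processors while it runs.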

Note the difference between concurrent (Concurrent) and parallel (Parallel):
 Concurrent means that user threads run at the same time as the GC threads (not necessarily in parallel; they may alternate, but overall they make progress together) without being paused (in CMS the user threads do in fact pause, but only very briefly), with the GC threads running on another CPU;
 Parallel collection means that multiple GC threads work in parallel, but the user threads are paused while they do;

Therefore, collectors such as ParNew, Parallel Scavenge and Parallel Old are parallel (the Serial collectors also pause user threads, but use a single GC thread), while the CMS collector is concurrent.
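
To find out which collectors a running JVM is actually using, the standard java.lang.management API can list them. This is an illustrative addition rather than part of the original article; the class CollectorNames is a made-up name, while GarbageCollectorMXBean and its getters are part of the JDK. The names printed depend on the JVM version and on the switches described above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class CollectorNames {
    // Names of the collectors active in this JVM, one per managed memory area group.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Running the program once with -XX:+UseSerialGC and once with another collector switch is a quick way to confirm which combination a given switch selects on your JVM.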

For JVM parameter configuration and memory tuning examples, see my next post (Java series notes (4): JVM monitoring and tuning). I originally wanted to cover it in this post, but there was too much material, so it had to go in a separate one.

Description
This article is the third in my Java series of notes. It took a long time to write, mainly because Java memory and the GC mechanism are fairly complex and hard to understand, and because projects and daily life took up a lot of my time, so progress was slow. Most of the notes come from blogs I found on the Web and from the book "Deep Understanding of the Java Virtual Machine: Advanced JVM Features and Best Practices".
My ability is limited; if you find errors or omissions, please leave a comment.
Resources:
"Thinking in Java", Chapter 5;
"Java Depth Adventures", Java garbage collection mechanism and reference types;
"Deep Understanding of the Java Virtual Machine: Advanced JVM Features and Best Practices", Chapters 2-3;
"Become a Java GC Expert, Part II: How to Monitor Java Garbage Collection", http://www.importnew.com/2057.html
"JDK 5.0 Garbage Collection Optimization: Don't Pause", http://calvin.iteye.com/blog/91905
"[Original] Understanding Java Memory Areas: A First Look", http://iamzhongyong.iteye.com/blog/1333100
Via http://www.cnblogs.com/zhguang/p/3257367.html

Original link: http://www.cnblogs.com/hnrainll/archive/2013/11/06/3410042.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
