Java Series Notes (3): Java Memory Areas and the GC Mechanism

Source: Internet
Author: User

Contents: Overview of Java garbage collection; Java memory area; How Java objects are accessed; Java memory allocation mechanism; Java GC mechanism; Garbage collector

Overview of Java Garbage collection

The Java GC (garbage collection) mechanism is one of the main differences between Java and C/C++. Java developers generally do not need to write memory management and cleanup code, and do not have to be as nervous about memory leaks and overflows as C programmers, because the Java virtual machine (JVM) manages memory and collects garbage automatically. In a nutshell, the mechanism tracks memory in the JVM, determines which memory needs to be reclaimed, and reclaims it automatically according to a collection strategy. It keeps running for as long as the JVM runs, to keep memory available and to guard against leaks and overflows.

A note on the JVM itself: in Sun's JDK, currently the most widely used, HotSpot has been the standard virtual machine since the JDK 1.2 era (it first shipped in 1999) through the widely used JDK 6. Oracle announced its acquisition of Sun in 2009; together with its earlier acquisition of BEA, this gave Oracle two of the three major virtual machines, JRockit and HotSpot. Oracle has stated its intention to merge the two, but in the newly released JDK 7 the default virtual machine is still HotSpot. The virtual machine discussed in this article is therefore HotSpot, and the mechanisms described are mainly those of HotSpot's GC.

The Java GC mechanism does three main things: it decides which memory needs to be reclaimed, decides when to run a GC, and decides how to run it. GC predates Java (Lisp, for example, already had it), and after a long period of development the Java GC mechanism has matured to the point where it does most of the work for us automatically. However, when we develop larger applications that need memory tuning, we have to study how the mechanism works.

Understanding the Java GC mechanism helps us troubleshoot memory overflows and leaks in daily work, remove performance bottlenecks, achieve higher concurrency, and write more efficient programs.

We will look at the Java GC mechanism from four angles: 1, how memory is allocated; 2, how to make sure memory is not reclaimed incorrectly (that is, which memory needs to be reclaimed); 3, under what circumstances GC runs and how it runs; and 4, how to monitor and tune the GC mechanism.

Java Memory Area

To understand the Java GC mechanism, you must first understand how memory is partitioned in the JVM. In the Java runtime data area, the memory managed by the JVM is divided into the following areas:

1, Program counter (Program Counter Register): The program counter is a small memory area that indicates which bytecode instruction the current thread has reached; it can be thought of as a line-number indicator for the current thread. The bytecode interpreter picks the next instruction to execute by changing the value of this counter.

Each program counter records the position of only one thread, so it is thread-private (every thread has its own program counter).

If the thread is executing a Java method, the counter holds the address of the bytecode instruction being executed; if it is executing a native method (typically written in C), the counter's value is undefined. Because the counter only records the current instruction address, it cannot run out of memory, and the program counter is the only JVM memory area for which no OutOfMemoryError is defined.

2, Virtual machine stack (JVM stack): Each time a thread invokes a method, a stack frame (Stack Frame) is created. The frame holds the local variable table, the operand stack, dynamic linking information, the method exit, and so on. The frame is pushed onto the JVM stack when the method is invoked and popped when the method completes.

The local variable table stores the method's local variables, including the primitive data types, object references, return addresses, and so on. In the local variable table, only the long and double types occupy 2 local variable slots (Slot; on a 32-bit machine one slot is 32 bits); all other types occupy 1 slot. Note that the size of the local variable table is fixed at compile time: the space a method needs in its stack frame is fully determined before it runs and does not change during the method's lifetime.

Two exceptions are defined for the virtual machine stack. A StackOverflowError is thrown if the call depth of a thread exceeds the maximum depth the virtual machine allows. Most Java virtual machines, however, allow the stack to grow dynamically (a few use fixed-size stacks), so a thread can keep requesting stack space until memory runs out, at which point an OutOfMemoryError is thrown.

Each thread corresponds to a virtual machine stack, so the virtual machine stack is also thread-private.
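As a minimal sketch of the StackOverflowError case described above, the following program recurses without a base case and counts how many frames were pushed before the error was thrown. The class and method names are made up for illustration, and the depth reached depends on the JVM, the frame size, and the stack size setting, so no particular number should be expected.

```java
final class StackDepthDemo {
    private static int depth = 0;

    // No base case: every call pushes one more stack frame.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many frames were pushed before the JVM threw StackOverflowError.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The thread's stack was exhausted; the frames are unwound by now.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames before StackOverflowError: " + measureDepth());
    }
}
```

Catching StackOverflowError is done here only to observe the limit; production code would normally fix the recursion instead.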

3, Native method stack (Native Method Stack): The native method stack has the same role, operating mechanism, and exception types as the virtual machine stack. The only difference is that the virtual machine stack serves Java methods while the native method stack serves native methods. In many virtual machines (such as HotSpot, the default in Sun's JDK), the native method stack and the virtual machine stack are combined into one.

The native method stack is also thread-private.

4, Heap area (Heap): The heap is the single most important area for understanding the Java GC mechanism. It is the largest piece of memory managed by the JVM and the main area managed by the GC mechanism. It is shared by all threads and created when the virtual machine starts. The heap exists to store object instances: in principle every object is allocated on the heap (although with modern techniques this is no longer absolute, and some objects can be allocated directly on the stack).

According to the Java Virtual Machine specification, heap memory must be logically contiguous (it need not be physically contiguous). An implementation may make it fixed-size or expandable, and the current mainstream virtual machines make it expandable. If there is still not enough memory to allocate, and the heap cannot be expanded further, after a garbage collection has run, an OutOfMemoryError: Java heap space is thrown.
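To see the expandable heap from inside a program, the standard Runtime API reports the current and maximum heap sizes. This is only an illustrative sketch (the class name is hypothetical); the values printed depend on the machine and on flags such as -Xms and -Xmx.

```java
final class HeapInfo {
    // Returns {free, total, max} heap sizes in bytes as reported by the running JVM.
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.freeMemory(), rt.totalMemory(), rt.maxMemory() };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.println("free  = " + s[0] + " bytes");
        System.out.println("total = " + s[1] + " bytes (currently reserved heap)");
        System.out.println("max   = " + s[2] + " bytes (limit the heap may grow to)");
    }
}
```

When total is smaller than max, the heap still has room to expand before an OutOfMemoryError becomes possible.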

There is much more to the heap area, which is described in detail in the next section, "Java Memory allocation mechanism."

5, Method area: The Java Virtual Machine specification treats the method area as a logical part of the heap, but in practice the method area is not part of the heap (it is "non-heap"). Many blog posts describe Java's generational collection as having 3 generations: the young generation, the old generation, and the permanent generation. These authors call the method area the "permanent generation" because earlier implementations of the HotSpot virtual machine extended the idea of generational collection to the method area and designed it as a permanent generation. Most virtual machines other than HotSpot do not treat the method area as a permanent generation, and HotSpot itself plans to remove the permanent generation. Because the author mainly uses Oracle JDK 6, this article still uses the term permanent generation.

The method area is shared by all threads. It stores information about classes that the virtual machine has loaded (the information needed when a class is loaded, including version, fields, methods, interfaces, and so on), final constants, static variables, code produced by the just-in-time compiler, and so on.

The method area need not be physically contiguous and may be fixed-size or expandable. Compared with the heap, an implementation has one more option: it may choose whether to garbage-collect the method area at all. In general very little garbage collection happens in the method area, which is one of the reasons HotSpot calls it the permanent generation. That does not mean no collection happens there: collection in the method area mainly reclaims entries in the constant pool and unloads classes that have been loaded.

The conditions for garbage collection in the method area are strict, the work is difficult, and the results are often unsatisfying, so we will not dwell on it here; it can be left for later, more in-depth study.

An OutOfMemoryError: PermGen space is defined for the method area and is thrown when it runs out of memory.

The runtime constant pool (Runtime Constant Pool) is part of the method area. It stores the literals and symbolic references generated at compile time, together with the direct references they are translated into. A symbolic reference is a string that encodes the location of a variable or an interface; a direct reference is the address obtained by translating a symbolic reference, and the translation is completed during the class linking stage. Besides compile-time constants, the runtime constant pool can also hold constants generated at run time. For example, the intern() method of the String class works against a pool of strings: if the string "ABC" is already in the pool, the address of the pooled string is returned; otherwise the string is added to the pool and that address is returned.
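The intern() behaviour described above can be observed with reference comparison. The sketch below uses a hypothetical class name; it relies only on the documented contract of String.intern() and on the fact that string literals are interned.

```java
final class InternDemo {
    // True when intern() hands back the pooled instance rather than the heap copy.
    static boolean internReturnsPooledInstance() {
        String literal = "ABC";            // compile-time constant, already in the pool
        String onHeap = new String("ABC"); // a distinct object with the same characters
        return onHeap != literal && onHeap.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println("intern() returned the pooled string: "
                + internReturnsPooledInstance());
    }
}
```

The comparison uses == on purpose: it checks identity (the same address), which is what the paragraph above is about, not character equality.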

6, Direct memory (Direct Memory): Direct memory is not managed by the JVM; it can be understood as machine memory outside the JVM. For example, if the machine has 4 GB of memory and the JVM occupies 1 GB, the remaining 3 GB is available as direct memory. The JDK provides an allocation method based on channels (Channel) and buffers (Buffer): a native library written in C allocates the memory outside the heap, and a DirectByteBuffer object stored in the JVM heap refers to it. Because direct memory is limited by the machine's physical memory, an OutOfMemoryError can occur here too.
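A short sketch of direct memory allocation through the standard NIO API follows. The class name is made up for illustration; ByteBuffer.allocateDirect and isDirect are part of java.nio.

```java
import java.nio.ByteBuffer;

final class DirectBufferDemo {
    // Allocates the requested number of bytes outside the Java heap.
    static ByteBuffer allocate(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocate(1024);
        buf.putInt(0, 42); // written into native memory, not into a heap array
        System.out.println("direct = " + buf.isDirect() + ", value = " + buf.getInt(0));
    }
}
```

Only the small buffer object lives on the heap; the 1024 bytes it refers to are released after that object itself has been garbage-collected.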

How Java objects are accessed

In general, a Java reference access involves 3 areas of memory: The JVM stack, the heap, and the method area.

Take the simplest local variable reference, Object obj = new Object(), as an example. Object obj is a local reference: it is stored in the local variable table of the JVM stack as reference-type data. new Object() is the instance data and is stored in the heap. The heap also records the address of the object's type information (interfaces, methods, fields, object type, and so on), and the data those addresses point to is stored in the method area.

The Java Virtual Machine specification does not dictate how a reference locates the actual object; implementations mainly use one of two approaches:

1, Access via handle (figure from "Understanding the Java Virtual Machine: JVM Advanced Features and Best Practices"):

With handle access, a region of the JVM heap is set aside as a handle pool. The reference stores the address of a handle, and the handle stores the addresses the object is made of: the address of the instance data in the heap and the address of the type data in the method area. This approach is stable because the reference always holds the handle's address, even if the object itself moves.

2, Access via direct pointer (figure from "Understanding the Java Virtual Machine: JVM Advanced Features and Best Practices"):

With direct pointer access, the reference stores the actual address of the object in the heap, and the object data in the heap contains the address of the corresponding type data in the method area. The biggest advantage of this approach is speed, and it is the approach used by the HotSpot virtual machine.
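The trade-off between the two approaches can be illustrated with a toy model. Everything here is an assumption made for illustration: the "heap" is a plain array whose indices stand in for addresses, and the handle pool is a second array. It is not how any real JVM lays out memory.

```java
final class HandleAccessSketch {
    // Simulated heap: index = address, value = object payload.
    final String[] heap = new String[8];
    // Simulated handle pool: index = handle, value = current heap address.
    final int[] handlePool = new int[8];

    // Handle access: two lookups (handle -> address -> object).
    String viaHandle(int handle) {
        return heap[handlePool[handle]];
    }

    // Direct pointer access: one lookup (address -> object).
    String viaPointer(int address) {
        return heap[address];
    }

    // Moving an object only rewrites the handle; references holding the handle stay valid.
    void move(int handle, int newAddress) {
        int oldAddress = handlePool[handle];
        heap[newAddress] = heap[oldAddress];
        heap[oldAddress] = null;
        handlePool[handle] = newAddress;
    }

    public static void main(String[] args) {
        HandleAccessSketch s = new HandleAccessSketch();
        s.heap[3] = "obj";
        s.handlePool[0] = 3;
        s.move(0, 5);
        System.out.println("via handle after move:    " + s.viaHandle(0));
        System.out.println("via stale direct pointer: " + s.viaPointer(3));
    }
}
```

In this model a direct pointer saves one lookup on every access, but after a move every stored address would have to be updated, whereas the handle approach updates a single pool entry.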

Java memory allocation mechanism

The memory allocation discussed here is mainly allocation on the heap. In general, objects are allocated on the heap, but modern techniques can also break an object into scalar types (a scalar type is an atomic type that represents a single value, such as a primitive type or a string) and allocate those on the stack. Stack allocation is rare, so we do not consider it here.

Java's memory allocation and collection mechanism can be summarized as generational allocation and generational collection. Objects are divided by how long they survive into the young generation (Young Generation), the old generation (Old Generation), and the permanent generation (Permanent Generation, that is, the method area). See the figure from "Becoming a Java GC Expert, Part I", http://www.importnew.com/1993.html:

    

Young generation (Young Generation): When an object is created, its memory is first allocated in the young generation (large objects can be created directly in the old generation). Most objects are no longer used soon after creation, quickly become unreachable, and are then cleaned up by the young generation's GC (IBM research shows that 98% of objects die young). This GC is called a minor GC or young GC. Note that a minor GC does not mean the young generation as a whole is out of memory; it only indicates a GC on the Eden area.

The young generation is divided into 3 regions: the Eden area (named after the Garden of Eden, where Adam and Eve ate the forbidden fruit), which is where memory is first allocated, and two survivor areas (Survivor 0 and Survivor 1). The memory allocation process is as follows (from "Becoming a Java GC Expert, Part I", http://www.importnew.com/1993.html):

Most newly created objects are allocated in the Eden area, and most of them die soon. Eden is a contiguous memory space, so allocating memory in it is extremely fast. When Eden fills up for the first time, a minor GC runs: dead objects are cleaned up and the surviving objects are copied into survivor area Survivor0 (Survivor1 is empty at this point; one of the two survivor areas is always empty). The next time Eden fills up, another minor GC runs: dead objects in Eden and Survivor0 are discarded, objects that qualify for promotion are moved to the old generation, the remaining survivors from Eden and Survivor0 are copied into Survivor1, and Eden and Survivor0 are then emptied. After an object has been switched between the two survivor areas a number of times (15 by default in the HotSpot virtual machine, controlled by -XX:MaxTenuringThreshold; this is only a maximum, and objects are not necessarily kept until they reach it), it is copied into the old generation. Only a small number of objects get that far, for example some of the objects we define ourselves.
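The allocation and promotion steps above can be sketched as a toy simulation. This is a simplified model under stated assumptions, not HotSpot's implementation: liveness is a flag set by hand instead of being computed from reachability, the areas have no size limits, and the promotion rule (promote once the age has reached the threshold) is an approximation chosen for the sketch. All class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MinorGcSketch {
    static final class Obj {
        boolean live = true; // cleared by hand to simulate an unreachable object
        int age = 0;         // number of minor GCs survived so far
    }

    final int tenuringThreshold;
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // survivor area currently holding objects
    List<Obj> to = new ArrayList<>();   // survivor area kept empty between collections
    final List<Obj> old = new ArrayList<>();

    MinorGcSketch(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    // New objects always start in Eden in this model.
    Obj allocate() {
        Obj o = new Obj();
        eden.add(o);
        return o;
    }

    // Copies survivors out of Eden and the occupied survivor area, then swaps roles.
    void minorGc() {
        evacuate(eden);
        evacuate(from);
        eden.clear();
        from.clear();
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    private void evacuate(List<Obj> space) {
        for (Obj o : space) {
            if (!o.live) {
                continue;      // dead objects are simply not copied
            }
            if (o.age >= tenuringThreshold) {
                old.add(o);    // old enough: promote to the old generation
            } else {
                o.age++;
                to.add(o);     // otherwise copy into the empty survivor area
            }
        }
    }

    public static void main(String[] args) {
        MinorGcSketch heap = new MinorGcSketch(2);
        heap.allocate();
        heap.allocate().live = false;
        for (int i = 1; i <= 3; i++) {
            heap.minorGc();
            System.out.println("after GC " + i + ": survivor=" + heap.from.size()
                    + " old=" + heap.old.size());
        }
    }
}
```

Note that after every collection one survivor list is empty, matching the invariant stated in the text.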

As the process above shows, Eden is a contiguous space and one survivor area is always empty. After a GC and copy, one survivor area holds the objects that are currently alive, while the contents of Eden and the other survivor area are no longer needed and can simply be wiped. At the next GC the two survivor areas swap roles. This makes allocating and cleaning up memory very efficient. This style of collection is the well-known "stop-and-copy" algorithm (the surviving objects in Eden and one survivor area are copied to the other survivor area). That does not mean stop-and-copy is efficient in general; it is efficient only in this situation. Using stop-and-copy for the old generation would be disastrous.

In the Eden area, the HotSpot virtual machine uses two techniques to speed up memory allocation: bump-the-pointer and TLAB (Thread-Local Allocation Buffers). Because Eden is contiguous, the core of bump-the-pointer is to track the last object created: when a new object is created, the VM only needs to check whether enough memory remains after the last object, which greatly speeds up allocation. TLAB addresses multithreading: Eden is divided into several segments and each thread allocates from its own segment, so threads do not interfere with each other. Combining TLAB with bump-the-pointer lets each thread use its own part of Eden and allocate memory quickly.
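The two techniques can be illustrated with a small model in which addresses are plain integers. This is a sketch under assumptions made for illustration: equal-sized per-thread buffers carved up front and a return value of -1 when a buffer is full are choices of this example, not HotSpot behaviour.

```java
final class BumpPointerSketch {
    // A contiguous region [start, end) with a pointer to the first free byte.
    static final class Region {
        private final int end;
        private int top;

        Region(int start, int end) {
            this.top = start;
            this.end = end;
        }

        // Bump-the-pointer: one bounds check and one addition per allocation.
        // Returns the start address of the new object, or -1 if it does not fit.
        int allocate(int size) {
            if (size <= 0 || end - top < size) {
                return -1;
            }
            int address = top;
            top += size;
            return address;
        }
    }

    // TLAB idea: give every thread its own slice of Eden so no locking is needed.
    static Region[] carveTlabs(int edenSize, int threads) {
        Region[] tlabs = new Region[threads];
        int chunk = edenSize / threads;
        for (int i = 0; i < threads; i++) {
            tlabs[i] = new Region(i * chunk, (i + 1) * chunk);
        }
        return tlabs;
    }

    public static void main(String[] args) {
        Region[] tlabs = carveTlabs(1000, 2);
        System.out.println("thread 0, first object at  " + tlabs[0].allocate(100));
        System.out.println("thread 0, second object at " + tlabs[0].allocate(100));
        System.out.println("thread 1, first object at  " + tlabs[1].allocate(50));
    }
}
```

Because each Region is touched by a single thread in this model, allocate needs no synchronization, which is the point of combining the two techniques.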

Old generation (Old Generation): If an object survives long enough in the young generation without being cleaned up (that is, it survives several young GCs), it is copied to the old generation. The old generation is generally larger than the young generation and can hold more objects, and GC happens less often there than in the young generation. When the old generation runs out of memory, a major GC, also called a full GC, is executed. The -XX:+UseAdaptiveSizePolicy switch controls whether an adaptive policy is used; when it is on, the sizes of the areas in the Java heap and the age at which objects enter the old generation are adjusted dynamically.

If an object is large (such as a long string or a large array) and the young generation does not have enough space, the object is allocated directly in the old generation. Large objects may trigger GC early, so use them sparingly, and avoid short-lived large objects in particular. -XX:PretenureSizeThreshold controls the size above which objects go straight into the old generation: any object larger than this value is allocated directly in the old generation.

Objects in the old generation may hold references to objects in the young generation. If a young GC had to scan the entire old generation to determine whether a young object can be cleaned up, it would be very inefficient. The solution is to maintain a structure in the old generation called the "card table", which records every place where an old-generation object references a young-generation object. A young GC then only needs to check the card table instead of the whole old generation, which improves performance greatly.
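A card table can be sketched as a byte array with one entry per fixed-size chunk of the old generation. The sketch below is a toy under assumptions: the 512-byte card size, the integer offsets standing in for addresses, and the explicit markDirty call (a real VM does this in a write barrier) are all choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class CardTableSketch {
    // Bytes of old generation covered by one card (an assumption for this sketch).
    static final int CARD_SIZE = 512;

    private final byte[] cards;

    CardTableSketch(int oldGenBytes) {
        cards = new byte[(oldGenBytes + CARD_SIZE - 1) / CARD_SIZE];
    }

    // Called when a field at this old-generation offset is set to point at a young object.
    void markDirty(int oldGenOffset) {
        cards[oldGenOffset / CARD_SIZE] = 1;
    }

    // A young GC scans only the chunks behind these card indices, not the whole old generation.
    List<Integer> dirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] != 0) {
                dirty.add(i);
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096);
        table.markDirty(0);
        table.markDirty(600);
        table.markDirty(700);
        System.out.println("cards to scan: " + table.dirtyCards());
    }
}
```

With a 4096-byte old generation this model has 8 cards, and a young GC would scan only the chunks whose cards were marked.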

Java GC Mechanism

The basic algorithm of the GC mechanism is generational collection, which needs no repeating here. The collection method used for each generation is described below.

  

Young generation:

The previous section already introduced the main collection method for the young generation. The young generation is cleaned with the "stop-and-copy" algorithm. Its memory is divided into 2 parts: a larger Eden area and a smaller survivor part, which is itself split into two equal halves. On each collection, the surviving objects in Eden and one survivor area are copied to the other survivor area, and then Eden and the first survivor area are cleared.

Note that in this version of the stop-and-copy algorithm the two sides of the copy are not equal in size. The traditional stop-and-copy algorithm splits memory into two equal halves; the young generation instead uses 1 large Eden area and 2 small survivor areas to avoid the cost of that layout.

Because most objects are short-lived and never even reach a survivor area, the ratio of Eden to a survivor area is large. The HotSpot default is 8:1, so Eden and the two survivor areas take 80%, 10%, and 10% of the young generation respectively. If the objects that survive in Eden plus one survivor area exceed that 10%, some of them have to be placed in the old generation. The -XX:SurvivorRatio parameter configures the ratio of Eden to one survivor area; the default of 8 means Eden:Survivor1:Survivor2 = 8:1:1.
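The 8:1:1 arithmetic above can be written out as a small helper. This is only the plain division implied by the ratio; a real JVM also aligns and adjusts region sizes, so actual figures may differ slightly. The class and method names are hypothetical.

```java
final class SurvivorRatioSketch {
    // Splits a young generation of the given size using Eden:Survivor = ratio:1,
    // with two survivor areas. Returns {edenBytes, bytesPerSurvivor}.
    static long[] split(long youngBytes, int survivorRatio) {
        long survivor = youngBytes / (survivorRatio + 2);
        long eden = youngBytes - 2 * survivor;
        return new long[] { eden, survivor };
    }

    public static void main(String[] args) {
        long young = 100L * 1024 * 1024; // assume a 100 MB young generation
        long[] parts = split(young, 8);  // 8 mirrors the default -XX:SurvivorRatio
        System.out.println("Eden:          " + parts[0] / (1024 * 1024) + " MB");
        System.out.println("Each survivor: " + parts[1] / (1024 * 1024) + " MB");
    }
}
```

Since one survivor area is always empty, only Eden plus one survivor area (90% with the default ratio) is usable for objects at any moment.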

Old generation: The old generation stores far more objects than the young generation, including large ones, so cleaning it with the stop-and-copy algorithm would be very inefficient. The old generation generally uses the mark-compact algorithm: surviving objects (those still referenced) are marked, and all surviving objects are then moved to one end so that the free memory stays contiguous.

When a minor GC occurs, the virtual machine checks whether the average size of the objects promoted to the old generation in previous collections is larger than the space remaining in the old generation. If it is, a full GC is triggered directly. Otherwise the VM checks whether -XX:+HandlePromotionFailure is set (allow the guarantee to fail). If it is allowed, only the minor GC runs and a possible allocation failure is tolerated; if it is not allowed, a full GC is still performed. In other words, when guarantee failure is not allowed, a minor GC also triggers a full GC even if the old generation still has plenty of memory, so it is best not to configure it that way.
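The compaction step of mark-compact can be sketched on an array that stands in for the old generation. The marking phase is assumed to have run already and is passed in as a boolean array; a real collector would also update every reference to a moved object, which this toy omits. Names are hypothetical.

```java
final class MarkCompactSketch {
    // Slides every marked (live) entry toward index 0, keeping the original order,
    // and clears the remaining slots. Returns the number of live entries.
    static int compact(Object[] heap, boolean[] marked) {
        int free = 0; // next slot to fill at the compacted end
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                heap[free] = heap[i];
                free++;
            }
        }
        for (int i = free; i < heap.length; i++) {
            heap[i] = null; // everything past the live block is now free space
        }
        return free;
    }

    public static void main(String[] args) {
        Object[] heap = { "a", "b", null, "c" };
        boolean[] marked = { true, false, false, true };
        int live = compact(heap, marked);
        System.out.println("live objects: " + live
                + ", layout: " + java.util.Arrays.toString(heap));
    }
}
```

After compaction the live objects form one block at the start and the free space forms one block at the end, which is what keeps later allocation simple.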

Method Area (permanent generation):

Two kinds of things are collected in the permanent generation: constants in the constant pool and useless class information. Collecting constants is simple: a constant can be reclaimed once nothing references it. For a useless class to be collected, 3 conditions must hold:

1, all instances of the class have been collected;
2, the ClassLoader that loaded the class has been collected;
3, the Class object of the class is not referenced anywhere (that is, the class cannot be reached through reflection).

Collecting the permanent generation is not mandatory, and parameters control whether classes are collected. HotSpot provides -Xnoclassgc for this. -verbose:class, -XX:+TraceClassLoading, and -XX:+TraceClassUnloading show class loading and unloading information. -verbose:class and -XX:+TraceClassLoading can be used with the product build of HotSpot; -XX:+TraceClassUnloading requires a fastdebug build of HotSpot.

Garbage collector

The garbage collector plays the central role in the GC mechanism: it is the concrete implementation of GC. The Java Virtual Machine specification makes no stipulations about garbage collectors, so the collectors implemented by different vendors differ. The collectors used in HotSpot 1.6 are shown in the figure from "Understanding the Java Virtual Machine: JVM Advanced Features and Best Practices"; in that figure, a line between two collectors indicates that they can be used together.
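Since the collectors differ between vendors, versions, and flags, it can help to ask the running JVM which ones it is actually using. The sketch below uses the standard java.lang.management API; the class name is made up, and the collector names printed vary from one JVM and configuration to another.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class CollectorNames {
    // Lists the collectors registered in this JVM with their collection counts so far.
    static List<String> activeCollectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName() + " (collections so far: " + gc.getCollectionCount() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String line : activeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Running it under different collector switches is a quick way to confirm which young and old generation collectors a given flag pairs together.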

  

  

Before introducing the collectors, one point must be clear: the "stop" in the young generation's stop-and-copy algorithm means stop-the-world, that is, all other threads are suspended while memory is reclaimed. This is inefficient, and the young generation collectors keep being optimized for it, but so far they only shorten the pause; they cannot remove it entirely.

Serial collector: a young generation collector that uses the stop-and-copy algorithm. It performs GC with a single thread, serially, while the other worker threads are suspended. With -XX:+UseSerialGC, memory is collected with the Serial + Serial Old combination (this is also the default for a virtual machine running in client mode).

ParNew collector: a young generation collector that uses the stop-and-copy algorithm. It is the multithreaded version of the Serial collector: several threads perform GC in parallel while the other worker threads are suspended, with the focus on shortening garbage collection time. The -XX:+UseParNewGC switch selects the ParNew + Serial Old combination; -XX:ParallelGCThreads sets the number of threads that perform the collection.

Parallel Scavenge collector: a young generation collector that uses the stop-and-copy algorithm. Its focus is CPU throughput, that is, time spent running user code divided by total time. For example, if the JVM runs for 100 minutes, of which 99 minutes run user code and 1 minute is garbage collection, the throughput is 99%. This collector makes the most efficient use of the CPU and suits background computation. (Collectors that focus on shortening collection pauses, such as CMS, keep waiting time very short, so they suit user interaction and improve the user experience.)
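The throughput figure in the Parallel Scavenge example is simple arithmetic, sketched below. The second helper relates it to HotSpot's -XX:GCTimeRatio flag, whose documented meaning is a target of at most 1/(1 + n) of total time spent in GC; the class and method names are hypothetical.

```java
final class ThroughputSketch {
    // Throughput = time running user code / (user code time + GC time).
    static double throughput(double userMinutes, double gcMinutes) {
        return userMinutes / (userMinutes + gcMinutes);
    }

    // Share of total time that GC may take for a given GCTimeRatio value n: 1 / (1 + n).
    static double maxGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.println("throughput for 99 min user / 1 min GC: " + throughput(99, 1));
        System.out.println("GC share for GCTimeRatio=99: " + maxGcFraction(99));
        System.out.println("GC share for GCTimeRatio=19: " + maxGcFraction(19));
    }
}
```

A ratio of 99 therefore corresponds to the 99% throughput example, while a ratio of 19 would accept up to 5% of time in GC.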
The -XX:+UseParallelGC switch selects the Parallel Scavenge + Serial Old combination (this is also the default in server mode). -XX:GCTimeRatio sets the ratio of user code time to GC time; with the default of 99, at most 1% of the time is used for garbage collection. -XX:MaxGCPauseMillis sets the maximum GC pause time (this parameter is valid only for Parallel Scavenge). The -XX:+UseAdaptiveSizePolicy switch enables dynamic control, such as automatic adjustment of the Eden/Survivor ratio, the age at which objects enter the old generation, and the young generation size; this parameter is not available with ParNew.

Serial Old collector: an old generation collector; single-threaded and serial. It uses the mark-compact algorithm, where compaction here consists of sweep and compact: sweep removes the dead objects and leaves only the survivors; compact moves the objects to fill the gaps, so that memory ends up in 2 blocks, one holding all the objects and one free. GC runs on a single thread while the other worker threads are suspended (note that cleaning the old generation with a mark-based algorithm also requires suspending the other threads). Before JDK 1.5, Serial Old was used together with Parallel Scavenge.

Parallel Old collector: an old generation collector; multithreaded and parallel, with a multithreading mechanism similar to that of Parallel Scavenge. It uses mark-compact, but unlike Serial Old the compaction here consists of summary and compact: the idea of the summary step is to copy the surviving objects into a prepared area, rather than to clear dead objects as sweep does. Other threads still have to be suspended while Parallel Old runs. Parallel Old is very useful on multi-core machines. Since its appearance in JDK 1.6 it has worked well together with Parallel Scavenge and fully delivers the throughput-first behavior of the Parallel Scavenge collector.
The -XX:+UseParallelOldGC switch selects the Parallel Scavenge + Parallel Old combination.

CMS (Concurrent Mark Sweep) collector: an old generation collector that aims for the shortest collection pause. It uses the mark-sweep algorithm and multiple threads. Its advantages are concurrent collection (user threads can work at the same time as the GC threads) and short pauses. -XX:+UseConcMarkSweepGC selects ParNew + CMS + Serial Old for memory collection: ParNew + CMS is preferred (the reason is given later), and when the user threads run short of memory, the fallback Serial Old collector is used.

A CMS collection proceeds as follows: initial mark (CMS-initial-mark) -> concurrent mark (CMS-concurrent-mark) -> preclean (CMS-concurrent-preclean) -> abortable preclean (CMS-concurrent-abortable-preclean) -> remark (CMS-remark) -> concurrent sweep (CMS-concurrent-sweep) -> concurrent reset, which waits for the next CMS trigger (CMS-concurrent-reset). In short: two marking phases, then precleaning, one remark, and finally one sweep.
1, First, the JVM decides when to start garbage collection based on -XX:CMSInitiatingOccupancyFraction and -XX:+UseCMSInitiatingOccupancyOnly.

2, If -XX:+UseCMSInitiatingOccupancyOnly is set, a CMS GC is triggered only when the occupancy of the old generation reaches the proportion set by -XX:CMSInitiatingOccupancyFraction.

3, If -XX:+UseCMSInitiatingOccupancyOnly is not set, the system decides when to trigger a CMS GC based on its own statistics. This is why a CMS GC configured for 80% is sometimes seen to trigger at 50%: the parameter was not set.

4, When a CMS GC starts, the first phase is the initial mark (CMS-initial-mark). It is a stop-the-world phase, so it marks only the objects directly reachable from the root set. In a log line such as CMS-initial-mark: 961330K(1572864K), the two numbers are the used space and the total space of the old generation at that moment.

5, The next phase is the concurrent mark (CMS-concurrent-mark), which runs concurrently with the application threads; this is what "concurrent collector" refers to. Its main job is to mark objects, and it does not require the user threads to pause.
This phase prints 2 log entries: CMS-concurrent-mark-start and CMS-concurrent-mark.

6, The next phase is CMS-concurrent-preclean, which mainly does preparatory work. Because marking runs concurrently with the application threads, the state of some objects changes after they have been marked, and this phase deals with that. The later rescan (remark) phase is also stop-the-world, so to keep that pause as short as possible, the preclean phase does part of its work in advance. This phase prints 2 log entries: CMS-concurrent-preclean-start and CMS-concurrent-preclean.

7, The next phase is CMS-concurrent-abortable-preclean. Its purpose is to make the CMS GC more controllable and to do further precleaning, reducing the time the application is paused in the rescan phase. This phase involves several parameters:

-XX:CMSMaxAbortablePrecleanTime: the abortable-preclean phase ends once it has run for this long.
-XX:CMSScheduleRemarkEdenSizeThreshold (default 2m): controls when the abortable-preclean phase starts, that is, the phase begins once Eden usage reaches this value.
-XX:CMSScheduleRemarkEdenPenetration (default 50%): controls when the abortable-preclean phase ends.
