The lifecycle, architecture, memory management, and garbage collection mechanisms of the JVM

Source: Internet
Author: User

First, the life cycle of the JVM

JVM instance: corresponds to one standalone Java program and exists at the process level.

JVM execution engine: corresponds to a thread that runs the user's program; it exists at the thread level and is part of the JVM instance.

1. The birth of the JVM instance

When a Java program is started, a JVM instance is born. Any class that declares a public static void main(String[] args) method can serve as the starting point of a JVM instance.

2. Running the JVM instance

main() is the starting point of the program's initial thread; every other thread is started from it.

The JVM has two kinds of threads: daemon threads and non-daemon threads. Daemon threads are typically used by the JVM itself. The thread that runs main() is a non-daemon thread.

3. The death of the JVM instance

The JVM exits when all non-daemon threads in the program have terminated. If the security manager allows it, the program can also exit by calling the exit method of the Runtime class or System.exit().
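The relationship between daemon threads and JVM exit can be sketched in a few lines of Java. This is a minimal illustration, not part of the original article; the class name DaemonThreadDemo and the thread name "background-worker" are made up for the example.

```java
class DaemonThreadDemo {
    /** Reports whether the calling thread is a daemon thread. */
    static boolean currentIsDaemon() {
        return Thread.currentThread().isDaemon();
    }

    /** Creates (but does not start) a thread that is marked as a daemon. */
    static Thread newDaemon(Runnable task) {
        Thread t = new Thread(task, "background-worker"); // hypothetical name
        t.setDaemon(true); // must be called before start()
        return t;
    }

    public static void main(String[] args) {
        System.out.println("main is daemon: " + currentIsDaemon());

        Thread worker = newDaemon(() -> {
            while (true) {
                try {
                    Thread.sleep(1_000);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        worker.start();
        // main returns here. The only thread still running is a daemon,
        // so the JVM exits even though the worker loop never finishes.
    }
}
```

If setDaemon(true) is removed, the worker becomes a non-daemon thread and the JVM keeps running after main returns.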

Second, the JVM architecture

1. Class loader (ClassLoader)

The role of the ClassLoader is to load class files into memory. The ClassLoader only loads: as long as a file conforms to the class file structure defined in the JVM specification, it is loaded. Whether the class can actually run is not the loader's concern; that is the responsibility of the execution engine.
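As a small illustration, a running program can ask which loader loaded a given class. The sketch below is an assumption-laden example rather than anything from the original article: the class name ClassLoaderDemo is invented, and it relies on the common HotSpot behavior of reporting the bootstrap loader as null.

```java
class ClassLoaderDemo {
    /** Names the loader of a class; a null loader is treated as the bootstrap loader. */
    static String loaderOf(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Core library classes are loaded by the bootstrap loader.
        System.out.println("String -> " + loaderOf(String.class));
        // Application classes are loaded by an application-level loader.
        System.out.println("ClassLoaderDemo -> " + loaderOf(ClassLoaderDemo.class));
        // Loading a class by name goes through the same loading mechanism.
        Class<?> list = Class.forName("java.util.ArrayList");
        System.out.println(list.getName() + " -> " + loaderOf(list));
    }
}
```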

2. Execution engine (Execution Engine)

The execution engine, also called the interpreter, is responsible for interpreting instructions and submitting them to the operating system for execution.

3. Native interface (Native Interface)

The role of the native interface is to integrate other programming languages with Java; it was originally intended for integrating C/C++ programs. In practice, native methods are registered in the native method stack, and the native libraries are loaded when the execution engine executes them.

4. Runtime data area (Runtime Data Area)

The runtime data area is the focus of the entire JVM. All the programs we write are loaded here before they start to run.

Taken as a whole, the JVM works like this: the class loader loads files, the execution engine then processes the data in memory, and any interaction with heterogeneous systems goes through the native interface.

Third, the memory management of the JVM

1. Method area (Method Area), also known as the permanent generation (Permanent Generation)

The method area is shared by all threads. It holds all field and method bytecode, as well as special methods such as constructors; interface code is also defined here.

The method area contains the structural information of each class, including the constant pool, field descriptions, method descriptions, and so on.

The VM specification places very loose restrictions on this area: like the Java heap, it does not require contiguous memory, it may be fixed-size or expandable, and an implementation may even choose not to garbage-collect it. Garbage collection is relatively rare in this area, but it is not true, as some descriptions claim, that GC never occurs in the permanent generation. GC here mainly targets reclaiming the constant pool and unloading classes, although the results are generally unsatisfying; class unloading in particular has very strict conditions.

The runtime constant pool is also stored in the method area. Besides the class version, fields, methods, interfaces, and similar descriptive information, a class file contains a constant pool table (constant_pool table) that holds the constants known at compile time; this content is loaded into the method area (permanent generation) when the class is loaded. However, the Java language does not require that constants be produced only at compile time: content that was not in the class file's constant pool table can also enter the constant pool, and new content can be added at run time (the most typical example is the String.intern() method). The runtime constant pool is part of the method area and is therefore limited by the method area's memory; when it can no longer obtain memory, an OutOfMemoryError is thrown.
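The run-time behavior of String.intern() mentioned above can be observed directly. The following is a minimal sketch; the class and method names (InternDemo, sameObject) are invented for the illustration, and it makes no claim about where a particular JVM version physically stores the pool.

```java
class InternDemo {
    /** True only when both references point at the very same String object. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // A literal is a compile-time constant taken from the class file's constant pool.
        String literal = "jvm";
        // This string is assembled at run time, so it starts life as a separate object.
        String built = new StringBuilder("jv").append("m").toString();

        System.out.println(sameObject(literal, built));          // false
        // intern() returns the pooled instance with the same characters.
        System.out.println(sameObject(literal, built.intern())); // true
    }
}
```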

2. PC register (program counter)

Each thread has its own program counter, a pointer into the method bytecode in the method area from which the execution engine reads the next instruction. The counter records which instruction of the current method the thread has reached. For a non-native method, this area holds the address of the VM instruction being executed; if a native method is being executed, the value is empty (undefined). This is the only memory area for which the VM specification defines no OutOfMemoryError condition.

3. Native method stack (Native Method Stack)

The native method stack serves the native methods used by the virtual machine. The language, manner, and data structure of its implementation are not mandated, and some virtual machines (such as the Sun HotSpot VM) even combine the native method stack and the JVM stack into one. This area can also throw StackOverflowError and OutOfMemoryError.

4. Stack, also called stack memory

The VM stack has the same life cycle as its thread. It describes the memory model of Java method execution: each time a method is executed, a frame is created to store the local variable table, the operand stack, dynamic linking, the method exit, and so on. Each method's journey from invocation to completion corresponds to a frame being pushed onto and popped off the VM stack.

The stack is the JVM's memory instruction area. It is created when the thread is created and lives exactly as long as the thread; when the thread ends, the stack memory is released. There is no garbage collection problem for the stack: once the thread ends, the stack is gone. The question is: what data is stored in the stack, and in what format?

Data in the stack is held in stack frames. A stack frame is a memory block, a data set about a method and its run-time data. When method A is called, a stack frame F1 is created and pushed onto the stack; when A calls method B, a stack frame F2 is created and pushed as well. After execution, F2 is popped first and then F1, following the "first in, last out" principle.
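The order of frames described here can be inspected from inside a running program. The sketch below is only an illustration (the class name StackFrameDemo is invented); it uses the stack trace captured by a Throwable, whose first element is the most recently pushed frame.

```java
class StackFrameDemo {
    /** Method A: its frame (F1) is pushed first. */
    static String[] a() {
        return b();
    }

    /** Method B: its frame (F2) sits on top of A's frame while B runs. */
    static String[] b() {
        StackTraceElement[] frames = new Throwable().getStackTrace();
        // frames[0] is the innermost frame, frames[1] is the frame of its caller.
        return new String[] { frames[0].getMethodName(), frames[1].getMethodName() };
    }

    public static void main(String[] args) {
        String[] top = a();
        System.out.println("top frame: " + top[0] + ", below it: " + top[1]);
    }
}
```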

What data is in a stack frame? A stack frame mainly stores three kinds of data: local variables (Local Variables), including input parameters, output parameters, and variables declared inside the method; the operand stack (Operand Stack), which records push and pop operations; and frame data (Frame Data), including the class file, the method, and so on.

The local variable table of each frame in the VM stack holds the primitive types known at compile time (boolean, byte, char, short, int, float, long, double), object references (not the object itself, only a reference pointer), method return addresses, and so on. long and double occupy two local variable slots (32 bits each); every other type occupies one.

The size of the local variable table is fixed before the method runs: when a method is entered, the amount of local variable space it needs in its frame is fully determined, and the size of the table does not change while the method runs.

The VM specification defines two exceptional conditions for this area. If a thread requests a stack depth greater than the virtual machine allows, a StackOverflowError is thrown. If the VM stack can be dynamically extended (the VM specification also allows fixed-length VM stacks) and the extension cannot obtain enough memory, an OutOfMemoryError is thrown.
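The first of these two conditions is easy to provoke with unbounded recursion. This is a minimal sketch with an invented class name (StackDepthDemo); the depth it reports depends on the JVM, its stack size settings, and the frame size, so no particular number should be expected.

```java
class StackDepthDemo {
    private static int depth;

    private static void recurse() {
        depth++;
        recurse(); // no base case: every call pushes one more frame
    }

    /** Recurses until the VM refuses to push another frame and returns the depth reached. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack unwinds back to this frame, so the thread can carry on normally.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames pushed before StackOverflowError: " + measureDepth());
    }
}
```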

5. Heap (heap memory)

A JVM instance has exactly one heap, and the size of the heap memory can be adjusted.

The Java heap may reside in physically discontiguous memory as long as it is logically contiguous, just like our disk space. An implementation may choose a fixed size or make the heap expandable, but all mainstream commercial virtual machines implement it as expandable (controlled by -Xmx and -Xms). If memory cannot be allocated in the heap and the heap can no longer be expanded, an OutOfMemoryError is thrown.
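The effect of these settings can be read back at run time through the Runtime class. The sketch below uses an invented class name (HeapSizeDemo); treating maxMemory() as the -Xmx bound and totalMemory() as the currently reserved heap is the usual reading, but the exact figures vary by JVM and collector.

```java
class HeapSizeDemo {
    /** Returns {max, total, free} heap sizes in bytes as reported by the Runtime. */
    static long[] heapSizes() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.maxMemory(), rt.totalMemory(), rt.freeMemory() };
    }

    public static void main(String[] args) {
        long[] s = heapSizes();
        long mb = 1024L * 1024L;
        System.out.println("max   (upper bound, see -Xmx): " + s[0] / mb + " MB");
        System.out.println("total (currently reserved):    " + s[1] / mb + " MB");
        System.out.println("free  (unused part of total):  " + s[2] / mb + " MB");
    }
}
```

Running it with, for example, java -Xms64m -Xmx256m HeapSizeDemo should change the reported values accordingly.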

After the class loader reads a class file, it needs to put the classes, methods, and constants into heap memory so that the execution engine can use them. Heap memory is divided into three parts:

1) Permanent space (also called the method area)

The permanent space is a resident memory area that holds the class and interface metadata shipped with the JDK itself; that is, it stores the class information the runtime environment requires. Data loaded into this area is rarely reclaimed by the garbage collector (as noted in the method area discussion, class unloading has strict conditions), and most of the memory it occupies is released only when the JVM shuts down.

2) Young generation space

The young generation is where objects are born, grow, and die: an object is created here, used, and finally collected by the garbage collector. The young generation is divided into two parts: Eden space and survivor space. All new objects are created in Eden. There are two survivor spaces: survivor 0 and survivor 1. When Eden is exhausted and the program needs to create more objects, the JVM's garbage collector collects Eden, destroying the objects there that are no longer referenced by other objects, and moves the remaining objects to survivor 0. If survivor 0 is full, that space is collected and its surviving objects are moved to survivor 1. And if survivor 1 is also full? Then the objects are moved to the old generation.

3) Tenured generation space (old generation)

The old generation holds Java objects that have been filtered out of the young generation; pooled objects are typically active in this area.

6. Native direct memory (Direct Memory)

Direct memory is not part of the virtual machine's runtime data area; it is simply native memory, not an area the VM manages directly. However, this memory can also cause an OutOfMemoryError, so it is described here as well. JDK 1.4 introduced the NIO classes, which bring a channel- and buffer-based I/O approach: a native library can allocate native memory directly, and the program then operates on it through a DirectByteBuffer object that is stored in the Java heap and acts as a reference to that memory. This can significantly improve performance in some scenarios because it avoids copying data back and forth between the Java heap and the native heap.

Obviously, the allocation of native direct memory is not limited by the Java heap size, but since it is still memory, it is limited by the machine's total memory (including the swap area or Windows virtual memory). Server administrators usually set JVM parameters such as -Xmx according to the actual memory but often overlook direct memory, so that the sum of all memory areas exceeds the physical memory limit (including physical and operating-system-level limits), and an OutOfMemoryError occurs during dynamic expansion.
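The difference between a heap buffer and a direct buffer is visible through the public java.nio API. This is a minimal sketch; the class name DirectMemoryDemo and its helper methods are invented for the illustration.

```java
import java.nio.ByteBuffer;

class DirectMemoryDemo {
    /** A buffer backed by an ordinary byte[] inside the Java heap. */
    static ByteBuffer heapBuffer(int bytes) {
        return ByteBuffer.allocate(bytes);
    }

    /** A buffer whose storage is native memory outside the Java heap. */
    static ByteBuffer directBuffer(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer onHeap = heapBuffer(1024);
        ByteBuffer offHeap = directBuffer(1024);
        System.out.println("heap buffer direct?   " + onHeap.isDirect());
        System.out.println("direct buffer direct? " + offHeap.isDirect());

        // Both kinds are used through the same API.
        offHeap.putInt(0, 42);
        System.out.println("value read back: " + offHeap.getInt(0));
    }
}
```

Only the small ByteBuffer object lives in the Java heap; the 1024 bytes of the direct buffer count against native memory, which is why -Xmx alone does not bound it.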

Fourth, the JVM garbage collection mechanism

The JVM's automatic mechanism for reclaiming object memory is called GC (Garbage Collection).

Why garbage collection? As a program runs, instance objects, variables, and other data occupy more and more memory. Without timely garbage collection, program performance inevitably degrades, and a shortage of available memory can even cause avoidable system errors.

What "garbage" needs to be collected?

Of the six areas described above, three need no garbage collection: the program counter, the JVM stack, and the native method stack. Their life cycles are tied to their threads, so the memory they occupy is freed automatically when the thread is destroyed. Only the method area and the heap need GC. Collection in the method area is also rare, so in general it is the heap that needs GC. As for which objects, in brief: if an object no longer has any references, it can be reclaimed. Put simply, an object that is no longer of any use can be collected as garbage.

When does garbage collection take place?

One classic approach is the reference counting algorithm: each object is given a reference counter, which is incremented by 1 each time the object is referenced and decremented by 1 each time a reference is lost; when the counter has stayed at 0 for a period of time, the object is considered reclaimable. However, this algorithm has an obvious flaw: when two objects reference each other but are otherwise no longer in use, they ought to be garbage collected, yet because they reference each other their counters never reach 0 and they never qualify, so this memory cannot be cleaned up. Therefore, Sun's JVM does not use reference counting for garbage collection. Instead, it uses what is called the root search algorithm.

The basic idea is: start from objects called GC roots and search downward; if an object cannot be reached from any GC root, it is no longer referenced and can be garbage collected. (This is a simplification. An unreachable object is not yet completely "dead": if its class overrides the finalize() method and that method has not yet been called by the system, the system calls finalize() once to do the final work. If, during that call, the object re-associates itself with any object that is reachable from the GC roots, it is "reborn"; otherwise it can be completely reclaimed.) For example, objects such as Object5, Object6, and Object7 may still reference one another, but because none of them can be reached from the GC roots they are garbage, which solves the problem that the reference counting algorithm cannot.
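The root search idea can be sketched as a plain graph traversal. This is a toy model, not the JVM's actual implementation: objects are represented by names, references by a map, and the class name ReachabilityDemo and the sample graph are invented for the illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ReachabilityDemo {
    /** Returns every object that can be reached from the roots by following references. */
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (marked.add(obj)) { // first visit: follow its outgoing references
                pending.addAll(refs.getOrDefault(obj, Collections.emptyList()));
            }
        }
        return marked;
    }

    /** A toy heap in which Object5, Object6 and Object7 only reference each other. */
    static Map<String, List<String>> sampleHeap() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("Object1", Arrays.asList("Object2", "Object3"));
        refs.put("Object2", Arrays.asList("Object4"));
        refs.put("Object5", Arrays.asList("Object6"));
        refs.put("Object6", Arrays.asList("Object7"));
        refs.put("Object7", Arrays.asList("Object5"));
        return refs;
    }

    public static void main(String[] args) {
        Set<String> live = reachable(sampleHeap(), Arrays.asList("Object1"));
        System.out.println("reachable from the root: " + new TreeSet<>(live));
    }
}
```

With Object1 as the only root, the cycle Object5, Object6, Object7 is never visited, so it counts as garbage even though each of those objects still has an incoming reference.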

1. GC fundamentals

GC spends resources and time to reclaim objects in memory that are no longer in use.

1) Collection of young generation objects is called minor GC

2) Collection of old generation objects is called full GC

3) A GC triggered by an explicit call to System.gc() in the program is also a full GC

2. Object reference types are divided into strong references, soft references, weak references, and phantom references (sometimes translated as "virtual references").

1) Strong references: the references the virtual machine creates when we declare objects in the ordinary way. With strong references, the garbage collector strictly checks whether the object is still strongly referenced; if it is, it will not be garbage collected.

2) Soft references: soft references are generally used for caches. The difference from strong references is that when garbage collection runs, the virtual machine decides whether to reclaim softly referenced objects based on how much memory remains. If memory is tight, the virtual machine reclaims the space they refer to; if memory is relatively plentiful, they are not reclaimed. In other words, by the time the virtual machine throws an OutOfMemoryError, no softly referenced objects remain.

3) Weak references: weak references are similar to soft references and can also be used for caches. Unlike a softly referenced object, however, a weakly referenced object is always reclaimed when garbage collection runs, so it survives only until the next garbage collection cycle.

4) Phantom references: as the name suggests, these exist in name only. Unlike the other reference types, a phantom reference does not affect the object's life cycle. If an object is held only by phantom references, it can be garbage collected at any time, just as if it had no references at all.

Phantom references are primarily used to track when objects are garbage collected. One difference from soft and weak references is that a phantom reference must be used together with a reference queue (ReferenceQueue). When the garbage collector is about to reclaim an object and finds that it has a phantom reference, it adds the phantom reference to the associated reference queue before reclaiming the object's memory. The program can learn whether the referenced object is about to be garbage collected by checking whether the phantom reference has been added to the queue. If it has, the program can take any necessary action before the object's memory is reclaimed.
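The reference types correspond to classes in java.lang.ref. The sketch below (class name ReferenceTypesDemo is invented) only demonstrates behavior that is deterministic while the target is still strongly referenced; what happens after the strong reference is dropped depends on when the collector runs and is deliberately not shown.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceTypesDemo {
    /** Probes the three reference classes while the caller still holds 'target' strongly. */
    static boolean[] probe(Object target) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(target);
        WeakReference<Object> weak = new WeakReference<>(target);
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);
        return new boolean[] {
            soft.get() == target,  // soft reference still returns the object
            weak.get() == target,  // weak reference still returns the object
            phantom.get() == null, // a phantom reference never hands out its referent
            queue.poll() == null   // nothing enqueued yet: the object has not been collected
        };
    }

    public static void main(String[] args) {
        Object cached = new Object(); // the strong reference
        boolean[] r = probe(cached);
        System.out.println("soft.get() returns target:   " + r[0]);
        System.out.println("weak.get() returns target:   " + r[1]);
        System.out.println("phantom.get() returns null:  " + r[2]);
        System.out.println("reference queue still empty: " + r[3]);
        System.out.println("target still usable: " + (cached != null));
    }
}
```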

Strong references need little explanation: they are what our systems normally use. Soft references and weak references are comparatively rare. They are generally used for caches, and usually only when memory is limited, because if memory is large enough a cache can simply use strong references, which are also easier to control. As a result, they are most commonly seen in the caches of desktop applications.

Note in particular that weak references and phantom references are seldom used in real-world programming, while soft references are used more often, because soft references can speed up the JVM's reclamation of garbage memory, keep the system running safely, and help prevent out-of-memory (OutOfMemoryError) problems.

3. How is garbage collection done?

This part mainly introduces garbage collection algorithms. As described earlier, memory is divided into three parts: the young generation, the old generation, and the permanent generation. The three generations have different characteristics, so they use different GC algorithms. The young generation suits objects with short life cycles that are frequently created and destroyed; the old generation suits objects with relatively long life cycles; the permanent generation, in the Sun HotSpot VM, refers to the method area (some JVMs have no notion of a permanent generation). First, the concepts and characteristics of the young, old, and permanent generations:

Young generation: New Generation or Young Generation. As described above, it is roughly divided into Eden and the survivor area, and the survivor area is divided into two parts of equal size: From space and To space. New objects are allocated in the young generation; when Eden runs out of space, the surviving objects are moved to a survivor space. The size of the young generation can be controlled with -Xmn, and the ratio of Eden to survivor space can be controlled with -XX:SurvivorRatio.

Old generation: Old Generation. It stores objects that are still alive after multiple garbage collections in the young generation, such as cached objects. The size of the old generation equals the -Xmx value minus the -Xmn value.

Permanent generation: Permanent Generation. In Sun's JVM it means the method area, although most other JVMs have no such generation. It mainly stores constants and class information. The default minimum is 16 MB and the default maximum is 64 MB; both can be set with -XX:PermSize and -XX:MaxPermSize.

5. Common GC Algorithms:

Mark-sweep algorithm (Mark-Sweep)

The most basic GC algorithm. Objects that need to be collected are first marked; the heap is then scanned and the marked objects are reclaimed, hence the two steps: mark and sweep. The algorithm is inefficient, and it leaves memory fragmented once the sweep is complete, so when a large object needs contiguous memory a defragmentation pass is required. The algorithm therefore needs improvement.

Copying algorithm (Copying)

As mentioned earlier, young generation memory is divided into three parts: Eden and two survivor spaces. Sun's JVM generally sets the ratio of Eden to each survivor space at 8:1 and keeps one survivor space free. During garbage collection, objects that do not need to be reclaimed are copied into the free survivor space, and then Eden and the first survivor space are cleared completely. This raises a question: what if the second survivor space is not large enough? In that case, memory is temporarily borrowed from the old generation. This algorithm is suitable for the young generation.

Mark-compact algorithm (Mark-Compact)

The first half is the same as in the mark-sweep algorithm, except that after the objects that do not need to be reclaimed are marked, the marked objects are moved together so that memory is contiguous; then only the memory beyond the boundary needs to be cleaned up. This algorithm is suitable for areas holding long-lived data, such as the old generation and the permanent generation.
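The difference between sweeping and compacting can be sketched on a toy heap. This is only an illustration under simplifying assumptions: the heap is modeled as an array of slots, the result of the mark phase is given as a boolean array, and the names CompactionDemo, sweep, and compact are invented.

```java
import java.util.Arrays;

class CompactionDemo {
    /** Mark-sweep: dead slots are cleared in place, leaving holes between live objects. */
    static String[] sweep(String[] heap, boolean[] live) {
        String[] out = new String[heap.length];
        for (int i = 0; i < heap.length; i++) {
            out[i] = live[i] ? heap[i] : null;
        }
        return out;
    }

    /** Mark-compact: live objects slide to the start; everything past the boundary is free. */
    static String[] compact(String[] heap, boolean[] live) {
        String[] out = new String[heap.length];
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) {
                out[boundary++] = heap[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E", "F" };
        boolean[] live = { true, false, true, false, false, true }; // result of the mark phase
        System.out.println(Arrays.toString(sweep(heap, live)));   // [A, null, C, null, null, F]
        System.out.println(Arrays.toString(compact(heap, live))); // [A, C, F, null, null, null]
    }
}
```

After sweeping, the largest contiguous free run is two slots; after compacting, all three free slots are contiguous, which is the fragmentation benefit described above.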

6. Common Garbage Collectors:

Each JVM implements the algorithms described above differently. We first introduce three actual garbage collectors: the serial GC (Serial GC), the parallel scavenging GC (Parallel Scavenge), and the parallel GC (ParNew).

1) Serial GC. The most basic and oldest collector, but still widely used. It is a single-threaded collector, and more importantly, it must pause all executing threads while it collects (stop the world). For some applications this is unacceptable, but as long as the pause can be kept within N milliseconds, most applications can tolerate it; in practice it does not disappoint, and pauses of a few dozen milliseconds are perfectly acceptable for client applications. This collector suits single-CPU machines, small young generations, and applications without strict pause-time requirements. It is the default GC in client mode and can be selected explicitly with -XX:+UseSerialGC.

2) ParNew GC. Essentially the same as the Serial GC, except that it adds multithreading to improve efficiency, so it can be used on the server side. It can also be combined with the CMS GC, which is one more reason to use it on servers.

3) Parallel Scavenge GC. The entire scanning and copying process is multithreaded. It suits multi-CPU machines and applications that require short pause times. It is the default GC in server mode; it can be selected explicitly with -XX:+UseParallelGC, and the number of threads can be set with, for example, -XX:ParallelGCThreads=4.

4) CMS (Concurrent Mark Sweep) collector. Its goal is to solve the pause problem of the Serial GC and achieve the shortest collection pause time. Common B/S (browser/server) applications suit this collector because they demand high concurrency and fast response. The CMS collector is based on the mark-sweep algorithm, and the collection process has four broad steps: initial mark (CMS initial mark), concurrent mark (CMS concurrent mark), remark (CMS remark), and concurrent sweep (CMS concurrent sweep).

Advantages of the CMS collector: concurrent collection and low pauses. However, CMS is far from perfect.

5) G1 collector. It improves on the CMS collector in many ways: first, it is based on the mark-compact algorithm, so it does not produce memory fragmentation; second, it can control pauses more precisely. It is not described in detail here.

6) Serial Old. Serial Old is the old-generation version of the Serial collector; it likewise uses a single thread and collects with the mark-compact algorithm.

7) Parallel Old. Parallel Old is the old-generation version of the Parallel Scavenge collector; it uses multiple threads and the mark-compact algorithm.

8) RTSJ garbage collector, used for Java real-time programming; an introduction will be added later.
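To see which of these collectors a particular JVM is actually running, the standard management API can be queried. This is a minimal sketch with an invented class name (CollectorListDemo); the collector names it prints are chosen by the JVM and differ between versions and command-line flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class CollectorListDemo {
    /** Lists the collectors active in this JVM together with their collection counts. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName() + " (collections so far: " + gc.getCollectionCount() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String line : collectorNames()) {
            System.out.println(line);
        }
    }
}
```

Running the same class with different flags, for example -XX:+UseSerialGC and then -XX:+UseParallelGC, should print different collector names.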
