The architecture of the JVM

Source: Internet
Author: User

This article gives a conceptual introduction for students who want to learn about the JVM (Java Virtual Machine), focusing on the components of the JVM and the principles behind how they work internally. It is only a simple introduction and does not go into complicated parameters or configuration; interested students can research further. In studying the JVM you will find that the JVM is itself a computer architecture, and many of its principles closely resemble what we know from hardware, microcomputer principles, and operating systems. Learning the JVM is therefore a good way to deepen your understanding of computer structure.

It is also important to note that although most of the time we use the JVM provided by Sun (since acquired by Oracle), the JVM itself is a specification, so there can be many implementations. Besides HotSpot, Oracle's JRockit and IBM's J9 are also well-known JVMs.

I. JVM structure

[Figure: the main structure of the JVM]

As the figure shows, the JVM consists mainly of the class loader subsystem, the runtime data area (memory space), the execution engine, and the native method interface. The runtime data area is composed of the method area, the heap, the Java stack, the PC register, and the native method stack.

The figure also shows that, within the memory space, the method area and the heap are shared by all Java threads, while the Java stack, the native method stack, and the PC register are private to each thread. This raises some questions that are discussed in detail later in this article.
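The split between shared and thread-private memory can be seen from ordinary Java code. The following is a minimal sketch; the class name SharedVsPrivate and its members are made up for this illustration. Each worker thread counts in a local variable, which lives in its own stack frame and needs no synchronization, while the counter reachable from a static field lives on the heap and must be updated atomically.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedVsPrivate {
    // Object on the heap, reachable from a static field: visible to every thread.
    static final AtomicInteger shared = new AtomicInteger();

    /** Starts the given number of threads and returns each thread's private count. */
    static int[] run(int threads, int perThread) {
        int[] localTotals = new int[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                int local = 0;                 // lives in this thread's own stack frame
                for (int i = 0; i < perThread; i++) {
                    local++;                   // private: no synchronization needed
                    shared.incrementAndGet();  // shared: needs an atomic update
                }
                localTotals[id] = local;
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();                      // join makes the workers' writes visible
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return localTotals;
    }
}
```

Calling SharedVsPrivate.run(4, 1000) gives every thread a private total of 1000, while the shared counter grows by 4000.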

It is well known that the Java language is cross-platform, and this too is achieved by the JVM. More precisely, Sun implements the JVM on different platforms to solve the problem of platform dependence for us, much as HTML can be rendered by browsers from different vendors (although some browsers have problems supporting the standard). The Java language also supports calling native methods through JNI (Java Native Interface), but note that if you call native methods in a Java program, your program will probably no longer be cross-platform; in other words, native methods break platform independence.

II. Class loader subsystem (Class Loader)

The class loader subsystem is responsible for loading compiled .class bytecode files into memory so that the JVM can instantiate or otherwise use the loaded classes. The class loader subsystem supports dynamic loading at run time, which has many advantages, such as saving memory and flexibly loading classes from the network. Another benefit of dynamic loading is that classes can be isolated from one another through namespace separation, which enhances the security of the whole system.
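Dynamic loading is visible from Java code through Class.forName, which asks the class loader subsystem to load a class whose name is only known at run time. The sketch below is illustrative; the class name DynamicLoad and the method instantiate are made up here, and the method assumes the target class has an accessible no-argument constructor.

```java
class DynamicLoad {
    /** Loads a class by name at run time and creates an instance with its no-arg constructor. */
    static Object instantiate(String className) {
        try {
            Class<?> type = Class.forName(className);           // load, link and initialize
            return type.getDeclaredConstructor().newInstance(); // then create an object
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load " + className, e);
        }
    }
}
```

For example, DynamicLoad.instantiate("java.util.ArrayList") returns a new, empty ArrayList even though the calling code never names that class at compile time.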

1. Classification of class loaders

A. Bootstrap class loader (Bootstrap Class Loader): Responsible for loading all Java classes in the rt.jar file; that is, the core Java classes are loaded by this class loader. In the Sun JDK, this class loader is implemented in C++ and cannot be referenced from the Java language.

B. Extension class loader (Extension Class Loader): Responsible for loading the jar packages that provide extended functionality.

C. System class loader (System Class Loader): Responsible for loading the jar packages and directories on the classpath specified in the startup parameters. The Java classes we write ourselves are usually loaded by this class loader as well. In the Sun JDK, the system class loader is named AppClassLoader.

D. User-defined class loader (User Defined Class Loader): Lets the user define the loading rules for classes and manually control each step of the loading process.
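The hierarchy of class loaders described above can be inspected at run time by following getParent from a class's own loader. This is a small sketch with a made-up class name, LoaderChain. The bootstrap class loader has no Java object, so getClassLoader returns null for core classes such as String. The loader names printed depend on the JDK version (newer JDKs, for instance, use a platform class loader in place of the extension class loader), so no output is shown.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    /** Returns the class names of the loaders from the given class's loader up to the bootstrap loader. */
    static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap");  // the bootstrap loader appears as null in Java code
        return chain;
    }

    public static void main(String[] args) {
        System.out.println("String:      " + chainOf(String.class));
        System.out.println("LoaderChain: " + chainOf(LoaderChain.class));
    }
}
```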

2. Working principle of ClassLoader

Class loading is divided into three steps: loading, linking, and initialization.

A. Loading

A class is loaded using its fully qualified name and a ClassLoader; this step mainly loads the specified .class file into the JVM. Once the class is loaded, the JVM identifies it internally by its fully qualified name plus the ID of the ClassLoader instance that loaded it.

In memory, ClassLoader instances and Class instances are located in the heap, while their type information is in the method area.

Loading follows what is known as the parent delegation model. When a class loader is asked to load a class, it first asks its parent class loader to load it (only two class loaders are involved at each step, so thinking of it as "the parent class loader" may be easier to understand). The parent in turn passes the request to the loader above it, and so on until the request reaches the bootstrap class loader. Only when the parent class loader cannot load the specified class does a class loader load the class itself.

The parent delegation model is the JVM's first line of security defense and guarantees that classes are loaded safely. It relies on the principle of class loader isolation: classes loaded by different class loaders cannot interact with each other directly, and even if the same class is loaded by different class loaders, the resulting classes are unaware of each other's existence. Thus, even if a malicious class disguises itself as belonging to a core package (for example, java.lang), it can do no harm, because it will never be loaded by the bootstrap class loader.

It follows that if users define their own ClassLoader, they must take care to keep the class loading process secure.
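Parent delegation can be observed with a small user-defined class loader. In the sketch below, CountingLoader is a made-up name; it overrides only findClass, the hook that the standard ClassLoader.loadClass calls after the parent chain has failed. The counter shows that a core class never reaches the custom loader, while an unknown class does.

```java
class CountingLoader extends ClassLoader {
    int findClassCalls = 0;

    CountingLoader(ClassLoader parent) {
        super(parent);
    }

    // loadClass calls this only after the parent loaders could not find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        findClassCalls++;
        throw new ClassNotFoundException(name);  // this sketch defines no classes itself
    }

    /** Returns true if the loader (or one of its parents) can load the named class. */
    static boolean canLoad(ClassLoader loader, String name) {
        try {
            loader.loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

With a loader created as new CountingLoader(ClassLoader.getSystemClassLoader()), loading java.lang.String succeeds without findClass being called, because the request is satisfied further up the chain. A request for a class that does not exist falls through to findClass exactly once.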

B. Linking

The task of linking is to merge the binary type information into the runtime state of the JVM.

Linking is divided into the following three steps:

a. Verification: Check that the .class file is correct, that it complies with the specification, and that it is suitable for use by the current JVM.

b. Preparation: Allocate memory for the static variables of the class and set them to their default values.

c. Resolution (optional): Resolve the symbolic references in the class's constant pool into direct references. This can also be deferred until the reference in question is actually used.

C. Initialization

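Initialization assigns the static variables of the class their declared initial values and executes the static initializer blocks of the class. (Instance constructors are not part of class initialization; they run when an object is created.)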
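The difference between the default values set during preparation and the values assigned during initialization can be made visible with a static initializer that reads a variable declared after it. The class below is a minimal sketch with made-up names.

```java
class InitOrder {
    // Static initializers run from top to bottom during class initialization.
    static int early = peek();  // 'late' still holds its prepared default (0) here
    static int late = 42;       // assigned only after 'early' has been computed

    static int peek() {
        return late;
    }
}
```

After the class has been initialized, InitOrder.early is 0 and InitOrder.late is 42: when peek ran, late had been prepared with its default value but not yet initialized.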

The JVM specification strictly defines when a class needs to be initialized:

a. When an object is instantiated with the new keyword, reflection, clone, or deserialization.

b. When a static method of the class is invoked.

c. When a static field of the class is read or assigned.

d. When a method of the class is invoked through reflection.

e. When a subclass of the class is initialized (the parent class must be initialized before the subclass).

f. When the class is designated as the startup class at JVM startup (put simply, the class containing the main method).
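Two of the rules above, initialization on the first static method call and parent-before-subclass ordering, can be traced with static initializer blocks that write to a log. This is a sketch with made-up names (InitTriggers, Parent, Child); it relies on the fact that a class is initialized only once per class loader.

```java
import java.util.ArrayList;
import java.util.List;

class InitTriggers {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent initialized"); }
    }

    static class Child extends Parent {
        static int counter = 0;
        static { LOG.add("Child initialized"); }

        static void touch() {
            counter++;
        }
    }

    /** Uses Child for the first time and returns a copy of the log. */
    static List<String> demo() {
        LOG.add("before use");
        Child.touch();          // static method call triggers initialization
        LOG.add("after use");
        return new ArrayList<>(LOG);
    }
}
```

On the first call to demo, the log shows that nothing is initialized until Child.touch() runs, and that Parent is initialized before Child.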

III. Java stack (Java Stack)

The Java stack consists of stack frames, one frame per method call. When a method is called, a stack frame is pushed; when the method returns, its stack frame is popped and discarded. The main task of the Java stack is to store method parameters, local variables, and intermediate results, and to provide some of the data other modules need in order to work. As mentioned earlier, the Java stack is thread-private, which guarantees thread safety and spares programmers from having to synchronize access to the stack: only the thread itself can access its own local variable area.

Each stack frame is divided into three parts: the local variable area, the operand stack, and the frame data area.

1. Local variable area

The local variable area is an array whose elements are one word long. Values of type byte, short, and char are converted to int for storage, while long and double values occupy two words. In particular, the boolean type is converted to int or byte at compile time, and a boolean array is treated as a byte array. The local variable area also holds object references, including class references, interface references, and array references.

The local variable area contains the method parameters and local variables. In addition, an instance method has an implicit first local variable, this, which refers to the object on which the method was called. For an object, the local variable area only ever holds a reference to it on the heap.

2. Operand stack

The operand stack is also an array of words, but as its name suggests, it can only be accessed through the basic stack operations, push and pop. During a computation, the operands are popped from the stack, and the result is pushed back onto it.
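To make the local variable area and the operand stack concrete, the sketch below annotates two small methods with the bytecode that javac typically produces for them (as javap -c would show it). The class name Adder is made up; the instruction sequences are given as comments and are the usual output for such methods rather than something guaranteed for every compiler.

```java
class Adder {
    // Instance method: slot 0 = this, slot 1 = a, slot 2 = b.
    int add(int a, int b) {
        // iload_1   push local slot 1 (a) onto the operand stack
        // iload_2   push local slot 2 (b)
        // iadd      pop both ints and push their sum
        // ireturn   pop the sum and hand it to the caller's frame
        return a + b;
    }

    // Static method: there is no 'this', so a is slot 0; the long b takes slots 1 and 2.
    static long widen(int a, long b) {
        // iload_0   push a
        // i2l       convert it to long
        // lload_1   push b (two words)
        // ladd      pop both longs and push their sum
        // lreturn   return the long result
        return a + b;
    }
}
```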

3. Frame data area

The main tasks of the frame data area are:

A. Record a pointer to the class's constant pool, for use in resolution.

B. Help the method return normally, which includes restoring the stack frame of the calling method, setting the PC register to point to the next instruction of the calling method, and pushing the return value onto the operand stack of the caller's stack frame.

C. Record the exception table. When an exception occurs, control passes to the catch clause for that exception; if no matching catch clause is found, the stack frame of the calling method is restored and the exception is rethrown there.

The sizes of the local variable area and the operand stack are determined at compile time for each method. When a method is called, the JVM looks up the type information of the corresponding class in the method area, reads the sizes of the local variable area and operand stack for that method, allocates the stack frame memory accordingly, and pushes the frame onto the Java stack.
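The rule of one stack frame per method call can be checked from Java code by counting the entries of a stack trace at different recursion depths. The sketch below uses a made-up class name, FrameDepth; the absolute numbers depend on how the code is launched, so only the difference between two depths is meaningful.

```java
class FrameDepth {
    /** Recurses n times, then returns the number of frames on the current thread's stack. */
    static int depthAt(int n) {
        if (n == 0) {
            // A Throwable captures one stack trace element per active frame.
            return new Throwable().getStackTrace().length;
        }
        return depthAt(n - 1);  // each call pushes one more frame
    }
}
```

When both calls are made from the same method, depthAt(5) reports exactly five frames more than depthAt(0).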

IV. Native method stack (Native Method Stack)

The native method stack is similar to the Java stack and mainly stores the state of native method calls. In the Sun JDK, the native method stack and the Java stack are one and the same.

V. Method area

The type information and static variables of the class are stored in the method area. The following data is stored for each class in the method area:

a. The fully qualified names of the class and its parent class (java.lang.Object has no parent)

b. The type of the class (class or interface)

c. Access modifiers (public, abstract, final)

d. The list of fully qualified names of the implemented interfaces

e. Constant pool

f. Field information

g. Method information

h. Static variables

i. A reference to the ClassLoader

j. A reference to the Class object

As you can see, all the information about a class is stored in the method area. Because the method area is shared by all threads, thread safety must be ensured. For example, if two classes need to load the same not-yet-loaded class at the same time, one of them asks its ClassLoader to load the required class while the other waits, rather than loading it a second time.

Also, to speed up method invocation, a private method table is usually created for each non-abstract class. The method table is an array holding direct references to the instance methods that may be invoked on an instance of the class. The method table is very important for polymorphism; for details, see the section "The realization of polymorphism" in the article "On the significance and realization of the polymorphic mechanism".
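The effect that the method table makes possible can be seen in any virtual method call: the method that runs is chosen by the runtime class of the object, not by the declared type of the variable. The classes below are made up for this sketch; how a given JVM lays out its method tables is an implementation detail that the code does not show.

```java
class Animal {
    String sound() {
        return "...";
    }

    String speak() {
        // sound() is looked up through the runtime class of 'this'.
        return "I say " + sound();
    }
}

class Dog extends Animal {
    @Override
    String sound() {
        return "Woof";
    }
}
```

With Animal a = new Dog(), the call a.speak() returns "I say Woof": speak is inherited from Animal, but its call to sound is dispatched to the override in Dog.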

In the Sun JDK, the method area corresponds to the permanent generation (Permanent Generation), whose default minimum size is 16MB and default maximum size is 64MB.

VI. Heap (Heap)

The heap stores object instances and array values. Each object in the heap holds a pointer to its class data, which refers to the corresponding type information in the method area; a pointer to the method table may also be stored with the object. Because the heap is shared by all threads, operations such as instantiating objects must deal with synchronization. In addition, the instance data in the heap includes the object's lock, and depending on the garbage collection strategy it may also store data such as a reference count or a sweep mark.

For heap management, the Sun JDK has used generational management since version 1.2. The heap is divided mainly into the young generation and the old generation. The generational approach greatly improves the efficiency of garbage collection.

1. Young generation (New Generation)

In most cases, new objects are allocated in the young generation. The young generation consists of the Eden space and two survivor spaces of equal size; the survivor spaces are used mainly for copying objects during a minor GC (the minor GC process is not discussed in detail here).

To allocate memory more efficiently, the JVM sets aside a small, separate TLAB (Thread Local Allocation Buffer) region in the Eden space for each thread. Allocating memory on the shared heap requires locking the entire heap, whereas allocating in a TLAB does not. The JVM therefore allocates objects in the TLAB whenever possible to improve efficiency.

2. Old generation (Old Generation/Tenured Generation)

Objects that survive long enough in the young generation are moved to the old generation. Garbage collection runs less frequently in the old generation than in the young generation.
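The memory areas of a running JVM, including the generations of the heap, can be listed through the standard java.lang.management API. The sketch below uses a made-up class name, MemoryPools. The pool names depend on the JVM version and the garbage collector in use (names containing "Eden Space", "Survivor Space" or "Old Gen" are common on HotSpot), so no output is shown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class MemoryPools {
    /** Returns the names of the memory pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolsOfType(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Heap pools:     " + poolsOfType(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolsOfType(MemoryType.NON_HEAP));
    }
}
```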

VII. Execution engine

The execution engine is the core of the JVM and executes Java bytecode. Its execution modes are mainly interpreted execution, compiled execution, adaptive optimization execution, and execution on a hardware chip.

The JVM's instruction set is based on a stack rather than registers. The advantages are that instructions can be as compact as possible, which makes them quick to transmit over a network (remember that Java was originally designed for the network), that the design adapts easily to platforms with few general-purpose registers, and that it facilitates code optimization. Because the Java stack and the PC register are thread-private, threads cannot interfere with one another's stacks. Each thread has its own instance of the JVM execution engine.

JVM instructions consist of a single-byte opcode followed by zero or more operands. For instructions that need operands, the operands are usually pushed onto the operand stack first; even assigning a value to a local variable is done by pushing the value and then storing it from the stack. Note that this is the "normal" case; exceptions that arise from optimization are discussed below.
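The stack-based execution model can be illustrated with a toy interpreter. The following sketch is not the JVM's instruction set: the opcodes PUSH, LOAD, STORE, ADD and MUL and their encodings are invented for this example, and real bytecode has many more instructions and types. It does show the pattern described above, in which values are pushed onto an operand stack, operated on there, and stored into a local variable array by popping.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ToyStackMachine {
    // Hypothetical opcodes for this sketch; these are not real JVM encodings.
    static final int PUSH = 0, LOAD = 1, STORE = 2, ADD = 3, MUL = 4;

    /** Executes the code and returns the local variable array. */
    static int[] run(int[] code, int maxLocals) {
        int[] locals = new int[maxLocals];          // local variable area
        Deque<Integer> stack = new ArrayDeque<>();  // operand stack
        int pc = 0;                                 // program counter
        while (pc < code.length) {
            int op = code[pc++];
            switch (op) {
                case PUSH:  stack.push(code[pc++]); break;             // push a constant
                case LOAD:  stack.push(locals[code[pc++]]); break;     // push a local variable
                case STORE: locals[code[pc++]] = stack.pop(); break;   // pop into a local variable
                case ADD:   stack.push(stack.pop() + stack.pop()); break;
                case MUL:   stack.push(stack.pop() * stack.pop()); break;
                default: throw new IllegalArgumentException("bad opcode " + op);
            }
        }
        return locals;
    }
}
```

For example, the program PUSH 2, STORE 0, PUSH 3, STORE 1, LOAD 0, LOAD 1, ADD, PUSH 4, MUL, STORE 2 computes x = 2, y = 3, z = (x + y) * 4 and leaves 20 in local variable 2.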

1. Interpreted execution

Like some dynamic languages, the JVM can interpret bytecode. The Sun JDK uses a token-threading approach; interested students can look into it more closely.

There are several ways to optimize interpreted execution:

A. Top-of-stack caching

The value at the top of the operand stack is cached directly in a register. Most instructions that need only one operand then no longer have to push and pop it; they can compute directly in the register and push the result onto the operand stack. This reduces the overhead of moving data between registers and memory.

B. Partial stack frame sharing

The called method can use the operand stack in the caller's stack frame as its own local variable area, which reduces the cost of copying arguments when the method obtains its parameters.

C. Executing machine instructions

In some special cases, the JVM executes machine instructions to increase speed.

2. Compiled execution

To speed up execution, the Sun JDK supports compiling bytecode into machine instructions. It mainly uses a JIT (just-in-time) compiler that compiles at run time: the bytecode is compiled to machine code the first time it is executed, and the machine code is cached so that it can be reused afterwards. Oracle JRockit uses fully compiled execution.

3. Adaptive optimization execution

The idea behind adaptive optimization is that 10% to 20% of the code in a program accounts for 80% to 90% of the execution time, so compiling just that small portion into optimized machine code greatly improves execution efficiency. The typical representative of adaptive optimization is Sun's HotSpot VM. As its name indicates, the JVM monitors the execution of the code; when it determines that a particular method is a bottleneck, or hot spot, it starts a background thread that compiles the method's bytecode into highly optimized native machine code. When the method is no longer a hot spot, the compiled code is discarded and the method is interpreted again.

Adaptive optimization not only obtains most of the performance gain for a small amount of compilation time; because execution is monitored continuously, it also helps optimizations such as inlining a great deal. Because of object-oriented polymorphism, one method may have many different implementations, and adaptive optimization can greatly reduce the size of the inlined code by inlining only the implementations that monitoring shows are actually used.

The Sun JDK has two compilation modes: client mode and server mode. The former is more lightweight and uses less memory. The latter optimizes more aggressively and uses more memory.

In server mode, escape analysis is performed on objects: the JVM determines whether an object created in a method is used outside that method, and if it is used by other methods, the object is said to escape. For objects that do not escape, the JVM can allocate them directly on the stack (so objects are not necessarily allocated on the heap). The thread then reaches the object faster, and because the stack frame is discarded when the method returns, reclaiming the object is easy. Server mode also uses analysis to remove unnecessary synchronization; interested students can study the biased locking mechanism introduced in Sun JDK 6.

In addition, the execution engine must also ensure thread safety, so the JMM (Java Memory Model) is also guaranteed by the execution engine.
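The distinction between escaping and non-escaping objects can be shown in source code, even though whether the JVM actually avoids a heap allocation is decided at run time and cannot be observed from the program's result. The sketch below uses made-up names (EscapeDemo, Point) and only illustrates the two situations that escape analysis distinguishes.

```java
class EscapeDemo {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static Point lastPoint;  // anything stored here is reachable from outside the method

    // The Point is used only inside this method, so it does not escape.
    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // The Point is stored in a static field and returned, so it escapes.
    static Point remember(int x, int y) {
        lastPoint = new Point(x, y);
        return lastPoint;
    }
}
```

An optimizing compiler is free to avoid a heap allocation for the Point in distanceSquared, because no other method or thread can ever see it, while the Point created in remember has to remain a real heap object.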
