Java Virtual Machine Architecture Analysis


This blog post focuses on the components of the JVM (Java Virtual Machine) and how they work together. Note that although most of the time we use the JVM provided by Sun (now part of Oracle), the JVM itself is a specification, so there are many implementations; besides HotSpot, Oracle's JRockit and IBM's J9 are also well-known JVMs.

1. Structure

(Figure: the main structure of the JVM)

As the figure shows, the JVM consists mainly of the class loader subsystem, the runtime data area (memory space), the execution engine, and the native method interface. The runtime data area is composed of the method area, the heap, Java stacks, PC registers, and native method stacks.

The memory space also shows that the method area and the heap are shared by all Java threads, while the Java stack, the native method stack, and the PC register are private to each thread; this raises some questions that will be discussed in detail later in this article.

It is well known that the Java language is cross-platform, and this is achieved by the JVM. More precisely, it is Sun's implementations of the JVM on different platforms that solve the problem of platform dependence for us, much as HTML can be rendered by browsers from different vendors (although some browsers have problems supporting the standard). The Java language also supports calling native methods via JNI (Java Native Interface), but note that if you call native methods in a Java program, it will probably no longer be cross-platform; in other words, native methods break platform independence.

2. Class Loader Subsystem (ClassLoader)

The class loader subsystem is responsible for loading compiled .class bytecode files into memory so that the JVM can instantiate or otherwise use the loaded classes. The JVM's class loading subsystem supports dynamic loading at runtime, which has many advantages: it saves memory, it allows classes to be loaded flexibly (even over the network), and, through namespace separation between loaders, it isolates classes from one another and thus enhances the security of the whole system.

2.1 Classification of class loaders
    • Bootstrap class loader (Bootstrap ClassLoader): responsible for loading all Java classes in rt.jar, that is, the core Java classes. In the Sun JDK this loader is implemented in C++ and cannot be referenced from the Java language.
    • Extension class loader (Extension ClassLoader): responsible for loading the jar packages that provide extension functionality.
    • System class loader (System ClassLoader): responsible for loading the jar packages and directories on the classpath specified in the startup parameters; the Java classes we write ourselves are usually loaded by this loader. In the Sun JDK the system class loader is named AppClassLoader.
    • User-defined class loader (User-Defined ClassLoader): loading rules defined by the user, allowing manual control over the steps of the loading process. (A small sketch that prints the resulting loader hierarchy follows this list.)
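To make the hierarchy concrete, the sketch below prints the loader chain for an application class; the exact names printed (for example AppClassLoader) are specific to the Sun/Oracle JDK and may differ in other implementations.

    // A minimal sketch: printing the class loader hierarchy for an application class.
    public class ClassLoaderDemo {
        public static void main(String[] args) {
            ClassLoader loader = ClassLoaderDemo.class.getClassLoader();
            while (loader != null) {
                System.out.println(loader);   // typically AppClassLoader, then ExtClassLoader
                loader = loader.getParent();
            }
            // The bootstrap loader is implemented natively, so the chain ends in null.
            System.out.println(String.class.getClassLoader()); // null: java.lang.String is loaded by the bootstrap loader
        }
    }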
2.2 How the class loader works

Class loading is divided into three steps: Load, link, and initialize.

2.2.1 Loading

A class is identified by its fully qualified name together with its ClassLoader; loading mainly consists of reading the specified .class file into the JVM. Once loaded, the class is marked inside the JVM by the combination of its fully qualified name and the ID of the ClassLoader instance that loaded it.

In memory, ClassLoader instances and instances of classes live on the heap, while their class information lives in the method area.

The loading process follows the parent delegation model: when a class loader is asked to load a class, it first asks its parent class loader (the relationship is one of delegation rather than inheritance, but calling it the parent makes it easier to understand) to load the class, and the parent in turn passes the request further up the chain until the bootstrap class loader is reached. Only when the parent class loader cannot load the specified class does the loader attempt to load the class itself.

The parent delegation model is the JVM's first line of defense for security, guaranteeing that classes are loaded safely. It also relies on the principle of class loader isolation: classes loaded by different loaders do not interact directly, and even if the same class is loaded by two different class loaders, the two copies cannot perceive each other's existence. Thus, even if a malicious class disguises itself as part of a core package (for example, java.lang), it cannot do any harm, because it will never be loaded by the bootstrap class loader.
It also follows that if you define your own ClassLoader, you must take care to keep the class loading process secure.
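As an illustration, here is a minimal sketch of a user-defined loader that preserves parent delegation by overriding only findClass, so that ClassLoader.loadClass still consults the parent chain first. The directory-based lookup is an illustrative assumption, not a real implementation.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class MyClassLoader extends ClassLoader {
        private final String baseDir; // hypothetical directory holding .class files

        public MyClassLoader(String baseDir, ClassLoader parent) {
            super(parent); // loadClass() will delegate to this parent before calling findClass()
            this.baseDir = baseDir;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            // Only reached when the parent chain (up to the bootstrap loader) fails to load the class.
            try {
                byte[] bytes = Files.readAllBytes(
                        Paths.get(baseDir, name.replace('.', '/') + ".class"));
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }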

2.2.2 Linking

The task of linking is to merge the binary type information into the JVM's runtime state.
Linking is divided into the following three steps:

    1. Verification: verifies the correctness of the .class file, ensuring that the file conforms to the specification and is safe for the current JVM to use.
    2. Preparation: allocates memory for the class, and the class's static variables are assigned their default values (for example, for static int count = 42; the field count is set to 0 here, and the assignment to 42 happens later, during initialization).
    3. Resolution (optional): resolves the symbolic references in the class's constant pool into direct references; resolution can also be deferred until the corresponding reference is actually used.
2.2.3 Initialization

Initialization assigns the class's static variables their specified values and executes the class's static initializer blocks (the class initializer, <clinit>).
The JVM specification strictly defines when a class must be initialized (a small sketch follows the list):
1. When an object is instantiated with the new keyword, or via reflection, cloning, or deserialization.
2. When a static method of the class is invoked.
3. When a static field of the class is used or assigned to.
4. When a method of the class is invoked via reflection.
5. When a subclass of the class is initialized (the parent class must be initialized before its subclass).
6. When the class is designated as the startup class at JVM startup (roughly, the class containing the main method).
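The following sketch illustrates trigger 3: reading an ordinary static field initializes the class, while reading a compile-time constant does not (the constant is folded into the caller at compile time). The class and field names here are invented for the example.

    class Config {
        static final String VERSION = "1.0";          // compile-time constant, folded into the caller
        static String name = "jvm-demo";              // ordinary static field
        static {
            System.out.println("Config initialized"); // runs exactly once, at first active use
        }
    }

    public class InitDemo {
        public static void main(String[] args) {
            System.out.println(Config.VERSION); // prints "1.0"; does NOT initialize Config
            System.out.println(Config.name);    // triggers initialization: "Config initialized", then "jvm-demo"
        }
    }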
For a more detailed treatment of the virtual machine's loading mechanism, see the article on the Java virtual machine class loading mechanism listed in the Resources.

3. Run-time data area

The runtime data area consists of the method area, the heap, Java stacks, PC registers, and native method stacks.

3.1 Java Stack

The main task of the Java stack is to store method parameters, local variables, and intermediate results, and to provide data that other parts of the JVM need in order to do their work.

The Java stack is always associated with a thread: whenever a thread is created, the JVM creates a corresponding Java stack for it. The stack in turn contains multiple stack frames, one per method invocation; a new frame is created each time a method is invoked. Each stack frame holds information such as the method's local variables, its operand stack, and the method return value. When a method finishes executing, its frame is popped and its result becomes a return value for the caller. The frame at the top of the Java stack is the frame of the currently executing method, and the PC register points at its current instruction. Only the local variables of this active frame can be used by its operand stack. When another method is called from within that frame, a new frame is created for it, pushed onto the top of the Java stack, and becomes the active frame; now only the new frame's local variables are usable. When all the instructions in a frame have executed, the frame is removed from the Java stack, the previous frame becomes the active frame again, and the returned value becomes an operand on the caller's operand stack.

Since the Java stack is associated with a single thread, its data is not shared between threads, so there is no need to worry about data consistency or synchronization locks for it.

Each stack frame is divided into three parts: the local variable area, the operand stack, and the frame data area.

3.1.1 Local variable area

The local variable area is an array of word-length slots, in which byte, short, and char values are stored as int, while long and double values occupy two slots. In particular, boolean values are converted to int or byte at compile time, and boolean arrays are treated as byte arrays. The local variable area also holds references to objects, including class references, interface references, and array references.

The local variable area contains the method parameters and local variables. In addition, every instance method has an implicit first local variable, this, which refers to the object on which the method was invoked. For an object, the local variable area only ever holds a reference to it; the object itself is on the heap.

3.1.2 Operand Stacks

The operand stack is also an array of word-length slots, but, as its name suggests, it is accessed only with basic push and pop operations. During a computation, operands are popped off the stack and the result is pushed back on.
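As an illustration of how the local variable slots and the operand stack are used, the sketch below shows a trivial method together with the bytecode that javap -c typically produces for it (the exact output depends on the compiler).

    public class AddDemo {
        int add(int a, int b) {
            return a + b;
        }
        // javap -c AddDemo shows roughly:
        //   iload_1   // push local variable slot 1 (a); slot 0 holds `this`
        //   iload_2   // push local variable slot 2 (b)
        //   iadd      // pop both operands, push their sum
        //   ireturn   // pop the result and return it to the caller's frame
    }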

3.1.3 Frame Data Area

The main tasks of the frame data area are:
    • Recording a pointer to the class's constant pool, to make resolution easy.
    • Assisting the normal return of a method: restoring the stack frame of the calling method, setting the PC register to point to the instruction following the call, and pushing the return value onto the caller's operand stack.
    • Recording the exception table: when an exception occurs, control is handed to the catch clause for the corresponding exception; if no matching catch clause is found, the caller's stack frame is restored and the exception is rethrown.

The sizes of the local variable area and the operand stack are determined at compile time for each specific method. When a method is called, the corresponding class's type information is found in the method area, the sizes of the method's local variable area and operand stack are read from it, the stack frame memory is allocated accordingly, and the frame is pushed onto the Java stack.

The Java Virtual Machine specification defines two exceptions for this area: if a thread requests a stack depth greater than the virtual machine allows, a StackOverflowError is thrown; and if the stack can be dynamically extended but not enough memory can be obtained for the extension, an OutOfMemoryError is thrown.
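The first failure mode is easy to reproduce with unbounded recursion, as in the small sketch below; the thread stack size can be tuned with the -Xss option, which changes the depth at which the error occurs.

    public class StackDepthDemo {
        static int depth = 0;

        static void recurse() {
            depth++;
            recurse(); // no exit condition: each call pushes another stack frame
        }

        public static void main(String[] args) {
            try {
                recurse();
            } catch (StackOverflowError e) {
                // thrown when the requested stack depth exceeds what the JVM allows
                System.out.println("StackOverflowError at depth " + depth);
            }
        }
    }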

3.2 Native Method Stack

The native method stack is similar to the Java stack and mainly stores the state of native method calls. The difference is that the Java stack serves the JVM's execution of Java methods, while the native method stack serves its execution of native methods. The native method stack can also throw StackOverflowError and OutOfMemoryError. In the Sun JDK, the native method stack and the Java stack are one and the same.

3.3 PC Register / Program Counter (Program Counter Register)

Strictly speaking, the PC register is a data structure that holds the address of the instruction currently being executed. Because Java supports multithreading, execution does not always proceed linearly: when multiple threads execute interleaved, the address at which an interrupted thread stopped must be saved so that, when the thread is scheduled again, it can resume at exactly the instruction where it was interrupted. To return to the correct position after a thread switch, each thread therefore needs its own program counter; the counters of different threads do not affect one another and are stored in isolation. We call such memory "thread-private"; in this respect it is somewhat similar to ThreadLocal, and it is inherently thread-safe.
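The following small sketch illustrates the "thread-private" idea with ThreadLocal: each thread sees and updates only its own copy of the value, just as each thread has its own PC register.

    public class ThreadPrivateDemo {
        private static final ThreadLocal<Integer> counter = ThreadLocal.withInitial(() -> 0);

        public static void main(String[] args) throws InterruptedException {
            Runnable task = () -> {
                counter.set(counter.get() + 1); // updates only this thread's copy
                System.out.println(Thread.currentThread().getName() + " -> " + counter.get());
            };
            Thread t1 = new Thread(task, "t1");
            Thread t2 = new Thread(task, "t2");
            t1.start(); t2.start();
            t1.join(); t2.join();
            // Both threads print 1: the values are isolated per thread.
        }
    }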

3.4 Method Area

The method area stores the type information of classes and their static variables. For each class, the method area stores the following data:

    • The fully qualified name of the class and of its parent class (java.lang.Object has no parent)
    • The type of the class (class or interface)
    • Access modifiers (public, abstract, final)
    • The list of fully qualified names of the implemented interfaces
    • The constant pool
    • Field information
    • Method information
    • Static variables
    • A reference to the ClassLoader
    • A reference to the Class object

Evidently, all of a class's information is stored in the method area. Because the method area is shared by all threads, thread safety must be ensured: for example, if two threads try to load a class that has not yet been loaded, only one of them asks its class loader to load the required class, while the other waits instead of loading it again. Much of this per-class data can be read back via reflection, as the sketch below shows.
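A minimal sketch that uses reflection to read back some of the per-class data kept in the method area (the class chosen, java.util.ArrayList, is just an example):

    import java.lang.reflect.Modifier;
    import java.util.Arrays;

    public class MethodAreaDemo {
        public static void main(String[] args) {
            Class<?> c = java.util.ArrayList.class;
            System.out.println("Name:       " + c.getName());
            System.out.println("Superclass: " + c.getSuperclass().getName());
            System.out.println("Interfaces: " + Arrays.toString(c.getInterfaces()));
            System.out.println("Modifiers:  " + Modifier.toString(c.getModifiers()));
            System.out.println("Fields:     " + c.getDeclaredFields().length);
            System.out.println("Methods:    " + c.getDeclaredMethods().length);
            System.out.println("Loaded by:  " + c.getClassLoader()); // null: loaded by the bootstrap loader
        }
    }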

The constant pool is itself a data structure in the method area. It stores constants such as strings, final variable values, class names, and method names. The constant pool is determined during compilation and saved in the compiled .class file. Its entries fall roughly into two categories: literals and references. Literals are strings, final variables, and so on; class names and method names are references. The most common use of a reference is method invocation: a reference to the method is located by its name, and the method body's code is then executed. References comprise the names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods.
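String literals offer an easy way to observe the constant pool at work: identical literals in the source resolve to the same interned instance, whereas new String(...) always allocates a distinct heap object, as the sketch below shows.

    public class ConstantPoolDemo {
        public static void main(String[] args) {
            String a = "hello";                  // literal, recorded in the class's constant pool
            String b = "hello";                  // the same literal resolves to the same interned instance
            String c = new String("hello");      // explicit allocation: a distinct object on the heap
            System.out.println(a == b);          // true
            System.out.println(a == c);          // false
            System.out.println(a == c.intern()); // true: intern() returns the pooled instance
        }
    }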

Also, to speed up method invocation, a method table is usually created for each non-abstract class: an array holding direct references to the instance methods that might be invoked on an instance.

In the Sun JDK, the method area corresponds to the permanent generation (Permanent Generation), whose default minimum size is 16 MB and maximum size is 64 MB. The size can be adjusted with JVM parameters: -XX:PermSize specifies the initial size and -XX:MaxPermSize the maximum.

3.5 Heap

The heap is the largest piece of memory managed by the JVM. It is shared by all Java threads (and therefore not inherently thread-safe) and is created when the JVM starts.

The heap is used to store object instances and array values; as the Java Virtual Machine specification puts it, all object instances and arrays are allocated on the heap. Each object in the heap carries a pointer to its class data, which points to the corresponding type information in the method area; a pointer to the method table may also be stored with the object. Because the heap is shared by all threads, synchronization issues have to be resolved when objects are instantiated. In addition, the instance data in the heap includes the object's lock, and, depending on the garbage collection strategy, data such as reference counts or sweep flags may be stored as well.
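The sketch below makes the heap's role concrete: it keeps allocating arrays until the heap is exhausted and an OutOfMemoryError is raised. Running it with a small heap (for example -Xmx16m) makes the error appear quickly.

    import java.util.ArrayList;
    import java.util.List;

    public class HeapDemo {
        public static void main(String[] args) {
            List<int[]> blocks = new ArrayList<>();
            try {
                while (true) {
                    blocks.add(new int[1_000_000]); // each array (~4 MB) is allocated on the heap
                }
            } catch (OutOfMemoryError e) {
                int n = blocks.size();
                blocks.clear(); // release the references so the report below can allocate
                System.out.println("Heap exhausted after " + n + " blocks");
            }
        }
    }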

For heap management, the Sun JDK introduced generational management in version 1.2, dividing the heap mainly into the young generation and the old generation. The generational approach greatly improves the efficiency of garbage collection.

1. Young generation (New Generation)
In most cases new objects are allocated in the young generation, which is made up of Eden space and two equally sized survivor spaces; the survivor spaces are used mainly for copying objects during a minor GC (the minor GC process is not discussed in detail here).
To make memory allocation in Eden space more efficient, the JVM carves out small, separate TLABs (Thread Local Allocation Buffers), one per thread. Allocating memory on the shared heap requires locking the heap, which is not needed inside a TLAB, so the JVM allocates objects in TLABs whenever possible to improve efficiency.
2. Old generation (Old/Tenured Generation)
Objects in the young generation that survive long enough are promoted to the old generation, where garbage collection runs far less frequently than in the young generation.

4. Execution Engine

The execution engine is the core of the JVM's execution of Java bytecode. Its main modes are interpreted execution, compiled execution, adaptive optimization, and execution on hardware chips.

The JVM's instruction set is stack-based rather than register-based. The advantages are that the instructions can be kept compact and transmitted quickly over a network (remember that Java was originally designed with the network in mind), that the design is easy to implement on platforms with few general-purpose registers, and that it facilitates code optimization. Because the Java stack and the PC register are thread-private, threads cannot interfere with each other's stacks, and each thread effectively has its own instance of the execution engine.

A JVM instruction consists of a single-byte opcode and zero or more operands. For instructions that require operands, the operands are usually pushed onto the operand stack first; even assigning a value to a local variable normally goes through the operand stack. Note that this is the "normal" case; exceptions caused by optimization are discussed later.

4.1 Interpreted Execution

Like some dynamic languages, the JVM can interpret bytecode directly. The Sun JDK uses a token-threading approach; interested readers can look into it further. Several optimizations apply to interpreted execution:

    • Top-of-stack caching: the value on top of the operand stack is cached directly in a register; for the many instructions that need only one operand, there is no need to push and pop through memory, since the computation can be done directly in the register and only the result is pushed onto the operand stack. This reduces the traffic between registers and memory.
    • Partial stack-frame sharing: the called method can use the operand stack area of the calling method's frame as its own local variable area, which saves copying the arguments when the method parameters are read.
    • Executing machine instructions: in some special cases, the JVM executes machine instructions directly to speed things up.
4.2 Compiled Execution

To speed up execution, the Sun JDK supports compiling bytecode into machine instructions, mainly via a JIT (just-in-time) compiler at run time: the first time a piece of bytecode is executed, it is compiled to machine code and cached so that it can be reused afterwards. Oracle's JRockit uses full compilation, with no interpreter.

4.3 Adaptive Optimization

The idea behind adaptive optimization is that 10%~20% of a program's code accounts for 80%~90% of its execution time, so compiling just that small amount of code into optimized machine code yields most of the possible speedup. The typical representative of adaptive optimization is Sun's HotSpot VM: as its name suggests, the JVM monitors the running code and, once it determines that a particular method is a bottleneck or hot spot, starts a background thread that compiles that method's bytecode into highly optimized, statically linked native code. When the method is no longer hot, the compiled code is discarded and the method is interpreted again.

Adaptive optimization not only buys most of the efficiency gain with a small amount of compilation time; because execution is monitored throughout, it also helps greatly with other optimizations such as inlining. Because of object-oriented polymorphism, a method may have many different implementations; by monitoring which code is actually used, adaptive optimization can restrict inlining to that code and keep the inlined code small.

The Sun JDK compiles in two modes: client and server. The former is more lightweight and consumes less memory; the latter optimizes more aggressively and uses more memory.

In server mode, escape analysis is performed on objects: the JVM checks whether an object created in a method is used outside that method, and if it is used by another method, the object is said to escape. For objects that do not escape, the JVM can allocate them directly on the stack (so an object is not necessarily allocated on the heap); the thread then accesses the object faster, and when the method returns, the object is reclaimed simply by discarding the stack frame, which eases garbage collection. Server mode also removes unnecessary synchronization through analysis; interested readers can study the biased locking mechanism introduced in Sun JDK 6.
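The sketch below shows the kind of allocation escape analysis targets: the Point object never leaves the method, so the JVM may eliminate the heap allocation (for example via scalar replacement). Whether it actually does so is an implementation detail; in HotSpot the optimization can be toggled with -XX:+DoEscapeAnalysis / -XX:-DoEscapeAnalysis.

    public class EscapeDemo {
        static class Point {
            int x, y;
            Point(int x, int y) { this.x = x; this.y = y; }
        }

        static int distanceSquared(int x, int y) {
            Point p = new Point(x, y); // p never escapes this method: a candidate for elimination
            return p.x * p.x + p.y * p.y;
        }

        public static void main(String[] args) {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += distanceSquared(i % 1000, i % 1000); // many calls, so the JIT treats the method as hot
            }
            System.out.println(sum);
        }
    }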

In addition, the execution engine must guarantee thread safety, so the JMM (Java Memory Model) is also enforced by the execution engine.

Resources:
1. A brief analysis of the Java virtual machine and the Java memory model
2. The Java virtual machine class loading mechanism
3. "In-depth Understanding of the Java Virtual Machine", Zhou Zhiming
