In-depth research on Java Virtual Machine
Author: Liu xuechao
1 Java technology and Java Virtual Machine
When Java is mentioned, most people think first of the Java programming language. However, Java is actually a technology made up of four parts: the Java programming language, the Java class file format, the Java Virtual Machine, and the Java application programming interface (Java API). Figure 1 shows the relationship between them:
Figure 1 Relationship between four aspects of Java
The runtime environment represents the Java platform. Developers write Java code (.java files) and compile it into bytecode (.class files). The bytecode is then loaded into memory. Once inside the virtual machine, it is either interpreted and executed by the interpreter or converted to machine code by the just-in-time code generator. The Java platform is built from the Java Virtual Machine and the Java API, and the Java language is the way into this platform: programs written and compiled in Java can run on it. The following figure shows the structure of the platform:
In the structure of the Java platform, the Java Virtual Machine (JVM) sits at the core and is the key to making programs independent of the underlying operating system and hardware. Below it is a porting interface, which consists of an adapter and the Java operating system; the platform-dependent part is called the adapter. Through the porting interface, the JVM is implemented on a specific platform and operating system. Above the JVM are the basic Java class library, the extended class library, and their APIs. Applications and applets written against the Java API can run on any Java platform without regard to the underlying platform. Java's platform independence is achieved by separating programs from the operating system in this way.
So what is a Java Virtual Machine (JVM)? When we talk about the JVM, we may mean:
- The abstract JVM specification;
- A concrete implementation of the JVM;
- A JVM instance created while a program is running.
The abstract specification is a collection of concepts described in detail in the book The Java Virtual Machine Specification. A concrete implementation of the JVM is either software or a combination of software and hardware; many vendors have produced implementations, and they exist on multiple platforms. The work of running a Java program is carried out by a single runtime instance of the JVM. The JVM discussed in this article is mainly the third of these. It can be regarded as an imaginary machine, realized through software simulation on an actual computer. It has its own imaginary hardware, such as a processor, a stack, and registers, and its own instruction set.
The JVM has one clear task in its life cycle: to run a Java program. When a Java program starts, a JVM instance is created; when the program finishes, the instance disappears. Next we study the JVM architecture and its running process in more depth.
2 Architecture of the Java Virtual Machine
As mentioned earlier, the JVM can be implemented by different vendors, and their implementations differ. The JVM can still deliver cross-platform behavior, however, thanks to the architecture laid down in its design.
The behavior of a JVM instance involves not only the instance itself but also its subsystems, storage areas, data types, and instructions. Together these describe an abstract internal architecture of the JVM. The purpose of this abstract architecture is less to dictate the internal structure of an implementation than to provide a way of strictly defining the external behavior that an implementation must exhibit. Each JVM has two mechanisms: the class loading subsystem, which loads a class or interface given its name, and the execution engine, which executes the instructions contained in loaded classes or interfaces. Each JVM also includes five areas: the method area, the heap, the Java stacks, the program counters, and the native method stacks. Figure 3 shows the architecture formed by these areas together with the class loader and the execution engine:
Figure 3 JVM Architecture
Each JVM instance has its own method area and its own heap. All threads running in the JVM share these two areas. When the VM loads a class file, it parses the class information contained in the binary data and places it in the method area. While the program runs, the JVM places all objects created by the program on the heap. Each thread receives its own program counter and Java stack when it is created. The value of the program counter points to the next instruction to be executed, and the thread's Java stack stores the state of the Java methods the thread has invoked. The state of native method invocations is stored in the native method stack, whose form depends on the specific implementation.
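The split between shared and per-thread areas can be observed from ordinary Java code. The following is a minimal sketch, not part of the original article; the class name SharedHeapDemo is hypothetical. One AtomicInteger object lives on the heap and is visible to every thread, while each thread's variable named local lives in that thread's own stack frame.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: one shared heap object, many per-thread stack frames.
class SharedHeapDemo {
    static int run(int threads, int incrementsPerThread) {
        AtomicInteger shared = new AtomicInteger();   // a single heap object seen by all threads
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                int local = 0;                        // private to this thread's stack frame
                for (int i = 0; i < incrementsPerThread; i++) {
                    local++;
                    shared.incrementAndGet();         // atomic update of the shared object
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();                             // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.get();
    }
}
```

Because every worker updates the same heap object, the returned total equals the number of threads multiplied by the increments per thread.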
The following sections describe these parts respectively.
The execution engine is at the core of the JVM. In the Java Virtual Machine specification, its behavior is defined by the instruction set. Although the specification of each instruction explains in detail what the JVM must do when it encounters that instruction in the bytecode, it says little about how to do it. The Java Virtual Machine supports approximately 248 bytecodes. Each bytecode performs a basic CPU operation, for example adding an integer to a register or transferring control to a subroutine. The Java instruction set is, in effect, the assembly language of Java programs.
An instruction in the Java instruction set consists of a single-byte opcode, which specifies the operation to be executed, followed by zero or more operands, which supply the parameters or data the operation needs. Many instructions have no operands and consist of the single-byte opcode alone.
The inner loop of the virtual machine executes as follows:
    do {
        fetch an opcode byte;
        execute an action based on the opcode value;
    } while (the program has not ended);
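The loop can be sketched in Java as follows. This is an illustration only: the class TinyInterpreter and its three-instruction subset are hypothetical, and a real execution engine handles far more. The opcode values used for bipush (0x10), iadd (0x60), and ireturn (0xac) are those given in the Java Virtual Machine specification.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter: illustrates the fetch-and-dispatch loop only.
class TinyInterpreter {
    static final int BIPUSH = 0x10;   // push the following signed byte as an int
    static final int IADD = 0x60;     // pop two ints, push their sum
    static final int IRETURN = 0xac;  // return the int on top of the stack

    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;                                  // index of the next byte to fetch
        while (true) {
            int opcode = code[pc++] & 0xff;          // fetch one opcode byte
            switch (opcode) {                        // act on the opcode value
                case BIPUSH:
                    stack.push((int) code[pc++]);    // one single-byte operand follows
                    break;
                case IADD: {
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a + b);
                    break;
                }
                case IRETURN:
                    return stack.pop();
                default:
                    throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
    }
}
```

Running it on the byte sequence for bipush 2, bipush 3, iadd, ireturn yields 5.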
Because the instruction set is so simple, the execution process of the virtual machine is also very simple, which helps execution efficiency. The number and size of the operands of an instruction are determined by its opcode. If an operand is larger than one byte, it is stored high-order byte first (big-endian). For example, a 16-bit parameter occupies two bytes and its value is:
(first byte x 256) + second byte
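As a small worked example of this rule, the following sketch decodes an unsigned 16-bit operand from two consecutive bytes. The class and method names (OperandDecoder, readU2) are hypothetical and chosen only for illustration.

```java
// Hypothetical helper: decodes a big-endian unsigned 16-bit operand.
class OperandDecoder {
    static int readU2(byte[] code, int offset) {
        int high = code[offset] & 0xff;       // first byte, treated as unsigned
        int low = code[offset + 1] & 0xff;    // second byte, treated as unsigned
        return high * 256 + low;              // equivalent to (high << 8) | low
    }
}
```

For the bytes 0x01 0x02 this gives 1 x 256 + 2 = 258.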
In general, the instruction stream is only byte-aligned. The instructions tableswitch and lookupswitch are exceptions: their operands must be aligned on a 4-byte boundary, so padding bytes are inserted after the opcode where necessary.
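The amount of padding follows directly from the position of the opcode. A minimal sketch, assuming the opcode sits at byte offset opcodePc measured from the start of the method's code; the class and method names are hypothetical.

```java
// Hypothetical helper: number of padding bytes between a tableswitch or
// lookupswitch opcode and its first 4-byte operand.
class SwitchPadding {
    static int paddingAfterOpcode(int opcodePc) {
        int next = opcodePc + 1;          // offset of the byte right after the opcode
        return (4 - next % 4) % 4;        // 0 to 3 bytes, so operands start on a multiple of 4
    }
}
```

An opcode at offset 0 needs three padding bytes, while an opcode at offset 3 needs none.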
The JVM specification does not require an implementation to support a native method interface; it may even be entirely absent. Sun implemented the Java Native Interface (JNI) for the sake of portability. Other native interfaces could be designed to replace Sun's JNI, but designing and implementing one is complicated; for example, it must ensure that the garbage collector does not release objects that are still in use by native methods.
The Java heap is a runtime data area from which class instances (objects) are allocated. It is managed by garbage collection: programmers do not explicitly release objects. Java does not prescribe a particular garbage collection algorithm; an implementation can use a variety of algorithms according to system requirements.
The Java method area is similar to the compiled code area in a traditional language or the text segment of a Unix process. It holds the method code (compiled Java code) and the symbol table. In the Java implementation current at the time of writing, method code is not part of the garbage-collected heap, although this was planned for future versions. Each class file contains the compiled code of one Java class or one Java interface, so the class file can be regarded as the executable file format of the Java language. To ensure the platform independence of class files, the Java Virtual Machine specification also describes the class file format in detail. For details, see Sun's Java Virtual Machine specification.
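The absence of explicit deallocation can be seen in the following sketch, which is an illustration rather than part of the original article. The class name HeapDemo is hypothetical; Runtime.totalMemory() and Runtime.freeMemory() are standard library calls. When and whether the arrays are reclaimed is up to the collector in use, so the sketch makes no claim about that.

```java
// Hypothetical demo class: allocation without any explicit release.
class HeapDemo {
    // Approximate number of heap bytes currently in use by this JVM instance.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Allocates `count` 1 KB arrays and returns the total number of bytes requested.
    static int allocateAndDrop(int count) {
        int total = 0;
        for (int i = 0; i < count; i++) {
            byte[] block = new byte[1024];   // new object, conceptually placed on the heap
            total += block.length;
            // There is no free or delete call: once `block` is unreachable,
            // the garbage collector is allowed to reclaim it.
        }
        return total;
    }
}
```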
The registers of the Java Virtual Machine hold the running state of the machine, much like certain special-purpose registers in a microprocessor. There are four of them:
- pc: the Java program counter;
- optop: a pointer to the top of the operand stack;
- frame: a pointer to the execution environment of the currently executing method;
- vars: a pointer to the first local variable of the currently executing method.
The architecture diagram above shows only the first of these, the program counter. Each thread has its own program counter from the moment it is created. While a thread is executing a Java method, the program counter contains the address of the instruction the thread is executing. If the thread is executing a native method, the value of the program counter is undefined.
The stack of the Java Virtual Machine has three areas: the local variable area, the runtime environment area, and the operand area.
Local Variable Area
Each Java method uses a fixed-size set of local variables, which are addressed as word offsets from the vars register. Local variables are all 32 bits wide. Long integers and double-precision floating-point numbers occupy two local variables but are addressed by the index of the first one. (For example, if the local variable with index n is a double-precision floating-point number, it actually occupies the storage at indexes n and n + 1.) The virtual machine specification does not require 64-bit values in local variables to be 64-bit aligned. The virtual machine provides instructions to load the values of local variables onto the operand stack and to store values from the operand stack into local variables.
Runtime Environment Area
The information in the runtime environment is used for dynamic linking, normal method return, and exception propagation.
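The indexing rule can be made concrete with a short sketch that assigns local variable indexes to a method's parameters. The class LocalSlots and its use of plain type names are hypothetical simplifications. The sketch also relies on one fact not stated above: in an instance method, index 0 holds the reference to this.

```java
// Hypothetical helper: computes the local variable index of each parameter.
class LocalSlots {
    static int[] slotIndexes(boolean isStatic, String... paramTypes) {
        int[] indexes = new int[paramTypes.length];
        int next = isStatic ? 0 : 1;             // index 0 holds `this` in instance methods
        for (int i = 0; i < paramTypes.length; i++) {
            indexes[i] = next;                   // addressed by the index of its first slot
            boolean wide = paramTypes[i].equals("long") || paramTypes[i].equals("double");
            next += wide ? 2 : 1;                // long and double occupy two slots
        }
        return indexes;
    }
}
```

For a static method with parameters (int, double, int), the indexes are 0, 1, and 3, because the double occupies indexes 1 and 2.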
Dynamic Linking
The runtime environment contains a pointer to the interpreter's symbol table for the current class and current method, which supports dynamic linking of the method code. The class file code of a method refers symbolically to the methods it calls and the variables it accesses. Dynamic linking translates these symbolic method calls into actual method calls, loading classes as necessary to resolve symbols that are not yet defined, and translates variable accesses into offsets within the runtime storage structure of those variables. Because methods and variables are linked dynamically, changes to other classes that a method uses do not break the code of that method.
Normal method return
If the current method completes normally, it executes a return instruction of the appropriate type and a value is returned to the caller. The execution environment is used to restore the caller's registers on a normal return, and the caller's program counter is advanced by an appropriate amount to skip over the method invocation instruction. Execution then continues in the caller's execution environment.
Exception capture
An exceptional condition is called an Error or an Exception in Java; both are subclasses of the Throwable class. In a program, the causes are: (1) a dynamic linking error, such as a required class file that cannot be found; (2) a runtime error, such as a reference through a null pointer; (3) the program executing a throw statement.
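The three causes can each be provoked from Java code, as in the sketch below. It is an illustration under stated assumptions: the class ExceptionCauses and the class name "no.such.ClassName" are hypothetical, and Class.forName stands in for a failed class lookup (a failure during actual linking surfaces as a LinkageError such as NoClassDefFoundError instead).

```java
// Hypothetical demo class: one method per cause listed above.
class ExceptionCauses {
    // (1) A class that cannot be found when it is looked up by name.
    static String missingClass() {
        try {
            Class.forName("no.such.ClassName");   // hypothetical, nonexistent class
            return "loaded";
        } catch (ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }

    // (2) A runtime error: dereferencing a null reference.
    static String nullReference() {
        try {
            String s = null;
            return String.valueOf(s.length());
        } catch (NullPointerException e) {
            return e.getClass().getSimpleName();
        }
    }

    // (3) An exception raised explicitly with a throw statement.
    static String explicitThrow() {
        try {
            throw new IllegalStateException("thrown by the program");
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }
}
```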
When an exception occurs, the Java Virtual Machine takes the following measures:
- It checks the catch clause table associated with the current method. Each catch clause specifies the range of instructions for which it is valid, the exception type it can handle, and the address of the code block that handles the exception.
- A catch clause matches the exception if the instruction that caused the exception lies within the clause's instruction range and the exception type is the same as, or a subtype of, the type the clause handles. If a matching catch clause is found, control transfers to the specified exception handler. Otherwise the search continues until all nested catch clauses of the current method have been checked.
- Because the VM continues execution at the first matching catch clause, the order of entries in the catch clause table is important. Because Java code is structured, all the exception handlers of a method can always be arranged in a single table such that, for any possible program counter value, a linear search finds the appropriate handler for an exception raised at that value.
- If no matching catch clause is found, the current method returns to its caller with an "uncaught exception" result, as if the exception had just occurred in the caller. If the caller has no matching handler either, the exception continues to propagate. If it reaches the top level, the system invokes a default exception handler.
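The lookup within a single method can be sketched as a linear search over the catch clause table. The names ExceptionTableDemo, Entry, and findHandler are hypothetical. The sketch assumes, as in the class file format, that the start of a range is inclusive and the end is exclusive, and it returns -1 to signal that the exception must propagate to the caller.

```java
import java.util.List;

// Hypothetical model of a method's catch clause table and its linear search.
class ExceptionTableDemo {
    static final class Entry {
        final int startPc;      // first covered instruction (inclusive)
        final int endPc;        // end of covered range (exclusive)
        final int handlerPc;    // address of the handler code
        final Class<? extends Throwable> catchType;

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns the handler address of the first matching entry, or -1 if none matches.
    static int findHandler(List<Entry> table, int pc, Throwable thrown) {
        for (Entry e : table) {
            boolean inRange = pc >= e.startPc && pc < e.endPc;
            if (inRange && e.catchType.isInstance(thrown)) {
                return e.handlerPc;          // the first match wins, so table order matters
            }
        }
        return -1;                           // no handler here: propagate to the caller
    }
}
```

With a table that lists ArithmeticException before RuntimeException for the same range, an ArithmeticException reaches the first handler and a NullPointerException reaches the second, which is why the order of the table matters.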
Operand Stack
Machine instructions take their operands only from the operand stack, operate on them, and push the results back onto the stack. The stack structure was chosen because the behavior of the virtual machine can then be simulated efficiently on machines with few registers or with non-general-purpose registers (such as Intel processors). The operand stack is 32 bits wide. It is used to pass parameters to methods and receive results from them, and to supply operands to operations and hold their results. For example, the iadd instruction adds two integers. The two integers must be the two words at the top of the operand stack, pushed there by earlier instructions. They are popped from the stack and added, and the result is pushed back onto the operand stack.
Each primitive data type has dedicated instructions that perform the required operations on it. Each operand needs one location on the stack, except for long and double values, which need two. An operand may only be operated on by an instruction appropriate to its type. For example, it is invalid to push two int values and then treat them as a long. In Sun's virtual machine implementation, this restriction is enforced by the bytecode verifier. A few operations (such as dup and swap), however, operate on the runtime data area regardless of type.
Native Method Stack
When a thread calls a native method, it is no longer subject to the structure and security constraints of the virtual machine. It can access the runtime data areas of the virtual machine, and it can also use the native processor's registers and any kind of stack. For example, if the native stack is a C stack, then when a C program calls a C function, the function's parameters are pushed onto the stack in some order and the result is returned to the calling function. When a Java Virtual Machine implementation uses the C stack model for its native method interface, its native method stack is scheduled and used in exactly the same way as a C stack.
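To connect this to source code, consider a method that adds two int values. The comments show the bytecode a Java compiler typically emits for the method body; the exact output may vary between compilers and can be inspected with javap -c. The class name AddExample is hypothetical.

```java
// Hypothetical example class: a method whose body is a single iadd.
class AddExample {
    // Typical bytecode for this method body:
    //   iload_0   push local variable 0 (a) onto the operand stack
    //   iload_1   push local variable 1 (b) onto the operand stack
    //   iadd      pop both ints and push their sum
    //   ireturn   pop the sum and return it to the caller
    static int add(int a, int b) {
        return a + b;
    }
}
```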
3 Running process of the Java Virtual Machine
The preceding section described each part of the VM in detail. This section uses a concrete example to analyze how the VM runs a program.
The VM starts by invoking the main method of a specified class, passing it an array of strings as its argument. This causes the specified class to be loaded, the other types it uses to be linked, and all of them to be initialized. For example, take the program:
class HelloApp {
    public static void main(String[] args) {
        System.out.println("Hello world!");
        for (int i = 0; i < args.length; i++) {
            System.out.println(args[i]);
        }
    }
}
After compiling it, run it from the command line by typing: java HelloApp run virtual machine
The Java Virtual Machine starts by invoking the main method of HelloApp, passing it an array of three strings: "run", "virtual", and "machine". The following describes the steps that the VM may take when executing HelloApp.
The VM first attempts to execute the main method of the HelloApp class and finds that the class has not been loaded; that is, the VM does not currently hold a binary representation of the class. It therefore uses a class loader to find one. If this fails, an exception is thrown. Before main can be invoked, the class HelloApp must be linked with the other types it uses and then initialized. Linking has three phases: verification, preparation, and resolution. Verification checks the symbols and semantics of the loaded main class. Preparation creates the static fields of the class or interface and initializes them to their standard default values. Resolution checks the symbolic references from the main class to other classes or interfaces; this step is optional at link time. Class initialization executes the static initializer blocks declared in the class and the initializers of its static fields. Before a class is initialized, its superclass must be initialized. Figure 4 shows the whole process:
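The initialization order described above can be observed directly. In this minimal sketch, all class names (InitOrderDemo, Parent, Child) are hypothetical; the first use of Child forces its superclass to be initialized first, and the static field initializer of Child runs before its static block because it appears first in the source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: records the order in which static initializers run.
class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent static initializer"); }
    }

    static class Child extends Parent {
        static int counter = 7;   // static field initializer, runs in textual order
        static { LOG.add("Child static initializer, counter=" + counter); }
    }

    static List<String> trigger() {
        new Child();              // first use of Child triggers its initialization
        return LOG;
    }
}
```

The log contains the Parent entry followed by the Child entry, and calling trigger() again adds nothing, because a class is initialized only once.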
Figure 4: virtual machine running process
4 Conclusion
Through a study of the JVM architecture and a detailed analysis of how the virtual machine runs during the execution of a Java program, this article has aimed to explain clearly how the Java Virtual Machine works.