The Java Virtual Machine (JVM) is an abstract computer that runs Java code. Once an interpreter has been ported to a particular computer in accordance with the JVM specification, any compiled Java code can run on that system.
A Java virtual machine is an imaginary machine that is simulated in software on a real computer. It has its own notional hardware, such as a processor, a stack, and registers, and its own instruction set.
I. Java Virtual Machine (JVM) Overview
1. Why should I use a Java Virtual Machine?
A very important feature of Java is its platform independence, and the Java Virtual Machine is the key to achieving it. A typical high-level language must be compiled into different target code for each platform it is to run on. With the Java Virtual Machine, Java programs do not need to be recompiled for each platform. The virtual machine hides platform-specific details, so the Java compiler only needs to generate target code (bytecode) for the Java Virtual Machine, and that code runs on multiple platforms without modification. When executing bytecode, the Java Virtual Machine interprets it into machine instructions for the specific platform and executes them.
2. Who needs to know about the Java Virtual Machine?
The Java Virtual Machine (JVM) is the foundation on which the Java language is implemented. Anyone interested in Java should have a general understanding of it, which helps in understanding some properties of the language and in using it well. Software developers who want to implement a Java virtual machine on a specific platform, authors of Java compilers, and those who want to implement a Java virtual machine in hardware must understand the JVM specification in depth. The same applies if you want to extend the Java language or compile other languages into Java bytecode.
3. Data Types supported by Java Virtual Machine
The Java Virtual Machine supports the following basic data types:
byte: // 1-byte signed two's-complement integer
short: // 2-byte signed two's-complement integer
int: // 4-byte signed two's-complement integer
long: // 8-byte signed two's-complement integer
float: // 4-byte IEEE 754 single-precision floating-point number
double: // 8-byte IEEE 754 double-precision floating-point number
char: // 2-byte unsigned Unicode character
Almost all Java type checking is done at compile time. The primitive data listed above does not need to be tagged by hardware during execution, because the bytecode instructions that operate on it already specify the type of their operands. For example, the iadd, ladd, fadd, and dadd instructions each add two numbers whose types are int, long, float, and double respectively. The virtual machine has no dedicated instructions for the boolean type: boolean data is handled by the integer instructions, including the integer return instruction, and boolean arrays are handled as byte arrays. The virtual machine uses floating-point numbers in IEEE 754 format, so older computers that do not support this format may be very slow when running numeric Java programs.
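The sizes and encodings listed above can be checked from Java itself. The following is a minimal sketch: the class name PrimitiveLayout and its methods are made up for illustration, while the BYTES constants are standard fields of the wrapper classes (Java 8 and later).

```java
// Illustrative only: PrimitiveLayout is a hypothetical class name.
class PrimitiveLayout {
    // Size in bytes of byte, short, int, long, float, double, char (in that order).
    static int[] sizes() {
        return new int[] {
            Byte.BYTES, Short.BYTES, Integer.BYTES, Long.BYTES,
            Float.BYTES, Double.BYTES, Character.BYTES
        };
    }

    // In two's complement, the one-byte bit pattern 0xFF means -1.
    static byte allOnesByte() {
        return (byte) 0xFF;
    }

    // char is unsigned, so its largest value is 65535 rather than 32767.
    static int maxChar() {
        return Character.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(sizes())); // [1, 2, 4, 8, 4, 8, 2]
        System.out.println(allOnesByte());                      // -1
        System.out.println(maxChar());                          // 65535
    }
}
```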
Other data types supported by the virtual machine include:
object // 4-byte reference to a Java object
returnAddress // 4 bytes, used by the jsr/ret/jsr_w/ret_w instructions
Note: Java arrays are processed as objects.
The virtual machine specification places no special requirements on the internal structure of objects. In Sun's implementation, an object reference is a handle containing a pair of pointers: one points to the object's method table and the other to the object's data. Programs expressed in Java bytecode must obey the type rules, and a Java virtual machine implementation should refuse to execute bytecode that violates them. Because of restrictions in the bytecode definition, the Java Virtual Machine would appear to be limited to machines with a 32-bit address space; however, it is possible to build a Java virtual machine that automatically converts bytecode into a 64-bit form. As the supported data types show, Java strictly defines the internal format of each data type, so that every Java virtual machine implementation interprets data in the same way. This ensures Java's platform independence and portability.
II. Java Virtual Machine (JVM) Specification
The JVM is designed to provide a computer model based on an abstract specification, so that Java code can run on any system for which an interpreter conforming to that specification has been developed. The JVM precisely defines certain aspects of its implementation, in particular the format of Java executable code, that is, bytecode. This definition covers the syntax and values of opcodes and operands, the numeric representation of identifiers, and the storage image inside the JVM of the Java objects and constant pool in a Java class file. These definitions give developers of JVM interpreters the information and development environment they need. Java's designers wanted to leave developers free to use Java as they see fit. The JVM defines five specifications that govern how Java code is interpreted and executed:
* JVM instruction set
* JVM registers
* JVM stack structure
* JVM garbage-collected heap
* JVM storage area
2.1 JVM Instruction Set
The JVM instruction set is very similar to the instruction sets of other computers. A Java instruction consists of an opcode and operands. The opcode is an 8-bit binary number; the operands follow the opcode, and their length varies as required. The opcode specifies the nature of the operation (assembly-style mnemonics are used here). For example, iload loads an integer from memory, anewarray allocates space for a new array, and iand computes the bitwise AND of two integers. ret is a flow-control instruction that returns from a subroutine. An operand longer than 8 bits is split across two or more bytes. The JVM uses big-endian encoding for this case, that is, the high-order bits are stored in the lower-addressed byte. This matches the encoding used by Motorola CPUs and differs from Intel's little-endian encoding, in which the low-order bits are stored in the lower-addressed byte.
The Java instruction set was designed specifically for implementing the Java language. It includes instructions for invoking methods and for monitoring multithreaded systems. The 8-bit opcode allows the JVM a maximum of 256 instructions. Currently, more than 160 opcodes are in use.
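To make the difference between the two byte orders concrete, here is a small sketch that reads the same two operand bytes both ways. EndianDemo and its method names are invented for this illustration; ByteBuffer and ByteOrder are standard classes in java.nio.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: EndianDemo is a hypothetical class name.
class EndianDemo {
    // Reads a 16-bit operand the way the JVM stores it: high byte first.
    static int readBigEndian(byte[] twoBytes) {
        return ByteBuffer.wrap(twoBytes).order(ByteOrder.BIG_ENDIAN).getShort() & 0xFFFF;
    }

    // Reads the same bytes the way a little-endian CPU would: low byte first.
    static int readLittleEndian(byte[] twoBytes) {
        return ByteBuffer.wrap(twoBytes).order(ByteOrder.LITTLE_ENDIAN).getShort() & 0xFFFF;
    }

    public static void main(String[] args) {
        byte[] operand = {0x12, 0x34};
        System.out.printf("big endian:    0x%04X%n", readBigEndian(operand));    // 0x1234
        System.out.printf("little endian: 0x%04X%n", readLittleEndian(operand)); // 0x3412
    }
}
```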
2.2 JVM Registers
Every CPU contains a set of registers that hold the system state and the information the processor needs. If a virtual machine defines more registers, it can obtain more information from them without accessing the stack or memory, which improves speed. However, if the virtual machine has more registers than the actual CPU, the processor spends considerable time emulating the extra registers in ordinary memory, which reduces the virtual machine's efficiency. For this reason, the JVM defines only the four most commonly used registers:
pc: program counter
optop: pointer to the top of the operand stack
frame: pointer to the current execution environment
vars: pointer to the first local variable in the current execution environment
All registers are 32 bits wide. The pc register tracks program execution; optop, frame, and vars hold pointers into the Java stack.
2.3 JVM Stack Structure
The JVM is a stack-based machine, and the Java stack is its main means of storing information. When the JVM receives a Java bytecode application, it creates a stack frame for each method of a class in the code, which holds that method's state information. Each stack frame contains three kinds of information:
Local variables
Execution environment
Operand stack
The local variables area stores the local variables used by a method. The vars register points to the first local variable in the variable table.
The execution environment holds the information the interpreter needs to interpret Java bytecode: the previously invoked method, the local variable pointer, and the top and bottom pointers of the operand stack. The execution environment is the control center for executing a method. For example, to execute iadd (integer addition), the interpreter first finds the current execution environment through the frame register, then finds the operand stack through the execution environment, pops two integers from the top of the stack, adds them, and pushes the result back onto the stack.
The operand stack stores the operands required by an operation and the result of the operation.
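The interplay of local variables and operand stack described above can be sketched in a few lines. ToyFrame and all of its members are hypothetical names chosen to mirror the registers in the text; this is a teaching model under those assumptions, not how any real JVM is written.

```java
// Illustrative only: ToyFrame is a hypothetical model of one stack frame.
class ToyFrame {
    final int[] vars;          // local variables ("vars" points at the first one)
    final int[] operandStack;  // operand stack of 32-bit words
    int optop = 0;             // index of the next free operand stack slot

    ToyFrame(int maxLocals, int maxStack) {
        vars = new int[maxLocals];
        operandStack = new int[maxStack];
    }

    void push(int value) { operandStack[optop++] = value; }
    int pop() { return operandStack[--optop]; }

    // iload: copy a local variable onto the operand stack.
    void iload(int index) { push(vars[index]); }

    // istore: pop the top of the operand stack into a local variable.
    void istore(int index) { vars[index] = pop(); }

    // iadd: pop two ints, add them, push the result.
    void iadd() {
        int right = pop();
        int left = pop();
        push(left + right);
    }

    // Computes vars[2] = vars[0] + vars[1] using only stack operations.
    static int addLocals(int a, int b) {
        ToyFrame frame = new ToyFrame(3, 2);
        frame.vars[0] = a;
        frame.vars[1] = b;
        frame.iload(0);
        frame.iload(1);
        frame.iadd();
        frame.istore(2);
        return frame.vars[2];
    }

    public static void main(String[] args) {
        System.out.println(addLocals(2, 3)); // 5
    }
}
```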
2.4 JVM Garbage-Collected Heap
The storage required for instances of Java classes is allocated on the heap. The interpreter is responsible for allocating space for class instances. After memory has been allocated to an instance, the interpreter starts tracking the use of that memory. Once the object is no longer in use, its memory is returned to the heap.
In Java, the new statement is the only way to request memory for an object, and there is no statement for releasing it. Memory is released and reclaimed by the Java runtime system. This lets designers of Java runtime systems choose their own garbage collection method. In the Java interpreter and the HotJava environment developed by Sun, garbage collection runs in a background thread. This not only gives the runtime system good performance, but also relieves programmers of the risks of managing memory themselves.
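As a rough illustration of allocation with new and reclamation by the runtime, the sketch below uses a weak reference to observe an object without keeping it alive. GcDemo is a hypothetical class name; WeakReference and System.gc() are standard, and System.gc() is only a hint, so the second message in main may differ between runs and JVMs.

```java
import java.lang.ref.WeakReference;

// Illustrative only: GcDemo is a hypothetical class name.
class GcDemo {
    // While a strong reference exists, the object cannot be reclaimed.
    static boolean aliveWhileReferenced() {
        Object strong = new Object();                  // allocated with new
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc();                                   // only a hint to the collector
        return weak.get() == strong;                   // "strong" is still in use here
    }

    public static void main(String[] args) {
        System.out.println(aliveWhileReferenced());    // true

        WeakReference<Object> weak = new WeakReference<>(new Object());
        // There is no "delete": once nothing refers to the object strongly,
        // the collector may reclaim it whenever it chooses.
        System.gc();
        System.out.println(weak.get() == null
                ? "object was reclaimed"
                : "object not reclaimed yet");
    }
}
```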
2.5 JVM Storage Area
The JVM has two kinds of storage area: the constant pool and the method area. The constant pool stores class names, method and field names, and string constants. The method area stores the bytecode of Java methods. The JVM specification does not define how these two areas are implemented, so the storage layout of a Java application must be determined at run time and depends on the implementation for the specific platform.
The JVM is a specification for Java bytecode that is independent of any specific platform, and it is the basis of Java's platform independence. The JVM still has some limitations and shortcomings that need further improvement, but the idea behind it is a success in any case.
By way of comparison: if we think of a Java source program as analogous to a C++ source program, then the bytecode produced by compiling it corresponds to the 80x86 machine code (binary program file) produced by compiling the C++ source. The JVM corresponds to the 80x86 computer system, and the Java interpreter corresponds to the 80x86 CPU. Machine code runs on the 80x86 CPU, and Java bytecode runs on the Java interpreter.
The Java interpreter is in effect the "CPU" that runs Java bytecode, except that this "CPU" is implemented in software rather than hardware. The Java interpreter is simply an application on a specific platform. Once an interpreter has been implemented for a platform, Java bytecode can run on that platform through the interpreter, and this is the basis of Java's cross-platform capability. Not every platform currently has a Java interpreter, which is why Java cannot run everywhere: it runs only on platforms for which an interpreter has been implemented.
III. Architecture of the Java Virtual Machine (JVM)
As mentioned earlier, the JVM can be implemented by different vendors, and vendor differences inevitably lead to some differences between implementations. What the implementations have in common is due to the architecture on which the JVM is designed.
The behavior of a JVM instance is described in terms of subsystems, storage areas, data types, and instructions, which together form an abstract internal architecture of the JVM. The purpose is not merely to prescribe an internal architecture for implementations, but rather to provide a way of strictly defining their external behavior. Each JVM has two mechanisms: the class loader subsystem, which loads classes or interfaces with the appropriate names, and the execution engine, which executes the instructions contained in the loaded classes or interfaces. Each JVM also includes five parts: the method area, the heap, the Java stack, the program counter, and the native method stack. Together with the class loader subsystem and the execution engine, these parts make up the following architecture:
Figure 3: JVM Architecture
Each JVM instance has its own method area and heap, which are shared by all threads running in the JVM. When the virtual machine loads a class file, it parses the class information contained in the binary data and places it in the method area. While the program runs, the JVM places all objects that the program creates on the heap. Each thread gets its own program counter and Java stack when it is created. The program counter holds the address of the next instruction to execute, and the thread's Java stack stores the state of the Java methods the thread has invoked. The state of native method invocations is stored in the native method stack, in a way that depends on the specific implementation.
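The split between a shared heap and per-thread stacks can be observed from ordinary Java code. In the sketch below (SharedHeapDemo is a hypothetical name), two threads each keep a private local counter in their own stack frame while updating one shared heap object.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: SharedHeapDemo is a hypothetical class name.
class SharedHeapDemo {
    // Runs two threads; returns {shared total, thread 0 local count, thread 1 local count}.
    static int[] run(int iterations) {
        AtomicInteger shared = new AtomicInteger();   // one heap object, visible to both threads
        int[] localCounts = new int[2];

        Thread[] threads = new Thread[2];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                int local = 0;                         // lives in this thread's own stack frame
                for (int i = 0; i < iterations; i++) {
                    local++;
                    shared.incrementAndGet();
                }
                localCounts[id] = local;
            });
            threads[t].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();                         // join makes the threads' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker threads", e);
        }
        return new int[] {shared.get(), localCounts[0], localCounts[1]};
    }

    public static void main(String[] args) {
        int[] result = run(1000);
        System.out.println(result[0] + " " + result[1] + " " + result[2]); // 2000 1000 1000
    }
}
```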
The following sections describe each of these parts in turn.
The execution engine is at the core of the JVM. In the Java Virtual Machine specification, its behavior is defined by the instruction set. Although the specification explains in detail what the JVM must do when it encounters each instruction, it says little about how to do it. The Java Virtual Machine supports approximately 248 bytecode instructions. Each performs a basic CPU operation, for example adding an integer to a register or branching to a subroutine. The Java instruction set is in effect the assembly language of Java programs.
An instruction in the Java instruction set consists of a single-byte opcode, which specifies the operation to perform, followed by zero or more operands that supply the parameters or data the operation needs. Many instructions have no operands and consist of the single-byte opcode alone.
The inner loop of the virtual machine runs as follows:
do {
    fetch an opcode byte;
    execute an action based on the opcode value;
} while (the program has not ended);
Because the instruction set is simple, the virtual machine's execution process is also very simple, which helps execution efficiency. The number and size of an instruction's operands are determined by its opcode. If an operand is larger than one byte, it is stored with the high-order byte first. For example, a 16-bit parameter occupies two bytes and its value is:
(first byte) x 256 + (second byte)
In general, the instruction stream is only byte-aligned. The tableswitch and lookupswitch instructions are exceptions: their operands must be aligned on 4-byte boundaries.
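The fetch-and-dispatch loop and the two-byte operand rule above can be sketched as a toy interpreter. ToyInterpreter is a hypothetical class written for this illustration; the opcode values match the real JVM instruction set, but only five instructions are handled and nothing here reflects how a production JVM is built.

```java
// Illustrative only: ToyInterpreter is a hypothetical class.
class ToyInterpreter {
    static final int ICONST_1 = 0x04;
    static final int BIPUSH   = 0x10;  // followed by one signed byte
    static final int SIPUSH   = 0x11;  // followed by two bytes, high byte first
    static final int IADD     = 0x60;
    static final int IRETURN  = 0xAC;

    static int run(byte[] code) {
        int[] stack = new int[16];
        int optop = 0;  // next free slot on the operand stack
        int pc = 0;     // index of the next byte to fetch

        while (true) {
            int opcode = code[pc++] & 0xFF;           // fetch one opcode byte
            switch (opcode) {                         // act on its value
                case ICONST_1:
                    stack[optop++] = 1;
                    break;
                case BIPUSH:
                    stack[optop++] = code[pc++];      // byte widens to int with its sign
                    break;
                case SIPUSH: {
                    int high = code[pc++];            // signed high byte
                    int low = code[pc++] & 0xFF;      // unsigned low byte
                    stack[optop++] = high * 256 + low;
                    break;
                }
                case IADD: {
                    int right = stack[--optop];
                    int left = stack[--optop];
                    stack[optop++] = left + right;
                    break;
                }
                case IRETURN:
                    return stack[--optop];
                default:
                    throw new IllegalArgumentException(
                            "unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        // sipush 300; bipush 20; iconst_1; iadd; iadd; ireturn
        byte[] code = {0x11, 0x01, 0x2C, 0x10, 0x14, 0x04, 0x60, 0x60, (byte) 0xAC};
        System.out.println(run(code)); // 321
    }
}
```

Here the sipush operand bytes 0x01 and 0x2C decode to 1 x 256 + 44 = 300, which is the rule stated above.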
A JVM implementation is not required to support a native method interface, and may omit one entirely. Sun implemented the Java Native Interface (JNI) for the sake of portability. Other native interfaces could be designed to replace Sun's JNI, but designing and implementing them is complicated: for example, the implementation must ensure that the garbage collector does not free objects that are still in use by native methods.
The Java heap is a run-time data area from which class instances (objects) are allocated. It is managed by garbage collection, so programmers do not release objects explicitly. Java does not prescribe a particular garbage collection algorithm; different algorithms can be used according to system requirements.
The Java method area is similar to the compiled code of a traditional language or the text segment of a Unix process. It holds the method code (compiled Java code) and the symbol table. In the current Java implementation, method code is not part of the garbage-collected heap, although this is planned for a future version. Each class file contains the compiled code of one Java class or one Java interface; the class file can be regarded as the executable file format of the Java language. To keep class files platform independent, the Java Virtual Machine specification also describes the class file format in detail. For details, see Sun's Java Virtual Machine specification.
The registers of the Java Virtual Machine hold the running state of the machine, much like certain special-purpose registers in a microprocessor. There are four of them:
pc: Java program counter;
optop: pointer to the top of the operand stack;
frame: pointer to the execution environment of the currently executing method;
vars: pointer to the first variable in the local variable area of the currently executing method.
The architecture diagram above shows only the first of these, the program counter. Each thread has its own program counter from the moment it is created. While a thread executes a Java method, the program counter contains the address of the instruction the thread is currently executing. If the thread is executing a native method, the value of the program counter is undefined.
The stack of the Java Virtual Machine has three areas: the local variable area, the runtime environment area, and the operand area.
Local variable area
Each Java method uses a fixed-size set of local variables, which are addressed by word offset from the vars register. Local variables are all 32 bits wide. Long integers and double-precision floating-point numbers occupy two local variables but are addressed by the index of the first one. (For example, if the local variable with index n holds a double-precision floating-point number, it actually occupies the storage at indexes n and n + 1.) The virtual machine specification does not require 64-bit values in local variables to be 64-bit aligned. The virtual machine provides instructions to load the value of a local variable onto the operand stack and to store a value from the operand stack into a local variable.
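The two-slot rule for 64-bit values can be illustrated by splitting a long across two 32-bit array entries. TwoSlotLocals and its method names are hypothetical, and the choice to store the high word first is an assumption of this sketch; the specification leaves the internal layout to the implementation.

```java
// Illustrative only: TwoSlotLocals is a hypothetical class; the word order
// used here (high word in slot n) is an assumption of this sketch.
class TwoSlotLocals {
    // Stores a long in slots n and n + 1.
    static void lstore(int[] locals, int n, long value) {
        locals[n] = (int) (value >>> 32);
        locals[n + 1] = (int) value;
    }

    // Rebuilds the long from slots n and n + 1.
    static long lload(int[] locals, int n) {
        long high = (long) locals[n] << 32;
        long low = locals[n + 1] & 0xFFFFFFFFL;
        return high | low;
    }

    public static void main(String[] args) {
        int[] locals = new int[4];
        lstore(locals, 1, 0x1122334455667788L);
        System.out.println(Long.toHexString(lload(locals, 1))); // 1122334455667788
        System.out.println(Integer.toHexString(locals[1]));     // 11223344
        System.out.println(Integer.toHexString(locals[2]));     // 55667788
    }
}
```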
Runtime environment area
The information in the runtime environment is used for dynamic linking, normal method return, and exception propagation.
Dynamic linking
The runtime environment contains pointers into the interpreter's symbol table for the current class and current method, which support dynamic linking of the method code. The class file code for a method refers to the methods it calls and the variables it accesses symbolically. Dynamic linking translates symbolic method calls into actual method calls, loading classes as necessary to resolve symbols that are not yet defined, and translates variable accesses into the offsets that those variables have in their run-time storage structures. Because methods and variables are linked dynamically, changes to other classes that a method uses do not break this code.
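Reflection offers a loose, user-level analogy for this kind of name-to-target resolution: a class and method are looked up by name at run time and only then invoked. SymbolicCall and invokeByName are hypothetical names; Class.forName, getMethod, and invoke are standard reflection calls. This is an analogy only and does not show the JVM's internal linker.

```java
import java.lang.reflect.Method;

// Illustrative only: SymbolicCall is a hypothetical class. Reflection is used
// as an analogy for resolution; it is not the JVM's internal linking code.
class SymbolicCall {
    // Resolves a no-argument method from its names and invokes it on target.
    static Object invokeByName(String className, String methodName, Object target) {
        try {
            Class<?> owner = Class.forName(className);     // loads the class if necessary
            Method method = owner.getMethod(methodName);   // name -> actual method
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "could not resolve " + className + "." + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeByName("java.lang.String", "length", "hello")); // 5
    }
}
```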
Normal method return
If the current method completes normally, it executes a return instruction of the appropriate type and a value is returned to the caller. The execution environment is used to restore the caller's registers, and the caller's program counter is advanced by an appropriate amount to skip past the method invocation instruction. Execution then continues in the caller's execution environment.
Exception propagation
An exceptional condition, known in Java as an Error or Exception (both subclasses of the Throwable class), may arise from a run-time error, such as a reference through a null pointer, or from a throw statement in the program.
When an exception occurs, the Java Virtual Machine proceeds as follows:
* It checks the catch clause table associated with the current method. Each catch clause specifies the instruction range in which it is valid, the exception type it can handle, and the address of the code block that handles the exception.
* A catch clause matches an exception if the instruction that caused the exception lies within the clause's instruction range and the exception type is a subtype of the type the clause handles. If a matching catch clause is found, control transfers to the specified handler. If not, the search is repeated until all nested catch clauses of the current method have been checked.
* Because the virtual machine continues execution at the first matching catch clause, the order of clauses in the table matters. Since Java code is structured, it is always possible to arrange all of a method's exception handlers in a single table so that, for any possible program counter value, a linear search finds the appropriate handler for an exception raised at that value.
* If no matching catch clause is found, the current method completes with an "uncaught exception" result, and the exception is passed to the caller of the current method as if it had just occurred in the caller. If no handler is found in the caller either, the error continues to propagate. If it reaches the top level, the system invokes a default exception handler.
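The lookup described in the steps above amounts to a linear search over a table. The following sketch models it with hypothetical names (CatchTable, CatchClause, findHandler); the fields follow the description in the text, and returning -1 to mean "pass the exception to the caller" is a convention of this sketch only.

```java
// Illustrative only: CatchTable and CatchClause are hypothetical names.
class CatchTable {
    static final class CatchClause {
        final int startPc;    // first instruction covered (inclusive)
        final int endPc;      // end of the covered range (exclusive)
        final int handlerPc;  // address of the handler block
        final Class<? extends Throwable> catchType;

        CatchClause(int startPc, int endPc, int handlerPc,
                    Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns the handler address of the first matching clause, or -1 if the
    // exception must be passed on to the caller.
    static int findHandler(CatchClause[] table, int pc, Throwable thrown) {
        for (CatchClause clause : table) {    // linear search: order matters
            boolean inRange = pc >= clause.startPc && pc < clause.endPc;
            if (inRange && clause.catchType.isInstance(thrown)) {
                return clause.handlerPc;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        CatchClause[] table = {
            new CatchClause(0, 10, 40, NullPointerException.class),
            new CatchClause(0, 20, 50, RuntimeException.class),
        };
        System.out.println(findHandler(table, 5, new NullPointerException()));   // 40
        System.out.println(findHandler(table, 15, new NullPointerException()));  // 50
        System.out.println(findHandler(table, 5, new IllegalStateException()));  // 50
        System.out.println(findHandler(table, 5, new java.io.IOException()));    // -1
    }
}
```

Swapping the two table entries would make the RuntimeException clause shadow the more specific one, which is why the order of the table matters.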
Operand stack
Machine instructions take their operands only from the operand stack, operate on them, and return the results to the stack. A stack organization was chosen so that the virtual machine's behavior can be emulated efficiently on machines with few or irregular registers (such as the Intel 486). The operand stack is 32 bits wide. It is used to pass parameters to methods and to receive their results, as well as to supply operands to operations and to hold their results. For example, the iadd instruction adds two integers, which must be the two words at the top of the operand stack, pushed there by earlier instructions. The two integers are popped from the stack and added, and the result is pushed back onto the operand stack.
Each primitive data type has specialized instructions that perform the required operations on it. Each operand occupies one location on the stack, except for long and double operands, which occupy two. An operand may be operated on only by instructions appropriate to its type. For example, it is illegal to push two ints and then treat them as a long. In Sun's virtual machine implementation, this restriction is enforced by the bytecode verifier. However, a few operations (the dup and swap instructions) operate on the run-time data area regardless of type.
Native method stack: When a thread invokes a native method, it is no longer bound by the virtual machine's structures and security restrictions. It can access the virtual machine's run-time data areas, and it can also use the native processor and any kind of stack. For example, if the native stack is a C stack, then when a C program calls a C function, the function's arguments are pushed onto the stack in a certain order and the result is returned to the calling function. When the native method interface of a Java virtual machine implementation uses the C stack model, its native method stack is scheduled and used exactly like a C stack.
IV. Running Process of the Java Virtual Machine (JVM)
The preceding sections described each part of the virtual machine in detail. This section uses a concrete example to analyze how it runs.
The virtual machine starts up by invoking the main method of a specified class, passing main a single string-array argument. This causes the specified class to be loaded, linked with the other types it uses, and initialized. For example, consider the program:
    class HelloApp {
        public static void main(String[] args) {
            System.out.println("Hello world!");
            for (int i = 0; i < args.length; i++) {
                System.out.println(args[i]);
            }
        }
    }
After compilation, run it from the command line by typing: java HelloApp run virtual machine
The Java Virtual Machine starts by invoking the main method of HelloApp, passing main an array containing three strings: "run", "virtual", and "machine". The following describes the steps the virtual machine may take when executing HelloApp.
The virtual machine first attempts to execute the main method of class HelloApp and discovers that the class has not been loaded, that is, the virtual machine does not yet hold a binary representation of the class. It therefore uses a ClassLoader to try to find such a binary representation, and throws an exception if this fails. After the class is loaded and before main is invoked, class HelloApp must be linked with the other types it uses and then initialized. Linking has three phases: verification, preparation, and resolution. Verification checks the symbols and semantics of the loaded main class. Preparation creates the static fields of the class or interface and initializes them to the standard default values. Resolution checks the main class's symbolic references to other classes or interfaces, and is optional at this stage. Class initialization executes the static initializers declared in the class and the initializers of its static fields. A class's superclass must be initialized before the class itself. The whole process is as follows:
Figure 4: Java virtual machine running process
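The rule that a superclass is initialized before its subclass can be observed directly. In this sketch, every class name is hypothetical; static initializers append to a shared log so that the order becomes visible.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: all class names here are hypothetical.
class InitOrderDemo {
    // Records the order in which static initializers run.
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent initialized"); }
    }

    static class Child extends Parent {
        static int answer = 42;   // non-constant static field
        static { LOG.add("Child initialized"); }
    }

    // The first use of Child.answer triggers initialization of Parent, then Child.
    static List<String> touchChild() {
        int ignored = Child.answer;
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(touchChild()); // [Parent initialized, Child initialized]
    }
}
```

Calling touchChild() a second time returns the same two entries, because each class is initialized only once.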