The JVM (Java Virtual Machine) is a specification for a computing device. It can be implemented in different ways (in software or in hardware). Its instruction set is very similar to the instruction set of a microprocessor. The Java Virtual Machine comprises a bytecode instruction set, a set of registers, a stack, a garbage-collected heap, and a method storage area.
The Java Virtual Machine (JVM) is an abstract computer that can run Java code. Once an interpreter has been ported to a specific computer in accordance with the JVM specification, any compiled Java code can run on that system.
A Java virtual machine is an imaginary machine that is simulated by software on an actual computer. It has its own notional hardware, such as a processor, a stack, and registers, and a corresponding instruction set.
I. Java Virtual Machine (JVM) Overview
1. Why should I use a Java Virtual Machine?
A very important feature of Java is its platform independence, and the Java Virtual Machine is the key to achieving it. To run on different platforms, an ordinary high-level language must at least be compiled into different target code for each one. With the Java Virtual Machine, a Java program does not need to be recompiled to run on a different platform. The JVM hides platform-specific details, so the Java compiler only needs to generate target code (bytecode) for the Java Virtual Machine, and that bytecode runs on multiple platforms without modification. When executing bytecode, the Java Virtual Machine interprets it as machine instructions for the specific platform.
2. Who needs to know about the Java Virtual Machine?
The Java Virtual Machine (JVM) is the underlying implementation basis of the Java language. Anyone interested in Java should have a general understanding of it, which helps in understanding the language and in using it well. Those who implement a Java virtual machine on a specific platform, write Java compilers, or implement the Java virtual machine in hardware must understand the JVM specification in depth. The same is true if you want to extend the Java language or compile other languages to Java bytecode.
3. Data Types supported by Java Virtual Machine
The Java Virtual Machine supports the following basic data types:
byte: // 1-byte signed two's-complement integer
short: // 2-byte signed two's-complement integer
int: // 4-byte signed two's-complement integer
long: // 8-byte signed two's-complement integer
float: // 4-byte IEEE 754 single-precision floating-point number
double: // 8-byte IEEE 754 double-precision floating-point number
char: // 2-byte unsigned Unicode character
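The fixed sizes in the list above can be checked from Java itself through the SIZE constants of the standard wrapper classes. The following is only a small sketch; the class name PrimitiveSizes and the method bitWidths are made up for this illustration.

```java
// Sketch: report the fixed bit widths of the primitive types listed above.
class PrimitiveSizes {
    // Widths in bits, in the same order as the list: byte, short, int, long, float, double, char.
    static int[] bitWidths() {
        return new int[] {
            Byte.SIZE, Short.SIZE, Integer.SIZE, Long.SIZE,
            Float.SIZE, Double.SIZE, Character.SIZE
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(bitWidths()));
        // Two's complement: the bit pattern 0xFF read as a signed byte is -1.
        System.out.println((byte) 0xFF);
        // char is unsigned: the bit pattern 0xFFFF read as a char is 65535.
        System.out.println((int) (char) 0xFFFF);
    }
}
```

Because these widths are fixed by the platform rather than by the host machine, the same values are reported on every conforming implementation.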
Almost all Java type checking is done at compile time. Values of the primitive types listed above do not need to be tagged at run time, because the bytecodes (instructions) that operate on them already indicate the type of their operands. For example, the iadd, ladd, fadd, and dadd instructions each add two numbers, whose operand types are int, long, float, and double respectively. The VM has no separate instructions for the boolean type: boolean values are handled with integer instructions, including integer returns, and boolean arrays are handled as byte arrays. The VM uses floating-point numbers in IEEE 754 format; older computers that do not support the IEEE format may run Java numeric programs very slowly.
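To see the typed instructions for yourself, one option is to compile a class like the one below and inspect it with the JDK's javap -c tool. This is a sketch: the class and method names are invented for illustration, and the instruction named in each comment is what a typical javac emits rather than something the language guarantees.

```java
// Sketch: one addition per operand type; the comments name the bytecode
// instruction a typical javac is expected to emit for the "+" or "&".
class TypedAdd {
    static int addInt(int a, int b) { return a + b; }             // iadd
    static long addLong(long a, long b) { return a + b; }         // ladd
    static float addFloat(float a, float b) { return a + b; }     // fadd
    static double addDouble(double a, double b) { return a + b; } // dadd

    // There is no boolean-specific instruction: booleans are handled as ints.
    static boolean and(boolean a, boolean b) { return a & b; }    // iand

    public static void main(String[] args) {
        System.out.println(addInt(2, 3));
        System.out.println(addLong(2L, 3L));
        System.out.println(addFloat(1.5f, 2.5f));
        System.out.println(addDouble(0.5, 0.25));
        System.out.println(and(true, false));
    }
}
```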
Other data types supported by the virtual machine include:
object // 4-byte reference to a Java object
returnAddress // 4 bytes, used by the jsr/ret/jsr_w/ret_w instructions
Note: Java arrays are treated as objects.
The virtual machine specification places no special requirements on the internal structure of objects. In Sun's implementation, an object reference is a handle containing a pair of pointers: one points to the object's method table and the other to the object's data. Programs expressed in Java bytecode must obey the type rules, and a Java Virtual Machine implementation should refuse to execute bytecode that violates them. Because of restrictions in the bytecode definition, the Java Virtual Machine as specified is limited to machines with a 32-bit address space. However, it is possible to build a Java virtual machine that automatically converts bytecode into a 64-bit form. As the supported data types show, Java strictly defines the internal format of each data type, so that every Java virtual machine implementation interprets data in the same way. This ensures Java's platform independence and portability.
II. Java Virtual Machine (JVM) Specification
The JVM is designed as a computer model defined by an abstract specification. This gives interpreter developers a great deal of flexibility while ensuring that all Java code conforming to the specification can run. The JVM specification defines certain aspects of an implementation precisely, especially the format of Java executable code, that is, bytecode. This includes the syntax and values of opcodes and operands, the numeric representation of identifiers, the layout of Java objects in the class file, and the image of the constant pool in the JVM. These definitions give JVM interpreter developers the information and development environment they need. The Java designers wanted to leave developers free to implement the JVM as they see fit. The specification covers five parts that control how Java code is interpreted and executed:
* JVM instruction set
* JVM registers
* JVM stack structure
* JVM garbage-collected heap
* JVM storage areas
2.1 JVM Instruction Set
The JVM instruction set is very similar to that of other computers. A Java instruction consists of two parts: an opcode and operands. The opcode is an 8-bit binary number, and the operands follow the opcode; their length varies as required. The opcode specifies the nature of the operation (assembly-style mnemonics are used here). For example, iload loads an integer from a local variable, anewarray allocates space for a new array, iand performs a bitwise AND of two integers, and ret is used for flow control, returning from a subroutine. When an operand is longer than 8 bits, it is split across two or more bytes. The JVM uses big-endian encoding in this case, that is, the high-order bits are stored in the lower-addressed byte. This is consistent with the encoding used by Motorola and other RISC CPUs, and differs from the little-endian encoding used by Intel, in which the low-order bits are stored in the lower-addressed byte.
The Java instruction set is designed specifically for implementing the Java language. It includes instructions for invoking methods and for monitors used in multithreaded programs. The 8-bit opcode allows the JVM at most 256 instructions; more than 160 opcodes are currently in use.
2.2 JVM Registers
All CPUs contain a set of registers that hold system state and information the processor needs. If a virtual machine defines more registers, more information is available without going to the stack or memory, which improves speed. However, if the virtual machine has more registers than the actual CPU, the implementation must spend a lot of processor time simulating registers with ordinary memory, which reduces the virtual machine's efficiency. For this reason, the JVM defines only the four most commonly used registers:
pc: program counter
optop: pointer to the top of the operand stack
frame: pointer to the current execution environment
vars: pointer to the first local variable in the current execution environment
All registers are 32 bits wide. The pc records the progress of program execution; optop, frame, and vars hold pointers into the Java stack.
2.3 JVM Stack Structure
As a stack-based machine, the JVM uses the Java stack as its main means of storing information. When the JVM runs Java bytecode, it creates a stack frame for each method invocation to hold the state of that method. Each stack frame includes the following three kinds of information:
Local variables: hold the local variables used in the method. The vars register points to the first local variable in the variable table.
Execution environment: holds the information the interpreter needs to interpret Java bytecode, namely the previously called method, the local variable pointer, and the pointers to the top and bottom of the operand stack. The execution environment is the control center for executing a method. For example, to execute iadd (integer addition), the interpreter first finds the current execution environment through the frame register, then locates the operand stack from the execution environment, pops two integers from the top of the stack, adds them, and finally pushes the result onto the top of the stack.
Operand stack: holds the operands required by operations and the results of operations.
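The relationship between the local variables and the operand stack of a frame can be sketched with a toy model. The class ToyFrame below and its method names are hypothetical and only mirror the iload, iadd, and istore instructions by name; a real JVM lays out its frames quite differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of one stack frame: a local variable table plus an operand stack.
class ToyFrame {
    final int[] locals;                                  // local variable table
    final Deque<Integer> operands = new ArrayDeque<>();  // operand stack

    ToyFrame(int maxLocals) {
        locals = new int[maxLocals];
    }

    void iload(int index)  { operands.push(locals[index]); }   // local variable -> stack
    void istore(int index) { locals[index] = operands.pop(); } // stack -> local variable

    void iadd() {
        int b = operands.pop();   // pop two ints from the top of the stack
        int a = operands.pop();
        operands.push(a + b);     // push the sum back
    }

    // Evaluates locals[2] = locals[0] + locals[1] the way a bytecode sequence would.
    static int demo(int x, int y) {
        ToyFrame f = new ToyFrame(3);
        f.locals[0] = x;
        f.locals[1] = y;
        f.iload(0);
        f.iload(1);
        f.iadd();
        f.istore(2);
        return f.locals[2];
    }

    public static void main(String[] args) {
        System.out.println(demo(2, 3));
    }
}
```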
2.4 JVM Garbage-Collected Heap
The storage required for instances of Java classes is allocated on the heap. The interpreter is responsible for allocating space for class instances. After allocating storage for an instance, the interpreter keeps track of how that memory is used, and once the object is no longer in use, its storage is returned to the heap.
In Java, the new statement is the only way to request memory for an object, and there is no statement for releasing it. Memory is released and reclaimed by the Java runtime system, which lets the designers of the runtime system choose the garbage collection method. In the Java interpreter and HotJava environment developed by Sun, garbage collection runs in a background thread. This not only gives the running system good performance but also relieves programmers of the risks of managing memory themselves.
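The following sketch illustrates the programmer's side of this arrangement: objects are created with new and are never freed explicitly. It uses the standard java.lang.ref.WeakReference class to observe reachability; the class name GcSketch is made up, and System.gc() is only a hint, so the sketch deliberately asserts nothing about when unreachable objects are actually reclaimed.

```java
import java.lang.ref.WeakReference;

// Sketch: allocation uses "new"; there is no matching "free" or "delete".
class GcSketch {
    // An object that is still strongly reachable must survive a collection.
    static boolean stronglyReachableSurvives() {
        Object strong = new Object();
        WeakReference<Object> ref = new WeakReference<>(strong);
        System.gc();                 // a request only; the runtime decides when to collect
        return ref.get() == strong;  // 'strong' is still in use here, so it was not reclaimed
    }

    public static void main(String[] args) {
        Object temp = new Object();  // allocated on the heap
        temp = null;                 // no longer reachable; eligible for collection at some point
        System.out.println(stronglyReachableSurvives());
    }
}
```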
2.5 JVM Storage Areas
The JVM has two kinds of storage area: the constant pool and the method area. The constant pool stores class names, method names, field names, and string constants. The method area stores the bytecode of Java methods. The JVM specification does not define how these two storage areas are implemented, so the storage layout of a Java application is determined at run time and depends on the implementation on the specific platform.
The JVM is a platform-independent specification for executing Java bytecode, and it is the basis of Java's platform independence. The JVM still has some limitations and shortcomings that need further improvement, but the idea behind it has proved successful.
Comparative analysis: if we compare a Java source program to a C++ source program, then the bytecode produced by compiling the Java source corresponds to the 80x86 machine code (binary program file) produced by compiling the C++ source, the JVM corresponds to an 80x86 computer system, and the Java interpreter corresponds to the 80x86 CPU. Machine code runs on the 80x86 CPU; Java bytecode runs on the Java interpreter.
The Java interpreter is the "CPU" that runs Java bytecode, but this "CPU" is implemented in software rather than hardware. The Java interpreter is really just an application for a specific platform. Once an interpreter has been implemented for a platform, Java bytecode can run on that platform through it; this is the basis of Java's cross-platform capability. At present, not every platform has a Java interpreter, which is why Java cannot run on all platforms: it runs only on platforms for which an interpreter has been implemented.
III. Architecture of the Java Virtual Machine (JVM)
As mentioned earlier, the JVM can be implemented by different vendors, and implementations inevitably differ. The JVM can still deliver cross-platform behavior thanks to the architecture laid down in its design.
The behavior of a JVM instance involves not only the instance itself but also its subsystems, storage areas, data types, and instructions. Together these describe an abstract internal architecture for the JVM. Its purpose is not so much to dictate the internal structure of an implementation as to provide a way of strictly defining the implementation's external behavior. Every JVM has two mechanisms: one loads classes (classes or interfaces) with the appropriate names and is called the class loading subsystem; the other executes the instructions contained in the loaded classes or interfaces and is called the execution engine. Every JVM also comprises five parts: the method area, the heap, the Java stack, the program counter, and the native method stack. Together with the two mechanisms, they make up the architecture shown in the following figure:
Figure 3: JVM architecture
Each JVM instance has its own method area and heap, which are shared by all threads running in that JVM. When the VM loads a class file, it parses the type information in the binary data and places it in the method area. While the program runs, the JVM places all objects the program creates on the heap. Each thread, when created, gets its own program counter and Java stack. The program counter points to the next instruction to be executed, the thread's Java stack stores the state of the Java methods the thread has invoked, and the state of native method calls is stored in the native method stack, which is implementation dependent.
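A short program can illustrate the split between shared and per-thread areas. In this sketch (the class name SharedHeapDemo is made up), a single AtomicInteger object lives on the heap and is visible to every thread, while each thread's loop counter is a local variable held in that thread's own Java stack.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: one heap object shared by all threads, one loop counter per thread stack.
class SharedHeapDemo {
    static int run(int threads, int perThread) {
        AtomicInteger shared = new AtomicInteger();   // a single object on the shared heap
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // 'i' is a local variable in this thread's own stack frame.
                for (int i = 0; i < perThread; i++) {
                    shared.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();                             // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));
    }
}
```

The atomic counter is used because the heap is shared: a plain int field updated by several threads without synchronization could lose increments.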
The following sections describe these parts respectively.
The execution engine is at the core of the JVM. In the Java Virtual Machine specification, its behavior is defined by the instruction set. For each instruction, the specification describes in detail what an implementation should do when it encounters that instruction while executing bytecode, but it says little about how to do it. The Java Virtual Machine supports approximately 248 bytecodes. Each bytecode performs a basic CPU operation, for example adding an integer to a register or transferring control to a subroutine. The Java instruction set is effectively the assembly language of Java programs.
An instruction in the Java instruction set consists of a single-byte opcode, which specifies the operation to perform, followed by zero or more operands that supply the parameters or data the operation needs. Many instructions have no operands and consist of the one-byte opcode alone.
The inner loop of the virtual machine executes as follows:
do {
    fetch an opcode byte;
    execute an action based on the opcode value;
} while (the program has not ended);
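The loop above can be sketched as a tiny interpreter. The opcode values below are the real JVM values for iconst_1, iconst_2, bipush, iadd, and ireturn, but everything else (the class TinyInterpreter, the use of a Deque as the operand stack, the handful of supported instructions) is a simplification made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the fetch-and-dispatch loop for a handful of real JVM opcodes.
class TinyInterpreter {
    static final int ICONST_1 = 0x04, ICONST_2 = 0x05, BIPUSH = 0x10,
                     IADD = 0x60, IRETURN = 0xAC;

    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xFF;   // fetch one opcode byte
            switch (opcode) {                 // act on the opcode value
                case ICONST_1: stack.push(1); break;
                case ICONST_2: stack.push(2); break;
                case BIPUSH:   stack.push((int) code[pc++]); break; // one signed operand byte
                case IADD:     stack.push(stack.pop() + stack.pop()); break;
                case IRETURN:  return stack.pop();                  // the program has ended
                default:
                    throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // iconst_2; bipush 40; iadd; ireturn
        byte[] code = { ICONST_2, BIPUSH, 40, IADD, (byte) IRETURN };
        System.out.println(run(code));
    }
}
```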
Because the instruction set is simple, the virtual machine's execution loop is very simple, which helps execution efficiency. The number and size of an instruction's operands are determined by the opcode. If an operand is larger than one byte, it is stored high-order byte first. For example, a 16-bit parameter occupies two bytes and its value is:
first byte × 256 + second byte
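In Java, this formula amounts to a shift and an OR. The helper names readU2 and readS2 below are hypothetical; the masking with 0xFF is needed because Java's byte type is signed.

```java
// Sketch: decode a two-byte operand stored high-order byte first.
class BigEndianOperand {
    // Unsigned 16-bit value: first byte * 256 + second byte.
    static int readU2(byte[] code, int offset) {
        return ((code[offset] & 0xFF) << 8) | (code[offset + 1] & 0xFF);
    }

    // The same two bytes read as a signed 16-bit value, as for branch offsets.
    static short readS2(byte[] code, int offset) {
        return (short) readU2(code, offset);
    }

    public static void main(String[] args) {
        byte[] operand = { 0x01, 0x02 };
        System.out.println(readU2(operand, 0)); // 1 * 256 + 2
    }
}
```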
The instruction stream is generally only byte-aligned. The tableswitch and lookupswitch instructions are exceptions: they require their operands to be aligned on 4-byte boundaries.
As for the native method interface, a JVM implementation is not required to support it, and it may be absent altogether. Sun implemented the Java Native Interface (JNI) for the sake of portability. Other native interfaces could be designed in place of Sun's JNI, but designing and implementing them is complicated: one must make sure, among other things, that the garbage collector does not free objects that are still in use by native methods.
The Java heap is a run-time data area from which class instances (objects) are allocated. It is managed by garbage collection: programmers are not given the ability to free objects explicitly. Java does not prescribe a particular garbage collection algorithm; various algorithms can be used depending on system requirements.
The Java method area is similar to the compiled-code area in a traditional language or the text segment of a Unix process. It holds method code (compiled Java code) and symbol tables. In the current Java implementation, method code is not part of the garbage-collected heap, although this is planned for a future version. Each class file contains the compiled code of one Java class or one Java interface. The class file is, in effect, the executable file format of the Java language. To keep class files platform independent, the Java Virtual Machine specification also describes the class file format in detail; see Sun's Java Virtual Machine Specification.
The registers of the Java Virtual Machine hold the machine's running state, like certain special-purpose registers in a microprocessor. There are four of them:
pc: the Java program counter;
optop: pointer to the top of the operand stack;
frame: pointer to the execution environment of the currently executing method;
vars: pointer to the first variable in the local variable area of the currently executing method.
Of these, the architecture diagram above shows only the first, the program counter. Each thread has its own program counter from the moment it is created. While a thread executes a Java method, the program counter holds the address of the instruction being executed. If the thread is executing a native method, the value of the program counter is undefined.
The Java Virtual Machine stack has three areas: the local variable area, the runtime environment area, and the operand area.
Local Variable Area
Each Java method uses a fixed-size set of local variables, addressed as word offsets from the vars register. Local variables are all 32 bits wide. Long integers and double-precision floating-point numbers occupy two local variables but are addressed by the index of the first one. (For example, a double-precision floating-point number in the local variable with index n actually occupies the storage at indices n and n + 1.) The virtual machine specification does not require 64-bit values in local variables to be 64-bit aligned. The virtual machine provides instructions to load the values of local variables onto the operand stack and instructions to store values from the operand stack into local variables.
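The two-slot rule can be modeled with an int array standing in for the 32-bit local variable slots. This is only a sketch: the class LocalSlots is hypothetical, and placing the high-order word in slot n is an assumption made for the illustration, since the layout inside a real virtual machine is up to the implementation.

```java
// Sketch: 32-bit local variable slots; a long or double occupies slots n and n + 1.
class LocalSlots {
    final int[] slots;

    LocalSlots(int count) {
        slots = new int[count];
    }

    void storeLong(int n, long value) {
        slots[n] = (int) (value >>> 32);  // high-order word in slot n (assumed layout)
        slots[n + 1] = (int) value;       // low-order word in slot n + 1
    }

    long loadLong(int n) {
        return ((long) slots[n] << 32) | (slots[n + 1] & 0xFFFFFFFFL);
    }

    void storeDouble(int n, double value) {
        storeLong(n, Double.doubleToRawLongBits(value));
    }

    double loadDouble(int n) {
        return Double.longBitsToDouble(loadLong(n));
    }

    // Store a value at index 1 (slots 1 and 2) and read it back.
    static long roundTripLong(long value) {
        LocalSlots s = new LocalSlots(4);
        s.storeLong(1, value);
        return s.loadLong(1);
    }

    static double roundTripDouble(double value) {
        LocalSlots s = new LocalSlots(4);
        s.storeDouble(1, value);
        return s.loadDouble(1);
    }

    public static void main(String[] args) {
        System.out.println(roundTripLong(-5L));
        System.out.println(roundTripDouble(3.25));
    }
}
```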
Runtime Environment Area
The information in the runtime environment is used for dynamic linking, normal method return, and exception propagation.
The runtime environment contains a pointer to the interpreter's symbol table for the current class and current method, which supports dynamic linking of method code. The class file code for a method refers symbolically to the methods it calls and the variables it accesses. Dynamic linking translates these symbolic method calls into actual method calls, loading classes as necessary to resolve symbols that are still undefined, and translates variable accesses into the appropriate offsets in the storage structure. Because methods and variables are linked dynamically, changes to other classes that a method uses do not break its code.
Normal method return
If the current method completes normally, it returns to the caller when it executes a return instruction of the correct type. The execution environment is used in this case to restore the caller's registers and to add an appropriate value to the caller's program counter so that it skips past the method call instruction that has just been executed. Execution then continues in the caller's execution environment.
An abnormal condition is called an Error or an Exception in Java; both are subclasses of the Throwable class. They arise in a program for these reasons: ① a dynamic linking error, for example when a required class file cannot be found; ② a run-time error, such as a reference through a null pointer; ③ the program executes a throw statement.
When an exception occurs, the Java Virtual Machine does the following:
§ It checks the catch clause table associated with the current method. Each catch clause specifies the range of instructions for which it is valid, the exception type it can handle, and the address of the code block that handles the exception.
§ A catch clause matches an exception if the instruction that caused the exception is within the clause's instruction range and the exception type is a subtype of the type the clause handles. If a matching catch clause is found, control transfers to the specified exception-handling block. If none is found, the matching process is repeated until the catch clauses of all nested enclosing blocks have been checked.
§ Because the VM continues execution at the first matching catch clause, the order of entries in the catch clause table matters. Because Java code is structured, all of a method's exception handlers can be arranged in order in a single table, and for any possible program counter value a linear search finds the appropriate handler for an exception raised at that value.
§ If no matching catch clause is found, the current method terminates with an "uncaught exception" result, which is returned to its caller as if the exception had just occurred in the caller. If the caller has no exception-handling block for it either, the error continues to propagate. If it propagates to the top level, the system calls a default exception handler.
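The matching steps above can be sketched as a linear search over a table of catch clauses. The class ExceptionTableSketch, the CatchClause fields, and the sample table are all hypothetical; the half-open instruction range (start inclusive, end exclusive) is an assumption chosen to mirror the class file format.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: find the first catch clause whose range covers pc and whose type matches.
class ExceptionTableSketch {
    static final class CatchClause {
        final int startPc;                       // first covered instruction (inclusive)
        final int endPc;                         // end of the covered range (exclusive)
        final Class<? extends Throwable> type;   // exception type this clause handles
        final int handlerPc;                     // address of the handler block

        CatchClause(int startPc, int endPc, Class<? extends Throwable> type, int handlerPc) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.type = type;
            this.handlerPc = handlerPc;
        }
    }

    // Order matters: the more specific clause is listed first.
    static final List<CatchClause> SAMPLE = Arrays.asList(
        new CatchClause(0, 10, ArithmeticException.class, 20),
        new CatchClause(0, 10, RuntimeException.class, 30));

    // Returns the handler address of the first match, or -1 if the exception
    // is not caught here and must propagate to the caller.
    static int findHandler(List<CatchClause> table, int pc, Throwable thrown) {
        for (CatchClause c : table) {
            boolean inRange = pc >= c.startPc && pc < c.endPc;
            if (inRange && c.type.isInstance(thrown)) {
                return c.handlerPc;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findHandler(SAMPLE, 5, new ArithmeticException()));
    }
}
```

An ArithmeticException raised at pc 5 matches both sample clauses, and the search returns the first one, which is why the ordering of the table is significant.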
Machine instructions take all their operands from the operand stack, operate on them, and return their results to the stack. The stack organization was chosen so that the machine can be emulated efficiently on machines with few or irregular registers, such as the Intel 486. The operand stack is 32 bits wide. It is used to pass parameters to methods and receive results from them, as well as to supply parameters to operations and save their results. For example, the iadd instruction adds two integers. The two integers must be the top two words of the operand stack, pushed there by earlier instructions. Both are popped from the stack, added, and their sum is pushed back onto the operand stack.
Each primitive data type has specialized instructions that operate on it. Each operand occupies one location on the stack, except for the long and double types, which need two. An operand may only be operated on by an operator appropriate to its type. For example, it is illegal to push two int values and then treat them as a long. In Sun's virtual machine implementation, this restriction is enforced by the bytecode verifier. However, a small number of operations (the dup and swap operators) operate on the runtime data area without regard to type.
Native method stack: when a thread calls a native method, it is no longer bound by the structure and security restrictions of the virtual machine. It can access the virtual machine's runtime data areas, and it can also use the native processor and any kind of stack. For example, if the native stack is a C stack, then when a C program calls a C function, the function's arguments are pushed onto the stack in some order and the result is returned to the calling function. In a Java virtual machine implementation whose native method interface uses the C stack model, the native method stack is scheduled and used exactly like a C stack.
IV. Running Process of the Java Virtual Machine (JVM)
The sections above described each part of the VM in detail. The following uses a concrete example to analyze how it runs.
The VM starts by invoking the main method of a specified class, passing main a string array argument. This causes the specified class to be loaded, the other types it uses to be linked, and all of them to be initialized. For example, take the program:
class HelloApp {
    public static void main(String[] args) {
        System.out.println("Hello World!");
        // Print each command-line argument on its own line.
        for (int i = 0; i < args.length; i++) {
            System.out.println(args[i]);
        }
    }
}
After compiling, type the following on the command line: java HelloApp run virtual machine
The Java Virtual Machine starts by invoking the main method of HelloApp and passing main an array containing three strings: "run", "virtual", and "machine". The following outlines the steps a VM might take when executing HelloApp.
The VM begins trying to execute the main method of class HelloApp and finds that the class is not loaded, that is, the virtual machine does not currently hold a binary representation of the class. It therefore uses a ClassLoader to try to find that binary representation. If this process fails, an exception is thrown. After the class is loaded and before main is invoked, HelloApp must be linked with the other types it uses and then initialized. Linking has three phases: verification, preparation, and resolution. Verification checks the symbols and semantics of the loaded main class. Preparation creates the static fields of the class or interface and initializes them to the standard default values. Resolution checks the main class's symbolic references to other classes or interfaces; it is optional at this step. Class initialization executes the static initializers declared in the class and the initializers of its static fields. Before a class is initialized, its parent class must be initialized. The whole process is shown below:
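The rule that a parent class is initialized before its subclass can be observed with static initializer blocks. In this sketch the class names InitOrder, Parent, and Child are made up; reading the non-constant static field Child.value is what triggers initialization of Child, and therefore of Parent first.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: record the order in which static initializers run.
class InitOrder {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent"); }    // runs when Parent is initialized
    }

    static class Child extends Parent {
        static int value = 1;            // non-constant static field
        static { LOG.add("Child"); }     // runs when Child is initialized
    }

    static List<String> run() {
        int v = Child.value;             // first use of Child triggers its initialization
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Each class is initialized only once, so calling run() again returns the same two entries.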
Figure 4: Java virtual machine running process