A Java Virtual Machine (JVM) is an abstract, software-defined computer that runs Java code. As long as an interpreter that conforms to the JVM specification has been ported to a particular computer, any compiled Java code is guaranteed to run on that system. This article first briefly describes how a Java file goes from compilation to final execution, and then describes the JVM specification.
I. Compilation, download, interpretation and execution of Java source files
The development cycle of a Java application consists of compiling, downloading, interpreting, and executing. The Java compiler translates Java source code into bytecode that the JVM can execute. This differs from C/C++ compilation. When a C compiler generates object code, that code targets one particular hardware platform, so during compilation the compiler looks up a symbol table and converts every symbolic reference into a concrete memory offset. The Java compiler, by contrast, does not turn references to variables and methods into numeric offsets and does not fix the memory layout in advance. Instead, it keeps the symbolic reference information in the bytecode; the interpreter creates the memory layout at run time and only then determines a method's address by looking it up in a table. This is what gives Java its portability and security.
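The difference can be illustrated with a toy model. The sketch below is not how a real JVM is written: it assumes a hypothetical `SymbolicLinker` in which "bytecode" refers to a method only by a symbolic name (written loosely in the JVM's internal `class.name:descriptor` style), and a run-time table supplies the concrete address the first time that name is used. The addresses are made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of late binding: "bytecode" names methods symbolically, and the
// runtime maps each name to a concrete address the first time it is used.
// SymbolicLinker, the symbol strings and the addresses are all hypothetical.
class SymbolicLinker {
    // Addresses decided when classes are loaded, i.e. the run-time layout.
    private final Map<String, Integer> methodTable = new HashMap<>();
    // Symbolic references that have already been resolved.
    private final Map<String, Integer> resolved = new HashMap<>();
    private int lookups = 0;

    // Loader side: record where a method ended up in this process.
    void define(String symbol, int address) {
        methodTable.put(symbol, address);
    }

    // Interpreter side: turn a symbolic reference into an address.
    int resolve(String symbol) {
        Integer cached = resolved.get(symbol);
        if (cached != null) {
            return cached;                    // already linked, no table lookup
        }
        Integer address = methodTable.get(symbol);
        if (address == null) {
            throw new IllegalStateException("unresolved symbol: " + symbol);
        }
        lookups++;
        resolved.put(symbol, address);
        return address;
    }

    int lookupCount() {
        return lookups;
    }

    public static void main(String[] args) {
        SymbolicLinker linker = new SymbolicLinker();
        // The address is chosen at run time, so it may differ between machines.
        linker.define("java/lang/String.length:()I", 0x4000);
        // Two calls to the same method, referenced only by name.
        List<String> calls = List.of("java/lang/String.length:()I",
                                     "java/lang/String.length:()I");
        for (String symbol : calls) {
            System.out.printf("%s -> 0x%x%n", symbol, linker.resolve(symbol));
        }
        System.out.println("table lookups: " + linker.lookupCount()); // table lookups: 1
    }
}
```

Because the address is supplied only when the program runs, the same "bytecode" works with whatever layout a given machine chooses; caching the result means each symbol is looked up only once.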
Running JVM bytecode is the interpreter's job. Its work can be divided into three parts: loading code, verifying code, and executing code. Loading is done by the class loader, which loads all the code a program needs in order to run, including the classes that the program's classes inherit from and the classes they call. When the class loader loads a class, it places the class in its own namespace. A class cannot affect classes outside its namespace except by referring to them symbolically. All classes from the local machine share a single namespace, while every class imported from outside gets a separate namespace. This lets local classes run more efficiently by sharing one namespace while ensuring that classes imported from outside cannot interfere with them. Once all the classes a program needs have been loaded, the interpreter can determine the memory layout of the entire executable program and build tables that map symbolic references to concrete addresses. Because the memory layout is fixed only at this stage, Java avoids the "fragile base class" problem, in which changing a superclass breaks its already-compiled subclasses, and it also prevents code from accessing memory addresses illegally.
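Class loading can be observed through the standard `java.lang.ClassLoader` API. In current JVMs a namespace corresponds to a class loader: a class is identified by its name together with the loader that defined it. The sketch below (the `LoaderDemo` class and its helpers are hypothetical) asks which loader defined a class and loads a class by name. The exact class name of the application loader varies between JVM versions, so it is not shown as fixed output.

```java
// Minimal sketch using the standard java.lang.ClassLoader API.
class LoaderDemo {
    // Names the loader that defined a class. Core classes are defined by the
    // bootstrap loader, which Java reports as null.
    static String loaderOf(Class<?> c) {
        ClassLoader loader = c.getClassLoader();
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    // Loads a class by its binary name through the system class loader,
    // without running its static initializers.
    static Class<?> load(String binaryName) {
        try {
            return Class.forName(binaryName, false, ClassLoader.getSystemClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("no such class: " + binaryName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loaderOf(String.class));     // bootstrap
        System.out.println(loaderOf(LoaderDemo.class)); // an application-level loader
        // The system loader delegates to its parents, so asking it for a core
        // class yields the same Class object the bootstrap loader defined.
        System.out.println(load("java.util.ArrayList") == java.util.ArrayList.class); // true
    }
}
```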
The loaded code is then checked by the bytecode verifier, which can detect many errors, such as operand stack overflow and illegal data type conversions. Once verification succeeds, the code is executed.
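Verification happens before any instruction runs. As a rough illustration of one such check, the hypothetical `StackDepthVerifier` below walks straight-line code, tracks how deep the operand stack gets, and rejects code that would underflow or overflow it. It deliberately ignores types and branches, which a real verifier must also handle.

```java
import java.util.List;

// Toy version of one verifier check: tracking operand-stack depth through
// straight-line code and rejecting underflow or overflow before execution.
// Real verifiers also check types and every branch; names are hypothetical.
class StackDepthVerifier {
    // A simplified instruction: how many values it pops and pushes.
    static final class Insn {
        final String name;
        final int pops;
        final int pushes;

        Insn(String name, int pops, int pushes) {
            this.name = name;
            this.pops = pops;
            this.pushes = pushes;
        }
    }

    // Returns null if the code is safe, otherwise a description of the error.
    // maxStack plays the role of the max_stack value a class file declares.
    static String verify(List<Insn> code, int maxStack) {
        int depth = 0;
        for (int pc = 0; pc < code.size(); pc++) {
            Insn insn = code.get(pc);
            if (depth < insn.pops) {
                return "stack underflow at " + pc + " (" + insn.name + ")";
            }
            depth += insn.pushes - insn.pops;
            if (depth > maxStack) {
                return "stack overflow at " + pc + " (" + insn.name + ")";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Insn iconst1 = new Insn("iconst_1", 0, 1);
        Insn iadd = new Insn("iadd", 2, 1);
        System.out.println(verify(List.of(iconst1, iconst1, iadd), 2));    // null
        System.out.println(verify(List.of(iconst1, iadd), 2));             // stack underflow at 1 (iadd)
        System.out.println(verify(List.of(iconst1, iconst1, iconst1), 2)); // stack overflow at 2 (iconst_1)
    }
}
```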
There are two ways to execute Java bytecode:
1. Just-in-time (JIT) compilation: the interpreter first compiles the bytecode into machine code and then executes that machine code.
2. Interpretation: the interpreter carries out the whole bytecode program by interpreting and executing one small piece of code at a time.
The second method is the one usually used. Because the JVM specification is flexible enough, bytecode can also be translated into machine code efficiently. For applications that need high execution speed, the interpreter can compile Java bytecode into machine code just in time, which gives Java code both portability and high performance.
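Interpretation, the second method, amounts to a fetch, decode, and execute loop. The sketch below assumes a hypothetical `TinyInterpreter` that supports only six instructions; the opcode values (`iload_0` = 0x1a, `iadd` = 0x60, `ireturn` = 0xac, and so on) are the real ones from the JVM specification.

```java
// Minimal interpretation loop: fetch one opcode, decode it with a switch,
// execute it against an operand stack, repeat. The opcode values are the
// real JVM ones for this small subset; TinyInterpreter itself is hypothetical
// and skips everything a real JVM must handle (types, frames, exceptions).
class TinyInterpreter {
    static final int BIPUSH = 0x10, ILOAD_0 = 0x1a, ILOAD_1 = 0x1b,
                     IADD = 0x60, IMUL = 0x68, IRETURN = 0xac;

    static int run(byte[] code, int[] locals) {
        int[] stack = new int[16];
        int sp = 0;  // number of values on the operand stack
        int pc = 0;  // program counter
        while (true) {
            int opcode = code[pc++] & 0xFF;  // fetch (Java bytes are signed)
            switch (opcode) {                // decode, then execute
                case BIPUSH:
                    stack[sp++] = code[pc++];  // signed one-byte operand
                    break;
                case ILOAD_0:
                    stack[sp++] = locals[0];
                    break;
                case ILOAD_1:
                    stack[sp++] = locals[1];
                    break;
                case IADD: {
                    int b = stack[--sp], a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case IMUL: {
                    int b = stack[--sp], a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case IRETURN:
                    return stack[--sp];
                default:
                    throw new IllegalStateException("unsupported opcode 0x"
                            + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        // Similar to what javac emits for: static int f(int a, int b) { return (a + b) * 10; }
        byte[] code = { 0x1a, 0x1b, 0x60, 0x10, 10, 0x68, (byte) 0xac };
        System.out.println(run(code, new int[] { 4, 5 }));  // 90
    }
}
```

A JIT compiler would instead translate the same bytes into native machine code once and then run that code directly, avoiding the per-instruction dispatch cost of the `switch`.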
II. JVM Specification Description
The design goal of the JVM is to provide a computer model defined by an abstract specification, one that gives interpreter developers considerable flexibility while still ensuring that Java code runs on any system that conforms to the specification. The JVM specification defines certain aspects of an implementation precisely, above all the format of Java executable code, the bytecode. This includes the syntax and values of opcodes and operands, the numeric representation of identifiers, and how Java objects in class files and the constant pool are mapped into JVM storage. These definitions give JVM interpreter developers the information and development environment they need. Java's designers wanted to give developers the freedom to use Java as they see fit.
The JVM specification defines five areas that govern how Java code is interpreted, executed, and implemented:
JVM instruction set
JVM registers
JVM stack structure
JVM garbage-collected heap
JVM storage areas
2.1 JVM Instruction Set
The JVM instruction set is very similar to the instruction sets of other computers. A Java instruction consists of two parts: an opcode and operands. The opcode is an 8-bit binary number, and the operands immediately follow it, with a length that varies from instruction to instruction. The opcode specifies what kind of operation the instruction performs (written here as assembly mnemonics). For example, iload loads an integer from a local variable onto the operand stack, anewarray allocates a new array of references, iand computes the bitwise AND of two integers, and ret is a control-flow instruction that returns from a subroutine. When an operand is longer than 8 bits, it is stored in two or more bytes. The JVM handles this with "big-endian" encoding, in which the high-order byte is stored at the lowest address. This matches the byte order of Motorola processors and several RISC CPUs, whereas the "little-endian" encoding used by Intel stores the low-order byte at the lowest address.
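Big-endian decoding can be checked directly in Java. The sketch below reads the two-byte operand of `sipush` (opcode 0x11), which pushes a signed 16-bit constant. The helper name `readS2` is hypothetical, while `java.nio.ByteBuffer` is the standard class, whose default byte order is big-endian.

```java
import java.nio.ByteBuffer;

// Reads a multi-byte operand in big-endian order, as the JVM does.
// BigEndianOperand and readS2 are hypothetical names.
class BigEndianOperand {
    // Combines two operand bytes into a signed 16-bit value: the first
    // (lower-addressed) byte holds the high-order bits.
    static short readS2(byte[] code, int offset) {
        return (short) (((code[offset] & 0xFF) << 8) | (code[offset + 1] & 0xFF));
    }

    public static void main(String[] args) {
        byte[] code = { 0x11, 0x01, 0x2C };   // sipush 300 (0x012C)
        System.out.println(readS2(code, 1));  // 300
        // ByteBuffer uses big-endian order unless told otherwise.
        System.out.println(ByteBuffer.wrap(code, 1, 2).getShort()); // 300
    }
}
```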
The Java instruction set was designed specifically to implement the Java language; it includes instructions for invoking methods and monitors for synchronizing multiple threads. Because opcodes are 8 bits long, the JVM can have at most 256 instructions, of which more than 160 are currently in use.
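The multithreading support shows up directly in the instruction set: javac compiles a `synchronized` block into `monitorenter` and `monitorexit` instructions on the lock object. The sketch below (class and method names are hypothetical) uses such a block so that several threads can update a shared counter safely.

```java
// A synchronized block: javac compiles it to monitorenter / monitorexit on
// the lock object. MonitorDemo and runConcurrently are hypothetical names.
class MonitorDemo {
    private final Object lock = new Object();
    private int count = 0;

    void increment() {
        synchronized (lock) {  // monitorenter: acquire lock's monitor
            count++;
        }                      // monitorexit: release it (javac also emits one on the exception path)
    }

    int count() {
        synchronized (lock) {
            return count;
        }
    }

    // Starts `threads` threads that each call increment() `perThread` times.
    static int runConcurrently(int threads, int perThread) {
        MonitorDemo demo = new MonitorDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    demo.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return demo.count();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 10_000));  // 40000
    }
}
```

Running `javap -c MonitorDemo` on the compiled class should list `monitorenter` and `monitorexit` in the bytecode of `increment()`.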
2.2 JVM Registers
Every CPU has a set of registers that hold the system state and the information the processor needs. If a virtual machine defines more registers, it can obtain more information from them without accessing the stack or memory, which helps execution speed. However, if the virtual machine has more registers than the physical CPU, the implementation has to simulate the extra registers in ordinary memory, which costs considerable time and reduces the virtual machine's efficiency. For this reason, the JVM defines only the four most commonly used registers:
pc: program counter
optop: pointer to the top of the operand stack
frame: pointer to the current execution environment
vars: pointer to the first local variable in the current execution environment
All registers are 32 bits wide. pc records where program execution currently is, while optop, frame, and vars hold pointers into the Java stack.
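The four registers can be modeled as plain integers that index into one array standing for the Java stack. This is a toy sketch under stated assumptions: the hypothetical `RegisterModel` places a frame's local variables first, then three words of frame data (the caller's saved pc, vars, and frame), then the operand stack, and lets the arguments the caller pushed become the callee's first local variables. That layout is an assumption, not taken from a specification; the current JVM specification itself only requires a per-thread pc register and leaves the rest to implementations.

```java
// Toy model of the four JVM registers as 32-bit ints. javaStack stands for
// the Java stack; optop, frame and vars are indices into it. The frame
// layout and all names are assumptions made for illustration.
class RegisterModel {
    final int[] javaStack = new int[64];
    int pc;     // offset of the bytecode being executed
    int optop;  // next free slot at the top of the operand stack
    int frame;  // start of the current execution environment (frame data)
    int vars;   // slot of local variable 0 in the current frame

    // Assumed frame data: the caller's saved pc, vars and frame registers.
    static final int FRAME_DATA_SIZE = 3;

    void push(int value) { javaStack[optop++] = value; }
    int pop() { return javaStack[--optop]; }
    int local(int index) { return javaStack[vars + index]; }

    // The caller has already pushed numArgs arguments; they become the
    // callee's first local variables in place, without copying.
    void invoke(int numArgs, int numLocals, int entryPc) {
        int newVars = optop - numArgs;
        int newFrame = newVars + numLocals;
        javaStack[newFrame] = pc;
        javaStack[newFrame + 1] = vars;
        javaStack[newFrame + 2] = frame;
        vars = newVars;
        frame = newFrame;
        optop = newFrame + FRAME_DATA_SIZE;
        pc = entryPc;
    }

    // Discards the callee's frame, restores the caller's registers and
    // leaves the return value on the caller's operand stack.
    void returnValue() {
        int result = pop();
        int calleeVars = vars;
        pc = javaStack[frame];
        vars = javaStack[frame + 1];
        frame = javaStack[frame + 2];
        optop = calleeVars;
        push(result);
    }

    public static void main(String[] args) {
        RegisterModel r = new RegisterModel();
        r.push(4);                        // caller pushes two arguments
        r.push(5);
        r.invoke(2, 3, 100);              // callee with 3 locals starts at pc 100
        r.push(r.local(0) + r.local(1));  // callee computes 4 + 5
        r.returnValue();
        System.out.println(r.pop());      // 9
    }
}
```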
2.3 JVM Stack Structure
The JVM is a stack-based machine, and the Java stack is its primary means of storing information. When the JVM runs a Java bytecode application, it creates a stack frame for each method invocation to hold that method's state. Each stack frame contains three kinds of information:
Local variables
Execution environment
Operand stack
The local variable area stores the local variables used by a class's method. The vars register points to the first local variable in this table.
The execution environment holds the information the interpreter needs while interpreting Java bytecode: the previously called method, the local variable pointer, and pointers to the top and bottom of the operand stack. It is the control center for executing a method. For example, to execute iadd (integer addition), the interpreter first locates the current execution environment through the frame register, then finds the operand stack from the execution environment, pops two integers from the top of the stack, adds them, and pushes the result back onto the top of the stack.
The operand stack is used to store the operands required for the operation and the results of the operation.
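The frame structure can also be expressed with objects instead of raw pointers. In the hypothetical `FrameDemo` below, each `Frame` holds the three kinds of information listed above, the `current` field plays the role of the frame register, and `iadd` follows the steps described for the execution environment: current frame, then its operand stack, pop two integers, push their sum.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Object-based sketch of a stack frame and of iadd. Names are hypothetical.
class FrameDemo {
    // One frame: local variables, execution-environment data, operand stack.
    static final class Frame {
        final String method;   // environment: the method this frame executes
        final Frame caller;    // environment: link back to the invoking frame
        final int[] locals;    // local variables (what vars points to)
        final Deque<Integer> operands = new ArrayDeque<>();  // operand stack

        Frame(String method, Frame caller, int maxLocals) {
            this.method = method;
            this.caller = caller;
            this.locals = new int[maxLocals];
        }
    }

    private Frame current;  // plays the role of the frame register

    Frame invoke(String method, int maxLocals) {
        current = new Frame(method, current, maxLocals);
        return current;
    }

    void returnFromMethod() {
        current = current.caller;
    }

    void iload(int index) {
        current.operands.push(current.locals[index]);
    }

    void istore(int index) {
        current.locals[index] = current.operands.pop();
    }

    // iadd as described: current frame -> operand stack -> pop two, push sum.
    void iadd() {
        Deque<Integer> stack = current.operands;
        int b = stack.pop();
        int a = stack.pop();
        stack.push(a + b);
    }

    public static void main(String[] args) {
        FrameDemo vm = new FrameDemo();
        Frame f = vm.invoke("add", 3);
        f.locals[0] = 7;
        f.locals[1] = 8;
        vm.iload(0);
        vm.iload(1);
        vm.iadd();
        vm.istore(2);                     // locals[2] = 7 + 8
        System.out.println(f.locals[2]);  // 15
        vm.returnFromMethod();
    }
}
```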
2.4 JVM Garbage-Collected Heap
Storage for instances of Java classes is allocated on the heap, and the interpreter is responsible for allocating it. After allocating storage for an instance, the interpreter keeps track of how the memory occupied by that instance is used. Once the object is no longer in use, its memory is reclaimed and returned to the heap.
In Java, the only way to request memory for an object is the new operator, and there is no way to free memory explicitly. Freeing and reclaiming memory is the job of the Java runtime system, which leaves the designers of a Java runtime free to choose their own garbage-collection method. In the Java interpreter and the HotJava environment developed by Sun, garbage collection runs as a background thread. This not only gives the system good performance but also spares programmers the risks of managing memory themselves.
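The text does not say which reclamation algorithm the runtime uses. As an assumption, the sketch below uses mark-and-sweep, one classic technique: objects reachable from a set of roots (such as references held in stack frames) are marked, and everything else is reclaimed. `ToyHeap` and its methods are hypothetical, and production JVMs use considerably more elaborate collectors.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy mark-and-sweep collector over a simulated heap. Names are hypothetical.
class ToyHeap {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();  // outgoing references
        boolean marked;

        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();   // every allocated object
    final List<Obj> roots = new ArrayList<>();  // e.g. references held in frames

    // The only way to obtain memory, like `new`; there is no free().
    Obj allocate(String name) {
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    // Marks everything reachable from the roots, then sweeps the rest.
    // Returns the number of objects reclaimed.
    int collect() {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (!o.marked) {
                o.marked = true;
                pending.addAll(o.refs);
            }
        }
        int before = heap.size();
        heap.removeIf(o -> !o.marked);        // sweep unmarked objects
        heap.forEach(o -> o.marked = false);  // reset marks for the next cycle
        return before - heap.size();
    }

    public static void main(String[] args) {
        ToyHeap h = new ToyHeap();
        Obj a = h.allocate("a");
        Obj b = h.allocate("b");
        Obj c = h.allocate("c");
        Obj d = h.allocate("d");
        a.refs.add(b);                    // a -> b
        c.refs.add(d);                    // c <-> d: a cycle nothing else references
        d.refs.add(c);
        h.roots.add(a);
        System.out.println(h.collect());  // 2 (c and d are reclaimed)
        h.roots.clear();
        System.out.println(h.collect());  // 2 (a and b are reclaimed)
    }
}
```

Reachability, not reference counting, decides what survives, so the unreachable cycle between `c` and `d` is reclaimed even though each still points to the other.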
2.5 JVM Storage Areas
The JVM has two kinds of storage areas: the constant pool and the method area. The constant pool stores class names, method and field names, and string constants. The method area stores the bytecode of Java methods. The JVM specification does not prescribe how these two areas are implemented, so the storage layout of a Java application can be determined at run time according to each platform's implementation.
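The string constants kept in the constant pool can be observed from Java code. By the Java Language Specification, all string literals with the same value refer to the same `String` instance, and `String.intern()` returns that shared instance; where the pooled strings are stored is left to the implementation. The class and method names below are hypothetical.

```java
import java.util.Arrays;

// Observes string-constant sharing. ConstantPoolDemo is a hypothetical name.
class ConstantPoolDemo {
    static final String GREETING = "hello";

    // Each entry says whether two expressions yield the very same object.
    static boolean[] identities() {
        String a = "hello";
        String b = "hel" + "lo";         // constant expression, folded at compile time
        String c = new String("hello");  // explicitly creates a new heap object
        return new boolean[] {
            a == GREETING,    // same literal value -> same pooled instance
            a == b,           // folded constant -> same pooled instance
            a == c,           // distinct object created by new
            a == c.intern()   // intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(identities()));  // [true, true, false, true]
    }
}
```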
The JVM is a platform-independent specification defined for Java bytecode, and it is the foundation of the Java platform's independence. Today's JVM still has limitations and shortcomings that need further work, but the ideas behind the JVM have proven successful.
Comparative analysis: if we think of a Java source program as analogous to a C++ source program, then the bytecode produced by compiling Java source corresponds to the 80x86 machine code (binary program files) produced by compiling C++ source, the JVM corresponds to an 80x86 computer system, and the Java interpreter corresponds to the 80x86 CPU. Machine code runs on the 80x86 CPU; Java bytecode runs on the Java interpreter.
The Java interpreter is the "CPU" that runs Java bytecode, except that this "CPU" is implemented in software rather than hardware. The Java interpreter is really just an application for a particular platform. Once an interpreter has been implemented for a platform, Java bytecode can run on that platform through it; this is the root of Java's cross-platform capability. Not every platform has a Java interpreter yet, which is why Java does not run everywhere: it runs only on platforms for which a Java interpreter has been implemented.
Recommended books: "Computer Systems: A Programmer's Perspective" and "Inside the Java Virtual Machine"