Moving from compiling code into native machine code to compiling it into bytecode is a small step for storage formats but a big step for programming languages. Computers understand only 0 and 1, so a program must be translated by a compiler into a binary format before a computer can execute it. As technology has developed, compiling a program into native binary machine code is no longer the only option: more and more programming languages choose a platform-neutral format, independent of operating system and machine instruction set, as the storage format for compiled programs.
Virtual machines and bytecode storage formats are the basis of language independence. The Java virtual machine is not bound to any language, Java included; it is tied only to a specific binary file format, the class file, which contains the Java virtual machine instruction set, a symbol table, and several other pieces of ancillary information. The Java virtual machine is a general-purpose, machine-independent execution platform that any other language can use as the delivery medium for its language products.
Class file structure
A class file is a binary stream organized in units of 8-bit bytes. The data items are packed into the class file strictly in order, with no delimiters between them, so almost everything stored in the class file is data the program needs in order to run, and there are no gaps. A data item that needs more than one 8-bit byte is split into several bytes and stored with the high-order byte first (big-endian).
According to the Java Virtual Machine Specification, the class file format stores data in a pseudo-structure similar to a C struct, which has only two data types: unsigned numbers and tables.
Unsigned number: a basic data type. u1, u2, u4, and u8 denote unsigned numbers of 1, 2, 4, and 8 bytes. Unsigned numbers can describe numbers, index references, quantity values, or string values encoded in UTF-8.
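To make the unsigned types and the high-byte-first layout concrete, here is a minimal sketch in Java. The class name UnsignedReader and the sample bytes are made up for illustration; only java.nio.ByteBuffer, which reads big-endian by default, comes from the standard library.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: reads u1/u2/u4 values the way a class file stores them.
class UnsignedReader {
    private final ByteBuffer buf;

    UnsignedReader(byte[] data) {
        this.buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
    }

    // u1: one byte, masked so the result is 0..255 instead of -128..127
    int u1() { return buf.get() & 0xFF; }

    // u2: two bytes, high-order byte first, result 0..65535
    int u2() { return buf.getShort() & 0xFFFF; }

    // u4: four bytes, high-order byte first, widened to long to stay unsigned
    long u4() { return buf.getInt() & 0xFFFFFFFFL; }

    public static void main(String[] args) {
        // Illustrative bytes only, not taken from a real class file
        byte[] data = {(byte) 0xFF, 0x01, 0x02, (byte) 0x80, 0x00, 0x00, 0x01};
        UnsignedReader r = new UnsignedReader(data);
        System.out.println(r.u1()); // 255
        System.out.println(r.u2()); // 258 (0x0102)
        System.out.println(r.u4()); // 2147483649 (0x80000001)
    }
}
```

Java has no unsigned integer types of these widths, which is why the sketch masks each value into the next larger signed type.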
Table: a composite data type made up of multiple unsigned numbers or other tables as data items. By convention, table names end with "_info". Tables describe hierarchically composed data, and the entire class file is essentially one table, which consists of the data items shown in the following table:
Unlike a description language such as XML, the class file structure has no separator symbols, so the order and number of the data items in the table above, and even details such as the byte order in which the data is stored, are strictly defined.
Here's a look at the specific meanings of each data item in the table:
1. Magic number and class file version
The first four bytes of every class file are called the magic number. Its only purpose is to determine whether the file is a class file that the virtual machine can accept. Many file storage standards identify files by magic number rather than by extension, mainly for security reasons. The four bytes after the magic number store the version number of the class file: the fifth and sixth bytes are the minor version number, and the seventh and eighth bytes are the major version number. Take the following class as an example.
First, write a simple class:
public class testclass {
    private String m;

    public String test() {
        return m;
    }
}
Then take a look at the generated class file:
The first four bytes, CAFEBABE, mark this file as a class file; this is the magic number described above. The next four bytes are 00000032: the first two bytes (0x0000) are the minor version number and the last two bytes (0x0032, decimal 50) are the major version number. From this version number we can tell that the file can be executed by a virtual machine of JDK 1.6 or later.
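The header check can be sketched in a few lines of Java. This is only an illustration: the class name ClassHeader is invented, and the eight input bytes are the ones quoted in the text (CAFEBABE 0000 0032). The "major minus 44" rule used to turn a major version into a JDK release number is the usual convention (50 corresponds to JDK 1.6).

```java
import java.nio.ByteBuffer;

// Hypothetical holder for the first eight bytes of a class file.
class ClassHeader {
    final int minor;
    final int major;

    ClassHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    static ClassHeader parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {        // bytes 1-4: magic number
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;     // bytes 5-6: minor version
        int major = buf.getShort() & 0xFFFF;     // bytes 7-8: major version
        return new ClassHeader(minor, major);
    }

    // Conventional mapping: major version 50 -> JDK 1.6, 51 -> JDK 1.7, ...
    int jdkRelease() { return major - 44; }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x32};
        ClassHeader h = parse(header);
        System.out.println("minor=" + h.minor + " major=" + h.major + " release=" + h.jdkRelease());
        // minor=0 major=50 release=6
    }
}
```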
2. Constant pool
Immediately after the major version number comes the entry to the constant pool. The constant pool can be thought of as the resource repository of the class file: it is the data item most closely associated with other items in the class file structure, one of the data items that takes up the most space in the class file, and the first table-type data item to appear in the class file. Because the number of constants in the pool is not fixed, a u2 item is placed at the entry of the constant pool to hold the constant pool count. This count starts at 1 rather than 0, so that data elsewhere in the file that holds a constant pool index can, in particular cases, express "does not refer to any constant pool item" by setting the index to 0. In this example the constant pool count is 0x0016, which is 22 in decimal, meaning that there are 21 constants in the constant pool, with indexes 1 through 21.
The constant pool holds two main kinds of constants: literals and symbolic references.
Literals are close to the Java-language notion of constants, such as text strings and constant values declared final. Symbolic references are a concept from compiler theory and include the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods.
Java code has no link step at compile time; linking happens dynamically when the virtual machine loads the class file. When the virtual machine runs, it obtains the relevant symbolic references from the constant pool and resolves them into concrete memory addresses when the class is created or at run time.
Every constant in the constant pool is itself a table. As of JDK 1.7 there are 14 kinds of constant, each with its own table structure. What these tables have in common is that the first item of each is a u1 tag.
Looking back at the class file above, the first constant in the constant pool starts with its tag, which identifies the constant type. The tag is 0x07, so the constant is of type CONSTANT_Class_info, which represents a symbolic reference to a class or interface.
Next comes name_index, which is an index value. Its value is 0x0002, so it points to the second constant in the constant pool, which is of type CONSTANT_Utf8_info.
Now look at the structure of that CONSTANT_Utf8_info entry. The first item is the tag, which is 0x01, consistent with the item type. Next is a 2-byte length, which reads 0x000D, so the string is 13 bytes long, and the following 13 bytes are the bytes of the string itself.
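The walk through the first two constants can be sketched as follows. This is a simplified illustration that handles only the two tags discussed here. The class name ConstantPoolSketch is invented, and because the actual 13 bytes of the string are not reproduced in this article, the name "com/TestClass" is a placeholder chosen only because it is 13 bytes long.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical reader for two of the constant pool entry types.
class ConstantPoolSketch {
    static final int CONSTANT_UTF8 = 1;
    static final int CONSTANT_CLASS = 7;

    // Reads one constant starting at the buffer's current position.
    static String readConstant(ByteBuffer buf) {
        int tag = buf.get() & 0xFF;                  // u1 tag comes first in every entry
        switch (tag) {
            case CONSTANT_CLASS: {
                int nameIndex = buf.getShort() & 0xFFFF; // u2 index of a Utf8 entry
                return "Class name_index=#" + nameIndex;
            }
            case CONSTANT_UTF8: {
                int length = buf.getShort() & 0xFFFF;    // u2 byte length
                byte[] bytes = new byte[length];
                buf.get(bytes);
                // Plain UTF-8 decoding is enough for ASCII names like this one
                return "Utf8 length=" + length + " \"" + new String(bytes, StandardCharsets.UTF_8) + "\"";
            }
            default:
                throw new IllegalArgumentException("tag not handled in this sketch: " + tag);
        }
    }

    // Builds placeholder bytes: count 0x0016, a Class entry, then a Utf8 entry.
    static byte[] sample() {
        byte[] name = "com/TestClass".getBytes(StandardCharsets.UTF_8); // placeholder, 13 bytes
        ByteBuffer buf = ByteBuffer.allocate(2 + 3 + 3 + name.length);
        buf.putShort((short) 0x0016);
        buf.put((byte) CONSTANT_CLASS).putShort((short) 2);
        buf.put((byte) CONSTANT_UTF8).putShort((short) name.length).put(name);
        return buf.array();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(sample());
        int count = buf.getShort() & 0xFFFF;
        System.out.println("count=" + count + ", constants=" + (count - 1)); // count=22, constants=21
        System.out.println("#1 " + readConstant(buf)); // #1 Class name_index=#2
        System.out.println("#2 " + readConstant(buf)); // #2 Utf8 length=13 "com/TestClass"
    }
}
```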
The bin directory of the JDK contains a tool dedicated to analyzing class file bytecode: javap. Running javap with the -verbose option on the class file generated from the code above produces the following output:
3. Access flags
Immediately after the constant pool come two bytes that hold the access flags. They identify access information at the class or interface level: whether this is a class or an interface, whether it is public, whether it is abstract, whether (for a class) it is declared final, and so on.
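Because each property is a single bit, decoding the two bytes is a matter of bit masks. The sketch below uses the flag values defined by the Java Virtual Machine Specification for classes; the class name AccessFlags and the decode method are invented for illustration, and only a subset of the flags is covered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for a subset of the class-level access flags.
class AccessFlags {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_FINAL = 0x0010;
    static final int ACC_SUPER = 0x0020;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT = 0x0400;

    // Returns the names of the flags whose bits are set in the u2 value.
    static List<String> decode(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & ACC_PUBLIC) != 0) names.add("ACC_PUBLIC");
        if ((flags & ACC_FINAL) != 0) names.add("ACC_FINAL");
        if ((flags & ACC_SUPER) != 0) names.add("ACC_SUPER");
        if ((flags & ACC_INTERFACE) != 0) names.add("ACC_INTERFACE");
        if ((flags & ACC_ABSTRACT) != 0) names.add("ACC_ABSTRACT");
        return names;
    }

    public static void main(String[] args) {
        // 0x0021 is what a plain public class typically carries
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
        // 0x0601 is what a public interface typically carries
        System.out.println(decode(0x0601)); // [ACC_PUBLIC, ACC_INTERFACE, ACC_ABSTRACT]
    }
}
```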
4. Class index, parent class index, and interface index
The class index and the parent class index are each a u2 item, and the interface index collection is a set of u2 items. The class file uses these three items to determine the inheritance relationships of the class. The class index identifies the fully qualified name of this class, and the parent class index identifies the fully qualified name of its parent class. Because the Java language does not allow multiple inheritance, there is only one parent class index. The interface index collection describes which interfaces the class implements; the implemented interfaces appear in the collection from left to right in the order they are listed after the implements keyword.
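The same three pieces of information are visible at run time through reflection, which gives a quick way to see them without reading bytes. The sketch below is only an illustration; the Inheritance, Base, and Sample classes are made up, and it relies on the documented behavior that Class.getInterfaces() returns interfaces in the order of the implements clause.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example classes for inspecting inheritance information.
class Inheritance {
    static class Base {}

    static class Sample extends Base implements Runnable, Serializable {
        public void run() {}
    }

    // Summarizes this class, its single parent class, and its interfaces in order.
    static String describe(Class<?> type) {
        List<String> interfaces = new ArrayList<>();
        for (Class<?> i : type.getInterfaces()) {
            interfaces.add(i.getSimpleName());
        }
        return type.getSimpleName()
                + " extends " + type.getSuperclass().getSimpleName()
                + " implements " + interfaces;
    }

    public static void main(String[] args) {
        System.out.println(describe(Sample.class));
        // Sample extends Base implements [Runnable, Serializable]
    }
}
```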
5. Field table collection
The field table describes the variables declared in an interface or class. Fields include class-level variables and instance variables, but not local variables declared inside methods. The information a field can carry includes its scope, whether it is an instance variable or a class variable (the static modifier), its mutability (final), its concurrent visibility (the volatile modifier), whether it can be serialized (the transient modifier), its data type, and its name. Each of these modifiers is a Boolean value: the field either has the modifier or does not.
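Since every modifier is simply present or absent, the modifiers of a field fit into a set of flag bits. A small sketch using the standard java.lang.reflect API shows this; the FieldFlags and Sample classes and their field names are invented for the example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical example: inspect the modifier bits of a few fields.
class FieldFlags {
    static class Sample {
        private static final int LIMIT = 10;  // class variable, immutable
        private volatile boolean running;     // instance variable, volatile
        private transient String cache;       // instance variable, not serialized
    }

    // Returns the modifiers of the named field of Sample as text.
    static String modifiersOf(String fieldName) {
        try {
            Field field = Sample.class.getDeclaredField(fieldName);
            return Modifier.toString(field.getModifiers());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(modifiersOf("LIMIT"));   // private static final
        System.out.println(modifiersOf("running")); // private volatile
        System.out.println(modifiersOf("cache"));   // private transient
    }
}
```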
6. Method table collection
The method table describes methods and includes the access flags, name index, descriptor index, and attribute table collection. The Java code inside a method is compiled by the compiler into bytecode instructions and stored in an attribute named Code in the method's attribute table collection. The attribute table is the most extensible data item in the class file format.
In the Java language, to overload a method, the new method must have the same simple name as the original method and also a feature signature that differs from it. (The feature signature is the collection of constant pool field symbolic references for the parameters of a method.) The feature signature of Java code includes only the method name, the parameter order, and the parameter types. Because the return value is not part of the feature signature, the Java language cannot overload an existing method based solely on a different return value.
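A short example may help here. The Overloads class and its show methods are invented; the descriptor helper uses the standard java.lang.invoke.MethodType class to print the kind of descriptor string that a method table refers to, and is included only as an illustration.

```java
import java.lang.invoke.MethodType;

// Hypothetical example of overloads that differ by parameter type.
class Overloads {
    static String show(int value) { return "int"; }
    static String show(long value) { return "long"; }
    static String show(String value) { return "String"; }
    // static int show(int value) { return 0; }
    // The line above would not compile: it differs from show(int) only by return type.

    // Builds a method descriptor string from a return type and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(show(1));    // int
        System.out.println(show(1L));   // long
        System.out.println(show("x"));  // String
        System.out.println(descriptor(String.class, int.class));  // (I)Ljava/lang/String;
        System.out.println(descriptor(String.class, long.class)); // (J)Ljava/lang/String;
    }
}
```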
7. Attribute table collection
The class file, the field table, and the method table can each carry their own attribute table collection, which describes information specific to certain scenarios.
Code attribute: After the javac compiler processes the code in a method body of a Java program, the resulting bytecode instructions are stored in the Code attribute. The Code attribute appears in the attribute collection of a method table, but not every method table has one; abstract methods in interfaces and abstract classes, for example, have no Code attribute.
Exceptions attribute: Lists the checked exceptions that a method may throw, that is, the exceptions listed after the throws keyword in the method declaration.
ConstantValue attribute: Tells the virtual machine to assign a value to a static variable automatically. Only variables modified by the static keyword can use this attribute.
InnerClasses attribute: Records the association between an inner class and its host class. If a class defines inner classes, the compiler generates an InnerClasses attribute for it and for the inner classes it contains.
Three. Bytecode instructions
A Java virtual machine instruction consists of a one-byte number that represents a particular operation, called the opcode, followed by zero or more operands that the operation requires. Because the Java virtual machine uses an operand-stack-oriented architecture rather than a register-oriented one, most instructions contain no operands, only an opcode. Because the opcode is limited to one byte (0 to 255), the instruction set can contain at most 256 opcodes.
1. Bytecode and data types
For most bytecode instructions that are tied to a data type, the opcode mnemonic contains a special character that indicates which data type the instruction serves: i stands for int, l for long, s for short, b for byte, c for char, f for float, d for double, and a for reference. Some instructions have mnemonics with no letter that explicitly indicates the type they operate on, and some instructions are not related to data types at all. Because opcodes are limited in number, the Java virtual machine instruction set provides type-specific instructions for only a limited set of operations and types. In other words, the instruction set is deliberately not fully orthogonal, and separate instructions are available to convert unsupported types into supported ones when necessary.
2. Load and store instructions
Load and store instructions transfer data back and forth between the local variable table and the operand stack of a stack frame.
3. Arithmetic instructions
An arithmetic instruction performs a particular operation on two values on the operand stack and pushes the result back onto the top of the operand stack.
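To see how opcodes, operands, the local variable table, and the operand stack fit together, here is a toy interpreter for six instructions. It is a teaching sketch, not how a real virtual machine is built: the class TinyStackMachine is invented, it supports a single local variable slot, and it does no verification. The opcode values are the ones the Java Virtual Machine Specification assigns to these mnemonics.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter for a handful of int instructions.
class TinyStackMachine {
    static final int ICONST_2 = 0x05; // push the int constant 2
    static final int BIPUSH = 0x10;   // push the next byte as an int (one operand)
    static final int ILOAD_0 = 0x1a;  // push local variable 0
    static final int ISTORE_0 = 0x3b; // pop into local variable 0
    static final int IADD = 0x60;     // pop two ints, push their sum
    static final int IRETURN = 0xac;  // pop and return the top int

    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        int[] locals = new int[1];                 // local variable table
        int pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xFF;        // every opcode is one byte
            switch (opcode) {
                case BIPUSH:
                    stack.push((int) code[pc++]);  // operand follows the opcode
                    break;
                case ICONST_2:
                    stack.push(2);
                    break;
                case ISTORE_0:
                    locals[0] = stack.pop();
                    break;
                case ILOAD_0:
                    stack.push(locals[0]);
                    break;
                case IADD: {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                case IRETURN:
                    return stack.pop();
                default:
                    throw new IllegalStateException(
                            "opcode not handled in this sketch: 0x" + Integer.toHexString(opcode));
            }
        }
    }

    // bipush 40; istore_0; iload_0; iconst_2; iadd; ireturn
    static byte[] sampleProgram() {
        return new byte[] {BIPUSH, 40, ISTORE_0, ILOAD_0, ICONST_2, IADD, (byte) IRETURN};
    }

    public static void main(String[] args) {
        System.out.println(run(sampleProgram())); // 42
    }
}
```

Only bipush carries an operand here; the other five instructions are a bare opcode, which matches the observation that most instructions of a stack-oriented machine need no operands.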
4. Type conversion instructions
Type conversion instructions convert between two different numeric types. They are typically used to implement explicit type conversions in user code, or to deal with the fact, mentioned above, that the type-specific instructions in the bytecode instruction set do not correspond one-to-one with the data types.
The Java virtual machine directly supports the following widening numeric conversions (safe conversions from a smaller-range type to a larger-range type):
- int to long, float, or double
- long to float or double
- float to double
A narrowing conversion must be performed explicitly with a conversion instruction.
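In Java source code the difference shows up as whether a cast is required. The sketch below is a minimal illustration; the class and method names are invented, and the instruction names in the comments (i2l, i2b, d2i) are the conversion instructions a compiler would normally emit for these expressions.

```java
// Hypothetical examples of widening and narrowing conversions.
class Conversions {
    // Widening: no cast needed in source; typically compiled to i2l
    static long widen(int value) { return value; }

    // Narrowing: explicit cast required; typically compiled to i2b
    static byte narrowToByte(int value) { return (byte) value; }

    // Narrowing: explicit cast required; typically compiled to d2i
    static int narrowToInt(double value) { return (int) value; }

    public static void main(String[] args) {
        System.out.println(widen(Integer.MAX_VALUE) + 1); // 2147483648
        System.out.println(narrowToByte(300));            // 44 (high-order bits discarded)
        System.out.println(narrowToInt(3.99));            // 3 (fraction truncated)
    }
}
```

The second and third results show why narrowing is not considered safe: the value can change without any exception being thrown.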
5. Exception handling instructions
In a Java program, an explicit throw statement is implemented by the athrow instruction. Besides exceptions thrown explicitly in this way, the Java Virtual Machine Specification also specifies that many run-time exceptions are thrown automatically when other Java virtual machine instructions detect an abnormal condition.
In the Java virtual machine, exception handling (the catch statement) is implemented not by bytecode instructions but by exception tables.
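Both kinds of exception can be seen in a small example. The ExceptionDemo class is invented for illustration. Compiling it and running javap -c on the result should show an "Exception table:" section for divide and an athrow instruction in check.

```java
// Hypothetical example: one automatic exception, one explicit throw.
class ExceptionDemo {
    // Integer division throws ArithmeticException automatically when b is 0;
    // the catch block is recorded as an entry in the method's exception table.
    static int divide(int a, int b) {
        try {
            return a / b;
        } catch (ArithmeticException e) {
            return -1;
        }
    }

    // The throw statement is compiled to an athrow instruction.
    static void check(int age) {
        if (age < 0) {
            throw new IllegalArgumentException("negative age");
        }
    }

    public static void main(String[] args) {
        System.out.println(divide(6, 3)); // 2
        System.out.println(divide(1, 0)); // -1
        try {
            check(-1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // negative age
        }
    }
}
```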