Virtual machines run on many different platforms, and all of them can load and execute the same platform-independent bytecode. Bytecode is the program storage format shared by the virtual machines of every platform, and it is what makes "write once, run anywhere" possible.
Virtual machines are also language independent. The Java virtual machine is not bound to any language, Java included. It is associated only with the specific binary format of the class file, which contains the Java virtual machine instruction set, a symbol table, and a number of other ancillary items.
Program source file --> corresponding compiler --> .class bytecode file --> Java virtual machine
The structure of the class file
Each class file corresponds to the definition of exactly one class or interface. The reverse does not hold: a class or interface does not have to be defined in a file. For example, it can be generated directly by a class loader.
A class file is a binary stream based on 8-bit bytes. The data items are packed tightly in a strict order with no separators between them, so almost everything in the file is data the program needs in order to run and there are no gaps. Items larger than one byte are stored with the high-order byte first. The class file stores data in a pseudo-structure similar to a C struct, using only two data types: unsigned numbers and tables.
Unsigned numbers are the basic data types. u1, u2, u4, and u8 stand for 1, 2, 4, and 8 bytes, and they can describe numbers, index references, quantity values, or string values encoded in UTF-8.
A table is a composite data type made up of multiple unsigned numbers or other tables. By convention, all table names end with _info.
As shown in the following table, because there are no delimiters, the order, the count, and even the byte order of the data items are strictly defined and cannot be changed.
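As a minimal sketch of the unsigned types described above, the following reads u1, u2, and u4 values from bytes that are already in memory. The class name UnsignedReader and its method names are illustrative only and not part of any JDK API. It relies on java.nio.ByteBuffer being big-endian by default, which matches the high-order-byte-first layout of the class file.

```java
import java.nio.ByteBuffer;

final class UnsignedReader {
    private final ByteBuffer buf;

    UnsignedReader(byte[] data) {
        // ByteBuffer defaults to big-endian, the byte order used by class files.
        this.buf = ByteBuffer.wrap(data);
    }

    int u1() { return buf.get() & 0xFF; }             // 1 byte, 0..255
    int u2() { return buf.getShort() & 0xFFFF; }      // 2 bytes, 0..65535
    long u4() { return buf.getInt() & 0xFFFFFFFFL; }  // 4 bytes, widened to long

    public static void main(String[] args) {
        byte[] data = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x16, (byte) 0xFF};
        UnsignedReader r = new UnsignedReader(data);
        System.out.println(Long.toHexString(r.u4())); // cafebabe
        System.out.println(r.u2());                   // 22
        System.out.println(r.u1());                   // 255
    }
}
```

Java has no unsigned primitive types, so each value is masked into the next wider signed type.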
Magic number and class file version
The first 4 bytes of each class file are the magic number. Its only purpose is to determine whether the file is a class file that the virtual machine can accept.
Many file formats, such as image formats, are identified by a magic number rather than by file extension, mainly because a file extension can be changed at will. The magic number of a class file is 0xCAFEBABE ("cafe babe").
The 4 bytes after the magic number hold the class file version: bytes 5-6 are the minor version number and bytes 7-8 are the major version number. Java version numbers start at 45, and each major JDK release after JDK 1.1 raises the major version number by 1. A newer JDK remains backward compatible with older class files, but a virtual machine cannot run a class file whose version is later than its own.
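The header layout described above can be sketched as follows. This is a minimal illustration, assuming the whole class file has already been read into a byte array (for example with java.nio.file.Files.readAllBytes). ClassFileHeader is a hypothetical helper name, not a JDK class.

```java
import java.nio.ByteBuffer;

final class ClassFileHeader {
    final int minor;
    final int major;

    private ClassFileHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    /** Parses the first 8 bytes; throws if the magic number is missing or wrong. */
    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.remaining() < 8 || buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF; // bytes 5-6
        int major = buf.getShort() & 0xFFFF; // bytes 7-8
        return new ClassFileHeader(minor, major);
    }
}
```

Under the numbering described above (45 for JDK 1.1, plus 1 per major release), a major version of 52 corresponds to Java 8.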
Constant pool
The constant pool can be thought of as the resource warehouse of a class file. It is the data item most closely associated with the other items in the class file structure, and it is one of the items that takes up the most space in the file. It is also the first table-type data item to appear in the class file.
The number of constants in the pool is not fixed, so a u2 value, the constant pool count, is placed at the entry to the constant pool. This count starts from 1 rather than 0 (the constant pool count is the only count that starts from 1). For example, a count of hexadecimal 0x0016 at offset 0x00000008 is decimal 22, which means the pool holds 21 constants with index values 1 to 21. Index value 0 is reserved for the particular case of expressing "does not reference any constant pool item".
The constant pool holds two kinds of constants: literals and symbolic references. Literals are things such as text strings and constant values declared as final. Symbolic references include the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods.
Each constant in the constant pool is a table. There are 14 kinds of constant pool entry types (as of JDK 7; later versions of the format added more). Their common feature is that each table begins with a u1 tag that identifies which constant type the entry belongs to. The first constant's tag sits at offset 0x0000000A, immediately after the count; a tag of 0x07, for example, marks a CONSTANT_Class_info entry. Each of the 14 constant types has its own structure.
Classes, methods, and fields all refer to CONSTANT_Utf8_info constants to describe their names, so the maximum length of a CONSTANT_Utf8_info constant is also the maximum length of a Java method or field name. Its length item is a u2, whose maximum value is 65535, so a definition whose name exceeds 64 KB cannot be compiled.
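The way the count and the u1 tag drive parsing can be sketched as below. This is a simplified walk, not a full parser: it keeps only the text of CONSTANT_Utf8_info entries and skips everything else by its fixed size. ConstantPoolWalker is a hypothetical name, and the buffer is assumed to be positioned at offset 8, right after the version numbers. Tags 17, 19, and 20 come from later versions of the format than the 14 kinds mentioned above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ConstantPoolWalker {
    /**
     * Reads constant_pool_count and every entry. Returns the Utf8 texts by
     * constant pool index; slots holding other entry types stay null.
     */
    static String[] readUtf8Entries(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF;   // e.g. 0x0016 = 22
        String[] utf8 = new String[count];     // valid indices are 1..count-1
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF;        // the u1 tag selects the structure
            switch (tag) {
                case 1: {                      // CONSTANT_Utf8_info: u2 length + bytes
                    byte[] bytes = new byte[buf.getShort() & 0xFFFF];
                    buf.get(bytes);
                    // Simplification: class files really use "modified UTF-8".
                    utf8[i] = new String(bytes, StandardCharsets.UTF_8);
                    break;
                }
                case 7: case 8: case 16: case 19: case 20:
                    skip(buf, 2);              // a single u2 index
                    break;
                case 15:
                    skip(buf, 3);              // MethodHandle: u1 kind + u2 index
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(buf, 4);
                    break;
                case 5: case 6:
                    skip(buf, 8);
                    i++;                       // Long and Double occupy two index slots
                    break;
                default:
                    throw new IllegalArgumentException("unknown tag " + tag);
            }
        }
        return utf8;
    }

    private static void skip(ByteBuffer buf, int n) {
        buf.position(buf.position() + n);
    }
}
```

Because the entries have no separators, the tag is the only way to know how many bytes to consume before the next entry begins.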
Access flags (access_flags)
After the constant pool, the next two bytes are the access flags. They identify class-level or interface-level access information: whether this Class is a class or an interface, whether it is public, whether it is abstract, whether a class is declared final, and so on. There are 16 flag bits available, of which only 8 are currently defined. Undefined bits must be 0.
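Decoding the flag bits can be sketched like this. The eight mask values are the class-level flags defined by the class file format; the helper name ClassAccessFlags and its decode method are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Returns the names of all flags set in the u2 access_flags value. */
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }
}
```

For example, decode(0x0021) yields ACC_PUBLIC and ACC_SUPER, a combination commonly seen on an ordinary public class.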
Class index, parent class index, and interface index collection
The class index (this_class) and the parent class index (super_class) are each a u2, and the interface index collection (interfaces) is a set of u2 values. The class file uses these three items to determine the inheritance relationships of the class.
The class index identifies this class itself. Java has single inheritance, so there is only one parent class index. Every Java class except java.lang.Object has a parent class, so java.lang.Object is the only class whose parent class index is 0. The interface index collection describes which interfaces the class implements. The implemented interfaces are listed from left to right in the order they appear after the implements keyword (or after extends, if the class file itself describes an interface).
Both the class index and the parent class index point to a CONSTANT_Class_info constant. The first item of the interface index collection is a u2 interface counter. If the class implements no interfaces, the counter is 0 and the index table that follows occupies no bytes.
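Reading these three items can be sketched as follows, assuming a buffer positioned just after access_flags. ClassHierarchyIndexes is a hypothetical name, and the values it returns are constant pool indices, not class names.

```java
import java.nio.ByteBuffer;

final class ClassHierarchyIndexes {
    final int thisClass;    // index of a CONSTANT_Class_info for this class
    final int superClass;   // 0 only for java.lang.Object
    final int[] interfaces; // in the order written after implements

    private ClassHierarchyIndexes(int thisClass, int superClass, int[] interfaces) {
        this.thisClass = thisClass;
        this.superClass = superClass;
        this.interfaces = interfaces;
    }

    static ClassHierarchyIndexes read(ByteBuffer buf) {
        int thisClass = buf.getShort() & 0xFFFF;
        int superClass = buf.getShort() & 0xFFFF;
        int count = buf.getShort() & 0xFFFF;   // u2 interface counter
        int[] interfaces = new int[count];     // occupies no bytes when count is 0
        for (int i = 0; i < count; i++) {
            interfaces[i] = buf.getShort() & 0xFFFF;
        }
        return new ClassHierarchyIndexes(thisClass, superClass, interfaces);
    }
}
```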
Field table collection (field_info)
The field table describes the variables declared in an interface or class. Fields include class-level variables and instance-level variables, but not local variables declared inside methods.
The information a field carries includes: its access scope (such as public), whether it is an instance variable or a class variable (static modifier), mutability (final), concurrent visibility (volatile modifier, whether reads and writes are forced through main memory), whether it is serialized (transient), its data type (primitive type, object, array), and its name.
Each modifier is a Boolean value, either present or absent, so it is well suited to a flag bit. The name and type of a field are not fixed and can only be described by referring to constants in the constant pool.
A descriptor describes the data type of a field, or the parameter list of a method (including count, types, and order) and its return value. Primitive types and void are each represented by one uppercase character, such as void -> V and byte -> B. An object type is represented by L followed by the fully qualified class name and a semicolon, such as Ljava/lang/Object;. Each array dimension adds a leading [, so int[] -> "[I" and String[][] -> "[[Ljava/lang/String;". A method descriptor lists the parameters first, in parentheses, followed by the return value.
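The descriptor rules above can be sketched as a small recursive function. Descriptors and its two methods are hypothetical names written for this illustration; the JDK's own java.lang.invoke.MethodType.toMethodDescriptorString() produces the same method descriptor strings and can be used to cross-check the result.

```java
final class Descriptors {
    /** Field descriptor for a type, e.g. int[] becomes "[I". */
    static String of(Class<?> type) {
        if (type.isArray()) {
            return "[" + of(type.getComponentType()); // one [ per dimension
        }
        if (type.isPrimitive()) {
            if (type == void.class) return "V";
            if (type == boolean.class) return "Z";
            if (type == byte.class) return "B";
            if (type == char.class) return "C";
            if (type == short.class) return "S";
            if (type == int.class) return "I";
            if (type == long.class) return "J";
            if (type == float.class) return "F";
            return "D"; // double is the only primitive left
        }
        // Object types: L + fully qualified name with '/' separators + ;
        return "L" + type.getName().replace('.', '/') + ";";
    }

    /** Method descriptor: parameters in order inside parentheses, then the return type. */
    static String ofMethod(Class<?> returnType, Class<?>... params) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : params) {
            sb.append(of(p));
        }
        return sb.append(')').append(of(returnType)).toString();
    }
}
```

Note that boolean and long do not use their first letter: they map to Z and J, since B is taken by byte and L introduces an object type.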
Method table collection
A method is described in almost the same way as a field. The description includes access flags, a name index, a descriptor index, an attribute counter, and an attribute table collection.
The volatile and transient keywords cannot modify methods, whereas synchronized, native, strictfp, and abstract can.
The code in a method is compiled into bytecode instructions by the compiler and stored in an attribute named Code in the method's attribute table collection. The attribute table is the most extensible data item in the class file format.
The methods present are the instance constructor <init>, added by the compiler, and the method GC() from the source code.
If a subclass does not override a parent class method, no information about that method appears in the subclass's method table collection. Methods added automatically by the compiler may appear, however, the most typical being the class initializer <clinit> and the instance constructor <init>. To overload a method in Java, the overload needs the same simple name and a different signature, where the signature is the collection of constant pool references for each of the method's parameters. The return value is not part of that signature, so Java source cannot overload on return value alone. In the class file the notion of a signature is broader: two methods with the same name and parameters but different return values can legally coexist in the same class file, because their descriptors differ.
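One place where two methods that differ only in return value can be observed is a covariant override. The sketch below assumes a compiler such as javac, which emits a synthetic bridge method for the override; the class and method names are made up for the example.

```java
import java.lang.reflect.Method;

final class ReturnTypeOverloadDemo {
    static class Base {
        Object value() { return "base"; }
    }

    static class Derived extends Base {
        @Override
        String value() { return "derived"; } // covariant return type
    }

    /** Counts the methods recorded for class c that have this name and no parameters. */
    static int countNoArg(Class<?> c, String name) {
        int n = 0;
        for (Method m : c.getDeclaredMethods()) { // includes synthetic bridge methods
            if (m.getName().equals(name) && m.getParameterCount() == 0) {
                n++;
            }
        }
        return n;
    }
}
```

With javac, countNoArg(ReturnTypeOverloadDemo.Derived.class, "value") reports 2: the String-returning method from the source and a compiler-generated Object-returning bridge. Java source could not declare both, but the class file can hold them side by side.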
Attribute table collection
The class file, the field table, and the method table can each carry their own attribute table collection to describe information that is specific to certain scenarios.
Attributes have no strict ordering requirement, and a compiler may write its own attributes into an attribute table as long as their names do not duplicate existing attribute names.
1. Code attribute: the code in a method body is compiled into bytecode instructions and stored in the Code attribute. Abstract methods, such as those in interfaces or abstract classes, have no Code attribute.
Within any instance method, the this keyword gives access to the object the method belongs to. The implementation is very simple: the javac compiler turns access to this into access to an ordinary method parameter, and the virtual machine passes that parameter in automatically when it invokes the instance method.
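The hidden this parameter can be observed indirectly through java.lang.invoke. The following is only an illustration of the idea, not a dump of what javac emits: a method handle for an instance method exposes the receiver as an explicit leading parameter. ThisParameterDemo and its members are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ThisParameterDemo {
    private int total;

    int add(int delta) { // source level: one declared parameter
        this.total += delta;
        return this.total;
    }

    /** Returns the method handle type of add, which includes the receiver. */
    static MethodType handleTypeOfAdd() {
        try {
            MethodHandle mh = MethodHandles.lookup().findVirtual(
                    ThisParameterDemo.class, "add",
                    MethodType.methodType(int.class, int.class));
            return mh.type(); // (ThisParameterDemo, int) -> int
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The handle type has two parameters even though add declares one, which mirrors how an instance method receives its object in local variable slot 0.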
The exception table is effectively part of the Java code. The compiler uses exception tables, rather than simple jump instructions, to implement Java exception handling and the finally mechanism.
2. Exceptions attribute: lists the checked exceptions a method may throw, that is, the exceptions enumerated after the throws keyword in the method declaration.
3. LineNumberTable attribute: describes the correspondence between Java source line numbers and bytecode line numbers (bytecode offsets). It is not required at runtime, but it is generated in the class file by default.
4. LocalVariableTable attribute: describes the relationship between the variables in the local variable table of a stack frame and the variables defined in the Java source code. It is not required at runtime. With javac it is generated when local variable debugging information is requested, for example with the -g option.
5. SourceFile attribute: records the name of the source file from which this class file was generated. Optional.
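6. ConstantValue attribute: tells the virtual machine to automatically assign a value to a static variable. Only variables modified by static (class variables) can use this attribute.
Non-static (instance) variables are assigned in the instance constructor <init>. Static (class) variables are assigned either through the ConstantValue attribute or in the class initializer <clinit>.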
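The three initialization paths can be illustrated with a small class. The comments describe what javac typically does and are an assumption about that compiler rather than a requirement of the format; running javap -v -p on the compiled class is one way to check. FieldInitDemo is a made-up name.

```java
final class FieldInitDemo {
    // static final with a primitive or String constant:
    // javac typically records the value in a ConstantValue attribute.
    static final int MAX = 123;
    static final String NAME = "demo";

    // static but not final: assigned in the class initializer <clinit>.
    static int counter = 456;

    // instance field: assigned in the instance constructor <init>.
    int size = 789;
}
```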
Introduction to bytecode instructions
A Java virtual machine instruction consists of a one-byte number (0-255) that identifies a particular operation (the opcode), followed by zero or more operands that supply the parameters the operation needs. The Java virtual machine uses an architecture oriented around an operand stack rather than registers, so most instructions contain only an opcode and no operands. Limiting the opcode to one byte also keeps compiled code as short and compact as possible.
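The "one opcode byte followed by zero or more operands" layout can be sketched with a tiny decoder. It recognizes only a handful of opcodes and rejects everything else; the opcode values are the standard ones for these instructions, while TinyDisassembler itself is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

final class TinyDisassembler {
    /** Decodes a small subset of instructions from a raw code array. */
    static List<String> decode(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc++] & 0xFF; // every instruction starts with one opcode byte
            switch (opcode) {
                case 0x04: out.add("iconst_1"); break;             // no operands
                case 0x10: out.add("bipush " + code[pc++]); break; // one signed byte operand
                case 0x11: {                                       // two operand bytes, high byte first
                    int value = (short) (((code[pc] & 0xFF) << 8) | (code[pc + 1] & 0xFF));
                    pc += 2;
                    out.add("sipush " + value);
                    break;
                }
                case 0x1B: out.add("iload_1"); break;
                case 0x3C: out.add("istore_1"); break;
                case 0x60: out.add("iadd"); break;
                case 0xAC: out.add("ireturn"); break;
                default:
                    throw new IllegalArgumentException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
        return out;
    }
}
```

Most of the cases consume no operand bytes at all, which reflects the stack-oriented design: the values an instruction works on come from the operand stack, not from the instruction itself.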
1. Bytecode and data types
iload loads an int onto the operand stack, and fload loads a float onto the operand stack. Each must have its own separate opcode in the class file.
2. Load and store instructions
Load and store instructions transfer data back and forth between the local variable table and the operand stack in a stack frame. They include:
Loading a local variable onto the operand stack: iload, iload_<n>, etc.;
Storing a value from the operand stack into the local variable table: istore, istore_<n>, etc.;
Loading a constant onto the operand stack: bipush, sipush, etc.;
Extending the access index of the local variable table: wide.
3. Arithmetic instructions
Addition: iadd, ladd, etc.; subtraction: isub, lsub, etc.
4. Type conversion instructions: int -> long, long -> float, float -> double, etc.
5. Object creation and access instructions: new, newarray, getfield, etc.
6. Operand stack management instructions: pop, pop2, etc.
7. Control transfer instructions: ifeq, ifle, goto, etc.
8. Method invocation and return instructions: invokevirtual, etc.
9. Exception handling instructions
10. Synchronization instructions: supported by monitors.
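To show how load, store, arithmetic, and stack management instructions cooperate, here is a toy interpreter for a few int instructions. It is a sketch for illustration only and bears no relation to how a real virtual machine is implemented. TinyInterpreter is a hypothetical name, and the code array is assumed to end with ireturn.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyInterpreter {
    /** Executes a small subset of int instructions and returns the ireturn value. */
    static int run(byte[] code, int maxLocals) {
        int[] locals = new int[maxLocals];          // local variable table
        Deque<Integer> stack = new ArrayDeque<>();  // operand stack
        int pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xFF;
            switch (opcode) {
                case 0x10:                                   // bipush: push a constant
                    stack.push((int) code[pc++]);
                    break;
                case 0x1A: case 0x1B: case 0x1C: case 0x1D:  // iload_0..iload_3
                    stack.push(locals[opcode - 0x1A]);
                    break;
                case 0x3B: case 0x3C: case 0x3D: case 0x3E:  // istore_0..istore_3
                    locals[opcode - 0x3B] = stack.pop();
                    break;
                case 0x60:                                   // iadd
                    stack.push(stack.pop() + stack.pop());
                    break;
                case 0x64: {                                 // isub: second-from-top minus top
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a - b);
                    break;
                }
                case 0x57:                                   // pop
                    stack.pop();
                    break;
                case 0xAC:                                   // ireturn
                    return stack.pop();
                default:
                    throw new IllegalArgumentException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }
}
```

For example, the sequence bipush 10, istore_0, bipush 32, istore_1, iload_0, iload_1, iadd, ireturn stores two constants into local variables, loads them back onto the operand stack, and returns their sum.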
Public design and private implementation
Public design: a common storage format (the class file) and the bytecode instruction set.
Private implementation: an implementation must be able to read class files and precisely implement the semantics of the Java virtual machine code they contain.
Implementation approaches: translate the input Java virtual machine code into the instruction set of another virtual machine at load or execution time, or translate it into the native instruction set of the host CPU (JIT code generation).
The development of class file structure
Over time, some new flags have been added to the access flags.