1. Overview
A class file is a stream of 8-bit bytes. The data items are laid out strictly in order and packed together with no separators, so nearly everything stored in the class file is data the program needs in order to run. When a data item needs more than 8 bits of space, it is split into several 8-bit bytes and stored in big-endian order, with the high-order byte first.
The class file uses a pseudo structure similar to a C struct to store data. This structure has only two kinds of data types: unsigned numbers and tables.
- Unsigned number: A basic data type. u1, u2, u4, and u8 denote unsigned numbers of 1, 2, 4, and 8 bytes respectively. An unsigned number can describe a number, an index reference, a quantity, or a string value encoded in UTF-8.
- Table: A composite data type made up of several unsigned numbers or other tables as data items. By convention, table names end with _info. Tables describe data with a hierarchical, composite structure. The entire class file is essentially one table, which consists of the data items shown in the following table:
Table 1. Data items of the class file structure
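The byte-order rule above can be sketched in a few lines of Java. BigEndianReader and its method names are hypothetical helpers introduced here for illustration; they are not part of the JDK or of the class file specification. The sketch only shows how u1, u2, and u4 values could be assembled from consecutive bytes, high-order byte first.

```java
// Hypothetical helper for illustration; not a JDK class.
final class BigEndianReader {
    /** Reads a u1: one byte interpreted as an unsigned value (0..255). */
    static int readU1(byte[] data, int offset) {
        return data[offset] & 0xFF;
    }

    /** Reads a u2: two bytes, high-order byte first. */
    static int readU2(byte[] data, int offset) {
        return (readU1(data, offset) << 8) | readU1(data, offset + 1);
    }

    /** Reads a u4: four bytes, high-order byte first; a long keeps the value unsigned. */
    static long readU4(byte[] data, int offset) {
        return ((long) readU2(data, offset) << 16) | readU2(data, offset + 2);
    }

    public static void main(String[] args) {
        byte[] data = {0x12, 0x34, 0x56, 0x78};
        System.out.println(readU2(data, 0));                   // 4660, i.e. 0x1234
        System.out.println(Long.toHexString(readU4(data, 0))); // 12345678
    }
}
```

Masking with 0xFF matters because Java bytes are signed; without it, a byte such as 0xFE would be read as a negative number.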
2. Class file composition
2.1 Magic number and class file version number
- The first 4 bytes of every class file are called the magic number. Its only purpose is to determine whether the file is a class file that the virtual machine can accept. The magic number of a class file is 0xCAFEBABE.
- The 4 bytes that follow the magic number store the version number of the class file: bytes 5 and 6 hold the minor version number (minor version), and bytes 7 and 8 hold the major version number (major version). The following table lists common class file versions:
Table 2. Common class file versions
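As a minimal sketch of the layout just described, the following Java code reads the first 8 bytes of a class file. ClassHeader is a hypothetical class written for this example. It relies on java.nio.ByteBuffer, which reads multi-byte values in big-endian order by default. The sample bytes in main are hand-written: major version 51 is the value used by JDK 1.7.

```java
import java.nio.ByteBuffer;

// Hypothetical holder for the version fields; not part of any JDK API.
final class ClassHeader {
    final int minorVersion;
    final int majorVersion;

    private ClassHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    /** Parses bytes 1-8 of a class file: magic, minor version, major version. */
    static ClassHeader parse(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        int magic = buf.getInt();                     // bytes 1-4
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic: 0x" + Integer.toHexString(magic));
        }
        int minor = buf.getShort() & 0xFFFF;          // bytes 5-6
        int major = buf.getShort() & 0xFFFF;          // bytes 7-8
        return new ClassHeader(minor, major);
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 51};
        ClassHeader header = parse(sample);
        System.out.println(header.majorVersion + "." + header.minorVersion); // 51.0
    }
}
```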
2.2 Constant pool
Every constant in the constant pool is a table. In JDK 1.7 there are 14 kinds of table structures in total. They share one feature: each table begins with a u1 tag that identifies its type. The meaning of each tag is shown in the following table:
Table 3. Table Structure Types in the constant pool
2.2.1 Constant pool location
- The constant pool immediately follows the major and minor version numbers. It can be thought of as the resource warehouse of the class file: it is the data item most closely associated with other items in the class file structure, it is one of the items that occupy the most space in the file, and it is the first table-type data item to appear in the class file.
- The constant pool begins with a u2 value giving its capacity count (constant_pool_count). The count starts from 1 rather than 0, so a count of n means the pool uses indexes 1 through n - 1; index 0 is reserved to express "no constant pool entry referenced".
2.2.2 Data types in the constant pool
- Literals: Mainly text strings and constant values declared as final.
- Symbolic references: Mainly the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods.
2.2.3 Table structures
Table 4. Table structures in the constant pool
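Because every constant starts with a u1 tag and the capacity count starts from 1, the pool can be walked entry by entry. The sketch below is a hedged illustration: ConstantPoolWalker is a hypothetical class, and the tag numbers and payload sizes in the switch are taken from the Java Virtual Machine Specification for the 14 JDK 1.7 constant types, not from this article. It also reflects one rule from that specification: a long or double constant occupies two consecutive pool indexes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical walker for illustration; sizes follow the JVM specification (JDK 1.7 tags).
final class ConstantPoolWalker {
    /** Reads constant_pool_count and returns the tag stored at each index (0 = unused). */
    static int[] readTags(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF;      // constant_pool_count
        int[] tags = new int[count];
        for (int i = 1; i < count; i++) {         // valid indexes are 1..count-1
            int tag = buf.get() & 0xFF;
            tags[i] = tag;
            int size;                             // payload bytes after the tag
            switch (tag) {
                case 1:  size = buf.getShort() & 0xFFFF; break;       // Utf8: u2 length + bytes
                case 3: case 4: size = 4; break;                      // Integer, Float
                case 5: case 6: size = 8; i++; break;                 // Long, Double: two indexes
                case 7: case 8: case 16: size = 2; break;             // Class, String, MethodType
                case 9: case 10: case 11: case 12: case 18: size = 4; break; // refs, NameAndType, InvokeDynamic
                case 15: size = 3; break;                             // MethodHandle
                default: throw new IllegalArgumentException("unknown tag " + tag);
            }
            buf.position(buf.position() + size);  // skip the payload
        }
        return tags;
    }

    public static void main(String[] args) {
        byte[] pool = {
            0, 5,                          // constant_pool_count = 5
            1, 0, 2, 'H', 'i',             // #1 Utf8 "Hi"
            7, 0, 1,                       // #2 Class, name_index = 1
            5, 0, 0, 0, 0, 0, 0, 0, 42     // #3 Long 42 (also occupies #4)
        };
        System.out.println(Arrays.toString(readTags(ByteBuffer.wrap(pool)))); // [0, 1, 7, 5, 0]
    }
}
```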
2.3 Access flags
After the constant pool, the next two bytes hold the access flags (access_flags), which identify access information at the class or interface level: whether this is a class or an interface, whether it is declared public, whether it is declared abstract, and, if it is a class, whether it is declared final. The flags and their meanings are shown in the following table:
Table 5. Class access flags
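Since access_flags is a bit mask, decoding it comes down to testing each bit. The following sketch assumes the class-level flag values defined in the Java Virtual Machine Specification (they are not listed in this text), and ClassAccessFlags is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder; mask values are the class-level flags from the JVM specification.
final class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Returns the names of all flags whose bit is set in the u2 access_flags value. */
    static List<String> describe(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
        System.out.println(describe(0x0601)); // [ACC_PUBLIC, ACC_INTERFACE, ACC_ABSTRACT]
    }
}
```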
2.4 Class index, superclass index, and interface index set
The class index (this_class), superclass index (super_class), and interface index set (interfaces) follow the access flags in that order. this_class and super_class are each a single u2 index value, while interfaces is a set of u2 values. Each index points to a class descriptor constant of type CONSTANT_Class_info; the index value stored in that CONSTANT_Class_info constant in turn locates the fully qualified name string held in a constant of type CONSTANT_Utf8_info. Together, these three data items determine the inheritance relationship of the class.
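The two-step lookup (class index, then CONSTANT_Class_info, then CONSTANT_Utf8_info) can be sketched as follows. The two arrays are a deliberately simplified stand-in for a parsed constant pool and are an assumption of this example, as is the class name ClassNameResolver. The special case for index 0 follows the Java Virtual Machine Specification, where super_class is 0 only for java.lang.Object.

```java
// Hypothetical resolver over a simplified, already-parsed constant pool.
final class ClassNameResolver {
    /**
     * utf8[i] is the string of the CONSTANT_Utf8_info at index i (or null);
     * classNameIndex[i] is the name_index of the CONSTANT_Class_info at index i (or 0).
     */
    static String resolve(String[] utf8, int[] classNameIndex, int classIndex) {
        if (classIndex == 0) {
            return null;                              // no class referenced
        }
        int nameIndex = classNameIndex[classIndex];   // step 1: Class_info -> name_index
        return utf8[nameIndex].replace('/', '.');     // step 2: Utf8_info -> readable name
    }

    public static void main(String[] args) {
        // #1 Utf8, #2 Class -> #1, #3 Utf8, #4 Class -> #3
        String[] utf8 = {null, "com/example/Demo", null, "java/lang/Object", null};
        int[] classNameIndex = {0, 0, 1, 0, 3};
        int thisClass = 2;
        int superClass = 4;
        System.out.println(resolve(utf8, classNameIndex, thisClass)
                + " extends " + resolve(utf8, classNameIndex, superClass));
        // com.example.Demo extends java.lang.Object
    }
}
```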
2.5 Field table set
The field table (field_info) is used to describe fields declared in interfaces or classes. Fields include class-level variables and instance-level variables, but do not include local variables declared within the method. The format of the field table is as follows:
Table 6. Field table structure
Field modifiers are stored in the access_flags item. Like the access_flags item of the class, it is a u2 value. The following table lists the flag bits that can be set and their meanings:
Table 7. Field access flags
The access_flags item is followed by two index values: name_index and descriptor_index. Both are references into the constant pool, giving the simple name of the field and the descriptor of the field respectively. Under the descriptor rules, each primitive type, and the void type that denotes no return value, is written as a single uppercase character, while an object type is written as the character L followed by the fully qualified name of the class and a terminating semicolon, for example Ljava/lang/String;. The identifier characters are shown in the following table:
Table 8. Descriptor identifier characters and their meanings
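To make the descriptor rules concrete, here is a small sketch that turns a field descriptor back into a readable Java type name. DescriptorDecoder is a hypothetical class. The character mapping, and the use of a leading [ for each array dimension, come from the Java Virtual Machine Specification; array descriptors are not discussed in the text above.

```java
// Hypothetical decoder for field descriptors; mapping follows the JVM specification.
final class DescriptorDecoder {
    /** Converts a descriptor such as "[Ljava/lang/String;" into "java.lang.String[]". */
    static String decode(String descriptor) {
        int dims = 0;
        while (descriptor.charAt(dims) == '[') {   // one '[' per array dimension
            dims++;
        }
        String base;
        switch (descriptor.charAt(dims)) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'V': base = "void"; break;
            case 'L':                              // L<fully qualified name>;
                base = descriptor.substring(dims + 1, descriptor.length() - 1).replace('/', '.');
                break;
            default:
                throw new IllegalArgumentException("bad descriptor: " + descriptor);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("I"));                  // int
        System.out.println(decode("Ljava/lang/String;")); // java.lang.String
        System.out.println(decode("[[J"));                // long[][]
    }
}
```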
2.6 Method table set
In the class file storage format, methods are described in almost exactly the same way as fields. The method table has the same structure as the field table: access flags (access_flags), name index (name_index), descriptor index (descriptor_index), and attribute table set (attributes), in that order. The structure of the method table is shown in the following table:
Table 9. Method table structure
For the method table, all flags and their values are listed in the following table:
Table 10. Method access flags
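Because field and method entries share one layout, a single reader can handle both. The sketch below is an illustration under assumptions: MemberInfo is a hypothetical class, and it assumes each attribute consists of a u2 name index, a u4 length, and that many bytes of data, which it skips without interpreting. The length is read as a signed int, which is adequate for a sketch but would need care for lengths above 2^31 - 1.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for one field_info or method_info entry.
final class MemberInfo {
    final int accessFlags;
    final int nameIndex;
    final int descriptorIndex;
    final int attributesCount;

    private MemberInfo(int accessFlags, int nameIndex, int descriptorIndex, int attributesCount) {
        this.accessFlags = accessFlags;
        this.nameIndex = nameIndex;
        this.descriptorIndex = descriptorIndex;
        this.attributesCount = attributesCount;
    }

    /** Reads the four u2 header items, then skips every attribute by its declared length. */
    static MemberInfo read(ByteBuffer buf) {
        int accessFlags = buf.getShort() & 0xFFFF;
        int nameIndex = buf.getShort() & 0xFFFF;
        int descriptorIndex = buf.getShort() & 0xFFFF;
        int attributesCount = buf.getShort() & 0xFFFF;
        for (int i = 0; i < attributesCount; i++) {
            buf.getShort();                          // attribute_name_index (ignored here)
            int length = buf.getInt();               // attribute_length (u4)
            buf.position(buf.position() + length);   // skip the attribute value
        }
        return new MemberInfo(accessFlags, nameIndex, descriptorIndex, attributesCount);
    }

    public static void main(String[] args) {
        byte[] entry = {0, 9, 0, 5, 0, 6, 0, 1,   // flags 0x0009, name #5, descriptor #6, 1 attribute
                        0, 7, 0, 0, 0, 2, 0, 8};  // attribute: name #7, length 2, two data bytes
        MemberInfo m = read(ByteBuffer.wrap(entry));
        System.out.println(m.nameIndex + " " + m.descriptorIndex + " " + m.attributesCount); // 5 6 1
    }
}
```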
2.7 Attribute table set
The class file, the field table, and the method table can each carry their own attribute table set to describe information specific to certain scenarios. Attribute tables are not required to appear in any strict order, and custom attributes may be added as long as their names do not duplicate an existing attribute name. The following table lists some of the attributes predefined by the Java virtual machine:
Table 11. Attributes predefined by the virtual machine
For each attribute, the name must be a reference to a constant of type CONSTANT_Utf8_info in the constant pool. The format of the attribute value is entirely up to the attribute; it only needs a u4 length field stating the number of bytes the attribute value occupies. Every attribute table must conform to the structure shown in the following table:
Table 12. Attribute table structure
2.7.1 Code attribute
After compilation, the body of a Java method is stored as bytecode instructions in the Code attribute. The Code attribute appears in the attribute set of the method table, except for methods without a body, such as abstract methods in abstract classes or interfaces. The structure of the Code attribute is as follows:
Table 13. Code attribute structure
- attribute_name_index: An index pointing to a CONSTANT_Utf8_info constant. The constant value is fixed to "Code", which is the name of this attribute.
- attribute_length: Length of the attribute value. Because attribute_name_index and attribute_length take 6 bytes in total, the length of the attribute value = the length of the whole attribute table - 6.
- max_stack: The maximum depth of the operand stack. The operand stack never exceeds this depth at any point during method execution. At run time, the virtual machine allocates the operand stack depth in the stack frame according to this value.
- max_locals: The storage space required by the local variable table. The unit is the slot.
- code_length: Length of the bytecode, in bytes.
- code: The bytecode instructions generated by compilation.
- exception_table: Each entry contains four fields (start_pc, end_pc, handler_pc, and catch_type). They mean: if an exception of type catch_type or one of its subclasses (catch_type is an index pointing to a CONSTANT_Class_info constant) occurs while the bytecode between offset start_pc (inclusive) and offset end_pc (exclusive), the try range, is executing, control transfers to offset handler_pc for handling.
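The exception table lookup can be sketched as a simple scan. This is a simplified illustration, not how any particular virtual machine implements it: ExceptionTableLookup is a hypothetical class, each row is modeled as an int array, and subclass matching is replaced by an exact comparison of constant pool indexes because no class hierarchy is available here. Treating catch_type 0 as "catch everything" follows the Java Virtual Machine Specification.

```java
// Hypothetical lookup; each row is {start_pc, end_pc, handler_pc, catch_type}.
final class ExceptionTableLookup {
    /**
     * Returns the handler_pc of the first matching row, or -1 if none matches.
     * Simplification: a row matches only if catch_type equals thrownClassIndex
     * exactly, or if catch_type is 0 (which matches any exception).
     */
    static int findHandler(int[][] table, int pc, int thrownClassIndex) {
        for (int[] row : table) {
            boolean inTryRange = pc >= row[0] && pc < row[1];   // end_pc is exclusive
            boolean typeMatches = row[3] == 0 || row[3] == thrownClassIndex;
            if (inTryRange && typeMatches) {
                return row[2];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[][] table = {
            {0, 10, 20, 5},   // catches the class at constant pool index 5
            {0, 16, 30, 0}    // catches anything
        };
        System.out.println(findHandler(table, 4, 5));   // 20
        System.out.println(findHandler(table, 4, 9));   // 30
        System.out.println(findHandler(table, 16, 5));  // -1
    }
}
```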
2.7.2 Exceptions attribute
The Exceptions attribute is an attribute at the same level as the Code attribute in the method table; it should not be confused with the exception table inside the Code attribute. Its structure is shown in the following table:
Table 14. Exceptions attribute structure
number_of_exceptions indicates that the method may throw number_of_exceptions kinds of checked exceptions. Each checked exception is represented by one exception_index_table item, which is an index pointing to a CONSTANT_Class_info constant in the constant pool and identifies the type of the checked exception.
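A minimal sketch of reading this attribute's value, assuming the buffer is already positioned just after attribute_name_index and attribute_length. ExceptionsAttribute is a hypothetical class name used only for this example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical reader for the value part of an Exceptions attribute.
final class ExceptionsAttribute {
    /** Reads number_of_exceptions, then that many u2 constant pool indexes. */
    static int[] readExceptionIndexTable(ByteBuffer buf) {
        int numberOfExceptions = buf.getShort() & 0xFFFF;
        int[] indexes = new int[numberOfExceptions];
        for (int i = 0; i < numberOfExceptions; i++) {
            indexes[i] = buf.getShort() & 0xFFFF;   // index of a CONSTANT_Class_info
        }
        return indexes;
    }

    public static void main(String[] args) {
        byte[] value = {0, 2, 0, 10, 0, 12};        // two checked exceptions: #10 and #12
        System.out.println(Arrays.toString(readExceptionIndexTable(ByteBuffer.wrap(value)))); // [10, 12]
    }
}
```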
2.7.3 LineNumberTable attribute
The LineNumberTable attribute describes the correspondence between Java source line numbers and bytecode line numbers (bytecode offsets). It is not required at run time, but it is generated into the class file by default. In javac, the -g:none and -g:lines options suppress or request generation of this information respectively. If the LineNumberTable attribute is not generated, stack traces will not show the line number of the error when the program throws an exception, and breakpoints cannot be set by source line when debugging. Its structure is shown in the following table:
Table 15. LineNumberTable attribute structure
line_number_table is a collection of line_number_table_length entries of type line_number_info. The line_number_info table contains two u2 data items, start_pc and line_number: the former is the bytecode line number (offset) and the latter is the Java source line number.
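One way to use this table, for example when building a stack trace, is to find the entry with the largest start_pc that does not exceed the current bytecode offset. The sketch below shows that idea; LineNumberLookup is a hypothetical class and the rows are modeled as int pairs. It does not assume the entries are sorted.

```java
// Hypothetical lookup; each row is {start_pc, line_number}.
final class LineNumberLookup {
    /** Returns the source line for the bytecode offset pc, or -1 if no entry covers it. */
    static int lineFor(int[][] lineNumberTable, int pc) {
        int bestStart = -1;
        int line = -1;
        for (int[] row : lineNumberTable) {
            // Pick the entry that starts closest to pc without starting after it.
            if (row[0] <= pc && row[0] > bestStart) {
                bestStart = row[0];
                line = row[1];
            }
        }
        return line;
    }

    public static void main(String[] args) {
        int[][] table = {{0, 3}, {8, 4}, {15, 6}};
        System.out.println(lineFor(table, 0));   // 3
        System.out.println(lineFor(table, 10));  // 4
        System.out.println(lineFor(table, 20));  // 6
    }
}
```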
2.7.4 LocalVariableTable attribute
The LocalVariableTable attribute describes the relationship between the variables in the local variable table of the stack frame and the variables defined in the Java source code. It is not required at run time. In javac, the -g:none and -g:vars options suppress or request generation of this information respectively; note that javac itself emits only line number and source file information unless -g or -g:vars is given. If this information is not generated, all parameter names are lost when others reference the method: an IDE will substitute placeholders such as arg1 and arg2 for the original parameter names. This does not affect how the program runs, but it makes writing code inconvenient, and during debugging the debugger cannot obtain parameter values from the context by parameter name. Its structure is shown in the following table:
Table 16. LocalVariableTable attribute structure
The local_variable_info item represents the association between one local variable in the stack frame and a local variable in the source code. Its structure is as follows:
Table 17. local_variable_info structure
- start_pc: The bytecode offset at which the lifetime of the local variable begins.
- length: The length of the bytecode range that the local variable's scope covers. Together, start_pc and length give the scope of the local variable in the bytecode.
- name_index: An index pointing to a CONSTANT_Utf8_info constant in the constant pool, representing the name of the local variable.
- descriptor_index: An index pointing to a CONSTANT_Utf8_info constant in the constant pool, representing the descriptor of the local variable.
- index: The position of the local variable's slot in the local variable table of the stack frame.
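The fields above are enough to answer the question a debugger asks: which source variable lives in a given slot at a given bytecode offset? The following sketch assumes the names have already been resolved from name_index into strings; LocalVariableLookup and its nested Entry class are hypothetical names. The scope test uses the half-open range from start_pc up to, but not including, start_pc + length, as defined in the Java Virtual Machine Specification.

```java
// Hypothetical lookup over already-resolved local_variable_info entries.
final class LocalVariableLookup {
    static final class Entry {
        final int startPc;
        final int length;
        final int index;     // slot in the local variable table
        final String name;   // resolved from name_index (assumed done elsewhere)

        Entry(int startPc, int length, int index, String name) {
            this.startPc = startPc;
            this.length = length;
            this.index = index;
            this.name = name;
        }
    }

    /** Returns the variable name occupying the slot at offset pc, or null if none is in scope. */
    static String nameAt(Entry[] table, int slot, int pc) {
        for (Entry e : table) {
            boolean inScope = pc >= e.startPc && pc < e.startPc + e.length;
            if (e.index == slot && inScope) {
                return e.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Entry[] table = {
            new Entry(0, 20, 0, "this"),
            new Entry(2, 10, 1, "i"),    // slot 1 holds i for offsets 2..11
            new Entry(12, 8, 1, "s")     // slot 1 is reused by s for offsets 12..19
        };
        System.out.println(nameAt(table, 1, 5));   // i
        System.out.println(nameAt(table, 1, 12));  // s
    }
}
```

The example also shows why the scope matters: a slot can be reused by different source variables at different points in the method.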
2.7.5 SourceFile attribute
The SourceFile attribute records the name of the source file from which the class file was generated. This attribute is also optional. In javac, the -g:none and -g:source options suppress or request generation of this information respectively. If it is not generated, stack traces will not show the file name of the failing code when an exception is thrown. This is a fixed-length attribute with the following structure:
Table 18. SourceFile attribute structure
The sourcefile_index data item is an index pointing to a CONSTANT_Utf8_info constant in the constant pool. The constant value is the name of the source file.
2.7.6 ConstantValue attribute
The ConstantValue attribute tells the virtual machine to assign a value to a static variable automatically. Only variables modified by the static keyword (class variables) can use this attribute.
2.7.7 InnerClasses attribute
The InnerClasses attribute records the relationships between inner classes and their host classes. If a class defines inner classes, the compiler generates an InnerClasses attribute for that class and for the inner classes it contains. The structure of this attribute is as follows:
Table 19. InnerClasses attribute structure
The data item number_of_classes gives the number of inner class records. Each inner class is described by one inner_classes_info table, whose structure is as follows:
Table 20. inner_classes_info structure
- inner_class_info_index: An index pointing to a CONSTANT_Class_info constant in the constant pool, representing the symbolic reference of the inner class.
- outer_class_info_index: An index pointing to a CONSTANT_Class_info constant in the constant pool, representing the symbolic reference of the host class.
- inner_name_index: An index pointing to a CONSTANT_Utf8_info constant in the constant pool, representing the name of the inner class. For an anonymous inner class, the value is 0.
- inner_class_access_flags: The access flags of the inner class. The possible values are shown in the following table:
Table 21. Inner class access flags
2.7.8 Deprecated and Synthetic attributes
The Deprecated and Synthetic attributes are both Boolean flag attributes: the only distinction is whether the attribute is present or absent, and there is no notion of an attribute value. The Deprecated attribute indicates that a class, field, or method is no longer recommended for use by its author; it can be set with the @Deprecated annotation in the code. The Synthetic attribute indicates that a field or method was not generated directly from Java source code but was added by the compiler. Both attributes have the following structure:
Table 22. Deprecated and Synthetic attribute structure
2.7.9 StackMapTable attribute
The StackMapTable attribute was added to the class file specification with the release of JDK 1.6. It is a complex variable-length attribute located in the attribute table of the Code attribute. It is used by the type checker during the bytecode verification phase of class loading; its purpose is to replace the earlier, more expensive type-inference verifier based on data flow analysis. The StackMapTable attribute contains zero or more stack map frames. Each stack map frame explicitly or implicitly represents a bytecode offset and gives the verification types of the local variable table and the operand stack when execution reaches that bytecode. The type checker examines the types required of the target method's local variables and operand stack to determine whether the instructions satisfy the logical constraints. Its structure is as follows:
Table 23. StackMapTable attribute structure
2.7.10 Signature attribute
The Signature attribute was added to the class file specification with the release of JDK 1.5. It is an optional fixed-length attribute that can appear in the attribute tables of the class, field table, and method table structures. If the generic signature of any class, interface, initialization method, or member contains a type variable or a parameterized type, the Signature attribute records the generic signature information for it. The structure of the Signature attribute is as follows:
Table 24. Signature attribute structure
- signature_index: Must be a valid index into the constant pool. The constant pool entry at this index must be a CONSTANT_Utf8_info structure representing a class signature, a method type signature, or a field type signature. If the Signature attribute belongs to the class file, it is a class signature; if it belongs to a method table, a method type signature; if it belongs to a field table, a field type signature.
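The effect of the Signature attribute can be observed from Java code through reflection. In the sketch below, SignatureDemo is a hypothetical class: Field.getType() reports the erased type that the field descriptor carries, while Field.getGenericType() reports the parameterized type, which, as far as the class file is concerned, can only come from the generic signature information recorded for the field.

```java
import java.lang.reflect.Field;
import java.util.List;

// Hypothetical demo class; the field below has a descriptor and a generic signature.
final class SignatureDemo {
    // Descriptor: Ljava/util/List;   generic signature: Ljava/util/List<Ljava/lang/String;>;
    List<String> names;

    /** Erased type of the named field, as described by its descriptor. */
    static String erasedTypeOf(String fieldName) {
        return field(fieldName).getType().getName();
    }

    /** Generic type of the named field, including type arguments. */
    static String genericTypeOf(String fieldName) {
        return field(fieldName).getGenericType().getTypeName();
    }

    private static Field field(String fieldName) {
        try {
            return SignatureDemo.class.getDeclaredField(fieldName);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(erasedTypeOf("names"));   // java.util.List
        System.out.println(genericTypeOf("names"));  // java.util.List<java.lang.String>
    }
}
```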
2.7.11 BootstrapMethods attribute
The BootstrapMethods attribute was added to the class file specification with the release of JDK 1.7. It is a complex variable-length attribute located in the attribute table of the class file. It stores the bootstrap method specifiers referenced by invokedynamic instructions. If a constant of type CONSTANT_InvokeDynamic_info appears in the constant pool of a class file, the attribute table of that class file must contain exactly one BootstrapMethods attribute. In addition, even if CONSTANT_InvokeDynamic_info constants appear multiple times in the constant pool, the attribute table of the class file can contain at most one BootstrapMethods attribute. The structure of the BootstrapMethods attribute is as follows:
Table 25. BootstrapMethods attribute structure
The structure of the referenced bootstrap_method is as follows:
Table 26. bootstrap_method structure