Anyone learning Java knows that the platform has flown the banner of platform independence from the very beginning: "write once, run anywhere". Besides platform independence, the Java platform has a second kind of independence, language independence, and the class file structure (the bytecode) is what makes it possible. From the start Java has had two separate specifications: the Java Language Specification and the Java Virtual Machine Specification. The Java Language Specification only defines the constraints and rules of the Java language, while the Java Virtual Machine Specification is the one truly designed from a cross-platform perspective. In this article we use a practical example to see what the bytes of a Java class file look like. We first describe in general terms what a class file is composed of, and then analyze the class file of an actual Java class.
Before we proceed, we need to make clear the following points first:
1) A class file is a stream of 8-bit bytes arranged in strict order, with no gaps between data items. Data items larger than 8 bits are stored in big-endian order, that is, the high-order byte at the lower address and the low-order byte at the higher address. This is also key to the class file being cross-platform: processors of the PowerPC architecture use big-endian byte order while x86 processors use little-endian, so the virtual machine specification has to fix one byte order to keep class files identical on every processor architecture.
2) The class file stores data in a C-like struct. There are only two kinds of data items: unsigned numbers and tables. Unsigned numbers describe numbers, index references and strings; u1, u2, u4 and u8 denote unsigned numbers of 1, 2, 4 and 8 bytes respectively. A table is a composite structure made up of several unsigned numbers and other tables. If unsigned numbers and tables are not entirely clear yet, don't worry; the example later in this article makes them concrete.
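To make the byte order in point 1 concrete, here is a minimal sketch in Java that assembles u2 and u4 values by hand from a byte array. The names BigEndianReader, readU2 and readU4 are hypothetical helpers chosen for illustration; they are not part of any JDK API.

```java
// Hypothetical helper: assembles big-endian unsigned values by hand.
final class BigEndianReader {

    // Reads an unsigned 2-byte value (u2): high-order byte first.
    static int readU2(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    // Reads an unsigned 4-byte value (u4); returned as long so it never turns negative.
    static long readU4(byte[] data, int offset) {
        return ((long) readU2(data, offset) << 16) | readU2(data, offset + 2);
    }
}
```

The `& 0xFF` masks matter because Java's byte type is signed; without them a byte such as 0xCA would be sign-extended and corrupt the result.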
With these two points settled, let's look at exactly what data a class file contains, in the strict order of the byte stream:
(From the Java Virtual Machine Specification, Java SE 7 Edition)
One thing to note when reading it: cp_info denotes the constant pool, and cp_info constant_pool[constant_pool_count-1] describes a pool holding constant_pool_count-1 constants. Array notation is used here, but do not assume that every constant in the pool has the same length. The array notation is only a descriptive convenience; unlike an int array in a programming language, where every int has the same length, the entries here vary in size. With that in mind, let's go through what each item means.
1) u4 magic is the magic number and occupies 4 bytes. What is it for? It identifies the file type: this is a class file, not a JPG image or an AVI movie. The magic number of a class file is 0xCAFEBABE.
2) u2 minor_version is the minor version number of the class file, an unsigned number of type u2.
3) u2 major_version is the major version number of the class file, also an unsigned number of type u2. major_version and minor_version together tell a virtual machine whether it can accept this class file. Different versions of the Java compiler produce class files with different version numbers. A newer virtual machine supports class files produced by an older compiler, but not the other way around. For example, the virtual machine of Java SE 6.0 supports class files compiled by the Java SE 5.0 compiler, while a Java SE 5.0 virtual machine rejects class files compiled for Java SE 6.0.
4) u2 constant_pool_count gives the number of entries in the constant pool. First, what is the constant pool? Do not confuse it with the runtime constant pool of the JVM memory model. The constant pool of a class file mainly stores literals and symbolic references. Literals are mainly strings, the values of final constants, the initial values of fields, and so on. Symbolic references are mainly the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods. Names are easy to understand; descriptors are explained later together with the field table and the method table. As for the JVM memory model, it consists of the heap, the stack, the method area and the program counter, and inside the method area there is a region called the runtime constant pool. The runtime constant pool holds the literals and symbolic references generated by the compiler, but it is dynamic: other constants can be added to it at run time, the most typical example being the intern method of String.
5) cp_info is the constant pool, which holds the various literals and symbolic references. The Java Virtual Machine Specification, Java SE 7 Edition defines 14 kinds of constants for the pool. Each constant is a table, and every constant starts with a common tag field that identifies what type of constant it is.
The 11 kinds below are described only briefly here; the details are filled in by the example later on. (The remaining three, CONSTANT_MethodHandle_info, CONSTANT_MethodType_info and CONSTANT_InvokeDynamic_info, were added in Java SE 7 and are not covered in this article.)
- CONSTANT_Utf8_info: tag 1, UTF-8 encoded string
- CONSTANT_Integer_info: tag 3, integer literal
- CONSTANT_Float_info: tag 4, floating-point literal
- CONSTANT_Long_info: tag 5, long integer literal
- CONSTANT_Double_info: tag 6, double-precision floating-point literal
- CONSTANT_Class_info: tag 7, symbolic reference to a class or interface
- CONSTANT_String_info: tag 8, string literal
- CONSTANT_Fieldref_info: tag 9, symbolic reference to a field
- CONSTANT_Methodref_info: tag 10, symbolic reference to a method of a class
- CONSTANT_InterfaceMethodref_info: tag 11, symbolic reference to a method of an interface
- CONSTANT_NameAndType_info: tag 12, partial symbolic reference to a field or method (its name and descriptor)
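Because every constant starts with its tag, a reader can decide from that one byte how many more bytes belong to the entry. The sketch below maps the tags listed above to their names and to the size of the data that follows the tag. The class ConstantTag and its methods are hypothetical names used only for illustration, and the three Java SE 7 additions are deliberately left out.

```java
// Hypothetical lookup for the 11 constant kinds listed above.
final class ConstantTag {

    static String name(int tag) {
        switch (tag) {
            case 1:  return "CONSTANT_Utf8_info";
            case 3:  return "CONSTANT_Integer_info";
            case 4:  return "CONSTANT_Float_info";
            case 5:  return "CONSTANT_Long_info";
            case 6:  return "CONSTANT_Double_info";
            case 7:  return "CONSTANT_Class_info";
            case 8:  return "CONSTANT_String_info";
            case 9:  return "CONSTANT_Fieldref_info";
            case 10: return "CONSTANT_Methodref_info";
            case 11: return "CONSTANT_InterfaceMethodref_info";
            case 12: return "CONSTANT_NameAndType_info";
            default: throw new IllegalArgumentException("tag not covered here: " + tag);
        }
    }

    // Number of bytes after the 1-byte tag, or -1 when the size is variable.
    static int payloadSize(int tag) {
        switch (tag) {
            case 1:  return -1;            // Utf8: u2 length, then that many bytes
            case 3: case 4: return 4;      // Integer, Float: u4 bytes
            case 5: case 6: return 8;      // Long, Double: u4 high_bytes + u4 low_bytes
            case 7: case 8: return 2;      // Class, String: one u2 index
            case 9: case 10: case 11: case 12:
                return 4;                  // two u2 indexes
            default: throw new IllegalArgumentException("tag not covered here: " + tag);
        }
    }
}
```

For example, `ConstantTag.payloadSize(10)` is 4, so a CONSTANT_Methodref_info entry occupies 5 bytes in total including its tag. This is also why the constant pool cannot be treated as an array of equal-sized elements.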
6) u2 access_flags holds the access information of the class or interface, as shown in:
7) u2 this_class is the constant pool index of this class; it points to a CONSTANT_Class_info constant in the constant pool.
8) u2 super_class is the index of the superclass; it also points to a CONSTANT_Class_info constant in the constant pool.
9) u2 interfaces_count is the number of interfaces.
10) u2 interfaces[interfaces_count] is the interface table; each entry points to a CONSTANT_Class_info constant in the constant pool.
11) u2 fields_count is the number of instance variables and class variables of the class.
12) field_info fields[fields_count] is the field table, whose structure is as follows:
access_flags is the access flag of the field, for example public, private or protected. name_index is the field name and points to a constant of type CONSTANT_Utf8_info in the constant pool. descriptor_index is the descriptor of the field and also points to a CONSTANT_Utf8_info constant. attributes_count is the number of attribute tables in the field table. The attribute table is an extensible structure used to describe properties of fields, methods and classes, and different versions of the Java virtual machine support different sets of attributes.
13) u2 methods_count is the number of entries in the method table.
14) method_info is the method table, whose structure is as follows:
access_flags is the access flag of the method, name_index is the index of its name, and descriptor_index is the method descriptor. attributes_count and attribute_info are similar to those of the field table, except that the attributes allowed in the two tables differ: the method table, for example, has a Code attribute that holds the code of the method, while the field table does not. Exactly which attributes a class file can contain is covered later, together with the attribute table.
15) attributes_count is the number of attribute tables. About the attribute table, the following points need to be clear:
- Attribute tables appear at the end of the class file structure, in the field table, in the method table, and inside the Code attribute, which means an attribute table can itself contain attribute tables.
- The length of an attribute table is not fixed; different attributes have different lengths.
Now that we have covered every item in the class file structure, let's work through a practical example.
    package com.ejushang.TestClass;

    public class TestClass implements Super {
        private static final int staticVar = 0;
        private int instanceVar = 0;

        public int instanceMethod(int param) {
            return param + 1;
        }
    }

    interface Super { }
The binary structure of the TestClass.class file produced by compiling TestClass.java with javac from JDK 1.6.0_37 is shown in the following:
Below we parse this byte stream according to the class file structure described above.
1) Magic number
From the class file structure we know that the first 4 bytes are the magic number. The content at addresses 00000000h-00000003h is 0xCAFEBABE, the magic number of class files.
2) Major and minor version numbers
The next 4 bytes are the minor and major version numbers. Addresses 00000004h-00000005h hold 0x0000, so minor_version is 0x0000; addresses 00000006h-00000007h hold 0x0032, so major_version is 0x0032 (decimal 50). This is exactly the version produced by the JDK 1.6.0 compiler when no -target option is given.
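The first eight bytes analyzed so far can be checked with a few lines of Java. This is a minimal sketch using java.nio.ByteBuffer, which reads in big-endian order by default; the class ClassHeader and the method acceptedBy are hypothetical names for illustration, and the version check is deliberately simplified (it ignores the minor version and the lower bound of the range a virtual machine supports).

```java
import java.nio.ByteBuffer;

// Hypothetical holder for the magic number check and the two version fields.
final class ClassHeader {
    final int minorVersion;
    final int majorVersion;

    private ClassHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    static ClassHeader parse(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes); // big-endian by default
        if (in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = in.getShort() & 0xFFFF; // u2 minor_version
        int major = in.getShort() & 0xFFFF; // u2 major_version
        return new ClassHeader(minor, major);
    }

    // Simplified assumption: a VM accepts every major version up to its own.
    boolean acceptedBy(int highestSupportedMajor) {
        return majorVersion <= highestSupportedMajor;
    }
}
```

Feeding it the bytes CA FE BA BE 00 00 00 32 yields minor version 0 and major version 50. Since major version 50 corresponds to Java SE 6 and 49 to Java SE 5.0, `acceptedBy(49)` is false for this file, which matches the one-way compatibility rule described earlier.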
3) Constant pool count
The next 2 bytes, at addresses 00000008h-00000009h, give constant_pool_count: 0x0018, decimal 24. Note, however, that the actual number of constants is constant_pool_count-1, here 23. One is subtracted because index 0 is reserved: it means that a data item in the class file references no constant in the pool.
4) Constant pool
As mentioned above, the constant pool contains constants of different types. Let's look at the first constant of TestClass.class. Every constant starts with a u1 tag that identifies its type. The byte at 0000000Ah is 0x0A, decimal 10, and from the list of constant types above, tag 10 is CONSTANT_Methodref_info. The structure of CONSTANT_Methodref_info is as shown:
class_index points to a constant of type CONSTANT_Class_info in the constant pool. In the binary of TestClass, class_index is 0x0004 (addresses 0000000Bh-0000000Ch), so it points to the 4th constant.
name_and_type_index points to a constant of type CONSTANT_NameAndType_info in the constant pool. Its value is 0x0013, so it points to the 19th constant in the pool.
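The five bytes just discussed (0A 00 04 00 13) can be decoded as follows. This is only a sketch of how one CONSTANT_Methodref_info entry might be read; MethodrefEntry is a hypothetical class name, not something the JDK provides.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for a single CONSTANT_Methodref_info entry (tag 10).
final class MethodrefEntry {
    final int classIndex;
    final int nameAndTypeIndex;

    private MethodrefEntry(int classIndex, int nameAndTypeIndex) {
        this.classIndex = classIndex;
        this.nameAndTypeIndex = nameAndTypeIndex;
    }

    static MethodrefEntry parse(byte[] data, int offset) {
        // The entry is 5 bytes long: u1 tag, u2 class_index, u2 name_and_type_index.
        ByteBuffer in = ByteBuffer.wrap(data, offset, 5);
        int tag = in.get() & 0xFF;
        if (tag != 10) {
            throw new IllegalArgumentException("not a Methodref entry, tag = " + tag);
        }
        int classIndex = in.getShort() & 0xFFFF;
        int nameAndTypeIndex = in.getShort() & 0xFFFF;
        return new MethodrefEntry(classIndex, nameAndTypeIndex);
    }
}
```

Parsing the bytes from the example gives classIndex 4 and nameAndTypeIndex 19, the same values read off the hex dump by hand.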
All the other constants in the pool can be found in the same way. However, the JDK provides a handy tool for viewing the contents of the constant pool: running javap -verbose TestClass prints every constant, as follows:
As the output clearly shows, constant_pool_count for TestClass is 24: the pool holds 23 constants, numbered 1 to 23, and index 0 is reserved to mean that a data item in the class file references no constant in the pool. From the analysis above we know that the first constant of TestClass represents a method: its class_index points to the 4th constant, java/lang/Object, and its name_and_type_index points to the 19th constant, whose value is <init>:()V. So this constant represents an instance constructor, the <init> method of java/lang/Object, which the constructor generated by the Java compiler for TestClass invokes. The other constants of the pool can be analyzed in the same way. With the constant pool done, we move on to access_flags.
5) u2 access_flags holds the access information of the class or interface: whether it is a class or an interface, whether it is public, final, and so on. The meaning of each flag was given earlier, so let's look at the access flags of TestClass. They occupy addresses 0000010Dh-0000010Eh and have the value 0x0021. From the flags described above, 0x0021 = 0x0001 | 0x0020, so ACC_PUBLIC and ACC_SUPER are set. ACC_PUBLIC is self-explanatory; ACC_SUPER is the flag carried by every class compiled with a compiler from JDK 1.0.2 or later.
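The decomposition 0x0021 = 0x0001 | 0x0020 is a plain bit-mask test, sketched below. The mask values are the class-level flags defined in the Java Virtual Machine Specification, Java SE 7 Edition; the class name ClassAccessFlags and the method decode are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for the class-level access_flags field.
final class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    // Returns the names of all flags whose bit is set in accessFlags.
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<String>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }
}
```

Calling `ClassAccessFlags.decode(0x0021)` returns a list containing ACC_PUBLIC and ACC_SUPER, in that order.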
6) u2 this_class is the index of this class and is used to determine its fully qualified name. The class index is shown in:
As can be seen, the class index is 0x0003, the 3rd constant in the constant pool. From the javap output we know that the 3rd constant is of type CONSTANT_Class_info, and through it we find the fully qualified name of the class: com/ejushang/TestClass/TestClass
7) u2 super_class is the index of the parent class of the current class; it points to a constant of type CONSTANT_Class_info in the constant pool. The parent class index is as shown; its value is 0x0004, and looking up the 4th constant of the pool gives the fully qualified name of the parent class of TestClass: java/lang/Object
8) interfaces_count and interfaces[interfaces_count] give the number of interfaces and each individual interface. The interface count and the interface of TestClass are as shown: 0x0001 means the number of interfaces is 1, and 0x0005 is the constant pool index of the interface. The 5th constant of the pool is of type CONSTANT_Class_info and its value is: com/ejushang/TestClass/Super
9) fields_count and field_info: fields_count is the number of field_info tables in the class, and field_info describes the instance variables and class variables of the class. Note that field_info does not include fields inherited from the parent class. The structure of field_info is as shown:
access_flags is the access flag of the field, such as public, private, protected, static and final. The values of access_flags are as follows:
name_index and descriptor_index are both constant pool indexes, giving the name of the field and the descriptor of the field respectively. The field name is easy to understand, but what about the field descriptor? The JVM specification defines field descriptors as follows:
Note the last row, where [ denotes one array dimension: the descriptor of String[][] is [[Ljava/lang/String; and the descriptor of int[][] is [[I. The attributes_count and attribute_info that follow are the number of attribute tables and the attribute tables themselves. Let's now look at the field table of the TestClass example above.
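The descriptor rules above can be turned into a small converter from a field descriptor to a readable Java type. This is a minimal sketch that assumes a well-formed descriptor and does no validation; FieldDescriptor and toJavaType are hypothetical names.

```java
// Hypothetical converter from a JVM field descriptor to a Java type name.
final class FieldDescriptor {

    static String toJavaType(String descriptor) {
        // Each leading '[' adds one array dimension.
        int dims = 0;
        while (descriptor.charAt(dims) == '[') {
            dims++;
        }
        String base;
        switch (descriptor.charAt(dims)) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'L':
                // Object type: strip the leading 'L' and the trailing ';'.
                base = descriptor.substring(dims + 1, descriptor.length() - 1).replace('/', '.');
                break;
            default:
                throw new IllegalArgumentException("unexpected descriptor: " + descriptor);
        }
        StringBuilder type = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            type.append("[]");
        }
        return type.toString();
    }
}
```

With this sketch, `toJavaType("[[Ljava/lang/String;")` gives `java.lang.String[][]` and `toJavaType("[[I")` gives `int[][]`. Note that long is J and boolean is Z, since L and B are already taken by object types and byte.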
First, the number of fields of TestClass, as shown:
TestClass has two fields, and the source code of TestClass confirms there are indeed only two. Let's look at the first field, which should be private static final int staticVar. Its binary representation in the class file is as follows:
0x001A is the access flag; from the access_flags table it stands for ACC_PRIVATE, ACC_STATIC and ACC_FINAL. 0x0006 and 0x0007 are the 6th and 7th constants in the constant pool; looking them up gives staticVar and I. staticVar is the field name and I is the field descriptor, which, by the descriptor rules above, describes a variable of type int. The following 0x0001 is the number of attribute tables in the field table of staticVar, so this field has 1 attribute. 0x0008 is the 8th constant in the constant pool, which tells us the attribute is a ConstantValue attribute. The format of the ConstantValue attribute is as follows:
attribute_name_index is the constant pool index of the attribute name, in this case ConstantValue. The attribute_length of ConstantValue is fixed at 2. constantvalue_index is a reference into the constant pool, in this case 0x0009; the 9th constant is of type CONSTANT_Integer_info and its value is 0.
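Putting the values of this walkthrough together, the staticVar entry is the 16 bytes 00 1A 00 06 00 07 00 01 00 08 00 00 00 02 00 09. The sketch below reads them back in order. It assumes, as in this example, a field with exactly one attribute and that the attribute is ConstantValue; a general reader would loop over attributes_count and look up each attribute name. StaticVarField is a hypothetical class name.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for a field_info entry that carries one ConstantValue attribute.
final class StaticVarField {
    final int accessFlags;
    final int nameIndex;
    final int descriptorIndex;
    final int attributesCount;
    final int attributeNameIndex;
    final long attributeLength;
    final int constantValueIndex;

    private StaticVarField(ByteBuffer in) {
        accessFlags = in.getShort() & 0xFFFF;         // u2 access_flags
        nameIndex = in.getShort() & 0xFFFF;           // u2 name_index
        descriptorIndex = in.getShort() & 0xFFFF;     // u2 descriptor_index
        attributesCount = in.getShort() & 0xFFFF;     // u2 attributes_count
        attributeNameIndex = in.getShort() & 0xFFFF;  // u2 attribute_name_index
        attributeLength = in.getInt() & 0xFFFFFFFFL;  // u4 attribute_length
        constantValueIndex = in.getShort() & 0xFFFF;  // u2 constantvalue_index
    }

    static StaticVarField parse(byte[] data) {
        return new StaticVarField(ByteBuffer.wrap(data));
    }

    // ACC_PRIVATE (0x0002) | ACC_STATIC (0x0008) | ACC_FINAL (0x0010) = 0x001A
    boolean isPrivateStaticFinal() {
        return (accessFlags & 0x001A) == 0x001A;
    }
}
```

Parsing the 16 bytes yields name index 6, descriptor index 7, one attribute named by constant 8 with length 2, and a constant value index of 9, matching the manual analysis.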
That completes private static final int staticVar = 0. Next comes private int instanceVar = 0 of TestClass; the binary representation of instanceVar is as shown:
0x0002 is the access flag ACC_PRIVATE. 0x000A is the field name index; it points to the 10th constant in the constant pool, which tells us the field name is instanceVar. 0x0007 is the field descriptor index; it points to the 7th constant, which is I, so instanceVar is of type int. The final 0x0000 means the number of attribute tables is 0.
10) methods_count and method_info: methods_count is the number of methods and method_info is the method table, whose structure is as shown:
The structure of method_info is very similar to that of field_info. All the access_flags of the method table and their values are as follows:
name_index and descriptor_index are the name and the descriptor of the method; both are indexes into the constant pool. The method descriptor needs some explanation. Its structure is (parameter list)return type; for example, the descriptor of public int instanceMethod(int param) is (I)I, a method with one int parameter whose return value is also int. After that come the attribute count and the attribute table. Both the method table and the field table have an attribute count and an attribute table, but the attributes they contain differ. Next we look at the binary representation of the method table in TestClass. First the number of methods, as follows:
The number of methods is 0x0002, so there are two methods. Let's analyze the first one, starting with its access_flags, name_index and descriptor_index, as follows:
access_flags is 0x0001, which from the description of the access_flags bits above is ACC_PUBLIC. name_index is 0x000B; the 11th constant in the constant pool shows that the method name is <init>. 0x000C is descriptor_index, the 12th constant in the constant pool, with the value ()V, which means that the <init> method has no parameters and no return value (V stands for void). This is in fact the instance constructor generated automatically by the compiler. The following 0x0001 means that the method table of <init> has 1 attribute, as follows:
0x000D corresponds to the constant Code in the constant pool, so this is the Code attribute of the method. In other words, the code of a method is stored in the Code attribute inside the attribute table of the method table. Next we analyze the Code attribute, whose structure is as follows:
attribute_name_index points to the constant in the constant pool whose value is Code, and attribute_length is the length of the Code attribute table (note that this length does not include the 6 bytes of attribute_name_index and attribute_length themselves).
max_stack is the maximum depth of the operand stack; at run time the virtual machine uses it to allocate the operand stack in the stack frame. max_locals is the storage space required by the local variable table.
The unit of max_locals is the slot, the smallest unit in which the virtual machine allocates memory for local variables. At run time, data types no wider than 32 bits, such as byte, char and int, occupy 1 slot, while the 64-bit types double and long need 2 slots. The value of max_locals is not simply the sum of the space needed by all local variables, because slots are reusable: once a local variable goes out of scope, the slot it occupied can be reused.
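The slot rules can be combined with the method descriptor format to estimate how many slots the arguments of a method need. The sketch below counts one slot for `this` in instance methods, two for long and double, and one for everything else. It only gives a lower bound for max_locals, since variables declared in the method body need slots as well and the compiler may reuse slots. LocalSlots and argumentSlots are hypothetical names.

```java
// Hypothetical helper: counts the local variable slots taken by a method's arguments.
final class LocalSlots {

    static int argumentSlots(String methodDescriptor, boolean isStatic) {
        int slots = isStatic ? 0 : 1; // slot 0 holds 'this' for instance methods
        int i = 1;                    // skip the opening '('
        while (methodDescriptor.charAt(i) != ')') {
            char c = methodDescriptor.charAt(i);
            if (c == '[') {
                // An array is a reference: one slot whatever its element type.
                while (methodDescriptor.charAt(i) == '[') {
                    i++;
                }
                if (methodDescriptor.charAt(i) == 'L') {
                    i = methodDescriptor.indexOf(';', i);
                }
                slots += 1;
            } else if (c == 'L') {
                i = methodDescriptor.indexOf(';', i); // object type ends at ';'
                slots += 1;
            } else if (c == 'J' || c == 'D') {
                slots += 2;                           // long and double are 64-bit
            } else {
                slots += 1;                           // byte, char, int and the like
            }
            i++;
        }
        return slots;
    }
}
```

For the example class, `argumentSlots("(I)I", false)` is 2 for instanceMethod (this plus param), and `argumentSlots("()V", false)` is 1 for the generated constructor.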
code_length is the length of the bytecode in bytes, and code holds the bytecode instructions themselves. The type of each element of code is u1, so an opcode takes a value of 0x00-0xFF, decimal 0-255; the current virtual machine specification defines more than 200 instructions.
exception_table_length and exception_table describe the exception handlers of the method.
attributes_count and attribute_info are the number of attributes inside the Code attribute and the attribute table itself. This shows how flexible the attribute table is in the class file structure: it can appear in the class file, the method table, the field table and the Code attribute.
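Since every part of the Code attribute has a known size, the number of bytes left over for its nested attributes follows from attribute_length by subtraction. The sketch below does that arithmetic; it relies on each exception_table entry being four u2 values (start_pc, end_pc, handler_pc, catch_type), 8 bytes in total. CodeAttributeLayout is a hypothetical name, and the numbers in the usage note are sample values.

```java
// Hypothetical helper: size bookkeeping for the body of a Code attribute.
final class CodeAttributeLayout {

    // Bytes of the Code attribute body that remain for nested attribute tables.
    static long nestedAttributeBytes(long attributeLength, long codeLength, int exceptionTableLength) {
        long fixedPart = 2                     // max_stack
                + 2                            // max_locals
                + 4                            // code_length
                + codeLength                   // code[code_length]
                + 2                            // exception_table_length
                + 8L * exceptionTableLength    // exception_table entries, 8 bytes each
                + 2;                           // attributes_count
        return attributeLength - fixedPart;
    }
}
```

For instance, with an attribute_length of 38, 10 bytes of bytecode and no exception handlers, `nestedAttributeBytes(38, 10, 0)` leaves 16 bytes for nested attributes such as LineNumberTable.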
Let's continue with the example. From the Code attribute of the <init> method above, attribute_length is 0x00000026, max_stack is 0x0002, max_locals is 0x0001 and code_length is 0x0000000A, so addresses 00000149h-00000152h hold the bytecode. After that, exception_table_length is 0x0000 and attributes_count is 0x0001. The value at 00000157h-00000158h is 0x000E, the constant pool index of the attribute name, and the constant pool shows that the 14th constant is LineNumberTable. LineNumberTable describes the correspondence between line numbers in the Java source code and offsets in the bytecode. It is not required at run time. If it is dropped with the -g:none compiler option, the main consequences are that stack traces cannot show the line number where an exception occurred and that breakpoints cannot be set by source line when debugging. The structure of LineNumberTable is as shown:
attribute_name_index, as mentioned above, is an index into the constant pool, and attribute_length is the length of the attribute. start_pc and line_number are the bytecode offset and the corresponding source line number. In this example, the byte stream of the LineNumberTable attribute is as follows:
That completes the first method of TestClass. The second method can be analyzed in the same way, as follows:
access_flags is 0x0001, name_index is 0x000F and descriptor_index is 0x0010. Looking these up in the constant pool shows that this is the method public int instanceMethod(int param). Using the same approach as above, the Code attribute of instanceMethod is as follows:
Finally, we analyze the attributes of the class file itself. Addresses 00000191h-00000199h hold the attribute table of the class file. 0x0011 is the attribute name index, and the constant pool shows that the attribute name is SourceFile. The structure of SourceFile is as shown:
attribute_length is the length of the attribute, and sourcefile_index points to the constant in the constant pool whose value is the name of the source file. In this example the SourceFile attribute is as follows:
attribute_length is 0x00000002, a length of 2 bytes, and sourcefile_index is 0x0012. The 18th constant of the constant pool shows that the name of the source file is TestClass.java.
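The SourceFile attribute is the simplest complete attribute in this file, so it makes a good closing illustration of the common attribute header. The sketch below reads the 8 bytes 00 11 00 00 00 02 00 12 described above and also shows the rule that an attribute occupies 6 bytes more than its attribute_length. SourceFileAttribute is a hypothetical class name.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for a SourceFile attribute.
final class SourceFileAttribute {
    final int attributeNameIndex;
    final long attributeLength;
    final int sourceFileIndex;

    private SourceFileAttribute(int attributeNameIndex, long attributeLength, int sourceFileIndex) {
        this.attributeNameIndex = attributeNameIndex;
        this.attributeLength = attributeLength;
        this.sourceFileIndex = sourceFileIndex;
    }

    static SourceFileAttribute parse(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        int nameIndex = in.getShort() & 0xFFFF;      // u2 attribute_name_index
        long length = in.getInt() & 0xFFFFFFFFL;     // u4 attribute_length
        int sourceFileIndex = in.getShort() & 0xFFFF; // u2 sourcefile_index
        return new SourceFileAttribute(nameIndex, length, sourceFileIndex);
    }

    // Total bytes occupied: the 6-byte header plus attribute_length.
    long totalSize() {
        return 6 + attributeLength;
    }
}
```

Parsing the example bytes gives name index 17 (SourceFile), length 2 and source file index 18 (TestClass.java), with a total size of 8 bytes.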