Anyone who has learned Java knows that the platform has flown the banner of platform independence from the very beginning: "write once, run anywhere". Beyond platform independence, the Java platform has a second kind of independence, language independence, and the class file structure (bytecode) is what makes it possible. From the beginning Java has had two specifications: the Java Language Specification and the Java Virtual Machine Specification. The Java Language Specification only prescribes the constraints and rules of the Java language, while the Virtual Machine Specification is the one that is really designed from a cross-platform perspective. Today we take a practical example to see what the bytes of a Java class file look like. This article first describes what a class file is composed of, and then uses a real Java class to analyze the class file structure.
Before continuing, we first need to make the following two points clear:
1) A class file is a stream of 8-bit bytes. The data items are laid out strictly in the specified order, with no gaps between them. Data larger than 8 bits is stored in big-endian order, that is, the high-order byte at the lower address and the low-order byte at the higher address. This is also key to the class file being cross-platform: the PowerPC architecture uses big-endian byte order while x86 processors use little-endian, so the Virtual Machine Specification must fix one byte order for class files to be stored identically on every processor architecture.
2) The class file structure uses a pseudo-structure similar to a C struct to store data. There are only two kinds of data items: unsigned numbers and tables. Unsigned numbers describe numbers, index references, and strings; u1, u2, u4, and u8 denote unsigned numbers of 1, 2, 4, and 8 bytes. A table is a composite structure made up of several unsigned numbers and other tables. If unsigned numbers and tables are not yet clear at this point, that is fine; I will explain them again with the examples below.
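To make the big-endian rule concrete, here is a minimal sketch of how u2 and u4 values can be assembled from raw bytes. The class name BigEndian and the helper names readU2 and readU4 are made up for this illustration; they are not part of any JDK API.

```java
final class BigEndian {
    // Combine two bytes, high-order byte first, into an unsigned 16-bit value (u2).
    static int readU2(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    // Combine four bytes, high-order byte first, into an unsigned 32-bit value (u4).
    // A long is returned because Java's int is signed.
    static long readU4(byte[] data, int offset) {
        return ((long) readU2(data, offset) << 16) | readU2(data, offset + 2);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x32};
        System.out.printf("u4 = 0x%08X%n", readU4(bytes, 0)); // u4 = 0xCAFEBABE
        System.out.printf("u2 = %d%n", readU2(bytes, 4));     // u2 = 50
    }
}
```

In practice java.io.DataInputStream and java.nio.ByteBuffer already read multi-byte values in big-endian order by default, so hand-written helpers like these are only needed for illustration.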
With these two points in mind, let's look at what data a class file contains, in its strictly ordered sequence of bytes:
(The figure above is from The Java Virtual Machine Specification, Java SE 7 Edition.)
One thing to note when reading the figure concerns cp_info. cp_info represents a constant pool entry, and constant_pool[constant_pool_count-1] in the figure says that the constant pool holds constant_pool_count-1 constants. It is written in array notation, but do not assume that all constants have the same length. The array notation is only a convenient way to describe the pool; it is not like an int array in a programming language, where every element has the same length. With that in mind, let's go back and see what each item in the figure means.
1) u4 magic is the magic number, which occupies 4 bytes. What is the magic number for? It identifies the file as a class file rather than, say, a JPG image or an AVI movie. The magic number of a class file is 0xCAFEBABE.
2) u2 minor_version is the minor version number of the class file, an unsigned number of type u2.
3) u2 major_version is the major version number of the class file, also an unsigned number of type u2. major_version and minor_version together determine whether a given virtual machine accepts the class file. Different versions of the Java compiler produce class files with different version numbers. A newer virtual machine supports class files compiled by older compilers, but not the other way around: for example, the virtual machine for Java SE 6.0 accepts class files produced by the Java SE 5.0 compiler.
4) u2 constant_pool_count gives the size of the constant pool. Here we need to focus on what the constant pool is, and it must not be confused with the run-time constant pool in the JVM memory model. The constant pool in a class file mainly stores literals and symbolic references. Literals mainly include strings, the values of final constants, and the initial values of fields. Symbolic references mainly include the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods. Names are easy to understand; descriptors are explained later, together with the field table and the method table. As for the run-time constant pool: the JVM memory model consists of the heap, the stacks, the method area, and the program counter, and the method area contains a region called the run-time constant pool. What it stores is exactly the literals and symbolic references generated by the compiler, but the run-time constant pool is dynamic: other constants can be added to it at run time, the most representative example being the intern method of String.
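As a small illustration of the intern behaviour mentioned above, the following sketch compares object identity before and after calling String.intern. The class name InternDemo and the helper sameObject are made up for this example; only String.intern itself is a real JDK method.

```java
final class InternDemo {
    // True when both references point to the very same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "class-file";            // literal loaded from the class file's constant pool
        String built = new String("class-file");  // a distinct object created at run time
        System.out.println(sameObject(literal, built));          // false
        System.out.println(sameObject(literal, built.intern())); // true
    }
}
```

The second comparison is true because intern returns the canonical pooled instance, which is the same object the literal refers to.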
5) cp_info represents the constant pool entries, which hold the literals and symbolic references mentioned above. The Java Virtual Machine Specification, Java SE 7 Edition defines 14 kinds of constants. Each one is a table, and each begins with a common tag field that indicates which type of constant it is.
They are described briefly below; the details are filled in later in the example.
CONSTANT_Utf8_info: tag 1, UTF-8 encoded string
CONSTANT_Integer_info: tag 3, integer literal
CONSTANT_Float_info: tag 4, floating-point literal
CONSTANT_Long_info: tag 5, long integer literal
CONSTANT_Double_info: tag 6, double-precision floating-point literal
CONSTANT_Class_info: tag 7, symbolic reference to a class or interface
CONSTANT_String_info: tag 8, literal of type String
CONSTANT_Fieldref_info: tag 9, symbolic reference to a field
CONSTANT_Methodref_info: tag 10, symbolic reference to a method of a class
CONSTANT_InterfaceMethodref_info: tag 11, symbolic reference to a method of an interface
CONSTANT_NameAndType_info: tag 12, name and type (descriptor) of a field or method
The remaining three kinds, added in Java SE 7, are CONSTANT_MethodHandle_info (tag 15), CONSTANT_MethodType_info (tag 16), and CONSTANT_InvokeDynamic_info (tag 18).
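The tag values can be captured in a small lookup, sketched below. The class name ConstantTags and the method nameOf are invented for this illustration; the tag numbers are the ones defined by the Java SE 7 specification, including tags 15, 16, and 18.

```java
final class ConstantTags {
    // Maps a constant pool tag byte to the name of the structure it introduces.
    static String nameOf(int tag) {
        switch (tag) {
            case 1:  return "CONSTANT_Utf8_info";
            case 3:  return "CONSTANT_Integer_info";
            case 4:  return "CONSTANT_Float_info";
            case 5:  return "CONSTANT_Long_info";
            case 6:  return "CONSTANT_Double_info";
            case 7:  return "CONSTANT_Class_info";
            case 8:  return "CONSTANT_String_info";
            case 9:  return "CONSTANT_Fieldref_info";
            case 10: return "CONSTANT_Methodref_info";
            case 11: return "CONSTANT_InterfaceMethodref_info";
            case 12: return "CONSTANT_NameAndType_info";
            case 15: return "CONSTANT_MethodHandle_info";
            case 16: return "CONSTANT_MethodType_info";
            case 18: return "CONSTANT_InvokeDynamic_info";
            default: return "unknown tag " + tag;
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOf(0x0A)); // CONSTANT_Methodref_info
    }
}
```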
6) u2 access_flags holds the access information of the class or interface, as shown in the following figure:
7) u2 this_class is the constant pool index of this class; it points to a CONSTANT_Class_info constant in the constant pool.
8) u2 super_class is the index of the superclass; it points to a CONSTANT_Class_info constant in the constant pool.
9) u2 interfaces_count is the number of interfaces.
u2 interfaces[interfaces_count] is the interface table; each item in it points to a CONSTANT_Class_info constant in the constant pool.
u2 fields_count is the number of instance variables and class variables of the class.
field_info fields[fields_count] is the field table; the structure of a field table entry is shown in the following figure:
In the figure above, access_flags holds the access flags of the field, such as public, private, or protected. name_index is the field name and points to a constant of type CONSTANT_Utf8_info in the constant pool. descriptor_index is the descriptor of the field and also points to a constant of type CONSTANT_Utf8_info. attributes_count is the number of attribute tables in the field table entry. An attribute table is an extensible structure that describes properties of fields, methods, and classes, and different versions of the Java virtual machine support different sets of attributes.
u2 methods_count is the number of method table entries.
method_info is the method table; the structure of a method table entry is shown in the following figure:
Here access_flags holds the access flags of the method, name_index is the index of the name, and descriptor_index is the descriptor of the method. attributes_count and attribute_info are like the attribute table in the field table, except that the attributes they can contain differ: for example, the method table has a Code attribute that holds the method's code, while the field table has no Code attribute. Exactly which attributes a class file can contain is covered later, when we discuss the attribute tables in the class file structure.
attributes_count is the number of attribute tables. When it comes to attribute tables, we need to make the following points clear:
Attribute tables appear at the end of the class file structure, of the field table, of the method table, and of the Code attribute, which means that an attribute table can itself be nested inside another attribute table. The length of an attribute table is not fixed: different attributes have different lengths.
With each item of the class file structure described, let's use a practical example to illustrate what was said above.
The code is as follows:
package com.ejushang.TestClass;

public class TestClass implements Super {
    private static final int staticVar = 0;
    private int instanceVar = 0;

    public int instanceMethod(int param) {
        return param + 1;
    }
}

interface Super {}
The binary structure of TestClass.class, compiled from TestClass.java with the javac from jdk1.6.0_37, is shown in the following figure:
Below we parse the byte stream in the figure according to the class file structure described earlier.
1) Magic number
From the class file structure we know that the first 4 bytes are the magic number. In the figure these are the bytes at addresses 00000000h-00000003h, and they show that the magic number of the class file is 0xCAFEBABE.
2) Major and minor version numbers
The next 4 bytes are the minor and major version numbers. In the figure, addresses 00000004h-00000005h hold 0x0000, so the minor_version of the class is 0x0000; addresses 00000006h-00000007h hold 0x0032, so the major_version of the class file is 0x0032 (decimal 50). These are exactly the version numbers of a class compiled by jdk1.6.0 without the -target option.
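The first 8 bytes analyzed so far can also be read programmatically. The sketch below uses java.nio.ByteBuffer, which reads big-endian by default; the class name ClassHeader and the method describe are made up for this illustration, and the byte array simply repeats the values quoted above.

```java
import java.nio.ByteBuffer;

final class ClassHeader {
    // Reads magic, minor_version and major_version from the first 8 bytes of a class file.
    static String describe(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header);  // big-endian by default
        int magic = buf.getInt();                  // u4 magic
        int minor = buf.getShort() & 0xFFFF;       // u2 minor_version
        int major = buf.getShort() & 0xFFFF;       // u2 major_version
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        return String.format("magic=0x%08X minor=%d major=%d", magic, minor, major);
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x32};
        System.out.println(describe(header)); // magic=0xCAFEBABE minor=0 major=50
    }
}
```

To try it on a real file, the first 8 bytes of any .class file can be passed in instead of the hard-coded array.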
3) Constant pool count
The next 2 bytes, at 00000008h-00000009h, give the constant pool count. The figure shows the value 0x0018, which is 24 in decimal. Note, however, that the number of constants actually in the pool is constant_pool_count-1, here 23. The reason for the minus one is that index 0 is reserved to mean that a data item in the class file refers to no constant in the constant pool.
4) Constant Pool
As said above, the constant pool holds different types of constants, so let's look at the first constant of TestClass.class. Every constant starts with a tag of type u1 that identifies the type of the constant. The byte at 0000000ah in the figure is 0x0A, which is 10 in decimal. From the description of the constant types above, a constant with tag 10 is a CONSTANT_Methodref_info, whose structure is shown in the following figure:
Here class_index points to a constant of type CONSTANT_Class_info in the constant pool. The binary structure of TestClass shows that the value of class_index is 0x0004 (addresses 0000000bh-0000000ch), which means it points to the fourth constant.
name_and_type_index points to a constant of type CONSTANT_NameAndType_info in the constant pool. The figure shows that the value of name_and_type_index is 0x0013, which refers to the 19th constant in the constant pool.
All the other constants in the constant pool can be found in the same way. However, the JDK provides a handy tool for viewing the constants in the constant pool: javap -verbose TestClass prints all of them, as in the following screenshot:
As the figure shows, the constant pool count of TestClass is 24, which includes the reserved constant 0; constant 0 is used to indicate that a data item in the class file refers to no constant in the constant pool. From the analysis above we know that the first constant of TestClass is a method reference whose class_index points to the fourth constant, java/lang/Object, and whose name_and_type_index points to the 19th constant, with value <init>:()V. So the first constant refers to the instance constructor of java/lang/Object, which is called by the constructor that the Java compiler generates for TestClass. The other constants of the constant pool can be parsed in the same way. Having analyzed the constant pool, we move on to access_flags.
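The manual walk through the constant pool can be sketched in code. This is a simplified reader under stated assumptions, not a replacement for javap: the class name ConstantPoolWalker and its output format are invented here, Utf8 entries are decoded as plain UTF-8 (class files use a modified UTF-8 that is identical for ASCII names), and the three-entry pool in main is a hypothetical sample whose first entry repeats the bytes 0A 00 04 00 13 discussed above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ConstantPoolWalker {
    // Reads constant_pool_count - 1 entries starting at the buffer's current position.
    static List<String> walk(ByteBuffer buf, int constantPoolCount) {
        List<String> out = new ArrayList<String>();
        for (int i = 1; i < constantPoolCount; i++) {
            int tag = buf.get() & 0xFF; // u1 tag shared by every entry
            switch (tag) {
                case 1: { // Utf8: u2 length followed by that many bytes
                    byte[] bytes = new byte[buf.getShort() & 0xFFFF];
                    buf.get(bytes);
                    out.add("#" + i + " Utf8 " + new String(bytes, StandardCharsets.UTF_8));
                    break;
                }
                case 3: out.add("#" + i + " Integer " + buf.getInt()); break;
                case 4: out.add("#" + i + " Float " + buf.getFloat()); break;
                case 5: out.add("#" + i + " Long " + buf.getLong()); i++; break;     // takes two indexes
                case 6: out.add("#" + i + " Double " + buf.getDouble()); i++; break; // takes two indexes
                case 7: out.add("#" + i + " Class #" + (buf.getShort() & 0xFFFF)); break;
                case 8: out.add("#" + i + " String #" + (buf.getShort() & 0xFFFF)); break;
                case 9: case 10: case 11: case 12: { // two u2 indexes
                    String kind = tag == 9 ? "Fieldref" : tag == 10 ? "Methodref"
                            : tag == 11 ? "InterfaceMethodref" : "NameAndType";
                    int first = buf.getShort() & 0xFFFF;
                    int second = buf.getShort() & 0xFFFF;
                    out.add("#" + i + " " + kind + " #" + first + (tag == 12 ? ":#" : ".#") + second);
                    break;
                }
                default:
                    throw new IllegalArgumentException("unsupported tag " + tag);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical 3-entry pool (constant_pool_count = 4).
        byte[] pool = {0x0A, 0x00, 0x04, 0x00, 0x13, 0x01, 0x00, 0x01, 'I', 0x03, 0, 0, 0, 0};
        for (String line : walk(ByteBuffer.wrap(pool), 4)) {
            System.out.println(line);
        }
    }
}
```

Tags 15, 16, and 18 are left out to keep the sketch short; a class compiled by JDK 6 such as TestClass does not contain them.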
5) u2 access_flags holds the access information of the class or interface, such as whether it is a class or an interface and whether it is public, final, abstract, and so on. The meaning of each flag was given earlier; now let's look at the access flags of TestClass. They are at 0000010dh-0000010eh and have the value 0x0021. From the flags listed earlier, 0x0021 = 0x0001 | 0x0020, that is, ACC_PUBLIC and ACC_SUPER are set. ACC_PUBLIC is easy to understand; ACC_SUPER is a flag carried by every class compiled by JDK 1.0.2 or later.
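The decomposition 0x0021 = 0x0001 | 0x0020 can be done mechanically by testing each flag bit. The sketch below assumes the class-level flag values from the Java SE 7 specification; the class name ClassAccessFlags and the method decode are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class ClassAccessFlags {
    // Class-level flag masks and their names, in ascending bit order.
    private static final int[] MASKS = {0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000};
    private static final String[] NAMES = {"ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
            "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"};

    // Returns the names of all flags whose bit is set in accessFlags.
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<String>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
    }
}
```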
6) u2 this_class is the class index, which identifies the fully qualified name of the class. The class index is shown in the following figure:
As the figure shows, the class index is 0x0003, which corresponds to the third constant in the constant pool. From the javap output we know that the third constant is of type CONSTANT_Class_info, and through it we find the fully qualified name of the class: com/ejushang/TestClass/TestClass
7) u2 super_class is the index of the parent class of the current class; it refers to a constant of type CONSTANT_Class_info in the constant pool. The parent class index is shown in the following figure. Its value is 0x0004, and looking up the fourth constant of the constant pool gives the fully qualified name of TestClass's parent class: java/lang/Object
8) interfaces_count and interfaces[interfaces_count] are the number of interfaces and the interfaces themselves. The interface count and the interface of TestClass are shown in the following figure, where 0x0001 says that there is 1 interface and 0x0005 is the constant pool index of that interface. The fifth constant of the constant pool is of type CONSTANT_Class_info, and its value is: com/ejushang/TestClass/Super
9) fields_count and field_info: fields_count is the number of field_info tables in the class, and each field_info describes an instance variable or class variable of the class. Note that field_info does not include fields inherited from the parent class. The structure of field_info is shown in the following figure:
Here access_flags holds the access flags of the field, such as public, private, protected, static, and final. The values of access_flags are shown in the following figure:
name_index and descriptor_index are both constant pool indexes, giving the name of the field and the descriptor of the field respectively. The field name is easy to understand, but what about the field descriptor? The JVM specification defines field descriptors as shown in the following figure:
The last row of the figure deserves attention: each [ in a descriptor denotes one array dimension, so the descriptor of String[][] is [[Ljava/lang/String; and the descriptor of int[][] is [[I. The following attributes_count and attribute_info are the number of attribute tables and the attribute tables themselves. Continuing with TestClass as the example, let's look at its field table.
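The field descriptor rules can be turned into a small converter from descriptor to Java type name. This is a minimal sketch; the class name FieldDescriptors and the method toJavaType are made up for this illustration, while the base type letters (B, C, D, F, I, J, S, Z, L...;) are those defined by the JVM specification.

```java
final class FieldDescriptors {
    // Converts a field descriptor such as "[[I" into a Java type name such as "int[][]".
    static String toJavaType(String descriptor) {
        int dims = 0;
        while (descriptor.charAt(dims) == '[') {
            dims++; // each leading '[' is one array dimension
        }
        String base;
        switch (descriptor.charAt(dims)) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'L': // object type: L<fully qualified name>;
                base = descriptor.substring(dims + 1, descriptor.length() - 1).replace('/', '.');
                break;
            default:
                throw new IllegalArgumentException("bad descriptor: " + descriptor);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaType("I"));                    // int
        System.out.println(toJavaType("[[I"));                  // int[][]
        System.out.println(toJavaType("[[Ljava/lang/String;")); // java.lang.String[][]
    }
}
```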
First, the number of fields. The field count of TestClass is shown in the following figure:
The figure shows that TestClass has two fields, and the source code of TestClass confirms that there are exactly two. Now let's look at the first field, which should be private static final int staticVar. Its binary representation in the class file is shown in the following figure:
Here 0x001a is the access flags; the access_flags table shows that this means ACC_PRIVATE, ACC_STATIC, and ACC_FINAL. Next, 0x0006 and 0x0007 refer to the 6th and 7th constants in the constant pool, whose values are staticVar and I. staticVar is the field name and I is the field descriptor; from the explanation of descriptors above, I describes a variable of type int. The following 0x0001 is the number of attribute tables of staticVar, so the field has 1 attribute. 0x0008 refers to the 8th constant in the constant pool, which shows that this attribute is the ConstantValue attribute. The format of the ConstantValue attribute is shown in the following figure:
Here attribute_name_index is the constant pool index of the attribute name, in this case ConstantValue. The attribute_length of a ConstantValue attribute is fixed at 2. constantvalue_index is a reference into the constant pool, in this case 0x0009, and the 9th constant is of type CONSTANT_Integer_info with the value 0.
That completes private static final int staticVar = 0. Next is TestClass's private int instanceVar = 0, whose binary representation is shown in the following figure:
Here 0x0002 is the access flags, meaning ACC_PRIVATE. 0x000a is the name of the field; it points to the 10th constant in the constant pool, which shows that the field name is instanceVar. 0x0007 is the descriptor of the field; it points to the 7th constant in the constant pool, which is I, so the type of instanceVar is int. The final 0x0000 says that the number of attribute tables is 0.
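The two field flag values above can be cross-checked with the JDK itself. For the modifiers used here, java.lang.reflect.Modifier uses the same bit values as the class file access flags, so Modifier.toString can render them. The wrapper class FieldFlags is made up for this sketch, and the approach is only meant for field flags: some bits mean different things at class level (0x0020 is ACC_SUPER there).

```java
import java.lang.reflect.Modifier;

final class FieldFlags {
    // Renders field access flags as Java source modifiers.
    static String describe(int accessFlags) {
        return Modifier.toString(accessFlags);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x001A)); // private static final
        System.out.println(describe(0x0002)); // private
    }
}
```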
methods_count and method_info: methods_count is the number of methods, and method_info is the method table. The structure of the method table is shown in the following figure:
The figure shows that method_info and field_info are very similar in structure. All the flag bits of the method table's access_flags and their values are shown in the following figure:
name_index and descriptor_index are the name and descriptor of the method; both are indexes into the constant pool. The method descriptor needs some explanation. Its structure is (parameter list)return value. For example, the descriptor of public int instanceMethod(int param) is (I)I, meaning a method with one parameter of type int and a return value of type int. Next come the attribute count and the attribute tables. Both the method table and the field table have them, but the attributes they contain differ. Now let's look at the binary representation of TestClass's method table. First, the number of methods, as in the following screenshot:
The figure shows that the method count is 0x0002, that is, two methods. Let's analyze the first method, starting with its access_flags, name_index, and descriptor_index, as in the following screenshot:
The figure shows that access_flags is 0x0001; from the description of the access_flags bits above, the method is ACC_PUBLIC. name_index is 0x000b, and the 11th constant in the constant pool shows that the method's name is <init>. descriptor_index is 0x000c, the 12th constant in the constant pool, with value ()V, meaning that the <init> method takes no parameters and returns no value. This is the instance constructor that the compiler generates automatically. The following 0x0001 says that the method table entry of <init> has 1 attribute, shown in the following screenshot:
The figure shows that 0x000d refers to the constant Code in the constant pool, so this is the method's Code attribute. This should make it clear that a method's code is stored in the Code attribute of the attribute table in the class file's method table. Next we analyze the Code attribute, whose structure is shown in the following figure:
attribute_name_index points to the constant in the constant pool whose value is Code. attribute_length is the length of the Code attribute (note that the length does not include the 6 bytes of attribute_name_index and attribute_length themselves).
max_stack is the maximum depth of the operand stack; the virtual machine uses this value to allocate the operand stack in the stack frame at run time. max_locals is the storage space required by the local variable table.
The unit of max_locals is the slot, the smallest unit in which the virtual machine allocates memory for local variables. At run time, a data type of no more than 32 bits, such as byte, char, or int, occupies 1 slot, while the 64-bit data types double and long occupy 2 slots. The value of max_locals is not the sum of the memory required by all local variables, because slots are reusable: once a local variable goes out of scope, the slot it occupied can be reused.
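The slot rule can be combined with method descriptors to estimate how many local variable slots a method's arguments need, which is a lower bound on max_locals. The sketch below is an illustration only: the class name MethodDescriptors and both method names are invented, and it counts argument slots, not slots for other local variables.

```java
import java.util.ArrayList;
import java.util.List;

final class MethodDescriptors {
    // Splits the parameter part of a method descriptor, e.g. "(I)I" gives ["I"].
    static List<String> parameters(String descriptor) {
        List<String> params = new ArrayList<String>();
        int i = 1; // skip the opening '('
        while (descriptor.charAt(i) != ')') {
            int start = i;
            while (descriptor.charAt(i) == '[') {
                i++; // array dimensions
            }
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i); // object type ends at ';'
            }
            i++;
            params.add(descriptor.substring(start, i));
        }
        return params;
    }

    // Slots taken by the arguments: 2 for long and double, 1 for everything else,
    // plus 1 for the implicit 'this' reference of an instance method.
    static int argumentSlots(String descriptor, boolean isStatic) {
        int slots = isStatic ? 0 : 1;
        for (String p : parameters(descriptor)) {
            slots += (p.equals("J") || p.equals("D")) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(argumentSlots("()V", false));  // 1: only 'this'
        System.out.println(argumentSlots("(I)I", false)); // 2: 'this' and one int
    }
}
```

An array of long, such as [J, counts as one slot because the argument is a reference to the array.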
code_length is the length of the bytecode in bytes, and code is the bytecode instructions themselves. The figure shows that code has type u1; a u1 ranges over 0x00-0xFF, or 0-255 in decimal, and the current Virtual Machine Specification defines more than 200 instructions.
exception_table_length and exception_table describe the exception handling information of the method.
attributes_count and attribute_info are the number of attributes and the attribute tables inside the Code attribute. This shows how flexible attribute tables are in the class file structure: they can appear in the class file, the method table, the field table, and the Code attribute.
Continuing with the example: the screenshot of the Code attribute of the <init> method above shows that the attribute length is 0x00000026, max_stack is 0x0002, max_locals is 0x0001, and code_length is 0x0000000a, so 00000149h-00000152h is the bytecode. Next, exception_table_length is 0x0000 and attributes_count is 0x0001. The value at 00000157h-00000158h is 0x000e, the constant pool index of the attribute name, and the constant pool shows that the 14th constant is LineNumberTable. LineNumberTable describes the correspondence between Java source line numbers and bytecode offsets. It is not required at run time. If its generation is turned off with the -g:none compiler option, the main effects are that stack traces cannot show the line number of an error when an exception occurs, and breakpoints cannot be set by source line when debugging. The structure of LineNumberTable is shown in the following figure:
attribute_name_index, mentioned above, is a constant pool index, and attribute_length is the length of the attribute. start_pc and line_number are the bytecode offset and the source line number respectively. The byte stream of the LineNumberTable attribute in this example is shown in the following figure:
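The LineNumberTable layout (attribute_name_index, attribute_length, line_number_table_length, then pairs of start_pc and line_number) can be read with a few lines of code. The class name LineNumberTableReader is invented, and the bytes in main are a hypothetical sample, not the bytes from the figure; only the name index 0x000e is taken from the text above.

```java
import java.nio.ByteBuffer;

final class LineNumberTableReader {
    // Returns {start_pc, line_number} pairs from the bytes of one LineNumberTable attribute.
    static int[][] read(byte[] attribute) {
        ByteBuffer buf = ByteBuffer.wrap(attribute);
        buf.getShort();                       // u2 attribute_name_index (not needed here)
        int length = buf.getInt();            // u4 attribute_length, excludes the first 6 bytes
        int count = buf.getShort() & 0xFFFF;  // u2 line_number_table_length
        if (length != 2 + 4 * count) {
            throw new IllegalArgumentException("inconsistent attribute_length");
        }
        int[][] entries = new int[count][2];
        for (int i = 0; i < count; i++) {
            entries[i][0] = buf.getShort() & 0xFFFF; // u2 start_pc: offset into the code array
            entries[i][1] = buf.getShort() & 0xFFFF; // u2 line_number: line in the source file
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical attribute: name index 0x000e, length 10, two entries.
        byte[] sample = {0x00, 0x0E, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x02,
                         0x00, 0x00, 0x00, 0x03, 0x00, 0x04, 0x00, 0x04};
        for (int[] e : read(sample)) {
            System.out.println("pc " + e[0] + " -> line " + e[1]);
        }
    }
}
```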
That completes the analysis of TestClass's first method. The second method of TestClass can be analyzed in the same way, as in the following screenshot:
Here access_flags is 0x0001, name_index is 0x000f, and descriptor_index is 0x0010. Looking these up in the constant pool shows that this is the public int instanceMethod(int param) method. In a similar way we can see that the Code attribute of instanceMethod is as shown in the following figure:
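To give a feel for what the code array of such a method contains, here is a toy disassembler for four one-byte instructions. It is a sketch under a loud assumption: the byte sequence 1B 04 60 AC is what a compiler would typically emit for return param + 1 in an instance method, and it is not read from the figure. The class name TinyDisassembler is made up; the opcode values are those of the JVM instruction set.

```java
final class TinyDisassembler {
    // Mnemonics for the few operand-free opcodes this sketch knows about.
    static String mnemonic(int opcode) {
        switch (opcode) {
            case 0x04: return "iconst_1"; // push the int constant 1
            case 0x1B: return "iload_1";  // push the int in local variable slot 1
            case 0x60: return "iadd";     // add the two ints on top of the stack
            case 0xAC: return "ireturn";  // return the int on top of the stack
            default:   return String.format("0x%02X", opcode);
        }
    }

    // Only valid for instructions without operands; real bytecode needs operand handling.
    static String disassemble(byte[] code) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < code.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(mnemonic(code[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] assumed = {0x1B, 0x04, 0x60, (byte) 0xAC}; // assumed bytes for "return param + 1"
        System.out.println(disassemble(assumed)); // iload_1 iconst_1 iadd ireturn
    }
}
```

Slot 1 holds param because slot 0 of an instance method holds this. For real class files, javap -c prints the full disassembly.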
Finally, we analyze the attributes of the class file itself. The bytes at 00000191h-00000199h are the attribute table of the class file, where 0x0011 is the attribute name, and the constant pool shows that the attribute name is SourceFile. The structure of SourceFile is shown in the following figure:
Here attribute_length is the length of the attribute, and sourcefile_index points to the constant in the constant pool that holds the source file name. The SourceFile attribute in this example is shown in the following screenshot:
Here attribute_length is 0x00000002, a length of 2 bytes, and sourcefile_index is 0x0012. The 18th constant of the constant pool shows that the source file name is TestClass.java
Finally, I hope to communicate more with friends who are interested in technology. Personal microblog: (HTTP://WEIBO.COM/XMUZYQ)