Analyzing the Java Class File Structure Through an Example

Source: Internet
Author: User

Anyone studying Java knows that Java has carried the banner of platform independence from the very beginning: "write once, run anywhere". Besides platform independence, the Java platform has another kind of independence: language independence. The Class file structure, that is, the bytecode format, is what makes language independence possible. Java has had two specifications from the start: the Java Language Specification and the Java Virtual Machine Specification. The Java Language Specification only defines the constraints and rules of the Java language, while the Java Virtual Machine Specification is designed from a cross-platform perspective. Today we will use a practical example to see what the bytecode of a Class file looks like. This article first describes the contents of a Class file in general and then uses an actual Java class to analyze the Class file structure.

Before proceeding, we need to clarify the following points:

1) A Class file is a stream of 8-bit bytes. The data items are arranged strictly in the specified order with no padding between them. Any item larger than one byte is stored in big-endian order, that is, the high-order byte is stored at the lower address and the low-order byte at the higher address. This is one of the keys to cross-platform Class files: PowerPC processors traditionally use big-endian byte order while x86 processors use little-endian, so the virtual machine specification has to fix a single byte order for Class files to be stored identically on every processor architecture.
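To make the big-endian rule concrete, here is a minimal sketch of reading u2 and u4 values by hand from a byte array. The class and method names (BigEndianDemo, readU2, readU4) are made up for this illustration and are not part of any JDK API.

```java
class BigEndianDemo {
    /** Reads an unsigned 16-bit value (u2) stored big-endian at the given offset. */
    static int readU2(byte[] data, int offset) {
        // The byte at the lower address is the high-order byte.
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    /** Reads an unsigned 32-bit value (u4); a long is returned because Java's int is signed. */
    static long readU4(byte[] data, int offset) {
        return ((long) readU2(data, offset) << 16) | readU2(data, offset + 2);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x32};
        System.out.println(Long.toHexString(readU4(bytes, 0))); // cafebabe
        System.out.println(readU2(bytes, 4));                   // 50
    }
}
```

In practice java.io.DataInputStream already reads multi-byte values in big-endian order, so hand-written shifting like this is only needed when working directly on a byte array.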

2) The Class file structure stores data in a pseudo-structure similar to a C struct. There are only two kinds of data items: unsigned numbers and tables. Unsigned numbers express numeric values, index references, and strings; u1, u2, u4, and u8 denote unsigned numbers of 1, 2, 4, and 8 bytes. A table is a composite structure made up of unsigned numbers and other tables. If unsigned numbers and tables are not entirely clear yet, don't worry; the example later in this article will make them concrete.

After clarifying the above two points, let's look at the data contained in the byte stream in the Class file in strict order:

(From The Java Virtual Machine Specification Java SE 7 Edition)

When reading this structure, pay attention to entries such as cp_info. cp_info represents the constant pool, and the notation constant_pool[constant_pool_count-1] says that the constant pool holds constant_pool_count-1 constants. It is written as an array, but do not assume that every constant in the pool has the same length. The array notation is only a convenient way to describe the layout; unlike an int array in a programming language, where every element has the same size, constant pool entries vary in length. With that in mind, let's look at what each item represents.

1) u4 magic indicates the magic number, and the magic number occupies 4 bytes. What is the magic number? It indicates that the file type is a Class file, not a jpg image or AVI movie. The magic number of the Class file is 0xCAFEBABE.

2) u2 minor_version indicates the minor version number of the Class file, and the version number is expressed by the unsigned number of the u2 type.

3) u2 major_version indicates the major version number of the Class file, also an unsigned number of type u2. major_version and minor_version together determine whether a given virtual machine accepts the Class file. Different Java compiler versions emit different Class file versions. A higher-version VM supports Class files produced by lower-version compilers: for example, the virtual machine of Java SE 6.0 accepts Class files compiled by the Java SE 5.0 compiler, but not vice versa.
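The first three items can be read with java.io.DataInputStream, which reads multi-byte values in big-endian order. The following is a minimal sketch; the ClassHeader class is made up for this illustration, and acceptedBy is a deliberate simplification that compares major versions only and ignores minor_version.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ClassHeader {
    final int minor;
    final int major;

    ClassHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    /** Parses magic, minor_version and major_version from the start of a Class file. */
    static ClassHeader parse(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();            // u4 magic
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a Class file");
            }
            int minor = in.readUnsignedShort();  // u2 minor_version comes first
            int major = in.readUnsignedShort();  // u2 major_version follows
            return new ClassHeader(minor, major);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Simplified check: a VM whose newest supported major version is vmMajor accepts older files. */
    boolean acceptedBy(int vmMajor) {
        return major <= vmMajor;
    }
}
```

Java SE 5.0 compilers emit major version 49 and Java SE 6 compilers emit 50, so a header with major 50 passes acceptedBy(50) but fails acceptedBy(49), which matches the "not vice versa" rule above.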

4) u2 constant_pool_count indicates the size of the constant pool. Here we need to focus on what the constant pool is, and please do not confuse it with the runtime constant pool in the JVM memory model. In the Class file, the constant pool mainly stores literals and symbolic references. Literals include strings, the values of final constants, and the initial values of certain fields. Symbolic references include the fully qualified names of classes and interfaces, field names and descriptors, and method names and descriptors. The concept of a descriptor is covered below together with the field table and the method table. In addition, as we all know, the JVM memory model contains a heap, stacks, a method area, and program counters, and part of the method area is called the runtime constant pool. The runtime constant pool holds the literals and symbolic references generated by the compiler, but it is dynamic: other constants can be added to it at run time. The most representative example is the intern method of String.
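The effect of String.intern can be observed with a small experiment. This is only a sketch of the visible behavior; the class name InternDemo is made up for this illustration.

```java
class InternDemo {
    /** Returns {same object before intern, same object after intern}. */
    static boolean[] compare() {
        String literal = "hello";            // loaded from the constant pool of the Class file
        String built = new String("hello");  // a distinct object created at run time
        return new boolean[] { built == literal, built.intern() == literal };
    }

    public static void main(String[] args) {
        boolean[] result = compare();
        System.out.println(result[0]); // false
        System.out.println(result[1]); // true
    }
}
```

The string created at run time is a different object from the literal, but intern returns the canonical instance that the literal also refers to.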

5) cp_info indicates the constant pool, which holds the literals and symbolic references mentioned above. The Java Virtual Machine Specification Java SE 7 Edition defines 14 kinds of constants. Each constant is a table, and every one of them begins with a common u1 tag that identifies which kind of constant it is.

The following is a brief description of the 11 kinds that existed before Java SE 7; the details will be refined by the example later on. The remaining three, CONSTANT_MethodHandle_info (tag 15), CONSTANT_MethodType_info (tag 16), and CONSTANT_InvokeDynamic_info (tag 18), were added in Java SE 7.

CONSTANT_Utf8_info, tag 1: UTF-8 encoded string
CONSTANT_Integer_info, tag 3: integer literal
CONSTANT_Float_info, tag 4: float literal
CONSTANT_Long_info, tag 5: long integer literal
CONSTANT_Double_info, tag 6: double-precision literal
CONSTANT_Class_info, tag 7: symbolic reference to a class or interface
CONSTANT_String_info, tag 8: string literal
CONSTANT_Fieldref_info, tag 9: symbolic reference to a field
CONSTANT_Methodref_info, tag 10: symbolic reference to a method in a class
CONSTANT_InterfaceMethodref_info, tag 11: symbolic reference to a method in an interface
CONSTANT_NameAndType_info, tag 12: name and type of a field or method
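Because each kind of constant has its own length, the constant pool has to be walked entry by entry. The sketch below shows one way to do that, assuming the input starts at constant_pool_count. The class ConstantPoolWalker is made up for this illustration, and the payload sizes are taken from the Java SE 7 edition of the specification.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ConstantPoolWalker {
    /**
     * Expects bytes that start at constant_pool_count. Returns an array indexed like
     * the constant pool that holds the tag of each entry (0 marks an unused slot).
     */
    static int[] readTags(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readUnsignedShort();
            int[] tags = new int[count];            // slot 0 is reserved and stays 0
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                tags[i] = tag;
                int payload;                         // number of bytes that follow the tag
                switch (tag) {
                    case 1:  payload = in.readUnsignedShort(); break; // Utf8: u2 length, then the bytes
                    case 3: case 4: payload = 4; break;               // Integer, Float
                    case 5: case 6: payload = 8; i++; break;          // Long, Double occupy two slots
                    case 7: case 8: case 16: payload = 2; break;      // Class, String, MethodType
                    case 15: payload = 3; break;                      // MethodHandle
                    case 9: case 10: case 11: case 12: case 18:
                        payload = 4; break;                           // refs, NameAndType, InvokeDynamic
                    default: throw new IllegalArgumentException("unknown tag " + tag);
                }
                in.readFully(new byte[payload]);     // skip the payload of this entry
            }
            return tags;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The walker only records tags and skips the payloads; a real parser would decode each payload according to its structure instead of discarding it.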

6) u2 access_flags indicates the access information of the class or interface, as shown in:

7) u2 this_class indicates the constant pool index of the class and points to the constant of CONSTANT_Class_info in the constant pool.

8) u2 super_class indicates the superclass index and points to the constant of CONSTANT_Class_info in the constant pool.

9) u2 interfaces_count indicates the number of interfaces.

10) u2 interfaces[interfaces_count] indicates the interface table, and each item in it points to a CONSTANT_Class_info constant in the constant pool.

11) u2 fields_count indicates the number of instance variables and class variables of the class.

12) field_info fields [fields_count] indicates the information of the field table. The structure of the field table is shown in:

access_flags indicates the access flags of the field, for example whether the field is public, private, or protected. name_index indicates the field name and points to a constant of type CONSTANT_Utf8_info in the constant pool. descriptor_index indicates the field descriptor and also points to a CONSTANT_Utf8_info constant. attributes_count indicates the number of attribute tables attached to the field, and an attribute table carries additional information that describes the field. The set of attributes supported differs between Java virtual machine versions.

13) u2 methods_count indicates the number of methods in the method table.

14) method_info indicates the method table. The specific structure of the method table is shown in:


access_flags indicates the access flags of the method, name_index indicates the index of the method name, and descriptor_index indicates the method descriptor. attributes_count and attribute_info play the same role as in the field table, but the attributes that can appear in a field table and in a method table are different. For example, the Code attribute of the method table holds the code of the method, and there is no Code attribute in the field table. The specific attributes found in a Class file are discussed next, together with the attribute table of the Class file structure.

15) attributes_count indicates the number of attribute tables. When it comes to attribute tables, we need to clarify the following points:

Attribute tables appear at the end of the Class file structure, in the field table, in the method table, and inside the Code attribute; in other words, an attribute table can itself contain attribute tables. The length of an attribute table is not fixed, and different attributes have different lengths.

After completing the composition of each item in the Class file structure, we will explain the content mentioned above with an actual example.

The code is as follows:

    package com.ejushang.TestClass;
    public class TestClass implements Super {
        private static final int staticVar = 0;
        private int instanceVar = 0;
        public int instanceMethod(int param) {
            return param + 1;
        }
    }
    interface Super {}

The binary structure of TestClass.class, produced by compiling TestClass.java with javac from jdk1.6.0_37, is shown in:

Next we will parse the following byte streams based on the file structure of the Class mentioned above.

1) magic number
From the Class file structure we know that the first four bytes are the magic number, so the content at addresses 00000000h-00000003h is the magic number. We can see that the magic number of the Class file is 0xCAFEBABE.

2) major and minor version numbers
The next four bytes are the minor and major version numbers. The value of minor_version is 0x0000 and the value of major_version is 0x0032 (decimal 50), which is the version that the javac of jdk1.6.0 produces when no -target parameter is given.

3) Number of constant pools
The next two bytes, at 00000008h-00000009h, hold constant_pool_count. The value is 0x0018, which is 24 in decimal. Note, however, that the number of constants actually stored is constant_pool_count-1, that is, 23. One is subtracted because index 0 is reserved: a data item in the Class file uses index 0 to say that it does not reference any constant in the constant pool.

4) constant pool
As mentioned, the constant pool contains different types of constants. Let's look at the first constant of TestClass.class. Each constant starts with a u1 tag that indicates its type. The byte at 0000000ah is 0x0A, which is 10 in decimal, and from the earlier description of the constant types we know that a constant whose tag is 10 is a CONSTANT_Methodref_info. The structure of CONSTANT_Methodref_info is shown in:

class_index points to a constant of type CONSTANT_Class_info in the constant pool. From the binary file structure of TestClass, we can see that the value of class_index is 0x0004 (address: 0000000bh-0000000ch), that is, it points to the fourth constant.

name_and_type_index points to a constant of type CONSTANT_NameAndType_info in the constant pool. Its value is 0x0013, which points to the 19th constant in the constant pool.

Next, we could find all the constants in the constant pool in the same way. However, the JDK provides a convenient tool for viewing the constants in the constant pool. You can run javap -verbose TestClass to list every constant in the constant pool, as follows:

We can clearly see that the constant pool of TestClass has 24 slots. Do not forget the 0th one, which is reserved so that a data item in the Class file can indicate that it references no constant. From the analysis above, we know that the first constant of TestClass represents a method: class_index points to the fourth constant, java/lang/Object, and name_and_type_index points to the 19th constant, whose value is <init>:()V. So the first constant is a reference to the instance constructor of java/lang/Object, which is called from the constructor that the Java compiler generates for TestClass. The other constants in the constant pool can be analyzed in the same way. After the constant pool, we will analyze access_flags.

5) u2 access_flags indicates the access information of the class or interface, such as whether it is a class or an interface and whether it is public, abstract, or final. The meaning of the access flags was described earlier. Let's look at the access flags of TestClass. They are located at 0000010dh-0000010eh and the value is 0x0021. According to the flag table mentioned above, 0x0021 = 0x0001 | 0x0020, that is, ACC_PUBLIC and ACC_SUPER are set. ACC_PUBLIC indicates that the class is public, and ACC_SUPER is a flag that compilers since JDK 1.0.2 set on the classes they compile.
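Decoding access_flags is a matter of testing each bit. Here is a minimal sketch for class-level flags, using the flag values from the Java SE 7 edition of the specification; the class ClassAccessFlags and its decode method are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ClassAccessFlags {
    // Class-level flag bits and their names, in ascending bit order.
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Returns the names of all flags that are set in the given u2 value. */
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
    }
}
```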

6) u2 this_class indicates the index value of the class, which is used to find the fully qualified name of the class. The index value of the class is shown in:

We can see that the index value of the class is 0x0003, which corresponds to the third constant in the constant pool. From the javap output, we know that the third constant is of type CONSTANT_Class_info, and through it we obtain the fully qualified name of the class: com/ejushang/TestClass/TestClass

7) u2 super_class indicates the index value of the parent class of the current class. The index value points to a constant of type CONSTANT_Class_info in the constant pool. The index value of the parent class is shown in the figure; its value is 0x0004. Looking up the fourth constant in the constant pool, we can see that the fully qualified name of the parent class of TestClass is java/lang/Object.

8) interfaces_count and interfaces[interfaces_count] indicate the number of interfaces and each specific interface. The figure shows the interface count and the interface table of TestClass: 0x0001 indicates that the number of interfaces is 1, and 0x0005 is the index of the interface in the constant pool. The fifth constant in the constant pool is of type CONSTANT_Class_info and its value is com/ejushang/TestClass/Super.

9) fields_count and field_info. fields_count indicates the number of field_info tables in the class, and each field_info describes an instance variable or a class variable of the class. Note that field_info does not include fields inherited from the parent class. The structure of field_info is shown in:

access_flags indicates the access flags of a field, such as public, private, protected, static, and final. The values of access_flags are shown in:

Among them, name_index and descriptor_index are the index values of the constant pool, which respectively indicate the field name and the field descriptor. The field name is easy to understand, but how can we understand the field descriptor? In fact, in the JVM specification, the field descriptor rules are as follows:

Pay attention to the last line, which gives the descriptor of a one-dimensional array: each array dimension adds one leading [. For String[][] the descriptor is therefore [[Ljava/lang/String; and for int[][] it is [[I. The attributes_count and attribute_info that follow indicate the number of attribute tables and the attribute tables themselves. Let's take the TestClass above as an example and look at its field table.
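The descriptor rules can be sketched as a small converter from a field descriptor back to a Java type name. The class FieldDescriptors and the method toJavaType are made up for this illustration; the letter codes are the ones defined by the virtual machine specification.

```java
class FieldDescriptors {
    /** Converts a field descriptor such as "[[I" into a readable type such as "int[][]". */
    static String toJavaType(String descriptor) {
        int dims = 0;
        while (descriptor.charAt(dims) == '[') {
            dims++;                                  // one '[' per array dimension
        }
        String base;
        switch (descriptor.charAt(dims)) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'L':
                if (!descriptor.endsWith(";")) {
                    throw new IllegalArgumentException("missing ';' in " + descriptor);
                }
                // Strip the leading 'L' and the trailing ';', then use dots instead of slashes.
                base = descriptor.substring(dims + 1, descriptor.length() - 1).replace('/', '.');
                break;
            default:
                throw new IllegalArgumentException("bad descriptor " + descriptor);
        }
        StringBuilder type = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            type.append("[]");
        }
        return type.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaType("[[Ljava/lang/String;")); // java.lang.String[][]
        System.out.println(toJavaType("[[I"));                  // int[][]
    }
}
```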

First, let's take a look at the number of fields. The number of fields in TestClass is shown in:

We can see that TestClass has two fields, and the source code of TestClass confirms that there are indeed only two. Next, let's look at the first field, which should be private static final int staticVar. Its binary representation in the Class file is shown in:


0x001A is the access flags value. From the access_flags table we can see that it is ACC_PRIVATE | ACC_STATIC | ACC_FINAL. Next, 0x0006 and 0x0007 refer to the 6th and 7th constants in the constant pool. Looking them up, we find the values staticVar and I: staticVar is the field name and I is the field descriptor, which, according to the descriptor rules above, denotes a variable of type int. The following 0x0001 is the number of attribute tables of the staticVar field, so one attribute table is attached to it. 0x0008 refers to the 8th constant in the constant pool, which shows that this attribute is the ConstantValue attribute. The format of the ConstantValue attribute is shown in:

attribute_name_index is the constant pool index of the attribute name, which in this example is ConstantValue. The attribute_length of a ConstantValue attribute is always 2, and constantvalue_index is a reference into the constant pool, which in this example is 0x0009. Looking up the 9th constant shows that it is of type CONSTANT_Integer_info and its value is 0.

That completes private static final int staticVar = 0. Next we turn to private int instanceVar = 0 of TestClass. The binary representation of instanceVar is shown in:


0x0002 indicates that the access flag is ACC_PRIVATE. 0x000A is the name of the field; it points to the 10th constant in the constant pool, which shows that the field name is instanceVar. 0x0007 is the descriptor of the field; it points to the 7th constant, which is I, so the type of instanceVar is int. The final 0x0000 indicates that the number of attribute tables is 0.
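For fields, the common flag bits have the same values as the constants in java.lang.reflect.Modifier, so the two field flag values seen above can be checked with the standard Modifier.toString method. This is only a convenience sketch: Modifier knows nothing about flags such as ACC_SYNTHETIC or ACC_ENUM, and the class name FieldFlagsDemo is made up for this illustration.

```java
import java.lang.reflect.Modifier;

class FieldFlagsDemo {
    /** Renders the modifier bits of a field's access_flags as Java keywords. */
    static String describe(int accessFlags) {
        return Modifier.toString(accessFlags);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x001A)); // private static final
        System.out.println(describe(0x0002)); // private
    }
}
```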

10) methods_count and method_info. methods_count indicates the number of methods, and method_info indicates the method table. The structure of the method table is shown in:

We can see that the structures of method_info and field_info are very similar. The figure shows all the flag bits and values of access_flags in the method table:

name_index and descriptor_index are constant pool indexes that give the method name and the method descriptor respectively. The structure of a method descriptor is (parameter list)return type. For example, the descriptor of public int instanceMethod(int param) is (I)I, which denotes a method that takes one int parameter and returns an int. After these come the attribute count and the attribute table. Although both the method table and the field table have an attribute count and an attribute table, the attributes they contain are different. Next, let's look at the binary representation of the method table in TestClass, starting with the number of methods.
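As an aside on the descriptor syntax, the JDK can parse method descriptors itself through java.lang.invoke.MethodType.fromMethodDescriptorString, available since Java 7. The sketch below uses it to check the (I)I example; the wrapper class MethodDescriptorDemo is made up for this illustration.

```java
import java.lang.invoke.MethodType;

class MethodDescriptorDemo {
    /** Parses a method descriptor; passing null means the system class loader resolves class names. */
    static MethodType parse(String descriptor) {
        return MethodType.fromMethodDescriptorString(descriptor, null);
    }

    public static void main(String[] args) {
        MethodType type = parse("(I)I");
        System.out.println(type.parameterCount());          // 1
        System.out.println(type.parameterType(0));          // int
        System.out.println(type.returnType());              // int
        System.out.println(parse("()V").returnType());      // void
    }
}
```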


We can see that the number of methods is 0x0002, which means there are two methods. Next we will analyze the first method. Let's first look at access_flags, name_index, and descriptor_index, as follows:


From the description of the access_flags values, we can see that the access_flags value of the method is ACC_PUBLIC. The name_index value is 0x000B; looking up the 11th constant in the constant pool shows that the method is named <init>. 0x000C is descriptor_index and refers to the 12th constant in the constant pool, whose value is ()V, meaning that the <init> method takes no parameters and returns no value. This is in fact the instance constructor automatically generated by the compiler. The following 0x0001 indicates that the method table of <init> has one attribute, which is as follows:

We can see that the constant in the constant pool corresponding to 0x000D is Code, which identifies the Code attribute of the method. So the code of a method is stored in the Code attribute within the attribute table of the method table in the Class file. Next, we will analyze the Code attribute, whose structure is shown in:

attribute_name_index points to the constant whose value is Code in the constant pool. attribute_length indicates the length of the Code attribute (note that this length does not include the 6 bytes occupied by attribute_name_index and attribute_length themselves).
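Every attribute starts with the same 6-byte header, which is what allows a parser to skip attributes it does not understand. A minimal sketch of reading that header follows; the class AttributeHeader and its members are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class AttributeHeader {
    final int nameIndex;   // u2 attribute_name_index
    final long length;     // u4 attribute_length, kept in a long because it is unsigned

    AttributeHeader(int nameIndex, long length) {
        this.nameIndex = nameIndex;
        this.length = length;
    }

    /** Reads the 6-byte attribute header that starts at the given offset. */
    static AttributeHeader read(byte[] bytes, int offset) {
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(bytes, offset, bytes.length - offset))) {
            int nameIndex = in.readUnsignedShort();
            long length = in.readInt() & 0xFFFFFFFFL;
            return new AttributeHeader(nameIndex, length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Bytes occupied by the whole attribute: the 6-byte header plus attribute_length. */
    long totalSize() {
        return 6 + length;
    }
}
```

An attribute whose attribute_length is 2 therefore occupies 8 bytes in the file, and the next structure begins at offset + totalSize().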

max_stack indicates the maximum depth of the operand stack. At run time the VM allocates the operand stack in the stack frame according to this value. max_locals indicates the storage space required by the local variable table.

The unit of max_locals is the slot, which is the smallest unit in which the VM allocates memory for local variables. At run time, data types of up to 32 bits, such as byte, char, and int, occupy 1 slot, while 64-bit data types such as double and long occupy 2 slots. In addition, max_locals is not the sum of the space required by all local variables, because slots can be reused: once a local variable goes out of scope, its slot can be reused by another variable.

code_length indicates the length of the bytecode in bytes, and code holds the bytecode instructions themselves. Each opcode is of type u1, and a u1 value ranges from 0x00 to 0xFF, or 0 to 255 in decimal, so at most 256 opcodes can exist. At present, the VM specification defines more than 200 instructions.

exception_table_length and exception_table describe the exception handling information of the method.

attributes_count and attribute_info indicate the number of attributes inside the Code attribute and the attribute table respectively. As the Class file structure shows, attribute tables are flexible and can appear in the Class file, the method table, the field table, and the Code attribute.

Next, let's continue with the example. From the Code attribute of the <init> method above, we can see that the attribute length is 0x00000026, max_stack is 0x0002, max_locals is 0x0001, and code_length is 0x0000000A. The bytes at 00000149h-00000152h are the bytecode. exception_table_length is 0x0000 and attributes_count is 0x0001. The bytes at 00000157h-00000158h have the value 0x000E, which is the constant pool index of the attribute name. Looking up the 14th constant in the constant pool shows that its value is LineNumberTable. LineNumberTable describes the correspondence between Java source line numbers and bytecode offsets. It is not required at run time. If this information is suppressed with the -g:none compiler option, the main consequences are that stack traces cannot show line numbers when an exception is thrown and that breakpoints cannot be set by source line during debugging. Next, let's look at the LineNumberTable structure, as shown in:

attribute_name_index is the constant pool index of the attribute name, attribute_length is the attribute length, and start_pc and line_number are the bytecode offset and the source line number respectively. In this example, the byte stream of the LineNumberTable attribute is shown in:

After analyzing the first method of TestClass, we can analyze the second method of TestClass in the same way, as shown below:

access_flags is 0x0001, name_index is 0x000F, and descriptor_index is 0x0010. Looking these up in the constant pool shows that this is the public int instanceMethod(int param) method. The Code attribute of instanceMethod is shown as follows:

Finally, let's analyze the attributes of the Class file itself, which occupy the last bytes of the file. The first field is the index of the attribute name, and the constant pool shows that the attribute name is SourceFile. The structure of SourceFile is shown in:

attribute_length is the length of the attribute, and sourcefile_index points to the constant in the constant pool whose value is the name of the source file. In this example, the SourceFile attribute is as follows:


attribute_length is 0x00000002, meaning the length is 2 bytes, and the value of sourcefile_index is 0x0012. Looking up the 18th constant in the constant pool shows that the source file is named TestClass.java.

Finally, I hope to get to know more friends who are interested in technology. Personal microblog: http://weibo.com/xmuzyq
