Java Class file format (3)

Source: Internet
Author: User


First, let's review the previous two posts on the class file format. In "In-depth Understanding of the Java Class File Format (1)", I explained where the class file sits in the overall Java architecture and what role it plays. I also covered the magic number and version number and gave an overview of the constant pool. In "In-depth Understanding of the Java Class File Format (2)", I mainly covered the special strings in a class file: fully qualified class names, field descriptors, and method descriptors. These special strings appear in the constant pool, so understanding them is the basis for understanding the constant pool. This article explains each kind of data item in the constant pool in detail.
If you have not read the first two articles, I recommend reading them first so that the material fits together. Now let's turn to the constant pool.

Description of various data item types in the constant pool


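The general structure of the constant pool was covered in "In-depth Understanding of the Java Class File Format (1)", which also listed the 11 types of data items the constant pool can contain. These are the 11 types defined up to Java SE 6; Java SE 7 and later added further types such as CONSTANT_MethodHandle, CONSTANT_MethodType, and CONSTANT_InvokeDynamic. The task of this article is to explain these 11 types in detail and to analyze how the various pieces of information in a source file are stored in the constant pool.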


We know that constant pool items are referenced by index, and that items in the constant pool also reference one another. Among the 11 types, two are especially fundamental because items of other types reference them: CONSTANT_Utf8 and CONSTANT_NameAndType. A CONSTANT_NameAndType item (CONSTANT_NameAndType_info) in turn references CONSTANT_Utf8 items (CONSTANT_Utf8_info). Unlike other books and materials that introduce the constant pool, this article proceeds step by step: it first introduces these two basic types and then the remaining nine in turn.
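For reference, the tag values of the 11 types can be collected into a lookup table. The sketch below is a minimal Java enum. The type name ConstantTag and its method of are hypothetical names chosen for illustration. The numeric values are the ones the JVM specification assigns to these 11 types. Tags introduced in Java SE 7 and later are deliberately left out and rejected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table for the 11 pre-Java-7 constant pool tags.
enum ConstantTag {
    UTF8(1), INTEGER(3), FLOAT(4), LONG(5), DOUBLE(6), CLASS(7), STRING(8),
    FIELDREF(9), METHODREF(10), INTERFACE_METHODREF(11), NAME_AND_TYPE(12);

    final int value; // the u1 tag byte that starts every constant pool item

    private static final Map<Integer, ConstantTag> BY_VALUE = new HashMap<>();
    static {
        for (ConstantTag t : values()) BY_VALUE.put(t.value, t);
    }

    ConstantTag(int value) {
        this.value = value;
    }

    /** Maps a tag byte to its type; unknown or post-Java-6 tags are rejected. */
    static ConstantTag of(int tag) {
        ConstantTag t = BY_VALUE.get(tag);
        if (t == null) throw new IllegalArgumentException("unknown or post-Java-6 tag: " + tag);
        return t;
    }

    public static void main(String[] args) {
        System.out.println(of(1));  // UTF8
        System.out.println(of(12)); // NAME_AND_TYPE
    }
}
```

Tag value 2 is unused in the specification, which is why the lookup goes through a map instead of indexing values() by position.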



(1) CONSTANT_Utf8_info
A CONSTANT_Utf8_info is a constant pool item of type CONSTANT_Utf8, and it stores a constant string. Almost all the literal strings in the constant pool are described by CONSTANT_Utf8_info items. Here is its storage format. As mentioned in the previous article, the type of each constant pool item is determined by an integer tag. Every item type therefore begins with a tag, located in the item's first byte. There are 11 item types, so there are 11 tag values. The tag value of CONSTANT_Utf8_info is 1. When the virtual machine parses a constant pool item, it first reads the tag in the first byte; if the value is 1, the item is a CONSTANT_Utf8_info. The next two bytes give the length of the string in bytes, and the remaining length bytes hold the string itself, encoded in a modified form of UTF-8. So the format is: tag (1 byte), length (2 bytes), bytes (length bytes). For example, a CONSTANT_Utf8_info storing the string "Hello" looks like this:
tag = 0x01 | length = 0x00 0x05 | bytes = 0x48 0x65 0x6C 0x6C 0x6F ("H" "e" "l" "l" "o")
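The layout above can be reproduced in a few lines of Java. This is a minimal sketch; the class name Utf8InfoEncoder is a hypothetical helper. It relies on the fact that java.io.DataOutputStream.writeUTF writes a two-byte length followed by modified UTF-8 bytes, which is exactly the length and bytes part of a CONSTANT_Utf8_info.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class Utf8InfoEncoder {
    static final int CONSTANT_Utf8 = 1; // tag value described above

    /** Serializes s as a CONSTANT_Utf8_info: u1 tag, u2 length, then the string bytes. */
    static byte[] encode(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte(CONSTANT_Utf8); // u1 tag
            out.writeUTF(s);              // u2 byte length followed by modified UTF-8 bytes
        } catch (IOException e) {
            // writeUTF throws UTFDataFormatException if the encoding exceeds 65535 bytes,
            // which is also the limit imposed by the u2 length field
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Formats bytes as space-separated upper-case hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X ", b));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode("Hello"))); // 01 00 05 48 65 6C 6C 6F
    }
}
```

One visible effect of "modified" UTF-8 is that the character U+0000 is written as the two bytes C0 80 instead of a single zero byte, so encode("\u0000") yields 01 00 02 C0 80.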
Now that we know how a CONSTANT_Utf8_info is stored, which strings does it hold? A CONSTANT_Utf8_info can contain any of the following:
  • String constants in the program
  • The fully qualified name of the current class (or interface or enum) that owns the constant pool
  • The fully qualified name of the direct superclass of the current class
  • The fully qualified names of all interfaces that the current type implements or extends
  • The names and descriptors of fields defined in the current type
  • The names and descriptors of methods defined in the current type
  • The fully qualified names of types referenced by the current class
  • The names and descriptors of fields in other classes referenced by the current class
  • The names and descriptors of methods in other classes referenced by the current class
  • Strings related to attributes in the current class file, such as attribute names
In summary, these strings fall into five categories: string constants in the program, fully qualified type names, method and field names, method and field descriptors, and attribute-related strings. String constants need little explanation, since we use them all the time to create String objects. Attribute-related strings will come up when we discuss the attributes of a class file. Method and field names need no explanation either. That leaves fully qualified type names and method and field descriptors, which are the "special strings" covered in the previous article; readers unfamiliar with them can refer to "In-depth Understanding of the Java Class File Format (2)". Note also that fully qualified type names, method and field names, and method and field descriptors may describe members defined in this class or classes and members that this class references. Here is an example. Sample source code:
package com.jg.zhang;

public class Programer extends Person {
    static String company = "CompanyA";

    static {
        System.out.println("staitc init");
    }

    String position;
    Computer computer;

    public Programer() {
        this.position = "engineer";
        this.computer = new Computer();
    }

    public void working() {
        System.out.println("coding...");
        computer.working();
    }
}


The class is simple, yet its compiled constant pool has as many as 53 entries of various types, including many CONSTANT_Utf8_info items. Only the CONSTANT_Utf8_info items from the javap output are listed below:
#2 = Utf8 com/jg/zhang/Programer        // fully qualified name of the current class
#4 = Utf8 com/jg/zhang/Person           // fully qualified name of the parent class
#5 = Utf8 company                       // name of the company field
#6 = Utf8 Ljava/lang/String;            // descriptor of the company and position fields
#7 = Utf8 position                      // name of the position field
#8 = Utf8 computer                      // name of the computer field
#9 = Utf8 Lcom/jg/zhang/Computer;       // descriptor of the computer field
#10 = Utf8 <clinit>                     // name of the class initialization method (the static initializer block)
#11 = Utf8 ()V                          // descriptor of the working method (also of <clinit> and <init>)
#12 = Utf8 Code                         // name of the Code attribute
#14 = Utf8 CompanyA                     // string constant in the program
#19 = Utf8 java/lang/System             // fully qualified name of the referenced System class
#21 = Utf8 out                          // name of the referenced out field
#22 = Utf8 Ljava/io/PrintStream;        // descriptor of the referenced out field
#24 = Utf8 staitc init                  // string constant in the program
#27 = Utf8 java/io/PrintStream          // fully qualified name of the referenced PrintStream class
#29 = Utf8 println                      // name of the referenced println method
#30 = Utf8 (Ljava/lang/String;)V        // descriptor of the referenced println method
#31 = Utf8 LineNumberTable              // name of the LineNumberTable attribute
#32 = Utf8 LocalVariableTable           // name of the LocalVariableTable attribute
#33 = Utf8 <init>                       // name of the constructor of the current class
#41 = Utf8 com/jg/zhang/Computer        // fully qualified name of the referenced Computer class
#45 = Utf8 this                         // name of the local variable this
#46 = Utf8 Lcom/jg/zhang/Programer;     // descriptor of the local variable this
#47 = Utf8 working                      // name of the working method
#49 = Utf8 coding...                    // string constant in the program
#52 = Utf8 SourceFile                   // name of the SourceFile attribute
#53 = Utf8 Programer.java               // name of the source file of the current class


Only the CONSTANT_Utf8_info items from the javap output are listed above. The comments after each // are not part of the javap output; I added them. Comparing them with the source code above shows clearly how the various strings in the source file are stored in CONSTANT_Utf8_info items.
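The descriptors in the listing, such as Ljava/lang/String; and (Ljava/lang/String;)V, follow the rules from the previous article. As a sketch that uses only java.lang.reflect, they can be derived from Class and Method objects at run time. The class and method names (Descriptors, fieldDescriptor, methodDescriptor) are hypothetical helpers, not a JDK API; since Java 12, Class.descriptorString() returns field descriptors directly.

```java
import java.io.PrintStream;
import java.lang.reflect.Method;

class Descriptors {
    /** Field descriptor for a type, e.g. int -> "I", String -> "Ljava/lang/String;". */
    static String fieldDescriptor(Class<?> c) {
        if (c.isArray()) return "[" + fieldDescriptor(c.getComponentType());
        if (c == int.class) return "I";
        if (c == long.class) return "J";
        if (c == boolean.class) return "Z";
        if (c == byte.class) return "B";
        if (c == char.class) return "C";
        if (c == short.class) return "S";
        if (c == float.class) return "F";
        if (c == double.class) return "D";
        if (c == void.class) return "V"; // only valid as a method return type
        // Reference types: binary name with '.' replaced by '/', wrapped in L...;
        return "L" + c.getName().replace('.', '/') + ";";
    }

    /** Method descriptor: "(" + parameter descriptors + ")" + return descriptor. */
    static String methodDescriptor(Method m) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : m.getParameterTypes()) sb.append(fieldDescriptor(p));
        return sb.append(')').append(fieldDescriptor(m.getReturnType())).toString();
    }

    public static void main(String[] args) throws NoSuchMethodException {
        System.out.println(fieldDescriptor(String.class)); // Ljava/lang/String;
        Method println = PrintStream.class.getMethod("println", String.class);
        System.out.println(methodDescriptor(println));     // (Ljava/lang/String;)V
    }
}
```

The second line of output matches entry #30 in the listing, the descriptor of the referenced println method.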

It is worth emphasizing that almost every string visible in the source file is stored in a CONSTANT_Utf8_info. Apart from the numeric constants, the other item types do not hold strings themselves; they reference CONSTANT_Utf8_info items and combine them to describe richer information. CONSTANT_NameAndType_info, introduced next, illustrates this.
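To see where a listing like the one above comes from, here is a minimal sketch of a constant pool walker that collects every CONSTANT_Utf8_info from a class file, roughly what javap prints for the Utf8 rows. The class name Utf8Lister is hypothetical. It relies on DataInputStream.readUTF reading exactly the u2 length plus modified UTF-8 layout of a CONSTANT_Utf8_info. To step over other entries it must know their sizes, so it also handles the tags added after Java SE 6 (so that class files from newer compilers parse), and it accounts for Long and Double entries occupying two constant pool slots.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

class Utf8Lister {
    /** Walks the constant pool and returns index -> string for each CONSTANT_Utf8_info. */
    static Map<Integer, String> listUtf8(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) throw new IOException("bad magic number");
        in.readUnsignedShort();                 // minor_version
        in.readUnsignedShort();                 // major_version
        int count = in.readUnsignedShort();     // constant_pool_count; valid indexes are 1..count-1
        Map<Integer, String> utf8 = new LinkedHashMap<>();
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:                         // Utf8: u2 length + bytes, exactly what readUTF reads
                    utf8.put(i, in.readUTF());
                    break;
                case 7: case 8: case 16: case 19: case 20:
                    skip(in, 2);                // Class, String, MethodType, Module, Package
                    break;
                case 15:
                    skip(in, 3);                // MethodHandle: u1 kind + u2 index
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(in, 4);                // Integer, Float, *ref, NameAndType, Dynamic, InvokeDynamic
                    break;
                case 5: case 6:
                    skip(in, 8);                // Long, Double: 8 bytes ...
                    i++;                        // ... and they occupy two pool slots
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag + " at #" + i);
            }
        }
        return utf8;
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);              // unlike skipBytes, readFully never skips short
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) { // path to a .class file
            listUtf8(in).forEach((i, s) -> System.out.println("#" + i + " = Utf8 " + s));
        }
    }
}
```

Run against a compiled Programer.class, it prints lines in the same "#n = Utf8 ..." shape as the listing above, although the exact index numbers may differ with the compiler version.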


(2) CONSTANT_NameAndType data items
A CONSTANT_NameAndType_info item in the constant pool can be viewed as an instance of the CONSTANT_NameAndType type. As the name says, it describes two pieces of information: a name and a type. The name is a method name or a field name. "Type" is meant broadly: it is actually a field descriptor or a method descriptor. If the name part is a field name, the type part is that field's descriptor; if the name part is a method name, the type part is that method's descriptor. In other words, a CONSTANT_NameAndType_info describes a method or a field.
Next, let's look at the storage format of CONSTANT_NameAndType_info. Like every constant pool item, it begins with a one-byte tag. Its tag value is 12, so when the virtual machine reads an item whose tag is 12, it knows the item is a CONSTANT_NameAndType_info. The two bytes after the tag are called name_index; they point to a CONSTANT_Utf8_info in the constant pool that holds the method or field name. The two bytes after name_index are called descriptor_index; they point to a CONSTANT_Utf8_info that holds the method or field descriptor. The storage layout is:
tag (1 byte, value 12) | name_index (2 bytes) | descriptor_index (2 bytes), 5 bytes in total
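The 5-byte layout can be written and read back with DataOutputStream and DataInputStream, which use the same big-endian byte order as the class file format. This is a minimal sketch; the class NameAndTypeInfo and its members are hypothetical names, and the indexes in main are arbitrary examples rather than values from a real class file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class NameAndTypeInfo {
    static final int TAG = 12;   // CONSTANT_NameAndType
    final int nameIndex;         // u2: index of a CONSTANT_Utf8_info holding the name
    final int descriptorIndex;   // u2: index of a CONSTANT_Utf8_info holding the descriptor

    NameAndTypeInfo(int nameIndex, int descriptorIndex) {
        this.nameIndex = nameIndex;
        this.descriptorIndex = descriptorIndex;
    }

    /** Writes u1 tag, u2 name_index, u2 descriptor_index (big-endian). */
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte(TAG);
            out.writeShort(nameIndex);
            out.writeShort(descriptorIndex);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return buf.toByteArray();
    }

    /** Reads one item, checking that the tag really is 12. */
    static NameAndTypeInfo read(DataInputStream in) throws IOException {
        int tag = in.readUnsignedByte();
        if (tag != TAG) throw new IOException("expected tag 12, got " + tag);
        int name = in.readUnsignedShort();
        int descriptor = in.readUnsignedShort();
        return new NameAndTypeInfo(name, descriptor);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = new NameAndTypeInfo(5, 6).toBytes();
        for (byte b : bytes) System.out.printf("%02X ", b);             // 0C 00 05 00 06
        System.out.println();
        NameAndTypeInfo back = read(new DataInputStream(new ByteArrayInputStream(bytes)));
        System.out.println(back.nameIndex + ":" + back.descriptorIndex); // 5:6
    }
}
```

Note that the item only stores indexes; turning them into a name and descriptor requires looking up the referenced CONSTANT_Utf8_info entries.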

Here is the source code of an example:
package com.jg.zhang;

public class Person {
    int age;

    int getAge() {
        return age;
    }
}


The Person class is very simple, with only one field, age, and one method, getAge. Disassembling the compiled class with the javap tool gives the following constant pool:
   #1 = Class              #2             //  com/jg/zhang/Person
   #2 = Utf8               com/jg/zhang/Person
   #3 = Class              #4             //  java/lang/Object
   #4 = Utf8               java/lang/Object
   #5 = Utf8               age
   #6 = Utf8               I
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Methodref          #3.#11         //  java/lang/Object."<init>":()V
  #11 = NameAndType        #7:#8          //  "<init>":()V
  #12 = Utf8               LineNumberTable
  #13 = Utf8               LocalVariableTable
  #14 = Utf8               this
  #15 = Utf8               Lcom/jg/zhang/Person;
  #16 = Utf8               getAge
  #17 = Utf8               ()I
  #18 = Fieldref           #1.#19         //  com/jg/zhang/Person.age:I
  #19 = NameAndType        #5:#6          //  age:I
  #20 = Utf8               SourceFile
  #21 = Utf8               Person.java



The constant pool has 21 entries. Two of them are CONSTANT_NameAndType_info items: #11 and #19. Item #11 references items #7 and #8, both CONSTANT_Utf8_info, whose strings are <init> and ()V. Together they describe the no-argument constructor of the parent class Object. Why the constructor of Object rather than the constructor of Person? Because a method defined in a class gets no CONSTANT_NameAndType_info unless it is referenced (that is, called from the current class); a CONSTANT_NameAndType_info exists only for a method that is referenced. This is also why CONSTANT_NameAndType_info is part of a method's symbolic reference (a new concept that will be explained in a later post). The source defines two methods, the default constructor added by the compiler and getAge, and neither is called in the source, so the constant pool contains no CONSTANT_NameAndType_info for either of them. The entry for Object's constructor exists because a subclass constructor implicitly calls the parent class's no-argument constructor. Removing the other entries makes this easier to see:
   #7 = Utf8               <init>
   #8 = Utf8               ()V
  #11 = NameAndType        #7:#8          //  "<init>":()V


Next, consider item #19, which references items #5 and #6. These are also CONSTANT_Utf8_info items, storing the strings age and I: age is the name of the age field in the source code, and I is the descriptor of the age field. This CONSTANT_NameAndType_info therefore describes the reference to the age field of this class. Removing the other entries makes it easier to see:
   #5 = Utf8               age
   #6 = Utf8               I
  #19 = NameAndType        #5:#6          //  age:I


As with methods, a field that is only defined and never referenced (never accessed in the source code) gets no CONSTANT_NameAndType_info in the constant pool. This is why CONSTANT_NameAndType_info is part of a field's symbolic reference (another concept that will be explained in a later post). In this example, the CONSTANT_NameAndType_info for age appears because the field is accessed in the getAge method:
int getAge() {
    return age;
}
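Resolving the two CONSTANT_NameAndType_info items amounts to following their indexes into the CONSTANT_Utf8_info entries. The sketch below hard-codes only the entries of the Person constant pool that matter here, copied from the javap listing above; the class PersonPoolDemo and its maps are hypothetical, and Map.of requires Java 9 or later.

```java
import java.util.Map;

class PersonPoolDemo {
    // Utf8 entries #5..#8 from the Person constant pool listed above
    static final Map<Integer, String> UTF8 = Map.of(5, "age", 6, "I", 7, "<init>", 8, "()V");

    // NameAndType entries: index -> {name_index, descriptor_index}
    static final Map<Integer, int[]> NAME_AND_TYPE = Map.of(
            11, new int[] {7, 8},
            19, new int[] {5, 6});

    /** Follows name_index and descriptor_index to their Utf8 strings, formatted as name:descriptor. */
    static String resolve(int index) {
        int[] nt = NAME_AND_TYPE.get(index);
        if (nt == null) throw new IllegalArgumentException("#" + index + " is not a NameAndType entry");
        String name = UTF8.get(nt[0]);
        String descriptor = UTF8.get(nt[1]);
        if (name == null || descriptor == null) throw new IllegalStateException("dangling Utf8 index");
        return name + ":" + descriptor;
    }

    public static void main(String[] args) {
        System.out.println("#11 -> " + resolve(11)); // #11 -> <init>:()V
        System.out.println("#19 -> " + resolve(19)); // #19 -> age:I
    }
}
```

The results match the comments that javap prints next to #11 and #19.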


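The actual byte layout of the two CONSTANT_NameAndType_info items, together with the CONSTANT_Utf8_info items they reference, is as follows.
CONSTANT_NameAndType_info related to the Object constructor:
  #11 NameAndType: 0C 00 07 00 08 (tag = 12, name_index = 7, descriptor_index = 8)
  #7 Utf8: 01 00 06 3C 69 6E 69 74 3E (tag = 1, length = 6, "<init>")
  #8 Utf8: 01 00 03 28 29 56 (tag = 1, length = 3, "()V")

CONSTANT_NameAndType_info related to the age field:
  #19 NameAndType: 0C 00 05 00 06 (tag = 12, name_index = 5, descriptor_index = 6)
  #5 Utf8: 01 00 03 61 67 65 (tag = 1, length = 3, "age")
  #6 Utf8: 01 00 01 49 (tag = 1, length = 1, "I")

These two layouts show how CONSTANT_NameAndType_info and CONSTANT_Utf8_info items are stored, and how a CONSTANT_NameAndType_info references CONSTANT_Utf8_info items by index.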

Summary
This post has introduced two kinds of constant pool items: CONSTANT_NameAndType_info and CONSTANT_Utf8_info. CONSTANT_Utf8_info stores the various strings from the source file, while CONSTANT_NameAndType_info is part of the symbolic reference to a field or method in the source file (the method name and method descriptor, or the field name and field descriptor). The next post continues with the other types of constant pool items.

