Deep understanding of the JVM (vii)--CLASS file structure

Source: Internet
Author: User

What is the "independence" of the JVM?

Java is platform-independent, which means that Java code can run on any operating system. This is possible because Java runs on top of a virtual machine, and each operating system has its own Java virtual machine implementation, so Java can "write once, run anywhere."

The JVM is not only platform-independent but also language-independent.
Platform independence means that each operating system has its own JVM; language independence means that the Java virtual machine can run code written in languages other than Java!

This sounds amazing, but the JVM places strict requirements on what it can run. To understand them, first look at how Java code is executed.

Java source code is first compiled into class files by the javac compiler; the JVM is then started to execute those class files, and the program begins to run.
In other words, the JVM only understands class files. It does not care which language a class file was generated from, as long as the file conforms to the JVM specification.
This is why languages such as Scala, JRuby, and Jython can run on the JVM. Each has its own syntax, but its compiler turns source code into class files that conform to the JVM specification, so the JVM can run them.


Overview of the class file structure

A class file is a binary file whose content is strictly specified. It contains no separators such as spaces; its data items are packed one after another as a continuous stream of bytes. All content in a class file consists of just two kinds of data: unsigned numbers and tables.
- Unsigned numbers
Unsigned numbers are the basic data items of a class file. By length they are divided into u1, u2, u4, and u8, which represent 1-byte, 2-byte, 4-byte, and 8-byte unsigned numbers. They can describe numbers, index references, counts, or string values encoded in UTF-8.
- Tables
A table is a composite data item made up of several unsigned numbers or other tables. Every piece of data in a class file is therefore either a single unsigned number or a table.
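To make the unsigned number types concrete, here is a minimal sketch of a reader for u1, u2, u4, and u8 values. It assumes only that class files store multi-byte values in big-endian order, which is the byte order `java.io.DataInputStream` uses; the class name `UnsignedReader` is a hypothetical helper, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical helper that reads class-file unsigned numbers from a byte array. */
class UnsignedReader {
    private final DataInputStream in;

    UnsignedReader(byte[] data) {
        // Class files are big-endian, matching DataInputStream's byte order.
        this.in = new DataInputStream(new ByteArrayInputStream(data));
    }

    /** u1: one unsigned byte, returned as 0..255. */
    int u1() throws IOException {
        return in.readUnsignedByte();
    }

    /** u2: two bytes, returned as 0..65535. */
    int u2() throws IOException {
        return in.readUnsignedShort();
    }

    /** u4: four bytes, widened to long so values >= 2^31 stay positive. */
    long u4() throws IOException {
        return in.readInt() & 0xFFFFFFFFL;
    }

    /** u8: eight bytes; Java has no unsigned long, so the raw 64 bits are returned. */
    long u8() throws IOException {
        return in.readLong();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {
            (byte) 0xFF,                                          // u1
            (byte) 0xFF, (byte) 0xFE,                             // u2
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,   // u4
            0, 0, 0, 0, 0, 0, 0, 42                               // u8
        };
        UnsignedReader r = new UnsignedReader(data);
        System.out.println(r.u1());                    // 255
        System.out.println(r.u2());                    // 65534
        System.out.println(Long.toHexString(r.u4()));  // cafebabe
        System.out.println(r.u8());                    // 42
    }
}
```

Because Java's `int` and `long` are signed, the sketch widens each value to the next larger type (or masks it) so that large unsigned values are not misread as negative.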

The organizational structure of the class file
    1. Magic number
    2. Version information for this file
    3. Constant pool
    4. Access flags
    5. Class Index
    6. Parent Class Index
    7. Interface Index Collection
    8. Field table Collection
    9. Method table Collection
    10. Attribute table Collection


Class file composition 1: Magic number

The first 4 bytes of a class file are called the magic number; they identify the file as a class file.

The magic number plays the same role as a file extension, but an extension is easy to change and therefore unreliable, so the file type is marked inside the class file itself.

The magic number of a class file is 0xCAFEBABE in hexadecimal. Very romantic; who says programmers have low EQ!
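A minimal sketch of checking the magic number, assuming Java 9 or later (for `InputStream.readAllBytes`). The class `MagicCheck` and its methods are hypothetical; as a real input it loads the bytes of `java/lang/String.class` through `ClassLoader.getSystemResourceAsStream`, which serves class-file resources from the JDK.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper that checks a byte array for the class-file magic number. */
class MagicCheck {
    static final int MAGIC = 0xCAFEBABE;

    /** True if the first four bytes, read big-endian, equal 0xCAFEBABE. */
    static boolean hasClassMagic(byte[] b) {
        if (b.length < 4) return false;
        int magic = ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
                  | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        return magic == MAGIC;
    }

    /** Loads the raw bytes of a JDK class, e.g. "java/lang/String". */
    static byte[] loadJdkClass(String internalName) throws IOException {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(internalName + ".class")) {
            if (in == null) throw new IOException("not found: " + internalName);
            return in.readAllBytes(); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hasClassMagic(loadJdkClass("java/lang/String")));  // true
        System.out.println(hasClassMagic(new byte[] {'P', 'K', 3, 4}));       // false: a ZIP/JAR header
    }
}
```

Note that the check looks at content, not at the file name, which is exactly why a magic number is more reliable than an extension.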

Class file composition 2: Version information

The 4 bytes immediately after the magic number hold the version number: the first two bytes are the minor version and the next two are the major version. Together they indicate which JDK version the class file was compiled for.

A newer JVM can run class files with older version numbers, but an older JVM cannot run a class file with a newer version number (it fails with UnsupportedClassVersionError), even if the class file does not use any features of the newer JDK!
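The following sketch reads the two version fields from a class-file header and maps the major version to a Java release. It relies on the well-known numbering (45 for JDK 1.0/1.1, 46 to 48 for 1.2 to 1.4, and 49 = Java 5 onward, so release = major − 44); the class `VersionInfo` is a hypothetical illustration.

```java
/** Hypothetical helper that decodes the version fields of a class file. */
class VersionInfo {
    /** Returns {major, minor}: minor_version is bytes 4-5, major_version bytes 6-7. */
    static int[] readVersion(byte[] b) {
        int minor = ((b[4] & 0xFF) << 8) | (b[5] & 0xFF);
        int major = ((b[6] & 0xFF) << 8) | (b[7] & 0xFF);
        return new int[] {major, minor};
    }

    /** Maps a major version number to a Java release name. */
    static String javaRelease(int major) {
        if (major >= 49) return String.valueOf(major - 44); // 49 = Java 5, 52 = Java 8, 61 = Java 17
        switch (major) {
            case 45: return "1.0/1.1";
            case 46: return "1.2";
            case 47: return "1.3";
            case 48: return "1.4";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        // Header of a class compiled for Java 8: magic, minor 0, major 52 (0x34).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        int[] v = readVersion(header);
        System.out.println("major=" + v[0] + " minor=" + v[1] + " -> Java " + javaRelease(v[0]));
        // prints: major=52 minor=0 -> Java 8
    }
}
```

A JVM compares this major version against the highest one it supports before loading the class, which is why the rule above holds regardless of which language features the class actually uses.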

Class file composition 3: Constant pool

1. What is a constant pool?

Immediately following the version number is the constant pool. The constant pool holds two types of constants:

    • Literals
      Literals include the text strings defined in the program, the values of constants declared final, and so on.
    • Symbolic references
      Symbolic references are the names defined in the program:
      1. Fully qualified names of classes and interfaces
      2. Names and descriptors of fields
      3. Names and descriptors of methods

2. Characteristics of the constant pool
    • The length of the constant pool is not fixed
      Because the size of the constant pool varies, the pool starts with a u2 unsigned number, constant_pool_count, that stores its capacity. The JVM uses this value to know where the constant pool ends.
      Note: the count starts from 1, not 0, because index 0 is reserved to mean "no constant". A value of 5 therefore means the pool holds 4 constants, at indices 1 to 4.
    • Constants in the constant pool are represented as tables
      The capacity counter is followed by the constants themselves. Each constant is a table that records not only the constant's value but also related information, such as its type.
    • The constant pool is the resource repository of the class file
    • The constant pool is the part of the class file most referenced by other parts
    • The constant pool is one of the parts of the class file that take up the most space
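The counting rule can be sketched as follows, given constant_pool_count and the tag of each entry in file order. One detail goes beyond the text above and is worth knowing when counting: per the JVM specification, CONSTANT_Long_info (tag 5) and CONSTANT_Double_info (tag 6) each occupy two index slots. The class `PoolIndices` is a hypothetical illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assigns constant pool indices to entries. */
class PoolIndices {
    static final int CONSTANT_Long = 5;
    static final int CONSTANT_Double = 6;

    /** Returns the constant pool index of each entry, given their tags in file order. */
    static List<Integer> indicesFor(int constantPoolCount, int[] tagsInOrder) {
        List<Integer> indices = new ArrayList<>();
        int index = 1; // index 0 is reserved for "no constant"
        for (int tag : tagsInOrder) {
            if (index >= constantPoolCount) {
                throw new IllegalArgumentException("more entries than the count allows");
            }
            indices.add(index);
            // long and double constants take two slots; the second slot is unusable
            index += (tag == CONSTANT_Long || tag == CONSTANT_Double) ? 2 : 1;
        }
        if (index != constantPoolCount) {
            throw new IllegalArgumentException("count does not match the entries");
        }
        return indices;
    }

    public static void main(String[] args) {
        // count = 5 with four single-slot entries (Utf8, Class, Utf8, Class)
        System.out.println(indicesFor(5, new int[] {1, 7, 1, 7})); // [1, 2, 3, 4]
        // count = 5 again, but a Long takes two slots, so only three entries fit
        System.out.println(indicesFor(5, new int[] {1, 5, 7}));    // [1, 2, 4]
    }
}
```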

3. Types of constants in a constant pool

As mentioned above, constants in the constant pool fall broadly into literals and symbolic references. Based on the kind of data they hold, they are further subdivided into 14 constant types (as of Java SE 8; later versions added a few more). Each of the 14 types has its own table structure. The first byte of every constant is a u1 tag, which indicates which of the 14 types the constant belongs to.

Taking the CONSTANT_Class_info constant as an example, its table structure is as follows:
CONSTANT_Class_info table:

Type  Name        Count
u1    tag         1
u2    name_index  1

tag gives the type of the current constant. For CONSTANT_Class_info the value of tag is 7, which marks a symbolic reference to a class or interface.
name_index gives the location of the fully qualified name of this class or interface. Its value is an index into the constant pool, and it points to a constant of type CONSTANT_Utf8_info, whose table structure is as follows:
CONSTANT_Utf8_info table:

Type  Name    Count
u1    tag     1
u2    length  1
u1    bytes   length

CONSTANT_Utf8_info represents a string;
tag gives the type of the current constant, which is 1;
length gives the length of the string in bytes;
bytes holds the contents of the string, encoded in modified UTF-8.
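The two tables can be tied together with a small parser for a hand-built constant pool containing only a CONSTANT_Class_info (#1) and the CONSTANT_Utf8_info (#2) it points to. This is a sketch, not a full parser: it rejects every other tag. It relies on the fact that `DataInputStream.readUTF` consumes a u2 length followed by modified UTF-8 bytes, which is exactly the layout of CONSTANT_Utf8_info after its tag. `ClassInfoDemo` and `ClassInfo` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical demo: resolve a CONSTANT_Class_info to its name. */
class ClassInfoDemo {
    static final int CONSTANT_Utf8 = 1;
    static final int CONSTANT_Class = 7;

    /** Parsed CONSTANT_Class_info: only holds name_index. */
    static final class ClassInfo {
        final int nameIndex;
        ClassInfo(int nameIndex) { this.nameIndex = nameIndex; }
    }

    /** Parses a constant pool that contains only Utf8 and Class entries. */
    static Object[] parsePool(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int count = in.readUnsignedShort();   // constant_pool_count
        Object[] pool = new Object[count];    // slot 0 stays unused
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case CONSTANT_Utf8:
                    pool[i] = in.readUTF();   // u2 length + modified UTF-8 bytes
                    break;
                case CONSTANT_Class:
                    pool[i] = new ClassInfo(in.readUnsignedShort()); // name_index
                    break;
                default:
                    throw new IOException("tag not handled in this sketch: " + tag);
            }
        }
        return pool;
    }

    /** Follows name_index from the Class entry to the Utf8 entry holding the name. */
    static String className(Object[] pool, int classIndex) {
        ClassInfo info = (ClassInfo) pool[classIndex];
        return (String) pool[info.nameIndex];
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {
            0x00, 0x03,          // constant_pool_count = 3 (entries #1 and #2)
            0x07, 0x00, 0x02,    // #1 CONSTANT_Class_info, name_index = 2
            0x01, 0x00, 0x10,    // #2 CONSTANT_Utf8_info, length = 16
            'j', 'a', 'v', 'a', '/', 'l', 'a', 'n', 'g', '/',
            'O', 'b', 'j', 'e', 'c', 't'
        };
        Object[] pool = parsePool(data);
        System.out.println(className(pool, 1)); // java/lang/Object
    }
}
```

Note that the name is stored with "/" rather than "." as the package separator, which is how fully qualified names appear inside class files.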

Q: Why must the names of classes and variables defined in Java be shorter than 64K?
Names of classes, interfaces, fields, methods, and so on are all symbolic references stored in the constant pool. Whatever the symbolic reference, its name is stored in a CONSTANT_Utf8_info constant, which records the string's length in a u2. The largest value a u2 can hold is 65,535, so such a name can occupy at most 65,535 bytes (64K) once encoded.
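The limit can be observed with `java.io.DataOutputStream.writeUTF`, which writes the same u2-length-plus-modified-UTF-8 layout as CONSTANT_Utf8_info and throws `UTFDataFormatException` when the encoded length exceeds 65,535 bytes. This sketch (hypothetical class `NameLengthLimit`) shows that the limit counts encoded bytes, not characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

/** Hypothetical demo of the u2 length limit on CONSTANT_Utf8_info. */
class NameLengthLimit {
    /** True if the string fits in a CONSTANT_Utf8_info, whose length field is a u2. */
    static boolean fitsInUtf8Info(String name) throws IOException {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(name);  // u2 length prefix + modified UTF-8 bytes
            return true;
        } catch (UTFDataFormatException e) {
            return false;        // encoded length > 65535 bytes
        }
    }

    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fitsInUtf8Info(repeat('a', 65535)));      // true: 65,535 one-byte chars
        System.out.println(fitsInUtf8Info(repeat('a', 65536)));      // false
        System.out.println(fitsInUtf8Info(repeat('\u4e2d', 21845))); // true: 21,845 * 3 = 65,535 bytes
        System.out.println(fitsInUtf8Info(repeat('\u4e2d', 21846))); // false: 65,538 bytes
    }
}
```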

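Q: What is UTF-8 encoding? What is modified UTF-8 encoding?
Standard UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character. The modified UTF-8 used in class files encodes the ASCII characters \u0001 to \u007F in 1 byte, the null character \u0000 and the characters \u0080 to \u07FF in 2 bytes, and the characters \u0800 to \uFFFF in 3 bytes. Characters outside the Basic Multilingual Plane are stored as a UTF-16 surrogate pair, each surrogate taking 3 bytes.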
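The differences are easy to measure: `DataOutputStream.writeUTF` produces modified UTF-8 (with a 2-byte length prefix), while `String.getBytes(StandardCharsets.UTF_8)` produces standard UTF-8. This sketch, with the hypothetical class `ModifiedUtf8Demo`, compares the two for a few sample characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo comparing standard and modified UTF-8 lengths. */
class ModifiedUtf8Demo {
    /** Bytes the string takes in modified UTF-8, excluding the u2 length prefix. */
    static int modifiedUtf8Length(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeUTF(s);
        return buf.size() - 2; // drop the 2-byte length written by writeUTF
    }

    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) throws IOException {
        // 'A', NUL, e-acute, a CJK character, and U+1F600 (a surrogate pair in Java)
        String[] samples = {"A", "\0", "\u00e9", "\u4e2d", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X  standard=%d  modified=%d%n",
                    s.codePointAt(0), standardUtf8Length(s), modifiedUtf8Length(s));
        }
        // Expected byte counts: A 1/1, NUL 1/2, e-acute 2/2, CJK 3/3, U+1F600 4/6
    }
}
```

The NUL character is encoded in 2 bytes so that a modified UTF-8 string never contains a zero byte, and a supplementary character costs 6 bytes because each of its two surrogates is encoded separately.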

Class file composition 4: Access flags

After the constant pool comes the 2-byte access_flags field. It indicates whether this class file defines a class or an interface, and whether it is declared public, abstract, final, and so on.
Each of these properties is a yes/no value, so each is represented by a single bit set to 0 or 1.
The access flags occupy 2 bytes and could therefore hold 16 flags, but only 8 are defined (as of Java SE 8; Java SE 9 added ACC_MODULE); every undefined bit must be 0.
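Since each flag is one bit, decoding access_flags is a matter of bitwise AND. This sketch uses the eight class-level flag values from the Java SE 8 JVM specification; the class `ClassAccessFlags` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical decoder for class-level access_flags (Java SE 8 flag set). */
class ClassAccessFlags {
    static final Map<Integer, String> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put(0x0001, "ACC_PUBLIC");
        FLAGS.put(0x0010, "ACC_FINAL");
        FLAGS.put(0x0020, "ACC_SUPER");
        FLAGS.put(0x0200, "ACC_INTERFACE");
        FLAGS.put(0x0400, "ACC_ABSTRACT");
        FLAGS.put(0x1000, "ACC_SYNTHETIC");
        FLAGS.put(0x2000, "ACC_ANNOTATION");
        FLAGS.put(0x4000, "ACC_ENUM");
    }

    /** Lists the names of all flags whose bit is set. */
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Integer, String> e : FLAGS.entrySet()) {
            if ((accessFlags & e.getKey()) != 0) names.add(e.getValue());
        }
        return names;
    }

    public static void main(String[] args) {
        // javac typically emits 0x0021 for "public class Foo" (ACC_PUBLIC | ACC_SUPER)
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
        // a public interface: ACC_PUBLIC | ACC_INTERFACE | ACC_ABSTRACT
        System.out.println(decode(0x0601)); // [ACC_PUBLIC, ACC_INTERFACE, ACC_ABSTRACT]
    }
}
```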

Class file composition 5: Class index, parent class index, and interface index collection

The class index, the parent class index, and the interface index collection identify the name of the class defined by this class file, the name of its parent class, and the names of the interfaces it implements.
They appear in that order. The class index and the parent class index are each a u2 unsigned number pointing to a CONSTANT_Class_info constant, whose name_index in turn points to the CONSTANT_Utf8_info constant holding the fully qualified name of the class or parent class. The parent class index is 0 only for java.lang.Object, the one class without a parent class.
Because a class may implement several interfaces, the interface indexes form a collection, which follows the class index and the parent class index. The first two bytes of the collection give the number of interfaces, followed by one u2 index per interface.
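A sketch of reading these three items in order. To keep it short, it assumes the constant pool has already been parsed and each CONSTANT_Class_info index has been resolved to a name; that pre-resolved map, the class `ClassHierarchyReader`, and the name `com/example/Foo` are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for this_class, super_class and the interfaces collection. */
class ClassHierarchyReader {
    /** classNames stands in for a parsed constant pool: Class_info index -> name. */
    static List<String> read(byte[] data, Map<Integer, String> classNames) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<String> result = new ArrayList<>();
        result.add("this=" + classNames.get(in.readUnsignedShort()));
        int superIndex = in.readUnsignedShort();
        // super_class is 0 only for java.lang.Object, which has no superclass
        result.add("super=" + (superIndex == 0 ? "(none)" : classNames.get(superIndex)));
        int interfacesCount = in.readUnsignedShort(); // length of the interface collection
        for (int i = 0; i < interfacesCount; i++) {
            result.add("interface=" + classNames.get(in.readUnsignedShort()));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, String> classNames = new HashMap<>();
        classNames.put(1, "com/example/Foo");
        classNames.put(3, "java/lang/Object");
        classNames.put(5, "java/lang/Runnable");
        classNames.put(7, "java/io/Serializable");
        byte[] data = {
            0, 1,        // this_class  -> #1
            0, 3,        // super_class -> #3
            0, 2,        // interfaces_count = 2
            0, 5, 0, 7   // interfaces[] -> #5, #7
        };
        read(data, classNames).forEach(System.out::println);
    }
}
```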

Class file composition 6: Field table collection

1. What is the field table collection?

Next comes the field table collection. It stores the fields declared in this class, including instance variables and class (static) variables, but not local variables declared inside methods.
Each field table describes exactly one field, and the field tables of all the fields in this class form the field table collection.

2. Definition of the field table structure

Type            Name              Count
u2              access_flags      1
u2              name_index        1
u2              descriptor_index  1
u2              attributes_count  1
attribute_info  attributes        attributes_count
    • access_flags
      The access flags of the field. In Java every field can carry a series of modifiers, so fields have access flags just like the class itself; the set of flags available to fields differs slightly from the set available to classes.
    • name_index
      The index of the field's name. It points to a CONSTANT_Utf8_info constant that stores the field's simple name.
    • descriptor_index
      The index of the field's descriptor, a string describing the field's data type (explained below).
    • attributes_count
      The number of entries in the attribute table collection.
    • attributes
      The attribute table collection. The items up to descriptor_index form the fixed part of the field table. Because that fixed part cannot describe every field completely, additional information is stored in the attribute table collection; for example, the initial value of a static final constant is stored in a ConstantValue attribute (details will be described below).
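The structure translates directly into a reader. This sketch (hypothetical class `FieldInfoReader`) reads one field_info and keeps each attribute as its name index plus raw bytes, using the fact that every attribute_info starts with a u2 attribute_name_index and a u4 attribute_length. The sample bytes describe a hypothetical `private static final int` field; the constant pool indices #5 to #8 are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical reader for a single field_info structure. */
class FieldInfoReader {
    /** One field_info, with attributes kept as (name_index, raw bytes). */
    static final class FieldInfo {
        int accessFlags, nameIndex, descriptorIndex;
        int[] attributeNameIndexes;
        byte[][] attributeBodies;
    }

    static FieldInfo readField(DataInputStream in) throws IOException {
        FieldInfo f = new FieldInfo();
        f.accessFlags = in.readUnsignedShort();
        f.nameIndex = in.readUnsignedShort();       // -> CONSTANT_Utf8_info (simple name)
        f.descriptorIndex = in.readUnsignedShort(); // -> CONSTANT_Utf8_info (descriptor)
        int attributesCount = in.readUnsignedShort();
        f.attributeNameIndexes = new int[attributesCount];
        f.attributeBodies = new byte[attributesCount][];
        for (int i = 0; i < attributesCount; i++) {
            f.attributeNameIndexes[i] = in.readUnsignedShort(); // attribute_name_index
            long length = in.readInt() & 0xFFFFFFFFL;           // attribute_length (u4)
            f.attributeBodies[i] = new byte[(int) length];
            in.readFully(f.attributeBodies[i]);
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical field: private static final int MAX = 10;
        byte[] data = {
            0x00, 0x1A,              // access_flags: ACC_PRIVATE | ACC_STATIC | ACC_FINAL
            0x00, 0x05,              // name_index -> #5 ("MAX")
            0x00, 0x06,              // descriptor_index -> #6 ("I")
            0x00, 0x01,              // attributes_count
            0x00, 0x07,              // attribute_name_index -> #7 ("ConstantValue")
            0x00, 0x00, 0x00, 0x02,  // attribute_length
            0x00, 0x08               // constantvalue_index -> #8 (Integer 10)
        };
        FieldInfo f = readField(new DataInputStream(new ByteArrayInputStream(data)));
        System.out.printf("flags=0x%04X name=#%d descriptor=#%d attributes=%d%n",
                f.accessFlags, f.nameIndex, f.descriptorIndex, f.attributeNameIndexes.length);
        // prints: flags=0x001A name=#5 descriptor=#6 attributes=1
    }
}
```

Note how the field-level flags (ACC_PRIVATE = 0x0002, ACC_STATIC = 0x0008) are ones a class-level access_flags field would not use.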
3. What is a descriptor?

Fields (both static and instance variables) and methods each have a descriptor.
For a field, the descriptor describes the field's data type;
For a method, the descriptor describes the parameter list (the number, types, and order of the parameters) and the return type.

In a descriptor, each primitive type is represented by an uppercase letter (B byte, C char, D double, F float, I int, J long, S short, Z boolean, plus V for a void return type); an object type is represented by "L", the fully qualified name with "/" as the separator, and a closing ";" (for example, Ljava/lang/String;); and an array type is represented by "[" followed by the descriptor of the element type, with one "[" per dimension (for example, [[I for int[][]).
To describe a method, write the parameter descriptors in order inside (), then write the return type descriptor to the right of (). No separators are needed between parameters; for example, int indexOf(char[] buf, int from) has the descriptor ([CI)I.
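The descriptor rules can be expressed as a short recursive function over `java.lang.Class` and `java.lang.reflect.Method`. This is a hand-written sketch (hypothetical class `Descriptors`); on Java 12 and later, `Class.descriptorString()` returns field descriptors directly.

```java
import java.lang.reflect.Method;

/** Hypothetical helper that builds field and method descriptors via reflection. */
class Descriptors {
    /** Field descriptor: int -> I, String -> Ljava/lang/String;, int[] -> [I. */
    static String descriptor(Class<?> c) {
        if (c.isArray()) return "[" + descriptor(c.getComponentType()); // one '[' per dimension
        if (c == int.class) return "I";
        if (c == long.class) return "J";
        if (c == boolean.class) return "Z";
        if (c == byte.class) return "B";
        if (c == char.class) return "C";
        if (c == short.class) return "S";
        if (c == float.class) return "F";
        if (c == double.class) return "D";
        if (c == void.class) return "V"; // only valid as a method return type
        return "L" + c.getName().replace('.', '/') + ";";
    }

    /** Method descriptor: parameter descriptors inside (), followed by the return type. */
    static String descriptor(Method m) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : m.getParameterTypes()) sb.append(descriptor(p)); // no separators
        return sb.append(')').append(descriptor(m.getReturnType())).toString();
    }

    public static void main(String[] args) throws NoSuchMethodException {
        System.out.println(descriptor(String[][].class));
        // [[Ljava/lang/String;
        System.out.println(descriptor(String.class.getMethod("substring", int.class, int.class)));
        // (II)Ljava/lang/String;
        System.out.println(descriptor(System.class.getMethod("arraycopy",
                Object.class, int.class, Object.class, int.class, int.class)));
        // (Ljava/lang/Object;ILjava/lang/Object;II)V
    }
}
```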

4. Notes on the field table collection
    1. The field table collection of a class file does not list fields inherited from parent classes or interfaces;
    2. The field table collection may contain fields that the programmer never declared.
      For example, so that an inner class can access its outer class, the compiler automatically adds a field referring to the outer class instance to the field table collection in the inner class's class file.
    3. In Java source code, two fields in one class cannot have the same name. The JVM specification, however, allows two fields to share a name as long as their descriptors differ; they are then treated as two different fields.
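The compiler-added field from note 2 can be seen through reflection. In this sketch (hypothetical class `OuterDemo`), the non-static inner class reads a field of its outer instance, so javac has to give it a reference to that instance; javac typically names the field `this$0`, but the name is a compiler detail, not something the specification fixes.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Hypothetical demo of a compiler-generated field in an inner class. */
class OuterDemo {
    private int counter = 0;

    /** Non-static inner class that reads a field of the outer instance. */
    class Inner {
        int peek() { return counter; }
    }

    public static void main(String[] args) {
        for (Field f : Inner.class.getDeclaredFields()) {
            System.out.println(f.getName() + " synthetic=" + f.isSynthetic()
                    + " type=" + f.getType().getName()
                    + " final=" + Modifier.isFinal(f.getModifiers()));
        }
        // With javac this typically prints: this$0 synthetic=true type=OuterDemo final=true
    }
}
```

The field is marked synthetic, which is how tools tell it apart from fields the programmer actually declared.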

Class file composition 7: Method table collection

In the class file, every method is stored as a table; each table describes one method, and the tables of all the methods in a class form the method table collection.
The method table has the same structure as the field table; only the available access flags and attributes differ.

Type            Name              Count
u2              access_flags      1
u2              name_index        1
u2              descriptor_index  1
u2              attributes_count  1
attribute_info  attributes        attributes_count

The attribute table collection of a method table contains a Code attribute, which stores the bytecode instructions the compiler generated for the method (abstract and native methods have no Code attribute).

Notes on the method table collection
    1. Methods inherited from a parent class or parent interface do not appear in the method table collection of this class file unless this class overrides them.
    2. The method table collection may contain methods that the programmer never defined.
      At compile time, the compiler may add the class constructor <clinit> (when the class has static initialization code) and the instance constructor <init> (a default constructor, when the class declares none) to the method table collection in the class file.
    3. Overloaded methods must have the same simple name and different signatures, but the JVM's notion of a signature differs from the Java language's:
      • Java signature: the method name plus the number, types, and order of its parameters; the return type is not included
      • JVM signature: the method name plus its descriptor, that is, the parameters and the return value
      As a result, two methods that differ only in return type cannot coexist in Java source code but can coexist in a class file.
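Both note 2 and note 3 show up in a covariant override. In this sketch (hypothetical classes `BridgeDemo`, `Base`, and `Derived`), `Derived` overrides `get()` with a narrower return type; javac then adds a synthetic bridge method to Derived's method table, so the class file contains two methods named `get` with the same (empty) parameter list and different return types, which Java source code itself could not declare.

```java
import java.lang.reflect.Method;

/** Hypothetical demo of a compiler-generated bridge method. */
class BridgeDemo {
    static class Base {
        Object get() { return "base"; }
    }

    /** Overrides get() with a covariant (narrower) return type. */
    static class Derived extends Base {
        @Override
        String get() { return "derived"; }
    }

    public static void main(String[] args) {
        for (Method m : Derived.class.getDeclaredMethods()) {
            System.out.println(m.getName() + " returns " + m.getReturnType().getSimpleName()
                    + (m.isBridge() ? " (compiler-generated bridge)" : ""));
        }
        // Typically prints two methods named get: one returning String and a bridge
        // returning Object. The order of getDeclaredMethods() is unspecified.
    }
}
```

The bridge exists so that callers compiled against `Base.get()`, whose descriptor is ()Ljava/lang/Object;, still reach the override in Derived.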
Class file composition 8: Attribute table collection

