"Deep Java Virtual machine" bis: Class file structure


Platform agnostic

Java is a platform-independent language. This rests on two things: the class file, which stores the bytecode compiled from Java source code, and the implementation of the Java Virtual Machine on each platform. The Java compiler is not the only tool that can produce class files; compilers for other languages, such as JRuby, can also compile their programs into class files. The virtual machine does not care which language a class file came from: as long as the file conforms to the required structure, it can run on the Java Virtual Machine. The semantics of the variables, keywords, and operators of the Java language are all ultimately expressed as sequences of bytecode instructions, so bytecode can describe more than the Java language itself can. This gives other languages a basis for implementing features that Java does not have, and it is also why class files have to be verified for safety when classes are loaded.

Class file structure

A class file is a binary stream organized in 8-bit bytes. The data items are packed tightly in a fixed order with no delimiters between them, so almost everything stored in a class file is data the program needs in order to run. According to the Java Virtual Machine specification, the class file format is described with a pseudo-structure similar to a C struct, and it has only two data types: unsigned numbers and tables. Unsigned numbers are the basic data types: u1, u2, u4, and u8 represent unsigned numbers of 1, 2, 4, and 8 bytes respectively. A table is a composite data type made up of multiple unsigned numbers or other tables as data items, and by convention the names of all tables end with "_info".
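As a minimal sketch of the unsigned number types, the following Java code decodes u1, u2, and u4 values from a byte array. It assumes the high byte comes first, which is how class files store multi-byte values; the class and method names are illustrative and not part of any JDK API.

```java
// Minimal sketch: decode the unsigned base types of the class file format.
// UnsignedReader and its methods are hypothetical names used for illustration.
final class UnsignedReader {
    // u1: one unsigned byte, returned as an int in the range 0..255.
    static int u1(byte[] data, int offset) {
        return data[offset] & 0xFF;
    }

    // u2: two bytes, high byte first.
    static int u2(byte[] data, int offset) {
        return (u1(data, offset) << 8) | u1(data, offset + 1);
    }

    // u4: four bytes, high byte first; a long keeps the value unsigned.
    static long u4(byte[] data, int offset) {
        return ((long) u2(data, offset) << 16) | u2(data, offset + 2);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFF, 0x01, 0x02, 0x00, 0x00, 0x01, 0x00};
        System.out.println(u1(data, 0)); // prints 255
        System.out.println(u2(data, 1)); // prints 258
        System.out.println(u4(data, 3)); // prints 256
    }
}
```

Java has no unsigned integer primitives apart from char, which is why each reader widens to the next larger type.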

The entire class file is essentially one table, made up of the data items shown in the table below.

As the table shows, whenever a variable number of items of the same type has to be described, whether they are unsigned numbers or tables, the format places a capacity counter in front, followed by that many consecutive data items. Such a run of consecutive items of one type is called a collection of that type; for example, fields_count items of field_info data form the field table collection. Note that the data items in a class file are strictly constrained to the order and number given in the table above: the meaning, length, and order of every byte are fixed and may not change.
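The "counter followed by items" layout can be sketched in Java as follows. This is only an illustration, assuming a collection whose counter and items are all u2 values (the shape of the interface index collection); CountedCollection is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch: read a collection stored as a u2 counter followed by that many u2 items.
final class CountedCollection {
    static int[] readU2Collection(ByteBuffer in) {
        int count = in.getShort() & 0xFFFF;      // the capacity counter comes first
        int[] items = new int[count];
        for (int i = 0; i < count; i++) {
            items[i] = in.getShort() & 0xFFFF;   // then `count` consecutive items
        }
        return items;
    }

    public static void main(String[] args) {
        // Counter = 2, followed by the items 5 and 9. ByteBuffer is big-endian by default.
        byte[] raw = {0x00, 0x02, 0x00, 0x05, 0x00, 0x09};
        System.out.println(Arrays.toString(readU2Collection(ByteBuffer.wrap(raw)))); // prints [5, 9]
    }
}
```

Because there are no delimiters, the reader can only find the end of the collection by trusting the counter, which is why the order and length of every item must be exact.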

The following table lists the specific meanings of the individual data items in the class file:

Magic and version

The first 4 bytes of every class file are called the magic number (magic). Its sole purpose is to determine whether the file is a class file that the virtual machine can accept, and its value is fixed at 0xCAFEBABE. The 4 bytes immediately after the magic number store the minor and major version numbers of the class file. A higher version of the JDK is backward compatible with class files of lower versions, but it cannot run class files of a version higher than its own.
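A minimal sketch of this check in Java might look like the following. ClassFileHeader and its methods are hypothetical helpers, and the "highest supported major version" parameter is an assumption standing in for whatever a given virtual machine supports.

```java
import java.nio.ByteBuffer;

// Sketch: validate the magic number and read the version of a class file.
final class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    // Returns {minor_version, major_version}; throws if the magic number is wrong.
    static int[] readVersion(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes); // big-endian by default
        if (in.remaining() < 8 || in.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = in.getShort() & 0xFFFF;
        int major = in.getShort() & 0xFFFF;
        return new int[] {minor, major};
    }

    // A virtual machine accepts class files up to the highest version it supports.
    static boolean isAccepted(int major, int highestSupportedMajor) {
        return major <= highestSupportedMajor;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        int[] version = readVersion(header);
        System.out.println("major=" + version[1] + " minor=" + version[0]); // prints major=52 minor=0
        System.out.println(isAccepted(version[1], 50));                     // prints false
    }
}
```

Major version 52 corresponds to Java 8, so in this example a virtual machine that only supports up to major version 50 would reject the file.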

constant_pool

major_version is followed by the entry to the constant pool (constant_pool). The constant pool is the data item most closely associated with the other items in the class file, and it is also one of the items that takes up the most space in the class file.

The constant pool mainly holds two kinds of constants: literals and symbolic references. Literals are close to the concept of constants at the Java language level, such as text strings and constant values declared final. Symbolic references include the following three kinds of constants:

    • The fully qualified names of classes and interfaces (that is, the class name with its package name, such as org.lxh.test.TestClass)
    • The names and descriptors of fields (a descriptor encodes the type of the field)
    • The names and descriptors of methods (a descriptor encodes the parameter types and the return type)

Java code has no link step at compile time; instead, the virtual machine links dynamically when it loads the class file. In other words, the final memory layout of the methods and fields is not stored in the class file, so the virtual machine cannot use the symbolic references to these fields and methods directly without converting them. When the virtual machine runs, it obtains the corresponding symbolic reference from the constant pool and, during the resolution phase of class loading, replaces it with a direct reference, that is, translates it into a concrete memory address.

The differences and connections between symbolic references and direct references are as follows:

    • Symbolic reference: a symbolic reference uses a set of symbols to describe the referenced target. The symbols can be literals of any form, as long as they locate the target unambiguously when used. A symbolic reference is independent of the memory layout implemented by the virtual machine, and the referenced target has not necessarily been loaded into memory.
    • Direct reference: a direct reference can be a pointer straight to the target, a relative offset, or a handle that locates the target indirectly. A direct reference depends on the memory layout implemented by the virtual machine, so the direct references that different virtual machine instances produce from the same symbolic reference are generally not the same. If a direct reference exists, the referenced target must already be present in memory.

Every constant in the constant pool is itself a table. Before JDK 1.7 there were 11 table structures in total, and each of them starts with a u1 tag (values 1 to 12, with 2 unused) that identifies which constant type the entry belongs to. The meanings of the 11 constant types are shown in the following table:

Each of the 11 constant types has its own structure. The CONSTANT_Class_info structure has a name_index item holding an index that points to a CONSTANT_Utf8_info constant in the pool, which stores the fully qualified name of the class as a string. The CONSTANT_Fieldref_info, CONSTANT_Methodref_info, and CONSTANT_InterfaceMethodref_info structures each have an index item that points to the CONSTANT_Class_info entry of the class or interface to which the field or method belongs. Strings such as class names, field names, and method names are all ultimately stored as CONSTANT_Utf8_info constants, so the maximum length of a method or field name in Java is the maximum length of a CONSTANT_Utf8_info constant. The length item of CONSTANT_Utf8_info is a u2, which occupies 2 bytes, so the maximum length is 65535 bytes. Therefore, if a Java program defines a variable or method name whose encoded form exceeds 64 KB, it cannot be compiled.
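The 65535-byte limit can be explored with a small Java sketch. It relies on DataOutputStream.writeUTF, which writes a 2-byte length followed by modified UTF-8 bytes and throws UTFDataFormatException when the encoded string needs more than 65535 bytes. Treating that method as a stand-in for the CONSTANT_Utf8_info limit is an assumption of this example; Utf8ConstantLimit is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

// Sketch: check whether a string fits behind a u2 length prefix.
final class Utf8ConstantLimit {
    static boolean fitsInU2Length(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s); // u2 length, then the encoded bytes
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false;    // the encoded form needs more than 65535 bytes
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a string of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(fitsInU2Length(repeat('a', 65535)));      // prints true
        System.out.println(fitsInU2Length(repeat('a', 65536)));      // prints false
        System.out.println(fitsInU2Length(repeat('\u4e2d', 21846))); // prints false
    }
}
```

The last line shows that the limit is counted in encoded bytes, not characters: each of those characters takes 3 bytes, so 21846 of them already exceed 65535 bytes.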

The following table shows the structure of 11 types of data in a constant pool:

| Constant | Item | Type | Description |
| --- | --- | --- | --- |
| CONSTANT_Utf8_info | tag | u1 | Value is 1 |
| | length | u2 | Number of bytes occupied by the UTF-8 encoded string |
| | bytes | u1 | The UTF-8 encoded string, length bytes long |
| CONSTANT_Integer_info | tag | u1 | Value is 3 |
| | bytes | u4 | int value, stored high byte first |
| CONSTANT_Float_info | tag | u1 | Value is 4 |
| | bytes | u4 | float value, stored high byte first |
| CONSTANT_Long_info | tag | u1 | Value is 5 |
| | bytes | u8 | long value, stored high byte first |
| CONSTANT_Double_info | tag | u1 | Value is 6 |
| | bytes | u8 | double value, stored high byte first |
| CONSTANT_Class_info | tag | u1 | Value is 7 |
| | index | u2 | Index of the fully qualified name constant |
| CONSTANT_String_info | tag | u1 | Value is 8 |
| | index | u2 | Index of the string literal |
| CONSTANT_Fieldref_info | tag | u1 | Value is 9 |
| | index | u2 | Index of the CONSTANT_Class_info entry for the class or interface that declares the field |
| | index | u2 | Index of the CONSTANT_NameAndType_info entry for the field name and type descriptor |
| CONSTANT_Methodref_info | tag | u1 | Value is 10 |
| | index | u2 | Index of the CONSTANT_Class_info entry for the class that declares the method |
| | index | u2 | Index of the CONSTANT_NameAndType_info entry for the method name and type descriptor |
| CONSTANT_InterfaceMethodref_info | tag | u1 | Value is 11 |
| | index | u2 | Index of the CONSTANT_Class_info entry for the interface that declares the method |
| | index | u2 | Index of the CONSTANT_NameAndType_info entry for the method name and type descriptor |
| CONSTANT_NameAndType_info | tag | u1 | Value is 12 |
| | index | u2 | Index of the field or method name constant |
| | index | u2 | Index of the field or method descriptor constant |
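The structures above are enough to walk a constant pool entry by entry. The following Java sketch does that for the 11 pre-JDK 7 tags only. Two details come from the Java Virtual Machine specification rather than from the text above and are assumptions of the example: the pool is indexed from 1 (so constant_pool_count is one larger than the number of slots), and long and double entries occupy two slots. ConstantPoolWalker is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: walk a constant pool and describe each slot as text.
final class ConstantPoolWalker {
    private static final String[] REF_NAMES = {"Fieldref", "Methodref", "InterfaceMethodref", "NameAndType"};

    // `in` must be positioned at constant_pool_count.
    static String[] describe(ByteBuffer in) {
        int count = in.getShort() & 0xFFFF;
        String[] slots = new String[count];              // slot 0 stays unused
        for (int i = 1; i < count; i++) {
            int tag = in.get() & 0xFF;
            switch (tag) {
                case 1: {                                // CONSTANT_Utf8_info
                    byte[] bytes = new byte[in.getShort() & 0xFFFF];
                    in.get(bytes);
                    // Simplification: plain UTF-8 decoding instead of modified UTF-8.
                    slots[i] = "Utf8 " + new String(bytes, StandardCharsets.UTF_8);
                    break;
                }
                case 3: slots[i] = "Integer " + in.getInt(); break;
                case 4: slots[i] = "Float " + in.getFloat(); break;
                case 5: slots[i] = "Long " + in.getLong(); i++; break;     // takes two slots
                case 6: slots[i] = "Double " + in.getDouble(); i++; break; // takes two slots
                case 7: slots[i] = "Class #" + (in.getShort() & 0xFFFF); break;
                case 8: slots[i] = "String #" + (in.getShort() & 0xFFFF); break;
                case 9: case 10: case 11: case 12: {     // two u2 indexes each
                    int first = in.getShort() & 0xFFFF;
                    int second = in.getShort() & 0xFFFF;
                    slots[i] = REF_NAMES[tag - 9] + " #" + first + " #" + second;
                    break;
                }
                default:
                    throw new IllegalArgumentException("unsupported constant tag " + tag);
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        // constant_pool_count = 4: #1 Utf8 "Foo", #2 Class #1, #3 Integer 7
        byte[] pool = {0, 4, 1, 0, 3, 0x46, 0x6F, 0x6F, 7, 0, 1, 3, 0, 0, 0, 7};
        String[] slots = describe(ByteBuffer.wrap(pool));
        for (int i = 1; i < slots.length; i++) {
            System.out.println("#" + i + " = " + slots[i]);
        }
    }
}
```

Class files produced for JDK 7 and later can contain tags beyond these 11; this sketch deliberately rejects them instead of guessing their layout.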

access_flags

After the constant pool, the next 2 bytes hold the access flags (access_flags), which carry access information at the class or interface level, including whether the class or interface is declared public, whether it is abstract, and, for a class, whether it is declared final. Each piece of access information is represented by a hexadecimal flag value; when several apply at once, the resulting value is the bitwise OR of their flag values.
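As a small illustration of combining flags with bitwise OR, the Java sketch below uses a few class-level flag values from the Java Virtual Machine specification (the specification defines more than are shown here). ClassAccessFlags is a hypothetical name.

```java
// Sketch: combine and test class-level access flags.
final class ClassAccessFlags {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_FINAL = 0x0010;
    static final int ACC_SUPER = 0x0020;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT = 0x0400;

    // True if the given flag bit is set in accessFlags.
    static boolean has(int accessFlags, int flag) {
        return (accessFlags & flag) != 0;
    }

    public static void main(String[] args) {
        // A public final class compiled by a modern compiler, which also sets ACC_SUPER.
        int flags = ACC_PUBLIC | ACC_FINAL | ACC_SUPER;
        System.out.printf("0x%04X%n", flags);          // prints 0x0031
        System.out.println(has(flags, ACC_FINAL));     // prints true
        System.out.println(has(flags, ACC_INTERFACE)); // prints false
    }
}
```

Since every flag occupies its own bit, a single u2 value can carry any combination of them.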

this_class, super_class, interfaces

The class index (this_class) and the parent class index (super_class) are each a single u2 value, and the interface index collection (interfaces) is a set of u2 values. Together these three items determine the inheritance relationships of the class. They are laid out in that order after the access flags. The class index and the parent class index each point to a class descriptor constant of type CONSTANT_Class_info, and the index value in that constant leads to the fully qualified name string defined in a CONSTANT_Utf8_info constant. The interface index collection describes which interfaces the class implements; the implemented interfaces are stored in the collection from left to right in the order they appear after the implements keyword (or after the extends keyword if the class itself is an interface).

Fields

The field table (field_info) describes the variables declared in an interface or class. Fields include class-level variables and instance-level variables, but not local variables declared inside methods. The name and data type of a field are not fixed, so they can only be described by referring to constants in the constant pool, while its modifiers are recorded as flag bits. The format of the field table is as follows:

access_flags is similar to the access_flags of a class: it holds the modifiers of the field, such as public, static, and volatile. The name_index and descriptor_index that follow are references into the constant pool; they represent the simple name of the field and the descriptor of the field. The three special strings "simple name", "descriptor", and "fully qualified name" are briefly explained here.

As mentioned earlier, the fully qualified name is the full name of a class. For example, the fully qualified name of the TestClass class in the org.lxh.test package is org/lxh/test/TestClass, that is, the "." in the package name is replaced with "/". To avoid confusion between consecutive fully qualified names, a ";" is usually appended to the end of a fully qualified name when it is used. A simple name is a method or field name without any type or parameter decoration: if a class has a method boolean get(int name) and a variable private final static int m, their simple names are get and m respectively.

A descriptor describes the data type of a field, or the parameter list of a method (including the number, types, and order of the parameters) and its return value. Under the descriptor rules, the meaning of each descriptor character is given in the following table:

For an array type, each dimension is described with one leading "[" character. For example, the two-dimensional integer array "int[][]" is recorded as "[[I", and the String array "String[]" is recorded as "[Ljava/lang/String;".

A method descriptor lists the parameters first and the return value after them, with the parameters placed in strict order inside a pair of parentheses. For example, the descriptor of the method int getIndex(String name, char[] tgc, int start, int end, char target) is "(Ljava/lang/String;[CIIC)I".
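The descriptor rules can be sketched in a few lines of Java. The Descriptors class below is a hypothetical helper written for this example; the single-letter codes for the primitive types are those of the Java Virtual Machine specification. The main method cross-checks the result against MethodType.toMethodDescriptorString from java.lang.invoke, which produces the same kind of string.

```java
import java.lang.invoke.MethodType;

// Sketch: build field and method descriptors from Class objects.
final class Descriptors {
    // Descriptor of a single type.
    static String of(Class<?> type) {
        if (type.isArray()) return "[" + of(type.getComponentType()); // one "[" per dimension
        if (type == void.class) return "V";
        if (type == boolean.class) return "Z";
        if (type == byte.class) return "B";
        if (type == char.class) return "C";
        if (type == short.class) return "S";
        if (type == int.class) return "I";
        if (type == long.class) return "J";
        if (type == float.class) return "F";
        if (type == double.class) return "D";
        // Object types: "L", the fully qualified name with "/" separators, then ";".
        return "L" + type.getName().replace('.', '/') + ";";
    }

    // Method descriptor: parameters in order inside parentheses, then the return type.
    static String ofMethod(Class<?> returnType, Class<?>... params) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : params) {
            sb.append(of(p));
        }
        return sb.append(')').append(of(returnType)).toString();
    }

    public static void main(String[] args) {
        System.out.println(of(int[][].class));  // prints [[I
        System.out.println(of(String[].class)); // prints [Ljava/lang/String;
        String mine = ofMethod(int.class, String.class, char[].class, int.class, int.class, char.class);
        String jdk = MethodType.methodType(int.class, String.class, char[].class, int.class, int.class, char.class)
                .toMethodDescriptorString();
        System.out.println(mine);               // prints (Ljava/lang/String;[CIIC)I
        System.out.println(mine.equals(jdk));   // prints true
    }
}
```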

The fixed data items of the field table end with descriptor_index, but they are followed by an attribute table collection that stores additional information. For example, if the class declares the field static final int m = 2; there may be an attribute named ConstantValue that points to the constant 2. Details of attribute_info are given in the section on the attribute table below.

One last thing to note: fields inherited from a parent class or interface are not listed in the field table collection, but fields that do not exist in the original Java code may be. For example, to keep access to its outer class, an inner class automatically gets a field that points to the instance of the outer class.
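The compiler-added field can be observed with reflection. The sketch below assumes compilation with javac; the classes Outer, Inner, and SyntheticFieldDemo are made up for this example. The inner class reads a field of the outer class on purpose, because recent javac versions may leave the hidden field out when the outer instance is never used.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class Outer {
    private int counter = 41;

    class Inner {
        // Reading an outer field makes sure Inner really needs its outer instance.
        int next() {
            return counter + 1;
        }
    }
}

// Sketch: list the fields that the compiler added on its own.
final class SyntheticFieldDemo {
    static List<Field> syntheticFields(Class<?> type) {
        List<Field> result = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic()) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Inner declares no fields in source, yet one field of type Outer shows up.
        for (Field f : syntheticFields(Outer.Inner.class)) {
            System.out.println(f.getType().getSimpleName() + " " + f.getName());
        }
    }
}
```

The exact name of the generated field is a compiler convention, so the example filters on isSynthetic rather than on a name.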

Methods

The structure of the method table (method_info) is the same as that of the field table, so it is not repeated here. The Java code in a method body is compiled into bytecode instructions and stored in an attribute named "Code" in the attribute table collection of the method; the attribute table is described in detail later.

As with the field table collection, if a subclass does not override a method of its parent class, no information about that method appears in the method table collection of the subclass. However, methods added automatically by the compiler may appear, most typically the class constructor "<clinit>" and the instance constructor "<init>".

In the Java language, overloading a method requires, besides the same simple name as the original method, a signature different from that of the original method. The signature is the collection of the constant pool symbolic references of the parameters of a method. Because the return value is not part of the signature, the Java language cannot overload an existing method by a different return value alone.
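A related observation, offered here as an illustration that goes slightly beyond the text above: a method descriptor in the class file does include the return type, so a class file can hold two methods that differ only in their return type even though Java source cannot declare them. With javac, a covariant override is one way to see this, because the compiler adds a bridge method. The classes below are hypothetical.

```java
import java.lang.reflect.Method;

class Base {
    Object get() {
        return "base";
    }
}

class Derived extends Base {
    @Override
    String get() { // covariant return type
        return "derived";
    }
}

// Sketch: count the methods with a given name that a class file declares.
final class ReturnTypeInDescriptor {
    static int countNamed(Class<?> type, String name) {
        int n = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Derived declares one get() in source, but its class file holds two.
        for (Method m : Derived.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName() + " " + m.getName()
                    + "() bridge=" + m.isBridge());
        }
        System.out.println(countNamed(Derived.class, "get")); // prints 2
    }
}
```

Both methods take no parameters; only the return type in the descriptor tells them apart.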

Attributes

The attribute table (attribute_info) has already been mentioned several times above. Class files, field tables, and method tables can each carry their own attribute table collection to describe information specific to certain scenarios.

The restrictions on the attribute table collection are looser than those on the other data items: the attribute tables do not have to be in any strict order, and as long as it does not reuse an existing attribute name, any compiler can write attribute information of its own definition into the attribute table. The Java Virtual Machine ignores attributes it does not recognize at run time. The Java Virtual Machine specification predefines 9 attributes that a virtual machine should recognize (JDK 1.5 added some new features, so there are now more than 9, but the following 9 are the most basic, the most necessary, and the most frequently seen), as shown in the following table:

The name of every attribute must refer to a constant of type CONSTANT_Utf8_info in the constant pool, while the structure of the attribute value is entirely up to the attribute; it only has to state, through a length item, how many bytes the attribute value occupies. An attribute table that conforms to the rules should have at least "attribute_name_index", "attribute_length", and at least one item of information.

1) Code attribute

As mentioned above, the code in the method bodies of a Java program is compiled by javac into bytecode instructions that are stored in the Code attribute. Not every method table has this attribute, however: abstract methods, such as those in an interface or an abstract class, have no Code attribute. If the method table has a Code attribute, its structure looks like the following table:

attribute_name_index is an index that points to a CONSTANT_Utf8_info constant whose value is fixed as "Code", the name of the attribute. attribute_length gives the length of the attribute value. Because the attribute name index and the attribute length together take 6 bytes (a u2 plus a u4), the length of the attribute value is always the length of the whole attribute table minus 6 bytes.

max_stack is the maximum depth of the operand stack. max_locals is the storage space required by the local variable table, measured in slots. max_locals is not simply the sum of the slots of all local variables used in the method, because slots in the local variable table can be reused.
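To make the slot unit more concrete, here is a rough Java sketch that counts the slots taken by the parameters of a method, which is a lower bound for max_locals. It relies on two rules from the Java Virtual Machine specification that the text above does not spell out, so treat them as assumptions of the example: long and double values take two slots, and an instance method uses one extra slot for this. LocalSlots is a hypothetical name.

```java
// Sketch: count the local variable slots occupied by method parameters.
final class LocalSlots {
    static int argumentSlots(boolean isStatic, Class<?>... params) {
        int slots = isStatic ? 0 : 1; // instance methods keep `this` in the first slot
        for (Class<?> p : params) {
            // long and double take two slots, every other type takes one
            slots += (p == long.class || p == double.class) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // static void f(int a, long b, double c, Object d)
        System.out.println(argumentSlots(true, int.class, long.class, double.class, Object.class)); // prints 6
        // void g() on an instance
        System.out.println(argumentSlots(false)); // prints 1
    }
}
```

The real max_locals is computed by the compiler and also covers locals declared in the method body, with slots reused once a variable goes out of scope.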

code_length and code store the bytecode instructions generated by compiling the Java source. code is the sequence of bytes of those instructions. Each instruction opcode is a single u1 byte, so its value ranges from 0x00 to 0xFF and at most 256 instructions can be expressed; the Java Virtual Machine specification currently defines the meaning of about 200 of these values. Although code_length is a u4 and could in theory reach 2^32-1, the virtual machine specification limits a method to no more than 65535 bytes of bytecode, and javac refuses to compile a method that exceeds the limit.

The bytecode instructions are followed by the explicit exception handler table (exception_table) of the method, which is optional for the Code attribute. Its format is shown in the following table:

It contains four fields with the following meaning: if an exception of type catch_type or one of its subclasses (catch_type is an index pointing to a CONSTANT_Class_info constant) occurs between bytecode offset start_pc and offset end_pc (end_pc itself excluded), execution continues at offset handler_pc. When catch_type is 0, any exception is transferred to handler_pc for handling. The exception table is effectively part of the Java code: the compiler uses the exception table, rather than simple jump instructions, to implement exception handling and finally in Java. This is why the contents of a finally block run before the return statement in the try or catch block. It is also why, before a try or catch block jumps to finally, the value it is about to return is copied to the last slot of the local variable table, which explains the case described in the article at http://blog.csdn.net/ns_code/article/details/17485221.

The Code attribute is the most important attribute in a class file. If the information in a Java program is divided into code and metadata, then within the whole class file the Code attribute describes the code and all other data items describe the metadata.
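The lookup described above can be sketched as follows. This is a simplified model, not how any particular virtual machine is implemented: Entry mirrors the four fields, a null catchType stands in for catch_type being 0, and entries are assumed to be searched in table order with the first match winning.

```java
// Sketch: find the handler for an exception thrown at a given bytecode offset.
final class ExceptionTableSketch {
    static final class Entry {
        final int startPc;
        final int endPc;
        final int handlerPc;
        final Class<? extends Throwable> catchType; // null stands in for catch_type == 0

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns handler_pc of the first matching entry, or -1 if no entry matches
    // (in which case the exception would propagate to the caller).
    static int findHandler(Entry[] table, int pc, Throwable thrown) {
        for (Entry e : table) {
            boolean inRange = pc >= e.startPc && pc < e.endPc; // end_pc is exclusive
            boolean typeMatches = e.catchType == null || e.catchType.isInstance(thrown);
            if (inRange && typeMatches) {
                return e.handlerPc;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Entry[] table = {
            new Entry(0, 4, 10, ArithmeticException.class), // catch (ArithmeticException e)
            new Entry(0, 4, 20, null)                       // catch-all, as used for finally
        };
        System.out.println(findHandler(table, 2, new ArithmeticException()));   // prints 10
        System.out.println(findHandler(table, 2, new IllegalStateException())); // prints 20
        System.out.println(findHandler(table, 4, new ArithmeticException()));   // prints -1
    }
}
```

The catch-all entry is what lets a finally block run for exceptions that no catch clause handles.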

2) Exceptions attribute

The Exceptions attribute lists the exceptions the method may throw, that is, the exceptions named after the throws keyword in the method declaration. Its structure is very simple, with only four items: attribute_name_index, attribute_length, number_of_exceptions, and exception_index_table. Their names are self-explanatory, so they are not described further here.
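The declared exceptions of a method can be inspected at run time through reflection. The sketch below uses Method.getExceptionTypes; that this information ultimately comes from the Exceptions attribute is an assumption about typical virtual machines, not something the reflection API promises. ThrowsClauseDemo and risky are made-up names.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Sketch: list the exception types named in a throws clause.
final class ThrowsClauseDemo {
    static void risky() throws java.io.IOException, InterruptedException {
        // intentionally empty: only the throws clause matters here
    }

    // Returns the sorted names of the exceptions declared by the named no-argument method.
    static String[] declaredExceptions(String methodName) {
        try {
            Method m = ThrowsClauseDemo.class.getDeclaredMethod(methodName);
            Class<?>[] types = m.getExceptionTypes();
            String[] names = new String[types.length];
            for (int i = 0; i < types.length; i++) {
                names[i] = types[i].getName();
            }
            Arrays.sort(names); // the API does not document an order, so sort for stable output
            return names;
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints [java.io.IOException, java.lang.InterruptedException]
        System.out.println(Arrays.toString(declaredExceptions("risky")));
    }
}
```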

3) LineNumberTable attribute

It describes the correspondence between Java source line numbers and bytecode offsets.

4) LocalVariableTable attribute

It describes the correspondence between the variables in the local variable table of a stack frame and the variables defined in the Java source code.

5) SourceFile attribute

It records the name of the source file from which the class file was generated.

6) ConstantValue attribute

The ConstantValue attribute tells the virtual machine to assign a value to a static variable automatically; only variables modified by static can use it. In Java, instance variables (non-static variables) are assigned in the instance constructor <init>. For class variables (static variables) there are two options: assign the value in the class constructor <clinit>, or use the ConstantValue attribute.

The current choice of the Sun javac compiler is as follows. If a variable is modified by both final and static (that is, it is a global constant) and its data type is a primitive type or String, javac generates a ConstantValue attribute for it at compile time, and the virtual machine sets the value of the constant from ConstantValue during the preparation phase of class loading. If the variable is not final, or is neither a primitive type nor a String, it is initialized in the <clinit> method instead.

Although the final keyword matches the meaning of "ConstantValue" better, the virtual machine specification does not require the field to be final; it only requires the field to be static. The requirement for final is a restriction imposed by javac itself. Therefore, in practice only fields modified by both final and static have a ConstantValue attribute. The value of a ConstantValue attribute is limited to primitive types and String, which follows naturally because the attribute can only reference literals of primitive and String types from the constant pool.

The following briefly describes how assignment differs for final, static, and static final fields:

    • A static field is initialized to a default value such as 0 or null in the preparation phase of class loading, and is then given the value set in the code in the initialization phase (which runs the class constructor <clinit>). If the code sets no value, the default value remains.
    • A final field is initialized at run time (it can be assigned directly or in the instance constructor); once assigned, it cannot be changed.
    • For a static final field, javac generates a ConstantValue attribute at compile time, and the field is assigned from the value of ConstantValue in the preparation phase of class loading. Such a field has no default value and must be assigned explicitly, otherwise javac reports an error. It can be understood as having its value placed in the constant pool at compile time.
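One visible consequence of the difference can be sketched in Java. The example assumes javac and standard class initialization rules: a static final field of primitive type with a constant initializer is a compile-time constant, so reading it does not run <clinit> of the declaring class, while a static final field of another type is assigned in <clinit>. All class names are made up for the example, and the result described holds the first time run() is called in a fresh virtual machine.

```java
// Records whether ConstantHolder has been initialized.
final class InitTracker {
    static boolean holderInitialized = false;
}

class ConstantHolder {
    static final int M = 2;         // primitive constant: eligible for ConstantValue
    static final Integer BOXED = 2; // neither primitive nor String: assigned in <clinit>

    static {
        InitTracker.holderInitialized = true; // runs as part of <clinit>
    }
}

// Sketch: observe when the class constructor of ConstantHolder runs.
final class ConstantValueDemo {
    // Returns {initialized after reading M, initialized after reading BOXED}.
    static boolean[] run() {
        int m = ConstantHolder.M;               // compile-time constant, no initialization needed
        boolean afterConstant = InitTracker.holderInitialized;
        Integer boxed = ConstantHolder.BOXED;   // triggers initialization of ConstantHolder
        boolean afterBoxed = InitTracker.holderInitialized;
        if (m != 2 || boxed != 2) {
            throw new IllegalStateException("unexpected values");
        }
        return new boolean[] {afterConstant, afterBoxed};
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println(result[0] + " " + result[1]);
    }
}
```

On a first call, the first element is expected to be false and the second true, because only the read of BOXED forces the class constructor to run.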

7) InnerClasses attribute

This attribute records the association between inner classes and their host class. If a class defines inner classes, the compiler generates an InnerClasses attribute for it and for the inner classes it contains.

8) Deprecated attribute

This attribute marks a class, field, or method that the program author has designated as deprecated; it can be set by using the @deprecated tag in the code.

9) Synthetic attribute

This attribute indicates that a field or method was not produced directly from the Java source code but was added by the compiler itself, such as the field of an inner class that points to its outer instance. The instance constructor <init> and the class constructor <clinit> are exceptions: the compiler may generate them, but they are not marked as synthetic.
