Deep understanding of JVM Reading notes: class file structure

Source: Internet
Author: User

A class file is a binary stream organized in units of 8-bit bytes. It stores data in a structure similar to a C struct, and that structure uses only two kinds of data: unsigned numbers and tables. Unsigned numbers are the basic data type; u1, u2, u4, and u8 denote unsigned numbers of 1, 2, 4, and 8 bytes, and they are used to describe numbers, index references, quantity values, or UTF-8 encoded strings. A table is a compound data type made up of unsigned numbers and other tables, and by convention its name ends with the _info suffix. The entire class file is essentially a single table.
The following sections go through the meaning of each data item in the class file.

Magic number

The first 4 bytes of a class file are the magic number. Its only purpose is to identify whether the file is a class file that the virtual machine can accept.

Version number

The 4 bytes immediately after the magic number hold the class file version: bytes 5 and 6 are the minor version number, and bytes 7 and 8 are the major version number.
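To make the layout of the first eight bytes concrete, here is a minimal Java sketch that reads the magic number and the two version fields from a byte array. The class name ClassHeader and the sample bytes are made up for illustration; the magic value 0xCAFEBABE and the big-endian field order come from the class file format itself.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: holds the first three items of a class file.
class ClassHeader {
    final int magic;
    final int minor;
    final int major;

    ClassHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    // Reads u4 magic, u2 minor_version, u2 major_version (all big-endian).
    static ClassHeader parse(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            return new ClassHeader(magic, minor, major);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample header only: magic, minor 0, major 52 (the major version used by Java 8).
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        ClassHeader header = parse(sample);
        System.out.println(Integer.toHexString(header.magic) + " " + header.major + "." + header.minor);
    }
}
```

DataInputStream is used here because it reads big-endian values, which matches the byte order of the class file.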
Constant pool

After the minor and major version numbers comes the constant pool (constant_pool). It is the data item most referenced by other items in the class file structure, and it is usually one of the largest consumers of space in the class file. The constant pool holds two main kinds of constants: literals and symbolic references. Literals are things like text strings and constant values declared final. Symbolic references include the following three kinds of constants:

A. Fully qualified names of classes and interfaces
B. Field names and descriptors
C. Method names and descriptors

Because Java code has no "link" step when it is compiled by javac, class files are linked dynamically when the virtual machine loads them. The class file therefore does not store any final memory layout information. The virtual machine has to take the corresponding symbolic references from the constant pool and resolve them into concrete memory addresses when the class is created or at run time (class creation and dynamic linking are covered in the next section). My own reading is that symbolic references record, at compile time, the classes, fields, and methods this file refers to, so that the JVM can load them at run time when they are needed. Each item in the constant pool is a table, and each table begins with a u1 tag that identifies which kind of constant it is.
In the CONSTANT_Utf8_info entry, length gives the number of bytes in the encoded string, and it is followed by length bytes of string data in a compact (modified) UTF-8 encoding. In this encoding, characters from '\u0001' to '\u007f' (ASCII codes 1 to 127) take one byte, characters from '\u0080' to '\u07ff' take two bytes, and characters from '\u0800' to '\uffff' take three bytes following the normal UTF-8 rules. The notable difference from standard UTF-8 is that '\u0000' is encoded in two bytes rather than one. Because length is a u2, its maximum value is 65535, so a variable or method name whose encoding exceeds 64 KB cannot be compiled. Methods, fields, and other structures in the class file all refer to CONSTANT_Utf8_info constants to describe their names.

(Table: structures of all constant pool item types.)
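The byte counts for the compact encoding described above can be observed with java.io.DataOutputStream.writeUTF, which writes a 2-byte length followed by modified UTF-8 bytes. The sketch below is only an illustration; the class and method names are made up, and it measures the encoding rather than reading a real constant pool.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for inspecting modified UTF-8 lengths.
class ModifiedUtf8 {
    // Number of bytes the string occupies in modified UTF-8, excluding the 2-byte length prefix.
    static int encodedLength(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s); // 2-byte length, then the encoded bytes
        } catch (IOException e) {
            // Also covers UTFDataFormatException, thrown when the encoding exceeds 65535 bytes.
            throw new UncheckedIOException(e);
        }
        return buffer.size() - 2;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("A"));                            // 1
        System.out.println(encodedLength(String.valueOf((char) 0x00E9)));  // 2
        System.out.println(encodedLength(String.valueOf((char) 0x4E2D)));  // 3
        System.out.println(encodedLength(String.valueOf((char) 0x0000)));  // 2
    }
}
```

The last call shows the one place where the encoding visibly departs from standard UTF-8 for a single char: the null character takes two bytes, so an encoded string never contains a zero byte.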

Access flags (class)

The 2 bytes after the constant pool are the access flags (access_flags), which identify access information at the class or interface level:
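As an illustration of how the 2-byte access_flags value is interpreted, the sketch below decodes a flag word into the names of the class-level flags. The mask values are the ones defined for classes in the class file format; the class name ClassAccessFlags and the decode method are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for class-level access_flags.
class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    // Returns the names of all flags whose bit is set, in ascending mask order.
    static List<String> decode(int accessFlags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 0x0021 is a common value for an ordinary public class.
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
    }
}
```

Each flag is a single bit, so a class's flags are simply the bitwise OR of the masks that apply to it.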
Class index, superclass index, and interface indexes

The class index (this_class) and the superclass index (super_class) are each a u2 value, and the interface index collection (interfaces) is a set of u2 values. Each of them points to a CONSTANT_Class_info symbolic reference in the constant pool, and together these three items determine the inheritance relationships of the class.

Field table

The field table (field_info) describes the variables declared in a class or interface. Fields include class-level variables and instance-level variables, but not local variables declared inside a method.
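In each field_info entry, access_flags is similar to the access_flags of a class, while name_index and descriptor_index are references into the constant pool that give the field's simple name and its descriptor. The descriptor rules are as follows. Primitive types and void are each represented by a single uppercase character (B for byte, C for char, D for double, F for float, I for int, J for long, S for short, Z for boolean, V for void), and an object type is represented by the character L followed by the object's fully qualified name and a semicolon, for example Ljava/lang/Object;.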
For an array type, each dimension is described by a leading "[" character, so "[[Ljava/lang/String;" represents String[][]. A method descriptor lists the parameter list first and the return value second: the parameter types appear in order inside parentheses, followed by the return type. For example, "([CII[CIII)I" describes int indexOf(char[], int, int, char[], int, int, int). The attributes item carries additional descriptive information. If you declare "final static int m = 123", there may be a ConstantValue attribute pointing to the constant 123. Note that fields inherited from a superclass or superinterface are not listed in the field table collection, but the table may list fields that do not exist in the Java source code. For example, to keep access to the outer class, an inner class automatically gets a field that refers to the outer class instance. (Personal note: interfaces are declared upward and inheritance is declared downward; an interface differs from inheritance.)
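The descriptor rules above can be checked against the JDK, because java.lang.invoke.MethodType can print a method descriptor string. The following is a small sketch; the Descriptors class and its two methods are illustrative names, and the field descriptor is obtained by the simple trick of wrapping the type as the only parameter of a void method.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper that builds descriptor strings from Class objects.
class Descriptors {
    // Method descriptor: parameter types in parentheses, then the return type.
    static String methodDescriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    // Field descriptor: take "(T)V" and strip the leading "(" and trailing ")V".
    static String fieldDescriptor(Class<?> type) {
        String wrapped = MethodType.methodType(void.class, type).toMethodDescriptorString();
        return wrapped.substring(1, wrapped.length() - 2);
    }

    public static void main(String[] args) {
        System.out.println(fieldDescriptor(String[][].class)); // [[Ljava/lang/String;
        System.out.println(methodDescriptor(int.class,
                char[].class, int.class, int.class,
                char[].class, int.class, int.class, int.class)); // ([CII[CIII)I
    }
}
```

Both printed strings match the examples given in the text for String[][] and for indexOf.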
Method table

The method table is described in almost exactly the same way as the field table. Its structure is the same as that of the field table, and it differs only in the options available for the access flags and in the contents of the attribute table collection.
The code inside a method is compiled into bytecode instructions by the compiler and stored in an attribute named Code in the method's attribute table collection, which is described later. As with fields, if a superclass method is not overridden in the subclass, the method from the superclass does not appear in the method table collection. Conversely, methods added automatically by the compiler may appear, typically the class constructor <clinit> and the instance constructor <init>.

To overload a method in the Java language, the new method must have the same simple name as the original and must also have a different signature. The signature here is the collection of field symbolic references in the constant pool for each parameter of the method, and it does not include the return value. This is why the Java language cannot overload an existing method by changing only the return value. At the class file (bytecode) level, however, the signature covers more than it does at the Java source level: it also includes the method's return value and exception table, so two methods whose descriptors are not exactly the same can legally coexist in one class file.

Attribute table

The attribute table (attribute_info) has already appeared many times above: the class file, the field table, and the method table can each carry their own attribute table collection to describe information specific to them. The restrictions on the attribute table collection are slightly looser than those on the other data items. Several of the predefined attributes that virtual machine implementations recognize are described below.
The name of each attribute is a reference to a CONSTANT_Utf8_info constant in the constant pool, while the structure of the attribute value is defined by the attribute itself; a u4 length field states how many bytes the value occupies.
Code attribute

The code in the body of a Java method is compiled into bytecode instructions and stored in the Code attribute. The Code attribute appears in the attribute collection of the method table, although not every method has one (abstract methods, for example, have none). When a method does have a Code attribute, its main fields are discussed next.
The first two fields (attribute_name_index and attribute_length) are common to all attributes and are not discussed further. max_stack is the maximum depth of the operand stack; at no point during the execution of the method does the operand stack exceed this depth, and the virtual machine uses this value at run time to allocate the operand stack of the stack frame. max_locals is the storage space required for the local variable table, measured in slots. A slot is the smallest unit the virtual machine uses to allocate memory for local variables: a data type of 32 bits or fewer occupies 1 slot, while the 64-bit types double and long require 2 slots. Slots in the local variable table can be reused. code_length and code store the compiled bytecode instructions. Each opcode in code is a u1, that is, 8 bits, so at most 256 instructions can be expressed; the virtual machine specification already defines about 200 of them. Also, although code_length is a u4 value, the virtual machine specification limits a method to no more than 65535 bytes of bytecode, so in practice only a u2's worth of length is used, and the compiler refuses to compile a method that exceeds the limit. The Code attribute is the most important attribute in the class file.
For an instance method that defines no local variables and takes no parameters, the Code attribute still shows locals=1 and args_size=1. The reason is this: any instance method can use this to refer to the object the method belongs to, and the compiler implements this by turning access to the this keyword into access to an ordinary method parameter, which the virtual machine passes in automatically when it invokes the instance method. The local variable table therefore also reserves one slot for the this reference. What if the method is changed to static? My guess is that args_size and locals would then both be 0.

The code array is followed by an explicit exception table, which is optional.
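As a rough check on the locals=1, args_size=1 reasoning above, the sketch below counts how many local variable slots a method's arguments occupy, given its descriptor and whether it is static. The class and method names are made up, and the count follows the slot rules stated earlier (this takes one slot, long and double take two); it is a lower bound on max_locals, not a reimplementation of any particular tool's output.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper: slots used by 'this' plus the declared parameters.
class ParameterSlots {
    static int count(String methodDescriptor, boolean isStatic) {
        MethodType type = MethodType.fromMethodDescriptorString(
                methodDescriptor, ParameterSlots.class.getClassLoader());
        int slots = isStatic ? 0 : 1; // slot 0 holds 'this' for instance methods
        for (Class<?> parameter : type.parameterList()) {
            // long and double are 64-bit and need two slots each
            slots += (parameter == long.class || parameter == double.class) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(count("()V", false)); // 1: only 'this'
        System.out.println(count("()V", true));  // 0: static, no parameters
    }
}
```

For a static method with no parameters the count is 0, which is consistent with the guess in the text.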
The exception table fields mean the following: if an exception of type catch_type or one of its subclasses occurs between bytecode offsets start_pc (inclusive) and end_pc (exclusive), execution jumps to handler_pc and continues from there. catch_type is an index to a CONSTANT_Class_info constant. When catch_type is 0, an exception of any type is transferred to handler_pc for processing.

Exceptions attribute

The Exceptions attribute sits in the method table at the same level as the Code attribute. Its function is to list the exceptions that the method may throw.
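Going back to the exception table fields (start_pc, end_pc, handler_pc, catch_type), the lookup rule can be sketched as a linear search over the entries. This is a simplified model, not virtual machine code: the Entry class is hypothetical, a Class object stands in for the CONSTANT_Class_info index, and null models catch_type == 0.

```java
import java.util.List;

// Hypothetical model of exception table lookup.
class ExceptionTableLookup {
    static final class Entry {
        final int startPc;   // inclusive
        final int endPc;     // exclusive
        final int handlerPc;
        final Class<? extends Throwable> catchType; // null models catch_type == 0

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns the handler_pc of the first matching entry, or -1 if none matches.
    static int findHandler(List<Entry> table, int pc, Throwable thrown) {
        for (Entry entry : table) {
            boolean inRange = pc >= entry.startPc && pc < entry.endPc;
            boolean typeMatches = entry.catchType == null || entry.catchType.isInstance(thrown);
            if (inRange && typeMatches) {
                return entry.handlerPc;
            }
        }
        return -1; // no handler in this method; the exception propagates to the caller
    }
}
```

Entries are checked in table order, so an entry for a specific exception type should come before a catch-all entry covering the same range.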
The method may throw number_of_exceptions kinds of exceptions. Each one is represented by an entry in exception_index_table, which is an index to a CONSTANT_Class_info constant in the constant pool.

LineNumberTable attribute

The LineNumberTable attribute describes the correspondence between Java source line numbers and bytecode line numbers (bytecode offsets). It is not required at run time.
line_number_table is a collection of line_number_info entries. Each entry contains two u2 values, start_pc and line_number: the former is the bytecode line number and the latter is the Java source line number.

LocalVariableTable attribute

The LocalVariableTable attribute describes the relationship between the variables in a stack frame's local variable table and the variables defined in the Java source code. It is not required at run time either.
Each local_variable_info entry represents the association between one stack frame slot and one local variable in the source code. Its fields are start_pc, length, name_index, descriptor_index, and index.
The first two fields (start_pc and length) give the bytecode offset at which the local variable's lifetime begins and the length of the range it covers. They are followed by two indexes to CONSTANT_Utf8_info constants (name_index and descriptor_index), which give the local variable's name and descriptor. index is the position of the local variable's slot in the stack frame's local variable table.

SourceFile attribute

The SourceFile attribute records the name of the source file from which the class file was generated. It is optional. In most cases the class name and the file name are identical, but there are exceptions.
Here sourcefile_index is an index to a CONSTANT_Utf8_info constant whose value is the file name of the source file.

ConstantValue attribute

The ConstantValue attribute tells the virtual machine to assign a value to a static variable automatically. Non-static (instance) variables are assigned in the instance constructor <init>. For variables modified by the static keyword, Sun's javac compiler currently chooses as follows: if the variable is declared final static and its data type is a primitive type or java.lang.String, javac generates a ConstantValue attribute to initialize it; otherwise it initializes the variable in the class constructor <clinit>. ConstantValue is limited to primitive types and String because the attribute's value is only an index into the constant pool, so other types cannot be supported.
ConstantValue is a fixed-length attribute, so its attribute_length must be 2. constantvalue_index is a reference to a literal constant in the constant pool.

InnerClasses attribute

The InnerClasses attribute records the association between inner classes and their host class. If a class defines inner classes, the compiler generates an InnerClasses attribute for it and for the inner classes it contains.
number_of_classes is the number of inner class records, and each inner class is described by one inner_classes_info entry.
The first two fields (inner_class_info_index and outer_class_info_index) are indexes to CONSTANT_Class_info constants, representing the symbolic references of the inner class and the host class. inner_name_index gives the name of the inner class and is 0 for an anonymous class. inner_class_access_flags holds the access flags of the inner class, similar to the access_flags of a class.

Deprecated and Synthetic attributes

Both of these are flag attributes: they are either present or absent and carry no attribute value. Deprecated indicates that a class, field, or method is no longer recommended, and it can be set in source code with the @deprecated tag. Synthetic indicates that a field or method was not produced directly from Java source code but was added by the compiler; the same thing can be expressed by setting the ACC_SYNTHETIC bit in the access flags. All classes, methods, and fields not generated from user code should have at least one of the Synthetic attribute or the ACC_SYNTHETIC flag set, with the exception of the instance constructor <init> and the class constructor <clinit>.

StackMapTable attribute

StackMapTable is a complex variable-length attribute located in the attribute table of the Code attribute. It is used during the bytecode verification phase of class loading by the new type-checking verifier (Type Checker), which replaces the earlier, more expensive verifier based on data-flow analysis. (I did not dig into this; it can be studied when needed.)

Signature attribute

Signature is a fixed-length attribute that can appear in the attribute tables of the class, field table, and method table structures. Since JDK 1.5, if the generic signature of any class, interface, initialization method, or member contains type variables (Type Variables) or parameterized types (Parameterized Types), the Signature attribute records the generic signature information for it. A dedicated attribute is needed because generics in the Java language are implemented by erasure: in the bytecode (the Code attribute), generic information is erased after compilation. The benefit of erasure is that it is simple to implement and saves some memory for types at run time. The disadvantage is that, at run time, generic types cannot be treated the same as ordinary user-defined types; without extra help, generic information could not be obtained through reflection. The Signature attribute exists to make up for this shortcoming: the Java reflection API can now obtain generic types, and the ultimate source of that data is this attribute. Here, parameterized types are types such as ArrayList<Integer>, ArrayList<E>, and ArrayList<? extends Number>, while a type variable is, for example, the E in ArrayList<E>.
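The point that reflection can still recover generic information after erasure can be checked with a short example. In this sketch the class SignatureDemo and its field are made up for illustration: Field.getType() returns the erased type that corresponds to the field descriptor, while Field.getGenericType() returns the parameterized type recorded for the field.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Hypothetical demo class with one generic field.
class SignatureDemo {
    // The field descriptor is the erased Ljava/util/List; the <Integer> part survives only as generic signature data.
    static List<Integer> numbers;

    private static Field field(String name) {
        try {
            return SignatureDemo.class.getDeclaredField(name);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Erased type, as seen by the bytecode.
    static Class<?> erasedTypeOf(String fieldName) {
        return field(fieldName).getType();
    }

    // First type argument of a parameterized field type, recovered through reflection.
    static Type firstTypeArgument(String fieldName) {
        ParameterizedType type = (ParameterizedType) field(fieldName).getGenericType();
        return type.getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        System.out.println(erasedTypeOf("numbers"));      // interface java.util.List
        System.out.println(firstTypeArgument("numbers")); // class java.lang.Integer
    }
}
```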
signature_index must be a valid index into the constant pool, and the entry at that index must be a CONSTANT_Utf8_info structure representing a class signature, a method type signature, or a field type signature, depending on whether the Signature attribute belongs to a class, a method table, or a field table.

BootstrapMethods attribute

BootstrapMethods is a complex variable-length attribute located in the attribute table of the class file. It stores the bootstrap method specifiers referenced by invokedynamic instructions. If a CONSTANT_InvokeDynamic_info constant has appeared in the constant pool of a class file, the attribute table of that class file must contain an explicit BootstrapMethods attribute. Even if CONSTANT_InvokeDynamic_info constants appear many times in the constant pool, there is at most one BootstrapMethods attribute in the attribute table. This attribute is closely related to the invokedynamic instruction and the java.lang.invoke package, which are covered later.
Each member of the bootstrap_methods array contains an index to a CONSTANT_MethodHandle_info structure in the constant pool, which represents a bootstrap method, together with a sequence of static arguments for that method. bootstrap_method_ref must be a valid index to a CONSTANT_MethodHandle_info entry. bootstrap_arguments is a variable-length array of arguments, and each one must be one of the following constant pool structures: CONSTANT_String_info, CONSTANT_Class_info, CONSTANT_Integer_info, CONSTANT_Long_info, CONSTANT_Float_info, CONSTANT_Double_info, CONSTANT_MethodHandle_info, or CONSTANT_MethodType_info.
Introduction to bytecode instructions

Because the Java virtual machine uses an architecture oriented toward an operand stack rather than registers, most instructions contain no operands, only an opcode. Because the opcode is limited to 1 byte (0 to 255), the total number of opcodes cannot exceed 256. In addition, because operands in the compiled class file are not aligned to their length, the virtual machine has to reconstruct any value larger than one byte from individual bytes at run time. For example, a 16-bit unsigned number stored in two unsigned bytes is read as (byte1 << 8) | byte2. The disadvantage is some loss of performance when executing bytecode; the advantage is compact code and therefore high transfer efficiency. If exception handling is not considered, the virtual machine interpreter can be understood with the following model: repeatedly fetch the opcode at the current PC, fetch operands if the instruction has any, and execute the operation the opcode defines, until the bytecode stream ends.
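The fetch, decode, execute loop and the byte-by-byte reconstruction of operands can be sketched with a toy interpreter. This is only an illustration under simplifying assumptions: it handles six real opcodes (iconst_1, bipush, sipush, iadd, imul, ireturn) with their actual opcode values, uses a fixed-size array in place of max_stack, and ignores everything else a real virtual machine does. The class and method names are made up.

```java
// Hypothetical toy interpreter for a handful of int instructions.
class MiniInterpreter {
    static final int ICONST_1 = 0x04;
    static final int BIPUSH = 0x10;
    static final int SIPUSH = 0x11;
    static final int IADD = 0x60;
    static final int IMUL = 0x68;
    static final int IRETURN = 0xAC;

    // Rebuilds an unsigned 16-bit value from two bytes: (byte1 << 8) | byte2.
    static int readU2(byte[] code, int offset) {
        return ((code[offset] & 0xFF) << 8) | (code[offset + 1] & 0xFF);
    }

    static int run(byte[] code) {
        int[] stack = new int[16]; // stand-in for the operand stack (max_stack)
        int sp = 0;
        int pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xFF; // fetch the opcode at pc
            switch (opcode) {
                case ICONST_1:
                    stack[sp++] = 1;
                    break;
                case BIPUSH:
                    stack[sp++] = code[pc++]; // one signed operand byte
                    break;
                case SIPUSH:
                    stack[sp++] = (short) readU2(code, pc); // two operand bytes, sign-extended
                    pc += 2;
                    break;
                case IADD: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    break;
                }
                case IMUL: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left * right;
                    break;
                }
                case IRETURN:
                    return stack[--sp];
                default:
                    throw new IllegalStateException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        // sipush 300; bipush 2; imul; ireturn
        byte[] code = {SIPUSH, 0x01, 0x2C, BIPUSH, 2, IMUL, (byte) IRETURN};
        System.out.println(run(code)); // 600
    }
}
```

Note that none of the arithmetic instructions carry operands; they take their inputs from the operand stack, which is why most instructions are a single byte.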
Bytecode and data types

Because the opcode is only one byte long, not every data type has a corresponding instruction for every operation. Instead, there are separate instructions that convert an unsupported type to a supported one when necessary. In the specification's table of instructions by data type, replacing the T in an opcode template with the special character for a data type yields a concrete bytecode instruction; an empty cell means the JVM has no instruction for that operation on that data type. Most instructions do not support the integer types byte, char, and short, or boolean; the compiler extends values of these types to int at compile time or run time.

By purpose, bytecode operations fall roughly into 9 categories: loading and storing, arithmetic, type conversion, object creation and access, operand stack management, control transfer, method invocation and return, exception handling, and synchronization. Only a brief description is given here; consult the specification for details.

Load and store instructions

Tload loads a local variable onto the operand stack. Tstore stores a value from the operand stack into the local variable table. Tconst loads a constant onto the operand stack. wide extends the index range used to access the local variable table.

Arithmetic instructions

Tadd, Tsub, Tmul, Tdiv, Trem, Tneg, Tshl and Tshr (shift), Tor (or), Tand (and), Txor (xor), iinc (local variable increment), and Tcmpg and Tcmpl (comparison).

Type conversion instructions

These are used to implement explicit type conversions in user code, or to deal with the fact, mentioned earlier, that the data-type-specific instructions in the instruction set do not correspond one to one with the data types. Widening conversions (small to large) are safe and require no explicit cast, although the compiler still emits instructions such as i2l, i2f, or f2d where the representation changes. Narrowing conversions (large to small) need explicit instructions: i2b, i2c, i2s, l2i, f2i, f2l, d2i, d2l, and d2f. The rules for converting a floating-point value to an integer type are:

1. If the floating-point value is NaN, the result is 0.
2. If the value is not infinite, it is rounded toward zero; if the rounded value is within the range of the target integer type, that value is the result.
3. Otherwise the result is the largest or smallest value representable in the target type, depending on the sign.
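The three floating-point conversion rules, and the truncating behavior of integer narrowing, can be observed directly from Java source, since explicit casts are what the compiler turns into these conversion instructions. The class and method names below are made up and simply mirror the instruction names.

```java
// Hypothetical demo: each method body is a single narrowing cast.
class NarrowingDemo {
    static int f2i(float value) {
        return (int) value;
    }

    static long d2l(double value) {
        return (long) value;
    }

    static byte i2b(int value) {
        return (byte) value; // keeps only the low 8 bits
    }

    public static void main(String[] args) {
        System.out.println(f2i(Float.NaN));               // 0 (rule 1)
        System.out.println(f2i(-1.9f));                   // -1 (rule 2, rounds toward zero)
        System.out.println(f2i(1e20f));                   // 2147483647 (rule 3)
        System.out.println(f2i(Float.NEGATIVE_INFINITY)); // -2147483648 (rule 3)
        System.out.println(i2b(300));                     // 44 (0x12C truncated to 0x2C)
    }
}
```

Unlike the floating-point rules, integer narrowing such as i2b simply discards the high-order bits, so the result can even change sign.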
Object creation and access instructions

new: creates a class instance.
newarray, anewarray, multianewarray: create arrays.
getfield, putfield, getstatic, putstatic: access instance fields and class (static) fields.
Taload: loads an array element onto the operand stack.
Tastore: stores a value from the operand stack into an array element.
arraylength: gets the length of an array.
instanceof, checkcast: check the type of a class instance.

Operand stack management instructions

pop, pop2: pop the top one or two elements off the stack.
dup, dup2, dup_x1, dup2_x1, dup_x2, dup2_x2: duplicate the top one or two values and push the copy, or two copies, back onto the stack.
swap: exchanges the top two values on the stack.

Control transfer instructions

Only briefly: these can be broadly understood as instructions that conditionally or unconditionally modify the value of the PC register.

Method invocation and return instructions

These are explained in detail later.

invokevirtual: invokes an instance method of an object, dispatching on the actual type of the object. This is the most common form.
invokeinterface: invokes an interface method; at run time it searches for an object that implements the interface method and finds the appropriate method to call.
invokespecial: invokes instance methods that need special handling, including instance initialization methods, private methods, and superclass methods.
invokestatic: invokes a static method.
invokedynamic: dynamically resolves, at run time, the method referenced by the call site specifier and executes it. The dispatch logic of the first four instructions is fixed inside the Java virtual machine, whereas for invokedynamic it is decided by a bootstrap method set by the user.

The invocation instructions are independent of data type, whereas the return instructions are distinguished by return value type: Treturn for typed values, plus return for void methods, instance initialization methods, and the class initialization methods of classes and interfaces.

Exception handling instructions

athrow explicitly throws an exception.
Catching an exception (the catch clause) is implemented with the exception table rather than with bytecode instructions.

Synchronization instructions

The JVM supports both method-level synchronization and synchronization of a sequence of instructions within a method; both are implemented using monitors. Method-level synchronization is implicit and does not need to be controlled by bytecode instructions. The virtual machine determines whether a method is synchronized from the ACC_SYNCHRONIZED access flag. When the method is called, the invoking instruction first checks the access flag; if it is set, the executing thread must first acquire the monitor, then execute the method, and finally release the monitor when the method completes. While one thread holds the monitor, no other thread can acquire the same monitor. If an exception is thrown during the execution of a synchronized method and is not handled inside the method, the monitor held by the method is released automatically as the exception propagates out of the method. For synchronizing a sequence of instructions, the JVM instruction set has the two instructions monitorenter and monitorexit to support the synchronized keyword. The compiler must guarantee that, whichever way the method completes, every monitorenter that is executed has a corresponding monitorexit executed, whether the method ends normally or with an exception.
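The two forms of synchronization described above look like this in Java source. This is a small sketch with made-up names: the synchronized method is the form marked with ACC_SYNCHRONIZED, and the synchronized block is the form compiled to monitorenter and monitorexit. Both use the same monitor (this), so the two increment methods exclude each other.

```java
// Hypothetical counter guarded by the monitor of 'this'.
class SyncDemo {
    private int counter;

    // Method-level synchronization: flagged ACC_SYNCHRONIZED, no monitor instructions in the body.
    synchronized void incrementViaMethod() {
        counter++;
    }

    // Block-level synchronization: compiled to monitorenter/monitorexit around the block.
    void incrementViaBlock() {
        synchronized (this) {
            counter++;
        }
    }

    synchronized int get() {
        return counter;
    }

    // Runs several threads that call both increment methods and returns the final count.
    static int runDemo(int threadCount, int incrementsPerThread) {
        SyncDemo demo = new SyncDemo();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    demo.incrementViaMethod();
                    demo.incrementViaBlock();
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return demo.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000)); // 8000
    }
}
```

Running javap -c on the compiled class is one way to see the monitorenter and monitorexit instructions and the extra exception table entry that releases the monitor on the exceptional path.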
Public design and private implementation

My understanding of this section is that the class file format defines a public structure that is kept separate from the implementation of the underlying hardware, operating system, and virtual machine, which ensures that different virtual machines on different platforms all work with the same class bytecode file specification. Virtual machine implementers are also encouraged to optimize their internal implementations within the constraints of the specification. A virtual machine is mainly implemented in one of the following two ways:

1. Translate the input Java virtual machine code into the instruction set of another virtual machine when it is loaded or executed.
2. Translate the input Java virtual machine code into the native instruction set of the host CPU when it is loaded or executed (that is, JIT code generation).
