How to implement a Java Class parser

Original source: Tinylcy

I have recently been working on a personal project called ClassAnalyzer, which aims to give a deeper understanding of the design and structure of Java class files. The main framework and basic functionality are complete; some details will be filled in later. The JDK already provides the command-line tool javap for disassembling class files, but this article explains my own approach to implementing a parser.

class file

As the carrier of class or interface information, each class file contains the complete definition of exactly one class or interface. So that Java programs can be "written once, run anywhere", the Java Virtual Machine specification imposes strict rules on the class file format. The basic unit of a class file is the 8-bit byte, and there are no delimiters between items, so almost everything stored in the file is data the program needs in order to run. Data that cannot be represented in a single byte is stored in multiple contiguous bytes, in big-endian order (high-order byte first).
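As a minimal sketch of what "multiple contiguous bytes" means in practice, the helper below reads 1-, 2-, and 4-byte unsigned values from a byte array. The class name `UnsignedReader` is hypothetical and is not taken from ClassAnalyzer; the method names follow the u1/u2/u4 notation used by the specification. Java has no unsigned primitive types, so each value is widened and masked.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not part of ClassAnalyzer): reads big-endian unsigned values.
final class UnsignedReader {
    private final ByteBuffer buf;

    UnsignedReader(byte[] bytes) {
        // ByteBuffer is big-endian by default, which matches the class file byte order.
        this.buf = ByteBuffer.wrap(bytes);
    }

    // 1-byte unsigned value, widened to int so that 0xFF reads as 255 rather than -1.
    int u1() {
        return buf.get() & 0xFF;
    }

    // 2-byte unsigned value.
    int u2() {
        return buf.getShort() & 0xFFFF;
    }

    // 4-byte unsigned value, widened to long so the top bit is not treated as a sign.
    long u4() {
        return buf.getInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x34, (byte) 0xFF};
        UnsignedReader reader = new UnsignedReader(data);
        System.out.printf("u4 = 0x%X%n", reader.u4());
        System.out.println("u2 = " + reader.u2());
        System.out.println("u1 = " + reader.u1());
    }
}
```

Masking after the read is a design choice that keeps the sketch short; a parser could just as well wrap the values in dedicated types.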

According to the Java Virtual Machine specification, a class file stores data in a pseudo-structure similar to a C struct, with only two kinds of data: unsigned numbers and tables. The types u1, u2, u4, and u8 denote unsigned numbers of 1, 2, 4, and 8 bytes respectively (the specification itself defines only u1, u2, and u4); unsigned numbers are used to describe numeric values, index references, counts, or strings. A table is a composite data type made up of multiple unsigned numbers or other tables, and it is used to describe hierarchically structured data, so the entire class file is essentially one table. In ClassAnalyzer, byte, short, int, and long correspond to the u1, u2, u4, and u8 data types, and the class file is described by the following Java class.

    public class ClassFile {
        public U4 magic;                            // magic
        public U2 minorVersion;                     // minor_version
        public U2 majorVersion;                     // major_version
        public U2 constantPoolCount;                // constant_pool_count
        public ConstantPoolInfo[] cpInfo;           // cp_info
        public U2 accessFlags;                      // access_flags
        public U2 thisClass;                        // this_class
        public U2 superClass;                       // super_class
        public U2 interfacesCount;                  // interfaces_count
        public U2[] interfaces;                     // interfaces
        public U2 fieldsCount;                      // fields_count
        public FieldInfo[] fields;                  // fields
        public U2 methodsCount;                     // methods_count
        public MethodInfo[] methods;                // methods
        public U2 attributesCount;                  // attributes_count
        public BasicAttributeInfo[] attributes;     // attributes
    }

How to Parse

The individual items of a class file, such as the magic number, the class file version, the access flags, the class index, and the superclass index, occupy a fixed number of bytes in every class file, so parsing them only requires reading the corresponding number of bytes. Four parts need more flexible handling: the constant pool, the field table collection, the method table collection, and the attribute table collection. Fields and methods can have their own attributes, and the class itself has attributes too, so parsing the field and method tables also involves parsing attribute tables.
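The fixed-size items at the very start of the file can be read as in the following sketch, which covers only the magic number and the two version fields. The class and method names are hypothetical. The magic value 0xCAFEBABE comes from the Java Virtual Machine specification rather than from this article. The other fixed-size items (access flags, class index, superclass index) sit after the constant pool, so they can only be read this way once the pool has been consumed.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: reads the first 8 bytes of a class file.
final class HeaderSketch {
    static final int MAGIC = 0xCAFEBABE;

    // Returns {minor_version, major_version}; rejects input without the magic number.
    static int[] readVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        int magic = buf.getInt();                     // u4 magic
        if (magic != MAGIC) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF;          // u2 minor_version
        int major = buf.getShort() & 0xFFFF;          // u2 major_version
        return new int[] {minor, major};
    }

    public static void main(String[] args) {
        // Hand-written header bytes; major version 52 corresponds to Java 8.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        int[] version = readVersion(header);
        System.out.println("minor=" + version[0] + " major=" + version[1]);
    }
}
```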

The constant pool takes up a large portion of the class file and stores all constant information, including numeric and string constants, class names, interface names, field names, method names, and so on. The Java Virtual Machine specification defines a number of constant types, each with its own structure. The constant pool itself is a table, and there are a few things to watch for when parsing it.

    • Each constant type is identified by a tag of type u1.
    • The constant pool size given in the header (constant_pool_count) is 1 larger than the actual number of entries; for example, if constant_pool_count equals 47, the constant pool has 46 entries.
    • Constant pool indexes start at 1; for example, if constant_pool_count equals 47, the valid index range is 1 to 46. The designers left index 0 unused so that it can express "no reference to any constant pool item".
    • If a CONSTANT_Long_info or CONSTANT_Double_info structure is at index n in the constant pool, the next usable item is at index n+2; index n+1 must be valid but is considered unusable.
    • A CONSTANT_Utf8_info constant consists of a tag of type u1, a length of type u2, and bytes, an array of length items of type u1. These contiguous bytes hold a string encoded in MUTF-8 (Modified UTF-8). MUTF-8 is not compatible with standard UTF-8, and there are two main differences: first, the null character is encoded as 2 bytes (0xC0 and 0x80); second, supplementary characters are split into UTF-16 surrogate pairs and each surrogate is encoded separately. See the description of Modified UTF-8 for the details.
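The points in the list above can be sketched as one loop. This is an illustration under stated assumptions, not ClassAnalyzer's implementation: the class name is hypothetical, the input is assumed to start at constant_pool_count (offset 8 of a real class file), and the tag numbers come from the Java Virtual Machine specification. Entries that only hold indexes are stored as plain strings to keep the sketch short. `DataInputStream.readUTF` is used because it reads a 2-byte length followed by modified UTF-8, which matches the CONSTANT_Utf8_info layout.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of a constant pool walk.
final class ConstantPoolSketch {

    // Returns an array indexed like the constant pool. Slot 0 and the slot
    // after each long/double entry are left null because they are unusable.
    static Object[] parse(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readUnsignedShort();      // constant_pool_count
            Object[] pool = new Object[count];       // usable indexes: 1 .. count - 1
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();     // u1 tag identifies the constant type
                switch (tag) {
                    case 1:  pool[i] = in.readUTF(); break;          // Utf8: u2 length + MUTF-8 bytes
                    case 3:  pool[i] = in.readInt(); break;          // Integer
                    case 4:  pool[i] = in.readFloat(); break;        // Float
                    case 5:  pool[i] = in.readLong(); i++; break;    // Long occupies two slots
                    case 6:  pool[i] = in.readDouble(); i++; break;  // Double occupies two slots
                    case 7: case 8: case 16: case 19: case 20:       // one u2 index
                        pool[i] = "#" + in.readUnsignedShort();
                        break;
                    case 9: case 10: case 11: case 12: case 17: case 18: // two u2 indexes
                        pool[i] = "#" + in.readUnsignedShort() + ",#" + in.readUnsignedShort();
                        break;
                    case 15:                                         // MethodHandle: u1 kind + u2 index
                        pool[i] = in.readUnsignedByte() + ":#" + in.readUnsignedShort();
                        break;
                    default:
                        throw new IllegalArgumentException("unknown tag " + tag + " at index " + i);
                }
            }
            return pool;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // constant_pool_count = 5: #1 Utf8 "Hi", #2 Long 42 (also occupies #3), #4 Class -> #1
        byte[] data = {
            0x00, 0x05,
            0x01, 0x00, 0x02, 'H', 'i',
            0x05, 0, 0, 0, 0, 0, 0, 0, 0x2A,
            0x07, 0x00, 0x01
        };
        Object[] pool = parse(data);
        for (int i = 1; i < pool.length; i++) {
            System.out.println("#" + i + " = " + pool[i]);
        }
    }
}
```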

Attribute tables describe information specific to particular scenarios; the class file, the field table, and the method table each have their own attribute table collection. The Java Virtual Machine specification defines many attributes, and ClassAnalyzer currently implements parsing for the common ones. Unlike constants, attributes have no tag to identify their type. Instead, every attribute contains an attribute_name_index of type u2, which points to a CONSTANT_Utf8_info constant in the constant pool that holds the attribute's name. When parsing an attribute, ClassAnalyzer determines the attribute's type from the name stored in the constant that attribute_name_index points to.
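A minimal sketch of name-based dispatch might look like the following. It is not ClassAnalyzer's code: the class name, the `utf8Pool` array (a hypothetical table of already-decoded Utf8 constants indexed by constant pool index), and the returned description strings are all assumptions. The u4 attribute_length field that follows attribute_name_index is defined by the specification and makes it possible to skip attributes the parser does not understand.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: identifies an attribute by its name and skips unknown ones.
final class AttributeSketch {

    static String readAttribute(ByteBuffer buf, String[] utf8Pool) {
        int nameIndex = buf.getShort() & 0xFFFF;  // u2 attribute_name_index
        int length = buf.getInt();                // u4 attribute_length (assumed to fit in an int here)
        String name = utf8Pool[nameIndex];        // the name decides how the body is interpreted
        int end = buf.position() + length;

        String result;
        switch (name) {
            case "SourceFile":                    // body: u2 sourcefile_index
                result = "SourceFile=" + utf8Pool[buf.getShort() & 0xFFFF];
                break;
            case "ConstantValue":                 // body: u2 constantvalue_index
                result = "ConstantValue=#" + (buf.getShort() & 0xFFFF);
                break;
            default:                              // unknown attribute: rely on attribute_length
                result = name + " (" + length + " bytes skipped)";
        }
        buf.position(end);                        // always land on the next attribute
        return result;
    }

    public static void main(String[] args) {
        String[] utf8Pool = {null, "SourceFile", "Foo.java", "Deprecated"};
        byte[] data = {
            0x00, 0x01, 0, 0, 0, 2, 0x00, 0x02,   // SourceFile -> #2
            0x00, 0x03, 0, 0, 0, 0                // Deprecated, empty body
        };
        ByteBuffer buf = ByteBuffer.wrap(data);
        System.out.println(readAttribute(buf, utf8Pool));
        System.out.println(readAttribute(buf, utf8Pool));
    }
}
```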

The field table describes the variables declared in a class or interface, including class-level (static) variables and instance-level variables. The field table structure contains an access_flags of type u2, a name_index of type u2, a descriptor_index of type u2, an attributes_count of type u2, and attributes_count entries of type attribute_info in attributes. These attributes are parsed in the same way as the attribute table described above.
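The field structure described above can be read with a sketch like this one. The class `FieldInfoSketch` is hypothetical and is not ClassAnalyzer's FieldInfo class. To stay short it does not interpret the field's attributes; it only steps over each one using the u4 attribute_length that the specification places after every attribute_name_index.

```java
import java.nio.ByteBuffer;

// Hypothetical holder for one field_info entry.
final class FieldInfoSketch {
    final int accessFlags;
    final int nameIndex;
    final int descriptorIndex;
    final int attributesCount;

    private FieldInfoSketch(int accessFlags, int nameIndex, int descriptorIndex, int attributesCount) {
        this.accessFlags = accessFlags;
        this.nameIndex = nameIndex;
        this.descriptorIndex = descriptorIndex;
        this.attributesCount = attributesCount;
    }

    // Reads one field_info entry and leaves the buffer positioned at the next entry.
    static FieldInfoSketch read(ByteBuffer buf) {
        int accessFlags = buf.getShort() & 0xFFFF;      // u2 access_flags
        int nameIndex = buf.getShort() & 0xFFFF;        // u2 name_index
        int descriptorIndex = buf.getShort() & 0xFFFF;  // u2 descriptor_index
        int attributesCount = buf.getShort() & 0xFFFF;  // u2 attributes_count
        for (int i = 0; i < attributesCount; i++) {
            buf.getShort();                             // attribute_name_index (ignored in this sketch)
            int length = buf.getInt();                  // attribute_length
            buf.position(buf.position() + length);      // skip the attribute body
        }
        return new FieldInfoSketch(accessFlags, nameIndex, descriptorIndex, attributesCount);
    }

    public static void main(String[] args) {
        // access_flags 0x000A = ACC_PRIVATE (0x0002) | ACC_STATIC (0x0008), one 2-byte attribute
        byte[] data = {0x00, 0x0A, 0x00, 0x05, 0x00, 0x06, 0x00, 0x01,
                       0x00, 0x07, 0, 0, 0, 2, 0x00, 0x08};
        FieldInfoSketch field = read(ByteBuffer.wrap(data));
        System.out.printf("flags=0x%04X name=#%d descriptor=#%d attributes=%d%n",
                field.accessFlags, field.nameIndex, field.descriptorIndex, field.attributesCount);
    }
}
```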

The class file's method table uses the same storage format as the field table; only the meaning of access_flags differs. The method table contains one important attribute: the Code attribute. The Code attribute stores the bytecode instructions compiled from the Java source, and in ClassAnalyzer the Java class corresponding to Code is shown below (only the fields are listed).

    public class Code extends BasicAttributeInfo {
        private short maxStack;
        private short maxLocals;
        private long codeLength;
        private byte[] code;
        private short exceptionTableLength;
        private ExceptionInfo[] exceptionTable;
        private short attributesCount;
        private BasicAttributeInfo[] attributes;
        ...
        private class ExceptionInfo {
            public short startPc;
            public short endPc;
            public short handlerPc;
            public short catchType;
            ...
        }
    }

In the Code attribute, code_length and code (codeLength and code in the class above) store the bytecode length and the bytecode itself; each instruction's opcode is one byte (type u1). When the virtual machine executes a method, it reads the bytes in code and interprets them as the corresponding instructions. In addition, although code_length is a value of type u4, the specification requires it to be less than 65,536, so a method's bytecode cannot exceed 65,535 bytes.
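The following sketch reads the body of a Code attribute and enforces the length limit. It assumes the buffer is already positioned just past the attribute's attribute_name_index and attribute_length, and the class name is hypothetical. The exception table entries (four u2 values each, matching the ExceptionInfo fields) are skipped rather than decoded, and nested attributes are left unread.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: extracts the raw bytecode from a Code attribute body.
final class CodeAttributeSketch {

    static byte[] readCode(ByteBuffer buf) {
        buf.getShort();                                   // u2 max_stack (ignored in this sketch)
        buf.getShort();                                   // u2 max_locals (ignored in this sketch)
        long codeLength = buf.getInt() & 0xFFFFFFFFL;     // u4 code_length, read as unsigned
        if (codeLength == 0 || codeLength >= 65536) {
            // The specification requires 0 < code_length < 65536.
            throw new IllegalArgumentException("invalid code_length: " + codeLength);
        }
        byte[] code = new byte[(int) codeLength];
        buf.get(code);                                    // the bytecode itself

        int exceptionTableLength = buf.getShort() & 0xFFFF;
        // Each entry is start_pc, end_pc, handler_pc, catch_type: 4 x u2 = 8 bytes.
        buf.position(buf.position() + exceptionTableLength * 8);
        return code;
    }

    public static void main(String[] args) {
        // Body of a typical default constructor: aload_0, invokespecial #1, return
        byte[] body = {
            0x00, 0x01,                                   // max_stack
            0x00, 0x01,                                   // max_locals
            0, 0, 0, 5,                                   // code_length
            0x2A, (byte) 0xB7, 0x00, 0x01, (byte) 0xB1,   // code
            0x00, 0x00,                                   // exception_table_length
            0x00, 0x00                                    // attributes_count
        };
        byte[] code = readCode(ByteBuffer.wrap(body));
        StringBuilder hex = new StringBuilder();
        for (byte b : code) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```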

Code implementation

ClassAnalyzer's source code is available on GitHub. In the project's README, I take one class file as an example and analyze it byte by byte; I hope it helps with understanding.

Reference

Deep understanding of Java virtual machines

From:http://www.importnew.com/25302.html
