How to Implement a Java Class File Byte Parser (Golang Version)

Source: Internet
Author: User

Recently I have been working on a personal project named SmallVM. SmallVM aims to deepen my knowledge and understanding of Java and the Java Virtual Machine by implementing a lightweight virtual machine. When the Java Virtual Machine loads a class, it has to parse the Class file. I previously implemented a standalone Java version of a Class byte parser, ClassAnalyzer. Compared with that Java version, the new (Golang) version is more robust and more clearly structured. This article describes how I approached implementing a Class byte parser.

Class File

As the carrier of class or interface information, each Class file fully defines one class or interface. To let Java programs "write once, run anywhere", the Java Virtual Machine Specification imposes strict rules on the Class file format. The basic unit of data in a Class file is the byte, and there are no delimiters between data items, so almost everything stored in the file is data the program needs to run. Data that cannot fit in a single byte is stored in multiple contiguous bytes, in big-endian order.

According to the Java Virtual Machine Specification, a Class file stores data in a pseudo-structure similar to a C struct, which has only two data types: unsigned numbers and tables. Unsigned numbers are written u1, u2, u4, and u8 for 1, 2, 4, and 8 bytes respectively (the specification itself uses only u1, u2, and u4; 8-byte constants such as longs are stored as two u4 items), and they describe numbers, index references, counts, or the bytes of a string. A table is a composite data type made up of unsigned numbers or other tables as data items. Tables describe hierarchically structured data, so an entire Class file is essentially one table. In ClassAnalyzer, U1, U2, U4, and U8 correspond to byte, short, int, and long, and the Class file is described by the following Java class.

    public class ClassFile {
        public U4 magic;                          // magic
        public U2 minorVersion;                   // minor_version
        public U2 majorVersion;                   // major_version
        public U2 constantPoolCount;              // constant_pool_count
        public ConstantPoolInfo[] cpInfo;         // cp_info
        public U2 accessFlags;                    // access_flags
        public U2 thisClass;                      // this_class
        public U2 superClass;                     // super_class
        public U2 interfacesCount;                // interfaces_count
        public U2[] interfaces;                   // interfaces
        public U2 fieldsCount;                    // fields_count
        public FieldInfo[] fields;                // fields
        public U2 methodsCount;                   // methods_count
        public MethodInfo[] methods;              // methods
        public U2 attributesCount;                // attributes_count
        public BasicAttributeInfo[] attributes;   // attributes
    }
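Go, unlike Java, has native unsigned integer types, so the four unsigned number types can be read directly. The following is a minimal sketch, not SmallVM's actual code: ClassReader and its ReadU1 to ReadU8 methods are hypothetical names, and the only Class file fact it relies on is that multi-byte items are stored big-endian.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errShortRead is returned when fewer bytes remain than an item needs.
var errShortRead = errors.New("classfile: unexpected end of data")

// ClassReader is a hypothetical cursor over the raw bytes of a Class file.
// All multi-byte items are big-endian, so binary.BigEndian assembles them.
type ClassReader struct {
	data []byte
	pos  int
}

// take returns the next n bytes and advances the cursor.
func (r *ClassReader) take(n int) ([]byte, error) {
	if r.pos+n > len(r.data) {
		return nil, errShortRead
	}
	b := r.data[r.pos : r.pos+n]
	r.pos += n
	return b, nil
}

// ReadU1 reads a 1-byte unsigned item.
func (r *ClassReader) ReadU1() (uint8, error) {
	b, err := r.take(1)
	if err != nil {
		return 0, err
	}
	return b[0], nil
}

// ReadU2 reads a 2-byte big-endian unsigned item.
func (r *ClassReader) ReadU2() (uint16, error) {
	b, err := r.take(2)
	if err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint16(b), nil
}

// ReadU4 reads a 4-byte big-endian unsigned item.
func (r *ClassReader) ReadU4() (uint32, error) {
	b, err := r.take(4)
	if err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint32(b), nil
}

// ReadU8 reads an 8-byte big-endian unsigned item.
func (r *ClassReader) ReadU8() (uint64, error) {
	b, err := r.take(8)
	if err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint64(b), nil
}

func main() {
	// magic, minor_version 0, major_version 52 (Java 8)
	r := &ClassReader{data: []byte{0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00, 0x00, 0x34}}
	magic, _ := r.ReadU4()
	minor, _ := r.ReadU2()
	major, _ := r.ReadU2()
	fmt.Printf("magic=%#x version=%d.%d\n", magic, major, minor) // magic=0xcafebabe version=52.0
}
```

Java's byte, short, int, and long are all signed, so a Java parser has to mask values (a u2 above 32767 read into a short would otherwise come out negative); in Go, uint16 and the other unsigned types hold the full range directly.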

How to Parse

Each of the simple data items in a Class file, such as the magic number, the version numbers, the access flags, the class index, and the superclass index, occupies a fixed number of bytes, so parsing it only requires reading that many bytes. Four parts need more flexible handling: the constant pool, the field table collection, the method table collection, and the attribute table collection. Fields and methods can carry their own attributes, so parsing the field and method table collections also involves parsing attribute tables.
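The fixed-size items at the start of the file can be read in one step. This is a minimal sketch under the assumption that the input begins at offset 0 of a Class file; classHeader and parseHeader are hypothetical names, and encoding/binary fills the struct with magic, minor_version, and major_version in declaration order.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// classHeader mirrors the first three fixed-size items of a Class file.
// Fields must be exported so that binary.Read can set them.
type classHeader struct {
	Magic        uint32 // u4 magic
	MinorVersion uint16 // u2 minor_version
	MajorVersion uint16 // u2 major_version
}

const classMagic = 0xCAFEBABE

// parseHeader decodes the header from the start of a Class file and
// rejects input whose magic number is wrong.
func parseHeader(raw []byte) (classHeader, error) {
	var h classHeader
	if err := binary.Read(bytes.NewReader(raw), binary.BigEndian, &h); err != nil {
		return h, err // io.EOF or io.ErrUnexpectedEOF on short input
	}
	if h.Magic != classMagic {
		return h, fmt.Errorf("not a Class file: magic %#x", h.Magic)
	}
	return h, nil
}

func main() {
	h, err := parseHeader([]byte{0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00, 0x00, 0x34})
	if err != nil {
		panic(err)
	}
	fmt.Printf("class file version %d.%d\n", h.MajorVersion, h.MinorVersion)
}
```

A struct-based read like this only works for the fixed-layout prefix; from constant_pool_count onward, the lengths depend on the data itself and must be parsed item by item.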

The constant pool takes up a large portion of a Class file's data and stores all constant information, including numeric and string constants, class names, interface names, field names, and method names. The Java Virtual Machine Specification defines a number of constant types, each with its own structure. The constant pool is itself a table, and there are several things to watch out for when parsing it:

  • Each constant type is identified by a u1 tag.

  • The constantPoolCount given in the header is one larger than the number of pool slots actually present: if constantPoolCount equals 47, the constant pool holds 46 slots.

  • Constant pool indexes start at 1: if constantPoolCount equals 47, valid indexes range from 1 to 46. The designers reserved index 0 to express "no reference to any constant pool item".

  • If a CONSTANT_Long_info or CONSTANT_Double_info structure is the item at index n in the constant pool, the next valid item is at index n+2. Index n+1 is valid but must be considered unusable.

  • A CONSTANT_Utf8_info constant consists of a u1 tag, a u2 length, and length bytes of type u1; these length contiguous bytes form a string encoded in MUTF-8 (Modified UTF-8). MUTF-8 is not compatible with standard UTF-8, and there are two main differences: first, the null character is encoded as 2 bytes (0xC0 0x80) instead of a single zero byte; second, a supplementary character is first split into a UTF-16 surrogate pair, and each surrogate is encoded separately. For the details, see the description of Modified UTF-8 among the UTF-8 variants.
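The rules listed above can be combined into a single parsing loop. The following is a minimal sketch, not SmallVM's code: it supports only the tags defined before Java 7 (MethodHandle, MethodType, InvokeDynamic, and later tags are rejected), keeps each entry's payload as raw bytes, and decodes Utf8 entries with a small MUTF-8 decoder that does not validate continuation bytes. Constant, parseConstantPool, and decodeMUTF8 are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// Constant pool tags from the JVM specification (pre-Java 7 subset only).
const (
	tagUtf8, tagInteger, tagFloat, tagLong, tagDouble = 1, 3, 4, 5, 6
	tagClass, tagString, tagFieldref, tagMethodref     = 7, 8, 9, 10
	tagInterfaceMethodref, tagNameAndType              = 11, 12
)

// Constant is a simplified pool entry: the tag, the raw payload that
// follows it, and (for Utf8 entries only) the decoded text.
type Constant struct {
	Tag  uint8
	Raw  []byte
	Text string
}

// decodeMUTF8 maps each 1-3 byte sequence to one UTF-16 code unit and lets
// utf16.Decode join surrogate pairs. Continuation bytes are not validated.
func decodeMUTF8(b []byte) (string, error) {
	units := make([]uint16, 0, len(b))
	for i := 0; i < len(b); {
		c := b[i]
		switch {
		case c < 0x80:
			units = append(units, uint16(c))
			i++
		case c&0xE0 == 0xC0 && i+1 < len(b): // also covers NUL as 0xC0 0x80
			units = append(units, uint16(c&0x1F)<<6|uint16(b[i+1]&0x3F))
			i += 2
		case c&0xF0 == 0xE0 && i+2 < len(b): // also covers each surrogate half
			units = append(units, uint16(c&0x0F)<<12|uint16(b[i+1]&0x3F)<<6|uint16(b[i+2]&0x3F))
			i += 3
		default:
			return "", fmt.Errorf("invalid MUTF-8 byte %#x at offset %d", c, i)
		}
	}
	return string(utf16.Decode(units)), nil
}

// parseConstantPool reads the count-1 slots after constant_pool_count. Slot 0
// stays nil; Long/Double fill two slots and the second stays nil.
func parseConstantPool(data []byte, count uint16) ([]*Constant, int, error) {
	pool, pos := make([]*Constant, count), 0
	for i := 1; i < int(count); i++ {
		if pos+3 > len(data) { // every entry is at least a tag plus 2 bytes
			return nil, 0, errors.New("truncated constant pool")
		}
		tag, size := data[pos], 0 // size counts the payload bytes after the tag
		switch tag {
		case tagUtf8:
			size = 2 + int(binary.BigEndian.Uint16(data[pos+1:]))
		case tagClass, tagString:
			size = 2
		case tagInteger, tagFloat, tagFieldref, tagMethodref, tagInterfaceMethodref, tagNameAndType:
			size = 4
		case tagLong, tagDouble:
			size = 8
		default:
			return nil, 0, fmt.Errorf("unsupported tag %d at index %d", tag, i)
		}
		if pos+1+size > len(data) {
			return nil, 0, fmt.Errorf("truncated constant at index %d", i)
		}
		c := &Constant{Tag: tag, Raw: data[pos+1 : pos+1+size]}
		if tag == tagUtf8 {
			text, err := decodeMUTF8(c.Raw[2:])
			if err != nil {
				return nil, 0, err
			}
			c.Text = text
		}
		pool[i] = c
		pos += 1 + size
		if tag == tagLong || tag == tagDouble {
			i++ // index n+1 is valid but unusable
		}
	}
	return pool, pos, nil
}

func main() {
	// #1 Utf8 "Hi"+NUL (0xC0 0x80), #2 Long 42 (also fills #3), #4 Class -> #1
	data := []byte{1, 0, 4, 'H', 'i', 0xC0, 0x80, 5, 0, 0, 0, 0, 0, 0, 0, 42, 7, 0, 1}
	pool, n, err := parseConstantPool(data, 5)
	if err != nil {
		panic(err)
	}
	fmt.Printf("consumed %d bytes, #1=%q, #3 empty=%v\n", n, pool[1].Text, pool[3] == nil)
}
```

Decoding every sequence to UTF-16 code units first handles both MUTF-8 differences at once: 0xC0 0x80 becomes U+0000, and a supplementary character arrives as two 3-byte surrogates that utf16.Decode recombines.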

Attribute tables describe information specific to certain scenarios, and the Class file, field tables, and method tables each have their own collection of attribute tables. The Java Virtual Machine Specification defines many attributes, and SmallVM currently parses the common ones. Unlike constant pool items, attributes have no tag identifying their type. Instead, every attribute begins with a u2 attribute_name_index that points to a CONSTANT_Utf8_info constant in the constant pool holding the attribute's name. When parsing an attribute, SmallVM determines its type from the name that attribute_name_index refers to.
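A minimal sketch of this name-based dispatch, assuming the constant pool has already been reduced to a map from index to Utf8 text (the utf8 parameter is a hypothetical simplification, as are AttributeInfo, parseAttributes, and describe). Every attribute also carries a u4 attribute_length after its name index, which is what lets a parser keep or skip an attribute it does not recognize.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// AttributeInfo is the generic attribute layout: u2 attribute_name_index,
// u4 attribute_length, then attribute_length bytes of info.
type AttributeInfo struct {
	Name string // resolved via attribute_name_index
	Info []byte
}

// parseAttributes reads count attributes. utf8 is a hypothetical stand-in for
// the constant pool, mapping an index to the text of its CONSTANT_Utf8_info.
func parseAttributes(data []byte, count uint16, utf8 map[uint16]string) ([]AttributeInfo, int, error) {
	attrs := make([]AttributeInfo, 0, count)
	pos := 0
	for i := 0; i < int(count); i++ {
		if pos+6 > len(data) {
			return nil, 0, errors.New("truncated attribute header")
		}
		nameIndex := binary.BigEndian.Uint16(data[pos:])
		length := int(binary.BigEndian.Uint32(data[pos+2:]))
		pos += 6
		if length < 0 || pos+length > len(data) {
			return nil, 0, fmt.Errorf("truncated attribute %d", i)
		}
		name, ok := utf8[nameIndex]
		if !ok {
			return nil, 0, fmt.Errorf("attribute_name_index %d is not a Utf8 constant", nameIndex)
		}
		attrs = append(attrs, AttributeInfo{Name: name, Info: data[pos : pos+length]})
		pos += length
	}
	return attrs, pos, nil
}

// describe picks a decoder by attribute name. Only SourceFile and
// ConstantValue (each a single u2 index) are decoded here; others stay raw.
func describe(a AttributeInfo) string {
	if (a.Name == "SourceFile" || a.Name == "ConstantValue") && len(a.Info) == 2 {
		return fmt.Sprintf("%s -> #%d", a.Name, binary.BigEndian.Uint16(a.Info))
	}
	return fmt.Sprintf("%s (%d bytes kept raw)", a.Name, len(a.Info))
}

func main() {
	utf8 := map[uint16]string{10: "SourceFile", 11: "Deprecated"}
	data := []byte{
		0, 10, 0, 0, 0, 2, 0, 12, // SourceFile, length 2, sourcefile_index #12
		0, 11, 0, 0, 0, 0, // Deprecated, length 0
	}
	attrs, _, err := parseAttributes(data, 2, utf8)
	if err != nil {
		panic(err)
	}
	for _, a := range attrs {
		fmt.Println(describe(a))
	}
}
```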

A field table describes the variables declared in a class or interface, including class-level (static) variables and instance variables. Its structure consists of a u2 access_flags, a u2 name_index, a u2 descriptor_index, a u2 attributes_count, and attributes_count attribute_info entries in attributes. Since attribute table parsing has already been covered, the attributes are parsed in exactly the same way.
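Because field tables (and method tables, as the next paragraph notes) share one layout, a single routine can read either collection. This is a minimal sketch assuming the input starts at fields_count or methods_count; MemberInfo, RawAttribute, cursor, and parseMembers are hypothetical names, attributes are left undecoded, and the cursor keeps a sticky error so the loop stays readable.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const accStatic = 0x0008 // ACC_STATIC: marks a class-level (static) field

// RawAttribute keeps an attribute undecoded; resolving NameIndex through the
// constant pool would select the concrete decoder.
type RawAttribute struct {
	NameIndex uint16
	Info      []byte
}

// MemberInfo covers field_info and method_info alike: both share this layout.
type MemberInfo struct {
	AccessFlags     uint16
	NameIndex       uint16
	DescriptorIndex uint16
	Attributes      []RawAttribute
}

// cursor reads big-endian values; after the first error every read
// returns zero values and err keeps that first error.
type cursor struct {
	data []byte
	pos  int
	err  error
}

func (c *cursor) bytes(n int) []byte {
	if c.err == nil && (n < 0 || c.pos+n > len(c.data)) {
		c.err = errors.New("truncated member table")
	}
	if c.err != nil {
		return nil
	}
	b := c.data[c.pos : c.pos+n]
	c.pos += n
	return b
}

func (c *cursor) u2() uint16 {
	if b := c.bytes(2); b != nil {
		return binary.BigEndian.Uint16(b)
	}
	return 0
}

func (c *cursor) u4() uint32 {
	if b := c.bytes(4); b != nil {
		return binary.BigEndian.Uint32(b)
	}
	return 0
}

// parseMembers reads a u2 count followed by that many member entries and
// returns the members plus the number of bytes consumed.
func parseMembers(data []byte) ([]MemberInfo, int, error) {
	c := &cursor{data: data}
	count := c.u2()
	members := make([]MemberInfo, 0, count)
	for i := 0; i < int(count) && c.err == nil; i++ {
		var m MemberInfo
		m.AccessFlags = c.u2()
		m.NameIndex = c.u2()
		m.DescriptorIndex = c.u2()
		attrCount := c.u2()
		for j := 0; j < int(attrCount) && c.err == nil; j++ {
			nameIndex := c.u2()
			length := c.u4()
			m.Attributes = append(m.Attributes, RawAttribute{NameIndex: nameIndex, Info: c.bytes(int(length))})
		}
		members = append(members, m)
	}
	if c.err != nil {
		return nil, 0, c.err
	}
	return members, c.pos, nil
}

func main() {
	// fields_count=1: a private static field, name #5, descriptor #6, with one
	// attribute (name #7, 2-byte body such as a ConstantValue index #8).
	data := []byte{0, 1, 0x00, 0x0A, 0, 5, 0, 6, 0, 1, 0, 7, 0, 0, 0, 2, 0, 8}
	fields, n, err := parseMembers(data)
	if err != nil {
		panic(err)
	}
	f := fields[0]
	fmt.Printf("read %d bytes; static=%v name=#%d attributes=%d\n",
		n, f.AccessFlags&accStatic != 0, f.NameIndex, len(f.Attributes))
}
```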

Method tables in a Class file use the same storage format as field tables, but the meanings of access_flags differ. Method tables carry an important attribute: the Code attribute, which stores the bytecode instructions compiled from Java code. In SmallVM, the Code attribute corresponds to the following Go structs (only the fields are listed).

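    type Code struct {
        pool                 []constantpool.ConstantInfo // constant pool of the enclosing Class file
        attributeNameIndex   uint16                      // u2 attribute_name_index
        attributeLength      uint32                      // u4 attribute_length
        maxStack             uint16                      // u2 max_stack
        maxLocals            uint16                      // u2 max_locals
        codeLength           uint32                      // u4 code_length
        code                 []byte                      // u1 code[code_length]
        exceptionTableLength uint16                      // u2 exception_table_length
        exceptionTable       []ExceptionInfo
        attributesCount      uint16                      // u2 attributes_count
        attributes           []AttributeInfo
    }

    // ExceptionInfo is one exception_table entry.
    type ExceptionInfo struct {
        startPc   uint16
        endPc     uint16
        handlerPc uint16
        catchType uint16
    }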

In the Code attribute, codeLength and code hold the bytecode length and the bytecode instructions respectively; each opcode occupies one byte (u1) and may be followed by operand bytes. When the virtual machine executes a method, it reads the bytecode in code and translates each opcode into the corresponding operation. Also note that although codeLength is a u4 value, a method may in fact contain at most 65535 bytes of bytecode.
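A minimal sketch of decoding a Code attribute body (the bytes that follow attribute_name_index and attribute_length) and enforcing the code_length limit, which the specification states as greater than zero and less than 65536. CodeBody and parseCodeBody are hypothetical names; unlike SmallVM's Code struct, nested attributes are left as raw bytes.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// ExceptionInfo mirrors one exception_table entry.
type ExceptionInfo struct {
	StartPc, EndPc, HandlerPc, CatchType uint16
}

// CodeBody holds a decoded Code attribute body, i.e. the bytes after
// attribute_name_index and attribute_length. Nested attributes stay raw.
type CodeBody struct {
	MaxStack, MaxLocals uint16
	Code                []byte
	ExceptionTable      []ExceptionInfo
	Rest                []byte // attributes_count and attributes, undecoded
}

func parseCodeBody(info []byte) (*CodeBody, error) {
	be := binary.BigEndian
	if len(info) < 8 {
		return nil, errors.New("classfile: Code attribute too short")
	}
	body := &CodeBody{MaxStack: be.Uint16(info[0:]), MaxLocals: be.Uint16(info[2:])}
	// code_length is a u4, but the specification requires 0 < code_length < 65536.
	codeLength := be.Uint32(info[4:])
	if codeLength == 0 || codeLength > 65535 {
		return nil, fmt.Errorf("classfile: invalid code_length %d", codeLength)
	}
	pos := 8 + int(codeLength)
	if pos+2 > len(info) {
		return nil, errors.New("classfile: truncated bytecode")
	}
	body.Code = info[8:pos]
	n := int(be.Uint16(info[pos:])) // exception_table_length
	pos += 2
	if pos+8*n > len(info) {
		return nil, errors.New("classfile: truncated exception table")
	}
	for i := 0; i < n; i++ {
		e := info[pos+8*i:]
		body.ExceptionTable = append(body.ExceptionTable, ExceptionInfo{
			StartPc: be.Uint16(e[0:]), EndPc: be.Uint16(e[2:]),
			HandlerPc: be.Uint16(e[4:]), CatchType: be.Uint16(e[6:]),
		})
	}
	body.Rest = info[pos+8*n:]
	return body, nil
}

func main() {
	// Body for an instance method `int one() { return 1; }`: max_stack=1,
	// max_locals=1 (this), code = iconst_1 (0x04), ireturn (0xac).
	info := []byte{0, 1, 0, 1, 0, 0, 0, 2, 0x04, 0xAC, 0, 0, 0, 0}
	body, err := parseCodeBody(info)
	if err != nil {
		panic(err)
	}
	fmt.Printf("max_stack=%d max_locals=%d code=% x\n", body.MaxStack, body.MaxLocals, body.Code)
}
```

Rejecting an out-of-range code_length up front keeps a malformed file from making the parser slice gigabytes of "bytecode" out of a 32-bit length field.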

Code implementation

The source code of the entire Class byte parser is on GitHub. The parser is just one small module of SmallVM, located in the src/classfile directory. You can also refer to ClassAnalyzer's README, where I take one Class file as an example and analyze it byte by byte; I hope it helps your understanding.
