Javac Compiler Principles

Source: Internet
Author: User

    1. What is Javac?
Javac is a compiler, that is, a program that translates one language specification into another. C and C++ compilers, for example, translate source code directly into target machine code, which the CPU executes through its instruction set. That instruction set is the lowest-level language specification, the one the machine can recognize directly, but people cannot practically write target machine code by hand.
In a sense, compilers made the boom in programming languages possible, because a compiler is the link between human and machine communication. The Javac compiler likewise compiles Java, a language that is friendly to programmers, into a language that is friendly to all machines.
Javac's task is to compile Java code into Java bytecode, the binary format that the JVM can recognize. On the surface this turns a .java file into a .class file; in effect it translates Java source code into binary data.

    2. Javac Basic Structure
Source code -> lexical analyzer -> token stream -> parser -> syntax tree -> semantic analyzer -> annotated syntax tree -> code generator -> bytecode
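
The pipeline above can be observed from the outside through the JDK's public compiler API. The following is a minimal sketch, assuming a full JDK (not a bare JRE) is running the code. PipelineDemo, StringSource, and runPhases are hypothetical names chosen for this illustration; JavacTask.parse(), analyze(), and generate() are the real entry points, and each one covers several of the internal stages described in this article.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;

import javax.lang.model.element.Element;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class PipelineDemo {

    /** Keeps the source text in memory so that no .java file is needed on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Runs the three coarse phases in order and records one line per result. */
    static List<String> runPhases(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        List<String> log = new ArrayList<>();
        try {
            // Class files go to a temporary directory (an assumption of this sketch).
            String outDir = Files.createTempDirectory("javac-demo").toString();
            JavacTask task = (JavacTask) compiler.getTask(
                    null, null, null, List.of("-d", outDir), null,
                    List.of(new StringSource(className, code)));

            // Lexical and syntax analysis: source text -> syntax trees.
            for (CompilationUnitTree unit : task.parse()) {
                log.add("parsed: " + unit.getTypeDecls().size() + " type declaration(s)");
            }
            // Semantic analysis: symbol entry, attribution, flow analysis.
            for (Element element : task.analyze()) {
                log.add("analyzed: " + element.getSimpleName());
            }
            // Code generation: annotated trees -> class files.
            for (JavaFileObject file : task.generate()) {
                log.add("generated: " + file.getKind());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        runPhases("Hello", "class Hello { int one() { return 1; } }")
                .forEach(System.out::println);
    }
}
```

The calls must be made in this order; each phase consumes the result of the previous one.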

1) Lexical analysis

The lexer reads the source code one character at a time and works out which character sequences are keywords defined by the syntax, such as if, else, for, and while in Java. It must decide which occurrence of "if" is a legitimate keyword and which is not.

The result is a normalized token stream extracted from the source code. It is like being handed a sentence in a human language and picking out what is a word and what is punctuation, which words are verbs, which are nouns, and so on.

Scanner is responsible for the actual reading and for classifying the different lexical elements, determining which character combinations form a token. JavacParser defines which token sequences conform to the Java Language Specification: package declarations, import declarations, class definitions, field definitions, method definitions, variable definitions, expressions, and so on, many of which end with a semicolon.
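
To make the idea of a token stream concrete, here is a toy lexer. It is a sketch for illustration only and is not how javac's Scanner is written; MiniLexer, its Kind enum, and the tiny keyword set are all assumptions of this example. It shows the keyword decision described above: "if" is a keyword, while "iffy" merely starts with the same letters and is an identifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class MiniLexer {

    enum Kind { KEYWORD, IDENTIFIER, NUMBER, SYMBOL }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    // A deliberately small subset of Java's keywords.
    private static final Set<String> KEYWORDS = Set.of("if", "else", "for", "while", "return");

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                      // whitespace only separates tokens
                continue;
            }
            int start = i;
            if (Character.isJavaIdentifierStart(c)) {
                // Consume the whole word first, then decide keyword vs. identifier.
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                String word = src.substring(start, i);
                tokens.add(new Token(KEYWORDS.contains(word) ? Kind.KEYWORD : Kind.IDENTIFIER, word));
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.NUMBER, src.substring(start, i)));
            } else {
                i++;                      // every other character is a one-character symbol
                tokens.add(new Token(Kind.SYMBOL, String.valueOf(c)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if (iffy > 10) x = 1;"));
    }
}
```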


2) Syntax analysis

The parser processes the token stream and checks whether the way the tokens are combined conforms to the Java language specification, for example whether an if is followed by a boolean expression. It is like checking in a human language whether a sentence has a subject, a predicate, and an object, and whether they are combined grammatically.

The result is an abstract syntax tree that conforms to the Java language specification. An abstract syntax tree is a structured representation of the program that organizes the main lexical elements of the language into a tree. Later phases can reorganize this tree according to new rules.

In other words, the token stream becomes a more structured syntax tree: words are assembled into sentences, that is, complete statements, and the words in each sentence are further distinguished as subject, predicate, object, or attribute. The Java syntax tree makes Java source code more structured. Each syntax tree node is an instance of com.sun.tools.javac.tree.JCTree, and:

① Each syntax node implements an interface XxxTree that extends the com.sun.source.tree.Tree interface. For example, an IfTree node represents an if statement and a BinaryTree node represents a binary operation expression.

② Each syntax node is a subclass of com.sun.tools.javac.tree.JCTree that implements the corresponding XxxTree interface. The class is named JCXxx: JCIf implements IfTree, JCBinary implements BinaryTree, and so on.

③ All JCXxx classes are static nested classes of JCTree.
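
The relationship between the XxxTree interfaces and the JCXxx classes can be checked with the public tree API. The sketch below assumes a full JDK is available; TreeKindsDemo, StringSource, and nodeKinds are hypothetical names for this illustration. It parses one small class and records, for each if statement and binary expression, the public node kind together with the simple name of the class that implements it.

```java
import com.sun.source.tree.BinaryTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IfTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class TreeKindsDemo {

    /** In-memory source file. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses the source and lists the if and binary nodes in visiting order. */
    static List<String> nodeKinds(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, code)));

        List<String> kinds = new ArrayList<>();
        TreeScanner<Void, Void> scanner = new TreeScanner<Void, Void>() {
            @Override
            public Void visitIf(IfTree node, Void p) {
                kinds.add(node.getKind() + " implemented by " + node.getClass().getSimpleName());
                return super.visitIf(node, p);       // keep descending into children
            }

            @Override
            public Void visitBinary(BinaryTree node, Void p) {
                kinds.add(node.getKind() + " implemented by " + node.getClass().getSimpleName());
                return super.visitBinary(node, p);
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kinds;
    }

    public static void main(String[] args) {
        String code = "class A { int f(int x) { if (x > 1) return x + 1; return 0; } }";
        nodeKinds("A", code).forEach(System.out::println);
    }
}
```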

3) Semantic Analysis

Semantic analysis turns difficult, complicated syntax into simpler syntax. It is similar to rendering hard-to-understand classical Chinese into the vernacular, or annotating idioms, so that people can understand the text more easily.

Complex syntax is translated into the simplest equivalent syntax. In Java this includes turning a foreach loop into a plain for loop and processing annotations. The result is an annotated abstract syntax tree that is closer to the grammar rules of the target language.
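
As an illustration of the foreach translation, the sketch below writes each loop twice: once with the enhanced for statement and once in the expanded form that the Java Language Specification defines as its meaning. The hand-written expansions are approximations for the reader and are not javac's literal output; the class and method names are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

public class ForEachDesugar {

    // Enhanced for over an Iterable, as the programmer writes it.
    static int sumSugared(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Roughly what the loop above means: an explicit Iterator plus unboxing.
    static int sumDesugared(List<Integer> values) {
        int sum = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            int v = it.next();   // Integer -> int unboxing happens here
            sum += v;
        }
        return sum;
    }

    // Enhanced for over an array.
    static int sumArraySugared(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Roughly what the array loop means: an index-based for loop.
    static int sumArrayDesugared(int[] values) {
        int sum = 0;
        int[] array = values;
        for (int i = 0; i < array.length; i++) {
            int v = array[i];
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumSugared(List.of(1, 2, 3)) == sumDesugared(List.of(1, 2, 3)));
        System.out.println(sumArraySugared(new int[] {4, 5}) == sumArrayDesugared(new int[] {4, 5}));
    }
}
```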

com.sun.tools.javac.comp.Enter: construction of the symbol table

1) Enter the symbols declared in the Java classes into the symbol table

1) Add a default constructor to a class that does not declare one

com.sun.tools.javac.processing.JavacProcessingEnvironment: annotation processing

2) Process annotations

com.sun.tools.javac.comp.Attr: attribution (annotating the tree with types) and semantic checking

3) Check that types match: variable types, operand types, and method return types (com.sun.tools.javac.comp.Check)

3) Check that access to a variable, method, or class is legal, whether a variable is static, and whether a variable has been initialized before use (com.sun.tools.javac.comp.Resolve)

3) Infer the type arguments of generic methods (com.sun.tools.javac.comp.Infer)

3) Fold constant expressions into single constants (com.sun.tools.javac.comp.ConstFold)
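
The effect of constant folding is visible from ordinary Java code. The sketch below relies on two guarantees of the Java Language Specification: constant expressions are evaluated at compile time and string constants are interned, whereas strings concatenated at run time are newly created objects. ConstFoldDemo and its members are hypothetical names for this illustration.

```java
public class ConstFoldDemo {

    // The right-hand side is a constant expression, so the compiler stores 86400.
    static final int SECONDS_PER_DAY = 24 * 60 * 60;

    static boolean foldedConcatIsSameObject() {
        String folded = "java" + "c";   // folded at compile time into the literal "javac"
        return folded == "javac";       // identity comparison is intentional here
    }

    static boolean runtimeConcatIsSameObject() {
        String part = "java";           // not declared final, so not a constant variable
        String joined = part + "c";     // concatenated at run time: a new String object
        return joined == "javac";       // identity comparison is intentional here
    }

    public static void main(String[] args) {
        System.out.println(SECONDS_PER_DAY);
        System.out.println(foldedConcatIsSameObject());
        System.out.println(runtimeConcatIsSameObject());
    }
}
```

Comparing strings with == is done here only to expose object identity; normal code should use equals.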

com.sun.tools.javac.comp.Flow: data-flow analysis

4) Check that every variable is definitely assigned before it is used

4) Ensure that final variables are not assigned more than once

4) Check that a method with a return type returns a value on every path

4) Check that all statements are reachable

4) Check that every checked exception is either caught or declared to be thrown
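
Checks of this kind surface as ordinary compiler errors, which can be collected programmatically. The following sketch assumes a full JDK; FlowErrorsDemo, StringSource, and errorCodes are hypothetical names. It runs analysis only, without generating class files, on a class that uses a possibly unassigned variable and has a method that can finish without returning a value, and returns the diagnostic code of each error.

```java
import com.sun.source.util.JavacTask;

import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class FlowErrorsDemo {

    /** In-memory source file. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Analyzes the source and returns the code of every error that was reported. */
    static List<String> errorCodes(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diagnostics, null, null, List.of(new StringSource(className, code)));
        try {
            task.analyze();   // analysis only; no class files are written
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> codes = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                codes.add(d.getCode());
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        String bad = "class Bad {\n"
                + "  int useBeforeAssign(boolean b) { int x; if (b) x = 1; return x; }\n"
                + "  int missingReturn(boolean b) { if (b) return 1; }\n"
                + "}\n";
        errorCodes("Bad", bad).forEach(System.out::println);
    }
}
```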

5) Remove syntactic sugar from the Java code

5) Remove dead code, such as an if block whose condition is always false

5) Insert automatic conversions, such as boxing an int into an Integer or the reverse unboxing
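
The automatic conversions in the last item can be written out by hand. In the sketch below the second method spells out the calls that boxing and unboxing stand for; it is an approximation for the reader, not javac's literal output, and the names are hypothetical.

```java
public class BoxingDesugar {

    // As written by the programmer: the conversions are implicit.
    static int sugared() {
        Integer boxed = 41;   // int -> Integer (boxing)
        int plain = boxed;    // Integer -> int (unboxing)
        return plain + 1;
    }

    // The same logic with the conversions made explicit.
    static int desugared() {
        Integer boxed = Integer.valueOf(41);
        int plain = boxed.intValue();
        return plain + 1;
    }

    public static void main(String[] args) {
        System.out.println(sugared() == desugared());
    }
}
```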

4) Code generator

The code generator produces bytecode from the annotated abstract syntax tree, transforming one data structure into another. It is similar to translating every Chinese word into an English word and then assembling the words into English sentences according to English grammar.

com.sun.tools.javac.jvm.Gen

① Translates the code blocks of each Java method into instructions that conform to the JVM instruction format. The JVM is stack-based, so all operations are carried out by pushing values onto the operand stack and popping them off it.

② Writes the bytecode, laid out according to the JVM class file format, to a file with the .class extension.
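
The stack-based execution model can be sketched with a toy interpreter. This is an illustration only, not JVM or javac code: StackMachineDemo and its three-instruction set are assumptions of this example. The instruction sequence in example mirrors the shape of the bytecode for a static method that returns a + b * c (iload_0, iload_1, iload_2, imul, iadd).

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackMachineDemo {

    enum Op { LOAD, ADD, MUL }

    static final class Insn {
        final Op op;
        final int slot;   // local variable index, used only by LOAD

        Insn(Op op, int slot) {
            this.op = op;
            this.slot = slot;
        }
    }

    static Insn load(int slot) { return new Insn(Op.LOAD, slot); }
    static Insn add()          { return new Insn(Op.ADD, -1); }
    static Insn mul()          { return new Insn(Op.MUL, -1); }

    /** Executes the instructions; every operation goes through the operand stack. */
    static int run(Insn[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Insn insn : code) {
            switch (insn.op) {
                case LOAD:
                    stack.push(locals[insn.slot]);          // push a local variable
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());  // pop two, push the sum
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());  // pop two, push the product
                    break;
                default:
                    throw new IllegalStateException("unknown op " + insn.op);
            }
        }
        return stack.pop();                                 // the result is left on top
    }

    /** Evaluates a + b * c: operands are pushed first, operators follow. */
    static int example(int a, int b, int c) {
        Insn[] code = { load(0), load(1), load(2), mul(), add() };
        return run(code, new int[] { a, b, c });
    }

    public static void main(String[] args) {
        System.out.println(example(1, 2, 3));
    }
}
```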

Two supporting classes:

① Items: represents any addressable operand, including local variables, class instance variables, and constants in the constant pool, each of which can appear as a unit on the operand stack.

② Code: stores the generated bytecode and provides methods that map to the opcodes.


The visitor pattern in Javac

The phases that follow parsing, namely semantic analysis and code generation, have to traverse the syntax tree many times, and each traversal performs a different kind of processing. Javac is designed around the visitor pattern: each traversal is one visitor's pass over the tree.

The visitor pattern decouples a data structure from the operations on it. A new operation can be added by defining a new visitor implementation, without modifying the data structure or the existing operations. Javac defines a different visitor implementation for each compilation phase.

① TreeScanner, Enter, Attr, Gen, Flow, and so on are visitors with specific roles, and each defines its own rules for visiting nodes.

② JCIf, JCTry, JCBreak, and JCReturn are concrete node elements, which together form a stable data structure.
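
The division of roles described above can be sketched in a few lines. This is a generic visitor example, not javac's own classes: Node, Num, Add, Evaluator, and Printer are hypothetical names. The node classes stay fixed, while each new operation is a new visitor.

```java
public class VisitorDemo {

    /** The stable data structure: every node only knows how to accept a visitor. */
    interface Node {
        <R> R accept(Visitor<R> visitor);
    }

    /** One visit method per concrete node type. */
    interface Visitor<R> {
        R visitNum(Num node);
        R visitAdd(Add node);
    }

    static final class Num implements Node {
        final int value;

        Num(int value) {
            this.value = value;
        }

        @Override
        public <R> R accept(Visitor<R> visitor) {
            return visitor.visitNum(this);
        }
    }

    static final class Add implements Node {
        final Node left;
        final Node right;

        Add(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public <R> R accept(Visitor<R> visitor) {
            return visitor.visitAdd(this);
        }
    }

    /** First traversal: computes the value of the expression. */
    static final class Evaluator implements Visitor<Integer> {
        @Override
        public Integer visitNum(Num node) {
            return node.value;
        }

        @Override
        public Integer visitAdd(Add node) {
            return node.left.accept(this) + node.right.accept(this);
        }
    }

    /** Second traversal over the same nodes: renders the expression as text. */
    static final class Printer implements Visitor<String> {
        @Override
        public String visitNum(Num node) {
            return Integer.toString(node.value);
        }

        @Override
        public String visitAdd(Add node) {
            return "(" + node.left.accept(this) + " + " + node.right.accept(this) + ")";
        }
    }

    public static void main(String[] args) {
        Node tree = new Add(new Num(1), new Add(new Num(2), new Num(3)));
        System.out.println(tree.accept(new Printer()) + " = " + tree.accept(new Evaluator()));
    }
}
```

Adding a third operation, such as a type checker, would mean writing one more Visitor implementation; Num and Add would not change.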


Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
