The four stages of Javac compilation
The Javac compilation process is roughly divided into four stages:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Code generation
Lexical analysis
Lexical analysis transforms the character stream of the source code into a set of tokens. A single character is the smallest element when writing a program, while a token is the smallest element during compilation. Keywords, variable names, literals, and operators can all be tokens. For example, the code "int a = b + 2" contains six tokens: int, a, =, b, +, and 2. Although the keyword int consists of three characters, it is a single token and cannot be split further. In the Javac source code, lexical analysis is implemented by the com.sun.tools.javac.parser.Scanner class.
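To make the idea of a token concrete, here is a minimal sketch of a tokenizer. It is not Javac's scanner: the class name TinyLexer and the regular expression are assumptions made for illustration, and it only recognizes identifiers or keywords, integer literals, and a handful of single-character operators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TinyLexer {
    // identifier or keyword | integer literal | single-character operator
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+|[=+\\-*/;]");

    /** Splits the source text into tokens; whitespace and unknown characters are skipped. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(source);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [int, a, =, b, +, 2]
        System.out.println(tokenize("int a = b + 2"));
    }
}
```

Note that "int" comes out as one token even though it is three characters long, which is exactly the distinction between characters and tokens described above.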
Syntax analysis
The lexical analyzer converts the character stream of the Java source file into the corresponding token stream. The syntax analyzer then organizes those tokens into a more structured syntax tree, much like assembling words into sentences and complete statements. At this point the combined words are not yet differentiated further into subject, predicate, object, or attribute.
Syntax analysis is the process of constructing an abstract syntax tree (AST) from the token sequence. The abstract syntax tree is a tree representation of the syntactic structure of the program code. Each node in the tree represents one syntactic construct in the code, such as a package, type, modifier, operator, interface, return value, or even a code comment. Syntax analysis is implemented by the com.sun.tools.javac.parser.Parser class, and the abstract syntax tree produced at this stage is represented by the com.sun.tools.javac.tree.JCTree class. After this step the compiler essentially no longer touches the source file; all subsequent operations work on the abstract syntax tree.
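The node kinds of a syntax tree can be inspected through the public Compiler Tree API (com.sun.source) that ships with the JDK. The following is a minimal sketch, assuming a full JDK is available at run time; the class AstDump and the helper StringSource are hypothetical names introduced here, and the sketch only runs the parsing step.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class AstDump {
    /** Hypothetical helper: exposes a String as an in-memory source file. */
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses the source (no later stages) and returns the kind of every tree node visited. */
    static List<Tree.Kind> nodeKinds(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null if no compiler is available
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, code)));
        List<Tree.Kind> kinds = new ArrayList<>();
        TreeScanner<Void, Void> collector = new TreeScanner<Void, Void>() {
            @Override
            public Void scan(Tree node, Void unused) {
                if (node != null) {
                    kinds.add(node.getKind());
                }
                return super.scan(node, unused);
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                collector.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kinds;
    }

    public static void main(String[] args) {
        // The printed list includes kinds such as CLASS, VARIABLE, PLUS and INT_LITERAL.
        System.out.println(nodeKinds("Demo", "class Demo { int a = 1 + 2; }"));
    }
}
```

Because only parsing is performed, the expression 1 + 2 still appears as a PLUS node with two INT_LITERAL children.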
Semantic Analysis
After syntax analysis, the compiler has an abstract syntax tree representation of the program code. The syntax tree can represent a structurally correct source program, but it cannot guarantee that the program is semantically valid. Semantic analysis performs further processing on the syntax tree, such as adding a default constructor to a class, checking whether variables have been initialized before use, folding constants, checking whether operand types match, checking whether all statements are reachable, and checking whether checked exceptions are handled correctly.
Semantic analysis consists of filling the symbol table, the attribute (labeling) check, and data and control flow analysis.
Fill symbol table
A symbol table is a set of symbol addresses and symbol information. You can think of it as a hash table of key-value pairs. The information registered in the symbol table is used at several stages of compilation. During semantic analysis, it is used for semantic checks and intermediate code generation. During target code generation, when addresses are allocated to symbol names, the symbol table is the basis for that allocation. In the Javac source code, filling the symbol table is implemented by the com.sun.tools.javac.comp.Enter class.
A class defines symbols of its own, such as its class name, variable names, and method names. It also references symbols defined elsewhere: it may call methods or use variables of other classes, and it may extend a superclass or implement interfaces. Because those symbols are defined in other classes, they must be resolved and entered into the symbol table as well.
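The hash-table view of a symbol table can be sketched as follows. This is an illustration only, not Javac's data structure: the names SymbolTableSketch, Symbol, and Scope, and the fields they carry, are assumptions. Each scope is a hash map, and lookup falls back to the enclosing scope.

```java
import java.util.HashMap;
import java.util.Map;

class SymbolTableSketch {
    /** Hypothetical symbol record: a name plus information needed by later stages. */
    static final class Symbol {
        final String name;
        final String kind; // for example "class", "field", "method"
        final String type; // for example "int"

        Symbol(String name, String kind, String type) {
            this.name = name;
            this.kind = kind;
            this.type = type;
        }
    }

    /** One scope: a hash table of symbols plus a link to the enclosing scope. */
    static final class Scope {
        private final Map<String, Symbol> entries = new HashMap<>();
        private final Scope parent;

        Scope(Scope parent) {
            this.parent = parent;
        }

        void enter(Symbol symbol) {
            entries.put(symbol.name, symbol);
        }

        /** Searches this scope first, then each enclosing scope; returns null if not found. */
        Symbol lookup(String name) {
            for (Scope scope = this; scope != null; scope = scope.parent) {
                Symbol found = scope.entries.get(name);
                if (found != null) {
                    return found;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Scope global = new Scope(null);
        global.enter(new Symbol("Demo", "class", "Demo"));
        Scope classScope = new Scope(global);
        classScope.enter(new Symbol("a", "field", "int"));

        System.out.println(classScope.lookup("a").type);    // prints int
        System.out.println(classScope.lookup("Demo").kind); // prints class
        System.out.println(global.lookup("a"));             // prints null
    }
}
```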
Another important step during class entry is adding the default constructor. If the code declares no constructor, the compiler adds a default constructor that takes no parameters and has the same access level as the current class.
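The compiler-added default constructor can be observed with reflection. The sketch below uses two hypothetical nested classes that declare no constructor and reports what the compiled classes actually contain.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

class DefaultCtorDemo {
    // Neither class declares a constructor in source code.
    public static class PublicNoCtor { }

    static class PackagePrivateNoCtor { }

    /** Describes the constructors found in the compiled class. */
    static String describe(Class<?> type) {
        Constructor<?>[] constructors = type.getDeclaredConstructors();
        Constructor<?> first = constructors[0];
        return constructors.length + " constructor, "
                + first.getParameterCount() + " parameters, access=\""
                + Modifier.toString(first.getModifiers()) + "\"";
    }

    public static void main(String[] args) {
        // prints 1 constructor, 0 parameters, access="public"
        System.out.println(describe(PublicNoCtor.class));
        // prints 1 constructor, 0 parameters, access=""
        System.out.println(describe(PackagePrivateNoCtor.class));
    }
}
```

The empty access string in the second line means package-private, matching the access level of the class itself.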
Attribute check
The attribute check (also translated as the labeling check) covers whether variable types match, whether variables have been declared before use, inference of the type arguments of generic methods, and the merging of string constants. An important action in this step is constant folding. If we write the following definition in the code:
int a=1+2;
Then the syntax tree still contains the literals 1 and 2 and the operator +, but after constant folding they are folded into the literal 3. This step is implemented by the com.sun.tools.javac.comp.Attr and com.sun.tools.javac.comp.Check classes.
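Folding of an int expression cannot be seen from inside a running program, but the merging of string constants can. The sketch below (class and method names are hypothetical) relies on the fact that compile-time constant strings are interned: a folded "a" + "b" is the same object as the literal "ab", while a concatenation performed at run time is not.

```java
class ConstantFoldingDemo {
    /** Both operands are literals, so the compiler folds the expression into "ab". */
    static boolean foldedIsSameObject() {
        String folded = "a" + "b";
        return folded == "ab"; // reference comparison on purpose
    }

    /** The prefix is only known at run time, so a new String object is created. */
    static boolean runtimeConcatIsSameObject(String prefix) {
        String joined = prefix + "b";
        return joined == "ab"; // reference comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println(foldedIsSameObject());           // prints true
        System.out.println(runtimeConcatIsSameObject("a")); // prints false
    }
}
```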
Data flow analysis
Data flow analysis mainly performs the following tasks:
- Check that every variable has been assigned a value before it is used.
- Make sure that a final variable is not assigned a value more than once.
- Determine the return type of each method. This checks whether the return type is defined and whether the type of the reference that receives the return value matches. If a method returns no value, no reference may point to its return value.
- Check that all checked exceptions are caught or declared to be thrown.
- Check that every statement is reachable. For example, a statement that appears after a return statement is rejected, because it would never be executed.
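Several of these checks can be triggered through the JDK's Compiler Tree API. The following is a minimal sketch, assuming a full JDK at run time; FlowCheckDemo and StringSource are hypothetical names. It calls JavacTask.analyze(), which runs the front end without generating bytecode, and counts the reported errors.

```java
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class FlowCheckDemo {
    /** Hypothetical helper: exposes a String as an in-memory source file. */
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String code) {
            super(URI.create("string:///Demo.java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Wraps the member in a class, analyzes it without generating code, and counts errors. */
    static long errorCount(String member) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diagnostics, null, null,
                List.of(new StringSource("class Demo { " + member + " }")));
        try {
            task.analyze();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return diagnostics.getDiagnostics().stream()
                .filter(d -> d.getKind() == Diagnostic.Kind.ERROR)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(errorCount("int m() { int x = 1; return x; }"));        // valid code
        System.out.println(errorCount("int m() { int x; return x; }"));            // use before assignment
        System.out.println(errorCount("void m() { final int x; x = 1; x = 2; }")); // final assigned twice
        System.out.println(errorCount("int m() { return 1; int y = 2; }"));        // statement after return
        System.out.println(errorCount("void m() { throw new Exception(); }"));     // unhandled checked exception
    }
}
```

The first call should report no errors, and each of the other four should report at least one.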
Control flow analysis
Control flow analysis mainly performs the following tasks:
- Remove useless code, such as if blocks whose condition is always false.
- Convert variables automatically, such as autoboxing and unboxing.
- Remove syntactic sugar. Desugaring is triggered by the desugar() method and carried out in the com.sun.tools.javac.comp.TransTypes and com.sun.tools.javac.comp.Lower classes.
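To illustrate what removing syntactic sugar means, the sketch below writes the same logic twice: once with the enhanced for loop and automatic boxing and unboxing, and once in a hand-written form that spells those conversions out. The desugared versions are an approximation written for illustration, not the compiler's actual output, and the class and method names are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

class DesugarDemo {
    /** Uses the enhanced for loop and automatic unboxing. */
    static int sumWithSugar(List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    /** Roughly the same loop with the iterator and the unboxing call written out. */
    static int sumDesugared(List<Integer> values) {
        int total = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            int value = it.next().intValue();
            total += value;
        }
        return total;
    }

    /** Automatic boxing. */
    static Integer boxWithSugar(int value) {
        return value;
    }

    /** The explicit boxing call. */
    static Integer boxDesugared(int value) {
        return Integer.valueOf(value);
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3);
        System.out.println(sumWithSugar(values));                         // prints 6
        System.out.println(sumDesugared(values));                         // prints 6
        System.out.println(boxWithSugar(42).equals(boxDesugared(42)));    // prints true
    }
}
```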
The entry point for data flow and control flow analysis is the flow() method. The actual work is done by the com.sun.tools.javac.comp.Flow class.
Bytecode generation
Bytecode generation is performed by the com.sun.tools.javac.jvm.Gen class. This stage not only converts the information produced by the previous steps (the syntax tree and the symbol table) into bytecode and writes it to disk, but also adds and transforms a small amount of code.
The instance constructor <init>() and the class constructor <clinit>() are added to the syntax tree at this stage.
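The effect of these generated methods is visible in the order in which initialization code runs. In the sketch below (all names are hypothetical), static initializers end up in the class constructor, while field initializers and instance blocks are merged into the instance constructor ahead of the constructor body.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Sample {
        static int counter = log("static field"); // part of the class constructor
        static {
            log("static block");                  // part of the class constructor
        }

        int value = log("instance field");        // merged into the instance constructor
        {
            log("instance block");                // merged into the instance constructor
        }

        Sample() {
            log("constructor body");              // runs after the merged initializers
        }
    }

    static int log(String event) {
        LOG.add(event);
        return LOG.size();
    }

    /** Creates one Sample and returns the events recorded while doing so. */
    static List<String> run() {
        LOG.clear();
        new Sample();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        // On the first call this prints:
        // [static field, static block, instance field, instance block, constructor body]
        System.out.println(run());
    }
}
```

The static entries appear only the first time, because a class is initialized once; later calls to run() record just the three instance events.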
Generating Java bytecode involves two steps:
1. Convert the code blocks in Java methods into instructions that conform to the JVM instruction set. The JVM is stack-based, so every operation pushes its operands onto the stack and pops them off again.
2. Write the bytecode to a file with the .class extension, following the JVM class file format.
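The stack-based execution model can be sketched with a toy interpreter. For a method such as static int add(int a, int b) { return a + b; }, the generated instructions are iload_0, iload_1, iadd, ireturn. The interpreter below is an illustration only: StackMachineSketch is a hypothetical class that understands just these four instructions, represented as strings rather than real bytecode.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackMachineSketch {
    /** Executes a tiny instruction sequence against an operand stack and local variable slots. */
    static int run(String[] program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instruction : program) {
            switch (instruction) {
                case "iload_0":
                    stack.push(locals[0]); // push local variable 0
                    break;
                case "iload_1":
                    stack.push(locals[1]); // push local variable 1
                    break;
                case "iadd": {
                    int right = stack.pop(); // pop both operands
                    int left = stack.pop();
                    stack.push(left + right); // push the result
                    break;
                }
                case "ireturn":
                    return stack.pop(); // return the value on top of the stack
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + instruction);
            }
        }
        throw new IllegalStateException("program ended without ireturn");
    }

    public static void main(String[] args) {
        String[] add = {"iload_0", "iload_1", "iadd", "ireturn"};
        System.out.println(run(add, new int[] {1, 2})); // prints 3
    }
}
```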
Since JDK 1.5, the Java language has supported annotations. Like ordinary Java code, these annotations take effect at runtime. JDK 1.6 added a standard API for pluggable annotation processors, which process annotations at compile time. They can be regarded as a set of compiler plug-ins. Within these plug-ins, developers can read, modify, and add any element of the abstract syntax tree. If a plug-in modifies the syntax tree during annotation processing, the compiler returns to parsing and filling the symbol table and processes the tree again, repeating until no annotation processor modifies the syntax tree any more. Annotation processing takes place after the symbol table is filled and before the attribute check.
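A compile-time annotation processor can be sketched with the standard javax.annotation.processing API. The example below is a minimal, read-only processor: it records the names of the top-level types it is given and does not modify the syntax tree. AnnotationRoundDemo and RootLogger are hypothetical names, a full JDK is assumed at run time, and the -proc:only option is used so that only annotation processing runs and no class files are written.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class AnnotationRoundDemo {
    /** Hypothetical processor that only records the root types of each round. */
    @SupportedAnnotationTypes("*")
    static class RootLogger extends AbstractProcessor {
        final List<String> rootTypes = new ArrayList<>();

        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latestSupported();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            for (Element root : roundEnv.getRootElements()) {
                rootTypes.add(root.getSimpleName().toString());
            }
            return false; // do not claim the annotations
        }
    }

    /** Runs annotation processing only and returns the root type names the processor saw. */
    static List<String> rootTypesSeen(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        RootLogger processor = new RootLogger();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, null, List.of("-proc:only"), null, List.of(source));
        task.setProcessors(List.of(processor));
        task.call();
        return processor.rootTypes;
    }

    public static void main(String[] args) {
        System.out.println(rootTypesSeen("Demo", "@Deprecated class Demo { }"));
    }
}
```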