1. Compiler
1.1. Classification of compilation phases
Compiling and running a *.java file generally involves two compilation phases:
① Compile-time compilation: the process *.java -> *.class (which contains bytecode), also called front-end compilation.
② Run-time compilation: the process *.class -> machine code, also called back-end compilation.
1.2. Compiler classification
Front-end compilers
Role: Compiles *.java -> *.class so that the class loader can load the types, and optimizes the program code at compile time.
Examples: Sun's Javac, and the compiler in Eclipse JDT.
Back-end compiler (JIT compiler)
Role: Compiles *.class (bytecode) -> machine code at run time. It works alongside the interpreter's interpreted execution and optimizes the program while it runs. JIT stands for Just-In-Time compiler, a compiler that runs inside the virtual machine.
Examples: the C1 (client) and C2 (server) compilers in the HotSpot VM.
Static ahead-of-time compiler (AOT compiler)
Role: Compiles *.java files directly into native machine code. AOT stands for Ahead-Of-Time compiler.
2. Javac
Javac is a Java compiler developed by Sun and written in the Java language; it compiles *.java files into *.class files. Because Javac is so representative, it is used here as the example for explaining the whole front-end compilation process.
Note: Source files written in other languages can also be compiled into *.class files by language-specific compilers. For example, JRuby and Groovy source files can run on the JVM once their compilers have turned them into *.class files.
2.1. The logical flow of Javac compilation
2.1.1. Javac's compilation logic in its source code
Because Javac itself is written in Java, you can read its source code to understand how it compiles *.java files. The source shows that compilation is roughly divided into three major steps: ① parsing and filling the symbol table, ② annotation processing, ③ semantic analysis and bytecode generation. The whole process is driven mainly by the compile() and compile2() methods of com.sun.tools.javac.main.JavaCompiler. The core of this logic in the source code is as follows:
2.1.2. Javac compilation logic diagram
Annotation processing ---> back to ---> parse and fill symbol table: because annotation processing may have modified the syntax tree, the compiler goes back to the "parse and fill symbol table" step and processes the tree again, repeating until annotation processing no longer modifies the syntax tree.
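The flow in the diagram, including the loop back from annotation processing, can be modeled as a toy pipeline. This is a minimal sketch, not javac's code: the stage names mirror the javac methods named in this article (parseFiles(), enterTrees(), processAnnotations(), attribute(), flow(), desugar(), generate()), the bodies only record which stage ran, and the treeChangingRounds parameter is a hypothetical stand-in for annotation processors that modify the syntax tree.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of javac's front-end pipeline: the names in the log mirror javac,
// but nothing here parses or compiles real code.
class JavacPipelineSketch {

    // treeChangingRounds is hypothetical: how many times annotation processors edit the tree.
    static List<String> compile(int treeChangingRounds) {
        List<String> log = new ArrayList<>();
        int remainingChanges = treeChangingRounds;
        boolean treeModified;
        do {
            log.add("parseFiles");          // ① parse ...
            log.add("enterTrees");          //    ... and fill the symbol table
            log.add("processAnnotations");  // ② annotation processing
            treeModified = remainingChanges > 0;
            if (treeModified) {
                remainingChanges--;         // the tree changed, so go back to ①
            }
        } while (treeModified);
        // ③ semantic analysis and bytecode generation (javac's compile2())
        log.add("attribute");
        log.add("flow");
        log.add("desugar");
        log.add("generate");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(compile(1));
    }
}
```

With treeChangingRounds = 0 the loop body runs once; each extra tree change adds another parse/enter/process pass before step ③.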
2.2. The detailed Javac compilation process
2.2.1. Parsing and filling the symbol table
2.2.1.1. Parsing phase (lexical and syntax analysis)
Parsing consists of lexical analysis and syntax analysis, both performed in the parseFiles() method.
Lexical analysis: the process of converting the character stream of the source code into a sequence of tokens.
Syntax analysis: the process of building the abstract syntax tree of the *.java file from the token sequence.
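The lexical-analysis step can be illustrated with a toy tokenizer. It is a hypothetical sketch: the ToyLexer class, its token kinds and its keyword set are assumptions, and it handles only identifiers, a few keywords, integer literals, a few operators and semicolons. javac's real scanner (com.sun.tools.javac.parser.JavaTokenizer) covers the full Java lexical grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy lexer: turns a character stream into a flat list of tokens.
class ToyLexer {
    enum Kind { KEYWORD, IDENTIFIER, INT_LITERAL, OPERATOR, SEMICOLON }

    record Token(Kind kind, String text) {}

    private static final Set<String> KEYWORDS = Set.of("int", "boolean", "char", "return");

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                  // whitespace separates tokens but is not one
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) i++;
                String word = src.substring(start, i);
                tokens.add(new Token(KEYWORDS.contains(word) ? Kind.KEYWORD : Kind.IDENTIFIER, word));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(new Token(Kind.INT_LITERAL, src.substring(start, i)));
            } else if (c == ';') {
                tokens.add(new Token(Kind.SEMICOLON, ";"));
                i++;
            } else if ("=+-*/".indexOf(c) >= 0) {
                tokens.add(new Token(Kind.OPERATOR, String.valueOf(c)));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("int a = b + 2;").forEach(System.out::println);
    }
}
```

For "int a = b + 2;" this yields seven tokens: the keyword int, the identifiers a and b, the operators = and +, the literal 2, and the semicolon. Syntax analysis then works on this token sequence instead of on raw characters.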
Abstract syntax tree (AST): ① describes the syntactic structure of the program's code, and ② is an abstract representation of the structure of a source file.
The structure view of the abstract syntax tree is as follows:
Note: After parsing, the compiler basically no longer works on the *.java source file; all subsequent operations are performed on its abstract syntax tree.
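The tree javac builds can be inspected through the JDK's compiler API. This sketch assumes a full JDK (the jdk.compiler module) is available; the class AstDump and the sample source are hypothetical, while ToolProvider, JavacTask.parse() and TreeScanner are existing JDK APIs. It runs only lexical and syntax analysis, then prints each node's Tree.Kind indented by depth.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class AstDump {
    // Wraps source text in a JavaFileObject so nothing is read from disk.
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Parses only (no attribution) and lists every node kind, indented by depth.
    static List<String> dump(String code) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null,
                List.of(source("Demo", code)));
        List<String> lines = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Integer>() {
                @Override
                public Void scan(Tree tree, Integer depth) {
                    if (tree == null) return null;
                    lines.add("  ".repeat(depth) + tree.getKind());
                    return super.scan(tree, depth + 1); // children are visited at depth + 1
                }
            }.scan(unit, 0);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        dump("class Demo { int a = b + 2; }").forEach(System.out::println);
    }
}
```

Because only parsing has run, the undeclared b is not reported yet, and the class should contain no METHOD node: the default constructor is added later, while the symbol table is filled.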
2.2.1.2. Filling the symbol table
After lexical and syntax analysis, the next step is filling the symbol table, which is done in the enterTrees() method.
Symbol table: a table made up of a set of symbol addresses and symbol information.
Role: the information in the table is used in different stages of compilation, for example:
Semantic analysis phase: for semantic checks and intermediate code generation.
Bytecode generation phase: when addresses are assigned to symbol names, the symbol table is the basis for that assignment.
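To make "symbol addresses and symbol information" concrete, here is a toy scoped symbol table. It is a sketch only: the SymbolTable class, its Symbol record, and the use of an incrementing slot number as a symbol's "address" are illustrative assumptions; javac's real symbol table (its Symbol and Scope classes) is far richer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy scoped symbol table: each symbol maps a name to its type and an assigned slot ("address").
class SymbolTable {
    record Symbol(String name, String type, int slot) {}

    private final Deque<Map<String, Symbol>> scopes = new ArrayDeque<>();
    private int nextSlot = 0; // slots are never reused in this sketch

    void enterScope() { scopes.push(new HashMap<>()); }

    void exitScope() { scopes.pop(); }

    Symbol define(String name, String type) {
        Map<String, Symbol> current = scopes.peek();
        if (current == null) throw new IllegalStateException("no open scope");
        if (current.containsKey(name)) {
            throw new IllegalStateException(name + " is already defined in this scope");
        }
        Symbol symbol = new Symbol(name, type, nextSlot++);
        current.put(name, symbol);
        return symbol;
    }

    // Searches from the innermost scope outward (an ArrayDeque iterates from the most recent push).
    Optional<Symbol> lookup(String name) {
        for (Map<String, Symbol> scope : scopes) {
            Symbol symbol = scope.get(name);
            if (symbol != null) return Optional.of(symbol);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable();
        table.enterScope();                    // method body
        table.define("a", "int");
        table.enterScope();                    // nested block
        table.define("b", "boolean");
        System.out.println(table.lookup("a")); // found in the outer scope
        table.exitScope();
        System.out.println(table.lookup("b")); // empty: b went out of scope
    }
}
```

Semantic analysis would use lookup() to answer "is this name declared, and with what type?", while code generation would use the slot as the symbol's address.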
■ For example, the default constructor is added in this stage.
If the code declares no constructor, the compiler adds a default constructor that takes no parameters and has the same access modifier as the current class (public, protected, or private; package access if the class has no access modifier).
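The added default constructor is visible through reflection. In this sketch the classes DefaultConstructorDemo, PublicPojo and PrivatePojo are hypothetical; neither nested class declares a constructor, and the compiler-generated one is expected to mirror each class's access modifier. Nothing instantiates PrivatePojo, so no extra synthetic accessor constructor is involved.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

class DefaultConstructorDemo {
    public static class PublicPojo { }   // no constructor written in the source
    private static class PrivatePojo { } // no constructor written in the source

    // Summarizes the constructors the compiler generated for a class.
    static String describe(Class<?> type) {
        Constructor<?>[] ctors = type.getDeclaredConstructors();
        Constructor<?> first = ctors[0];
        return ctors.length + " constructor(s), " + first.getParameterCount() + " parameter(s), "
                + Modifier.toString(first.getModifiers());
    }

    static String[] summaries() {
        return new String[] { describe(PublicPojo.class), describe(PrivatePojo.class) };
    }

    public static void main(String[] args) {
        for (String s : summaries()) System.out.println(s);
    }
}
```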
2.2.2. Annotation processing by annotation processors
Annotation processors are initialized in the initProcessAnnotations() method and executed in the processAnnotations() method.
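Annotations: like ordinary Java code, annotations normally take effect at run time. Since JDK 6, however, the pluggable annotation processing API (JSR 269) lets annotation processors handle annotations at compile time.
During annotation processing, if a processor modifies the abstract syntax tree (for example, by adding code), the compiler goes back to the "parse and fill symbol table" stage. Each such pass is called a round. The following example shows an annotation processor in use (Project Lombok's @Data).
Before annotation processing: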
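```java
// LombokPojoDemo.java
import lombok.Data;

@Data
public class LombokPojoDemo {
    private String name;
}

// Data.java, from Project Lombok
package lombok;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.TYPE)
@Retention(RetentionPolicy.SOURCE) // only visible to the compiler, discarded from the .class file
public @interface Data {
    String staticConstructor() default "";
}
```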
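After annotation processing: Lombok's processor adds the members implied by @Data (getters and setters, equals(), hashCode(), toString(), and a constructor) to the syntax tree of LombokPojoDemo, so they end up in the compiled class even though they do not appear in the source.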
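The round loop can be observed directly with the standard annotation processing API. In this sketch the classes AnnotationRoundsDemo and RoundRecorder and the generated class GeneratedHelper are hypothetical names. The processor records the root elements of each round and generates one extra source file in the first round; javac then parses and enters that file and runs another round, followed by a final round in which RoundEnvironment.processingOver() is true. The -proc:only option skips class-file generation, and -s puts the generated source in a temporary directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class AnnotationRoundsDemo {
    static List<String> run() throws IOException {
        String code = "class Demo {}";
        JavaFileObject demo = new SimpleJavaFileObject(URI.create("string:///Demo.java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
        };
        Path generatedSources = Files.createTempDirectory("apt-out");
        RoundRecorder recorder = new RoundRecorder();
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaCompiler.CompilationTask task = compiler.getTask(null, null, null,
                List.of("-proc:only", "-s", generatedSources.toString()), null, List.of(demo));
        task.setProcessors(List.of(recorder));
        task.call();
        return recorder.rounds;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}

// Records the root elements javac hands to the processor in each round.
class RoundRecorder extends AbstractProcessor {
    final List<String> rounds = new ArrayList<>();
    private boolean generated;

    @Override public Set<String> getSupportedAnnotationTypes() { return Set.of("*"); }
    @Override public SourceVersion getSupportedSourceVersion() { return SourceVersion.latestSupported(); }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment env) {
        if (env.processingOver()) {
            rounds.add("final");
            return false;
        }
        Set<String> names = new TreeSet<>();
        for (Element e : env.getRootElements()) names.add(e.getSimpleName().toString());
        rounds.add(names.toString());
        if (!generated) {
            generated = true;
            // A new source file makes javac parse and enter it, then run another round.
            try (Writer w = processingEnv.getFiler().createSourceFile("GeneratedHelper").openWriter()) {
                w.write("class GeneratedHelper {}");
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }
        return false; // do not claim any annotations
    }
}
```

The expected sequence is one round for Demo, one for the generated GeneratedHelper, and the final round.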
2.2.3. Semantic analysis and bytecode generation
2.2.3.1. Semantic analysis phase
After parsing, the compiler has an abstract syntax tree of the *.java file, which shows that the source is structurally correct. It does not yet know, however, whether the source is logically correct. The task of semantic analysis is to check the logic of a structurally correct *.java file (context-sensitive checks such as type checking); these checks are performed on the abstract syntax tree. For example:
The following three expressions are all structurally correct Java, but the latter two violate the semantic rules of the Java language, so they are semantically incorrect (note: in C, the latter two would be semantically valid).
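The gap between "structurally correct" and "semantically correct" can be shown by running only the parser and then the full analysis on the same code. This sketch assumes a full JDK; the class SemanticCheckDemo, the Result record, the wrapper class T and the three sample statements are illustrative assumptions, not necessarily the expressions from the original figure. JavacTask.parse() runs lexical and syntax analysis; JavacTask.analyze() additionally enters symbols and runs attribute checking and flow analysis.

```java
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.net.URI;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class SemanticCheckDemo {
    record Result(boolean syntaxOk, boolean semanticsOk) {}

    static Result check(String statement) throws IOException {
        String code = "class T { void m() { int a = 1; boolean b = false; char c = 2; "
                + statement + " } }";
        JavaFileObject file = new SimpleJavaFileObject(URI.create("string:///T.java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
        };
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(null, null, diagnostics, null, null, List.of(file));
        task.parse();   // lexical and syntax analysis only
        boolean syntaxOk = hasNoErrors(diagnostics);
        task.analyze(); // enter, attribute checking and flow analysis
        boolean semanticsOk = hasNoErrors(diagnostics);
        return new Result(syntaxOk, semanticsOk);
    }

    static boolean hasNoErrors(DiagnosticCollector<JavaFileObject> diagnostics) {
        return diagnostics.getDiagnostics().stream()
                .noneMatch(d -> d.getKind() == Diagnostic.Kind.ERROR);
    }

    public static void main(String[] args) throws IOException {
        for (String s : List.of("int d = a + c;", "int d = b + c;", "char d = a + c;")) {
            System.out.println(s + " -> " + check(s));
        }
    }
}
```

All three statements parse, but only int d = a + c; survives analysis: boolean + char has no valid operator, and assigning an int expression to a char is a possibly lossy conversion.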
During Javac compilation, semantic analysis consists of ① attribute checking and ② data-flow and control-flow analysis.
Attribute checking
Attribute checking is done in the attribute() method.
It covers, among other things, ① whether a variable is declared before it is used, ② whether a variable's data type matches the value assigned to it, and ③ constant folding.
● For example, constant folding: for constants whose values are known at compile time, the compiler does not wait until the program runs to perform a "+" on them; during semantic analysis it directly computes the value that results from the "+".
For example, folding numeric constants: for int a = 1 + 2, the literals "1" and "2" and the operator "+" are still visible on the syntax tree after parsing, but once constant folding has run during attribute checking, they are collapsed into the single literal "3", which is recorded on the syntax tree.
Note: Because of compile-time constant folding, writing int a = 1 + 2 in the code instead of int a = 3 does not add even a single CPU instruction of work at run time.
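Constant folding itself is a bottom-up rewrite of the tree. The sketch below is a toy: the Expr, Lit, Var and Add types are hypothetical and cover only int addition, whereas javac folds constants on its own tree classes during attribute checking.

```java
// Toy constant folder over a tiny hypothetical AST (int literals, variables, addition).
class ConstantFolder {
    interface Expr {}
    record Lit(int value) implements Expr {}
    record Var(String name) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}

    // Folds bottom-up: simplify both operands first, then collapse Add(Lit, Lit) into one Lit.
    static Expr fold(Expr e) {
        if (e instanceof Add add) {
            Expr left = fold(add.left());
            Expr right = fold(add.right());
            if (left instanceof Lit l && right instanceof Lit r) {
                return new Lit(l.value() + r.value());
            }
            return new Add(left, right);
        }
        return e; // literals and variables cannot be simplified further
    }

    public static void main(String[] args) {
        // int a = 1 + 2;  ->  Lit[value=3]
        System.out.println(fold(new Add(new Lit(1), new Lit(2))));
        // b + (1 + 2): only the constant subtree is folded
        System.out.println(fold(new Add(new Var("b"), new Add(new Lit(1), new Lit(2)))));
    }
}
```

Because the toy folds only subtrees whose operands are both literals, b + 1 + 2, which parses as (b + 1) + 2, would be left unchanged.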
For example, folding of String constants:
```java
public class Test26 {
    public static void main(String[] args) {
        String a = "a9";
        String b = "a" + 9;         // constant expression: folded to "a9" at compile time
        System.out.println(a == b); // both refer to the same interned literal
    }
}
```
Output: true
The compiled Java class file is as follows:
Description: Looking at the constant_pool of the compiled class file, you will find only one string literal, "a9"; that is, "a" + 9 was folded into "a9" at compile time.
Folding does not happen when the "+" involves a String variable:
```java
public class Test26 {
    public static void main(String[] args) {
        String a = "qinfen";
        String b = "qin";
        String c = b + "fen";       // b is not a constant, so the "+" is evaluated at run time
        System.out.println(a == c); // c is a newly created String object
    }
}
```
Output: false
Note: Constant folding cannot happen in this "+" operation because it involves the String reference b, whose value the compiler cannot determine at compile time. The "+" is instead carried out at run time using StringBuilder's append() method.
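The run-time replacement described above can be written out by hand. In this sketch, ConcatDesugarDemo is a hypothetical class, and c2 spells out roughly what javac emitted before JDK 9 (newer javac versions use an invokedynamic-based strategy instead, but the result is still a newly created String). Either way, == compares references and fails, while equals() compares contents and succeeds.

```java
import java.util.Arrays;

class ConcatDesugarDemo {
    static boolean[] compare() {
        String a = "qinfen";
        String b = "qin";
        String c1 = b + "fen";                                              // as written in the source
        String c2 = new StringBuilder().append(b).append("fen").toString(); // roughly what older javac emitted
        // == compares references; equals() compares contents.
        return new boolean[] { a == c1, a == c2, a.equals(c1), a.equals(c2) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // [false, false, true, true]
    }
}
```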
For comparison, when b is declared final:
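```java
public class Test26 {
    public static void main(String[] args) {
        String a = "qinfen";
        final String b = "qin";     // constant variable: its value is known at compile time
        String c = b + "fen";       // folded to "qinfen" at compile time
        System.out.println(a == c);
    }
}
```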
Output: true
The compiled Java class file is as follows:
Description: A final variable initialized with a constant expression is a compile-time constant. Its value is resolved at compile time, and a copy is stored in the class's constant pool or embedded directly in the bytecode. So here b + "fen" has the same effect as "qin" + "fen".
Data flow and control flow analysis
Data-flow and control-flow analysis is done in the flow() method.
It further verifies the program's contextual logic, checking for example: ① whether local variables are assigned before they are used; ② whether every path through a method returns a value; ③ whether all checked exceptions are handled correctly; ④ whether a final variable is assigned more than once; and so on.
Data-flow and control-flow analysis at compile time has essentially the same purpose as the data-flow and control-flow analysis performed during class loading.
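Checks ① and ④ can be seen together in a blank final local variable. In this sketch the class DefiniteAssignmentDemo and its method classify() are hypothetical. The code compiles only because flow analysis proves that label is assigned exactly once on every path before it is read.

```java
class DefiniteAssignmentDemo {
    static String classify(int n) {
        final String label;          // blank final: declared without an initializer
        if (n < 0) {
            label = "negative";
        } else {
            label = "non-negative";
        }
        // Deleting the else branch makes javac reject the return below
        // ("variable label might not have been initialized").
        return label;
    }

    public static void main(String[] args) {
        System.out.println(classify(-5) + ", " + classify(7));
    }
}
```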
2.2.3.2. Desugaring phase (removing syntactic sugar)
Syntactic sugar: syntax added to a programming language that does not change what the language can do but makes it more convenient for programmers to use. Desugaring is done by the desugar() method.
Common syntactic sugar includes generics, variable-length arguments (varargs), autoboxing/unboxing, assertions (assert), foreach loops, switch on enum types, switch on String (Java 7), inner classes (named and anonymous), class literals, and so on.
The JVM does not support syntactic sugar directly at run time, so it is reverted to simpler basic syntax structures during compilation.
● For example: Generics
Generics: essentially parameterized types, where the data type being operated on is specified as a parameter. Type parameters can be used when declaring classes, interfaces, and methods, which are then called generic classes, generic interfaces, and generic methods.
In the Java language, generics exist only in the source code; at compile time they are replaced by raw types (type erasure), with casts inserted where needed. For example, if the source contains ArrayList<Integer> and ArrayList<String>, both are reverted to the raw type ArrayList at compile time. So at run time, ArrayList<Integer> and ArrayList<String> are the same type.
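The effect of erasure is observable at run time. In this sketch (ErasureDemo is a hypothetical class), two lists with different type arguments report the very same Class object.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<>();
        List<String> strings = new ArrayList<>();
        // After erasure both objects are plain java.util.ArrayList instances.
        return ints.getClass() == strings.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true
    }
}
```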
● For example: removing useless code of the form if (false) { ... }
When the condition of an if statement is a constant expression as defined by the Java Language Specification, the branch that can never run is treated as useless code for conditional compilation and is removed in this phase:
If the value of the constant expression is false, the then-block is useless code.
Otherwise, the else-block is useless code.
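A common use of this rule is a debug switch held in a constant. In this sketch the class ConditionalCompilationDemo and the DEBUG flag are hypothetical. Because DEBUG is a static final boolean initialized with a constant, if (DEBUG) has a constant condition; the then-block is useless code, and disassembling the class with javap -c should show no trace of it.

```java
class ConditionalCompilationDemo {
    // A compile-time constant, so javac can evaluate `if (DEBUG)` while compiling.
    static final boolean DEBUG = false;

    static String mode() {
        if (DEBUG) {
            // Constant-false branch: removed at compile time.
            System.out.println("debug build");
            return "debug";
        }
        return "release";
    }

    public static void main(String[] args) {
        System.out.println(mode());
    }
}
```

Note that javac does not report the if (DEBUG) block as unreachable: the Java Language Specification deliberately allows if statements with constant conditions so that this idiom works.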
● For example: generics + autoboxing/unboxing
Before desugaring
```java
public void desugarGenericToRawAndCheckcastDemo() {
    // Arrays.asList() returns a fixed-size list, so it is copied into an ArrayList before add()
    List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
    list.add(4);
    int i = list.get(0);
}
```
After type erasure (before the remaining sugar is removed)
```java
public void desugarGenericToRawAndCheckcastDemo() {
    List list = new ArrayList(Arrays.asList(1, 2, 3)); // raw types: type arguments erased
    list.add(4);
    int i = (Integer) list.get(0);                     // cast inserted by the compiler
}
```
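After desugaring, autoboxing and varargs are removed as well: the literals are boxed with Integer.valueOf(), passed to Arrays.asList() as an explicit Integer[] array, and the value read by list.get(0) is unboxed with intValue().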
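The fully desugared form can be written out by hand and compared with the sugared one. This is a sketch under assumptions: the class DesugarGenericsDemo and its two methods are hypothetical, and the desugared body approximates what the desugaring step produces rather than being copied from javac output. Both methods should return the same value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DesugarGenericsDemo {
    // As the programmer writes it: generics, varargs and autoboxing.
    static int sugared() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        list.add(4);
        int i = list.get(0);
        return i + list.size();
    }

    // Hand-written approximation of the same method with all sugar removed.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int desugared() {
        List list = new ArrayList(Arrays.asList(new Integer[] {      // varargs -> explicit array
                Integer.valueOf(1), Integer.valueOf(2), Integer.valueOf(3) })); // boxing
        list.add(Integer.valueOf(4));                                 // boxing
        int i = ((Integer) list.get(0)).intValue();                   // cast + unboxing
        return i + list.size();
    }

    public static void main(String[] args) {
        System.out.println(sugared() + " " + desugared()); // 5 5
    }
}
```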
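● For example: foreach loops, autoboxing, and assertions
Before desugaring
```java
public void desugarDemo() {
    Integer[] array = {1, 2, 3};
    for (int i : array) {
        System.out.println(i);
    }
    assert array[0] == 1;
}
```
After desugaring
```java
public void desugarDemo() {
    Integer[] array = { Integer.valueOf(1), Integer.valueOf(2), Integer.valueOf(3) };
    // The original listing declared arr$, len$ and i$ in one for-initializer, which is not
    // valid Java source because they have different types; they are split here.
    Integer[] arr$ = array;
    int len$ = arr$.length;
    for (int i$ = 0; i$ < len$; ++i$) {
        int i = arr$[i$].intValue();
        {
            System.out.println(i);
        }
    }
    // $assertionsDisabled is a synthetic static field javac adds to classes that use assert
    if (!$assertionsDisabled && !(array[0].intValue() == 1))
        throw new AssertionError();
}
```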
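Over an array, foreach becomes an index loop; over an Iterable such as a List, the Java Language Specification (§14.14.2) instead defines it in terms of an Iterator. This sketch (ForeachDesugarDemo and its methods are hypothetical) writes both forms side by side; i$ mimics the compiler-generated name used above.

```java
import java.util.Iterator;
import java.util.List;

class ForeachDesugarDemo {
    // As written: foreach over an Iterable, with unboxing into int.
    static int sumSugared(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Roughly the Iterator-based form the specification gives for an Iterable.
    static int sumDesugared(List<Integer> values) {
        int sum = 0;
        for (Iterator<Integer> i$ = values.iterator(); i$.hasNext(); ) {
            int v = i$.next().intValue(); // unboxing made explicit
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3);
        System.out.println(sumSugared(values) + " " + sumDesugared(values)); // 6 6
    }
}
```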
2.2.3.3. Bytecode generation phase (bytecode -> *.class file)
The main task of this phase is for the compiler to generate Java bytecode. Before generating it, however, the compiler performs some code additions and conversions on the abstract syntax tree.
Code additions and conversions
① Adding the class initialization method <clinit>() and the instance initialization methods <init>()
Adding <clinit>()
At this stage, if the source code contains ① assignments to static variables or ② static {} blocks, the compiler generates a class initialization method <clinit>() for the class (or interface); a class has at most one. The static variable assignments and static {} blocks are collected into <clinit>() in the order in which they appear in the source.
Note: in 3 cases the compiler does not generate <clinit>() at this stage: ① the class declares no static variables and has no static {} block; ② the class declares static variables but never assigns them; ③ the class's static variables are all compile-time constants.
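Both points can be observed at run time by logging from the initializers. In this sketch the classes ClinitDemo and Holder and the helper note() are hypothetical. Reading the compile-time constant CONSTANT does not initialize Holder, because its value is inlined at the use site; reading counter does, and Holder's static assignment and static block then run in source order.

```java
import java.util.ArrayList;
import java.util.List;

class ClinitDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Holder {
        static final int CONSTANT = 42;                      // compile-time constant: inlined at use sites
        static int counter = note("static field assignment", 1); // collected into <clinit>()
        static { note("static block", 0); }                  // collected after counter, in source order

        static int note(String what, int value) {
            LOG.add(what);
            return value;
        }
    }

    static List<String> run() {
        LOG.add("read CONSTANT = " + Holder.CONSTANT); // constant expression: Holder is NOT initialized
        LOG.add("read counter = " + Holder.counter);   // first real access runs Holder.<clinit>()
        return LOG;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```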
Adding <init>()
At this stage, the compiler generates an <init>() method for each constructor in the class. If the class does not explicitly declare a constructor, the default no-argument constructor, which only calls the superclass's no-argument constructor, also gets its own <init>() method.
If the source contains ① assignments to instance variables or ② instance initializer blocks ({}), the compiler collects them into the <init>() methods. The <init>() method that corresponds to a constructor therefore usually contains 3 kinds of code: ① a call to another <init>() method (the superclass's, or another constructor of the same class); ② the instance variable assignments and {} blocks, in source order (omitted when the constructor starts with this(...), because the constructor it delegates to already runs them); ③ all the statements in the constructor body.
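The three kinds of code, and the this(...) exception, show up in the order of a simple log. In this sketch the classes InitOrderDemo, Base and Child and the helper log() are hypothetical. Creating new Child("x") delegates to Child(), whose <init>() first calls Base's constructor, then runs the field initializer and the instance block, then its own body; the delegating constructor's body runs last.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Base {
        Base() { LOG.add("Base constructor"); }
    }

    static class Child extends Base {
        int x = log("field initializer", 1); // collected into <init>() of constructors that call super(...)
        { log("instance block", 0); }        // collected after x, in source order

        Child() {
            super();                         // ① call to another <init>()
            LOG.add("constructor body");     // ③ constructor body
        }

        Child(String ignored) {
            this();                          // delegates: initializers are NOT repeated here
            LOG.add("Child(String) body");
        }

        static int log(String what, int value) {
            LOG.add(what);
            return value;
        }
    }

    public static void main(String[] args) {
        new Child("x");
        LOG.forEach(System.out::println);
    }
}
```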
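② Code conversion (optimizations)
String "+" operations are replaced with append() calls on StringBuffer or StringBuilder. (Since JDK 9, javac by default compiles string concatenation to an invokedynamic call bootstrapped by java.lang.invoke.StringConcatFactory instead; see JEP 280.)
x++ and x-- are optimized to ++x and --x where conditions permit.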
Finally, after the syntax tree has been traversed (post-order traversal) and adjusted, the compiler hands the symbol table, now filled with all the required information, to the com.sun.tools.javac.jvm.ClassWriter class, whose writeClass() method outputs the bytecode and produces the final *.class file.
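The end product of the whole pipeline can be checked from Java code. This sketch assumes a full JDK, since ToolProvider.getSystemJavaCompiler() returns null when no compiler is available; ClassFileDemo and Hello are hypothetical names. It compiles a one-line class into a temporary directory, then reads the header fields every class file starts with: the u4 magic number 0xCAFEBABE, followed by the u2 minor and u2 major version.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class ClassFileDemo {
    // Compiles Hello.java into a temporary directory and returns the bytes of Hello.class.
    static byte[] compileHello() throws IOException {
        Path dir = Files.createTempDirectory("javac-demo");
        Path src = dir.resolve("Hello.java");
        Files.writeString(src, "public class Hello { }");
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        int exit = javac.run(null, null, null, "-d", dir.toString(), src.toString());
        if (exit != 0) throw new IllegalStateException("javac failed with exit code " + exit);
        return Files.readAllBytes(dir.resolve("Hello.class"));
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer header = ByteBuffer.wrap(compileHello()); // big-endian, like the class file format
        int magic = header.getInt();                         // u4 magic
        int minor = header.getShort() & 0xFFFF;              // u2 minor_version
        int major = header.getShort() & 0xFFFF;              // u2 major_version
        System.out.printf("magic=%X major=%d minor=%d%n", magic, major, minor);
    }
}
```

The major version depends on the JDK running the example (for instance 52 for Java 8 and 61 for Java 17).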
Java-jvm_01_ Front-end compiler