Introduction
The Java compiler first compiles source code into class files, which contain bytecode; this step is called javac compilation. The bytecode is then interpreted and executed by the JVM (Java Virtual Machine), which is why many sources describe Java as a "semi-compiled, semi-interpreted" language. Since the advent of JIT compilation, that description is only partly accurate. The next two posts will briefly introduce the javac compilation process and the JIT mechanism.
Javac compilation process
The javac compiler compiles .java files into .class files. Javac is called the front-end compiler, in contrast to the back-end compiler, which translates bytecode into machine code while the program is running.
Javac compilation (front-end compilation) consists of lexical analysis, syntax analysis, filling the symbol table, semantic analysis, and bytecode generation.
Lexical and syntax analysis
Lexical analysis transforms the character stream of the source code into a collection of tokens. A single character is the smallest element when writing a program, whereas a token is the smallest element in the compilation process; keywords, variable names, literals, and operators can all be tokens. For example, consider the following line of code:
int num = a + 6;
Counting the terminating semicolon, this line of code contains 7 tokens: int, num, =, a, +, 6, and ;
The keyword int is made up of three characters, but it is a single token and cannot be split further.
The job of lexical analysis is to convert the character stream of a Java source file into the corresponding token sequence. Syntax analysis then builds a more structured abstract syntax tree from the token sequence produced by lexical analysis.
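To make the character-stream-to-token step concrete, here is a minimal sketch of a lexer for the example line. It is not how javac's scanner is written; the class name ToyLexer, the five token kinds, and the tiny keyword list are all assumptions made for illustration. The sketch treats the terminating semicolon as a token of its own, so the example line yields seven tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ToyLexer {
    // Names of the capturing groups in TOKEN, one per (hypothetical) token kind.
    private static final String[] KINDS =
        {"keyword", "identifier", "literal", "operator", "separator"};

    // Keywords are tried before identifiers so that "int" is not lexed as a name.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(?<keyword>(?:int|long|boolean|return)\\b)"
        + "|(?<identifier>[A-Za-z_][A-Za-z0-9_]*)"
        + "|(?<literal>\\d+)"
        + "|(?<operator>[=+\\-*/])"
        + "|(?<separator>[;(){}]))");

    /** Splits the source text into "kind:text" strings, one per token. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int end = 0;
        // Each match must start where the previous one ended; a gap means unknown input.
        while (m.find() && m.start() == end) {
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    tokens.add(kind + ":" + m.group(kind));
                    break;
                }
            }
            end = m.end();
        }
        if (!source.substring(end).trim().isEmpty()) {
            throw new IllegalArgumentException("unexpected input at offset " + end);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("int num = a + 6;")) {
            System.out.println(token);
        }
    }
}
```

Note that the three characters i, n, t come out as the single entry keyword:int, which is the point made above: the lexer never hands later phases anything smaller than a token.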
An abstract syntax tree (AST) is a tree representation of the syntactic structure of program code. Each node of the tree represents one syntactic construct in the code: a package, type, modifier, operator, interface, return value, or even a code comment can be such a construct. Parsing is implemented by the com.sun.tools.javac.parser.Parser class, and the abstract syntax tree produced at this stage is represented by the com.sun.tools.javac.tree.JCTree class.
After this step, the compiler essentially no longer touches the source files; all subsequent operations are based on the abstract syntax tree.
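As a rough illustration of how a token sequence becomes a tree, the following sketch parses the tokens of the example line into three kinds of nodes. The node classes (Leaf, Binary, VarDecl) and the ToyParser class are hypothetical and far simpler than javac's JCTree hierarchy; the grammar only covers a declaration of the form type name = operand + operand ... ;

```java
import java.util.Arrays;
import java.util.List;

class ToyParser {
    /** Base type for every node of this toy syntax tree. */
    interface Node {}

    /** A leaf node: an identifier such as a, or a literal such as 6. */
    static final class Leaf implements Node {
        final String text;
        Leaf(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    /** A binary operation with a left and a right operand. */
    static final class Binary implements Node {
        final String op;
        final Node left;
        final Node right;
        Binary(String op, Node left, Node right) {
            this.op = op; this.left = left; this.right = right;
        }
        @Override public String toString() {
            return "(" + left + " " + op + " " + right + ")";
        }
    }

    /** A variable declaration with an initializer expression. */
    static final class VarDecl implements Node {
        final String type;
        final String name;
        final Node init;
        VarDecl(String type, String name, Node init) {
            this.type = type; this.name = name; this.init = init;
        }
        @Override public String toString() {
            return "VarDecl(" + type + " " + name + " = " + init + ")";
        }
    }

    private final List<String> tokens;
    private int pos = 0;

    ToyParser(List<String> tokens) { this.tokens = tokens; }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void expect(String expected) {
        String actual = next();
        if (!actual.equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + actual);
        }
    }

    /** declaration := type name '=' expression ';' */
    VarDecl parseDeclaration() {
        String type = next();
        String name = next();
        expect("=");
        Node init = parseExpression();
        expect(";");
        return new VarDecl(type, name, init);
    }

    /** expression := operand ('+' operand)*, grouped to the left */
    private Node parseExpression() {
        Node left = new Leaf(next());
        while (pos < tokens.size() && tokens.get(pos).equals("+")) {
            String op = next();
            left = new Binary(op, left, new Leaf(next()));
        }
        return left;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("int", "num", "=", "a", "+", "6", ";");
        System.out.println(new ToyParser(tokens).parseDeclaration());
    }
}
```

Once the VarDecl node exists, nothing later in the pipeline needs the original characters, which mirrors the remark above that subsequent phases work on the tree rather than on the source file.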
Filling the symbol table
A symbol table is a table made up of a set of symbol addresses and symbol information. In the target code generation phase, the symbol table is the basis for address allocation when addresses are assigned to symbol names.
You can think of it as the key-value pairs of a hash table, or as the key-value form of a JavaScript object.
Besides the class itself, a class defines symbols such as its class name, variable names, and method names. Some symbols refer to other classes, for example calls to methods or variables of other classes, and a class may also extend a superclass or implement interfaces. Because those symbols are defined in other classes, the symbols of those classes must be resolved into the symbol table as well.
The information recorded is usually information about identifiers, such as type and scope. It is mostly used in the semantic analysis phase, because the syntax analysis phase does not catch cases such as the following:
int a; a = "Hello,World!";
In this case, semantic analysis can use the symbol table to look up the declared type and identify the type mismatch.
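The hash-table view of the symbol table can be sketched in a few lines. This is only an illustration: the ToySymbolTable class and its two methods are hypothetical, types are represented as plain strings, and the example assumes that the snippet above declares a as an int and then assigns a string to it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class ToySymbolTable {
    // Maps each declared identifier to its declared type (types are plain strings here).
    private final Map<String, String> declaredTypes = new HashMap<>();

    /** Records a declaration such as "int a;". */
    void declare(String name, String type) {
        if (declaredTypes.containsKey(name)) {
            throw new IllegalStateException("variable " + name + " is already defined");
        }
        declaredTypes.put(name, type);
    }

    /** Checks an assignment; returns an error message, or empty if the types match. */
    Optional<String> checkAssignment(String name, String valueType) {
        String declared = declaredTypes.get(name);
        if (declared == null) {
            return Optional.of("cannot find symbol: " + name);
        }
        if (!declared.equals(valueType)) {
            return Optional.of("incompatible types: " + valueType
                + " cannot be assigned to " + declared + " " + name);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ToySymbolTable table = new ToySymbolTable();
        table.declare("a", "int");                       // int a;
        System.out.println(table.checkAssignment("a", "String")
            .orElse("ok"));                              // a = "Hello,World!";
    }
}
```

The parser accepts the assignment because it is well formed; only the lookup of a in the table reveals that the two types disagree.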
Semantic analysis
The syntax tree represents a structurally correct source program, but it cannot guarantee that the program is logically sound. The main task of semantic analysis is to check the context-sensitive properties of this structurally correct source program.
Semantic analysis is divided into two steps, attribute checking and data and control flow analysis:
- Attribute checking examines, for example, whether a variable is declared before it is used and whether the type of a variable matches the type of the value assigned to it.
- Data and control flow analysis further verifies the contextual logic of the program. It checks, for example, whether local variables are assigned before use, whether every path through a method returns a value, and whether all checked exceptions are handled correctly.
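The flow checks listed above can be observed by handing javac a source that parses fine but fails them. The sketch below uses the standard javax.tools API to compile a string in memory and collect the errors; it needs a JDK (on a plain JRE there is no system compiler), and the class names FlowCheckDemo, StringSource, and Broken are made up for this example. The exact wording of the messages depends on the JDK version, so none is quoted here.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class FlowCheckDemo {
    // Syntactically valid, but both methods violate a flow rule.
    static final String BROKEN_SOURCE =
          "class Broken {\n"
        + "    int useBeforeAssign() { int x; return x; }\n"
        + "    int missingReturn(boolean flag) { if (flag) { return 1; } }\n"
        + "}\n";

    /** Keeps source text in memory so that no .java file has to be written. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles the given source and returns the error messages javac reports. */
    static List<String> errorsFor(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system compiler: run on a JDK, not a JRE");
        }
        DiagnosticCollector<JavaFileObject> collector = new DiagnosticCollector<>();
        compiler.getTask(null, null, collector, null, null,
                Collections.singletonList(new StringSource(className, code))).call();
        List<String> errors = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : collector.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors.add("line " + d.getLineNumber() + ": " + d.getMessage(Locale.ENGLISH));
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        for (String error : errorsFor("Broken", BROKEN_SOURCE)) {
            System.out.println(error);
        }
    }
}
```

Both methods pass lexical and syntax analysis, so any error printed here comes from the semantic analysis stage: one method reads a local variable that was never assigned, the other has a path that falls off the end without returning a value.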
Bytecode generation
The bytecode generation phase does not merely convert the information produced by the previous steps into bytecode and write it to disk; the compiler also performs a small amount of code addition and transformation. The instance constructor <init>() method and the class constructor <clinit>() method are added to the syntax tree at this stage.
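The effect of these two generated methods is visible from ordinary Java code. In the sketch below (the InitOrderDemo and Widget names are invented for illustration), the static block ends up in the class constructor <clinit>(), while the field initializer and the instance block are folded into the instance constructor <init>() ahead of the constructor body.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    // Records the order in which the pieces of Widget are executed.
    static final List<String> LOG = new ArrayList<>();

    static class Widget {
        // Compiled into <clinit>(): runs once, when the class is initialized.
        static {
            LOG.add("static block");
        }

        // Compiled into <init>(): runs for every new instance, before the constructor body.
        private final String name = record("field initializer");

        {
            LOG.add("instance block");
        }

        Widget() {
            LOG.add("constructor body");
        }

        private static String record(String step) {
            LOG.add(step);
            return step;
        }
    }

    public static void main(String[] args) {
        new Widget();
        new Widget();
        for (String step : LOG) {
            System.out.println(step);
        }
    }
}
```

Creating two instances logs "static block" once at the very start, followed by "field initializer", "instance block", and "constructor body" for each instance, in that order.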
Once bytecode generation finishes, the .java file has become a .class file, and when the Java program runs, the JVM interprets the bytecode instruction by instruction. That, however, was the situation before JIT appeared; the next article will introduce the JIT.
I am still learning as I write, so if there are errors, please correct me in the comments.