Chapter 2 Javac Compilation Principles
Note: This article is mainly based on Chapter 4, "Javac Compilation Principles", of the book "In-Depth Analysis of Java Web Technology Insider".
1. Function of javac
- Converts *.java source files into *.class files
2. Compilation Process
Process:
- Lexical analyzer: converts source code into a token stream
  - Divides the source code into tokens (see 3.2 for the kinds of elements a token can be)
- Syntax analyzer: converts the tokens into a syntax tree
  - Combines the tokens into statements (or statement blocks) and checks whether they conform to the Java language specification
- Semantic analyzer: converts the syntax tree into an annotated syntax tree
  - Simplifies complex syntax into simpler forms (for example, processing annotations and turning foreach loops into for loops), performs a series of checks, and adds code where needed
- Code generator: converts the annotated syntax tree into bytecode
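The JDK lets a program drive these phases one at a time through the Compiler Tree API (com.sun.source.util.JavacTask, in the jdk.compiler module): parse() covers lexical and syntax analysis, analyze() covers semantic analysis, and generate() writes the bytecode. The sketch below assumes a JDK 11 or later (on a JRE, ToolProvider.getSystemJavaCompiler() returns null); the class name JavacPhases, the StringSource helper, and the one-line Cifa source are illustrative choices, not taken from the book.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;

import javax.lang.model.element.Element;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class JavacPhases {

    /** A source file held in a String instead of on disk (hypothetical helper). */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Runs javac phase by phase and records one line per result. */
    static List<String> compileInPhases(String className, String source, Path outDir) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("javac not available: run on a JDK, not a JRE");
        }
        JavacTask task = (JavacTask) compiler.getTask(null, null, null,
                List.of("-proc:none", "-d", outDir.toString()), null,
                List.of(new StringSource(className, source)));

        List<String> log = new ArrayList<>();
        // Lexical + syntax analysis: characters -> tokens -> syntax tree
        for (CompilationUnitTree unit : task.parse()) {
            log.add("parsed: package " + unit.getPackageName() + ", "
                    + unit.getTypeDecls().size() + " type(s)");
        }
        // Semantic analysis: attribution and flow checks on the tree
        for (Element e : task.analyze()) {
            log.add("analyzed: " + e.getKind() + " " + e);
        }
        // Code generation: bytecode written as .class files under outDir
        for (JavaFileObject f : task.generate()) {
            log.add("generated: " + Path.of(f.toUri()).getFileName());
        }
        return log;
    }

    public static void main(String[] args) throws IOException {
        String cifa = "package compile; public class Cifa { int a; int c = a + 1; }";
        Path outDir = Files.createTempDirectory("javac-phases");
        compileInPhases("compile.Cifa", cifa, outDir).forEach(System.out::println);
    }
}
```

Running main prints one line per phase. Because of the -d option, Cifa.class lands in a compile/ subdirectory of the temporary output directory, matching its package.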
3. Lexical Analysis
3.1 Role
- Converts source code into a token stream.
3.2 Process
The source code is read character by character and grouped into a stream of normalized tokens. Tokens fall into these categories:
- Java keywords: package, import, public, class, int, etc.
- Identifiers (user-defined names): package names, class names, variable names, method names
- Symbols: =, ;, +, -, *, /, %, {, }, etc.
3.3 Example
Code:
package compile;

/**
 * lexical
 */
public class Cifa {
    int a;
    int c = a + 1;
}
The code above is converted into a token stream.
Note: The conversion is performed by the parseCompilationUnit() method of JavacParser. For its source code, see the book cited at the beginning of this article.
Note: The resulting token stream conforms to the Java language specification.
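The scanner's output for Cifa can be approximated with a toy lexer. This is a hypothetical sketch, not javac's scanner (the real one lives in com.sun.tools.javac.parser and produces kinds such as PACKAGE and IDENTIFIER): the class name ToyLexer is made up, it knows only a handful of keywords, it adds a LITERAL category for numbers such as 1, it treats every symbol as a single character, and it discards comments. Its keyword-table lookup and longest-match loop are the two ideas the questions in 3.4 revolve around.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ToyLexer {

    enum Kind { KEYWORD, IDENTIFIER, LITERAL, SYMBOL }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    // Only a few of Java's reserved words: enough for Cifa and Yufa, not the full list.
    private static final Set<String> KEYWORDS = Set.of(
            "package", "import", "public", "private", "class", "int", "void", "return", "this");

    // Single-character symbols only; multi-character operators such as == or ++ are not handled.
    private static final String SYMBOLS = "=;+-*/%{}().,";

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char ch = src.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++;                                      // whitespace only separates tokens
            } else if (src.startsWith("/*", i)) {         // block and doc comments are discarded
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? src.length() : end + 2;
            } else if (src.startsWith("//", i)) {         // line comments are discarded
                int end = src.indexOf('\n', i);
                i = (end < 0) ? src.length() : end + 1;
            } else if (Character.isJavaIdentifierStart(ch)) {
                int start = i;
                // Longest match: keep reading until a character that cannot be part of a name.
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                String word = src.substring(start, i);
                // Keyword-table lookup decides KEYWORD versus IDENTIFIER.
                tokens.add(new Token(KEYWORDS.contains(word) ? Kind.KEYWORD : Kind.IDENTIFIER, word));
            } else if (Character.isDigit(ch)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.LITERAL, src.substring(start, i)));
            } else if (SYMBOLS.indexOf(ch) >= 0) {
                tokens.add(new Token(Kind.SYMBOL, String.valueOf(ch)));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + ch + "' at offset " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String cifa = "package compile;\n\n/**\n * lexical\n */\npublic class Cifa {\n"
                + "    int a;\n    int c = a + 1;\n}\n";
        System.out.println(tokenize(cifa));
    }
}
```

For Cifa this yields 18 tokens, starting with KEYWORD(package), IDENTIFIER(compile), SYMBOL(;) and ending with SYMBOL(}); the doc comment produces no token at all.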
3.4 Questions
- How does javac determine whether "package" is a Java keyword or a user-defined identifier?
  - The scanner looks up every word it reads in a table of reserved keywords: because "package" is a keyword, it becomes a Token.PACKAGE token rather than a Token.IDENTIFIER token. JavacParser then checks the order of the tokens against the Java language specification (see the source of parseCompilationUnit()), which is why a package declaration is accepted only at the beginning of a file.
  - In practice: Java keywords cannot be used as names of variables, classes, packages, or methods at all (javac reports an error), so use meaningful words instead. When you write code in Eclipse, it also flags such a name as an error right away.
- How does the lexer know that "package" is one token, rather than stopping early at a prefix such as "packa"?
  - In my understanding, the lexer relies mainly on whitespace and symbols (see the symbols in 3.2): it keeps reading letters and digits until it meets a whitespace character or a symbol, and everything read so far forms one token. "package" contains no whitespace or symbol, so the whole word is one token, while "packa" on its own would simply be a different identifier.
  - In practice: in int a = b + c; the spaces around =, b, + and c are optional, because the symbols already separate the words, so int a=b+c; compiles just as well. A space is required only between two adjacent words, such as int and a (inta would be read as a single identifier). Adding spaces is still good practice because it makes the code much easier to read.
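Both answers can be checked against javac itself. The sketch below is hypothetical (the class SpacingCheck and its compiles() helper are made up, and it assumes a JDK 11 or later): it writes a tiny class to a temporary file, compiles it with the system compiler through JavaCompiler.run, and reports whether compilation succeeded.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpacingCheck {

    /** Compiles "class T { classBody }" with the system javac; true if it compiled without errors. */
    static boolean compiles(String classBody) throws IOException {
        Path dir = Files.createTempDirectory("spacing");
        Path file = dir.resolve("T.java");
        Files.writeString(file, "class T { " + classBody + " }");

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();     // null when running on a JRE
        ByteArrayOutputStream messages = new ByteArrayOutputStream(); // collects javac's error output
        int exitCode = javac.run(null, null, messages,
                "-proc:none", "-d", dir.toString(), file.toString());
        return exitCode == 0;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(compiles("int b = 1, c = 2; int a=b+c;"));    // no spaces around symbols
        System.out.println(compiles("int b = 1, c = 2; int a = b + c;")); // spaces for readability
        System.out.println(compiles("inta = 1;"));                        // two words glued together
        System.out.println(compiles("int package = 1;"));                 // keyword used as a name
        System.out.println(compiles("int packa = 1;"));                   // ordinary identifier
    }
}
```

The program should print true, true, false, false, true: spaces next to symbols are purely cosmetic, while gluing two words together or using a keyword as a name is rejected.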
4. Syntax Analysis
4.1 Role
- Combines the tokens produced by lexical analysis into a syntax tree and checks whether they conform to the Java language specification.
4.2 What Syntax Analysis Covers
- Package declarations
- Import declarations
- Class declarations (including class, interface, and enum). "Class" here refers to all three kinds, not just class.
4.3 Example
Code:
package compile;

/**
 * syntax
 */
public class Yufa {
    int a;
    private int c = a + 1;

    // getter
    public int getC() {
        return c;
    }

    // setter
    public void setC(int c) {
        this.c = c;
    }
}
Final syntax tree:
Note:
- All classes in a source file are placed under one JCCompilationUnit node, which contains the package syntax tree (as pid) and the syntax tree of each class.
- Each branch coming out of JCClassDecl is a complete code block. The four branches correspond to the two field declarations and the two method blocks in our code. This is how the syntax analyzer does its job: it combines tokens into statements (or statement blocks).
- In the tree above, the field declarations are shown in full, but some nodes of the two methods are omitted, such as the public modifier, the return type, and the parameters.
Question:
The syntax tree of an import node is similar to that of the package node, so where is the import syntax tree?
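The same tree can be inspected with the Compiler Tree API, whose interfaces (CompilationUnitTree, ClassTree, VariableTree, MethodTree in com.sun.source.tree) are implemented by javac's JC* nodes. The sketch below, with the hypothetical class YufaTree and a condensed copy of Yufa (comments and line breaks dropped), only parses the source, so no semantic analysis has happened yet. It assumes a JDK 11 or later.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;

import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class YufaTree {

    /** A source file held in a String instead of on disk (hypothetical helper). */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    static final String YUFA = String.join("\n",
            "package compile;",
            "public class Yufa {",
            "    int a;",
            "    private int c = a + 1;",
            "    public int getC() { return c; }",
            "    public void setC(int c) { this.c = c; }",
            "}");

    /** Parses (syntax analysis only) and describes the top levels of the syntax tree. */
    static List<String> describe(String className, String source) throws IOException {
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler().getTask(
                null, null, null, List.of("-proc:none"), null,
                List.of(new StringSource(className, source)));
        List<String> lines = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {        // one compilation unit per source file
            lines.add("CompilationUnit package=" + unit.getPackageName()
                    + " imports=" + unit.getImports().size());
            for (Tree type : unit.getTypeDecls()) {
                ClassTree cls = (ClassTree) type;
                lines.add("  Class " + cls.getSimpleName());
                for (Tree member : cls.getMembers()) {         // one branch per member
                    if (member instanceof VariableTree) {
                        VariableTree v = (VariableTree) member;
                        lines.add("    Variable " + v.getName() + " init=" + v.getInitializer());
                    } else if (member instanceof MethodTree) {
                        MethodTree m = (MethodTree) member;
                        lines.add("    Method " + m.getName() + " params=" + m.getParameters().size());
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        describe("compile.Yufa", YUFA).forEach(System.out::println);
    }
}
```

The class node has four member branches (fields a and c, methods getC and setC), matching the notes above. Import declarations are reached through CompilationUnitTree.getImports(); Yufa has none, so the count is 0, which would explain why no import node shows up for this example. Adding a line such as import java.util.List; to the source makes the count 1.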
5. Semantic Analysis
5.1 Role
- Converts the syntax tree into an annotated syntax tree
5.2 Steps
- Add the default no-argument constructor (if the class declares no constructor at all)
- Process annotations
- Attribution: check semantic validity and perform logical checks
  - Check whether the types in the syntax tree match (e.g., String s = 1 + 2; the types on the two sides of "=" do not match)
  - Check whether access to variables, methods, and classes is legal (e.g., a class cannot call a private method of another class)
  - Check whether each variable is declared and initialized before use
  - Constant folding (e.g., String s = "hello" + "world" becomes String s = "helloworld" after semantic analysis)
  - Infer the type arguments of generic methods
- Data flow analysis
  - Definite assignment: a local variable must be assigned before it is read
  - A method with a non-void return type must return a value on every path
  - A final variable can be assigned only once; assigning it again is a compile-time error
  - Every checked exception must be either caught or declared to be thrown
  - Every statement must be reachable (statements after a return can never execute and are reported as errors; a finally block is different, because it still runs when its try block returns)
- Further semantic analysis
  - Remove code that can never execute (e.g., the body of if (false))
  - Automatic boxing and unboxing (e.g., between int and Integer)
  - Remove syntactic sugar (e.g., a foreach loop becomes a for loop, assert becomes an if statement, and an inner class is compiled into a separate top-level class that keeps a reference to its enclosing class)
- Finally, the processed syntax tree becomes the final annotated syntax tree.
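Most of the checks in this list can be triggered on purpose. The hypothetical sketch below (class SemanticChecks, one made-up snippet per check) runs javac's parse and analysis phases through JavacTask.analyze(), which performs attribution and flow analysis without writing .class files, and collects the error diagnostics. It assumes a JDK 11 or later. Note that javac usually runs flow analysis only when attribution found no errors, which is why each snippet contains exactly one problem.

```java
import com.sun.source.util.JavacTask;

import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SemanticChecks {

    /** A source file held in a String; the name T.java matches the first class in each snippet. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String code) {
            super(URI.create("string:///T" + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses and analyzes (attribution + flow analysis, no .class output); returns the error messages. */
    static List<String> errors(String source) throws IOException {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null when running on a JRE
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) javac.getTask(null, null, diagnostics,
                List.of("-proc:none"), null, List.of(new StringSource(source)));
        task.analyze();
        List<String> messages = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                messages.add(d.getMessage(Locale.ENGLISH));
            }
        }
        return messages;
    }

    /** One snippet per check in the list above; each one should be rejected. */
    static final Map<String, String> REJECTED = new LinkedHashMap<>();

    static {
        REJECTED.put("type mismatch", "class T { void m() { String s = 1 + 2; } }");
        REJECTED.put("private access", "class T { void m() { new A().p(); } } class A { private void p() {} }");
        REJECTED.put("use before assignment", "class T { void m() { int x; System.out.println(x); } }");
        REJECTED.put("missing return", "class T { int m() { } }");
        REJECTED.put("final reassigned", "class T { void m() { final int x = 1; x = 2; } }");
        REJECTED.put("checked exception", "class T { void m() { throw new Exception(); } }");
        REJECTED.put("unreachable statement", "class T { int m() { return 1; System.out.println(); } }");
    }

    /** Control case: a finally block after a return in its try block is accepted. */
    static final String ACCEPTED =
            "class T { int m() { try { return 1; } finally { System.out.println(\"done\"); } } }";

    public static void main(String[] args) throws IOException {
        for (Map.Entry<String, String> e : REJECTED.entrySet()) {
            System.out.println(e.getKey() + " -> " + errors(e.getValue()));
        }
        System.out.println("control -> " + errors(ACCEPTED));
    }
}
```

Each rejected snippet should produce at least one error message, while the control case should produce none. The exact wording of the messages depends on the JDK version.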
6. Bytecode Generation
6.1 Role
- Converts the annotated syntax tree into bytecode and writes it to a *.class file.
6.2 Steps
- Convert each Java code block into instructions defined by the JVM specification; these instructions are the bytecode.
- Write the bytecode to a *.class file in the JVM class file format.
For the detailed source code and procedure, see com.sun.tools.javac.jvm.Gen and page 42 of "Distributed Java Applications: Basics and Practices".
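To see the instructions produced for a small method, javac and javap can be run in-process through java.util.spi.ToolProvider (JDK 9 or later). This sketch assumes the JDK registers both tools under the names "javac" and "javap", and the class name ShowBytecode and the trimmed-down Yufa source are illustrative only; it requires Java 11 for Files.writeString.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.spi.ToolProvider;

public class ShowBytecode {

    /** Runs a JDK tool in-process and returns everything it printed. */
    static String run(String tool, String... args) {
        ToolProvider provider = ToolProvider.findFirst(tool)
                .orElseThrow(() -> new IllegalStateException(tool + " is not available; run on a full JDK"));
        StringWriter out = new StringWriter();
        PrintWriter pw = new PrintWriter(out);
        int exitCode = provider.run(pw, pw, args);
        pw.flush();
        if (exitCode != 0) {
            throw new IllegalStateException(tool + " failed:\n" + out);
        }
        return out.toString();
    }

    /** Compiles a getter with javac, then disassembles the class with javap -c. */
    static String disassembleGetter() throws Exception {
        Path dir = Files.createTempDirectory("gen");
        Path src = dir.resolve("Yufa.java");
        Files.writeString(src, "public class Yufa { private int c; public int getC() { return c; } }");
        run("javac", "-d", dir.toString(), src.toString());
        // -c prints the bytecode of each method, -p includes private members
        return run("javap", "-c", "-p", dir.resolve("Yufa.class").toString());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(disassembleGetter());
    }
}
```

In the output, the body of getC() is a short instruction sequence (aload_0 to load this, getfield to read c, ireturn to return the int), which is the code block return c; translated into JVM instructions.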
6.3 Contents of a Class File
Besides bytecode, a generated *.class file also contains:
- Structural information
  - Class file format version number
  - Number and size of each section
- Metadata
  - Declarations of the class, its superclass, and the interfaces it implements
  - Field declarations
  - Method declarations
  - Constant pool
- Method information
  - Bytecode
  - Exception handler table
  - Size of the local variable area
  - Maximum size of the operand stack
  - Type records of the operand stack
  - Debugging symbol information
The local variable area and the operand stack mentioned here make up a method's stack frame. For details, see Chapter 1, JVM Memory Structure.
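The structural information sits at the very start of every class file, so it can be read with a few lines of code. Per the JVM specification, a class file begins with u4 magic, u2 minor_version, u2 major_version and u2 constant_pool_count, all big-endian. The sketch below (hypothetical class ClassFileHeader; Cifa is compiled without its package declaration for simplicity; assumes a JDK 11 or later) compiles Cifa and reads those four fields.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassFileHeader {

    /** Compiles Cifa.java into dir with the system javac and returns the path of Cifa.class. */
    static Path compileCifa(Path dir) throws IOException {
        Path src = dir.resolve("Cifa.java");
        Files.writeString(src, "public class Cifa { int a; int c = a + 1; }");
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null when running on a JRE
        if (javac.run(null, null, null, "-d", dir.toString(), src.toString()) != 0) {
            throw new IllegalStateException("compilation failed");
        }
        return dir.resolve("Cifa.class");
    }

    /** Reads the fixed-size start of a class file (DataInputStream reads big-endian, as the format requires). */
    static int[] readHeader(Path classFile) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(classFile))) {
            int magic = in.readInt();                       // 0xCAFEBABE in every valid class file
            int minor = in.readUnsignedShort();             // minor_version
            int major = in.readUnsignedShort();             // major_version, e.g. 52 = Java 8, 61 = Java 17
            int constantPoolCount = in.readUnsignedShort(); // number of constant pool entries plus one
            return new int[] {magic, minor, major, constantPoolCount};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] h = readHeader(compileCifa(Files.createTempDirectory("classfile")));
        System.out.printf("magic=%08X minor=%d major=%d constant_pool_count=%d%n", h[0], h[1], h[2], h[3]);
    }
}
```

The magic number is always CAFEBABE; the major version reflects the JDK that compiled the file, and the constant pool count depends on how many names, types and constants Cifa refers to.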
Summary:
In day-to-day work we seldom deal with the compilation step directly. This differs from the class loading mechanism, where we may need to write our own class loaders, and from Java memory management, where we configure the heap, stack, and method area sizes on the server and choose a GC collector. Still, understanding how javac compiles code helps us understand the class file structure and the class loading mechanism later on, and helps us grasp how Java code is executed from start to finish. It also explains the checks the compiler performs, and knowing them makes us more careful when writing code: for example, checked exceptions must be caught or declared, and every statement must be reachable. Eclipse runs these checks for us as we type, including reachability, but it is still worth understanding them.