Understanding the JVM in Depth, Reading Notes 4: Early (Compile-Time) Optimization

Source: Internet
Author: User
10.1 Overview

The "compile time" of the Java language is actually an ambiguous term. It may refer to the process in which a front-end compiler (more precisely, the "front end of a compiler") turns *.java files into *.class files. It may also refer to the process in which the virtual machine's back-end run-time compiler, the JIT compiler (just-in-time compiler), turns bytecode into machine code. Or it may refer to the process in which a static ahead-of-time compiler (AOT compiler) compiles *.java files directly into native machine code. Some representative compilers for each of these three kinds of compilation are listed below.

  • Front-end compiler: Sun's javac, and the incremental compiler (ECJ) in Eclipse JDT.
  • JIT compiler: the C1 and C2 compilers of the HotSpot VM.
  • AOT compiler: GNU Compiler for the Java (GCJ), Excelsior JET.

javac implements many optimizations aimed at the Java coding process, to improve programmers' coding style and coding efficiency. Many newer Java syntax features are implemented as compiler "syntactic sugar" rather than through underlying improvements to the virtual machine. It is fair to say that the run-time optimizations of the just-in-time compiler matter more to how a program runs, while the compile-time optimizations of the front-end compiler are more closely tied to how a program is written.

10.2 The javac compiler

10.2.1 javac source code and debugging

The virtual machine specification strictly defines the class file format. The Java Virtual Machine Specification (2nd edition) does have a dedicated chapter, "Compiling for the Java Virtual Machine", but it is written entirely as examples and gives no strict definition of how Java source files are to be compiled into class files. As a result, class file compilation depends to some extent on the specific JDK implementation. In some extreme cases, javac may compile a piece of code that the ECJ compiler rejects. Judging from the Sun javac source code, compilation can be roughly divided into three processes:

  • Parsing and filling the symbol table.
  • Annotation processing by pluggable annotation processors.
  • Analysis and bytecode generation.

The entry point of javac compilation is the com.sun.tools.javac.main.JavaCompiler class. The code for the three processes above is concentrated in the compile() and compile2() methods of this class. As shown in Figure 10-5, the most critical processing of the whole compilation is done by the eight methods marked in the figure. Let's look at what these eight methods do.

10.2.2 Parsing and filling the symbol table

The parsing step is performed by the parseFiles() method in Figure 10-5 (process 1.1 in Figure 10-5). It consists of two processes: lexical analysis and syntax analysis.

1. Lexical and syntax analysis
Lexical analysis transforms the character stream of the source code into a set of tokens. A single character is the smallest element when writing a program, whereas a token is the smallest element during compilation. Keywords, variable names, literals, and operators can all be tokens. For example, the code "int a = b + 2" contains six tokens: int, a, =, b, +, and 2. Although the keyword int consists of three characters, it is a single token and cannot be split. In the javac source code, lexical analysis is implemented by the com.sun.tools.javac.parser.Scanner class, and syntax analysis, which builds the abstract syntax tree from the token sequence, by the com.sun.tools.javac.parser.Parser class.
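
To make the six-token example concrete, here is a minimal hand-written tokenizer. It is only a sketch for illustration: the class name TokenDemo and its tokenize() method are made up here, it handles just identifiers, integer literals, and single-character operators, and it is not how javac's own scanner is written.

```java
import java.util.ArrayList;
import java.util.List;

class TokenDemo {

    /** Splits a tiny subset of Java source into tokens (hypothetical helper). */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char ch = source.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++; // whitespace separates tokens but is not a token itself
            } else if (Character.isJavaIdentifierStart(ch)) {
                // keyword or identifier: consume all identifier characters
                int start = i;
                while (i < source.length() && Character.isJavaIdentifierPart(source.charAt(i))) {
                    i++;
                }
                tokens.add(source.substring(start, i));
            } else if (Character.isDigit(ch)) {
                // integer literal: consume all digits
                int start = i;
                while (i < source.length() && Character.isDigit(source.charAt(i))) {
                    i++;
                }
                tokens.add(source.substring(start, i));
            } else {
                // anything else is treated as a one-character operator
                tokens.add(String.valueOf(ch));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int a = b + 2")); // prints [int, a, =, b, +, 2]
    }
}
```

The keyword int comes out as one token even though it is three characters long, which is the point made in the text.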

2. Filling the symbol table

After syntax analysis and lexical analysis, the next step is to fill the symbol table, which is the enterTrees() method in Figure 10-5 (process 1.2 in Figure 10-5). A symbol table is a table made up of a set of symbol addresses and symbol information. Readers can picture it as key-value pairs in a hash table (in practice a symbol table is not necessarily implemented as a hash table; it can be an ordered symbol table, a tree-shaped symbol table, a stack-structured symbol table, and so on). The information registered in the symbol table is used in several later stages of compilation. In semantic analysis, it is used for semantic checks (such as checking that a name is used consistently with its original declaration) and for generating intermediate code. In the target code generation stage, when addresses are allocated to symbol names, the symbol table is the basis for that allocation.
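
The key-value picture can be sketched as a stack of hash maps, one per scope. This is an illustration only: SymbolTableDemo, Symbol, declare(), and resolve() are hypothetical names, and javac's real symbol table is considerably richer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class SymbolTableDemo {

    /** Hypothetical symbol record: a name plus its declared type. */
    static final class Symbol {
        final String name;
        final String type;

        Symbol(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // One map per scope; the innermost scope sits at the head of the deque.
    private final Deque<Map<String, Symbol>> scopes = new ArrayDeque<>();

    void enterScope() {
        scopes.push(new HashMap<>());
    }

    void leaveScope() {
        scopes.pop();
    }

    /** Registers a symbol in the current scope; returns false if the name is already taken there. */
    boolean declare(String name, String type) {
        // enterScope() must have been called at least once before declaring
        return scopes.peek().putIfAbsent(name, new Symbol(name, type)) == null;
    }

    /** Looks a name up from the innermost scope outwards; returns null if it was never declared. */
    Symbol resolve(String name) {
        for (Map<String, Symbol> scope : scopes) {
            Symbol symbol = scope.get(name);
            if (symbol != null) {
                return symbol;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SymbolTableDemo table = new SymbolTableDemo();
        table.enterScope();                    // e.g. class scope
        table.declare("count", "int");
        table.enterScope();                    // e.g. method scope
        table.declare("name", "String");
        System.out.println(table.resolve("count").type); // prints int
        System.out.println(table.resolve("missing"));    // prints null
    }
}
```

A semantic check such as "was this variable declared before use" then reduces to asking whether resolve() returns null.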

10.2.3 Annotation processors

Since JDK 1.5, the Java language has supported annotations. Like ordinary Java code, these annotations take effect at run time. JDK 1.6 implemented the JSR-269 specification (JSR-269: Pluggable Annotation Processing API), which provides a standard API for pluggable annotation processors to process annotations at compile time. We can think of them as a set of compiler plug-ins that can read, modify, and add any element of the abstract syntax tree. If these plug-ins modify the syntax tree during annotation processing, the compiler goes back to the parse-and-fill-symbol-table process and starts over, until no pluggable annotation processor modifies the syntax tree any more. Each such cycle is called a round, which is the loop shown in Figure 10-4.

With the standard API for compile-time annotation processing, our code can influence the compiler's behavior. Because any element of the syntax tree, even code comments, can be accessed from a plug-in, plug-ins built on pluggable annotation processors have a great deal of room to work with. Given enough creativity, programmers can use them to do many things that otherwise could only be done by hand in code.

In the javac source code, pluggable annotation processors are initialized in the initProcessAnnotations() method and executed in the processAnnotations() method. The latter determines whether any new annotation processor needs to run; if so, it uses the doProcessing() method of the com.sun.tools.javac.processing.JavacProcessingEnvironment class to create a new JavaCompiler object that handles the subsequent compilation steps.
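
As a rough idea of what such a plug-in looks like, the sketch below uses the standard javax.annotation.processing and javax.tools APIs. The processor ClassNameChecker, the helper check(), and the naming rule it enforces are all invented for this illustration; the processor only reads elements and reports a warning, it does not modify the syntax tree. It assumes a JDK (not a bare JRE) so that ToolProvider can supply a compiler.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class ClassNameCheckDemo {

    /** Hypothetical processor: warns when a top-level class name does not start with an upper-case letter. */
    @SupportedAnnotationTypes("*") // "*" means: run for every source file, annotated or not
    static class ClassNameChecker extends AbstractProcessor {
        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latestSupported();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            // Called once per round with the root elements produced in that round.
            for (Element element : roundEnv.getRootElements()) {
                String name = element.getSimpleName().toString();
                if (element.getKind() == ElementKind.CLASS && !Character.isUpperCase(name.charAt(0))) {
                    processingEnv.getMessager().printMessage(
                            Diagnostic.Kind.WARNING,
                            "class name '" + name + "' should start with an upper-case letter",
                            element);
                }
            }
            return false; // do not claim any annotations
        }
    }

    /** Runs only the annotation-processing step on an in-memory source and returns the warnings. */
    static List<String> check(String className, String source) {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, diagnostics, List.of("-proc:only"), null, List.of(file));
        task.setProcessors(List.of(new ClassNameChecker()));
        task.call();

        List<String> warnings = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.WARNING) {
                warnings.add(d.getMessage(null));
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        check("badName", "class badName {}").forEach(System.out::println);
    }
}
```

The -proc:only option stops compilation after annotation processing, so no class files are written while trying this out.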

10.2.4 Semantic analysis and bytecode generation

After syntax analysis, the compiler has an abstract syntax tree representation of the program code. A syntax tree can represent a structurally correct source program, but it cannot guarantee that the program is logically sound. The main task of semantic analysis is to check the context-dependent properties of a structurally correct source program, such as types. For example, suppose there are three variable definitions:

int a = 1;
boolean b = false;
char c = 2;

The following assignments might appear later:

int d = a + c;
int d = b + c;
char d = a + c;

Each of these three assignments forms a structurally correct syntax tree, but only the first is semantically valid and compiles. The other two make no sense in the Java language and cannot be compiled. (Whether something is semantically valid has to be judged against the language and the specific context. In C, for example, with the same definitions of a, b, and c, the 2nd and 3rd statements compile correctly.)
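
The three cases can be tried directly. In this small sketch (TypeCheckDemo and sum() are names made up for the illustration) the two invalid assignments are left as comments, because uncommenting either one makes javac reject the file; an explicit cast is shown as the form the compiler does accept.

```java
class TypeCheckDemo {

    static int sum() {
        int a = 1;
        boolean b = false;
        char c = 2;

        int d = a + c;            // accepted: char is promoted to int
        // int e = b + c;         // rejected: '+' is not defined for boolean and char
        // char f = a + c;        // rejected: the int result cannot be narrowed to char implicitly
        char f = (char) (a + c);  // accepted once the narrowing conversion is explicit

        return d + f;
    }

    public static void main(String[] args) {
        System.out.println(sum()); // prints 6
    }
}
```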

1. Attribute check
In javac, semantic analysis consists of two steps: attribute checking, and data and control flow analysis.

Attribute checking verifies, for example, whether a variable has been declared before use and whether the data types on both sides of an assignment match. Another important action in this step is constant folding. Suppose the code contains the following definition:

int a = 1 + 2; 

The literals "1" and "2" and the operator "+" still appear in the syntax tree, but after constant folding they are folded into the literal "3", as shown in Figure 10-7: the value of this infix expression is already marked on the syntax tree (ConstantExpressionValue: 3). Because constant folding happens at compile time, writing "a = 1 + 2" in code costs not even one more CPU instruction at run time than writing "a = 3" directly.
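
Constant folding can also be observed from inside a program through string constants, since the Java Language Specification requires the values of constant expressions of type String to be interned. The sketch below (ConstantFoldingDemo and its method names are made up for illustration) deliberately compares references with == to expose the difference between a folded expression and one evaluated at run time.

```java
class ConstantFoldingDemo {

    static final int FOLDED = 1 + 2; // constant expression, folded to 3 by the compiler

    static boolean foldedStringIsInterned() {
        String folded = "a" + "b"; // constant expression: folded to the literal "ab"
        return folded == "ab";     // same interned String object (reference comparison on purpose)
    }

    static boolean runtimeConcatIsInterned() {
        String left = "a";           // not final, so "left + ..." is not a constant expression
        String joined = left + "b";  // concatenated at run time into a new String
        return joined == "ab";       // different object from the interned literal
    }

    public static void main(String[] args) {
        System.out.println(FOLDED);                    // prints 3
        System.out.println(foldedStringIsInterned());  // prints true
        System.out.println(runtimeConcatIsInterned()); // prints false
    }
}
```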

2. Data and control flow analysis

Data and control flow analysis further verifies the program's contextual logic. It can detect problems such as whether a local variable is assigned before use, whether every path through a method returns a value, and whether all checked exceptions are handled correctly. Compile-time data and control flow analysis has basically the same purpose as the data and control flow analysis done at class-loading time, but the scope of verification differs, and some checks can only be done at compile time or only at run time. Code List 10-1 gives an example involving the final modifier.

// Method with the final modifier
public void foo(final int arg) {
    final int var = 0;
    // do something
}

// Method without the final modifier
public void foo(int arg) {
    int var = 0;
    // do something
}

In these two foo() methods, the parameter and local variable of the first carry the final modifier and those of the second do not. While writing code the final modifier certainly has an effect: the values of arg and var cannot be changed. Yet the class files compiled from the two pieces of code are identical. From the earlier discussion of the class file structure we know that local variables differ from fields (instance variables and class variables): a local variable has no CONSTANT_Fieldref_info symbolic reference in the constant pool, so it naturally carries no access flags (access_flags), and even its name may not be retained (depending on the compile options). A class file therefore cannot record whether a local variable was declared final. Consequently, declaring a local variable final has no effect at run time; its immutability is guaranteed only by the compiler at compile time.

3. Desugaring
Syntactic sugar, also called sugared syntax, is a term coined by the British computer scientist Peter J. Landin. It refers to syntax added to a computer language that has no effect on what the language can do but makes it more convenient for programmers to use. In general, syntactic sugar makes programs more readable and so reduces the chance of errors in the code.

4. Bytecode generation

Bytecode generation is the last phase of javac compilation and is performed by the com.sun.tools.javac.jvm.Gen class in the javac source code. This phase not only converts the information produced by the earlier steps (the syntax tree and the symbol table) into bytecode and writes it to disk; the compiler also performs a small amount of code addition and transformation.

If the code provides no constructor at all, the compiler adds a no-argument default constructor whose accessibility (public, protected, or private) matches that of the current class; this is done in the symbol-table filling stage. Besides constructors, the compiler performs other code replacements to optimize the program's implementation logic, for example replacing string concatenation with append() calls on StringBuffer or StringBuilder (depending on whether the target version is JDK 1.5 or later).
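
Both additions can be made visible with a few lines of code. In this sketch (DefaultConstructorDemo, Plain, and the two concatenation helpers are hypothetical names) reflection shows the constructor that was never written in the source, and viaBuilder() spells out by hand the StringBuilder form described above. Note that newer javac versions may compile the + operator differently, so the hand-written form is only meant to show the equivalent result.

```java
class DefaultConstructorDemo {

    /** No constructor is written here; the compiler supplies a no-argument one. */
    static class Plain {
    }

    static int constructorCount() {
        return Plain.class.getDeclaredConstructors().length;
    }

    /** Concatenation written with the + operator. */
    static String viaOperator(String text, int number) {
        return text + number + "!";
    }

    /** The same concatenation written out with explicit append() calls. */
    static String viaBuilder(String text, int number) {
        return new StringBuilder().append(text).append(number).append("!").toString();
    }

    public static void main(String[] args) {
        System.out.println(constructorCount());         // prints 1
        System.out.println(viaOperator("answer: ", 42)); // prints answer: 42!
        System.out.println(viaBuilder("answer: ", 42));  // prints answer: 42!
    }
}
```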

10.3 A taste of syntactic sugar

10.3.1 Generics and type erasure

Generics in C# and Java look the same in use but differ fundamentally in implementation. In C#, generics really exist in the program source code, in the compiled IL (Intermediate Language, where a generic is a placeholder), and in the run-time CLR: List<int> and List<string> are two different types, generated at run time, each with its own virtual method table and type data. This implementation is called type inflation, and generics implemented this way are called true generics.

Generics in the Java language are different. They exist only in the program source code. In the compiled bytecode they have already been replaced by the original raw type (also called the bare type), with casts inserted where needed. At run time, therefore, ArrayList<Integer> and ArrayList<String> are the same class, so generics are in fact a piece of syntactic sugar in the Java language. Java's way of implementing generics is called type erasure, and generics implemented this way are called pseudo-generics.
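
That the two parameterizations share one run-time class is easy to confirm. A minimal check, with ErasureDemo and sameRuntimeClass() as names invented for this sketch:

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {

    /** Compares the run-time classes of two differently parameterized lists. */
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        // Both objects are plain java.util.ArrayList at run time.
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints true
    }
}
```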

Code List 10-2 is a simple Java generics example. Let's look at what it compiles to.

Code List 10-2 Example before generic erasure

public static void main(String[] args) {
    Map<String, String> map = new HashMap<String, String>();
    map.put("hello", "hello");
    map.put("how are you?", "Have you eaten?");
    System.out.println(map.get("hello"));
    System.out.println(map.get("how are you?"));
}

Compile this Java code into a class file and then decompile it with a bytecode decompiler, and you will find that the generics are all gone (when viewing with JD-GUI, the generics are still visible in declarations, while elsewhere they have become casts). The program has reverted to the way it was written before Java generics existed, with every generic type turned back into a raw type, as shown in Code List 10-3.

Code List 10-3 Example after generic erasure
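
The erased form of Code List 10-2 can be written out by hand as follows. This is a reconstruction based on the description above, not actual decompiler output; it is wrapped in a hypothetical ErasedMapDemo class with a lookup() helper so that it runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

class ErasedMapDemo {

    /** Same logic as Code List 10-2, written with raw types and explicit casts. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String lookup(String key) {
        Map map = new HashMap();            // Map<String, String> has become the raw type Map
        map.put("hello", "hello");
        map.put("how are you?", "Have you eaten?");
        return (String) map.get(key);       // get() returns Object, so a cast is inserted
    }

    public static void main(String[] args) {
        System.out.println(lookup("hello"));        // prints hello
        System.out.println(lookup("how are you?")); // prints Have you eaten?
    }
}
```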

Code List 10-4 When generics meet overloading (1)

import java.util.List;

public class GenericTypes {

    public static void method(List<String> list) {
        System.out.println("invoke method(List<String> list)");
    }

    public static void method(List<Integer> list) {
        System.out.println("invoke method(List<Integer> list)");
    }
}

Think about whether the code above is correct and whether it compiles and runs. You may already have the answer: it does not compile, because the parameters List<String> and List<Integer> are erased at compile time to the same raw type List, which makes the signatures of the two methods identical. At first glance the reason overloading fails has been found, but is that really the whole story? Erasure to the same raw type is only part of the reason overloading fails. Now look at Code List 10-5.

import java.util.ArrayList;
import java.util.List;

public class GenericTypes {

    public static String method(List<String> list) {
        System.out.println("invoke method(List<String> list)");
        return "";
    }

    public static int method(List<Integer> list) {
        System.out.println("invoke method(List<Integer> list)");
        return 1;
    }

    public static void main(String[] args) {
        method(new ArrayList<String>());
        method(new ArrayList<Integer>());
    }
}

Execution result:

invoke method(List<String> list)
invoke method(List<Integer> list)

The difference between Code List 10-5 and Code List 10-4 is that the two methods now have different return types. With the two different return types added, the overload succeeds: the code compiles and runs. (Note: test with the javac of Sun JDK 1.6; JDK 1.7 and 1.8 do not compile it, and other compilers, such as the ECJ compiler of Eclipse JDT, may also reject it.) Does this challenge the basic understanding that return values do not take part in overload selection in the Java language?

The overload in Code List 10-5 is certainly not resolved by return value. The code compiles and runs because, once the two method() methods have different return types, they can coexist in one class file. The data structure of the method table (method_info) in the class file was described earlier: method overloading requires methods to have different signatures, and the return value is not part of a method's signature, so the return value plays no part in overload selection. In the class file format, however, two methods can coexist as long as their descriptors are not exactly the same. In other words, two methods with the same name and the same signature but different return types can legally coexist in one class file.
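
The descriptors in question can be printed with the standard java.lang.invoke.MethodType API. The sketch below (DescriptorDemo and descriptor() are names chosen for this illustration) builds the descriptors of the two erased method() variants from Code List 10-5 and shows that they differ only in the return type at the end.

```java
import java.lang.invoke.MethodType;
import java.util.List;

class DescriptorDemo {

    /** Returns the class-file descriptor for a method with the given return and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // After erasure both parameters are the raw type List; only the return types differ.
        System.out.println(descriptor(String.class, List.class)); // prints (Ljava/util/List;)Ljava/lang/String;
        System.out.println(descriptor(int.class, List.class));    // prints (Ljava/util/List;)I
    }
}
```

The parameter part inside the parentheses is identical, which is why the two methods collide as Java-language signatures, while the full descriptors remain distinct at the class file level.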

10.3.2 Autoboxing, unboxing, and the for-each loop: omitted

10.3.3 Conditional compilation

I personally feel that C # is much more elegant than Java in terms of language.

Deep understanding of Java Virtual Machine-JVM advanced features and best practices (version 2nd) PDF download:
Http://download.csdn.net/detail/xunzaosiyecao/9648998

Jiankunking Source: http://blog.csdn.net/jiankunking
