"Deep Java Virtual Machine" VII: JAVAC compilation and JIT compilation

When reprinting, please cite the source: http://blog.csdn.net/ns_code/article/details/18009455


Compilation Process

Whether on a physical machine or a virtual machine, most program code goes through the steps shown in the figure below on its way from compilation to the physical machine's target code or to an instruction set that a virtual machine can execute:


The modules shown in green are optional. The middle branch of the figure is the interpreted-execution path (executing one instruction at a time, as with JavaScript), while the lower branch is the path described in traditional compiler theory, from source code to generated target machine code.

Today most languages, whether they target physical machines or virtual machines, follow this approach from classical compiler theory: before execution, the source code goes through lexical analysis and syntax analysis and is converted into an abstract syntax tree. For a specific language implementation, lexical and syntax analysis, and even the later optimizer and target code generator, can be implemented independently of the execution engine as a complete compiler in the full sense; C/C++ is representative of this. Alternatively, the steps up to the abstract syntax tree or the instruction stream can be implemented as a semi-independent compiler; Java is representative of this. Or all of these steps can be implemented together with the execution engine, as in most JavaScript engines.


Javac Compilation

In Java, "compilation" most naturally refers to the Javac compiler turning *.java files into *.class files. Javac is a front-end compiler; another front-end compiler is ECJ, the incremental compiler in Eclipse JDT. There are also back-end compilers, which translate bytecode into machine code while the program is running (Java programs today generally combine interpreted and compiled execution at run time), such as the JIT (Just-In-Time) compilers that ship with the HotSpot virtual machine, which come in Client and Server variants. Occasionally you may also meet static ahead-of-time (AOT) compilers that compile *.java files directly into native machine code, such as GCJ and Excelsior JET, though these are encountered much less often.

The following is a brief description of the Javac compilation (front-end compilation) process.

Lexical and Syntax Analysis

Lexical analysis converts the character stream of the source code into a set of tokens. A single character is the smallest element when writing a program, while a token is the smallest element during compilation. Keywords, variable names, literals, and operators can all be tokens; for example, the keyword int consists of three characters, yet it is a single token and cannot be split further.
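To make the character-to-token step concrete, here is a minimal sketch of a toy lexer. It is not Javac's scanner: the class name ToyLexer and its simplified rules (identifiers, integer literals, and single-character operators only) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy lexer: a sketch of the idea, not javac's actual scanner.
class ToyLexer {
    // Turns a character stream into a list of token strings.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not a token itself
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i)); // keyword or identifier
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i)); // integer literal
            } else {
                tokens.add(String.valueOf(c)); // single-character operator or separator
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The three characters 'i', 'n', 't' come out as the single token "int".
        System.out.println(tokenize("int count = count + 42;"));
        // prints [int, count, =, count, +, 42, ;]
    }
}
```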

Syntax analysis (parsing) builds an abstract syntax tree from the token sequence. An abstract syntax tree is a tree representation of the syntactic structure of program code; each node represents one syntactic construct in the code, such as a package, type, modifier, or operator. After this step the compiler essentially no longer works on the source file; all subsequent operations are based on the abstract syntax tree.
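As a rough illustration of building a tree from a token sequence, the sketch below parses a tiny expression grammar by recursive descent. The grammar, the class name ToyParser, and the Node type are all assumptions for this example; Javac's real parser and tree classes are far richer.

```java
import java.util.List;

// Toy recursive-descent parser: a sketch, not javac's parser.
class ToyParser {
    // AST node: a leaf (name or literal) or a binary operation with two children.
    static final class Node {
        final String value;
        final Node left;
        final Node right;

        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return left == null ? value : "(" + value + " " + left + " " + right + ")";
        }
    }

    private final List<String> tokens;
    private int pos = 0;

    ToyParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // expr := term ('+' term)*
    Node parseExpr() {
        Node node = parseTerm();
        while (pos < tokens.size() && tokens.get(pos).equals("+")) {
            pos++; // consume '+'
            node = new Node("+", node, parseTerm());
        }
        return node;
    }

    // term := leaf ('*' leaf)*
    Node parseTerm() {
        Node node = new Node(tokens.get(pos++), null, null);
        while (pos < tokens.size() && tokens.get(pos).equals("*")) {
            pos++; // consume '*'
            node = new Node("*", node, new Node(tokens.get(pos++), null, null));
        }
        return node;
    }

    public static void main(String[] args) {
        Node ast = new ToyParser(List.of("a", "+", "b", "*", "2")).parseExpr();
        System.out.println(ast); // prints (+ a (* b 2))
    }
}
```

The printed form shows the tree shape: the multiplication is a child of the addition, so later phases can work on structure instead of raw text.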

Fill Symbol Table

Once lexical and syntax analysis are complete, the next step is filling the symbol table. A symbol table is a table made up of symbol addresses and symbol information. The information registered in it is used at several stages of compilation: during semantic analysis (a later step), its contents are used for semantic checks and for producing intermediate code; during target code generation, it is the basis for address allocation when addresses are assigned to symbol names.

Semantic Analysis

The syntax tree is an abstract representation of a structurally correct source program, but it cannot guarantee that the program is semantically sound. The main task of semantic analysis is to check the context-dependent properties of a structurally correct source program. Semantic analysis is divided into two steps, attribute checking and data and control flow analysis:

    • Attribute checking examines, among other things, whether a variable has been declared before it is used and whether the types of a variable and the value assigned to it match.
    • Data and control flow analysis is a further check of the program's contextual logic. It can check whether local variables are assigned before use, whether every path through a method returns a value, whether all checked exceptions are handled correctly, and so on.
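The two kinds of check can be observed from the outside with the standard javax.tools API. The sketch below feeds three deliberately broken snippets to the system compiler and collects the errors. The class name SemanticCheckDemo and the sample snippets are assumptions for illustration, and the exact message wording depends on the JDK version and locale, so none is shown here.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class SemanticCheckDemo {
    // Wraps a source string so the compiler can read it without a file on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String code) {
            super(URI.create("string:///Demo.java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiles one source string and returns the error messages the compiler reports.
    // Every snippet used below fails to compile, so no class files are written.
    static List<String> errors(String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK");
        }
        DiagnosticCollector<JavaFileObject> collector = new DiagnosticCollector<>();
        compiler.getTask(null, null, collector, null, null,
                List.of(new StringSource(code))).call();
        List<String> messages = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : collector.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                messages.add(d.getMessage(null)); // null selects the default locale
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        // Type check: the assigned value does not match the variable's type.
        System.out.println(errors("class A { void f() { int x = \"text\"; } }"));
        // Data flow: a local variable is read before it is assigned.
        System.out.println(errors("class B { int f() { int x; return x; } }"));
        // Control flow: one path through the method returns no value.
        System.out.println(errors("class C { int f(boolean c) { if (c) return 1; } }"));
    }
}
```

All three snippets are syntactically valid, so they pass parsing; they are rejected only by the later semantic checks.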
Bytecode Generation

Bytecode generation is the last phase of the Javac compilation process. This phase does not merely convert the information produced by the earlier steps into bytecode and write it to disk; the compiler also performs a small amount of code addition and transformation. The instance constructor <init>() method and the class constructor <clinit>() method are added to the syntax tree at this stage. (The instance constructor here does not mean the default constructor. If the user code provides no constructor at all, the compiler automatically adds a no-argument default constructor whose access level matches that of the current class, but that work is already done in the symbol table filling phase.)
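One observable effect of this step is the order in which initialization code runs. The sketch below (class name InitOrderDemo is an assumption) records that order: the compiler gathers static initializers into <clinit>(), and places field initializers and instance initializer blocks into each <init>() that starts with a (possibly implicit) super() call, ahead of the constructor body. Running javap -c on the compiled class is one way to inspect the generated methods.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    // Static initializer: ends up in <clinit>(), run once when the class is initialized.
    static {
        LOG.add("static initializer");
    }

    // Field initializer and instance initializer block: both end up in <init>(),
    // after the implicit super() call and before the constructor body.
    int value = record("field initializer");

    {
        LOG.add("instance initializer block");
    }

    InitOrderDemo() {
        LOG.add("constructor body");
    }

    static int record(String what) {
        LOG.add(what);
        return 1;
    }

    public static void main(String[] args) {
        new InitOrderDemo();
        System.out.println(LOG);
        // prints [static initializer, field initializer, instance initializer block, constructor body]
    }
}
```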


JIT Compilation

Java programs were originally executed purely by an interpreter, one bytecode instruction at a time. This is relatively slow, and especially inefficient when a method or block of code runs very frequently. JIT (just-in-time) compilers were therefore introduced into virtual machines: when the virtual machine finds that a method or block of code runs particularly often, it identifies that code as "hot spot code". To make hot code run more efficiently, the virtual machine compiles it at run time into machine code for the local platform and applies various levels of optimization; the JIT compiler is the component that does this work.

Today almost all mainstream commercial virtual machines (such as Sun HotSpot and IBM J9) contain both an interpreter and compilers. (JRockit, one of the three major commercial virtual machines, is an exception: it has no interpreter, so it has drawbacks such as longer startup time, but it mainly targets server-side applications, which typically do not focus on startup time.) Each has its advantages: when a program needs to start and run quickly, the interpreter can work first, skipping compilation time and executing immediately; as the program keeps running, the compiler gradually comes into play, and more and more code is compiled into native code for higher execution efficiency. Interpreted execution saves memory, while compiled execution improves efficiency.

The HotSpot virtual machine has two built-in JIT compilers, the Client Compiler and the Server Compiler (often called C1 and C2), intended for client-side and server-side use respectively. In the current mainstream HotSpot virtual machine, the default is for the interpreter to work together with one of these compilers.
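A running program can ask which JIT compiler its virtual machine is using through the standard java.lang.management API. The following is a small sketch (the class name JitInfoDemo is an assumption); the reported name and timings depend entirely on the JVM and its options, so no sample output is given.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitInfoDemo {
    // Describes the JIT compiler of the running JVM, or reports that none is available.
    static String describeJit() {
        // Documented to return null when the JVM has no compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler reported";
        }
        String info = jit.getName();
        if (jit.isCompilationTimeMonitoringSupported()) {
            info += ", total compilation time so far: " + jit.getTotalCompilationTime() + " ms";
        }
        return info;
    }

    public static void main(String[] args) {
        System.out.println("VM:  " + System.getProperty("java.vm.name"));
        System.out.println("JIT: " + describeJit());
    }
}
```

On HotSpot, starting the program with -Xint (interpreter only) or -Xcomp (compile as early as possible) is a simple way to feel the trade-off between the two execution modes described above.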

Two kinds of "hot spot code" are compiled by the JIT compiler at run time:

    • Methods that are called many times.
    • Loop bodies that are executed many times.

In both cases the compiler takes the entire method as the unit of compilation, which is also the standard compilation mode in the virtual machine. To decide whether a method or a piece of code is hot, and therefore whether JIT compilation should be triggered, the virtual machine performs hot spot detection. There are currently two main approaches:

    • Sampling-based hot spot detection: a virtual machine using this approach periodically checks the top of each thread's stack; if a method is found at the top of the stack frequently, it is treated as hot spot code. The advantage is that it is simple and efficient, and method call relationships are easy to obtain. The disadvantage is that it is hard to measure a method's hotness precisely, and the results are easily distorted by thread blocking or other external factors.
    • Counter-based hot spot detection: a virtual machine using this approach sets up a counter for each method (or even each code block) and counts how often it executes; once the count exceeds a certain threshold, the method is considered hot. This approach is more complex to implement, since a counter must be created and maintained for every method, and it cannot obtain method call relationships directly, but its results are more precise and rigorous.

The HotSpot virtual machine uses the second approach, counter-based hot spot detection, so it prepares two counters for each method: the method call counter and the back edge counter.

The method call counter counts how many times a method is called. Under the default settings it does not record the absolute number of calls but a relative execution frequency, that is, the number of calls within a period of time.

The back edge counter counts how many times loop body code in a method is executed (more precisely, the number of back edges taken, since not every loop is a back edge). A bytecode instruction that makes control flow jump backward is called a "back edge".
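The two counters can be modelled in a few lines. This is a hypothetical simulation, not HotSpot code: the class name HotnessCounters, the threshold value, the rule that the sum of both counters is compared against one threshold, and the halving step used to model "relative frequency" are all assumptions chosen to keep the sketch small.

```java
// Hypothetical model of counter-based hot spot detection; names and values
// are illustrative assumptions, not HotSpot internals.
class HotnessCounters {
    static final int COMPILE_THRESHOLD = 10_000; // assumed value for this sketch

    int invocationCount; // bumped on every method entry
    int backEdgeCount;   // bumped on every backward jump, i.e. each loop iteration

    void onMethodEntry() {
        invocationCount++;
    }

    void onBackEdge() {
        backEdgeCount++;
    }

    // Assumed decay step: halving the call count now and then makes it track
    // recent call frequency instead of the absolute number of calls.
    void decay() {
        invocationCount /= 2;
    }

    boolean isHot() {
        return invocationCount + backEdgeCount > COMPILE_THRESHOLD;
    }

    // Simulates calling a method whose body loops a fixed number of times,
    // and returns the number of the call during which the method becomes hot.
    static int callsUntilHot(int loopIterationsPerCall) {
        HotnessCounters counters = new HotnessCounters();
        int calls = 0;
        while (!counters.isHot()) {
            calls++;
            counters.onMethodEntry();
            for (int i = 0; i < loopIterationsPerCall && !counters.isHot(); i++) {
                counters.onBackEdge();
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(callsUntilHot(5_000)); // prints 2
        System.out.println(callsUntilHot(0));     // prints 10001
    }
}
```

The first result shows why the back edge counter matters: a method with a long loop becomes hot after only two calls, long before its call count alone would reach the threshold.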

Once a counter exceeds its threshold, JIT compilation is triggered. After that, the execution engine does not wait synchronously for the compilation request to complete; it goes back to the interpreter and continues executing the bytecode in interpreted mode until the submitted request has been compiled (the compilation work is done on a background thread). Once compilation has finished, the compiled version is used the next time the method or code block is invoked.
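The "keep interpreting while a background thread compiles" behaviour can be sketched with two lambdas standing in for the interpreted and compiled versions of a method. Everything here (the class name BackgroundCompileDemo, the use of CompletableFuture, the lambdas themselves) is an assumption for illustration and says nothing about how HotSpot is implemented internally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.IntUnaryOperator;

// Hypothetical model: "interpreted" and "compiled" are just two lambdas
// that compute the same result.
class BackgroundCompileDemo {
    private final IntUnaryOperator interpreted = x -> x * 2; // stands in for the interpreter
    private volatile IntUnaryOperator compiled;              // null until compilation finishes
    private CompletableFuture<Void> pending;                 // the in-flight compile request

    // Submits a compile request to a background thread without blocking the caller.
    synchronized CompletableFuture<Void> requestCompile() {
        if (pending == null) {
            pending = CompletableFuture.runAsync(() -> {
                compiled = x -> x << 1; // the "compiled" version becomes visible here
            });
        }
        return pending;
    }

    // Uses the compiled version if it is ready, otherwise keeps interpreting.
    int invoke(int arg) {
        IntUnaryOperator fast = compiled;
        return (fast != null ? fast : interpreted).applyAsInt(arg);
    }

    boolean isCompiled() {
        return compiled != null;
    }

    public static void main(String[] args) {
        BackgroundCompileDemo method = new BackgroundCompileDemo();
        System.out.println(method.invoke(21)); // interpreted path, prints 42
        // The demo waits here only so that the switch is visible; a real caller
        // would simply keep calling invoke().
        method.requestCompile().join();
        System.out.println(method.invoke(21) + " compiled=" + method.isCompiled());
        // prints 42 compiled=true
    }
}
```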

Because the process by which the method call counter triggers JIT compilation is similar to the process for the back edge counter, only the flow for the method call counter is given here:


Taken together, the work of the Javac bytecode compiler and the JIT compiler inside the virtual machine is in effect equivalent to the compilation process carried out by a traditional compiler.
