Java Virtual Machine (JVM) Performance Optimization, Part 2: The Compiler


This is the second article in the JVM performance tuning series, and the Java compiler is its focus.

In this article, the author (Eva Andreasson) first introduces the different kinds of compilers and compares the performance of client-side compilation, server-side compilation, and tiered compilation. At the end of the article, she introduces several common JVM optimizations, such as dead-code elimination, inlining, and loop optimization.

Java's proudest feature, platform independence, stems from the Java compiler. Software developers do their best to write good Java applications, and a compiler running behind the scenes produces efficient executable code for the target platform. Different compilers suit different application requirements and thus produce different optimization results. The better you understand how compilers work, and the more kinds of compilers you know, the better you can optimize your Java programs.

This article highlights and explains the differences between the various Java virtual machine compilers. It also explores some optimizations commonly applied by just-in-time (JIT) compilers.

What is a compiler?

In simple terms, a compiler takes a program in one programming language as input and outputs a program in a different, executable language. javac is the most familiar example; it ships with every JDK. javac takes Java source code as input and converts it into bytecode, the code the JVM can execute. The bytecode is stored in files ending in .class and is loaded into the Java runtime environment when the Java program starts.
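As a small illustration of that last point, every .class file begins with the magic number 0xCAFEBABE, which the JVM checks when loading bytecode. This sketch (the class name is invented here) reads those first four bytes from any compiled class file:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Sketch: read the first four bytes of a compiled .class file and show the
// JVM's magic number, 0xCAFEBABE. Pass the path to any .class file as args[0].
public class ClassFileMagic {
    public static int readMagic(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            return in.readInt();   // the first 4 bytes of every class file
        }
    }

    public static void main(String[] args) throws IOException {
        int magic = readMagic(args[0]);
        System.out.printf("magic = 0x%08X (expected 0xCAFEBABE)%n", magic);
    }
}
```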

Bytecode cannot be read directly by the CPU; it must first be translated into the machine instruction language that the current platform understands. Another compiler inside the JVM is responsible for translating bytecode into instructions the target platform can execute. Some JVM compilers pass the code through several stages; for example, a compiler may go through several different intermediate forms before translating bytecode into machine instructions.

From a platform-agnostic standpoint, we want our code to remain platform-independent for as long as possible.

To achieve this, the translation at the last level (from the lowest-level bytecode to real machine code) is what finally binds the executable code to a particular platform's architecture. At the highest level, we can divide compilers into static compilers and dynamic compilers. We can choose the right compiler based on the target execution environment, the optimization results we desire, and the resource constraints we need to meet. The previous article briefly discussed static and dynamic compilers; the following sections explain them in more detail.

Static compilation vs. dynamic compilation

The javac mentioned earlier is an example of static compilation. With a static compiler, the input code is translated once, and the output is the form in which the program will execute from then on. Unless you update the source code and recompile, the program's executable form never changes: the input is a static input, and the compiler is a static compiler.

Under static compilation, the following method:

static int add7(int x) {
    return x + 7;
}

is converted into bytecode similar to the following:

iload_0
bipush 7
iadd
ireturn

A dynamic compiler dynamically compiles one language into another; "dynamic" here means that it compiles while the program is running. The benefit of dynamic compilation and optimization is that the compiler can adapt to changes in application load. The Java runtime often runs in unpredictable, even changing environments, so dynamic compilation suits it well. Most JVMs use a dynamic compiler, namely the JIT compiler. Note that dynamic compilation and code optimization require additional data structures, threads, and CPU resources: the more advanced the optimizer or bytecode context analyzer, the more resources it consumes. These costs, however, are usually negligible relative to the resulting performance gains.
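A minimal sketch of "run while compiling" in practice: calling a small method in a tight loop makes it hot, so a JIT-enabled JVM compiles it at runtime. The class name here is invented; -XX:+PrintCompilation is a standard HotSpot flag that logs compilation events as they happen.

```java
// Run with:  java -XX:+PrintCompilation HotMethodDemo
// and watch the JIT compiler report compiling add7 mid-run.
public class HotMethodDemo {
    static int add7(int x) {
        return x + 7;
    }

    public static void main(String[] args) {
        long sum = 0;
        // Enough iterations to cross typical JIT invocation thresholds.
        for (int i = 0; i < 1_000_000; i++) {
            sum += add7(i);
        }
        System.out.println(sum);
    }
}
```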

JVM types and Java platform independence

A common feature of all JVM implementations is that they translate bytecode into machine instructions. Some JVMs interpret the code as the application loads and use performance counters to find "hot" code; others do this through compilation. The main drawback of compilation is that it requires significant resources, but it also enables better performance optimizations.

If you're new to Java, the intricacies of the JVM may make you dizzy. The good news is that you don't need to master them: the JVM manages the compilation and optimization of code, so you don't need to worry about machine instructions or how to write code that best matches the architecture of the platform the program runs on.

From Java bytecode to executable code

Once your Java code is compiled into bytecode, the next step is to translate the bytecode instructions into machine code. This can be done by an interpreter or by a compiler.

Interpretation

Interpretation is the simplest way to execute bytecode. The interpreter looks up the hardware instruction for each bytecode instruction in a lookup table and sends it to the CPU for execution.

You can think of the interpreter as a dictionary: each specific word (bytecode instruction) has a specific translation (machine code instruction). Because the interpreter executes each instruction as soon as it reads it, it cannot optimize across a group of instructions. And because every bytecode must be interpreted on the spot, the interpreter runs quite slowly. The interpreter executes code faithfully, but its output instruction stream is unoptimized and may not be optimal for the target platform's processor.

Compilation

The compiler loads all the code to be executed into the runtime, so when it translates bytecode it can refer to all or part of the runtime context. Its decisions are based on code-graph analysis, such as comparing different execution branches and consulting runtime context data.

After a bytecode sequence is translated into machine code, it can be optimized at the machine-instruction level. The optimized instructions are stored in a structure called the code cache. The next time those bytecodes are executed, the optimized code is fetched directly from the code cache and run. In some cases the compiler does not optimize eagerly but instead relies on performance counters, discussed below, to decide what to optimize later.

The advantage of the code cache is that the resulting instructions can be executed immediately, without being reinterpreted or recompiled. This greatly reduces execution time, especially for Java applications in which the same methods are invoked many times.

Optimization

Dynamic compilation gives us the opportunity to insert performance counters. For example, the compiler may insert a counter that is incremented each time a block of bytecode (corresponding to a specific method) is invoked. The compiler uses these counters to find "hot blocks" and decide which code blocks to optimize for the biggest application-wide gains. Runtime profiling data helps the compiler make more optimization decisions online, further improving code efficiency. As the profiling data grows more complete and more accurate, the compiler can find more optimization points and make better decisions: how to sequence instructions better, whether to replace an instruction sequence with a more efficient one, and whether to eliminate redundant operations.
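As an illustration only (a real JVM keeps these counters inside the VM, and the class name and threshold value here are invented), a method-invocation counter with a "hot" threshold might look like this sketch:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of invocation counting: once a method's call count crosses a
// threshold, it is flagged "hot" and would be queued for compilation.
public class InvocationProfiler {
    static final int HOT_THRESHOLD = 10_000;   // hypothetical threshold
    final Map<String, Integer> counters = new HashMap<>();

    // Increments the counter; returns true exactly once, when the
    // method first crosses the hot threshold.
    boolean recordCall(String methodName) {
        int count = counters.merge(methodName, 1, Integer::sum);
        return count == HOT_THRESHOLD;
    }

    public static void main(String[] args) {
        InvocationProfiler profiler = new InvocationProfiler();
        for (int i = 0; i < 20_000; i++) {
            if (profiler.recordCall("add7")) {
                System.out.println("add7 is hot; scheduling it for optimization");
            }
        }
    }
}
```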

For example, consider the following Java code:

static int add7(int x) {
    return x + 7;
}

javac statically translates it into the following bytecode:

iload_0
bipush 7
iadd
ireturn


When the method is invoked, the bytecode is dynamically compiled into machine instructions. When a performance counter (if present) reaches its threshold, the method may also be optimized. The optimized result might resemble the following machine instructions:

lea rax, [rdx + 7]
ret

Different compilers apply to different applications

Different applications have different requirements. Enterprise server-side applications typically run for a long time, so you usually want more performance tuning applied to them; client programs may prioritize fast response times and low resource consumption. Let's discuss three different compilers along with their pros and cons.

Client-side compilers

C1 is a well-known optimizing compiler. You enable it by passing the -client flag when starting the JVM. As its name suggests, C1 is a client-side compiler. It is designed for client applications that have few system resources available or need to start quickly. C1 uses performance counters for code profiling, enabling simple, relatively unintrusive optimizations.

Server-side compilers

For long-running applications such as server-side enterprise applications, the client compiler may not be enough. A server-side compiler like C2 is the better choice; it is enabled by adding -server to the JVM startup command line. Because most server-side applications run for a long time, the C2 compiler can collect more performance-profiling data than it could for short, lightweight client applications, and can therefore apply more advanced optimization techniques and algorithms.

Tip: Warm up your server-side compiler

For server-side deployments, the compiler may need some time to optimize the "hot" code, so server-side deployments often require a "warm-up" phase. When performing performance measurements on a server-side deployment, always make sure your application has reached a steady state! Giving the compiler enough time to compile pays off.
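The warm-up advice can be sketched as follows. This is only an illustration; the class name, workload, and iteration counts are arbitrary assumptions, not tuned values:

```java
// Sketch of a warm-up phase before measuring: run the workload enough
// times for the JIT to reach steady state, then time a single run.
public class WarmupBenchmark {
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        // Warm-up: give the compiler time to profile and optimize workload().
        for (int i = 0; i < 1_000; i++) workload();

        // Measure only after (presumed) steady state.
        long start = System.nanoTime();
        long result = workload();
        long elapsed = System.nanoTime() - start;
        System.out.println("result=" + result + " elapsedNs=" + elapsed);
    }
}
```

For real measurements, a harness such as JMH handles warm-up and measurement iterations much more rigorously than this sketch.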

A server-side compiler obtains more performance-tuning data than a client compiler, allowing more complex branch analysis to find better optimization paths. The more profiling data you have, the better the analysis results. Of course, extensive profiling also requires more compiler resources: a JVM using the C2 compiler needs more CPU cycles, a larger code cache, and so on.

Tiered compilation

Tiered compilation mixes client-side and server-side compilation. Azul was the first to implement tiered compilation, in its Zing JVM; more recently the technology has been adopted by the Oracle Java HotSpot JVM (as of Java SE 7). Tiered compilation combines the advantages of the client and server compilers. The client compiler is most active in two situations: at application startup, and when performance counters reach their lower-level thresholds and trigger the first optimizations. The client compiler also inserts performance counters and prepares instruction sets for the advanced optimizations the server-side compiler applies later. Tiered compilation is a resource-efficient way of profiling: because it collects data while compiler activity has low impact, that data can be used later for more advanced optimizations. This approach also yields more information than profiling counters on interpreted code alone.

Figure 1 compares the performance of pure interpretation, client-side compilation, server-side compilation, and tiered compilation. The X axis is execution time (time units) and the Y axis is performance (operations per unit of time).


Figure 1. Compiler Performance Comparisons

Relative to purely interpreted code, a client-side compiler can deliver a performance improvement of roughly 5 to 10 times. How much you gain depends on the compiler's efficiency, the kinds of optimizers available, and how well the application's design matches the target platform. That last factor is often overlooked by developers.

Relative to the client compiler, a server-side compiler often delivers a further 30% to 50% performance improvement. In most cases, that improvement comes at the price of greater resource consumption.

Tiered compilation combines the advantages of both compilers: client-side compilation gives a short startup time and quick initial optimization, while server-side compilation applies more advanced optimizations later in the run.

Some common compiler optimizations

So far, we've talked about what it means to optimize code and how and when the JVM does it. Next I'll close this article by introducing some of the optimizations compilers actually use. JVM optimization actually happens at the bytecode stage (or at a lower intermediate-representation stage), but Java source code is used here to illustrate the optimizations. This section cannot cover every JVM optimization; rather, I hope these introductions inspire you to explore the hundreds of more advanced optimizations and to innovate in compiler technology.

Dead-code elimination

Dead-code elimination is, as the name implies, the elimination of code that will never be executed: "dead" code.

If the compiler finds redundant instructions at runtime, it removes them from the instruction set to be executed. For example, in Listing 1 a variable is never used after a value is assigned to it, so the assignment can be ignored entirely at execution time. At the bytecode level, this means the variable's value never needs to be loaded into a register. Not loading it costs less CPU time, which speeds up the code and ultimately the application; if the loading code is invoked many times per second, the effect is even more pronounced.

Listing 1 shows Java code with a variable that is never used.


Listing 1. Dead Code

int timeToScaleMyApp(boolean endlessOfResources) {
    int rearchitect = 24;
    int patchByClustering = 15;
    int useZing = 2;
    if (endlessOfResources)
        return rearchitect + useZing;
    else
        return useZing;
}

At the bytecode level, if a variable is loaded but never used, the compiler can detect this and eliminate the dead code, as shown in Listing 2. Never performing the load saves CPU time and improves the program's execution speed.

Listing 2. Optimized code

int timeToScaleMyApp(boolean endlessOfResources) {
    int rearchitect = 24;
    // unnecessary operation removed here...
    int useZing = 2;
    if (endlessOfResources)
        return rearchitect + useZing;
    else
        return useZing;
}

Redundancy elimination is a similar optimization: it removes duplicate instructions to improve application performance.

Many optimizations try to eliminate machine-level jump instructions (such as JMP on the x86 architecture). A jump instruction changes the instruction pointer register, redirecting the program's execution flow; it is an expensive operation relative to other assembly instructions, which is why we want to reduce or eliminate jumps. Inlining is a very useful and well-known way to do this: because jumps are costly, inlining small, frequently called methods into the caller's body brings many benefits. Listings 3 through 5 demonstrate the benefits of inlining.

Listing 3. The calling method

int whenToEvaluateZing(int y) {
    return daysLeft(y) + daysLeft(0) + daysLeft(y + 1);
}

Listing 4. The called method

int daysLeft(int x) {
    if (x == 0)
        return 0;
    else
        return x - 1;
}

Listing 5. The inlined method

int whenToEvaluateZing(int y) {
    int temp = 0;
    if (y == 0)
        temp += 0;
    else
        temp += y - 1;
    if (0 == 0)
        temp += 0;
    else
        temp += 0 - 1;
    if (y + 1 == 0)
        temp += 0;
    else
        temp += (y + 1) - 1;
    return temp;
}

As Listings 3 through 5 show, the small method is called three times in the body of the calling method; the point is that the cost of inlining the called method directly into the caller is lower than the cost of executing three jump instructions.

Inlining a rarely invoked method may not make much difference, but inlining a "hot" (frequently called) method can bring large performance gains. The inlined code can often be optimized further, as shown in Listing 6.


Listing 6. After inlining, further optimization is possible

int whenToEvaluateZing(int y) {
    if (y == 0)
        return y;
    else if (y == -1)
        return y - 1;
    else
        return y + y - 1;
}
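A quick sanity check, using the methods from the listings above wrapped in a hypothetical harness class, confirms that the fully inlined and simplified version returns the same results as the original call chain for every input:

```java
public class InliningCheck {
    // The callee (Listing 4).
    static int daysLeft(int x) {
        if (x == 0) return 0;
        else return x - 1;
    }

    // The original caller (Listing 3).
    static int whenToEvaluateZing(int y) {
        return daysLeft(y) + daysLeft(0) + daysLeft(y + 1);
    }

    // The hand-inlined and simplified version (Listing 6).
    static int whenToEvaluateZingInlined(int y) {
        if (y == 0) return y;
        else if (y == -1) return y - 1;
        else return y + y - 1;
    }

    public static void main(String[] args) {
        for (int y = -5; y <= 5; y++) {
            if (whenToEvaluateZing(y) != whenToEvaluateZingInlined(y)) {
                throw new AssertionError("mismatch at y=" + y);
            }
        }
        System.out.println("inlined version matches the original");
    }
}
```

A real JIT compiler must prove this kind of equivalence before applying the transformation; here we merely spot-check it.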

Loop optimization

Loop optimization plays an important role in reducing the overhead of executing loop bodies. Overhead here means expensive jumps, many condition checks, and poor pipelining (that is, instruction sequences that do no useful work and consume extra CPU cycles). There are many kinds of loop optimization; the following are some of the most popular:

Loop fusion: when two adjacent loops run the same number of iterations, the compiler tries to merge their bodies. If the two bodies are completely independent of each other, they can even be executed simultaneously (in parallel).

Loop inversion: at its most basic, a while loop is replaced with a do-while loop wrapped in an if statement. This substitution removes two jumps, but it adds a condition check and thus increases code size. This optimization is a great example of trading a modest increase in resource use for more efficient code; the compiler weighs the costs and benefits and decides dynamically at runtime.

Loop tiling: reorganizes the loop so that the entire loop body fits in the cache.

Loop unrolling: reduces the number of condition checks and jumps. Think of it as "inlining" several iterations of the body so that they execute without a condition check between them. Unrolling carries risk, because it can degrade performance by hurting pipelining or triggering excess instruction fetches. Again, whether to unroll a loop is decided by the compiler at runtime: it is worth doing if it yields a real performance improvement.
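Loop inversion and loop unrolling can be sketched at the source level. This is only an illustration of the equivalent transformations (a real compiler applies them at the instruction level, and the class name and unroll factor here are arbitrary choices):

```java
public class LoopTransforms {
    // Original while loop: condition check and jump on every iteration.
    static int sumWhile(int n) {
        int sum = 0, i = 0;
        while (i < n) { sum += i; i++; }
        return sum;
    }

    // Loop inversion: one if guard up front, then a do-while.
    // This removes jumps per iteration at the cost of slightly more code.
    static int sumInverted(int n) {
        int sum = 0, i = 0;
        if (i < n) {
            do { sum += i; i++; } while (i < n);
        }
        return sum;
    }

    // Manual unrolling by a factor of 4: four additions per condition
    // check, plus a remainder loop for the leftover iterations.
    static int sumUnrolled(int n) {
        int sum = 0, i = 0;
        for (; i + 3 < n; i += 4) {
            sum += i + (i + 1) + (i + 2) + (i + 3);
        }
        for (; i < n; i++) sum += i;   // remainder iterations
        return sum;
    }

    public static void main(String[] args) {
        // All three compute 0 + 1 + ... + 9 = 45.
        System.out.println(sumWhile(10) + " " + sumInverted(10) + " " + sumUnrolled(10));
    }
}
```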

The above is an overview of how the compiler improves an application's performance on the target platform at the bytecode level (or lower). We covered some common, popular optimizations, with only simple examples for reasons of space. The aim of this brief discussion is to spark your interest in studying optimization further.

Conclusion: key points

Choose different compilers for different purposes.

1. An interpreter is the simplest mechanism for translating bytecode into machine instructions; it is based on an instruction lookup table.
2. Compilers can optimize based on performance counters, but this consumes additional resources (code cache, optimization threads, etc.).
3. A client-side compiler can deliver a 5 to 10 times performance improvement over the interpreter.
4. A server-side compiler can deliver a 30% to 50% improvement over the client-side compiler, but requires more resources.
5. Tiered compilation combines the advantages of both: client-side compilation for faster response times, then the server-side compiler to optimize frequently invoked code.

There are many possible ways to optimize code. An important job of the compiler is to analyze all the possibilities and weigh the cost of each optimization against the performance benefit of the machine instructions it produces.
