Repost: What is just-in-time (JIT) compilation? An analysis of the OpenJDK HotSpot VM

Source: Internet
Author: User

Key Points

  • Applications can choose an appropriate just-in-time (JIT) compiler to achieve near-native optimized performance.
  • Tiered compilation consists of five tiers of compilation.
  • Tiered compilation delivers excellent startup performance and guides the next tier of compilation toward high-performance optimization.
  • JVM switches provide diagnostic information about JIT compilation.
  • Optimizations such as inlining and vectorization further enhance performance.

The OpenJDK HotSpot Java Virtual Machine is commonly referred to simply as the Java Virtual Machine, or JVM. It consists of two main components: the execution engine and the runtime. The JVM and the Java APIs together form the Java Runtime Environment, also known as the JRE.

In this article we discuss the execution engine, especially just-in-time (JIT) compilation, and the runtime optimizations of the OpenJDK HotSpot VM.

JVM execution engine and Runtime

The execution engine consists of two main components: the garbage collector (which reclaims garbage objects and provides automatic memory, or heap, management) and the JIT compiler (which converts bytecode into executable machine code). In OpenJDK 8, the "tiered compiler" is the default server compiler. HotSpot users can still select the non-tiered server compiler (also known as "C2") by disabling tiered compilation (-XX:-TieredCompilation). We will learn more about these compilers below.
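
As a minimal sketch (not part of the original article), the following Java program asks the running JVM whether tiered compilation is enabled. It assumes a HotSpot-based JVM, because it relies on com.sun.management.HotSpotDiagnosticMXBean; the class name TieredFlagCheck and the helper vmOption are hypothetical names chosen for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class TieredFlagCheck {
    // Returns the current value of a HotSpot -XX option, or "unavailable"
    // when the JVM does not expose the diagnostic bean or the option.
    static String vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the option does not exist in this JVM.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("TieredCompilation = " + vmOption("TieredCompilation"));
    }
}
```

Running it once with default settings and once with -XX:-TieredCompilation should show the reported value change.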

The JVM runtime controls class loading, bytecode verification, and other important functions. One of these functions is interpretation, which we will discuss in depth shortly.


  • The server (C2) compiler has a high compile threshold of 10,000 invocations, which helps it generate highly optimized code for performance-critical methods. Whether a method is performance-critical is determined by whether it lies on the application's critical execution path.
  • Five tiers of tiered compilation

    With the introduction of tiered compilation, OpenJDK HotSpot VM users can get improved startup times while using the server compiler. Tiered compilation has five tiers. It starts at tier 0, the interpreter tier, where instrumentation provides information about performance-critical methods. Soon enough the code reaches tier 1, where the simple C1 (client) compiler optimizes it; no profiling information is gathered at tier 1. Next comes tier 2, where only a few methods are compiled (again by the client compiler), and profiling information is gathered for those few methods from entry counters and loop back-branches. At tier 3, all methods are compiled by the client compiler with full profiling information. Finally, tier 4 is reserved for C2, the server compiler.

    Tiered compilation and its effect on the code cache

    When compiling with the client compiler (tier 2 and below), the code is optimized by the client compiler during startup while the critical execution path is kept warm. This helps produce better profiling information than interpreted code. The compiled code is stored in a cache called the "code cache". The code cache has a fixed size; if it becomes full, the JVM stops compiling methods.

    Tiered compilation has its own set of thresholds for every tier, for example -XX:Tier3MinInvocationThreshold, -XX:Tier3CompileThreshold, and -XX:Tier3BackEdgeThreshold. The minimum invocation threshold at tier 3 is 100, compared with a threshold of 1,500 for non-tiered C1. As a result, compilation happens much more frequently under tiered compilation, and far more profiling information is generated for client-compiled methods. The code cache for tiered compilation must therefore be much larger than for non-tiered compilation: in OpenJDK the default code cache size is 240 MB with tiered compilation and 48 MB without it.

    If the code cache is full, the JVM issues a warning that encourages you to increase its size with the -XX:ReservedCodeCacheSize option.
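
To see how much of the code cache is in use, a program can query the JVM's memory pools. The sketch below is an illustration added to this article, not taken from it. It assumes that the code cache pools have "Code" in their names (HotSpot uses names such as "Code Cache" or "CodeHeap '...'", depending on the JDK version); the class name CodeCacheUsage and the method codePools are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class CodeCacheUsage {
    // Collects one summary line per memory pool whose name mentions "Code".
    // Matching on the pool name is an assumption about HotSpot naming.
    static List<String> codePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null || !pool.getName().contains("Code")) {
                continue;
            }
            long max = usage.getMax(); // -1 when the maximum is undefined
            String maxText = max < 0 ? "undefined" : (max / 1024) + " KB";
            lines.add(pool.getName() + ": used=" + (usage.getUsed() / 1024)
                    + " KB, max=" + maxText);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : codePools()) {
            System.out.println(line);
        }
    }
}
```

Comparing the output of a run with default settings against a run with a different -XX:ReservedCodeCacheSize value should show the reported maximum change.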

    Understanding compilation

    OpenJDK HotSpot VM provides a very useful command-line option, -XX:+PrintCompilation, which logs each compilation and also reports when the code cache becomes full and when compilation stops.

    Example:

    567  693 % !   3       org.h2.command.dml.Insert::insertRows @ 76 (513 bytes)
    656  797  n    0       java.lang.Object::clone (native)
    779  835  s    4       java.lang.StringBuffer::append (13 bytes)

    The output format above is:

    timestamp compilation-id flags tiered-compilation-level class:method <@ osr_bci> code-size <deoptimization>

    Here,

    timestamp is the time elapsed since the JVM started.

    compilation-id is the internal reference ID of the compilation.

    flags can be one or more of the following:

    %: is_osr_method (an OSR method; the @ sign indicates the bytecode index for OSR methods)

    s: is_synchronized (the method is synchronized)

    !: has_exception_handler (the method has an exception handler)

    b: is_blocking (the compilation is blocking)

    n: is_native (the method is native)

    tiered-compilation-level indicates the compilation tier when tiered compilation is enabled.

    method gives the class name and the method name in the format shown above.

    @ osr_bci is the bytecode index at which the OSR happened.

    code-size is the total size of the bytecode.

    deoptimization indicates whether the method was deoptimized, shown as "made not entrant" or "made zombie" (for details, see the discussion of dynamic deoptimization below).

    Using this key, we can decode the first line of the example:

    567  693 % !  3  org.h2.command.dml.Insert::insertRows @ 76 (513 bytes)

    The timestamp is 567 and the compilation-id is 693. The "!" flag shows that the method has an exception handler. We can also tell that the compilation is at tier 3, that this is an OSR method (identified by "%"), and that the bytecode index is 76. The total bytecode size is 513 bytes. Note that 513 bytes is the size of the bytecode, not the size of the compiled code.

    The second line of the example is:

    656  797  n 0 java.lang.Object::clone (native)

    Here the "n" flag shows that the JVM is facilitating a call to a native method. The third line is:

    779  835  s 4 java.lang.StringBuffer::append (13 bytes)

    This shows that the method is compiled at tier 4 and is synchronized.
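
The field layout described above can be turned into a small parser. The following is a sketch added for illustration, assuming the log lines look exactly like the examples in this article; the class name CompilationLogLine and its fields are hypothetical, and real output may contain variations that this code does not handle.

```java
import java.util.Arrays;

class CompilationLogLine {
    final long timestamp;
    final int compileId;
    final String flags;     // concatenation of the flag characters found: % s ! b n
    final int tier;         // -1 if the line has no tier column
    final String method;    // class::method
    final int osrBci;       // -1 unless the line contains "@ <bci>"
    final int bytecodeSize; // -1 for native methods
    final String note;      // trailing text such as "made not entrant", else ""

    private CompilationLogLine(long timestamp, int compileId, String flags, int tier,
                               String method, int osrBci, int bytecodeSize, String note) {
        this.timestamp = timestamp;
        this.compileId = compileId;
        this.flags = flags;
        this.tier = tier;
        this.method = method;
        this.osrBci = osrBci;
        this.bytecodeSize = bytecodeSize;
        this.note = note;
    }

    // Parses one PrintCompilation line; malformed input throws a runtime exception.
    static CompilationLogLine parse(String line) {
        String[] t = line.trim().split("\\s+");
        int i = 0;
        long timestamp = Long.parseLong(t[i++]);
        int id = Integer.parseInt(t[i++]);

        // Flags appear as separate tokens made up only of flag characters.
        StringBuilder flags = new StringBuilder();
        while (t[i].matches("[%s!bn]+")) {
            flags.append(t[i++]);
        }
        int tier = -1;
        if (t[i].matches("\\d")) {
            tier = Integer.parseInt(t[i++]);
        }
        String method = t[i++];

        int bci = -1;
        if (i + 1 < t.length && t[i].equals("@")) {
            bci = Integer.parseInt(t[i + 1]);
            i += 2;
        }
        int size = -1;
        if (i < t.length && t[i].equals("(native)")) {
            i++;
        } else if (i + 1 < t.length && t[i].startsWith("(") && t[i + 1].equals("bytes)")) {
            size = Integer.parseInt(t[i].substring(1));
            i += 2;
        }
        // Whatever remains is the deoptimization note, if any.
        String note = String.join(" ", Arrays.copyOfRange(t, i, t.length));
        return new CompilationLogLine(timestamp, id, flags.toString(), tier,
                method, bci, size, note);
    }

    public static void main(String[] args) {
        CompilationLogLine l = parse(
                "567  693 % !   3       org.h2.command.dml.Insert::insertRows @ 76 (513 bytes)");
        System.out.println(l.method + " tier=" + l.tier + " osrBci=" + l.osrBci
                + " flags=" + l.flags);
    }
}
```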

    Dynamic deoptimization

    Java performs dynamic class loading, and the JVM checks inter-class dependencies at every dynamic class load. When a previously optimized method is no longer valid, OpenJDK HotSpot VM performs dynamic deoptimization of that method. Adaptive optimization aids dynamic deoptimization: the deoptimized code reverts to a previous compilation tier or moves to a new one, as the following example shows. (Note: this output is produced when PrintCompilation is enabled on the command line.)

     573  704 2 org.h2.table.Table::fireAfterRow (17 bytes)
    7963 2223 4 org.h2.table.Table::fireAfterRow (17 bytes)
    7964  704 2 org.h2.table.Table::fireAfterRow (17 bytes) made not entrant
    33547 704 2 org.h2.table.Table::fireAfterRow (17 bytes) made zombie

    The output shows that at timestamp 7963, fireAfterRow is compiled at tier 4. Right after that, at timestamp 7964, the earlier tier 2 compilation of fireAfterRow is made not entrant. After a while it is made zombie, that is, the previous code is reclaimed.
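
One common trigger for deoptimization is loading a class that invalidates an assumption the compiler made earlier. The sketch below is an illustration added here, not an example from the original article; all names in it are hypothetical. It warms up a call site while only one implementation of an interface is loaded, then introduces a second one. Whether "made not entrant" actually appears for totalArea under -XX:+PrintCompilation depends on the JVM version and its heuristics.

```java
class DeoptDemo {
    interface Shape {
        double area();
    }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    // Hot method: while only Square has been loaded, the JIT may assume that
    // Shape.area() has a single implementation at this call site.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(1), new Square(2), new Square(3) };
        double sink = 0;
        for (int i = 0; i < 200_000; i++) {
            sink += totalArea(shapes);
        }
        // First use of Circle loads the class, which can invalidate the
        // single-implementation assumption made during the warm-up phase.
        shapes[0] = new Circle(1);
        for (int i = 0; i < 200_000; i++) {
            sink += totalArea(shapes);
        }
        System.out.println(sink);
    }
}
```

Running it with -XX:+PrintCompilation and searching the output for DeoptDemo::totalArea is one way to look for the state changes described above.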

    Understanding inlining

    One of the biggest advantages of adaptive optimization is the ability to inline performance-critical methods. Replacing the call with the actual method body avoids the overhead of invoking these critical methods. Inlining has many tuning options based on size and invocation thresholds, and it has been so thoroughly investigated and optimized that it has very nearly reached its maximum potential.

    If you want to look at inlining decisions, you can use a JVM diagnostic option called -XX:+PrintInlining. PrintInlining is a great help in understanding those decisions, as the following example shows:

    @ 76 java.util.zip.Inflater::setInput (74 bytes) too big
    @ 80 java.io.BufferedInputStream::getBufIfOpen (21 bytes) inline (hot)
    @ 91 java.lang.System::arraycopy (0 bytes)   (intrinsic)
    @ 2  java.lang.ClassLoader::checkName (43 bytes) callee is too large

    Here you can see where inlining takes place and the total number of bytes inlined. Sometimes you see tags such as "too big" or "callee is too large", which indicate that the method was not inlined because it exceeded a threshold. The third line of the output carries the tag "intrinsic"; we take a closer look at intrinsics in the next section.
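
To make the idea concrete, the sketch below (added for illustration, with hypothetical names) shows a small method called from a hot loop next to a hand-written version of what the loop conceptually looks like after the call is inlined. The JIT compiler performs this transformation on compiled code, not on the source; the second method exists only for comparison.

```java
class InlineDemo {
    // Small method: a typical inlining candidate once its caller is hot.
    static int square(int x) {
        return x * x;
    }

    // Calls square() on every iteration.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    // Conceptual result of inlining: the call is replaced by the method body.
    static long sumOfSquaresInlined(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0;
        // Repeat enough times for the JIT compiler to consider the loop hot.
        for (int round = 0; round < 20_000; round++) {
            sink += sumOfSquares(1_000);
        }
        System.out.println(sink);
    }
}
```

On HotSpot, PrintInlining is a diagnostic option, so the program is typically run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining to see whether InlineDemo::square is reported as inlined.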

    Intrinsics

    Usually the OpenJDK HotSpot VM JIT compiler executes generated code for performance-critical methods, but some methods follow a very common pattern, such as java.lang.System::arraycopy in the PrintInlining output of the previous section. Such methods can be hand-optimized to deliver better performance. The optimized code is similar to having your own native method, but without the call overhead. These intrinsics can be inlined efficiently, just as the JVM inlines regular methods.
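
From the programmer's point of view, an intrinsic is used like any other method. The sketch below, added for illustration with hypothetical names, copies an array once with a plain loop and once with System.arraycopy; both produce the same result, and it is the latter that showed up as "(intrinsic)" in the PrintInlining output above.

```java
import java.util.Arrays;

class ArrayCopyDemo {
    // Element-by-element copy written as an ordinary Java loop.
    static int[] copyWithLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // Same copy through System.arraycopy, which HotSpot can treat as an intrinsic.
    static int[] copyWithArraycopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] data = { 3, 1, 4, 1, 5, 9, 2, 6 };
        System.out.println(Arrays.equals(copyWithLoop(data), copyWithArraycopy(data)));
    }
}
```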

    Vectorization

    Whenever intrinsics come up, I like to highlight a common compiler optimization: vectorization. Vectorization is applied wherever the underlying platform (processor) can handle special parallel-computation or vector instructions, such as "SIMD" instructions (single instruction, multiple data). SIMD and vectorization help with data-level parallelism by operating on data as large as a cache line (64 bytes).

    HotSpot VM provides vectorization support at two different levels: stubs for inner loops, and superword-level parallelism (SLP) for automatic vectorization.

    In the first case, stubs can provide vector support for inner loops, and the inner loop can be optimized and replaced by vector instructions. This is similar to intrinsics.

    In the second case, SLP support in HotSpot VM is based on a paper from an MIT lab. Currently, HotSpot VM only optimizes a destination array with fixed unrolled constants, as in the following example provided by Vladimir Kozlov, a senior member of the Oracle compiler team who has contributed to various compiler optimizations, including automatic vectorization support:

    a[j] = b + c * z[i]

    The code above can be automatically vectorized after the loop has been unrolled.
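
Written out as a complete Java loop, the statement above might look like the following sketch. This is an assumption-laden illustration: the original shows only the loop body, so the use of a single index for both arrays, the float element type, and the names VectorLoopDemo and scaleAndShift are choices made here. Whether the JIT compiler actually emits vector instructions for it depends on the JVM version and the processor.

```java
class VectorLoopDemo {
    // Computes a[i] = b + c * z[i] for every element of a.
    // A counted loop with unit stride over arrays is the kind of shape that
    // automatic vectorization targets. Requires z.length >= a.length.
    static void scaleAndShift(float[] a, float b, float c, float[] z) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b + c * z[i];
        }
    }

    public static void main(String[] args) {
        float[] z = new float[1024];
        for (int i = 0; i < z.length; i++) {
            z[i] = i;
        }
        float[] a = new float[z.length];
        // Repeat so that the loop becomes hot enough to be compiled.
        for (int round = 0; round < 50_000; round++) {
            scaleAndShift(a, 1.0f, 2.0f, z);
        }
        System.out.println(a[10]);
    }
}
```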

    Escape Analysis

    Escape analysis is another benefit of adaptive optimization. To determine whether an allocation "escapes", escape analysis (EA) considers the entire intermediate representation graph. That is, it checks whether the allocated object stays local to the compiled method rather than, for example, being stored in a static field or in a field of an external object, being returned from the method, or being passed to another method where it escapes.

    If the allocated object does not escape the compiled method and is not passed as a parameter, the allocation can be removed and the field values can be kept in registers. If the allocated object does not escape the compiled method but is passed as a parameter, the JVM can still remove the locks associated with the object, and it can use optimized compare instructions when the object is compared with other objects.
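
The difference between the two situations can be sketched in Java as follows. This example is an illustration added here, with hypothetical names; it shows the source-level patterns only, and whether the JIT compiler really removes the allocation depends on the JVM and on the constructor being inlined.

```java
class EscapeDemo {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // The Point is not stored in a field, not returned, and not passed to
    // another method, so it does not escape; the allocation is a candidate
    // for removal, with x and y kept in registers.
    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // Here the Point escapes because it is returned to the caller.
    static Point escaping(int x, int y) {
        return new Point(x, y);
    }

    public static void main(String[] args) {
        long sink = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sink += distanceSquared(i % 100, 3);
        }
        System.out.println(sink + " " + escaping(1, 2).x);
    }
}
```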

    Other common Optimizations

    Some other OpenJDK HotSpot VM optimizations made possible by the adaptive JIT compiler:

    1. Loop unrolling: unrolling the loop to reduce the number of iterations. This also enables the JVM to apply other common optimizations (such as loop vectorization) wherever needed.
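
As a hand-written illustration of what unrolling means (the JIT compiler does this on compiled code, and its unroll factor is its own choice), the sketch below sums an array with a plain loop and with a loop unrolled by four. The class and method names are hypothetical.

```java
class UnrollDemo {
    // Plain loop: one element and one loop test per iteration.
    static long sum(int[] data) {
        long total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    // Unrolled by four: a quarter of the iterations, each handling four
    // elements, followed by a short loop for the leftover elements.
    static long sumUnrolledBy4(int[] data) {
        long total = 0;
        int limit = data.length - (data.length % 4);
        int i = 0;
        for (; i < limit; i += 4) {
            total += (long) data[i] + data[i + 1] + data[i + 2] + data[i + 3];
        }
        for (; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        System.out.println(sum(data) + " " + sumUnrolledBy4(data));
    }
}
```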

    Reference: http://www.infoq.com/cn/articles/OpenJDK-HotSpot-What-the-JIT

     
