Introduction
This article is the second in the Java performance analysis tool series; the first covered operating system tools. In this article, you will learn how to monitor Java applications and the JVM itself using the monitoring tools built into the JDK, including:
- jcmd: prints class, thread, and virtual machine information for a Java process; suitable for use in scripts. Run jcmd -h to see how to use it.
- jconsole: provides a graphical view of JVM activity, including thread usage, class usage, and garbage collection (GC) information.
- jhat: helps analyze heap dumps.
- jmap: provides JVM memory usage information; suitable for use in scripts.
- jinfo: reads JVM system properties and flags, and can modify some of them dynamically.
- jstack: dumps the thread stacks of a Java process.
- jstat: provides Java garbage collection and class loading statistics.
- jvisualvm: a GUI tool for monitoring the JVM, profiling running applications, and analyzing JVM heap dumps.
These tools are described in detail below, grouped by function.
VM Basic Information
The JDK tools provide basic information about a running JVM process, such as how long it has been running, the JVM flags in use, and the JVM system properties.
- Uptime
How long the JVM has been running: jcmd process_id VM.uptime
- System properties
The properties available from System.getProperties() can also be obtained with the following commands:
jcmd process_id VM.system_properties or jinfo -sysprops process_id
These include all properties set with the -D command-line option, properties the application added dynamically, and the JVM's default properties.
- JVM version
Obtained with jcmd process_id VM.version.
- JVM command line
The JVM command line is shown in the VM Summary tab of jconsole, or via jcmd process_id VM.command_line.
- JVM tuning flags
The tuning flags in effect can be obtained with jcmd process_id VM.flags [-all].
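Several of the queries above also have in-process equivalents in the standard java.lang.management API, which is useful when an application wants to report on itself rather than be inspected externally. A minimal sketch (the class name VmBasics is illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

public class VmBasics {
    public static void main(String[] args) {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        // Uptime in milliseconds, the in-process view of "jcmd <pid> VM.uptime"
        System.out.println("uptime ms: " + runtime.getUptime());
        // JVM name and version, comparable to "jcmd <pid> VM.version"
        System.out.println("vm: " + runtime.getVmName() + " " + runtime.getVmVersion());
        // Flags passed on the command line, a subset of "jcmd <pid> VM.flags"
        System.out.println("args: " + runtime.getInputArguments());
        // System properties, as printed by "jcmd <pid> VM.system_properties"
        System.out.println("java.version = " + System.getProperty("java.version"));
    }
}
```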
Working with tuning flags
Because there are so many tuning flags, both of the jcmd commands above are useful: the VM.command_line command shows the flags specified on the command line, while the VM.flags command shows those plus the flags the JVM itself set.
jcmd reports the flags in effect in a running JVM. The following command shows the flags that would be in effect on a given platform:
java other_options -XX:+PrintFlagsFinal -version
We need to include the other options in this command, especially any GC-related flags, because they can change other defaults. Part of the output is shown below; the ":=" in the first entry indicates that the flag is not using its default value but was set in one of three ways:
- set on the command line
- changed indirectly by another option
- calculated ergonomically by the JVM as a default for the platform
The second entry uses "=" rather than ":=", which means the flag has the default value for this JVM version. "product" in the last column means the flag's default value is the same on all platforms; "pd product" means the default is platform-dependent.
uintx InitialHeapSize := 4169431040 {product}
 intx InlineSmallCode  = ...        {pd product}
Two other values can appear in the last column:
- manageable: the flag's value can be changed dynamically at run time.
- C2 diagnostic: the flag exists to help engineers understand how the compiler is operating.
The jinfo command can show the value of an individual flag:
jinfo -flag PrintGCDetails process_id
- For a manageable flag, jinfo can also turn it on or off at run time:
jinfo -flag -PrintGCDetails process_id # turn off PrintGCDetails
Although jinfo can change the value of any flag, the JVM will not necessarily honor the change. For example, many flags that affect the behavior of a garbage collection algorithm are read once at JVM startup, so changing them with jinfo while the JVM is running has no effect on the algorithm. In short, this command reliably works only on flags marked manageable.
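The same run-time changes jinfo makes externally can be made in-process through the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean. A sketch, assuming a HotSpot JVM where HeapDumpOnOutOfMemoryError is a manageable flag (the class name is illustrative):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class ManageableFlags {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Read a flag, as "jinfo -flag HeapDumpOnOutOfMemoryError <pid>" would
        VMOption option = hotspot.getVMOption("HeapDumpOnOutOfMemoryError");
        System.out.println(option.getName() + " = " + option.getValue()
                + " (writable: " + option.isWriteable() + ")");
        // Flip it at run time; this only succeeds for manageable flags
        hotspot.setVMOption("HeapDumpOnOutOfMemoryError", "true");
        System.out.println("now: "
                + hotspot.getVMOption("HeapDumpOnOutOfMemoryError").getValue());
    }
}
```

setVMOption throws IllegalArgumentException for flags that are not writable, which mirrors the limitation described above.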
Thread Information
The jconsole and jvisualvm tools help developers inspect the threads of a running application. The jstack process_id command dumps the run-time stack of every thread, which can be used to determine whether threads are currently blocked. The same output is available via jcmd process_id Thread.print.
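A thread dump much like jstack's output can also be produced from inside the process with the standard ThreadMXBean; a minimal sketch (the class name ThreadDump is illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDump {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Dump every live thread's state and stack, roughly what jstack prints
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            System.out.println('"' + info.getThreadName() + '"'
                    + " state=" + info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```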
Class information
The jconsole and jstat commands report on the classes loaded while the application runs; jstat also provides information about class compilation.
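The class-loading and JIT-compilation counters that jstat reports are also exposed in-process through the platform MXBeans; a minimal sketch (the class name ClassStats is illustrative):

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class ClassStats {
    public static void main(String[] args) {
        // Counters comparable to "jstat -class <pid>"
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        System.out.println("currently loaded: " + classes.getLoadedClassCount());
        System.out.println("total loaded:     " + classes.getTotalLoadedClassCount());
        System.out.println("unloaded:         " + classes.getUnloadedClassCount());
        // JIT activity, comparable to "jstat -compiler <pid>"
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println("JIT time ms:      " + jit.getTotalCompilationTime());
        }
    }
}
```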
Garbage Collection Information
jconsole shows how the JVM heap is being used, and its live graphs help developers understand what is happening inside the heap. jcmd can trigger garbage collection operations, jmap provides a summary of the heap, and jstat shows how garbage collection is behaving from several different perspectives.
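The per-collector counts and times that jconsole and jstat display come from the garbage collector MXBeans, which an application can also read itself; a minimal sketch (the class name GcStats is illustrative):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class GcStats {
    public static void main(String[] args) {
        // Heap usage summary, comparable to what jmap/jconsole show
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println("heap: " + memory.getHeapMemoryUsage());
        // One MXBean per collector (names vary by GC algorithm in use)
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
    }
}
```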
Post-processing of Heap Dump files
A heap dump can be captured through the jvisualvm user interface, or with jcmd or jmap. The heap dump is a snapshot of the heap; jvisualvm and jhat are typically used to analyze the snapshot.
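The same snapshot can be captured from inside the process via the HotSpot-specific diagnostic MXBean; a minimal sketch (the class name HeapDumper and the temp-file path are illustrative, and this assumes a HotSpot JVM):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap refuses to overwrite, so pick a path that does not exist yet
        Path file = Files.createTempDirectory("dumps").resolve("app.hprof");
        // true = dump only live objects (those reachable from GC roots)
        hotspot.dumpHeap(file.toString(), true);
        System.out.println("wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

The resulting .hprof file can then be opened in jvisualvm or jhat as described above.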
Performance analysis Tools
The performance profilers available for Java are the most important analysis tools. There are many of them, each with different strengths, and different profilers may reveal different problems in the same application. Use them judiciously, and in combination, to analyze an application thoroughly.
Almost all Java performance profilers are themselves written in Java and communicate with the application under analysis over a socket to collect its runtime data. Bear in mind that while you tune the application with a profiling tool, the profiler's own performance matters too: if the application generates a very large amount of data and the profiler does not have a large enough, well-managed heap to process it, the analysis will fail. Many profilers use a parallel garbage collection algorithm for their own memory management, which minimizes the chance of running out of memory.
Performance profiling comes in two modes, sampling and instrumentation, described below.
Sampling analysis
Sampling mode is the most common form of profiling because it minimizes the impact on the application under analysis, and only when that impact is minimal can you obtain trustworthy performance results.
In sampling mode, the profiler is triggered by a timer. On each cycle it examines each thread in turn and records the method the thread is currently executing. In some scenarios, however, sampling produces misleading results. For example, in Figure 1 a thread alternates between method A and method B over a period of time, but every time the profiler fires, the thread happens to be executing method B. The profiler concludes the thread spent all of its time in method B, when in fact the thread spent far more time in method A, which was never sampled.
Figure 1. Example of methods A and B executing alternately over a time interval
This is the most common error in sampling mode. Increasing the profiler's sampling interval helps reduce such errors, whereas an interval that is too small increases the profiler's impact on the application and distorts the results. The interval therefore has to be chosen to suit the application being analyzed, through experimentation and experience, weighing the consequences of setting it too large or too small.
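The sampling mechanism described above can be sketched in a few lines: a loop that periodically captures another thread's stack and tallies the top frame. The class name, the 10 ms interval, and the busy-work method are all illustrative; a real sampling profiler is far more sophisticated, but makes exactly this kind of between-ticks guess:

```java
import java.util.HashMap;
import java.util.Map;

public class MiniSampler {
    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(MiniSampler::busyWork, "worker");
        worker.setDaemon(true);
        worker.start();

        // Sample the worker's stack every 10 ms for ~1 second and count
        // which method is on top; whatever runs between ticks is invisible.
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            StackTraceElement[] stack = worker.getStackTrace();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            Thread.sleep(10);
        }
        hits.forEach((method, count) -> System.out.println(method + ": " + count));
    }

    static void busyWork() {
        long x = 0;
        while (true) {
            x += System.nanoTime() % 7;  // keep the thread runnable forever
        }
    }
}
```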
Figure 2. Sampling-mode analysis example
Figure 2 shows the result of profiling the startup of the GlassFish application server in sampling mode. The method defineClass1() accounts for 19% of the time, followed by getPackageSourcesInternal() at 10%. Class loading dominates application startup, so to make startup faster you must make class loading faster. It is tempting to conclude that the way to improve performance is to optimize defineClass1(), but defineClass1() is a JDK-internal method, and we cannot improve its performance without rewriting the JDK. Even if it could be rewritten to run in 60% of its original time, that would cut the application's overall run time by well under 10%, which is clearly not worth the effort.
Instrumented analysis
Compared with sampling, instrumentation mode intrudes into the application under analysis; it is less efficient and less friendly, but the information it gathers is far more valuable. Figure 3 shows the result of analyzing the same GlassFish server startup with the same profiling tool in instrumentation mode.
Figure 3. Instrumented analysis example
The following information can be read from the figure:
- The most time-consuming method is getPackageSourcesInternal(), at 13% rather than the 10% reported in sampling mode;
- The method defineClass1() does not appear in the results at all;
- The results include the number of times each method was executed and its average execution time.
This information is very helpful for finding the truly expensive code. In this example, although ImmutableMap.get() consumes 12% of the time, it is called 4.7 million times. If the number of calls to this method were reduced, the application's performance would improve substantially.
An instrumenting profiler collects its data by altering the bytecode of classes as they are loaded, for example by inserting code that counts method invocations. This affects the application's performance far more than sampling does. For example, the JVM decides whether to inline a method based on the size of its body, so small methods are inlined and incur no call overhead; once the instrumenting profiler injects its bookkeeping code, a method body may become too large for the JVM to inline. Inlining is just one example: the more code the profiler changes, the more likely the results are to be distorted.
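What the injected bytecode effectively does can be sketched by hand: wrap a method so every call bumps a counter and accumulates elapsed time. The class and method names are illustrative, and note that the bookkeeping itself adds work and grows the method body, which is exactly how instrumentation can defeat inlining:

```java
import java.util.concurrent.atomic.AtomicLong;

public class Instrumented {
    static final AtomicLong calls = new AtomicLong();
    static final AtomicLong nanos = new AtomicLong();

    // A hand-written stand-in for what an instrumenting profiler injects:
    // count the invocation and accumulate its elapsed time.
    static int work(int n) {
        long start = System.nanoTime();
        try {
            int sum = 0;
            for (int i = 0; i < n; i++) sum += i;
            return sum;
        } finally {
            calls.incrementAndGet();
            nanos.addAndGet(System.nanoTime() - start);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) work(10_000);
        System.out.println("invocations: " + calls.get());
        System.out.println("avg ns/call: " + nanos.get() / calls.get());
    }
}
```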
The reason ImmutableMap.get() does not appear in the sampling-mode results is safepoint (SafePoint) bias. A sampling profiler can record a thread's stack only when the thread reaches a safepoint, which typically happens when it allocates memory. ImmutableMap.get() may never bring its thread to a safepoint, so it never shows up in the samples. Sampling mode therefore tends to underestimate the performance impact of such methods.
In this case, both sampling and instrumented analysis identify class loading and parsing as the application's performance bottleneck. In practice, though, different profilers rarely produce exactly the same results. Profilers are good at estimating, but they only estimate; errors, and even outright mistakes, are unavoidable, so the profiling process requires us to use these tools flexibly.
Blocked methods and thread timelines
Figure 4 shows the result of analyzing the GlassFish server startup with the NetBeans profiler (another instrumenting profiler). In this result, the methods park(), parkNanos(), and read() account for most of the application's run time. These methods are blocked and consume no CPU, so their time should not be counted toward the application's CPU usage. The application's threads did not spend 632 seconds executing parkNanos(); they waited 632 seconds for other operations to complete. The park() and read() methods are similar.
Figure 4. NetBeans instrumented analysis example
For this reason, most profilers exclude blocked methods and idle threads from their results. NetBeans can be configured to include all methods, which is why they are counted here. In this example, the threads executing park() belong to the server's thread pool: they process requests when the server receives them, and when there are no requests they block, consuming no CPU, until a new request arrives. This is the normal state of an application server.
The vast majority of Java-based profilers provide filters to show or hide the time spent in blocked method calls, which can be used as needed. In general, examining a timeline of what each thread is doing is more helpful than examining how long a blocked method was blocked.
Figure 5. Thread timeline example in Oracle Solaris Studio
Figure 5 shows thread execution in Oracle Solaris Studio. Each horizontal band represents a different thread, so there are two threads here (1.3 and 1.2). Blocks of different colors represent different methods executing; blank space means the thread was not executing anything. At a high level, thread 1.2 executes a piece of code and then waits for thread 1.3 to finish, after which thread 1.3 waits for thread 1.2 to execute another piece of code. Drilling down reveals how these threads interact.
There are also gaps where neither thread is executing. That is because the diagram shows only two of the application's threads; in those gaps, the two threads shown are waiting for other threads to finish.
Native profilers
A native profiler is a tool that can analyze the JVM itself. It lets you observe what the JVM is doing, see into any native libraries the application uses, and inspect the application's own code. Any native profiler can profile the JVM's native implementation (including all its native libraries), but some native profilers cannot profile applications that mix Java and C++.
Figure 6. Native profiler analysis example
Figure 6 shows the result of profiling the GlassFish startup with the Oracle Solaris Studio profiler, a native profiler that can profile both Java and C++ code. It shows that the application consumed 25.1 seconds of CPU time, of which JVM-System accounts for 20 seconds, covering the JVM's compiler threads, garbage collection threads, and some worker threads. Because a great deal of code is compiled during startup, the compiler threads consume most of that time, while the garbage collection threads consume only a little.
With a native profiler we can not only analyze and optimize the JVM's own behavior; more importantly, we can measure the time the application spends in garbage collection. Java-based analysis tools cannot provide information about the garbage collection threads.
After analyzing the JVM's native code, we turn to the application's startup code. In Figure 7, as in the sampling-mode analysis, defineClass1() again emerges as the most time-consuming method. Notably, these results also show that extracting JAR files is a relatively expensive operation. All of these methods are part of class loading, which confirms that the optimization direction is correct. Because the native code behind Java's zip library appears as a blocking call in other analysis tools, none of the tools above could surface it.
Figure 7. Sampling-mode analysis example
Whichever performance analysis tools you use, it is important to know each tool's strengths and weaknesses so they can complement one another. Developers must learn to use profilers to find performance bottlenecks and the code that needs optimizing, rather than simply focusing on the single most time-consuming method.
Summary
Sampling-based profiling is the most common approach. Its analytical power is relatively limited, and the information it gathers is a summary that may not truly represent what is happening inside the application, but the overhead it introduces is usually low. Different sampling tools behave differently; the most productive approach is to understand each tool's strengths and use it for targeted analysis.
Instrumented analysis yields a great deal of information about the application's internals, but the preparation cost is often large. Instrumentation should be applied to a small section of code, or to a few classes or packages. This limits its usefulness for whole-application analysis; it suits targeted, point-to-point analysis of individual program units, and using it effectively requires developers to already know roughly where the potential bottleneck lies.
Thread blocking is not necessarily caused by the code itself. When a thread is blocked, think first about why it is blocked rather than going straight to the code, and try analyzing thread execution timelines.
Native profiling makes it possible to drill into the JVM itself while also observing application code execution.
If native profiling shows a large amount of CPU being consumed by GC, tuning the collector is necessary. Note that the compilation threads do not usually affect application performance.