[Java] System-level performance monitoring and optimization

Source: Internet
Author: User
Tags: apm, oracle, solaris

Readers who care about Java performance probably know the book *Java Performance*. Many of us seldom worry about performance while writing day-to-day Java code, yet performance should be taken into account throughout the writing process, from details as small as using bit operations in place of arithmetic, to decisions as large as the overall architecture of the code: "performance" is actually very close to us. This article touches on several such points and will hopefully offer some inspiration.



For performance tuning, we typically need to go through three steps: 1) performance monitoring; 2) performance profiling; 3) performance tuning.

As an APM vendor with a leading technical level in China, OneAPM's Ai product provides a very good set of indicators for Java application performance optimization:



Performance monitoring: multidimensional monitoring of the metrics that affect Java performance;

Performance profiling: analysis of application performance;

Performance tuning: optimizing the application by analyzing the root causes of its performance problems.

Our performance concerns at the operating-system level focus mainly on the following points: CPU utilization, the CPU scheduling (run) queue, memory utilization, network I/O, and disk I/O.

1. CPU Utilization

For an application to achieve its best performance and scalability, we must not only take advantage of the available CPU cycles but also make the use of those cycles valuable rather than wasteful. Keeping CPU cycles doing useful work on multi-processor and multi-core systems is challenging for multi-threaded applications. Moreover, a saturated CPU does not mean that the application's performance and scalability have reached their optimal state.

To understand how an application utilizes CPU resources, we must observe it at the operating-system level. On many operating systems, CPU utilization statistics are typically reported split into user time and system (kernel) time. User CPU time is the time spent executing application code; in contrast, system or kernel CPU time is the time the application spends executing operating-system kernel code. High kernel or system CPU usage can indicate contention on shared resources or a large amount of I/O device interaction. The ideal state for application performance and scalability is 0% kernel or system CPU time, since time spent executing kernel or system code could otherwise be spent executing application code. So one correct direction for CPU usage optimization is to minimize the time the CPU spends executing kernel or system code.
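The user/kernel split described above can also be observed from inside the JVM via the standard `ThreadMXBean`, which reports both total and user-mode CPU time for a thread; the difference approximates kernel/system time. A minimal sketch (class name and busy-work loop are illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeSplit {
    /** Burns some CPU, then returns {totalCpuNanos, userCpuNanos} for this thread. */
    static long[] measure() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            return new long[] {0L, 0L};
        }
        long acc = 0;                                     // user-mode busy work
        for (int i = 0; i < 20_000_000; i++) acc += i;
        if (acc == 42) System.out.println(acc);           // keep the loop from being optimized away
        long user  = threads.getCurrentThreadUserTime();  // user-mode only, in ns
        long total = threads.getCurrentThreadCpuTime();   // user + kernel, in ns (read after user)
        return new long[] {total, user};
    }

    public static void main(String[] args) {
        long[] t = measure();
        long kernelNanos = t[0] - t[1];                   // approximate kernel/system time
        System.out.printf("total=%d ns, user=%d ns, kernel=%d ns%n", t[0], t[1], kernelNanos);
    }
}
```

A compute-bound loop like this should show kernel time near zero; a high kernel share would point at I/O or lock contention instead.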

For compute-intensive applications, performance monitoring needs to go deeper than user and kernel/system CPU usage: we also need to monitor instructions per clock (IPC) or, equivalently, cycles per instruction (CPI). Monitoring the CPU from these two dimensions matters because the CPU performance tools bundled with modern operating systems typically print only CPU utilization, not the time the CPU spends actually executing instructions within its cycles. This means that when the CPU is waiting for data to arrive from memory, the operating system's tools still consider the CPU to be in use. We call this scenario a "stall", and it occurs frequently: whenever the data an instruction requires is not ready, that is, not in a register or in the CPU cache, a stall occurs.

when the "stall" scene occurs, the CPU wastes the clock cycle because the CPU must wait for the data required by the instruction to reach the register or buffer. And in this scenario, it's normal for the hundreds of CPU clock cycles to be wasted, so in compute-intensive applications, the strategy to improve performance is to reduce the occurrence of "stall" scenarios or to increase the CPU's cache usage so that fewer CPU cycles are wasted waiting for data. This type of performance monitoring knowledge is beyond the content of this book and requires the help of a performance expert. However, the profiling tool, described later in Oracle Solaris Studio performance Analyzer, will include this kind of data.

2. CPU Dispatch Queue

In addition to monitoring CPU usage, we can also check whether the system is fully loaded by monitoring the CPU run queue. The run queue holds lightweight processes that are ready to run but waiting for CPU scheduling; it builds up when there are more runnable lightweight processes than the processors can currently handle. A relatively deep run queue indicates that the system is fully loaded. As a baseline, the run queue depth should be no greater than the number of virtual processors, which equals the number of hardware threads in the system. We can obtain the number of virtual processors through the Java API Runtime.getRuntime().availableProcessors(). When the run queue depth reaches four times the number of virtual processors or more, the system will become unresponsive.

A general guideline for monitoring the CPU dispatch queue: take notice when the queue depth exceeds the number of virtual processors, but there is no need to take immediate action; when it reaches three or four times that number or higher, the problem must be addressed urgently.
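On platforms that expose a load average, a crude version of this check can be done from Java by comparing the one-minute load average with the virtual processor count. This is only a proxy for run queue depth, the 4x threshold below is the rule of thumb from the text, and the load average reads as -1 on some platforms (e.g. Windows):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class RunQueueCheck {
    /** Returns one-minute load average per virtual processor, or -1 if unavailable. */
    static double loadPerProcessor() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int vcpus = Runtime.getRuntime().availableProcessors(); // hardware threads
        double load = os.getSystemLoadAverage();                // 1-minute load average
        return load < 0 ? -1 : load / vcpus;
    }

    public static void main(String[] args) {
        double r = loadPerProcessor();
        if (r < 0) {
            System.out.println("load average not available on this platform");
        } else if (r >= 4.0) {
            System.out.println("saturated, act urgently: ratio " + r);
        } else if (r >= 1.0) {
            System.out.println("worth watching: ratio " + r);
        } else {
            System.out.println("ok: ratio " + r);
        }
    }
}
```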

There are usually two ways to deal with a deep run queue. The first is to add CPUs to share the load, or to reduce the load placed on the existing CPUs. This approach essentially reduces the number of threads per execution unit, and thus the depth of the run queue.

The other way is to reduce CPU usage by profiling the applications running on the system: for example, reducing the CPU cycles spent on garbage collection, or finding better algorithms that accomplish the same work with fewer CPU cycles. Performance experts typically focus on two approaches here: reducing the execution path length of the code and choosing better CPU instructions. Java programmers can improve execution efficiency through better algorithms and data structures.

3. Memory Utilization 

In fact, besides CPU usage, the memory attributes of the system also need to be monitored, such as paging, swapping, locking, multi-threaded context switching, and so on.

Swapping typically occurs when the application requires more memory than is physically available; for this the operating system usually configures a corresponding area called the swap space, typically located on a physical disk. When the application exhausts physical memory, the operating system temporarily swaps part of the in-memory data, usually the least recently accessed regions, out to disk. When a region that has been swapped out is accessed again by the application, it must be read back into memory from the swap space on disk, and this badly affects the application's performance.
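One way to watch for the onset of swapping from inside a Java process is the extended `com.sun.management.OperatingSystemMXBean`, which exposes physical-memory figures. These methods are JDK-specific and were deprecated in JDK 14 in favor of `getTotalMemorySize`/`getFreeMemorySize`; this is a sketch, not a substitute for OS tools such as vmstat or sar:

```java
import java.lang.management.ManagementFactory;

public class PhysicalMemoryCheck {
    /** Returns the fraction of physical memory free, or -1 if the extended bean is unavailable. */
    static double freePhysicalFraction() {
        java.lang.management.OperatingSystemMXBean base =
                ManagementFactory.getOperatingSystemMXBean();
        if (!(base instanceof com.sun.management.OperatingSystemMXBean)) return -1;
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) base;
        double total = os.getTotalPhysicalMemorySize(); // deprecated since JDK 14
        double free  = os.getFreePhysicalMemorySize();  // deprecated since JDK 14
        return total <= 0 ? -1 : free / total;
    }

    public static void main(String[] args) {
        double f = freePhysicalFraction();
        // When this drops toward zero, the OS may start swapping.
        System.out.printf("free physical memory fraction: %.3f%n", f);
    }
}
```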

A JVM's garbage collector performs poorly under swapping, because during a collection it touches a large number of memory regions, many of which have not been accessed recently by the application and may therefore have been swapped out. If part of the garbage-collected heap has been swapped to disk space, it must be paged back in so the collector can scan it, which dramatically lengthens garbage collection time. If the collector is of the "stop-the-world" kind (one that pauses the application), those pauses are lengthened as well.
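The collector's accumulated pause time can be tracked from the application itself through `GarbageCollectorMXBean`; a sudden jump in this delta while the OS simultaneously reports swap activity is the pathological combination described above. A small sketch (allocation sizes are arbitrary, and `System.gc()` is only a request, not a guarantee):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcTimeWatch {
    /** Returns total GC time in ms accumulated by all collectors so far. */
    static long totalGcMillis() {
        long ms = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            ms += Math.max(0, gc.getCollectionTime()); // -1 means "not available"
        }
        return ms;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        List<byte[]> junk = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {             // ~10 MB of short-lived garbage
            junk.add(new byte[1024]);
            if (junk.size() > 1000) junk.clear();
        }
        System.gc();                                    // request a collection
        long after = totalGcMillis();
        System.out.println("GC time delta: " + (after - before) + " ms");
    }
}
```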

4. Network I/O 

The performance and scalability of a distributed Java application can be limited by network bandwidth and network performance. For example, if we send more packets to a network interface than it can handle, the packets will accumulate in the operating system's buffers, causing application latency; other network conditions can cause delays as well.

Tools to isolate and monitor network utilization are often hard to find among the operating system's bundled tools. Linux provides the netstat command, and both Linux and Solaris report network usage statistics including packets per second, bytes sent and received, errors, collisions, and so on. On Ethernet, a small number of packet collisions is normal; a large number of errored packets, however, may indicate a problem with the NIC. At the same time, although netstat can count the data a network interface sends and receives, it is hard to tell from that whether the NIC is fully utilized. For example, if netstat -i shows 2,500 packets per second being sent from the NIC, we still cannot tell whether current network utilization is 1% or 100%; we only know the current packet rate. Without knowing the size of the packets, no conclusion can be drawn. Simply put, we cannot tell from the netstat provided by Linux and Solaris whether the network is affecting performance; we need other tools to monitor the network while our Java application runs.
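The packet-size point can be made concrete with a little arithmetic: the same 2,500 packets per second corresponds to wildly different utilization depending on packet size. A sketch assuming a 1 Gbit/s link:

```java
public class NicUtilization {
    /** Percent of link capacity used, given packets/sec, average packet bytes, and link bits/sec. */
    static double utilizationPercent(double packetsPerSec, double avgPacketBytes,
                                     double linkBitsPerSec) {
        double bitsPerSec = packetsPerSec * avgPacketBytes * 8;
        return 100.0 * bitsPerSec / linkBitsPerSec;
    }

    public static void main(String[] args) {
        double gig = 1_000_000_000.0;  // assumed 1 Gbit/s link
        // The same 2,500 packets/sec from netstat -i, at two packet sizes:
        System.out.printf("64-byte packets:   %.3f%%%n",
                utilizationPercent(2500, 64, gig));    // ~0.128%
        System.out.printf("1500-byte packets: %.3f%%%n",
                utilizationPercent(2500, 1500, gig));  // 3.0%
    }
}
```

Without the average packet size, the packet rate alone cannot distinguish a nearly idle link from a busy one, which is exactly the limitation of netstat described above.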

5. Disk I/O 

If the application performs disk operations, we need to monitor the disk for possible disk performance issues. Some applications are I/O-intensive, such as databases. Disk use also typically shows up in application logging systems, which record important information while the system runs.
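One disk cost that logging systems run into is forcing data to the physical device: a log that syncs every line pays far more per write than one that lets the OS buffer. A small sketch using a temporary file (line content and counts are arbitrary):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LogWriteCost {
    /** Writes `lines` log lines, optionally forcing each to disk; returns elapsed nanos. */
    static long writeLog(Path file, int lines, boolean syncEachLine) throws IOException {
        long t0 = System.nanoTime();
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            byte[] line = "2024-01-01T00:00:00 INFO something happened\n".getBytes();
            for (int i = 0; i < lines; i++) {
                out.write(line);
                if (syncEachLine) out.getFD().sync();  // force to the physical device
            }
        }
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("log", ".txt");
        long buffered = writeLog(f, 200, false);
        long synced   = writeLog(f, 200, true);
        System.out.printf("no sync: %d us, sync per line: %d us%n",
                buffered / 1000, synced / 1000);
        Files.deleteIfExists(f);
    }
}
```

The gap between the two timings is the per-write device latency that disk monitoring (e.g. iostat service times) would surface at the OS level.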

Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission.

