Java Performance Analysis Tools, Part 1: Operating System Tools

Source: Internet
Author: User

Introduction

Performance analysis depends on making the running state of an application, and of the environment it runs in, directly visible. To get that visibility, we use the monitoring tools built into the operating system together with the monitoring and analysis tools that ship with Java. This article belongs to a series in which three articles cover these tools in turn. This one describes the performance monitoring tools in the operating system.

The performance monitoring tools in the operating system apply not only to Java programs but to any running program. UNIX-based operating systems offer many command-line tools for monitoring running programs, such as sar, vmstat, iostat, and prstat. Windows offers both the graphical Performance Monitor (Perfmon) and the typeperf command-line tool.

For performance testing, we use the tools provided by the operating system to collect resource monitoring data, including CPU, memory, and hard disk usage. If the program under test uses the network, network usage data must be collected as well. The more complete the collected data, the more accurate the performance test results and the easier the performance analysis. The following sections describe how to monitor and analyze resources on UNIX and UNIX-like systems.

Linux system resource monitoring

CPU usage

CPU time falls into two categories: user time and system time (system time is called privileged time in Windows). User time is the share of time the CPU spends executing application code, and system time is the share it spends executing operating system kernel code. System time is still driven by the application itself. For example, when an application performs an I/O operation, the kernel executes the code that reads the file from the hard disk or writes data to the network buffer. Any application behavior that uses the underlying resources of the operating system causes the application to consume more system time.

The goal of performance tuning is to drive CPU usage as high as possible for as short a time as possible. CPU usage is an average over a specific interval, which may be 30 seconds or 1 second. Suppose a program takes 10 minutes to run with a CPU usage of 50%. If the code is optimized so that CPU usage rises to 100%, performance doubles and the program finishes in 5 minutes. If the code is optimized again to use two CPUs, both at 100%, the program finishes in only 2.5 minutes. The example shows that CPU usage reflects how efficiently the program uses the CPU: for a fixed amount of work, the higher the CPU usage, the better the program performs, and vice versa.
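The arithmetic in this example can be written down as a short Java sketch. It assumes an idealized model in which the total CPU work is constant and spreads perfectly across CPUs. The class and method names are hypothetical and chosen only for illustration.

```java
final class RunTimeEstimate {
    /**
     * Scales a measured run time to a new CPU utilization and CPU count.
     * Assumption: the total CPU work is constant and parallelizes perfectly.
     */
    static double estimateMinutes(double baselineMinutes, double baselineUtil,
                                  double newUtil, int cpus) {
        // CPU-minutes of real work done in the baseline run (10 * 0.5 = 5).
        double cpuMinutesOfWork = baselineMinutes * baselineUtil;
        return cpuMinutesOfWork / (newUtil * cpus);
    }

    public static void main(String[] args) {
        System.out.println(estimateMinutes(10, 0.50, 1.00, 1)); // one CPU at 100%
        System.out.println(estimateMinutes(10, 0.50, 1.00, 2)); // two CPUs at 100%
    }
}
```

Real programs rarely scale this cleanly, so the result is best read as a lower bound on run time.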

Running the vmstat 5 command on Linux produces the data in Listing 1 (one new line every five seconds). To keep the example simple, the program here runs in a single thread, but the same reasoning applies to multi-threaded programs. The first line of data shows that the CPU was busy for 2.25 seconds of the five-second interval (5 × (37% + 8%)): 37% of the time was spent executing user code and 8% executing system code. For the remaining 2.75 seconds the CPU was idle.

Listing 1. vmstat 5 command results
 procs -------------memory------------- ---swap-- ----io----- --system--- --------cpu--------
  r  b    swpd     free    buff   cache   si   so    bi    bo    in    cs  us  sy  id  wa  st
  2  0  236456  2259632  200052  730348    0    0     1     6     1     1  37   8  55   0   0
  2  0  236456  2259624  200052  730348    0    0     0    10   179   332  40   7  53   0   0
  2  0  236456  2259624  200052  730348    0    0     0    20   180   356  56   7  37   0   0
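The busy-time calculation for the first row can be reproduced in Java. The sketch below is a minimal illustration that assumes the 17-column layout shown in Listing 1 (us, sy, and id in the 13th to 15th columns). Other vmstat versions may order the columns differently, and the VmstatRow class is hypothetical.

```java
final class VmstatRow {
    final int user;
    final int system;
    final int idle;

    VmstatRow(int user, int system, int idle) {
        this.user = user;
        this.system = system;
        this.idle = idle;
    }

    /** Parses one data row, assuming the column order of Listing 1. */
    static VmstatRow parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 15) {
            throw new IllegalArgumentException("unexpected vmstat row: " + line);
        }
        // Zero-based fields 12, 13, 14 are us, sy, id in this layout.
        return new VmstatRow(Integer.parseInt(f[12]),
                             Integer.parseInt(f[13]),
                             Integer.parseInt(f[14]));
    }

    /** Seconds of the sampling interval during which the CPU was busy. */
    double busySeconds(double intervalSeconds) {
        return intervalSeconds * (user + system) / 100.0;
    }

    public static void main(String[] args) {
        VmstatRow row = parse("2 0 236456 2259632 200052 730348 0 0 1 6 1 1 37 8 55 0 0");
        System.out.println(row.busySeconds(5)); // 5 * (37% + 8%)
    }
}
```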

The CPU is idle for one of the following three reasons:

  1. The application is blocked on a thread synchronization operation until the lock is released.

  2. The application is waiting for the response to a request, such as a database query.

  3. The application has nothing to do.

The first two cases are easy to understand and have corresponding tuning methods. For cause 1, reducing lock contention lets the application run faster. For cause 2, tuning the responder, for example so that the database returns the requested data more quickly, shortens the wait. In both cases, with everything else unchanged, the application runs faster and CPU usage increases.

The third case is where confusion often arises. As a general rule, when an application has work to do, the CPU should spend its cycles executing the application's code. For example, a script containing an infinite loop (Listing 2) consumes up to 100% of a CPU while it runs. If CPU usage does not reach 100%, the operating system had loop iterations it could have executed but left the CPU idle instead. That matters little for an infinite loop, but if the program were computing the result of an expression, the idle time would make the computation take longer.

Listing 2. Infinite loop example
 #!/bin/bash
 while true
 do
     echo "In the loop…"
 done

When you run the code in Listing 2 on a single-core machine, most of the time you will not notice that it is running. However, if you start another program, or measure the performance of another program, the effect becomes visible. The operating system is good at time-slicing programs that compete for CPU cycles, but the newly started program gets only a small share of the available cycles. One tempting solution is to keep a certain percentage of CPU cycles idle in reserve in case another program needs them. The flaw in this idea is that the operating system cannot know what will run next, so it simply executes all the work it currently has rather than holding CPU cycles idle.

Java and single CPU usage

Let's go back to Java applications. What does periodic CPU idle time mean for them? It depends on the type of application. For a batch program with a fixed workload, the CPU should have no idle time until all the jobs are finished, and raising CPU usage makes the batch finish sooner. If CPU usage is already 100%, we can still look for other optimizations that make the work finish faster while keeping the CPU at 100%.

For server-type applications that receive requests, the CPU is idle when no request is arriving. For example, when a Web server has finished processing all current HTTP requests and is waiting for the next one, the CPU is idle. This is why CPU usage is defined as an average over a period of time. The vmstat data above was collected from a running application server that receives one request every five seconds and takes 2.25 seconds to process it. CPU usage is therefore 100% for 2.25 seconds and 0% for the remaining 2.75 seconds, which averages out to 45%.

This pattern usually plays out over very short intervals, so it is hard to observe directly, but programs such as application servers always run this way. If we shorten the interval, the application server above receives a request every 2.5 seconds and takes 1.125 seconds to process it, leaving the CPU idle for the remaining 1.375 seconds. The average CPU usage is still 45%, with the CPU idle 55% of the time.

After the application server is optimized so that each request takes only 2 seconds to process, CPU usage drops to 40%. This is the one case in which lowering CPU usage is the goal of optimizing code: the load arriving per unit time is fixed and the application is not constrained by external resources. On the other hand, such an optimization leaves room to add more load, which raises CPU usage again. At a finer scale the optimization still follows the rule above: drive CPU usage as high as possible for as short a time as possible.
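The following Java sketch illustrates why a reported CPU usage is only an average. It splits a sampling interval into one-second windows, assuming the request is processed in one uninterrupted burst at the start of the interval. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class UtilizationTimeline {
    /** Per-second CPU utilization (percent) when the CPU is busy for the first busySeconds. */
    static double[] perSecond(double busySeconds, int intervalSeconds) {
        double[] util = new double[intervalSeconds];
        for (int s = 0; s < intervalSeconds; s++) {
            // Portion of the window [s, s + 1) that falls inside the busy burst.
            double busyInWindow = Math.max(0.0, Math.min(1.0, busySeconds - s));
            util[s] = 100.0 * busyInWindow;
        }
        return util;
    }

    /** Arithmetic mean, which is what a tool sampling the whole interval reports. */
    static double average(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        double[] before = perSecond(2.25, 5); // request takes 2.25 s of every 5 s
        double[] after = perSecond(2.0, 5);   // optimized: 2 s of every 5 s
        System.out.println(Arrays.toString(before) + " average " + average(before));
        System.out.println(Arrays.toString(after) + " average " + average(after));
    }
}
```

A one-second sampling interval would show the CPU alternating between fully busy and fully idle, while a five-second interval shows only the 45% (or 40%) average.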

Java and multi-CPU usage

The goal of optimizing multi-threaded programs is likewise to drive the usage of each CPU as high as possible, so that CPUs sit idle as little as possible. In multi-core, multi-threaded environments there is one more cause of idle CPUs to consider: a CPU can be idle even though the application has unfinished work, because no thread is available to process that work. The most typical example is an application that runs a varying number of tasks on a fixed-size thread pool. Each thread can process only one task at a time, and a thread that is blocked by some operation cannot pick up another task. When that happens, tasks are waiting but no thread is free to run them, so the CPU is idle. In this case, consider increasing the thread pool size so that more tasks can be processed.

Monitoring CPU usage is only the first step in understanding application performance. It can only show whether the code's CPU usage matches the developers' expectations, or hint at synchronization and resource problems in the code.
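The thread pool effect can be observed with a small Java sketch. Here Thread.sleep stands in for any blocking operation, such as waiting on a remote call. The pool sizes, task count, and class name are hypothetical values picked for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class PoolSizeDemo {
    /** Runs the given number of blocking tasks on a fixed pool; returns elapsed milliseconds. */
    static long runBlockingTasks(int poolSize, int tasks, long blockMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        long start = System.nanoTime();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                pending.add(pool.submit(() -> {
                    try {
                        Thread.sleep(blockMillis); // blocked thread: holds a pool slot, uses no CPU
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait until every task has finished
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("pool of 2: " + runBlockingTasks(2, 8, 100) + " ms");
        System.out.println("pool of 8: " + runBlockingTasks(8, 8, 100) + " ms");
    }
}
```

With two threads, six of the eight tasks wait in the queue while the CPU is idle. With eight threads, all tasks block concurrently and the batch finishes sooner. A larger pool helps only when threads spend their time blocked. For CPU-bound tasks it mostly adds contention.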

CPU Run Queue

Both Windows and UNIX systems let you monitor the number of threads that are ready to run. On UNIX systems this is called the run queue, and many tools report it. In vmstat, for example, the first number on each line is the run queue length. On Windows, the processor queue length can be obtained with the typeperf command.

The difference between the two is that on UNIX the run queue length counts both the threads that are currently running and those that are runnable, so its minimum is 1. On Windows the processor queue length does not include running threads, so its minimum is 0.

When the number of runnable threads exceeds the number of available CPUs, performance degrades. Performance is therefore best when the processor queue length is 0 on Windows, or when the run queue length equals the number of CPUs on UNIX. This is not an absolute rule, because system programs run periodically and briefly raise the value without noticeably affecting the application. If the run queue stays much longer than the number of CPUs, the machine is overloaded and its workload should be reduced.
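From inside a JVM, a rough version of this check can be sketched with the standard OperatingSystemMXBean. Note the assumptions: getSystemLoadAverage() returns the one-minute load average, which is related to but not the same as the instantaneous run queue length that vmstat prints, and it returns a negative value on platforms where it is unavailable. The factor parameter that defines "much longer" is a hypothetical threshold, not a value from this article.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class RunQueueCheck {
    /** True when the number of runnable threads exceeds cpus * factor. */
    static boolean isOverloaded(double runnableThreads, int cpus, double factor) {
        return runnableThreads > cpus * factor;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cpus = os.getAvailableProcessors();
        double load = os.getSystemLoadAverage(); // negative when not available
        if (load < 0) {
            System.out.println("load average not available on this platform");
        } else {
            // factor = 2.0 is an arbitrary illustrative threshold
            System.out.println("cpus=" + cpus + " load=" + load
                    + " overloaded=" + isOverloaded(load, cpus, 2.0));
        }
    }
}
```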

Hard disk usage

Monitoring hard disk usage serves two important goals. The first concerns the application itself: if the application performs a lot of hard disk I/O, it is easy to suspect that I/O is its performance bottleneck.

Confirming that I/O really is the bottleneck requires more detailed monitoring. When an application does not buffer its disk writes efficiently, the disk I/O statistics will be very low. When an application performs more I/O than the disk can handle, the disk I/O statistics will be very high. Both cases call for optimization.
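The effect of buffering on the number of write operations can be shown with a deterministic Java sketch. Instead of a real file, it uses a hypothetical CountingSink stream that counts how many write calls reach it. With a FileOutputStream in its place, each of those calls would typically be a separate request to the operating system.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class WriteCallDemo {
    /** Stand-in for a file stream: records how many write calls and bytes reach it. */
    static final class CountingSink extends OutputStream {
        int calls;
        long bytes;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] buf, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    /** Writes many small records and returns how many write calls reached the sink. */
    static int callsReachingSink(boolean buffered, int records, int recordSize) {
        CountingSink sink = new CountingSink();
        OutputStream out = buffered ? new BufferedOutputStream(sink) : sink;
        byte[] record = new byte[recordSize];
        try {
            for (int i = 0; i < records; i++) {
                out.write(record);
            }
            out.flush(); // push any bytes still held in the buffer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + callsReachingSink(false, 1000, 64));
        System.out.println("buffered:   " + callsReachingSink(true, 1000, 64));
    }
}
```

The unbuffered run issues one underlying write per record, while the buffered run coalesces the records into a handful of large writes. This is one common remedy for the many-small-writes pattern. Whether it fits a given application depends on its durability requirements.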

Run the iostat -xm 5 command on Linux to obtain the data in Listing 3:

Listing 3. iostat -xm 5 command result 1
 avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            18.20    0.00   40.20    0.00    0.00   51.60

 Device:   rrqm/s  wrqm/s    r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
 sda         0.00    0.20   0.00   34.60   0.10   0.23      8.35      0.00    5.04   0.04   2.02

The application is writing data to the hard disk sda. At first glance the disk figures look healthy: the wait time for each write (await) is 5.04 milliseconds, and the disk is only 2.02% utilized (%util). On closer inspection, however, the system spends 40.2% of its time executing kernel code, which points to inefficient write operations in the application. The system performs 34.60 writes per second (w/s) but writes only 0.23 MB per second (wMB/s). I/O is therefore the performance bottleneck of the application, and the next step is to analyze how the application performs its writes.

In another set of data (Listing 4), the hard disk usage (%util) reached 100%, and the time spent waiting for the hard disk accounted for 49.81% (%iowait). The application writes 60.45 MB of data per second. These figures show that I/O is the performance bottleneck of the application, and that the volume of I/O must be reduced.
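Dividing throughput by the write rate makes the inefficiency concrete. The Java sketch below computes the average size of each write from the w/s and wMB/s columns, assuming that iostat's megabyte is 1024 KB. The class name is hypothetical.

```java
final class AverageWriteSize {
    /** Average kilobytes per write, from iostat's w/s and wMB/s columns. */
    static double kilobytesPerWrite(double writesPerSecond, double megabytesPerSecond) {
        // Assumption: 1 MB = 1024 KB in iostat -m output.
        return megabytesPerSecond * 1024.0 / writesPerSecond;
    }

    public static void main(String[] args) {
        // Values taken from the sda row of Listing 3.
        System.out.printf("%.1f KB per write%n", kilobytesPerWrite(34.60, 0.23));
    }
}
```

For the sda row in Listing 3 this works out to roughly 7 KB per write, which is small compared with the cost of each call into the kernel.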

Listing 4. iostat -xm 5 command result 2
 avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            40.20    0.00    5.70   49.81    0.00   54.10

 Device:   rrqm/s  wrqm/s    r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
 sda         0.00    0.20   0.00  134.60   0.10  60.45    727.24     68.46  798.04   5.67    100

The second goal of monitoring hard disk usage is to find out whether the system is swapping. A computer has a fixed amount of physical memory, yet it can run applications that together use much more memory than that. Applications usually reserve more memory than they actually use, so the operating system moves the unused memory out to the hard disk and swaps it back into physical memory when it is needed. For most applications this style of memory management works well, but for server-type applications it is particularly bad: because of the Java heap, server applications often need a large amount of physical memory.

Because data must be moved between the hard disk and physical memory, swapping seriously degrades system performance. In the output of the vmstat command, the si and so columns show the amount of data swapped into and out of physical memory. These columns tell you whether the system is swapping.

Network usage

If the application uses the network, you must also monitor the system's network traffic during performance monitoring. Network transfer resembles hard disk transfer: using the network inefficiently wastes the available bandwidth, and sending more data than the network can carry also creates a performance bottleneck.

The network monitoring tools built into the operating system report only the number of packets and bytes sent and received on a network interface. That information alone is not enough to tell whether the network load is normal or heavy.

On UNIX systems, the basic network monitoring tool is netstat.

There are also many third-party network monitoring tools. nicstat is a widely used command-line tool on UNIX systems that reports the utilization of a given network interface.

Run the nicstat 5 command to obtain the data in Listing 5. The network interface e1000g1 is a 1000 Mb interface, and its utilization is only 2.98% (%Util). The interface reads 156.4 KB and writes 256.9 KB of data per second. This data shows both the traffic and the utilization of the network interface at a glance.

Listing 5. nicstat 5 command results
 Time      Int      rKB/s  wKB/s  rPk/s  wPk/s   rAvs   wAvs  %Util   Sat
 17:05:17  e1000g1  156.4  256.9  875.0  909.5  215.4  175.3   2.98  0.00

With netstat alone, you can obtain the amount of data read and written per second, but you must know the network bandwidth yourself and use an additional script to calculate the interface utilization. When doing that calculation, remember that bandwidth is measured in bits per second (bps), so a 1000 Mb link can carry 125 MB of data per second. nicstat performs this calculation for us.

A network cannot sustain 100% utilization. On a local Ethernet network, an interface whose sustained utilization exceeds 40% is considered saturated. For the saturation point of networks that use other media, consult your network architect. A Java program simply uses the operating system's network interface for its transfers and cannot determine the saturation point of the network.
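The calculation that an extra script would perform on netstat data can be sketched in Java as follows. The traffic figures in main are hypothetical, and the 40% threshold is the local Ethernet rule of thumb quoted above. Whether read and write traffic should be summed or compared separately depends on whether the link is full duplex, so the sketch takes a single rate and leaves that choice to the caller.

```java
final class NicUtilization {
    /** Converts link speed in megabits per second to megabytes per second. */
    static double megabytesPerSecond(double linkMegabits) {
        return linkMegabits / 8.0; // 8 bits per byte: 1000 Mb/s -> 125 MB/s
    }

    /** Utilization in percent for a measured transfer rate on a given link. */
    static double percent(double transferredMegabytesPerSecond, double linkMegabits) {
        return 100.0 * transferredMegabytesPerSecond / megabytesPerSecond(linkMegabits);
    }

    /** Local Ethernet rule of thumb: above 40% counts as saturated. */
    static boolean isSaturated(double utilizationPercent) {
        return utilizationPercent > 40.0;
    }

    public static void main(String[] args) {
        double util = percent(60.0, 1000.0); // hypothetical: 60 MB/s on a 1000 Mb link
        System.out.println(util + "% saturated=" + isSaturated(util));
    }
}
```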

Windows system resource monitoring

The following describes the Windows system monitoring tool Perfmon. Perfmon is the Windows performance monitoring tool. It monitors the usage of various system resources and provides a graphical user interface. Perfmon consists of four parts: Performance Monitor, counter logs, trace logs, and alarms.

1. Performance Monitor

Run the perfmon.msc command from the Windows command line to start the Perfmon Performance Monitor user interface. Performance Monitor lets you monitor CPU, hard disk, and network resources in real time. The analysis method is similar to that on Linux. To save the monitoring data, use the counter logs described below.

Performance Monitor can also display the data stored in counter logs in graphical form. Use the "view current activity" or "view log data" function to choose what to display.

2. Counter logs

Although Performance Monitor can monitor system resources in real time, it cannot save the monitoring data. To sample system monitoring data continuously, use the Perfmon counter log function. The data stored in counter logs can be analyzed with the system monitor or with other tools.

3. Trace logs

Trace logs let you track important system events and specific applications. By default, trace logs are saved as binary .etl files. You can use the tracerpt command to analyze these files and generate dump files in CSV format.

Currently, the application to be traced and the path where the log file is saved must be configured by editing the system registry.

4. Alarm

When the monitored value of a counter reaches a preset threshold, the Perfmon alarm is triggered. An alarm carries out a preset action, such as sending an email or running a specified command. You can also have the alarm recorded as a system event so that its content can be viewed in the Event Viewer. Different alarm policies can be defined for different applications. For example, one alarm could send an email to the system maintenance staff when CPU idle time drops below 80%, and another could run the typeperf command to collect and save data when memory usage rises above 90%.

Perfmon can be used only with administrator privileges. Perfmon has two deployment modes: local monitoring and remote monitoring.

In local monitoring mode, log files are stored in the C:\perflogs directory by default. You can change the directory under "Log Files". Log files generated by local monitoring can be analyzed with Performance Monitor on the local machine or transferred to another analysis platform.

In remote monitoring mode, you can sample and monitor multiple target machines on the LAN centrally, provided that a trust relationship is established between the monitoring host and the monitored hosts and that remote access is enabled. This introduces security risks, so remote monitoring is difficult to deploy in environments with strict access control.

During deployment, you also need to plan for log file storage and choose an appropriate sampling interval. If the interval is too short, the log files grow rapidly. If it is too long, the monitoring data can become inaccurate.

Perfmon can be managed in two ways: through the console and from the command line. As described above, you can run perfmon.msc to open the console and manage monitoring according to your monitoring policy. On the command line, you can use the logman command to create, start, and stop a log session. Its verbs include create, start, stop, delete, query, and update. For details on how to use them, see the logman help documentation. Besides logman, the typeperf command is another common system monitoring command on Windows. It can retrieve the current performance data of every resource available in Perfmon, but it cannot generate logs or set alarms. You can redirect the output of typeperf to a text file and analyze it with a third-party tool. typeperf can also be combined with other performance tools: by running it from a scheduled task, you can collect system performance data at regular intervals.
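As an illustration of analyzing redirected typeperf output, the Java sketch below averages the first counter column. It assumes the default CSV-style output, in which the header row begins with "(PDH-CSV 4.0)" and each sample row holds a quoted timestamp followed by quoted values. Verify this against the output on your own system. The class name and the sample lines in main are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class TypeperfAverage {
    /** Averages the first counter column of typeperf CSV lines; skips non-data rows. */
    static double averageOfFirstCounter(List<String> lines) {
        double sum = 0.0;
        int samples = 0;
        for (String line : lines) {
            String[] cells = line.trim().split("\",\"");
            if (cells.length < 2 || line.contains("PDH-CSV")) {
                continue; // header row, blank line, or status message
            }
            String value = cells[1].replace("\"", "").trim();
            try {
                sum += Double.parseDouble(value);
                samples++;
            } catch (NumberFormatException e) {
                // Not a numeric sample (for example an empty cell): ignore the row.
            }
        }
        if (samples == 0) {
            throw new IllegalArgumentException("no samples found");
        }
        return sum / samples;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "\"(PDH-CSV 4.0)\",\"\\\\HOST\\Processor(_Total)\\% Processor Time\"",
                "\"01/01/2024 10:00:01.000\",\"10.000000\"",
                "\"01/01/2024 10:00:02.000\",\"20.000000\"");
        System.out.println(averageOfFirstCounter(lines));
    }
}
```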

Summary

This article describes several ways to monitor and tune performance with operating system tools, and explains, based on the characteristics of CPU time, the factors that affect performance and why. In general, performance monitoring should start with the CPU time the application consumes while running, and then analyze its running state and resource bottlenecks. The code is then optimized with the goal of improving CPU usage.

Monitoring hard disk usage is another way to monitor performance. Heavy disk reads and writes can cause application performance problems, so program designers should minimize the number of disk operations and the amount of data read and written, and improve application performance by using cached data.

For network-based applications, monitoring the overhead of network transfer is also part of performance monitoring. Exchanging large amounts of data over the network adds performance overhead as well.

Finally, the other articles in this series explain how to use the performance analysis tools that come with Java. Keep one thing in mind when using any of these tools: no single tool gives a complete picture of an application's overall performance. In practice, analyzing the performance of an application may require combining several tools. Each tool has a different focus, so analysis tasks are completed well only by understanding the tools and using them together.
