Maximizing Java performance on AIX, Part 4: Monitoring traffic
http://www.ibm.com/developerworks/cn/aix/library/es-Javaperf/es-Javaperf4.html
This five-part series presents tips and techniques commonly used to optimize Java™ applications for peak performance on AIX®. This article discusses scenarios in which I/O and the network can become bottlenecks.
Amit Mathur ([email protected]), Senior Technical Consultant and Solution Implementation Manager, IBM
January 03, 2008
This is the fourth article in a five-part series on Java performance optimization on AIX. If you have not already done so, you are strongly encouraged to read Part 1 of this series before proceeding.
This article discusses two other areas that can become performance bottlenecks:

- Disk I/O
- The network
Both areas tend to surface as AIX-specific issues and need to be tuned independently of the Java application. So, instead of the format used in Parts 2 and 3, this article focuses on where to find the information needed to complete the tuning effort. It therefore offers only a handful of tips, but we hope that combining the overall discussion of the performance tools with those tips gives you enough information to get started with performance tuning.
I/O and network bottlenecks
The purpose of this article is to discuss situations where I/O or networks can become bottlenecks.
If you have read each of the previous articles in this series, you should be starting to see how the smaller pieces fit into the whole. We have tried to classify the techniques by the areas where they are most commonly applied, but the classification is by no means mutually exclusive. With the network and I/O, you will rarely see the actual cause of a problem directly; you only feel its impact on the application in the end. Only an adequate understanding of the application will guide you to the root cause. For example, earlier in this series we discussed the importance of making sure the heap is never paged out: the maximum heap size specified with the -Xmx switch should be less than the total physical memory installed on the system (as shown by "bootinfo -r" or "lsattr -El sys0 -a realmem"). For more commands like these, see "AIX commands you should not leave home without".
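As a quick cross-check of that rule, the following sketch (the class name HeapCheck is ours, not from the article) prints the heap ceiling the running JVM actually granted, which you can compare against the realmem figure reported by lsattr:

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Runtime.maxMemory() reflects the -Xmx ceiling of the running JVM
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("max heap (MB): " + maxBytes / (1024 * 1024));
    }
}
```

Run it with the same switches as your application (for example, java -Xmx512m HeapCheck) and verify that the printed value stays comfortably below the machine's physical memory.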
Tools such as topas and iostat can show per-disk utilization, but in most cases the root cause is either a GC cycle or a known functional part of the application; if you know your application, determining the root cause should be fairly straightforward. A tool such as filemon can even tell you which files are being accessed, removing the guesswork from the tuning effort. If your Java application's performance suffers because the system is misconfigured, it is time to shift focus and consider system performance tuning instead. For example, the solution to a disk bottleneck could be to distribute the data more effectively or to move to faster disks. That topic is beyond the scope of this article; for more information, see Redbooks such as Understanding IBM eServer pSeries Performance and Sizing.
Configuring the network buffers and tuning other network parameters can have a significant impact on network-intensive applications. A good reference for the network tunables is the Tunable Parameters section of the Performance Management Guide. Some of the popular tweaks involve thewall, sockthresh, sb_max, somaxconn, tcp_sendspace, tcp_recvspace, rfc1323, and so on. This information is neither AIX-specific nor Java-specific, but for network-intensive applications it should be the first step in performance tuning.
The remainder of this section briefly describes some common tools and how to detect Java-specific issues with them. For more detail, see the AIX 5L Performance Tools Handbook and Understanding IBM eServer pSeries Performance and Sizing.
The multi-purpose vmstat command should already be a good friend. For I/O work, look at the wa (I/O wait) value in the CPU section. If this value is very high, there may be a disk bottleneck, and you can then use iostat to examine disk usage in more detail.
iostat is the ideal tool for determining whether the system has an I/O bottleneck. It shows the read and write rates for all disks, which makes it ideal for deciding whether you need to "spread" the disk workload across multiple disks. The tool also reports the same CPU activity that vmstat does.
While your application is running, start with a simple iostat -s to see what the system is doing overall. This command prints output such as the following:
tty:   tin     tout    avg-cpu:  % user  % sys  % idle  % iowait
       0.3    232.9              13.8    19.1   27.4    39.6

Disks:     % tm_act   Kbps    tps    Kb_read   Kb_wrtn
hdisk0       28.7     291.4   35.0    176503   2744795
hdisk1        0.0       0.4    0.0      3537         0
hdisk7        1.7      34.9    9.8      8920    341112
hdisk14      24.5    1206.1   36.2   1188404  10904509
hdisk18       0.0       1.2    0.1     10052      2046
hdisk8        2.1      36.8   10.5     10808    357910
Review the %iowait figure to determine whether the system is spending too much time waiting for I/O to complete. If the system is paging, this is the figure to watch. Note, however, that this figure alone is not enough to determine what is happening on the system. For example, if your application writes a file sequentially, a higher %iowait value is normal.
%tm_act shows the percentage of time a particular disk was active. The trace above shows a very interesting scenario: %iowait is close to 40%, yet %tm_act is nowhere near 100%, hovering below 30%. The system on which this trace was taken had Fibre Channel-attached storage, and the bottleneck turned out to be the path to the SAN storage. Once discovered, it looks easy enough!
You can also use iostat -at <interval> <count> or iostat -sat ...; these commands give you the tps and Kbps values (as well as the read and write rates) per adapter. The -s flag adds the overall system statistics.
For network tuning, netstat is the ideal tool. netstat -m can be used to view mbuf memory usage, which tells you about socket and network memory usage. If you set no -o extendednetstats=1, netstat -m will display more detail, but this has a performance impact and should be used only for diagnosis. When you use netstat -m, the relevant information appears at the top of the output, as follows:
mbufs in use:
mbuf cluster pages in use
272 Kbytes allocated to mbufs
0 requests for mbufs denied
0 calls to protocol drain routines
0 sockets not created because sockthresh was reached
The relevant information also appears at the bottom of the output, as follows:
Streams mblk statistic failures:
0 high priority mblk failures
0 medium priority mblk failures
0 low priority mblk failures
If you see failures in the netstat -m output, the AIX 5L Performance Tools Handbook gives a clear description of which parameters to adjust. You may also want to try netstat -i x (replacing x with the interval at which to collect data) to watch network usage and any packets being dropped. For network-intensive applications, this is the first step in checking that "all is well".
netpmon uses tracing to capture the details of network activity during a time interval. It also displays per-process CPU statistics, showing:
- Total CPU time used by the process
- CPU usage of the process (percent of total time)
- The total time that the process spent executing network-related code
To start your tuning work, you can try the following command:
netpmon -o /tmp/netpmon.log; sleep 20; trcstop
This command line runs netpmon for 20 seconds, then uses trcstop to stop it, writing the output to /tmp/netpmon.log. Looking at the generated data, you can see why the example we selected is well suited to an article on Java performance tuning:
Process CPU Usage Statistics:
-----------------------------
                                             Network
Process (top)         PID   CPU Time   CPU %   CPU %
----------------------------------------------------
java                12192     2.0277   5.061   1.370
UNKNOWN             13758     0.8588   2.144   0.000
gil                  1806     0.0699   0.174   0.174
UNKNOWN             18136     0.0635   0.159   0.000
dtgreet              3678     0.0376   0.094   0.000
swapper                 0     0.0138   0.034   0.000
trcstop             18460     0.0121   0.030   0.000
sleep               18458     0.0061   0.015   0.000
Another useful part of the trace is the adapter usage:
                ----------- Xmit -----------    -------- Recv ---------
Device          pkts/s   bytes/s  util  qlen    pkts/s   bytes/s  demux
------------------------------------------------------------------------
token ring 0    288.95     22678  0.0%  518.498  552.84    36761  0.0222
...
DEVICE: token ring 0
recv packets:  11074
  recv sizes (bytes):   avg 66.5      min         max 1514      sdev 15.1
  recv times (msec):    avg 0.008     min 0.005   max 0.029     sdev 0.001
  demux times (msec):   avg 0.040     min 0.009   max 0.650     sdev 0.028
xmit packets:  5788
  xmit sizes (bytes):   avg 78.5      min         max 1514      sdev 32.0
  xmit times (msec):    avg 1794.434  min 0.083   max 6443.266  sdev 2013.966
Suppose you think that is too much information, or you want to see something more specific. Try the following command:
netpmon -O so -o /tmp/netpmon_so.txt; sleep 20; trcstop
"-O so" allows Netpmon to focus on socket-level traffic. Now we can drill down into the Java process information:
PROCESS: java   PID: 12192
reads:                  2700
  read sizes (bytes):   avg 8192.0   min 8192    max 8192      sdev 0.0
  read times (msec):    avg 184.061  min 12.430  max 2137.371  sdev 259.156
writes:                 3000
  write sizes (bytes):  avg 21.3     min 5       max           sdev 17.6
  write times (msec):   avg 0.081    min 0.054   max 11.426    sdev 0.211
Useful, isn't it? Let's go a step further and look at thread-level activity. Add "-t" to the command, as follows:
netpmon -O so -t -o /tmp/netpmon_so_thread.txt; sleep 20; trcstop
The generated output now contains thread-specific information, as follows:
THREAD  TID: 114559
reads:                  9
  read sizes (bytes):   avg 8192.0   min 8192    max 8192      sdev 0.0
  read times (msec):    avg 988.850  min 19.082  max 2106.933  sdev 810.518
writes:
  write sizes (bytes):  avg 21.3     min 5       max           sdev 17.6
  write times (msec):   avg 0.389    min 0.059   max 3.321     sdev 0.977
You can now take a Java thread dump, identify the thread, and determine whether it is behaving as expected. Especially for applications with many network connections, netpmon lets you capture a comprehensive view of the activity.
filemon can be used to determine which files are being used actively. This tool provides a very comprehensive view of file access and is useful for deeper analysis once vmstat/iostat confirm that the disks are the bottleneck. It also uses tracing, so it is invoked like netpmon:
filemon -o /tmp/filemon.log; sleep 60; trcstop
The resulting log file is quite large. Some of the areas that may be useful include:
Most Active Files
------------------------------------------------------------------------
  #MBs  #opns   #rds   #wrs  file             volume:inode
------------------------------------------------------------------------
  25.7          6589      0  unix             /dev/hd2:147514
  16.3      1   4175      0  vxe102           /dev/mailv1:581
  16.3      1      0   4173  .vxe102.pop      /dev/poboxv:62
  15.8      1      1   4044  tst1             /dev/mailt1:904
   8.3   2117   2327      0  passwd           /dev/hd4:8205
   3.2    182    810      1  services         /dev/hd4:8652
...
------------------------------------------------------------------------
Detailed File Stats
------------------------------------------------------------------------
FILE: /var/spool/mail/v/vxe102  volume: /dev/mailv1 (/var/spool2/mail/v)  inode: 581
opens:                  1
total bytes xfrd:       17100800
reads:                  4175  (0 errs)
  read sizes (bytes):   avg 4096.0  min 4096   max 4096    sdev 0.0
  read times (msec):    avg 0.543   min 0.011  max 78.060  sdev 2.753
...
This tool is described in the references cited earlier; a more detailed study is beyond the scope of this article.
Ultimately, the usual Java technique for avoiding I/O and network bottlenecks is good design, and that has been documented clearly in several places. But do take a look at tips NI004 and NI005.
Optimization techniques based on application characteristics
Let's look at the different characteristics of a typical application. Find the behaviors that resemble your application's, whether by design or by observation, and apply the matching tips.
For network-intensive applications, you should use netstat to make sure there are no dropped packets, and so on. The netstat and netpmon sections of the AIX 5L Performance Tools Handbook describe the various adjustments to make when failures show up during monitoring, so they are not repeated here.
If you suspect that network throughput is a bottleneck, tip NI001 is useful for determining whether there really is a problem. In addition, if you do not use IPv6 at all, you can apply tip NI002.
If you are investigating an application performance difference between AIX and another platform, and you suspect the difference is due to the socket options you set, look at tip NI004.
If the application is an RMI client or server, you may observe some otherwise unexplained lines in the verbosegc output. For example, the following is excerpted from the verbosegc output of an RMI application:
<GC(4057): GC cycle started Thu Apr 11:14:28 2004
<GC(4057): freed 254510616 bytes, 55% free (453352000/810154496), in 1189 ms>
<GC(4057): mark: 991 ms, sweep: 198 ms, compact: 0 ms>
<GC(4057): refs: soft 0 (age >= 32), weak 2, final , phantom 0>
<GC(4057): stop threads time: 10, start threads time: 260>
<GC(4058): GC cycle started Thu Apr 11:15:29 2004
<GC(4058): freed 267996504 bytes, 56% free (454445800/810154496), in 1243 ms>
<GC(4058): mark: 1041 ms, sweep: 202 ms, compact: 0 ms>
<GC(4058): refs: soft 0 (age >= 32), weak 0, final 253, phantom 0>
<GC(4059): GC cycle started Thu Apr 11:16:31 2004
<GC(4059): freed 248113752 bytes, 56% free (455754152/810154496), in 1386 ms>
<GC(4059): mark: 1095 ms, sweep: 291 ms, compact: 0 ms>
<GC(4059): refs: soft 0 (age >= 32), weak 0, final 263, phantom 0>
These GC cycles are triggered almost exactly 60 seconds apart and are not triggered by an "allocation failure". Once you have made sure the application does not call System.gc() directly, tip NI003 may apply here.
For RMI-intensive applications, you should consider tip NI005, but note the caveats mentioned in that tip.
With iostat and filemon, you should be able to identify the root cause of a disk bottleneck. The solution is typically either to adjust the application design so it no longer depends on disk access, or to tune the system to optimize disk access. Since both kinds of adjustment are beyond the scope of this article, we recommend that you become familiar with iostat and filemon; the information in the previous section should get you started.
General tips
The tips below refer to Java command-line arguments (specified before the class/JAR file name) as "switches". For example, the command line java -mx2g hello has a single switch, -mx2g.
Tip NI001: Check the speed of your network connection
You can establish an FTP session between the two systems whose connection speed you need to analyze, and execute the following FTP command:
ftp> put "|dd if=/dev/zero bs=32k count=1000" /dev/null
200 PORT command successful.
150 Opening data connection for /dev/null.
1000+0 records in.
1000+0 records out.
226 Transfer complete.
32768000 bytes sent in 130.4 seconds (245.4 Kbytes/s)
local: |dd if=/dev/zero bs=32k count=1000 remote: /dev/null
This quick test transfers 1000 blocks of zeroes, each 32 KB in size, and provides a simple way to measure the throughput of the connection between two AIX machines. The example above shows a throughput of 245.4 Kbytes/s, which points to a network problem, because both AIX machines were using full-duplex network adapters rated at 100 Mbps. If the test had shown something like 1.140e+04 Kbytes/s, that would be a good hint to focus on the application instead. You can change the block size and count to simulate your application's behavior more closely.
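The arithmetic behind the ftp output can be reproduced in a few lines of Java (class and variable names are ours); it confirms that bs=32k times count=1000 is the 32768000 bytes reported, at roughly 245.4 Kbytes/s:

```java
public class FtpThroughput {
    public static void main(String[] args) {
        long blockSize = 32 * 1024;      // bs=32k
        long count = 1000;               // count=1000
        long bytes = blockSize * count;  // 32768000 bytes, as in the ftp output
        double seconds = 130.4;          // transfer time reported by ftp
        double kbytesPerSec = bytes / 1024.0 / seconds;
        System.out.printf("%d bytes in %.1f s = %.1f Kbytes/s%n",
                bytes, seconds, kbytesPerSec); // prints 245.4 Kbytes/s
    }
}
```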
Tip NI002: IPv4 stack
If you do not need IPv6 in your application, you can set the java.net.preferIPv4Stack property to true, as follows: java -Djava.net.preferIPv4Stack=true <other switches> <class>
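A minimal sketch (the class name is ours) to confirm from inside the JVM whether the switch was passed; it prints true only when the JVM was started with -Djava.net.preferIPv4Stack=true:

```java
public class Ipv4Check {
    public static void main(String[] args) {
        // Unset means the default behavior (IPv6 preferred where available)
        String v = System.getProperty("java.net.preferIPv4Stack", "false");
        System.out.println("java.net.preferIPv4Stack = " + v);
    }
}
```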
Tip NI003: Remote GC
If your application is an RMI client or server, you can set the sun.rmi.dgc.client.gcInterval and/or sun.rmi.dgc.server.gcInterval properties, defined at http://java.sun.com/j2se/1.4.2/docs/guide/rmi/sunrmiproperties.html, with IBM Java as well. Both properties default to 60 seconds and, depending on the application's needs, can be increased to reduce the performance impact of the extra GC cycles.
Note that the warnings at the top of that page, and the risks associated with not releasing distributed objects, apply to IBM Java as well.
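Normally you would pass these switches on the command line (java -Dsun.rmi.dgc.client.gcInterval=3600000 ...). Purely as an illustration, here is a launcher sketch (the class name and the one-hour value are ours) that sets them programmatically; this only takes effect if it runs before any RMI classes are initialized:

```java
public class RmiGcTuning {
    public static void main(String[] args) {
        // One hour in milliseconds; the article notes the default is 60 seconds
        System.setProperty("sun.rmi.dgc.client.gcInterval", "3600000");
        System.setProperty("sun.rmi.dgc.server.gcInterval", "3600000");
        System.out.println(System.getProperty("sun.rmi.dgc.client.gcInterval"));
        // ... start the real RMI client or server here ...
    }
}
```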
Tip NI004: Socket buffer sizes
If you set the send and receive buffer sizes, be aware that a call to setSendBufferSize(int) is only a hint. So if you observe a performance difference between platforms, add a call to getSendBufferSize() and check whether the hint was honored by the current platform. In a recently reported performance problem on AIX, the application called setSendBufferSize(4096) from its code. AIX honored the hint and set the buffer size as requested, while the other platform ignored the call. As a result, the perceived performance on AIX was bad! Removing the call from the code improved the application's performance on AIX by more than four times.
In general, you may want to omit calls that adjust the TCP/IP stack from your application, because the AIX network stack comes pre-tuned.
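A small sketch of the check described above (the class name is ours): set the hint, then read back what the platform actually granted; platforms are free to round, double, or ignore the request:

```java
import java.io.IOException;
import java.net.Socket;

public class SendBufferCheck {
    public static void main(String[] args) throws IOException {
        try (Socket s = new Socket()) {  // unconnected; options may be set before connect()
            s.setSendBufferSize(4096);   // only a hint to the platform
            int granted = s.getSendBufferSize();
            System.out.println("requested 4096, granted " + granted);
        }
    }
}
```

If the granted value differs widely between platforms, the buffer hint, rather than the application logic, may explain the performance gap.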
Tip NI005: Connection pool
For RMI-intensive applications, enabling the connection pool allows existing connections to be reused instead of creating a new connection for each new RMI call. To enable the connection pool, set the following properties:
You can also disable the connection pool as follows:
Note: It is best to use the connection pool only for RMI-intensive applications. The latest versions of Java on AIX (1.3.1 SR7 and later, and 1.4.1 SR2 and later) disable the connection pool by default.
This article describes common tools and techniques for dealing with network and disk I/O bottlenecks.
The next article concludes this series with general observations and links to useful references.
- Other parts of this article series:
- Part 1
- Part 3
- Part 4
- Part 5