First, general concepts of performance optimization
 
It is widely believed that Java programs are always slower than C programs, and most people have heard this claim more times than they care to count. In fact, the situation is far more complicated than the old claim suggests. Many Java programs really are slow, but slowness is not an intrinsic property of Java programs. Many Java programs can match the efficiency of comparable programs written in C or C++, but only when designers and programmers pay close attention to performance throughout the development process.
 
The main purpose of this article is to discuss how to optimize the performance of Java IO operations. Many applications spend much of their run time on network or file IO, and poorly designed IO code can be several times slower than carefully tuned IO code.
 
Whenever performance optimization of Java programs is discussed, a few concepts come up again and again. The examples in this article revolve around IO-heavy applications, but the basic principles apply to other performance scenarios as well.
 
For performance optimization, perhaps the most important principle is: test early and test often. Without knowing the root cause of a performance problem you cannot tune effectively, and many programmers waste effort on unfounded guesses about where the problem lies. Tuning a module that accounts for only 1% of total run time cannot improve application performance by more than 1%. So, instead of guessing, use performance measurement tools, such as a profiler or logs with timing information, to find the most time-consuming parts of the application, and then concentrate on optimizing those hotspots. After tuning, test again. Testing not only helps programmers focus on the code that matters most, it also shows whether the tuning actually succeeded.
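The 1% bound above is simple arithmetic, often described as Amdahl's law. The following is a minimal sketch of that calculation; the class and method names are hypothetical and chosen only for illustration.

```java
// Hypothetical helper: how much total run time can tuning one module save?
class SpeedupBound {
    /**
     * @param fraction     share of total run time spent in the module (0.0 to 1.0)
     * @param localSpeedup how many times faster the module becomes (greater than 0)
     * @return the fraction of total run time that is saved
     */
    static double timeSaved(double fraction, double localSpeedup) {
        // The module's share shrinks from fraction to fraction / localSpeedup.
        return fraction * (1.0 - 1.0 / localSpeedup);
    }

    public static void main(String[] args) {
        // A module using 1% of run time, made 100 times faster: saves less than 1%.
        System.out.println(timeSaved(0.01, 100.0));
        // A hotspot using 80% of run time, made only twice as fast: saves 40%.
        System.out.println(timeSaved(0.80, 2.0));
    }
}
```

However large localSpeedup becomes, the saving never exceeds fraction, which is why measuring where the time goes has to come before any tuning.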
 
While tuning a program there may be many quantities worth measuring, such as total elapsed time, average memory usage, peak memory usage, throughput, request latency, and the number of objects created. Which of these to focus on depends on the specific situation and the performance requirements. Most of them can be measured with good commercial profiling tools, but expensive tools are not always necessary for collecting useful performance data.
 
The performance data collected in this article covers run time only, and it was gathered with a tool similar to the following Timer class (which can easily be extended to support operations such as pause() and restart()). Log statements that include time information affect the test results, because they themselves create objects and perform IO; the timer lets us gather timing information without such statements.
 
 
 
  
   
```java
public class Timer {
    // A simple "stopwatch" class with millisecond precision.
    private long startTime, endTime;

    public void start() {
        startTime = System.currentTimeMillis();
    }

    public void stop() {
        endTime = System.currentTimeMillis();
    }

    // Elapsed time in milliseconds between start() and stop().
    public long getTime() {
        return endTime - startTime;
    }
}
```
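The pause() and restart() extension mentioned above could look roughly like the following. This is only a sketch of one possible design, not the class used for the measurements in this article; the name PausableTimer and its fields are hypothetical.

```java
// Hypothetical stopwatch that accumulates time across pause()/restart() cycles.
class PausableTimer {
    private long accumulated;   // milliseconds counted in completed intervals
    private long intervalStart; // start of the interval currently being timed
    private boolean running;

    // Reset the total and begin timing.
    public void start() {
        accumulated = 0;
        intervalStart = System.currentTimeMillis();
        running = true;
    }

    // Stop counting but keep the accumulated total.
    public void pause() {
        if (running) {
            accumulated += System.currentTimeMillis() - intervalStart;
            running = false;
        }
    }

    // Resume counting after a pause.
    public void restart() {
        if (!running) {
            intervalStart = System.currentTimeMillis();
            running = true;
        }
    }

    // Total milliseconds counted so far, including a running interval if any.
    public long getTime() {
        long current = running ? System.currentTimeMillis() - intervalStart : 0;
        return accumulated + current;
    }
}
```

A typical use is to call start() before the code under test, pause() around sections that should not be counted (such as printing intermediate results), restart() afterwards, and getTime() at the end.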
 
  
 
One common cause of Java performance problems is excessive creation of temporary objects. Although newer Java virtual machines reduce the cost of creating many small objects, object creation is still expensive. Because String objects are immutable, the String class is often the biggest culprit: every time a String is modified, one or more new objects are created. The second principle for improving performance is therefore to avoid creating too many objects.
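To make the String point concrete, here is a small sketch contrasting repeated concatenation with a single mutable buffer. It uses StringBuilder, which assumes Java 5 or later; on the older Java VMs discussed in this article, StringBuffer would play the same role. The class and method names are illustrative only.

```java
// Illustrative comparison of two ways to join words into one string.
class JoinWords {
    // Each iteration produces at least one new String, because the
    // existing String cannot be modified in place.
    static String joinWithConcat(String[] words) {
        String result = "";
        for (String w : words) {
            result = result + w + " ";
        }
        return result;
    }

    // Appends into one mutable buffer; only toString() creates the final String.
    static String joinWithBuilder(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            sb.append(w).append(' ');
        }
        return sb.toString();
    }
}
```

Both methods return the same text; the difference lies only in how many temporary objects are created along the way.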
 
Second, IO performance optimization
 
Many applications process large volumes of data, and IO is an area where subtle changes can lead to significant performance differences. The example in this article is the performance optimization of a text-processing application that analyzes and processes large amounts of text. In such an application the time spent reading and processing input text is critical, and the measures taken to optimize it illustrate the optimization principles described above well.
 
One of the main factors that hurts Java IO performance is heavy use of single-character IO operations, that is, calling InputStream.read() or Reader.read() to read one character at a time. Java's single-character IO is inherited from the C language. In C, single-character IO is a common idiom, for example repeatedly calling getc() to read a file. It is highly efficient there because getc() and putc() are implemented as macros and operate on buffered files, so each call takes only a few clock cycles. In Java the situation is completely different: each character costs one or more method calls and, more importantly, if no buffering of any kind is used, each character costs a system call. A Java program that relies on read() may look and behave much like its C counterpart, but the two are not comparable in performance. Fortunately, Java offers several simple ways to get better IO performance.
 
Buffering can be achieved in one of two ways: by using the standard BufferedReader and BufferedInputStream classes, or by using the block-read methods to read a large chunk of data at once. The former is quick and simple, improves performance effectively, and adds only a little code, so there is little opportunity for error. The latter requires writing the buffering code yourself, which is slightly more complex (though certainly not difficult), but it achieves better results.
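The two buffering approaches can be sketched as follows for a byte stream. The task (counting newline bytes), the class name, and the 1 KB block size are assumptions made for illustration; this is not the test code behind the measurements in this article.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative: count newline bytes using each of the two buffering approaches.
class NewlineCounter {
    // Approach 1: wrap the stream. Each read() is still a method call,
    // but it is served from BufferedInputStream's internal buffer.
    static int countWithBufferedStream(InputStream raw) throws IOException {
        InputStream in = new BufferedInputStream(raw);
        int count = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                count++;
            }
        }
        return count;
    }

    // Approach 2: read blocks into our own array and scan the array directly.
    static int countWithBlockReads(InputStream raw) throws IOException {
        byte[] buf = new byte[1024];
        int count = 0;
        int n;
        while ((n = raw.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\n') {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Either method can be passed a FileInputStream. The second version needs an explicit inner loop over the bytes actually read (n), which is the small amount of extra complexity that block reading brings.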
 
To test the efficiency of the different IO approaches, this article uses six small programs, each of which reads several hundred files and parses every character. Table I shows the elapsed time of the six programs on five common Linux Java virtual machines: the Sun 1.1.7, 1.2.2 and 1.3 Java VMs, and the IBM 1.1.8 and 1.3 Java VMs.
 
The six programs are:

RawBytes: reads one byte at a time with FileInputStream.read().

RawChars: reads one character at a time with FileReader.read().

BufferedIS: wraps the FileInputStream in a BufferedInputStream and reads one byte at a time with read().

BufferedR: wraps the FileReader in a BufferedReader and reads one character at a time with read().

SelfBufferedIS: reads 1 KB of data at a time with FileInputStream.read(byte[]) and accesses the data from the buffer.

SelfBufferedR: reads 1 KB of data at a time with FileReader.read(char[]) and accesses the data from the buffer.
 
 
 
  
   
Table I. Elapsed time of the six test programs on each Java VM

| Program | Sun 1.1.7 | IBM 1.1.8 | Sun 1.2.2 | Sun 1.3 | IBM 1.3 |
|---|---|---|---|---|---|
| RawBytes | 20.6 | 18.0 | 26.1 | 20.70 | 62.70 |
| RawChars | 100.0 | 235.0 | 174.0 | 438.00 | 148.00 |
| BufferedIS | 9.2 | 1.8 | 8.6 | 2.28 | 2.65 |
| BufferedR | 16.7 | 2.4 | 10.0 | 2.84 | 3.10 |
| SelfBufferedIS | 2.1 | 0.4 | 2.0 | 0.61 | 0.53 |
| SelfBufferedR | 8.2 | 0.9 | 2.7 | 1.12 | 1.17 |
 
  
 
Table I shows the total time to process several hundred files, after adjusting for Java VM and program startup. Several obvious conclusions can be drawn from it:
 
InputStream is more efficient than Reader. A char stores a character in two bytes, whereas a byte needs only one, so storing characters as bytes consumes less memory and executes fewer machine instructions. More importantly, using bytes avoids the Unicode conversion. Therefore, where possible, use byte instead of char. For example, an application that must support internationalization has to use char; but if it reads from an ASCII data source (such as HTTP or MIME headers), or if the input text is known to always be English, it can use byte.

Unbuffered character IO is really slow. Character IO is inherently inefficient, and without buffering it is even worse. In practice, you should at least buffer the stream, which can improve IO performance by up to 10 times.

Buffered block IO is faster than buffered single-character stream IO. Although a buffered stream avoids a system call for every character read, each character still costs one or more method calls. Buffered block IO is 2 to 4 times faster than buffered stream IO, and 4 to 40 times faster than unbuffered IO.
 
What is less easy to see from the table is that single-character IO can cancel out the advantage of a faster Java VM. In most performance tests, the IBM 1.1.8 Linux Java VM is about twice as fast as the Sun 1.1.7 Linux Java VM, but in the RawBytes and RawChars tests neither shows that advantage: the extra time spent on system calls masks the speed advantage of the faster Java VM.
 
Block IO has another, less obvious advantage. Buffered single-character IO demands more coordination between components, which creates more opportunities for error. Often the IO in an application is performed by a dedicated component: the application passes a Reader or InputStream to the component, and the component processes the contents of the stream. Some IO components wrongly assume that the stream they are given is buffered without stating this requirement in their documentation; in other cases the documentation does state it, but the application developer overlooks it. Either way, the IO is not buffered as expected, and serious performance problems result. This cannot happen if the component uses block IO instead. (When designing software components, it is therefore better to make a component impossible to misuse than to rely on documentation to ensure that it is used correctly.)
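As a sketch of a component that cannot be misused in this way, the hypothetical LetterCounter below does its own block reads, so its performance does not depend on whether the caller remembered to wrap the Reader in a BufferedReader. The component, its task, and its 1 KB buffer size are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical IO component: counts letters in a character stream.
// It never calls the single-character read(), so an unbuffered Reader
// passed in by the caller still results in block-sized reads.
class LetterCounter {
    private final char[] buf = new char[1024];

    int countLetters(Reader in) throws IOException {
        int letters = 0;
        int n;
        // read(char[], int, int) returns the number of chars read, or -1 at end.
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            for (int i = 0; i < n; i++) {
                if (Character.isLetter(buf[i])) {
                    letters++;
                }
            }
        }
        return letters;
    }
}
```

The caller can pass a plain FileReader or a BufferedReader; the component behaves the same either way, and nothing about buffering needs to be documented.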
 
This simple test shows that the most straightforward way to accomplish a simple task such as reading text can be 40 to 60 times slower than a carefully chosen one. In these tests, the program performs some computation while extracting and analyzing each character. If the program simply copied data from one stream to another, the difference between unbuffered single-character IO and block IO would be even more pronounced: block IO would be 300 to 500 times faster than unbuffered single-character IO.
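For the stream-copying case, a block IO copy loop can be sketched as follows. The class name and the 8 KB buffer size are arbitrary choices for illustration, not values taken from the tests in this article.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative block copy: moves data in chunks instead of byte by byte.
class StreamCopy {
    // Copies all remaining bytes from in to out and returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);   // write only the bytes actually read
            total += n;
        }
        return total;
    }
}
```

Because no per-byte work is done at all, the cost per byte is dominated by the read and write calls, and reducing their number by using blocks has the largest possible effect.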
 
Third, test again
 
Performance tuning must be repeated, because secondary performance problems often do not show up until the major ones have been resolved. In the text-processing application, the initial analysis showed that the program spent most of its time reading characters, and adding buffering improved performance dramatically. Only after this main bottleneck (single-character IO) had been removed did the remaining hotspots become apparent. A second analysis showed that the program spent a lot of time creating String objects, apparently more than one String object for each word in the input text.
 
The text-analysis application in this article has a modular design: the user can combine several text-processing operations to achieve the desired result. For example, a user can combine a word identifier component (which reads input characters and groups them into words), a lowercase converter component (which converts words to lowercase), and a stemmer component (which reduces words to their base form, for example converting jumper and jumped to jump).
 
Although the modular structure has obvious advantages, it can hurt performance. Because the interfaces between components are fixed (each component takes a String as input and produces another String as output), some work may be duplicated between components. If several components are frequently used together, it is worth optimizing for that combination.
 
In this text-processing system, actual usage shows that the word identifier component is almost always followed immediately by the lowercase converter component. The word identifier analyzes each character, looks for word boundaries, and fills a word buffer. Once it has identified a complete word, it creates a String object for it. The next component in the call chain, the lowercase converter, calls String.toLowerCase() on that String, creating another String object. Used in sequence, the two components therefore generate two String objects for every word in the input text. Because the two are used together so often, it makes sense to add an optimized lowercase word identifier that has the functionality of both original components but creates only one String object per word. Table II shows the test results:
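A combined component of this kind might look like the sketch below. It is an illustration under stated assumptions, not the application's actual code: the class name is hypothetical, a word is taken to be a run of letters, and lowercasing is done per character with Character.toLowerCase(), which can differ from String.toLowerCase() for a few locale-sensitive characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical combined word identifier and lowercase converter.
class LowerCaseTokenizer {
    // Splits the first 'length' chars of 'text' into lowercase words.
    static List<String> tokenize(char[] text, int length) {
        List<String> words = new ArrayList<>();
        char[] word = new char[64];   // reusable word buffer, grown on demand
        int len = 0;
        for (int i = 0; i < length; i++) {
            char c = text[i];
            if (Character.isLetter(c)) {
                if (len == word.length) {
                    word = Arrays.copyOf(word, len * 2);
                }
                // Lowercase while filling the buffer, before any String exists.
                word[len++] = Character.toLowerCase(c);
            } else if (len > 0) {
                words.add(new String(word, 0, len));   // the only String per word
                len = 0;
            }
        }
        if (len > 0) {
            words.add(new String(word, 0, len));       // flush the final word
        }
        return words;
    }
}
```

The separate components would create one String at the word boundary and a second one in toLowerCase(); here the conversion happens in the char buffer, so each word costs a single String allocation.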
 
 
 
  
   
Table II

| Test | Components | Sun 1.1.7 | IBM 1.1.8 | Sun 1.2.2 | Sun 1.3 | IBM 1.3 |
|---|---|---|---|---|---|---|
| A | Word identification | 23.0 | 3.6 | 10.7 | 2.6 | 2.9 |
| B | Word identification + lowercase conversion | 39.6 | 6.7 | 13.9 | 3.9 | 3.9 |
| C | Combined word identification and lowercase conversion | 29.0 | 3.8 | 12.9 | 3.1 | 3.1 |
|  | Temporary String creation time (B - C) | 10.6 | 2.9 | 1.0 | 0.8 | 0.8 |
 
  
 
Several useful observations can be drawn from Table II:
 
For the 1.1 Java VMs, this simple optimization improves performance dramatically, by about 25% to 45%. The last row shows that the time program C saves relative to program B is the time spent creating the temporary String objects. Also, as several of the other tests showed, the IBM 1.1 Java VM runs faster than the Sun 1.1 Java VM.

For the 1.2 and 1.3 Java VMs, the difference between the two versions of the program is no longer as large, about 10% to 25%, which corresponds to the share of time spent creating the temporary String objects. This shows that the newer Java VMs do create object instances more efficiently, but the performance impact of excessive object creation is still significant.

For this kind of workload, which creates large numbers of small objects, the 1.3 Java VMs are much faster than the 1.1 and 1.2 Java VMs.
 
Performance optimization is a task that has to be repeated. It pays to start collecting performance data early in the development process, because this allows hotspots to be identified and tuned early. Simple improvements, such as adding buffering to IO operations or substituting byte for char where appropriate, can often improve application performance dramatically. In addition, performance varies widely between Java VMs, so simply switching to a faster Java VM may bring the program a big step closer to its performance goal.
 