This appendix was contributed by Joe Sharp.
Java emphasizes correctness, but reliable behavior comes at the expense of performance. This trade-off shows up in automatic garbage collection, strict runtime checking, complete bytecode verification, and conservative runtime synchronization. Because Java must be available on a wide range of platforms, it runs on an interpreted virtual machine, which hampers performance further.
"Get it working first, then improve it step by step. Fortunately, there is usually not much that needs improving." (Steve McConnell, on performance [16])
The purpose of this appendix is to help you find and optimize "the part that needs improving".
D.1 Basic Method
Address performance problems only after the program is correct and fully tested:
(1) Measure the program's performance under realistic conditions. If it meets your requirements, you are done. If not, go to the next step.
(2) Find the most critical performance bottleneck. This may take some ingenuity, but the effort pays off. If you simply guess where the bottleneck is and try to optimize it, you will probably waste your time.
(3) Apply the speedup techniques described in this appendix, then return to step 1.
Finding the critical bottleneck is the key to keeping your effort worthwhile. Donald Knuth [9] improved a program that spent 50% of its time in about 4% of its code. In a single hour of work he changed a few lines and doubled the program's speed. Spending more time on the remaining code at that point would not have been worth the effort. Knuth has a famous saying in the programming world: "Premature optimization is the root of all evil." The sensible approach is to resist the urge to optimize early, because doing so can mean giving up useful programming techniques, which leaves code that is harder to understand and modify and that takes more effort to maintain.
D.2 Finding Bottlenecks
The following methods can be used to identify the bottleneck that most affects program performance:
D.2.1 Installing Your Own Test Code
Insert explicit timing code like the following to measure the program:
long start = System.currentTimeMillis();
// Put the calculation code to be timed here
long time = System.currentTimeMillis() - start;
Have an infrequently called method print the accumulated times to the console window with System.out.println(). Because the compiler ignores code guarded by a constant that is false, you can use a static final boolean to switch the timing on and off. The timing code can then be left safely in the released program, ready for emergency use at any time. Even when more sophisticated profiling is available, this remains the easiest way to measure the execution time of a specific task.
System.currentTimeMillis() returns the time in thousandths of a second (milliseconds). However, on some systems (such as a Windows PC) the clock resolution is coarser than 1 millisecond, so you need to repeat the operation n times and divide the total time by n to get an accurate estimate.
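The two ideas above (a static final boolean switch and repeating the operation n times) can be combined in a small helper. The following is only a sketch: the names TimingSketch, TIMING, and averageMillis are illustrative and do not come from the appendix, and it assumes a Java version that supports lambda expressions.

```java
// Sketch: explicit timing with a compile-time switch and repeated runs.
// TimingSketch, TIMING, and averageMillis are hypothetical names.
final class TimingSketch {
    // When this constant is false, the compiler can drop the guarded timing code.
    static final boolean TIMING = true;

    // Runs the task n times and returns the average elapsed time per run, in milliseconds.
    static double averageMillis(Runnable task, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            task.run();
        }
        long total = System.currentTimeMillis() - start;
        return (double) total / n;
    }

    public static void main(String[] args) {
        Runnable task = () -> {
            // Put the calculation code to be timed here.
            StringBuffer sb = new StringBuffer();
            for (int i = 0; i < 1000; i++) {
                sb.append(i);
            }
        };
        if (TIMING) {
            System.out.println("average ms per run: " + averageMillis(task, 1000));
        }
    }
}
```

Averaging over many runs compensates for a coarse clock, but the first runs may still be slower than later ones, so treat a single figure with caution.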
D.2.2 JDK Profiling [2]
The JDK provides a built-in profiler that tracks the time spent in each routine and writes the results to a file. Unfortunately, the JDK profiler is unreliable. It works in JDK 1.1.1 but has been unstable in later versions.
To run the profiler, add the -prof option when invoking the unoptimized version of the Java interpreter. For example:
java_g -prof myClass
Or, for an applet:
java_g -prof sun.applet.AppletViewer applet.html
The profiler's output is not easy to decipher. In fact, JDK 1.0 truncates method names to 30 characters, so some methods may be indistinguishable. However, if your platform does support the -prof option, Vladimir Bulatov's HyperProf [3] or Greg White's ProfileViewer can help you interpret the results.
D.2.3 Special Tools
The best way to keep up with the fast-moving field of performance optimization tools is to visit relevant Web sites regularly, for example Jonathan Hardwick's "Tools for Optimizing Java" site:
http://www.cs.cmu.edu/~jch/java/tools.html
D.2.4 Tips for Profiling
■ Because profiling uses the system clock, do not run any other processes or applications during the measurement, so that they cannot affect the results.
■ Always time the code both before and after a change to verify that the change actually improved performance, at least on the development platform.
■ Run each timing test under identical conditions whenever possible.
■ If possible, design a test that does not depend on user input, so that variations in user response do not distort the results.
D.3 Speedup Methods
By now the critical bottleneck should be isolated. You can then apply two kinds of optimization: generic techniques and techniques that depend on the Java language.
D.3.1 Generic Approaches
Generally, an effective way to speed up a program is to redefine it in a more practical way. For example, in Programming Pearls [14], Bentley describes how Doug McIlroy's representation of the English language, a novel way of describing the data, yielded a remarkably fast and compact spelling checker. In addition, choosing a better algorithm is likely to improve performance more than any other approach, especially as the data set grows. For more of these generic approaches, see the "General books" list at the end of this appendix.
D.3.2 Language-Dependent Approaches
To put things in perspective, it helps to know how long various operations take. So that the results are independent of the computer used, each time is divided by the time taken by a local assignment, which gives a normalized "standard time".
Operation                 Example                           Standard time
Local assignment          i = n;                            1.0
Instance assignment       this.i = n;                       1.2
int increment             i++;                              1.5
byte increment            b++;                              2.0
short increment           s++;                              2.0
float increment           f++;                              2.0
double increment          d++;                              2.0
Empty loop                while (true) n++;                 2.0
Ternary expression        (x < 0) ? -x : x                  2.2
Math call                 Math.abs(x);                      2.5
Array assignment          a[0] = n;                         2.7
long increment            l++;                              3.5
Method call               funct();                          5.9
Throw or catch exception  try { throw e; } or catch (e) {}  320
Synchronized method call  synchMethod();                    570
New object                new Object();                     980
New array                 new int[10];                      3100
Measured on my own system (a Pentium 200 Pro, Netscape 3, and JDK 1.1.5), these relative times show that creating new objects and arrays carries the heaviest cost, that synchronization is also expensive, and that an unsynchronized method call has only a modest cost. References [5] and [6] give the Web addresses of measurement applets that you can run on your own machine.
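The normalization used in the table is a plain division by the local-assignment time, and the same arithmetic lets you compare any two rows. The sketch below only reworks figures copied from the table. The class and method names are illustrative, not part of the appendix.

```java
// Sketch: the arithmetic behind the "standard time" column.
// NormalizedTimes and normalize are hypothetical names.
final class NormalizedTimes {
    // Divides a measured time by the local-assignment time from the same machine.
    static double normalize(double measured, double localAssignment) {
        if (localAssignment <= 0.0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return measured / localAssignment;
    }

    public static void main(String[] args) {
        // Standard times copied from the table above.
        double methodCall = 5.9;
        double synchronizedCall = 570.0;
        double newObject = 980.0;
        double newArray = 3100.0;

        // A synchronized call relative to a plain method call (roughly 97 to 1).
        System.out.println("synchronized vs. plain call: "
                + normalize(synchronizedCall, methodCall));
        // A small array relative to a plain object (a little over 3 to 1).
        System.out.println("new array vs. new object: "
                + normalize(newArray, newObject));
    }
}
```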
1. General Modifications
The following general suggestions can speed up the critical parts of a Java program. (Be sure to measure the results before and after each change.)
Change: an interface
To: an abstract class (when only one parent is needed)
Reason: Multiple inheritance of interfaces may prevent some performance optimizations.

Change: a non-local or array loop variable
To: a local loop variable
Reason: According to the timing table above, an instance integer assignment takes 1.2 times as long as a local integer assignment, and an array assignment takes 2.7 times as long.

Change: a linked list (fixed size)
To: saving discarded link items for reuse, or replacing the list with a circular array (when the approximate size is known)
Reason: Each new object costs as much as 980 local assignments. See "Reuse objects" in the next section, Van Wyk [12] p. 87, and Bentley [15] p. 81.

Change: x / 2 (or division by any power of 2)
To: x >> 1 (or the shift that matches the power of 2)
Reason: Uses faster hardware instructions.
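Two of the changes above can be sketched briefly: a local loop variable in place of an instance field, and a shift in place of division by a power of 2. The class and method names are illustrative only. Note one assumption that the table leaves implicit: a right shift rounds toward negative infinity, while integer division rounds toward zero, so the two agree only for non-negative values.

```java
// Sketch of two "change ... to ..." rows. LoopAndShift is a hypothetical name.
final class LoopAndShift {
    private int counter; // instance field used as the loop variable in the slower variant

    // Slower variant: every loop step reads and writes an instance field.
    int sumWithFieldCounter(int[] data) {
        int sum = 0;
        for (counter = 0; counter < data.length; counter++) {
            sum += data[counter];
        }
        return sum;
    }

    // Variant suggested by the table: the loop variable is local.
    static int sumWithLocalCounter(int[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Division by 4 written as a shift by 2 bits; matches x / 4 only when x >= 0.
    static int quarter(int x) {
        return x >> 2;
    }
}
```

Because of the rounding difference, replacing a division with a shift is safe only where the operand is known to be non-negative, such as an array index or a size.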
D.3.3 Special Cases
■ String overhead: The String concatenation operator + looks simple, but it involves a lot of work. The compiler can concatenate constant strings efficiently, but variable strings require considerable processing. For example, suppose s and t are String variables:
System.out.println("heading" + s + "trailer" + t);
This statement creates a new StringBuffer, appends the arguments, and converts the result back to a String with toString(), so it costs both memory and processor time. If you append more than one string, consider using a StringBuffer directly, especially when you can reuse it in a loop. Avoiding the creation of a new StringBuffer on each iteration saves the 980 units of object creation time seen earlier. Using substring() and the other String methods can improve performance further. Where feasible, character arrays are faster still. Note also that StringTokenizer is costly because of synchronization.
■ Synchronization: In the JDK interpreter, calling a synchronized method is typically 10 times slower than calling an unsynchronized one. With JIT compilers, the gap has grown to 50 to 100 times (the timing table above shows it to be 97 times slower). So avoid synchronized methods if you can. If you cannot, synchronizing on methods is faster than synchronizing on code blocks.
■ Reuse objects: Creating an object takes a long time (according to the timing table above, a new object costs 980 assignment times and a small new array costs 3100). It is therefore often better to save an old object and update its fields than to create a new one. For example, do not create a new Font object in your paint() method. Instead, declare it as an instance object, initialize it once, and then update it in paint() only when necessary. For more information, see Bentley [15], p. 81.
■ Exceptions: Throw exceptions only in abnormal situations. "Abnormal" usually means that the program has run into a problem it does not normally expect, so performance is no longer a priority. When optimizing, combine small try-catch blocks, because they divide the code into small independent fragments and so prevent the compiler from optimizing. On the other hand, removing exception handling too eagerly can make the code less robust.
■ Hashing: First, the standard Hashtable class in Java 1.0 and 1.1 requires casting and costly synchronization (570 assignment times). Second, the early JDK libraries do not determine the best table size automatically. Finally, a hash function should be designed for the characteristics of the keys actually used. For all these reasons, you can improve on the generic Hashtable by designing a hash class that fits the specific application. Note that the HashMap in the Java 1.2 collections library is more flexible and is not automatically synchronized.
■ Method inlining: The Java compiler can inline a method only if it is final, private, or static. In some cases the method must also have no local variables. If the code spends a lot of time calling a method that has none of these modifiers, consider writing a final version of it.
■ I/O: Use buffers whenever possible. Otherwise you may end up reading or writing only one byte at a time. Note that the JDK 1.0 I/O classes use a lot of synchronization, so you may get better performance by making a single "bulk" call such as readFully() and then interpreting the data yourself. Note also that the "Reader" and "Writer" classes of Java 1.1 were designed for better performance.
■ Casts and instanceof: A cast takes from 2 to 200 assignment times. The more expensive casts require traveling up the inheritance hierarchy. Other costly operations lose and then restore the capabilities of lower-level constructs.
■ Graphics: Use clipping to reduce the work done in repaint(), double buffering to improve perceived speed, and image compression to shorten download times. "Animation in Java Applets" from JavaWorld and "Performing Animation" from Sun are two good tutorials. Remember to use the most appropriate primitive. For example, drawing a polygon from a series of points is much faster than looping with drawLine(). If you must draw a line one pixel wide, drawLine(x, y, x, y) is faster than fillRect(x, y, 1, 1).
■ Use API classes: Use classes from the Java API whenever possible, because they can offer native machine performance that is hard to match in Java code. For example, System.arraycopy() is much faster than a loop for copying an array of any significant length.
■ Replace API classes: Sometimes an API class does more than you need, with a corresponding increase in execution time. In that case you can write a specialized version that does less but runs faster. For example, an application that needed a container to store a large number of arrays was sped up by replacing the original Vector with a faster dynamic array of objects.
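Two of the special cases above lend themselves to a short sketch: reusing one StringBuffer across loop iterations, and copying an array with System.arraycopy() instead of a hand-written loop. The class and method names below are illustrative and not part of the appendix.

```java
// Sketch of the "String overhead" and "Use API classes" suggestions.
// SpecialCasesSketch, formatAll, and copyOf are hypothetical names.
final class SpecialCasesSketch {
    // Builds "heading" + s[i] + "trailer" + t[i] for each index with a single buffer.
    static String[] formatAll(String[] s, String[] t) {
        String[] out = new String[s.length];
        StringBuffer buf = new StringBuffer(); // created once, outside the loop
        for (int i = 0; i < s.length; i++) {
            buf.setLength(0); // reset the buffer instead of allocating a new one
            buf.append("heading").append(s[i]).append("trailer").append(t[i]);
            out[i] = buf.toString();
        }
        return out;
    }

    // Copies an array with the library routine rather than an element-by-element loop.
    static int[] copyOf(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }
}
```

Reusing the buffer removes only the per-iteration buffer allocation. Each toString() call still produces a new String, so measure before and after to see whether the change matters in your loop.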
1. Other suggestions
■ Move repeated constant calculations out of critical loops, for example computing buffer.length for a fixed-length buffer.
■ static final constants help the compiler optimize the program.
■ Unroll fixed-length loops.
■ Use javac's optimization option, -O. It optimizes the compiled code by inlining static, final, and private methods. Note that the class files may get larger (JDK 1.1 and later only; earlier versions may not support bytecode verification). Newer just-in-time (JIT) compilers speed up the code further at run time.
■ Count down to zero whenever possible. This uses a special JVM bytecode.
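As a rough illustration of the first and last suggestions, the sketch below hoists buffer.length out of the loop test in one method and counts down to zero in another. The names are hypothetical, and the sketch shows only the shape of the code. Whether either form is measurably faster on a given JVM is something you would have to time yourself.

```java
// Sketch of two loop suggestions. LoopTweaks is a hypothetical name.
final class LoopTweaks {
    // Reads buffer.length once instead of on every loop test.
    static long sumHoisted(byte[] buffer) {
        final int n = buffer.length; // constant for a fixed-length buffer
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += buffer[i];
        }
        return sum;
    }

    // Counts down so that the loop test is a comparison against zero.
    static long sumCountingDown(byte[] buffer) {
        long sum = 0;
        for (int i = buffer.length - 1; i >= 0; i--) {
            sum += buffer[i];
        }
        return sum;
    }
}
```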