Design for Java performance (1)

Translation by supermmx

Read the entire "design for performance" series:
Part 1: Interface
Part 2: Reduce object creation
Part 3: Remote interfaces (March 23, 2001)

  Part 1: Interface

  Summary
Many common Java performance problems stem from class design decisions made early in the design process, long before most developers begin to think about performance. In this series, Brian Goetz discusses common Java performance hazards and how to avoid them at design time.

Many programmers think about performance management only late in the development cycle. They often put off performance optimization until the end, hoping to avoid it entirely, and sometimes this strategy succeeds. However, early design decisions can affect both the need for performance optimization and its chances of success. If performance is an important requirement of your program, performance management should be integrated into the development cycle from the first day.

This series explores some early design decisions that can greatly affect application performance. In this article, I focus on one of the most common performance problems: the creation of temporary objects. A class's object-creation behavior is often determined at design time, though not intentionally, and this sows the seeds of later performance problems.

Performance problems come in many varieties. The easiest to fix are those where you simply chose a poor algorithm for a computation, such as using bubble sort on a large data set, or recomputing a frequently used data item every time instead of caching it. You can use profiling to identify these bottlenecks, and once you find them, you can correct them easily. However, many Java performance problems come from a deeper source that is harder to fix: the interface design of a program's components.

Today, most programs are built from components that are developed internally or acquired externally. Even when a program does not rely heavily on existing components, the object-oriented design process encourages packaging applications into components, which simplifies design, development, and testing. These advantages are undeniable, but you should realize that the interfaces these components implement can greatly affect the behavior and performance of the programs that use them.

At this point, you may ask what interfaces have to do with performance. A class's interface defines not only what the class can do, but also its object-creation behavior and the sequence of method calls needed to use it. How a class defines its constructors and methods determines whether an object can be reused, whether its methods create intermediate objects (or require their clients to create them), and how many methods a client must call to use the class. All of these factors affect program performance.

  Note the creation of Objects

A basic principle of Java performance management is to avoid excessive object creation. This does not mean that you should create no objects and give up the benefits of object orientation. However, you should watch for object creation inside tight loops in performance-critical code. Object creation is costly enough that you should avoid creating unnecessary temporary or intermediate objects where performance matters.

The String class is a major source of object creation in programs that process text. Because String is immutable, every time a String is modified or constructed, a new object must be created. As a result, performance-conscious programs should avoid heavy use of Strings. However, this is often impossible. Even if you completely remove the dependency on String from your own code, you often find yourself using components whose interfaces are defined in terms of String, so you end up using String anyway.
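The effect of String immutability can be seen in a small sketch. The class and method names below are hypothetical and chosen only for illustration, and StringBuilder was added in Java 5, after this article was written, so treat this as a modern illustration of the point rather than the author's code.

```java
// Illustrative sketch only; the class and method names are hypothetical.
final class StringCreationDemo {
    /** Builds "0,1,2,..." by concatenation: each += produces a new String object. */
    static String joinWithStrings(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                s += ",";
            }
            s += i;
        }
        return s;
    }

    /** Builds the same text in one growable buffer and creates a single String at the end. */
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "Date";
        String modified = original.concat(": Mon"); // a new object; original is untouched
        System.out.println(original);
        System.out.println(modified);
        System.out.println(joinWithStrings(5).equals(joinWithBuilder(5)));
    }
}
```

Both methods return the same text; the difference is only in how many intermediate objects are created along the way.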

Example: Regular Expression matching

As an example, suppose you are writing a mail server called Mailbot. Mailbot needs to process the MIME headers at the top of each message, such as the sending date or the sender's email address. It uses a regular expression matching component to simplify processing the MIME headers. Mailbot is smart enough not to create a String object for each header line or header element. Instead, it fills a character buffer with the input text and locates the headers to be processed by indexing into the buffer. Mailbot calls the regular expression matcher for every header line, so the matcher's performance is very important. Let's start with a poor interface for a regular expression matcher class:

// Interface sketch: method bodies omitted
public class AwfulRegExpMatcher {
    /** Create a matcher with the given regular expression and which will
        operate on the given input string */
    public AwfulRegExpMatcher(String regExp, String inputText);
    /** Retrieve the next match of the pattern against the input text,
        returning the matched text if possible or null if not */
    public String getNextMatch();
}

Even if this class implements an efficient regular expression matching algorithm, any program that uses it heavily is likely to be unbearably slow. Since the matcher object is tied to the input text, you must create a new matcher object every time you want to use it. Since your goal is to reduce unnecessary object creation, making the matcher reusable is an obvious place to start.

The class definition below shows another possible interface for the matcher. It lets you reuse the matcher, but it is still quite bad.

public class BadRegExpMatcher {
    public BadRegExpMatcher(String regExp);
    /** Attempts to match the specified regular expression against the input
        text, returning the matched text if possible or null if not */
    public String match(String inputText);
    /** Get the next match against the input text, or return null if no match */
    public String getNextMatch();
}

Ignore the finer points of regular expression matching, such as returning matched subexpressions. What is wrong with this seemingly harmless class definition? Functionally, nothing. From a performance perspective, quite a lot. First, the matcher requires its caller to create a String to represent the text to be matched. Mailbot tries to avoid creating String objects, but when it finds a header that it wants to parse with a regular expression, it has to create a String to satisfy BadRegExpMatcher:

BadRegExpMatcher dateMatcher = new BadRegExpMatcher(...);
while (...) {
    ...
    String headerLine = new String(myBuffer, thisHeaderStart,
                                   thisHeaderEnd - thisHeaderStart);
    String result = dateMatcher.match(headerLine);
    if (result == null) { ... }
}

Second, the matcher creates a String for the match result even when Mailbot only cares whether the text matched and has no use for the matched text. This means that simply using BadRegExpMatcher to confirm that a date header matches a specific format requires creating two String objects: the matcher's input and the match result. Two objects may not sound like many, but if you create two objects for every header line of every message that Mailbot processes, the cost can significantly hurt performance. The fault lies not in the design of Mailbot, but in the design, or the use, of the BadRegExpMatcher class.

Note that returning a lightweight Match object (which could provide getOffset(), getLength(), and getMatchString() methods) instead of a String would not improve performance much. Although creating a Match object is probably cheaper than creating a String, which involves allocating a char[] array and copying data, you are still creating an intermediate object that is of no value to your caller.

It would be bad enough that BadRegExpMatcher forces you to supply input in the format it wants to see, rather than in the more efficient form you already have. However, using BadRegExpMatcher carries another hazard, one that potentially poses a greater risk to Mailbot's performance. When you started writing the header-processing code, you were inclined to avoid Strings. But since you are forced to create many String objects to satisfy BadRegExpMatcher, you may be tempted to abandon that goal and use Strings more freely. Now the poor design of one component has infected the program that uses it.
Even if you later find a better regular expression component that does not require you to supply a String, by then your entire program will have been affected.

  A better interface

How could you define BadRegExpMatcher so that it does not cause these problems? First, the matcher should not dictate the format of its input. It should accept input in whatever form its callers can provide efficiently. Second, it should not automatically generate a String for the match result; it should return enough information for the caller to generate one if needed. (For convenience, it can provide a method that does this, but that is not required.) Here is a better interface:

class BetterRegExpMatcher {
    public BetterRegExpMatcher(...);
    /** Provide matchers for multiple formats of input -- String,
        character array, and subset of character array. Return -1 if no
        match was made; return offset of match start if a match was
        made. */
    public int match(String inputText);
    public int match(char[] inputText);
    public int match(char[] inputText, int offset, int length);
    /** Get the next match against the input text, if any */
    public int getNextMatch();
    /** If a match was made, returns the length of the match; between
        the offset and the length, the caller should be able to
        reconstruct the match text from the offset and length */
    public int getMatchLength();
    /** Convenience routine to get the match string, in the event the
        caller happens to want a String */
    public String getMatchText();
}

The new interface removes the need for the caller to convert its input into the format the matcher expects. Mailbot can now call match() as follows:

int resultOffset = dateMatcher.match(myBuffer, thisHeaderStart,
                                     thisHeaderEnd - thisHeaderStart);
if (resultOffset < 0) { ... }

This achieves the goal without creating any new objects. As a bonus, this interface design style fits well with Java's "lots of simple methods" design philosophy.
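To make the shape of such an interface concrete, here is a minimal sketch of a matcher with the same method set. It is not the article's implementation: it matches a literal substring rather than a regular expression, and the class name SubstringMatcher and its internals are assumptions made for illustration. Returned offsets are absolute indexes into the caller's array.

```java
// A minimal sketch, assuming literal-substring matching instead of real
// regular expressions. All names here are hypothetical.
final class SubstringMatcher {
    private final char[] pattern;
    private char[] input;        // the caller's buffer, referenced but never copied
    private int end;             // exclusive end of the region being searched
    private int matchStart = -1; // absolute offset of the current match, or -1

    SubstringMatcher(String literal) {
        this.pattern = literal.toCharArray(); // one-time copy at construction
    }

    /** Searches inputText[offset, offset + length); returns the match offset or -1. */
    int match(char[] inputText, int offset, int length) {
        this.input = inputText;
        this.end = offset + length;
        this.matchStart = find(offset);
        return matchStart;
    }

    int match(char[] inputText) {
        return match(inputText, 0, inputText.length);
    }

    /** Convenience overload; this one does copy the String's characters. */
    int match(String inputText) {
        return match(inputText.toCharArray());
    }

    /** Continues the search one position past the start of the previous match. */
    int getNextMatch() {
        if (matchStart < 0) {
            return -1;
        }
        matchStart = find(matchStart + 1);
        return matchStart;
    }

    int getMatchLength() {
        return matchStart < 0 ? 0 : pattern.length;
    }

    /** Creates a String only when the caller explicitly asks for one. */
    String getMatchText() {
        return matchStart < 0 ? null : new String(input, matchStart, pattern.length);
    }

    private int find(int from) {
        for (int i = from; i + pattern.length <= end; i++) {
            int j = 0;
            while (j < pattern.length && input[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        char[] myBuffer = "From: a@b.c\nDate: Mon\n".toCharArray();
        SubstringMatcher dateMatcher = new SubstringMatcher("Date:");
        // Search only the second header line, which occupies indexes 12..20.
        int resultOffset = dateMatcher.match(myBuffer, 12, 9);
        System.out.println(resultOffset + " " + dateMatcher.getMatchLength());
    }
}
```

The match(char[], int, int) path allocates nothing per call; only the String overload and getMatchText() create objects, and only when the caller chooses to use them.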

The exact performance impact of the extra object creation depends on how much work match() does. You can determine an upper bound on the performance difference by creating and timing a pair of dummy matcher classes that do no real matching work. Under the Sun JDK 1.3, the code fragment above runs about 50 times faster with the BetterRegExpMatcher class than with the BadRegExpMatcher class. With a simple substring matching implementation, BetterRegExpMatcher is five times faster than its BadRegExpMatcher counterpart.
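A rough way to try such a comparison yourself is sketched below. The class name, method names, and workload are hypothetical, and this is a naive timing loop rather than a rigorous benchmark: on a modern JVM the JIT compiler and garbage collector behave very differently from JDK 1.3, so the ratio you observe may be much smaller than the figures quoted above, or even reversed.

```java
// Rough timing sketch; names and workload are hypothetical, and the
// numbers it prints depend heavily on the JVM and hardware.
final class MatcherTimingSketch {
    /** String-based style: copies the region into a new String before searching. */
    static int matchViaString(char[] buf, int offset, int length, String literal) {
        String headerLine = new String(buf, offset, length);
        int idx = headerLine.indexOf(literal);
        return idx < 0 ? -1 : offset + idx;
    }

    /** Offset-based style: searches the buffer region in place, creating no objects. */
    static int matchInPlace(char[] buf, int offset, int length, char[] literal) {
        int end = offset + length;
        for (int i = offset; i + literal.length <= end; i++) {
            int j = 0;
            while (j < literal.length && buf[i + j] == literal[j]) {
                j++;
            }
            if (j == literal.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        char[] buf = "Subject: hello\nDate: Mon, 1 Jan 2001\n".toCharArray();
        char[] literal = "Date:".toCharArray();
        int iterations = 1_000_000;
        long sink = 0; // accumulate results so the calls are not trivially discarded

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += matchViaString(buf, 0, buf.length, "Date:");
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += matchInPlace(buf, 0, buf.length, literal);
        }
        long t2 = System.nanoTime();

        System.out.println("via String: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("in place:   " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("checksum:   " + sink);
    }
}
```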

  Exchange type

BadRegExpMatcher forces Mailbot to convert its input text from a character array to a String, which creates unnecessary objects. Ironically, many implementations of BadRegExpMatcher would immediately convert the String back into a character array to make the input text easier to access. This not only causes yet another allocation, but also means that you have done all that work only to end up in the same form you started with. Neither Mailbot nor BadRegExpMatcher wanted to deal with Strings; String merely looked like an obvious format for passing text between components.
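The round trip described here can be shown in a few lines. This is a sketch with hypothetical names, standing in for the caller side (Mailbot) and the callee side (a String-based matcher that works on a char[] internally).

```java
// Illustrative sketch of the char[] -> String -> char[] round trip.
// The class and method names are hypothetical.
final class ExchangeTypeDemo {
    /** Caller side: copies a region of the buffer into a new String to satisfy a String-based interface. */
    static String toExchangeType(char[] buffer, int start, int end) {
        return new String(buffer, start, end - start);
    }

    /** Callee side: copies the String's characters back into a fresh char[]. */
    static char[] fromExchangeType(String inputText) {
        return inputText.toCharArray();
    }

    public static void main(String[] args) {
        char[] buffer = "Date: Mon".toCharArray();
        String asString = toExchangeType(buffer, 0, buffer.length);
        char[] again = fromExchangeType(asString);
        System.out.println(java.util.Arrays.equals(buffer, again)); // true: same characters
        System.out.println(buffer == again);                        // false: a different array
    }
}
```

Two copies and two allocations later, the callee holds exactly the data the caller started with.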

In the BadRegExpMatcher example above, the String class is used as an exchange type. An exchange type is a type that neither the caller nor the callee really wants to use as its data format, but that both can easily convert to and from. Defining an interface in terms of an exchange type reduces the complexity of the interface while preserving flexibility, but sometimes that simplicity comes at a high performance cost.

The most typical example of an exchange type is the JDBC ResultSet interface. It is unlikely that any native database provides its data sets in the form of the ResultSet interface, but by implementing ResultSet, a JDBC driver can easily wrap the native data that the database provides. Similarly, a client program is unlikely to represent data records this way, but you can convert a ResultSet into the desired data representation with little difficulty. In the JDBC case, you accept the cost of this extra layer because it brings the benefits of standardization and portability across database implementations. Nevertheless, be aware of the performance cost that exchange types carry.

It is also worth noting that the performance impact of exchange types is not easy to measure. If you profile the code fragment that calls BadRegExpMatcher, the creation of the input String is charged to Mailbot at run time, even though the String is created only to satisfy BadRegExpMatcher. If you want to assess the true impact of a component on a program's performance, you should measure not only the resource usage of the component's own code, but also that of the code that sets up for it and recovers from it. This is difficult to do with standard profiling tools.

  Conclusion

Not every program is performance-critical, and not every program has performance problems. But for those that are, the issues raised in this article are important, because they cannot be fixed at the last minute. Since it is very difficult to change a class's interface after you have written code that uses it, it pays to spend a little extra time considering performance characteristics at design time.

In Part 2, I will demonstrate some techniques for reducing unnecessary object creation through the use of mutable and immutable objects.

About the author
Brian Goetz is a professional software developer with over 15 years of experience. He is a principal consultant at Quiotix, a software development and consulting firm located in Los Altos, Calif.
