Java performance optimization [2]: String filtering practices

Source: Internet
Author: User

The previous post introduced the performance difference between primitive types and reference types (mainly caused by how their memory is allocated). Today we will work through a concrete example to show how that kind of optimization plays out in practice.

★Requirements
First, the requirement. Given a String object, filter out every character except the digits ('0'-'9'), with as little time overhead as possible. The filter function has the prototype: String filter(String str);
For this requirement I wrote five different filter functions, filter1 through filter5. filter1 performs worst and filter5 performs best. Before reading on, think about what function you would write yourself. Better still, write it down so that you can compare it with mine later.

★Code implementation

◇ Test code
To make performance testing easier, I prepared the following test code:

    class Test {
        public static void main(String[] args) {
            if (args.length != 1) {
                return;
            }
            String str = "";
            long nBegin = System.currentTimeMillis();
            for (int i = 0; i < 1024 * 1024; i++) {
                str = filterN(args[0]); // call a specific filter function here
            }
            long nEnd = System.currentTimeMillis();
            System.out.println(nEnd - nBegin); // elapsed time in milliseconds
            System.out.println(str);
        }
    }
Don't peek at what follows until you have thought about your own implementation! Also note that the Java environment I used is JDK 1.5.0-09, and the test string is "d186783e36b721651e8af96ab1c4000b". Because JDK versions and machine performance differ, the results on your own machine may not match the figures given below.

◇ Version 1
First, let's reveal filter1, the version with the worst performance. The code is as follows:

    private static String filter1(String strOld) {
        String strNew = new String();
        for (int i = 0; i < strOld.length(); i++) {
            if ('0' <= strOld.charAt(i) && strOld.charAt(i) <= '9') {
                strNew += strOld.charAt(i); // builds a new String on every append
            }
        }
        return strNew;
    }
If your code unfortunately looks like filter1, your Java fundamentals need work: you have not yet learned to use StringBuffer to optimize string concatenation.
As a baseline for the later versions, note the processing time of filter1: about 8.81-8.90 seconds.
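Why is the += version so slow? As a rough sketch, assuming the behaviour of older javac releases (JDK 5 through 8; newer compilers use a different mechanism), each `strNew += ch` behaves approximately like the hand-expanded form below: a temporary builder is created, the whole existing string is copied into it, and a brand-new String is produced on every iteration. The class and method names are illustrative and not part of the original code.

```java
class ConcatExpansionSketch {
    // Same loop shape as filter1, using the += operator.
    static String withPlusEquals(String strOld) {
        String strNew = "";
        for (int i = 0; i < strOld.length(); i++) {
            char ch = strOld.charAt(i);
            if ('0' <= ch && ch <= '9') {
                strNew += ch;
            }
        }
        return strNew;
    }

    // Approximate expansion of += (an assumption about older compilers):
    // one temporary builder and one new String per appended character.
    static String handExpanded(String strOld) {
        String strNew = "";
        for (int i = 0; i < strOld.length(); i++) {
            char ch = strOld.charAt(i);
            if ('0' <= ch && ch <= '9') {
                strNew = new StringBuilder().append(strNew).append(ch).toString();
            }
        }
        return strNew;
    }

    public static void main(String[] args) {
        System.out.println(withPlusEquals("a1b2c3")); // prints 123
        System.out.println(handExpanded("a1b2c3"));   // prints 123
    }
}
```

Both methods return the same result; the second only makes the hidden per-iteration allocations visible.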

◇ Version 2
Now let's look at filter2. The code is as follows:

    private static String filter2(String strOld) {
        StringBuffer strNew = new StringBuffer();
        for (int i = 0; i < strOld.length(); i++) {
            if ('0' <= strOld.charAt(i) && strOld.charAt(i) <= '9') {
                strNew.append(strOld.charAt(i));
            }
        }
        return strNew.toString();
    }
In fact, my comment on filter1 already gave away the secret of filter2: it uses StringBuffer to optimize string concatenation. Why StringBuffer performs better than String here I will not explain in detail; if you are unsure, look it up on Google. I expect quite a few readers will have written something similar to filter2.
In addition, JDK 1.5 added StringBuilder, which performs better than StringBuffer. However, since I may want to run comparison tests on other JDK versions, and the difference between StringBuilder and StringBuffer is not the focus of this article, the following examples all use StringBuffer.
The processing time of filter2 is about 2.14-2.18 seconds, roughly 4 times faster than filter1.

◇ Version 3
Next, let's look at filter3. The code is as follows:

    private static String filter3(String strOld) {
        StringBuffer strNew = new StringBuffer();
        int nLen = strOld.length();
        for (int i = 0; i < nLen; i++) {
            char ch = strOld.charAt(i);
            if ('0' <= ch && ch <= '9') {
                strNew.append(ch);
            }
        }
        return strNew.toString();
    }
At first glance, filter3 looks much the same as filter2, so look more closely. The result of charAt(i) is first assigned to a char variable, which saves the overhead of calling charAt() repeatedly. Likewise, length() is first saved in nLen, which saves the overhead of calling length() on every iteration. Readers who thought of this step are the careful ones.
After this optimization the processing time drops to 1.48-1.52 seconds, an improvement of about 30%. Because the internal implementations of charAt() and length() are quite simple, the gain is not dramatic.
In addition, according to reader feedback, filter3 and filter2 perform basically the same on JDK 1.6. This may be because JDK 1.6 already performs a similar optimization.

◇ Version 4
Then let's look at filter4. The code is as follows:

    private static String filter4(String strOld) {
        int nLen = strOld.length();
        StringBuffer strNew = new StringBuffer(nLen); // preset the capacity
        for (int i = 0; i < nLen; i++) {
            char ch = strOld.charAt(i);
            if ('0' <= ch && ch <= '9') {
                strNew.append(ch);
            }
        }
        return strNew.toString();
    }
The difference between filter4 and filter3 is very small. The only change is that filter4 calls the StringBuffer constructor that takes a parameter. This constructor sets the initial capacity, which avoids reallocating memory while append() adds characters and therefore improves performance.
The processing time of filter4 is about 1.33-1.39 seconds, an improvement of about 10%. It is a pity that the gain is so small.
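The effect of the capacity argument can be observed directly through StringBuffer.capacity(). The following is a minimal sketch; the class and helper names are illustrative. The default capacity of 16 is documented for the no-argument constructor, while the exact size a buffer grows to is an implementation detail, so the code only reports it.

```java
class CapacityDemo {
    // Appends `count` digits and reports the buffer's capacity afterwards.
    static int capacityAfterAppending(StringBuffer sb, int count) {
        for (int i = 0; i < count; i++) {
            sb.append('0');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        // Default constructor: documented initial capacity of 16 characters.
        System.out.println(new StringBuffer().capacity());   // prints 16
        // Presized constructor: capacity equals the argument.
        System.out.println(new StringBuffer(32).capacity()); // prints 32

        // Appending 32 characters forces the default buffer to grow at least once,
        // whereas the presized buffer never has to reallocate.
        System.out.println(capacityAfterAppending(new StringBuffer(), 32));
        System.out.println(capacityAfterAppending(new StringBuffer(32), 32)); // prints 32
    }
}
```

With the 32-character test string used in this article, the default buffer has to grow at least once if more than 16 digits are kept, which is the reallocation filter4 avoids.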

◇ Version 5
Finally, let's look at the ultimate version, filter5, which has the best performance.

    private static String filter5(String strOld) {
        int nLen = strOld.length();
        char[] chArray = new char[nLen];
        int nPos = 0; // number of digits kept so far
        for (int i = 0; i < nLen; i++) {
            char ch = strOld.charAt(i);
            if ('0' <= ch && ch <= '9') {
                chArray[nPos] = ch;
                nPos++;
            }
        }
        return new String(chArray, 0, nPos);
    }
You may think filter5 differs a great deal from the previous versions! filter5 uses neither String nor StringBuffer for intermediate processing; it uses a character array instead.
The processing time of filter5 is only 0.72-0.78 seconds, nearly 50% better than filter4. Why? Is it because working on the character array directly saves the append(char) calls? If you read the source code of append(char), its internal implementation is very simple, so that alone should not account for such a large gain.
So what is the reason?
First, although filter5 pays for creating a character array, the StringBuffer constructor also creates a character array internally, so the two costs cancel out. What filter5 saves compared with filter4 is therefore the creation of the StringBuffer object itself. (This factor is noticeable in my JDK 1.5 environment.)
Second, because StringBuffer is thread-safe (its methods are synchronized), calling its methods carries some synchronization overhead, whereas a character array has none. This is a second source of improvement. (According to reader feedback, this factor is noticeable on JDK 1.6.)
Because of these two factors, filter5 is significantly faster than filter4.
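The second factor can be probed with a small experiment. The sketch below is not one of the five versions in this article: it is filter4 with StringBuffer swapped for the unsynchronized StringBuilder (available since JDK 1.5), under the hypothetical name filter4Unsynchronized. Timing it in the same test code against filter4 and filter5 would give a rough idea of how much of the gap comes from synchronization on a given JDK; no timings are claimed here.

```java
class UnsynchronizedFilterSketch {
    // filter4 with StringBuilder instead of StringBuffer (hypothetical variant).
    // The object is still created, but append() is no longer synchronized.
    static String filter4Unsynchronized(String strOld) {
        int nLen = strOld.length();
        StringBuilder strNew = new StringBuilder(nLen);
        for (int i = 0; i < nLen; i++) {
            char ch = strOld.charAt(i);
            if ('0' <= ch && ch <= '9') {
                strNew.append(ch);
            }
        }
        return strNew.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter4Unsynchronized("a1b2c3")); // prints 123
    }
}
```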

★Summary of five versions
Across these five versions, filter1 and filter5 differ in performance by about 12 times (more than an order of magnitude). Apart from filter3, which improves on filter2 by eliminating repeated function calls, every version reduces time overhead by saving memory allocations. The impact of memory allocation on performance is plain to see! If reading the previous post led you to write filter4 or filter5, you have grasped the point, and I did not write it in vain.

★A note on the trade-off between time and space
One more point needs explaining. Versions 4 and 5 improve performance by trading space for time. If the strings to be filtered are large and the proportion of numeric characters is very low, this approach is not a good fit.
For example, suppose most of the strings being processed contain fewer than 10% numeric characters and only a few contain more. What should you do then? For filter4, you can change new StringBuffer(nLen); to new StringBuffer(nLen / 10); to save space. filter5 cannot do this.
Therefore, whether to use version 4 or version 5 depends on the situation. filter5 pays off only when you care a great deal about time overhead and the proportion of numeric characters is high (at least 50%). Otherwise, filter4 is recommended.
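The smaller initial capacity suggested for filter4 might look like the following sketch. The name filter4Sparse is hypothetical, and the divisor 10 simply mirrors the "fewer than 10% numeric characters" scenario described above. StringBuffer still grows on demand for the occasional digit-heavy string, so the result does not change; only the initial allocation shrinks.

```java
class SparseDigitFilter {
    // filter4 with a reduced initial capacity (hypothetical variant).
    // A capacity of 0 is legal; the buffer grows when the first digit is appended.
    static String filter4Sparse(String strOld) {
        int nLen = strOld.length();
        StringBuffer strNew = new StringBuffer(nLen / 10);
        for (int i = 0; i < nLen; i++) {
            char ch = strOld.charAt(i);
            if ('0' <= ch && ch <= '9') {
                strNew.append(ch);
            }
        }
        return strNew.toString();
    }

    public static void main(String[] args) {
        // Mostly non-digit input: the small initial buffer is enough.
        System.out.println(filter4Sparse("performance optimization 2"));       // prints 2
        // Digit-heavy input (the article's test string): the buffer grows as needed.
        System.out.println(filter4Sparse("d186783e36b721651e8af96ab1c4000b")); // prints 1867833672165189614000
    }
}
```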

 

This article is from the CSDN blog. Please cite the source when reposting: http://blog.csdn.net/program_think/archive/2009/03/19/4002955.aspx
