Performance optimization of Java strings

Source: Internet
Author: User


Converting primitive types to strings

Programs often need to convert values of other types, including primitive types, into strings. When concatenating, if two or more primitive values come first in the expression, you must explicitly convert the first one to a string (otherwise, for example, System.out.println(1 + 'a') prints 98 instead of "1a"). The String.valueOf methods (or the wrapper classes of the primitive types) do exactly this, but who wants to write them out if there is a shorter way?
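The pitfall described above can be reproduced with a small sketch. The class and method names are made up for illustration; only String.valueOf and Integer.toString come from the text.

```java
final class PrimitiveToStringDemo {
    // int + char is numeric addition: 'a' is promoted to its code point, 97.
    static String numericSum() {
        return String.valueOf(1 + 'a');
    }

    // Converting the first operand explicitly turns '+' into string concatenation.
    static String viaValueOf() {
        return String.valueOf(1) + 'a';
    }

    // The wrapper class offers the same conversion.
    static String viaWrapper() {
        return Integer.toString(1) + 'a';
    }

    public static void main(String[] args) {
        System.out.println(1 + 'a');       // prints 98
        System.out.println(viaValueOf());  // prints 1a
        System.out.println(viaWrapper());  // prints 1a
    }
}
```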

Prepending an empty string to the primitive value ("" + 1) is the simplest approach. The result of this expression is a String, after which you can concatenate freely: the compiler automatically converts all following primitive values to strings.

Unfortunately, this is the worst possible implementation. To see why, we first need to look at how Java handles string concatenation. When a string (a literal, a variable, or the result of a method call) is followed by a + sign and an expression of any type:

    string_exp + any_exp

The Java compiler will turn it into:

    new StringBuilder().append( string_exp ).append( any_exp ).toString()

If the expression contains several + operators, there will be correspondingly more StringBuilder.append calls, followed by a final toString call.

The no-argument StringBuilder() constructor allocates a 16-character buffer (StringBuilder(String) allocates the length of its argument plus 16). So StringBuilder does not need to reallocate as long as no more than 16 characters are appended, but it expands its buffer once that limit is exceeded. Finally, when toString is called, the contents of the StringBuilder's internal buffer are copied and returned as a new String object.
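The buffer behaviour can be observed through StringBuilder.capacity(). The following is a minimal sketch with a made-up class name. It checks only that the capacity grows past 16, because the exact growth step is an implementation detail.

```java
final class BuilderCapacityDemo {
    // Capacity of a builder created with the no-argument constructor.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Capacity after appending the given number of characters to a default builder.
    static int capacityAfterAppending(int chars) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chars; i++) {
            sb.append('x');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());          // 16
        System.out.println(capacityAfterAppending(16)); // still 16: the buffer is exactly full
        System.out.println(capacityAfterAppending(17)); // larger than 16: the buffer was expanded
    }
}
```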

This means that, in the worst case, converting a primitive value to a string this way creates a StringBuilder object, a char[16] array, a String object, and a char[] array large enough to hold the input value. With String.valueOf, at least the StringBuilder object is avoided.
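To make the comparison concrete, the sketch below writes out by hand the expansion that the text describes for "" + value, next to the direct conversion. The class and method names are illustrative, and the bytecode a particular compiler version actually emits may differ from this hand-written form.

```java
final class EmptyStringConversionDemo {
    // Hand-written form of the expansion described in the text for "" + value.
    static String expandedConcat(int value) {
        return new StringBuilder().append("").append(value).toString();
    }

    // The shorthand under discussion.
    static String shorthand(int value) {
        return "" + value;
    }

    // Direct conversion: no StringBuilder appears in the source.
    static String direct(int value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        // All three produce the same text; they differ only in the work done to get there.
        System.out.println(expandedConcat(42));
        System.out.println(shorthand(42));
        System.out.println(direct(42));
    }
}
```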

Sometimes you do not need to convert the primitive value at all. For example, suppose you are parsing a string delimited by single quotes. At first you might have written:

    final int nextComma = str.indexOf( "'" );

Or something like this:

    final int nextComma = str.indexOf( '\'' );

After the program is finished, the requirements change: any delimiter must be supported. Naturally, your first reaction is to store the delimiter in a String and use String.indexOf to find it. Assume the preconfigured delimiter is held in a field named m_separator (translator's aside: judging by that name, the author probably did not start out as a Java developer). The parsing code then looks like this:

    private static List<String> split( final String str )
    {
        final List<String> res = new ArrayList<String>( 10 );
        int pos, prev = 0;
        while ( ( pos = str.indexOf( m_separator, prev ) ) != -1 )
        {
            res.add( str.substring( prev, pos ) );
            prev = pos + m_separator.length(); // start from next char after separator
        }
        res.add( str.substring( prev ) );
        return res;
    }

But later it turns out that the delimiter is always a single character. At initialization you change String m_separator to char m_separatorChar and update the setter to match. You would rather not change the parsing method much (the code works, so why bother?):

    private static List<String> split2( final String str )
    {
        final List<String> res = new ArrayList<String>( 10 );
        int pos, prev = 0;
        while ( ( pos = str.indexOf( "" + m_separatorChar, prev ) ) != -1 )
        {
            res.add( str.substring( prev, pos ) );
            prev = pos + 1; // start from next char after separator
        }
        res.add( str.substring( prev ) );
        return res;
    }

As you can see, the indexOf call was changed, but it now creates a new string on every call and passes that in. This is the wrong approach, because there is also an indexOf overload that takes a char instead of a String. Rewriting with it gives:

    private static List<String> split3( final String str )
    {
        final List<String> res = new ArrayList<String>( 10 );
        int pos, prev = 0;
        while ( ( pos = str.indexOf( m_separatorChar, prev ) ) != -1 )
        {
            res.add( str.substring( prev, pos ) );
            prev = pos + 1; // start from next char after separator
        }
        res.add( str.substring( prev ) );
        return res;
    }
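For readers who want to run the methods, here is a self-contained sketch of the String-based and char-based variants side by side. The field names follow the article, but their values (a comma) and the class name are assumptions, since the article does not show how the fields are initialized.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitDemo {
    // Assumed values: the article does not show the initialisation of these fields.
    private static final String m_separator = ",";
    private static final char m_separatorChar = ',';

    // String-based search, as in split().
    static List<String> split(final String str) {
        final List<String> res = new ArrayList<String>(10);
        int pos, prev = 0;
        while ((pos = str.indexOf(m_separator, prev)) != -1) {
            res.add(str.substring(prev, pos));
            prev = pos + m_separator.length(); // start from next char after separator
        }
        res.add(str.substring(prev));
        return res;
    }

    // char-based search, as in split3(): no temporary String is built per call.
    static List<String> split3(final String str) {
        final List<String> res = new ArrayList<String>(10);
        int pos, prev = 0;
        while ((pos = str.indexOf(m_separatorChar, prev)) != -1) {
            res.add(str.substring(prev, pos));
            prev = pos + 1; // start from next char after separator
        }
        res.add(str.substring(prev));
        return res;
    }

    public static void main(String[] args) {
        final String input = "abc,def,ghi";
        System.out.println(split(input));  // prints [abc, def, ghi]
        System.out.println(split3(input)); // prints [abc, def, ghi]
    }
}
```

Both variants return the same list, so the refactoring changes only how the separator is searched for, not the result.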

We tested the three implementations above by parsing the string "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz" 10 million times. The running times on Java 6_41 and Java 7_15 are listed below. Java 7 is slower for split and split3 because its String.substring method has linear complexity.

As the table shows, a simple refactoring significantly shortens the time needed to split a string (split/split2 -> split3).

          split       split2       split3
Java 6    4.65 sec    10.34 sec    3.8 sec
Java 7    6.72 sec    8.29 sec     4.37 sec

String concatenation

This article would not be complete without mentioning two other ways to concatenate strings. The first is String.concat, which is seldom used. It allocates a char[] whose length is that of the concatenated result, copies the data of both strings into it, and finally uses a private constructor to create the new String. That constructor does not copy the char[] again, so the call creates only two objects: the String itself and its internal char[]. Unfortunately, this method is efficient only when you are concatenating exactly two strings.

The other way is the StringBuilder class and its family of append methods. If you have many values to concatenate, this is by far the fastest approach. It was introduced in Java 5 as a replacement for StringBuffer. The main difference is that StringBuffer is thread-safe and StringBuilder is not. But how often do you concatenate strings from several threads at once?
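A minimal sketch of the two alternatives just described follows. The class and method names are illustrative only.

```java
final class ConcatDemo {
    // String.concat joins exactly two strings and returns a new String.
    static String joinWithConcat(String a, String b) {
        return a.concat(b);
    }

    // A single StringBuilder collects any number of parts; toString() copies once at the end.
    static String joinWithBuilder(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithConcat("ab", "cd"));      // prints abcd
        System.out.println(joinWithBuilder("a", "b", "c"));  // prints abc
    }
}
```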

In the test, we concatenated the numbers from 0 to 100000 using String.concat, the + operator, and StringBuilder, with the following code:

    String res = "";
    for ( int i = 0; i < ITERS; ++i )
    {
        final String s = Integer.toString( i );
        res = res.concat( s ); //second option: res += s;
    }

    //third option:
    StringBuilder res = new StringBuilder();
    for ( int i = 0; i < ITERS; ++i )
    {
        final String s = Integer.toString( i );
        res.append( s );
    }

String.concat    +             StringBuilder.append
10.145 sec       42.677 sec    0.012 sec

The result is clear: O(n) is far better than O(n^2). In practice, however, the + operator is used heavily because it is so convenient. To address this, the -XX:+OptimizeStringConcat switch was introduced in Java 6 update 20. It was turned on by default somewhere between Java 7_02 and Java 7_15 (in Java 6_41 it is still off by default), so you may have to enable it manually. As with other -XX options, its documentation is rather poor:

Optimize String concatenation operations where possible. (Introduced in Java 6 Update 20)
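Whether the switch is on in a running JVM can be checked from code. The sketch below is one possible way. It assumes a HotSpot-based JVM that provides com.sun.management.HotSpotDiagnosticMXBean, and Java 7 or later for ManagementFactory.getPlatformMXBean. The class name and the "unavailable" fallback are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class OptimizeStringConcatCheck {
    // Returns "true" or "false" for the flag, or "unavailable" if this JVM does not expose it.
    static String flagValue() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption("OptimizeStringConcat").getValue();
        } catch (IllegalArgumentException e) {
            // Assumption: the option (or the bean) does not exist on this particular JVM build.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("OptimizeStringConcat = " + flagValue());
    }
}
```

Running this once with and once without -XX:+OptimizeStringConcat on the java command line should show whether the default of the JVM at hand already matches what you want.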

Let's assume that Oracle's engineers did their best when implementing this option. Hearsay has it that it replaces some of the StringBuilder concatenation logic with an implementation similar to String.concat: it allocates a char[] of the right size, copies the pieces into it, and finally creates the String. It can also handle nested concatenation such as (str1 + (str2 + str3) + str4). With the option turned on, the results show the + operator coming very close to String.concat:

String.concat    +             StringBuilder.append
10.19 sec        10.722 sec    0.013 sec

Let's run another test. As mentioned earlier, the default StringBuilder constructor allocates a 16-character buffer, which is expanded when a 17th character has to be added. We append each number from 100 to 100000 to "12345678901234". The resulting strings are between 17 and 20 characters long, so the default + operator implementation requires a StringBuilder resize. For comparison, we run a second test in which we create a StringBuilder( 21 ) directly, so that its buffer is large enough never to be resized:

    final String s = BASE + i;
    final String s = new StringBuilder( 21 ).append( BASE ).append( i ).toString();

With the option turned off, the + implementation takes about 45% longer than the explicit StringBuilder implementation (0.958 sec versus 0.663 sec). With the option turned on, both perform the same. Interestingly, even the explicit StringBuilder implementation becomes faster once the switch is turned on!

+, switch off    +, switch on    new StringBuilder(21), switch off    new StringBuilder(21), switch on
0.958 sec        0.494 sec       0.663 sec                            0.494 sec

Summary

    • When converting a value to a string, avoid concatenating it with an empty string (""). Use the appropriate String.valueOf method or the toString(value) method of the wrapper class.
    • Prefer StringBuilder for string concatenation. Review old code and replace StringBuffer with StringBuilder wherever possible.
    • Use the -XX:+OptimizeStringConcat option, introduced in Java 6 update 20, to improve the performance of string concatenation. It is on by default in recent Java 7 versions, but still off in Java 6_41.
