Analysis of string concatenation performance in Java



String is a very special class in Java, and one of the main reasons is that strings are immutable.

String's immutability is a cornerstone of Java's security and thread safety; without it, Java would be vulnerable.

But immutability comes at a high cost: whenever you try to "change" a String, you actually create a new String, and in most cases the original becomes garbage. Thanks to Java's automatic garbage collection, developers need not be paranoid about this string garbage. But if you ignore it completely, or even use the String API recklessly, your program will inevitably suffer from heavy GC (garbage collection) activity.

Over the history of the JDK there have been several efforts to reduce the garbage that String operations create: JDK 1.0 added StringBuffer, and JDK 1.5 added StringBuilder. StringBuffer and StringBuilder are functionally identical; the difference is that StringBuffer is thread-safe and StringBuilder is not. The vast majority of string concatenations happen within a single method call, that is, in a single-threaded context, so thread safety is entirely superfluous there. The JDK's advice to developers is therefore to use StringBuffer or StringBuilder for string concatenation, and to prefer StringBuilder over StringBuffer when you are sure the concatenation happens in a single thread. In most cases following this advice significantly improves performance compared with using String.concat() directly, but real-world situations are often more complex, and this advice does not always give you the best performance! Today we will dig into the performance of string concatenation, and I hope this helps you understand the problem thoroughly.

First, there is a myth that SB (StringBuffer and StringBuilder) always performs better than String.concat(). This is inaccurate! Under certain conditions String.concat() beats SB. Let's prove it with an example.

Task:
Concatenate two strings:

String a = "ABCDEFGHIJKLMNOPQ";  // length 17
String b = "ABCDEFGHIJKLMNOPQR"; // length 18


Description
We are going to analyze how much garbage each concatenation scheme produces. We ignore garbage attributable to the input parameters, because they are not created by the concatenation code. We also count only the char[] inside each String: apart from this character array, a String's other fields are tiny, and their effect on GC can be ignored.

Scenario 1:
Using String.concat()

Code:

String result = a.concat(b);

This line could hardly be simpler, but let's look at the source code of java.lang.String in the Sun JDK to see how the call actually works.
Source snippet from the Sun JDK's java.lang.String:

public String concat(String str) {
    int otherLen = str.length();
    if (otherLen == 0) {
        return this;
    }
    // One array of exactly the right size receives both strings
    char buf[] = new char[count + otherLen];
    getChars(0, count, buf, 0);
    str.getChars(0, otherLen, buf, count);
    return new String(0, count + otherLen, buf);
}

// Package-private constructor: shares the given array, no copy is made
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

This code first creates a new char[] whose length is a.length() + b.length(), copies the contents of a and b into it, and finally uses that array to create a new String object. Pay special attention to the constructor used here: it has only package access, and it uses the passed-in char[] directly as the new String's internal character array, without any defensive copy. This constructor must be package-private; otherwise you could use it to create a mutable String (by modifying the char[] after the String is constructed). The JDK code in java.lang guarantees that the array is never modified after this constructor is called, and Java's security model does not allow third-party code to join the java.lang package (try putting one of your own classes into java.lang: it will fail to load), so String's immutability is not compromised.

The whole process creates no garbage objects (recall our assumption: a and b are passed-in parameters, not created by the concatenation code, so even if they become garbage we do not count them), so all is well!
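To make the allocation pattern concrete, here is a minimal sketch that mirrors the shape of the JDK 6 concat() above using only the public String API. The class and method names (ConcatSketch, concatSketch) are hypothetical. One difference matters for the garbage count: code outside java.lang cannot call the package-private sharing constructor, so the sketch's new String(buf) copies the array once more and buf itself becomes garbage. That extra copy is exactly what the real concat() avoids.

```java
public class ConcatSketch {

    // Hypothetical helper with the same shape as JDK 6 String.concat():
    // a single exact-size char[] receives both inputs.
    static String concatSketch(String a, String b) {
        if (b.isEmpty()) {
            return a; // same shortcut as the JDK: nothing to append
        }
        char[] buf = new char[a.length() + b.length()];
        a.getChars(0, a.length(), buf, 0);          // copy a into the front
        b.getChars(0, b.length(), buf, a.length()); // copy b right after it
        // Outside java.lang we must use the public constructor,
        // which copies buf; the JDK's package-private one shares it.
        return new String(buf);
    }

    public static void main(String[] args) {
        String a = "ABCDEFGHIJKLMNOPQ";  // length 17
        String b = "ABCDEFGHIJKLMNOPQR"; // length 18
        String result = concatSketch(a, b);
        System.out.println(result.length());            // prints 35
        System.out.println(result.equals(a.concat(b))); // prints true
    }
}
```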

Scenario 2:
Using sb.append(). I use StringBuilder for this analysis; the analysis for StringBuffer is exactly the same.

Code:

String result = new StringBuilder().append(a).append(b).toString();

This line is noticeably more complex than the String.concat() version, but how does it perform? Let's analyze it in 4 steps: new StringBuilder(), append(a), append(b), and toString().
1) new StringBuilder()
Let's take a look at the source code of StringBuilder:

public StringBuilder() {
    super(16); // default capacity: 16 chars
}

AbstractStringBuilder(int capacity) {
    value = new char[capacity];
}

It creates a char[] of size 16; no garbage objects have been created so far.
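The default capacity can be checked from ordinary code through the public capacity() method. A quick sketch (the class name DefaultCapacity is illustrative only):

```java
public class DefaultCapacity {
    public static void main(String[] args) {
        // The no-arg constructor starts with a capacity of 16 chars.
        StringBuilder sb = new StringBuilder();
        System.out.println(sb.capacity()); // prints 16

        // The int constructor starts with exactly the capacity passed in.
        StringBuilder sized = new StringBuilder(35);
        System.out.println(sized.capacity()); // prints 35
    }
}
```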
2) append(a)
Continuing with the source code:

public StringBuilder append(String str) {
    super.append(str);
    return this;
}

public AbstractStringBuilder append(String str) {
    if (str == null) str = "null";
    int len = str.length();
    if (len == 0) return this;
    int newCount = count + len;
    if (newCount > value.length)
        expandCapacity(newCount); // not enough room: grow the buffer
    str.getChars(0, len, value, count);
    count = newCount;
    return this;
}

void expandCapacity(int minimumCapacity) {
    int newCapacity = (value.length + 1) * 2;
    if (newCapacity < 0) {
        newCapacity = Integer.MAX_VALUE; // overflow guard
    } else if (minimumCapacity > newCapacity) {
        newCapacity = minimumCapacity;
    }
    // The old array becomes garbage after this copy
    value = Arrays.copyOf(value, newCapacity);
}

This code first ensures that SB's internal char[] has enough free space. Here it does not (17 chars are needed, only 16 are available), so a new char[] of size 34 ((16 + 1) * 2) is created, and the previous char[] of size 16 becomes garbage. Mark point 1: we have created the first garbage object, 16 chars in size.
3) append(b)
The same logic applies: 17 + 18 = 35 chars are needed, so a new char[] of size 70 ((34 + 1) * 2) is created, and the previous char[] of size 34 becomes garbage. Mark point 2: we have created a second garbage object, 34 chars in size.
4) toString()
Look at the source code:

public String toString() {
    // Create a copy, don't share the array
    return new String(value, 0, count);
}

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.offset = 0;
    this.count = count;
    // Defensive copy: the caller keeps no handle on the String's array
    this.value = Arrays.copyOfRange(value, offset, offset + count);
}

Note this constructor: it has public access, so it must make a defensive copy; otherwise callers could break String's immutability. But because of this copy, the StringBuilder's internal array is no longer needed once toString() returns. Mark point 3: we have created a third garbage object, 70 chars in size.
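The defensive copy made by this public constructor is easy to observe from ordinary code. In this small sketch (buildThenMutate is a hypothetical helper), writing to the source array after construction leaves the String unchanged:

```java
public class DefensiveCopyDemo {

    // Builds a String from buf with the public constructor, then
    // overwrites buf[0]; returns the String built before the write.
    static String buildThenMutate(char[] buf) {
        String s = new String(buf, 0, buf.length);
        buf[0] = 'z';
        return s;
    }

    public static void main(String[] args) {
        char[] buf = {'a', 'b', 'c'};
        String s = buildThenMutate(buf);
        System.out.println(s);               // prints abc: the String kept its own copy
        System.out.println(new String(buf)); // prints zbc: only the array changed
    }
}
```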

So we created 3 garbage objects totaling 16 + 34 + 70 = 120 chars! A Java char is a 2-byte UTF-16 code unit, so that is 240 bytes of garbage!

One way to improve the SB version is to change the code to:

String result = new StringBuilder(a.length() + b.length()).append(a).append(b).toString();

Work it out yourself: this time we create only 1 garbage object, of size 17 + 18 = 35 chars. Not bad, right?
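The arithmetic for both versions can be checked with a small model. It assumes the JDK 6 growth rule quoted earlier and, as in the text, counts only discarded char[] arrays: each outgrown buffer, plus the builder's final buffer once toString() has copied it. GarbageAccounting and garbageChars are hypothetical names.

```java
public class GarbageAccounting {

    // Chars discarded by new StringBuilder(initial).append(...)...toString()
    // under the JDK 6 growth rule.
    static int garbageChars(int initial, int... appendLengths) {
        int capacity = initial;
        int count = 0;
        int garbage = 0;
        for (int len : appendLengths) {
            count += len;
            if (count > capacity) {
                garbage += capacity; // the outgrown buffer is dropped
                capacity = Math.max((capacity + 1) * 2, count);
            }
        }
        return garbage + capacity; // plus the builder's last buffer
    }

    public static void main(String[] args) {
        int defaultCase = garbageChars(16, 17, 18);
        int presized = garbageChars(17 + 18, 17, 18);
        // Each Java char is 2 bytes
        System.out.println(defaultCase + " chars = " + defaultCase * 2 + " bytes"); // prints 120 chars = 240 bytes
        System.out.println(presized + " chars = " + presized * 2 + " bytes");       // prints 35 chars = 70 bytes
    }
}
```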

Compared with String.concat(), SB still creates a "lot" of garbage (anything greater than 0 is infinitely more than 0!), and you have surely noticed that SB also makes more method calls than String.concat() (stack operations are not free).

Further analysis (left to the reader) shows that when you concatenate fewer than 4 strings, String.concat() is more efficient than SB.
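One way to carry out that analysis, under simplifying assumptions of my own: all strings have the same length, only char[] garbage is counted as above, and the StringBuilder is presized to the exact total (its best case). Chaining concat() over n strings throws away the intermediate results of 2, 3, ..., n-1 strings, while a presized builder throws away exactly one buffer holding all n strings. CrossoverSketch and its method names are hypothetical.

```java
public class CrossoverSketch {

    // Chars discarded by s1.concat(s2).concat(s3)...: every
    // intermediate result except the final one becomes garbage.
    static long chainedConcatGarbage(int n, int len) {
        long garbage = 0;
        for (int k = 2; k < n; k++) {
            garbage += (long) k * len; // intermediate holding k strings
        }
        return garbage;
    }

    // Chars discarded by a StringBuilder presized to the exact total:
    // only its internal buffer, dropped after toString() copies it.
    static long presizedBuilderGarbage(int n, int len) {
        return (long) n * len;
    }

    public static void main(String[] args) {
        int len = 17;
        for (int n = 2; n <= 6; n++) {
            long c = chainedConcatGarbage(n, len);
            long sb = presizedBuilderGarbage(n, len);
            System.out.println(n + " strings: concat=" + c + ", sb=" + sb
                    + (c < sb ? "  (concat wins)" : "  (sb wins)"));
        }
    }
}
```

With equal lengths the chained concat() produces less garbage for 2 and 3 strings (0 vs 2L, 2L vs 3L) and more from 4 strings on (5L vs 4L), which matches the threshold above. Strings of very different lengths can move the crossover.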

So when you need to concatenate more than 3 strings, we should use SB, right?

Not entirely!

SB has an inherent flaw: it appends new strings into a dynamically growing internal char[]. When you append a string and SB has reached its internal capacity, it must enlarge the buffer: it allocates a bigger char[], and the previously used char[] becomes garbage. If we could tell SB exactly how long the final result will be, it would avoid all the garbage produced by unnecessary growth. But predicting the length of the final result is not easy!

Predicting the number of strings to concatenate is much easier than predicting the length of the final result. We can cache the strings to be concatenated and, at the last moment (when toString() is called), compute the exact length of the final result, create an SB with exactly that capacity, and join the strings with it, avoiding a lot of unnecessary intermediate char[] garbage. Even when it is hard to predict exactly how many strings will be concatenated, we can follow SB's example and cache the strings in a dynamically growing String[]. Because a String[] is much smaller than the corresponding char[] (real-world strings are usually longer than one character), a dynamically growing String[] is much cheaper than a dynamically growing char[]. The StringBundler I am about to introduce is based on this principle.

public StringBundler() {
    _array = new String[_DEFAULT_ARRAY_CAPACITY]; // _DEFAULT_ARRAY_CAPACITY = 16
}

public StringBundler(int arrayCapacity) {
    if (arrayCapacity <= 0) {
        throw new IllegalArgumentException();
    }
    _array = new String[arrayCapacity];
}

The first constructor creates a StringBundler with a default array capacity of 16; the second lets you specify the initial capacity. Calling append() does not actually concatenate anything; it just places the string in the cache array.

public StringBundler append(String s) {
    if (s == null) {
        s = StringPool.NULL;
    }
    if (_arrayIndex >= _array.length) {
        expandCapacity(); // cache array is full: grow it
    }
    _array[_arrayIndex++] = s; // store the reference; no chars are copied
    return this;
}

If the number of strings you append exceeds the capacity of the cache array, the internal String[] grows dynamically:

protected void expandCapacity() {
    // Double the array of references
    String[] newArray = new String[_array.length << 1];
    System.arraycopy(_array, 0, newArray, 0, _array.length);
    _array = newArray;
}

Expanding a String[] is much cheaper than expanding a char[]: a String[] is comparatively small, and it grows far less often than the corresponding char[] would.
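To put rough numbers on this, the sketch below models the largest benchmark case (72 strings of 17 chars each) with both growth rules quoted in this article: StringBundler doubling its String[] from 16, and a default StringBuilder growing its char[] by the JDK 6 formula. Only arrays discarded by growth are counted, and the class and method names are hypothetical. Worked through by hand, the String[] grows 16, 32, 64, 128 and discards 112 slots, while the char[] grows seven times and discards 2,272 chars. A slot holds a reference (typically 4 bytes with compressed references, 8 without) and a char is 2 bytes, so here the String[] side costs roughly a fifth to a tenth as much.

```java
public class ArrayVsCharGrowth {

    // StringBundler-style growth: double the String[] when it is full.
    // Returns the total length of the arrays discarded while storing n strings.
    static int stringArrayGarbageSlots(int initial, int n) {
        int capacity = initial;
        int garbage = 0;
        for (int stored = 0; stored < n; stored++) {
            if (stored >= capacity) { // same test as append()
                garbage += capacity;
                capacity <<= 1;
            }
        }
        return garbage;
    }

    // JDK 6 StringBuilder growth: (old + 1) * 2, or the needed size.
    // Returns the chars discarded by growth while appending n strings of len chars.
    static int charArrayGarbageChars(int initial, int n, int len) {
        int capacity = initial;
        int count = 0;
        int garbage = 0;
        for (int i = 0; i < n; i++) {
            count += len;
            if (count > capacity) {
                garbage += capacity;
                capacity = Math.max((capacity + 1) * 2, count);
            }
        }
        return garbage;
    }

    public static void main(String[] args) {
        System.out.println(stringArrayGarbageSlots(16, 72));   // prints 112
        System.out.println(charArrayGarbageChars(16, 72, 17)); // prints 2272
    }
}
```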
When you have finished appending, call toString() to get the final result:

public String toString() {
    if (_arrayIndex == 0) {
        return StringPool.BLANK;
    }
    String s = null;
    if (_arrayIndex <= 3) {
        // Few strings: chained concat() is cheapest
        s = _array[0];
        for (int i = 1; i < _arrayIndex; i++) {
            s = s.concat(_array[i]);
        }
    }
    else {
        // Many strings: compute the exact length, then build once
        int length = 0;
        for (int i = 0; i < _arrayIndex; i++) {
            length += _array[i].length();
        }
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < _arrayIndex; i++) {
            sb.append(_array[i]);
        }
        s = sb.toString();
    }
    return s;
}

If there are fewer than 4 strings (that is, 3 or fewer), toString() joins them with String.concat(); otherwise it first computes the length of the final result, creates a StringBuilder with exactly that capacity, and uses it to join all the strings.
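Putting the four fragments together, here is a self-contained sketch of the same idea that compiles without Liferay. MiniStringBundler is a hypothetical name, not Liferay's class; it substitutes "null" and "" for StringPool.NULL and StringPool.BLANK, and Arrays.copyOf for the explicit System.arraycopy() call.

```java
import java.util.Arrays;

// Hypothetical, Liferay-free sketch of the StringBundler technique.
public class MiniStringBundler {

    private static final int DEFAULT_ARRAY_CAPACITY = 16;

    private String[] array;
    private int arrayIndex;

    public MiniStringBundler() {
        this(DEFAULT_ARRAY_CAPACITY);
    }

    public MiniStringBundler(int arrayCapacity) {
        if (arrayCapacity <= 0) {
            throw new IllegalArgumentException("arrayCapacity must be positive");
        }
        array = new String[arrayCapacity];
    }

    // Only records the reference; no characters are copied yet.
    public MiniStringBundler append(String s) {
        if (s == null) {
            s = "null"; // stands in for StringPool.NULL
        }
        if (arrayIndex >= array.length) {
            // Doubling an array of references, not a char[] of text.
            array = Arrays.copyOf(array, array.length << 1);
        }
        array[arrayIndex++] = s;
        return this;
    }

    @Override
    public String toString() {
        if (arrayIndex == 0) {
            return ""; // stands in for StringPool.BLANK
        }
        if (arrayIndex <= 3) {
            // Few strings: chained concat() creates the least garbage.
            String s = array[0];
            for (int i = 1; i < arrayIndex; i++) {
                s = s.concat(array[i]);
            }
            return s;
        }
        // Many strings: compute the exact length, then copy each string once.
        int length = 0;
        for (int i = 0; i < arrayIndex; i++) {
            length += array[i].length();
        }
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < arrayIndex; i++) {
            sb.append(array[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MiniStringBundler few = new MiniStringBundler().append("a").append("b").append("c");
        System.out.println(few); // prints abc

        MiniStringBundler many = new MiniStringBundler(2); // small capacity forces growth
        for (int i = 0; i < 5; i++) {
            many.append(Integer.toString(i));
        }
        System.out.println(many); // prints 01234
    }
}
```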

My suggestion: if you know for certain that fewer than 4 strings need to be concatenated, call String.concat() directly. StringBundler handles this case automatically, but creating a String[] and making those extra method calls is pointless overhead.

To learn more about StringBundler, see Liferay's JIRA issue:
http://support.liferay.com/browse/LPS-6072

That's enough explanation; it's time to look at the performance test results, which show how much StringBundler can improve your performance!

We compare the performance of five ways of concatenating strings: String.concat(), StringBuffer, StringBuilder, StringBundler with the default constructor, and StringBundler with the constructor that takes an initial capacity.

The comparison has two parts:

    1. The time each approach takes to complete the same number of concatenation operations.
    2. The amount of garbage each approach produces for the same number of concatenation operations.


Each string used in the test is 17 characters long; the number of strings concatenated runs from 72 down to 2, and each count is repeated 100,000 times.
For part 1, I only compared the results for 40 down to 2 strings, because JVM warm-up distorts the earlier results (JIT compilation consumes a lot of CPU time).
For part 2, I used all the results, because JVM warm-up does not affect the total amount of garbage produced (the JIT also produces garbage, but roughly the same amount in every test, and since I compare only the differences, its effect can be ignored).

By the way, I used the following JVM options to generate the GC log:

-XX:+UseSerialGC -Xloggc:gc.log -XX:+PrintGCDetails

The serial collector (UseSerialGC) is used to eliminate the effect of multiple processors on the test results.

The following chart shows the difference in time taken by each approach:

As the chart shows:

    1. String.concat() performs best when concatenating 2 or 3 strings.
    2. StringBundler outperforms SB overall.
    3. StringBuilder outperforms StringBuffer (thanks to the large number of synchronization operations it avoids).

I will discuss point 3 further in a future blog post. Both our own code and the JDK's contain many similar cases where synchronization is unnecessary (at least in certain circumstances), for example the JDK's IO package. If we can bypass this unnecessary synchronization, we can significantly improve program performance.

Now let's analyze the GC logs (GC logs cannot tell you the amount of garbage with 100% accuracy, but they do show the general trend):

String.concat(): 229858963K
StringBuffer: 34608271K
StringBuilder: 34608144K
StringBundler (default constructor): 21214863K
StringBundler (constructor with an explicit string count): 19562434K
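To quantify these totals, they can be compared directly. This sketch only does arithmetic on the figures reported above (GcRatios is a hypothetical name): String.concat() produced about 6.6 times as much garbage as StringBuilder, and StringBundler cut StringBuilder's total by roughly 39% with the default constructor and roughly 43% with an explicit capacity.

```java
public class GcRatios {

    // Totals copied from the GC log summary above (unit as reported: K).
    static final long CONCAT = 229_858_963L;
    static final long STRING_BUFFER = 34_608_271L;
    static final long STRING_BUILDER = 34_608_144L;
    static final long BUNDLER_DEFAULT = 21_214_863L;
    static final long BUNDLER_SIZED = 19_562_434L;

    // Percentage of garbage saved by `candidate` relative to `baseline`.
    static double savingPercent(long baseline, long candidate) {
        return 100.0 * (baseline - candidate) / baseline;
    }

    public static void main(String[] args) {
        System.out.printf("concat vs StringBuilder: %.1fx%n",
                (double) CONCAT / STRING_BUILDER);
        System.out.printf("StringBuffer vs StringBuilder: %.1f%% difference%n",
                -savingPercent(STRING_BUILDER, STRING_BUFFER));
        System.out.printf("StringBundler (default) saves %.1f%% vs StringBuilder%n",
                savingPercent(STRING_BUILDER, BUNDLER_DEFAULT));
        System.out.printf("StringBundler (sized) saves %.1f%% vs StringBuilder%n",
                savingPercent(STRING_BUILDER, BUNDLER_SIZED));
    }
}
```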


The statistics show that StringBundler saves a great deal of string garbage.

Finally, here are 4 suggestions:

    1. When concatenating 2 or 3 strings, use String.concat().
    2. When concatenating more than 3 strings and you can accurately predict the length of the final result, use StringBuilder/StringBuffer and set the initial capacity.
    3. When concatenating more than 3 strings and you cannot accurately predict the length of the final result, use StringBundler.
    4. If you use StringBundler and can predict the number of strings to concatenate, use the constructor that takes an initial capacity.

If you are lazy, just use StringBundler: it is the best choice in most cases, and in the remaining cases, although it is not optimal, it still delivers adequate performance.

I have also provided a version of StringBundler with its dependencies on other Liferay classes removed, for anyone to download and use.
