I have recently written a lot of Java-related posts because I am performance-tuning a Java system I built earlier. I have picked up some experience and learned a lot along the way, and this article comes out of that work as well.
My system has a query that searches an in-memory index, puts every data item it finds into a JSONObject, and finally calls that JSONObject's toString() method to turn it into a string that is sent over the network.
Experiments showed that when this query is issued in large numbers, the JVM runs garbage collection (GC) frequently. At first I thought my data structure was poorly written and made the search itself resource-hungry. Further experiments showed that it was the conversion of the search results to strings that consumed a lot of memory.
So I extracted the code for the "convert result to string" step and ran it repeatedly to measure its memory consumption. Figure 1 shows the result of the experiment. Taking the last column as an example:
Rs 200 means that one query result contains 200 data items (the other columns follow the same pattern);
4600 is in KB: the memory consumed by converting one such query result to a string, shown as the blue area in the figure;
56 is also in KB: the size of the resulting string, shown as the red area in the figure (hopefully it is visible);
So converting one such query result creates a large number of temporary objects that consume more than 4 MB of memory. At 250 such queries per second, that is about 1 GB per second, which makes it easy to see why my program needs a minor GC every few tens of seconds even though it has more than 10 GB of memory.
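The arithmetic behind that estimate can be checked with a few lines of Java. This is only a back-of-the-envelope sketch: the two constants are the figures quoted above, while the class and method names are made up for illustration.

```java
// Back-of-the-envelope check of the allocation rate described in the text.
class AllocationRate {
    // Temporary memory per converted query result, in KB (figure from the experiment).
    static final long KB_PER_RESULT = 4600;
    // Query rate used in the text.
    static final long QUERIES_PER_SECOND = 250;

    // Total temporary allocation per second, in KB.
    static long kbPerSecond() {
        return KB_PER_RESULT * QUERIES_PER_SECOND;
    }

    // The same figure in GB (1 GB = 1024 * 1024 KB).
    static double gbPerSecond() {
        return kbPerSecond() / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.println(kbPerSecond() + " KB per second");
        System.out.println(gbPerSecond() + " GB per second");
    }
}
```

The product is 1,150,000 KB per second, which is where the "about 1 GB" figure comes from.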
The above is just an introduction. The experiment is not very rigorous and the numbers will be somewhat off, but it does show that Java's own string operations can be very costly and may become the bottleneck of your program. So below I will discuss how to optimize string handling in Java.
Strings are immutable. A string cannot be altered once created.
Strings in Java are immutable. This is the most important property of String and also the root of its poor performance. The String source code reads as follows:
    public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence
    {
        /** The value is used for character storage. */
        private final char value[];
        //...............
    }
char value[] stores the actual characters. It is declared final, so the reference cannot be reassigned once it is initialized, and String never modifies the contents of the array afterwards. Every String method that appears to modify the string actually creates a new String and returns a reference to it. (substring was a partial exception in older JDKs: it did not copy the characters but returned a new String that pointed into the original char array. Since Java 7 update 6, substring copies the characters as well.)
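A small sketch makes this visible. The class and method names below are hypothetical; the String methods used (toUpperCase, concat) are standard. Each "modifying" call returns a different object and leaves the original untouched.

```java
import java.util.Locale;

// Demonstrates that String methods return new objects instead of changing the original.
class ImmutabilityDemo {
    // Returns true if the original string is unchanged and each result is a distinct object.
    static boolean originalUnchanged() {
        String original = "hello";
        String upper = original.toUpperCase(Locale.ROOT); // new String "HELLO"
        String joined = original.concat(" world");        // new String "hello world"
        return original.equals("hello")
                && upper.equals("HELLO")
                && joined.equals("hello world")
                && upper != original
                && joined != original;
    }

    public static void main(String[] args) {
        System.out.println("original unchanged: " + originalUnchanged());
    }
}
```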
In particular, when working with large strings, the "+" operator is used often (Example 1):
    String s1 = "string a";
    String s2 = "string b";
    String s3 = s1 + " " + s2;
Leaving compiler optimization aside (it is discussed later), creating s3 first produces a temporary String holding s1 + " ", and then a second String that joins that temporary with s2. So even this simple concatenation creates two new String objects, each with its own char[].
Another example (Example 2):
    String str = "hello";
    for (int i = 0; i < 10000; ++i) {
        str = str + " " + i;
    }
As explained above, the statement inside the for loop needs two temporary objects, so this example creates 10000 * 2 of them. Each of these temporaries has its own char[], even though it is discarded right after use. What is worse, as the loop goes on, the char[] in each temporary keeps growing along with str.
Therefore, if you need to join strings with "+" frequently, use StringBuilder instead of String (Example 3):
    StringBuilder sb = new StringBuilder();
    sb.append("hello");
    for (int i = 0; i < 10000; ++i) {
        sb.append(" ").append(i);
    }
    String str = sb.toString();
With StringBuilder, no temporary String objects are created. Instead, StringBuilder keeps an internal array:
    char value[];
Note that, unlike in String, this field is not declared final. Each call to append copies the appended characters into value. When the initially allocated char[] is full, StringBuilder automatically calls its expandCapacity method to roughly double the size of value. This is why using StringBuilder does not create temporary String objects.
It is worth mentioning that expandCapacity actually creates a new char array about twice the size of the original, copies the contents of the old array into it, and then makes the new array the builder's internal char value[]. To avoid this array copying, we should try to give StringBuilder a reasonable initial capacity when constructing it:
    // 5 chars for "hello" + 10000 spaces + 38890 digits = 48895 chars in total
    StringBuilder sb = new StringBuilder(48895);
    sb.append("hello");
    for (int i = 0; i < 10000; ++i) {
        sb.append(" ").append(i);
    }
    String str = sb.toString();
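The following sketch puts Example 2 and Example 3 side by side and shows the effect of the initial capacity. It assumes a single space as the separator. The class and helper names are hypothetical; capacity() is the standard StringBuilder method that reports the current size of the internal buffer.

```java
// Compares "+" in a loop with StringBuilder, and shows how pre-sizing avoids buffer growth.
class BuilderCapacityDemo {
    // Example 2 style: repeated "+" inside the loop.
    static String concatWithPlus(int n) {
        String str = "hello";
        for (int i = 0; i < n; ++i) {
            str = str + " " + i;
        }
        return str;
    }

    // Example 3 style: a single StringBuilder with the given initial capacity.
    static String concatWithBuilder(int n, int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        sb.append("hello");
        for (int i = 0; i < n; ++i) {
            sb.append(" ").append(i);
        }
        return sb.toString();
    }

    // Exact number of chars the loop produces: "hello" plus one space and the digits of each i.
    static int exactLength(int n) {
        int len = "hello".length();
        for (int i = 0; i < n; ++i) {
            len += 1 + String.valueOf(i).length();
        }
        return len;
    }

    // Fills a builder and reports its capacity afterwards, to reveal whether it had to grow.
    static int finalCapacity(int n, int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        sb.append("hello");
        for (int i = 0; i < n; ++i) {
            sb.append(" ").append(i);
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        int n = 10000;
        int needed = exactLength(n);
        System.out.println("chars needed: " + needed);
        System.out.println("capacity when starting at 16: " + finalCapacity(n, 16));
        System.out.println("capacity when pre-sized: " + finalCapacity(n, needed));
        System.out.println("same result: "
                + concatWithPlus(100).equals(concatWithBuilder(100, 16)));
    }
}
```

When the builder is pre-sized to the exact length, its capacity never changes, which means the internal array was never reallocated or copied. In practice a rough upper bound is usually enough; computing the exact length is done here only to make the effect easy to see.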
I will not go further into how to use StringBuilder. A quick search online turns up plenty of material and benchmark comparisons.
Compiler Optimization:
Current JDKs automatically optimize the "+" operation on strings. For example:
    String str = "hello" + " " + "world";
This line is compiled as if it were written:
    String str = "hello world";
The code from Example 1:
    String s3 = s1 + " " + s2;
is also optimized by the compiler, which uses a StringBuilder for it (this is what javac does up to Java 8; since Java 9 it emits an invokedynamic call instead).
For a loop like Example 2, however, this optimization does not seem to help: the concatenation is still carried out separately in every iteration, so a temporary is still created on every pass.
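The difference between the two cases can be observed with reference comparison. This is a small sketch with hypothetical class and method names. It relies on two rules of the Java language: compile-time constant strings are folded and interned, while strings concatenated at run time are newly created objects.

```java
// Shows compile-time folding of constant strings versus run-time concatenation.
class ConstantFoldingDemo {
    // Both operands are literals, so the compiler folds the expression into the
    // single constant "hello world"; both variables refer to the same interned object.
    static boolean constantsAreFolded() {
        String folded = "hello" + " " + "world";
        String literal = "hello world";
        return folded == literal;
    }

    // s1 and s2 are ordinary (non-final) variables, so the concatenation happens
    // at run time and produces a new String with equal contents.
    static boolean runtimeConcatIsNewObject() {
        String s1 = "string a";
        String s2 = "string b";
        String s3 = s1 + " " + s2;
        String literal = "string a string b";
        return s3.equals(literal) && s3 != literal;
    }

    public static void main(String[] args) {
        System.out.println("constants folded: " + constantsAreFolded());
        System.out.println("run-time concat is a new object: " + runtimeConcatIsNewObject());
    }
}
```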
With StringBuilder introduced, let's get to the main topic: how to convert an object into a string. First, look at this code:
    class A {
        // private data

        @Override
        public String toString() {
            //...........
        }
    }

    class B {
        // private data

        @Override
        public String toString() {
            //..........
        }
    }

    public class C {
        private A a;
        private B b;

        private String data;

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            sb.append(data).append(" ").append(a.toString()).append(" ")
              .append(b.toString());
            return sb.toString();
        }
    }
This is the usual way to convert an object to a string: override its toString() method.
C contains an instance a of A and an instance b of B. In C's toString(), we therefore first call toString() on a and b and then join the results with C's other data. Although we use a StringBuilder here, calling a.toString() and b.toString() still creates two temporary String objects. If A and B themselves contain other types, their toString() methods have to call toString() on those types too, which creates even more temporaries.
This is exactly what the JSONObject class does. The toString() of the outermost JSONObject calls toString() on every instance nested inside it, so once the JSON nesting gets deep, a large number of temporary objects are created.
The book "Java Performance Tuning" gives an improved approach:
    class A {
        // private data

        public void appendTo(StringBuilder sb) {
            sb.append(......);
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            appendTo(sb);
            return sb.toString();
        }
    }

    class B {
        // private data

        public void appendTo(StringBuilder sb) {
            sb.append(......);
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            appendTo(sb);
            return sb.toString();
        }
    }

    public class C {
        private A a;
        private B b;

        private String data;

        public void appendTo(StringBuilder sb) {
            sb.append(data).append(" ");
            a.appendTo(sb);
            sb.append(" ");
            b.appendTo(sb);
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            appendTo(sb);
            return sb.toString();
        }
    }
In the code above, every class defines a void appendTo(StringBuilder sb) method, in which it appends its own string representation to the StringBuilder it is given. The appendTo method of an outer class (such as C) calls the appendTo methods of the classes nested inside it. The original toString() method then only has to create one StringBuilder, pass it to its own appendTo method, and finally call the StringBuilder's toString() to return a String. No temporary String objects are created along the way. Only one String is created, in the last step, and it is returned as the result.
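To make the pattern concrete, here is a complete, runnable sketch of it. The field names, the output format, and the class names DemoA, DemoB and DemoC (standing in for A, B and C) are all invented for illustration. Only the appendTo/toString structure comes from the approach described above.

```java
// Entry point for the sketch; builds a nested object and prints it.
class AppendToDemo {
    public static void main(String[] args) {
        DemoC c = new DemoC("shape", new DemoA(1, 2), new DemoB("red"));
        System.out.println(c); // toString() creates a single String at the very end
    }
}

// Plays the role of class A; the fields are hypothetical.
class DemoA {
    private final int x;
    private final int y;

    DemoA(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Writes this object's text straight into the caller's builder.
    void appendTo(StringBuilder sb) {
        sb.append('(').append(x).append(", ").append(y).append(')');
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        appendTo(sb);
        return sb.toString();
    }
}

// Plays the role of class B; the field is hypothetical.
class DemoB {
    private final String text;

    DemoB(String text) {
        this.text = text;
    }

    void appendTo(StringBuilder sb) {
        sb.append('[').append(text).append(']');
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        appendTo(sb);
        return sb.toString();
    }
}

// Plays the role of class C, which nests an A and a B.
class DemoC {
    private final String data;
    private final DemoA a;
    private final DemoB b;

    DemoC(String data, DemoA a, DemoB b) {
        this.data = data;
        this.a = a;
        this.b = b;
    }

    // Delegates to the nested objects' appendTo, so they never build their own Strings.
    void appendTo(StringBuilder sb) {
        sb.append(data).append(' ');
        a.appendTo(sb);
        sb.append(' ');
        b.appendTo(sb);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        appendTo(sb);
        return sb.toString();
    }
}
```

Calling toString() on the outer object allocates one StringBuilder and one final String, however deep the nesting goes. Each class still has a working toString() of its own, so existing callers are not affected.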
I rewrote my code using this method. In the end, the memory consumed per query result went from the original ... KB to about ... KB (in fact, I could optimize it further, but this result is enough for me).
Conclusion
Unlike C++, Java gives up some performance and some control over details in exchange for simpler programming. That does not mean efficient Java code cannot be written. It is just not simple, and it takes some skill.
In addition, although this post describes many advantages of StringBuilder, the book "Java Performance Tuning" calls StringBuilder a "double-edged sword", meaning that it is not the right tool everywhere. For details, read the book directly.