Java Program Optimization: String Manipulation, Basic Operation Methods, and Other Optimization Strategies (I)

Source: Internet
Author: User
This article addresses practical problems that come up when writing Java programs and is divided into two parts. The first covers string-related operations, data splitting, and the handling of very large String objects, with solutions, optimization recommendations, and concrete code examples. The second covers data definitions and the optimization of operational logic, again with recommendations and code examples. The experiments in this article were run on a Lenovo L430 notebook with an i5-3320 CPU and 4 GB of memory; running the code on other machines may give different results, so treat your own experimental environment as authoritative.

String Manipulation Optimizations

String Object

String objects, or their equivalents (such as char arrays), typically occupy the largest share of memory in an application, so handling strings efficiently is key to improving overall system performance.

A String object can be thought of as an extension and further encapsulation of a char array. In the JDK implementation this article describes (the layout changed in JDK 7 update 6), it consists of three main parts: the char array, an offset, and the length of the String. The char array holds the characters and may be a superset of the string that the String object represents; the actual content of the String is located within this array by the offset and the length.

A String has three basic characteristics:

1. Immutability;

2. Constant-pool optimization;

3. The class is declared final.

Immutability means that once a String object has been created, it can no longer be changed. This property is an instance of the immutable pattern, in which an object's state does not change after the object is created. The main benefit of the immutable pattern is that when an object is shared by multiple threads and accessed frequently, the cost of synchronization and lock waiting can be avoided, which can greatly improve system performance.
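The immutability described above can be illustrated with a minimal sketch. The class name `ImmutabilityDemo` and the helper `shout` are hypothetical and not part of the original article; the point is only that methods which appear to modify a String return a new object and leave the original untouched.

```java
// Hypothetical demo class; the names are illustrative, not from the article.
class ImmutabilityDemo {
    // Returns an upper-cased copy; the argument itself is never modified.
    static String shout(String s) {
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        String original = "abc";
        String upper = shout(original);
        String replaced = original.replace('a', 'x');

        System.out.println(original);          // the original is unchanged
        System.out.println(upper);             // a separate, upper-cased String
        System.out.println(replaced);          // another separate String
        System.out.println(original == upper); // the two are different objects
    }
}
```

Because no method can alter `original`, it can be handed to several threads without any locking.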

Constant-pool optimization means that when two String objects have the same value, they can refer to the same copy in the constant pool, which saves significant memory when the same string occurs repeatedly.

In the following code, str1, str2, and str4 refer to the same address, whereas str3 is allocated its own space on the heap. Although str3 occupies heap space of its own, its intern() method returns the same constant-pool entry that str1 refers to. The code is shown in Listing 1 below.

Listing 1. Sample code
public class StringDemo {
    public static void main(String[] args) {
        String str1 = "abc";
        String str2 = "abc";
        String str3 = new String("abc");
        String str4 = str1;
        System.out.println("is str1 = str2?" + (str1 == str2));
        System.out.println("is str1 = str3?" + (str1 == str3));
        System.out.println("is str1 refer to str3?" + (str1.intern() == str3.intern()));
        System.out.println("is str1 = str4" + (str1 == str4));
        System.out.println("is str2 = str4" + (str2 == str4));
        System.out.println("is str4 refer to str3?" + (str4.intern() == str3.intern()));
    }
}

The output is shown in Listing 2.

Listing 2. Output results
is str1 = str2?true
is str1 = str3?false
is str1 refer to str3?true
is str1 = str4true
is str2 = str4true
is str4 refer to str3?true

Tips for Using substring

The last line of the source of String's substring method creates a new String object: new String(offset + beginIndex, endIndex - beginIndex, value). The purpose of this line is to share the char array inside the String quickly and efficiently. However, when a string is cut out by offset in this way, the entire value array of the original String is handed to the new substring rather than only the characters it needs. If the original string is large and the extracted substring is very short, the substring still references the full content of the original string and keeps the corresponding memory alive, determining its own value only through the offset and length. This algorithm gains speed at the cost of space. (This describes the JDK used in the article; since JDK 7 update 6, substring copies the requested range into a new array.)

The following code demonstrates using substring to extract a very small string from a very large one. HugeStr returns the result of String's substring method directly, which can lead to a memory overflow; ImprovedHugeStr wraps the result in a new String each time, which is intended to keep the program running normally.

Listing 3. substring method demo
import java.util.ArrayList;
import java.util.List;

public class StringDemo {
    public static void main(String[] args) {
        List<String> handler = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            HugeStr h = new HugeStr();
            ImprovedHugeStr h1 = new ImprovedHugeStr();
            handler.add(h.getSubString(1, 5));
            handler.add(h1.getSubString(1, 5));
        }
    }

    static class HugeStr {
        private String str = new String(new char[800000]);

        // Returns the substring directly
        public String getSubString(int begin, int end) {
            return str.substring(begin, end);
        }
    }

    static class ImprovedHugeStr {
        private String str = new String(new char[10000000]);

        // Wraps the substring in a new String
        public String getSubString(int begin, int end) {
            return new String(str.substring(begin, end));
        }
    }
}

The output results are shown in Listing 4.

Listing 4. Output results
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.lang.StringValue.from(Unknown Source)
    at java.lang.String.<init>(Unknown Source)
    at StringDemo$ImprovedHugeStr.<init>(StringDemo.java:23)
    at StringDemo.main(StringDemo.java:9)

ImprovedHugeStr works because it uses a String constructor that does not leak memory to regenerate the String object, so the String returned by substring(), which carries the memory-leak problem, loses all strong references. The garbage collector can then identify it as garbage and reclaim it, which keeps the system's memory stable.

Approaches to Splitting Strings

String's split method accepts a regular expression, which makes it powerful, but the algorithm it relies on performs poorly when splitting simple strings. The code in Listing 5 compares the performance of String's split method with that of the StringTokenizer class.

Listing 5. String split method demo
import java.util.StringTokenizer;

public class SplitAndStringTokenizer {
    public static void main(String[] args) {
        String orgStr = null;
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 100000; i++) {
            sb.append(i);
            sb.append(",");
        }
        orgStr = sb.toString();

        // Approach 1: String.split
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            orgStr.split(",");
        }
        long end = System.currentTimeMillis();
        System.out.println(end - start);

        // Approach 2: StringTokenizer
        start = System.currentTimeMillis();
        String orgStr1 = sb.toString();
        StringTokenizer st = new StringTokenizer(orgStr1, ",");
        for (int i = 0; i < 100000; i++) {
            st.nextToken();
        }
        st = new StringTokenizer(orgStr1, ",");
        end = System.currentTimeMillis();
        System.out.println(end - start);

        // Approach 3: indexOf combined with substring
        start = System.currentTimeMillis();
        String orgStr2 = sb.toString();
        String temp = orgStr2;
        while (true) {
            String splitStr = null;
            int j = temp.indexOf(",");
            if (j < 0) break;
            splitStr = temp.substring(0, j);
            temp = temp.substring(j + 1);
        }
        temp = orgStr2;
        end = System.currentTimeMillis();
        System.out.println(end - start);
    }
}

The output, three elapsed times in milliseconds (split, StringTokenizer, then indexOf with substring), is shown in Listing 6:

Listing 6. Run output results
390151615

Once a StringTokenizer object has been created, the next token can be obtained through its nextToken() method, and the hasMoreTokens() method reports whether there are more tokens to process. The comparison shows that split takes a long time while StringTokenizer is fast. We also implemented a splitting algorithm ourselves: combining the substring and indexOf methods makes it possible to slice a string quickly.
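The two alternatives to split discussed above can be sketched side by side. This is an illustrative sketch rather than the code measured in the article: the class name `SplitSketch` and both method names are hypothetical, and the indexOf variant searches from a moving start index instead of repeatedly shortening the string as Listing 5 does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class; names are illustrative only.
class SplitSketch {
    // Tokenize with StringTokenizer, checking hasMoreTokens() before each nextToken().
    static List<String> withTokenizer(String input, String delimiter) {
        List<String> parts = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(input, delimiter);
        while (st.hasMoreTokens()) {
            parts.add(st.nextToken());
        }
        return parts;
    }

    // Hand-written split using indexOf and substring; no regular expression involved.
    static List<String> withIndexOf(String input, char delimiter) {
        List<String> parts = new ArrayList<String>();
        int from = 0;
        int at;
        while ((at = input.indexOf(delimiter, from)) >= 0) {
            parts.add(input.substring(from, at));
            from = at + 1;
        }
        if (from < input.length()) {
            parts.add(input.substring(from)); // piece after the last delimiter
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(withTokenizer("1,2,3,", ","));
        System.out.println(withIndexOf("1,2,3,", ','));
    }
}
```

The two are not interchangeable in every case: StringTokenizer skips empty tokens between adjacent delimiters, while the indexOf version above keeps them as empty strings.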

The reason for the result above is that the split algorithm examines every character: for a large string it has to work through the whole string, finding matches one by one, which is time-consuming. The StringTokenizer class, by contrast, allows an application to break a string into tokens. A StringTokenizer object internally maintains a current position within the string being tokenized, and some operations advance this position past the characters already processed. A token is returned by taking a substring of the string that was used to create the StringTokenizer object.

Listing 7. Source code of the split method
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Split {
    // In the JDK this method belongs to java.util.regex.Pattern and calls the
    // pattern's own matcher(); a Pattern field stands in for that here.
    private final Pattern pattern;

    public Split(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    public String[] split(CharSequence input, int limit) {
        int index = 0;
        boolean matchLimited = limit > 0;
        ArrayList<String> matchList = new ArrayList<String>();
        Matcher m = pattern.matcher(input);

        // Add segments before each match found
        while (m.find()) {
            if (!matchLimited || matchList.size() < limit - 1) {
                String match = input.subSequence(index, m.start()).toString();
                matchList.add(match);
                index = m.end();
            } else if (matchList.size() == limit - 1) { // last one
                String match = input.subSequence(index, input.length()).toString();
                matchList.add(match);
                index = m.end();
            }
        }

        // If no match was found, return this
        if (index == 0) {
            return new String[] {input.toString()};
        }

        // Add remaining segment
        if (!matchLimited || matchList.size() < limit) {
            matchList.add(input.subSequence(index, input.length()).toString());
        }

        // Construct result
        int resultSize = matchList.size();
        if (limit == 0) {
            while (resultSize > 0 && matchList.get(resultSize - 1).equals("")) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return matchList.subList(0, resultSize).toArray(result);
    }
}

split uses a Matcher and an ArrayList together with a character-search algorithm to split the data, which suits scenarios with small amounts of data.
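The role of the limit parameter in Listing 7 is easier to see with a small experiment. The sketch below uses a hypothetical class `SplitLimitDemo` with a helper `pieces`; it only calls the standard String.split(String, int) method.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class SplitLimitDemo {
    // Splits on commas with the given limit, delegating to String.split.
    static String[] pieces(String input, int limit) {
        return input.split(",", limit);
    }

    public static void main(String[] args) {
        String input = "a,b,,";
        // limit == 0: trailing empty strings are removed from the result
        System.out.println(Arrays.toString(pieces(input, 0)));
        // negative limit: every piece is kept, including trailing empty strings
        System.out.println(Arrays.toString(pieces(input, -1)));
        // positive limit: at most that many pieces; the last holds the remainder
        System.out.println(Arrays.toString(pieces(input, 2)));
    }
}
```

The limit == 0 case corresponds to the final loop in Listing 7 that decrements resultSize while the last element is an empty string.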

Merging Strings

Because String is immutable, operations that modify a string, such as concatenation and replacement, perform relatively poorly, since each one produces a new object. However, when several constant strings are concatenated in one expression, the compiler combines them into a single long string at compile time. For very large String objects, the options include concatenating with the + operator, using the concat method, and using the StringBuilder class, as shown in Listing 8.

Listing 8. Sample code for handling very large String objects
public class StringConcat {
    public static void main(String[] args) {
        String str = ""; // start from an empty string rather than null
        String result = "";

        // Approach 1: the + operator
        long start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            str = str + i;
        }
        long end = System.currentTimeMillis();
        System.out.println(end - start);

        // Approach 2: String.concat
        start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            result = result.concat(String.valueOf(i));
        }
        end = System.currentTimeMillis();
        System.out.println(end - start);

        // Approach 3: StringBuilder
        start = System.currentTimeMillis();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append(i);
        }
        end = System.currentTimeMillis();
        System.out.println(end - start);
    }
}

The output, three elapsed times in milliseconds (the + operator, concat, then StringBuilder), is shown in Listing 9.

Listing 9. Run output results
3751870

In the first approach, the compiler implements String addition with StringBuilder, but it is not smart enough to hoist the builder out of the loop: every iteration creates a new StringBuilder instance, which greatly reduces performance.
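The per-iteration cost described above can be made visible by writing out by hand roughly what such a translation amounts to. This is an approximation under the assumption that + is compiled into StringBuilder calls, as the article states; the class name `ConcatLoopSketch` and its method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
class ConcatLoopSketch {
    // Approximately what "str = str + i" inside a loop becomes when + is
    // translated into StringBuilder calls: one new builder per iteration,
    // each copying everything accumulated so far.
    static String perIterationBuilder(int n) {
        String str = "";
        for (int i = 0; i < n; i++) {
            str = new StringBuilder().append(str).append(i).toString();
        }
        return str;
    }

    // The manual alternative: a single builder reused across the whole loop.
    static String singleBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both produce the same text; only the amount of copying differs.
        System.out.println(perIterationBuilder(1000).equals(singleBuilder(1000)));
    }
}
```

Both methods return the same string, so moving to a single StringBuilder is a safe change whenever concatenation happens inside a loop.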

StringBuffer and StringBuilder both extend the abstract class AbstractStringBuilder and expose almost the same interface. The biggest difference between them is that almost all StringBuffer methods are synchronized, while StringBuilder does no synchronization at all. Because method synchronization consumes system resources, StringBuilder is more efficient than StringBuffer. In a multithreaded system, however, StringBuilder cannot guarantee thread safety and should not be shared between threads. The code is shown in Listing 10.

Listing 10. StringBuilder vs. StringBuffer
public class StringBufferAndBuilder {
    public StringBuffer contents = new StringBuffer();
    public StringBuilder sbu = new StringBuilder();

    public void log(String message) {
        for (int i = 0; i < 10; i++) {
            /*
            contents.append(i);
            contents.append(message);
            contents.append("\n");
            */
            contents.append(i);
            contents.append("\n");
            sbu.append(i);
            sbu.append("\n");
        }
    }

    public void getContents() {
        // System.out.println(contents);
        System.out.println("Start print StringBuffer");
        System.out.println(contents);
        System.out.println("End print StringBuffer");
    }

    public void getContents1() {
        // System.out.println(contents);
        System.out.println("Start print StringBuilder");
        System.out.println(sbu);
        System.out.println("End print StringBuilder");
    }

    public static void main(String[] args) throws InterruptedException {
        StringBufferAndBuilder ss = new StringBufferAndBuilder();
        RunThread t1 = new RunThread(ss, "Love");
        RunThread t2 = new RunThread(ss, "Apple");
        RunThread t3 = new RunThread(ss, "egg");
        t1.start();
        t2.start();
        t3.start();
        t1.join();
        t2.join();
        t3.join();
    }
}

class RunThread extends Thread {
    String message;
    StringBufferAndBuilder buffer;

    public RunThread(StringBufferAndBuilder buffer, String message) {
        this.buffer = buffer;
        this.message = message;
    }

    public void run() {
        while (true) {
            buffer.log(message);
            // buffer.getContents();
            buffer.getContents1();
            try {
                sleep(5000000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}

The output results are shown in Listing 11.

Listing 11. Run results
Start print StringBuffer
0123456789
End print StringBuffer
Start print StringBuffer
Start print StringBuilder
01234567890123456789
End print StringBuffer
Start print StringBuilder
0123456789012345678901234567890123456789
End print StringBuilder
End print StringBuilder
Start print StringBuffer
012345678901234567890123456789
End print StringBuffer
Start print StringBuilder
012345678901234567890123456789
End print StringBuilder

The StringBuilder data did not come out as expected. The expansion strategy of StringBuilder and StringBuffer is to roughly double the original capacity: memory is requested for the new capacity, a new char array is created, and the contents of the original array are copied into it. Expanding a large object therefore involves a lot of memory copying, and performance can be improved if the required size is estimated in advance.
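Pre-sizing can be sketched with the StringBuilder(int capacity) constructor and the capacity() method, both part of the standard API. The class name `BuilderCapacitySketch` and the helper `build` are hypothetical, and the sizes are arbitrary values chosen for illustration.

```java
// Hypothetical demo class; names and sizes are illustrative only.
class BuilderCapacitySketch {
    // Appends n single digits to a builder created with the given initial capacity.
    static StringBuilder build(int n, int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        for (int i = 0; i < n; i++) {
            sb.append(i % 10);
        }
        return sb;
    }

    public static void main(String[] args) {
        // Capacity of a builder created with the no-argument constructor
        System.out.println(new StringBuilder().capacity());

        // Sized up front: the builder never has to grow or copy its array
        StringBuilder presized = build(1000, 1000);
        System.out.println(presized.capacity());

        // Started small: the array was reallocated and copied along the way
        StringBuilder grown = build(1000, 16);
        System.out.println(grown.capacity());
    }
}
```

When the final length is known or can be estimated, passing it to the constructor avoids the intermediate allocations and copies entirely.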
