OutOfMemoryError caused by substring and split in Java, and countermeasures

Source: Internet
Author: User

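The following example illustrates an OutOfMemoryError caused by the substring method of String. The behavior applies to JDK 6 and early JDK 7 releases; starting with Java 7 update 6, substring copies the characters it needs instead of sharing the original array.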

    import java.util.ArrayList;

    public class TestGC {
        // Each instance owns a string backed by a 100,000-element char[].
        private String large = new String(new char[100000]);

        public String getSubString() {
            return this.large.substring(0, 2);
        }

        public static void main(String[] args) {
            ArrayList<String> subStrings = new ArrayList<String>();
            for (int i = 0; i < 1000000; i++) {
                TestGC testGC = new TestGC();
                subStrings.add(testGC.getSubString());
            }
        }
    }

Run the program and the result is:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Why does this happen? The cause can be found in the JDK source code of String.substring, which is as follows:

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
    }

The last line of the method calls a package-private constructor of String, as follows:

    // Package private constructor which shares value array for speed.
    String(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

Judging by the constructor's access level and its comment, Sun wrote this constructor specifically to optimize performance.
To avoid copying memory, this constructor does not create a new char array. It reuses the char[] of the original String object and relies on offset and count to identify the different string content. In other words, the small String object produced by substring still points to the char[] of the large String object.

In the example, the list keeps one million two-character strings alive, and each of them keeps a 100,000-element char[] reachable, so none of the large arrays can be garbage collected. Therefore, the OutOfMemoryError occurs.
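The sharing can be modeled outside the JDK with a small sketch. SharedSlice below is a hypothetical class written for illustration, not JDK code; it only mimics the offset/count/value layout described above, to show that a two-character slice keeps the whole backing array reachable while a trimmed copy does not.

```java
import java.util.Arrays;

// Hypothetical model of the String layout described above; not JDK code.
final class SharedSlice {
    private final char[] value;  // backing array, possibly shared with other slices
    private final int offset;    // index of the first character inside value
    private final int count;     // number of characters in this slice

    SharedSlice(char[] value) {
        this(0, value.length, value);
    }

    // Mirrors the package-private constructor: stores the array without copying.
    private SharedSlice(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Behaves like the old substring: the result shares the backing array.
    SharedSlice slice(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException();
        }
        return new SharedSlice(offset + begin, end - begin, value);
    }

    // Copies only the characters in use, dropping the reference to the large array.
    SharedSlice trimmed() {
        char[] copy = Arrays.copyOfRange(value, offset, offset + count);
        return new SharedSlice(0, count, copy);
    }

    int length() {
        return count;
    }

    // Number of chars this object keeps reachable through its backing array.
    int retainedChars() {
        return value.length;
    }
}

class SharedSliceDemo {
    public static void main(String[] args) {
        SharedSlice large = new SharedSlice(new char[100000]);
        SharedSlice small = large.slice(0, 2);
        // prints "2 chars used, 100000 retained"
        System.out.println(small.length() + " chars used, " + small.retainedChars() + " retained");
        // prints "2 retained after trimming"
        System.out.println(small.trimmed().retainedChars() + " retained after trimming");
    }
}
```

Multiplying the retained size by the one million entries in the list gives a sense of why the heap fills up even though each stored string is only two characters long.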
Having found the problem, modify the getSubString method in the code above as follows:

    public String getSubString() {
        return new String(this.large.substring(0, 2));
    }

The method now wraps the result of substring in a new String. If you run the program again, the OutOfMemoryError no longer occurs. Why? Because in this case the public String(String) constructor is called. The source code of this constructor is as follows:

    public String(String original) {
        int size = original.count;
        char[] originalValue = original.value;
        char[] v;
        if (originalValue.length > size) {
            // The array representing the String is bigger than the new
            // String itself.  Perhaps this constructor is being called
            // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off + size);
        } else {
            // The array representing the String is the same
            // size as the String, so no point in making a copy.
            v = originalValue;
        }
        this.offset = 0;
        this.count = size;
        this.value = v;
    }

From the code, we can see that when the length of the original string's value array is greater than its count, a new char[] is created and only the characters in use are copied into it.
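The copying step can be tried in isolation. The sketch below uses only Arrays.copyOfRange from the standard library; the class name TrimCopyDemo and the helper trim are illustrative names, not part of the JDK source quoted above.

```java
import java.util.Arrays;

class TrimCopyDemo {
    // Copies size chars starting at off into a new array of exactly that length,
    // which is what the constructor does when the backing array is oversized.
    static char[] trim(char[] backing, int off, int size) {
        return Arrays.copyOfRange(backing, off, off + size);
    }

    public static void main(String[] args) {
        char[] backing = "hello world".toCharArray();
        char[] v = trim(backing, 6, 5);
        backing[6] = 'W'; // changing the original array does not affect the copy
        System.out.println(new String(v) + " " + v.length); // prints "world 5"
    }
}
```

Because the copy has its own array, the large original becomes eligible for garbage collection as soon as nothing else references it.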

Besides substring, the split method of String has the same problem. The source code of split is as follows:

    public String[] split(String regex, int limit) {
        return Pattern.compile(regex).split(this, limit);
    }

We can see that String.split is implemented through Pattern.split. The source code of Pattern.split is as follows:

    public String[] split(CharSequence input, int limit) {
        int index = 0;
        boolean matchLimited = limit > 0;
        ArrayList<String> matchList = new ArrayList<String>();
        Matcher m = matcher(input);

        // Add segments before each match found
        while (m.find()) {
            if (!matchLimited || matchList.size() < limit - 1) {
                String match = input.subSequence(index, m.start()).toString();
                matchList.add(match);
                index = m.end();
            } else if (matchList.size() == limit - 1) { // last one
                String match = input.subSequence(index,
                                                 input.length()).toString();
                matchList.add(match);
                index = m.end();
            }
        }

        // If no match was found, return this
        if (index == 0)
            return new String[] {input.toString()};

        // Add remaining segment
        if (!matchLimited || matchList.size() < limit)
            matchList.add(input.subSequence(index, input.length()).toString());

        // Construct result
        int resultSize = matchList.size();
        if (limit == 0)
            while (resultSize > 0 && matchList.get(resultSize - 1).equals(""))
                resultSize--;
        String[] result = new String[resultSize];
        return matchList.subList(0, resultSize).toArray(result);
    }

The first statement inside the loop is: String match = input.subSequence(index, m.start()).toString();
When the input is a String, this calls the subSequence method of the String class. The source code of this method is as follows:

    public CharSequence subSequence(int beginIndex, int endIndex) {
        return this.substring(beginIndex, endIndex);
    }

The code shows that the substring method of String is ultimately called, so the same problem exists: each small string produced by split directly uses the char[] of the original String object.
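The same countermeasure used for substring can be applied to the pieces returned by split. The following is a minimal sketch; SplitCopy and splitDetached are hypothetical names chosen for this example. On JDK versions where substring already copies its characters, the extra new String is redundant but harmless.

```java
import java.util.ArrayList;
import java.util.List;

class SplitCopy {
    // Hypothetical helper: splits the input, then copies each piece with
    // new String(...) so that it no longer refers to the input's char[].
    static List<String> splitDetached(String input, String regex) {
        String[] parts = input.split(regex);
        List<String> result = new ArrayList<String>(parts.length);
        for (String part : parts) {
            result.add(new String(part));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "[a, b, , c]"
        System.out.println(splitDetached("a,b,,c", ","));
    }
}
```

Copying is only worth the cost when the pieces outlive the original string, for example when a few short fields from each line of a large file are kept in a long-lived collection.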

 

The substring method of StringBuilder and StringBuffer does not have this problem. The source code is as follows:

    public String substring(int start, int end) {
        if (start < 0)
            throw new StringIndexOutOfBoundsException(start);
        if (end > count)
            throw new StringIndexOutOfBoundsException(end);
        if (start > end)
            throw new StringIndexOutOfBoundsException(end - start);
        return new String(value, start, end - start);
    }

The last line calls a public constructor of the String class. The source code of this constructor is as follows:

    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.offset = 0;
        this.count = count;
        this.value = Arrays.copyOfRange(value, offset, offset + count);
    }

Instead of directly using the char[] passed in, this constructor copies the requested range into a new array.
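A short way to observe that the result of StringBuilder.substring owns its own characters is to modify the builder afterwards. This is a small sketch using only standard StringBuilder methods; the class name BuilderSubstringDemo is made up for the example.

```java
class BuilderSubstringDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abcdef");
        String firstTwo = sb.substring(0, 2);
        sb.setCharAt(0, 'X'); // modifies the builder's internal array only
        System.out.println(firstTwo); // prints "ab"
        System.out.println(sb);       // prints "Xbcdef"
    }
}
```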
