Most of the content in this article is taken from the following two articles:
http://blog.xebia.com/2007/10/04/leaking-memory-in-java/
http://www.iteye.com/topic/626801
The following extreme example shows how the String.substring method can cause an OutOfMemoryError:
import java.util.ArrayList;

public class TestGC {
    private String large = new String(new char[100000]);

    public String getSubString() {
        return this.large.substring(0, 2);
    }

    public static void main(String[] args) {
        ArrayList<String> subStrings = new ArrayList<String>();
        for (int i = 0; i < 1000000; i++) {
            TestGC testGC = new TestGC();
            // Only the 2-character substring is kept in the list.
            subStrings.add(testGC.getSubString());
        }
    }
}
Run the program and the result is:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
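Why does this happen? The cause is in the JDK source code of String.substring. The listing below is the implementation used in JDK 6 and earlier. Starting with Java 7 update 6, substring copies the characters instead, so the problem described here applies to the older implementation: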
public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}
The last line of this method calls a package-private constructor of String:
// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}
The constructor's access level and comment show that Sun wrote it specifically as a performance optimization.
To avoid copying memory, the constructor does not create a new char array. Instead, it reuses the char[] of the original String and represents different content by changing offset and count. In other words, the small String returned by substring still references the char[] of the large String. In the example, every 2-character substring stored in the list keeps a 100,000-character array reachable, so the garbage collector cannot reclaim it and the heap eventually fills up.
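To make the sharing concrete, here is a minimal sketch of the layout described above. SharedString is a hypothetical stand-in written for illustration, not the real java.lang.String, and retainedChars is an invented helper that reports the size of the backing array.

```java
import java.util.Arrays;

/** Hypothetical model of the JDK 6 String layout; not the real java.lang.String. */
class SharedString {
    private final char[] value;  // backing array, possibly shared with other instances
    private final int offset;    // index of this string's first character in value
    private final int count;     // number of characters in this string

    public SharedString(char[] chars) {
        this(Arrays.copyOf(chars, chars.length), 0, chars.length);
    }

    // Mirrors the package-private constructor: no copy, only a new window.
    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Returns a view that shares the backing array, as the JDK 6 substring did. */
    public SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
    }

    public int length() {
        return count;
    }

    /** Number of chars this object keeps reachable through its backing array. */
    public int retainedChars() {
        return value.length;
    }

    public static void main(String[] args) {
        SharedString large = new SharedString(new char[100000]);
        SharedString small = large.substring(0, 2);
        System.out.println(small.length());        // 2
        System.out.println(small.retainedChars()); // 100000
    }
}
```

The two-character view reports a length of 2 but still pins all 100,000 chars, which is the situation the TestGC loop multiplies a million times.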
Having found the cause, modify the getSubString method in the code above as follows:
public String getSubString() {
    return new String(this.large.substring(0, 2));
}
This wraps the result of substring in a new String. Running the program again no longer throws an OutOfMemoryError. Why? Because the code now calls the public String(String) constructor, whose source code is as follows:
public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself.  Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off + size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}
The code shows that when the value array of the String is longer than count, the constructor allocates a new char[] and copies only the characters that belong to the string, so the new String no longer references the large array.
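The trimming step can be tried in isolation. The sketch below applies the same rule to a plain char array; TrimDemo and its trim method are hypothetical names chosen for this illustration and do not exist in the JDK.

```java
import java.util.Arrays;

class TrimDemo {
    /**
     * Returns an array holding exactly the window [offset, offset + count).
     * A copy is made only when the source array is larger than the window.
     */
    public static char[] trim(char[] value, int offset, int count) {
        if (value.length > count) {
            return Arrays.copyOfRange(value, offset, offset + count);
        }
        return value; // already exactly sized, nothing to trim
    }

    public static void main(String[] args) {
        char[] big = new char[100000];
        big[0] = 'h';
        big[1] = 'i';
        char[] small = trim(big, 0, 2);
        System.out.println(small.length);      // 2
        System.out.println(new String(small)); // hi
    }
}
```

Once only the trimmed array is referenced, the 100,000-element array becomes eligible for garbage collection.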
Besides substring, the split method of String has the same problem. The source code of split is as follows:
public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}
String.split delegates to the split method of Pattern, whose source code is as follows:
public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<String>();
    Matcher m = matcher(input);

    // Add segments before each match found
    while(m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index, input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }

    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}
Note this line inside the loop: String match = input.subSequence(index, m.start()).toString();
When input is a String, this calls the subSequence method of the String class, whose source code is as follows:
public CharSequence subSequence(int beginIndex, int endIndex) {
    return this.substring(beginIndex, endIndex);
}
The code shows that this ends up calling String.substring, so the same problem exists: each small String produced by split directly uses the char[] of the original String.
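The new String(...) fix shown earlier for substring can be applied to split results as well. The following is a small sketch under that assumption; SplitCopy and splitDetached are hypothetical names, and on Java 7 update 6 or later the extra copy is redundant, though harmless.

```java
import java.util.ArrayList;
import java.util.List;

class SplitCopy {
    /**
     * Splits line by regex and wraps every token in new String(...),
     * so that on JDK 6 no token keeps the backing array of line alive.
     */
    public static List<String> splitDetached(String line, String regex) {
        List<String> tokens = new ArrayList<String>();
        for (String token : line.split(regex)) {
            tokens.add(new String(token)); // explicit copy of the token
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [id=42, name=foo, padding]
        System.out.println(splitDetached("id=42;name=foo;padding", ";"));
    }
}
```

This matters most when a few short tokens from very long input lines are kept for a long time, for example as keys in a cache.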
The substring methods of StringBuilder and StringBuffer do not have this problem. The source code is as follows:
public String substring(int start, int end) {
    if (start < 0)
        throw new StringIndexOutOfBoundsException(start);
    if (end > count)
        throw new StringIndexOutOfBoundsException(end);
    if (start > end)
        throw new StringIndexOutOfBoundsException(end - start);
    return new String(value, start, end - start);
}
The last line calls a public constructor of the String class, whose source code is as follows:
public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.offset = 0;
    this.count = count;
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}
Instead of directly reusing the char[] it is given, this constructor copies the requested range into a new array, so the resulting String does not reference the builder's buffer.
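The copying behaviour can be observed indirectly, without looking at memory: if the substring shared the builder's buffer, a later change to the builder would show through. The sketch below checks that it does not; BuilderSubstring and firstTwo are hypothetical names used only for this illustration.

```java
class BuilderSubstring {
    /** Returns the first two characters of the builder as an independent String. */
    public static String firstTwo(StringBuilder sb) {
        return sb.substring(0, 2);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("hello");
        String prefix = firstTwo(sb);
        sb.setCharAt(0, 'J');       // mutate the builder after taking the substring
        System.out.println(prefix); // he
        System.out.println(sb);     // Jello
    }
}
```

The substring keeps its original content after the builder changes, which is consistent with the characters having been copied out of the builder's buffer.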