Does substring in Java really cause a memory leak?

Source: Internet
Author: User

String is a type that virtually every Java program uses, and its substring method, which extracts part of a string, is one we call often. Whether substring in Java 6 causes a memory leak has been discussed in forums and developer communities abroad, to the point that it was filed as an official Java bug, and the method was re-implemented in Java 7. At this point you may be wondering how substring could cause a memory leak at all. With that question in mind, let's look at whether substring really leaks memory and how the so-called leak comes about.

Basic introduction

The substring method has two overloads. The first takes a single parameter, the index at which to start.

 public String substring(int beginIndex)

For example, "unhappy".substring(2) returns "happy".

The other overload takes both a start index and an end index. The character at the end index is not included.

 public String substring(int beginIndex, int endIndex)

For example, "smiles".substring(1, 5) returns "mile".

This introduction gives us a basic understanding of what substring does, which will make the rest of the article easier to follow.
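As a quick check of the two overloads, here is a minimal sketch. The class name SubstringBasics and the helper names tail and slice are made up for illustration; only String.substring itself comes from the JDK.

```java
// Minimal demo of the two substring overloads.
// SubstringBasics, tail and slice are hypothetical names used only here.
final class SubstringBasics {

    /** Everything from beginIndex to the end of the string. */
    static String tail(String s, int beginIndex) {
        return s.substring(beginIndex);
    }

    /** Characters from beginIndex (inclusive) to endIndex (exclusive). */
    static String slice(String s, int beginIndex, int endIndex) {
        return s.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        System.out.println(tail("unhappy", 2));    // happy
        System.out.println(slice("smiles", 1, 5)); // mile
    }
}
```

Note that the end index is exclusive, so slice("smiles", 1, 5) stops before the final "s".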

Preparatory work

Because this problem only occurs on Java 6, you need to switch versions if you are running anything else.

Switching in the terminal (macOS)

Check the Java version

 $ java -version
 java version "1.8.0_25"

Switch to 1.6

On Ubuntu, use update-alternatives --config java; on Fedora, use alternatives --config java.

If you use Eclipse, select the project, right-click it, and choose Properties, then Java Compiler, to set the version for that project. Keep in mind that the behavior comes from the runtime library, so the project must also run on a Java 6 JRE.
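Whichever way the version is switched, it can help to confirm from inside the program which runtime is actually in use. The sketch below reads the standard java.version system property. The class name JavaVersionCheck and the helper isJava6 are hypothetical, and the check assumes the legacy "1.6.x" version string format.

```java
// Prints the version of the JVM that is actually running the program.
// JavaVersionCheck and isJava6 are hypothetical names used only here.
final class JavaVersionCheck {

    /** True if the version string uses the legacy Java 6 form, e.g. "1.6.0_45". */
    static boolean isJava6(String version) {
        return version.startsWith("1.6");
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version);
        System.out.println("Java 6 runtime: " + isJava6(version));
    }
}
```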

Reproducing the problem

The code below, taken from the official Java bug report, reproduces the problem.

 public class TestGC {
     private String largeString = new String(new byte[100000]);

     String getString() {
         return this.largeString.substring(0, 2);
     }

     public static void main(String[] args) {
         java.util.ArrayList list = new java.util.ArrayList();
         for (int i = 0; i < 1000000; i++) {
             TestGC gc = new TestGC();
             list.add(gc.getString());
         }
     }
 }

On Java 6, this code fails with java.lang.OutOfMemoryError: Java heap space (Java 7 and 8 do not throw it). The error means there is not enough heap memory left to create objects, so the JVM gives up and throws.

Someone might object: a TestGC object is created in every iteration, and although only a two-character string is added to the ArrayList, each object also holds a largeString of considerable size, so running out of memory is inevitable.

That explanation is wrong, however. Consider the following code, in which only the getString method has changed.

 public class TestGC {
     private String largeString = new String(new byte[100000]);

     String getString() {
         // return this.largeString.substring(0, 2);
         return new String("AB");
     }

     public static void main(String[] args) {
         java.util.ArrayList list = new java.util.ArrayList();
         for (int i = 0; i < 1000000; i++) {
             TestGC gc = new TestGC();
             list.add(gc.getString());
         }
     }
 }

Running this version does not cause an OutOfMemoryError, because we only hold 1 million "AB" string objects. Each TestGC object (including its largeString) is reclaimed by garbage collection, so memory does not run out here.

So what exactly is causing the memory problem? To find out, we need to look at how the method is implemented.

A closer look at the Java 6 implementation

In Java 6, the String class has these three fields:

    • value: the character array that stores the actual contents of the string
    • offset: the starting position of the string within the value array
    • count: the number of characters the string contains

Implementation of substring in Java 6

 public String substring(int beginIndex, int endIndex) {
     if (beginIndex < 0) {
         throw new StringIndexOutOfBoundsException(beginIndex);
     }
     if (endIndex > count) {
         throw new StringIndexOutOfBoundsException(endIndex);
     }
     if (beginIndex > endIndex) {
         throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
     }
     return ((beginIndex == 0) && (endIndex == count)) ? this :
         new String(offset + beginIndex, endIndex - beginIndex, value);
 }

The constructor called by the method above

 String(int offset, int count, char value[]) {
     this.value = value;
     this.offset = offset;
     this.count = count;
 }

Reading the code above makes the cause clear.

When we call substring on string A to get string B, the operation merely sets B's offset and count. B still uses A's value character array; no new array is created to hold B's content alone.

For example, suppose we have a 1 GB string A and call substring(0, 2) to get a two-character string B. If B outlives A, or A is explicitly set to null, then A is reclaimed at garbage collection but B is not. The 1 GB of memory stays occupied, because B still holds a reference to the 1 GB character array.

Now it should be clear why the code above runs out of memory.
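The sharing behavior cannot be observed directly on a modern JVM, so here is a small model of it. SharedString is a hypothetical class written for this article, not JDK code; it only mimics the value, offset and count fields and the non-copying substring described above.

```java
// A toy model of the Java 6 String layout. SharedString is hypothetical
// and exists only to illustrate how substring shared the backing array.
final class SharedString {
    final char[] value; // backing array, possibly shared with other instances
    final int offset;   // where this string starts inside value
    final int count;    // how many characters belong to this string

    SharedString(char[] value) {
        this(0, value.length, value);
    }

    // Like the Java 6 package-private constructor: stores the array, no copy.
    SharedString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        // Only offset and count change; the array reference is reused.
        return new SharedString(offset + beginIndex, endIndex - beginIndex, value);
    }

    /** Number of chars kept alive by this instance, regardless of count. */
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString a = new SharedString(new char[100000]);
        SharedString b = a.substring(0, 2);
        System.out.println(b.count);            // 2
        System.out.println(b.retainedChars());  // 100000
        System.out.println(b.value == a.value); // true
    }
}
```

In this model, b is two characters long but keeps all 100000 characters reachable, which is the situation the reproduction code creates one million times.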

Sharing the content character array

Having the string produced by substring share the content array of the original string is actually a clever design, because it avoids copying the character array on every substring call. As its documentation states, the content array is shared for speed. For the case in this article, though, sharing the array backfires.

How to solve it

For the rather uncommon case above, where only 2 characters are taken from a 1 GB string, the following code avoids holding a reference to the 1 GB string's content array.

 new String(largeString.substring(0, 2))

In this constructor, when the source string's content array is longer than the string itself, the array is copied, so the new string gets a character array that contains only the source string's own content.

 public String(String original) {
     int size = original.count;
     char[] originalValue = original.value;
     char[] v;
     if (originalValue.length > size) {
         // The array representing the String is bigger than the new
         // String itself. Perhaps this constructor is being called
         // in order to trim the baggage, so make a copy of the array.
         int off = original.offset;
         v = Arrays.copyOfRange(originalValue, off, off + size);
     } else {
         // The array representing the String is the same
         // size as the String, so no point in making a copy.
         v = originalValue;
     }
     this.offset = 0;
     this.count = size;
     this.value = v;
 }
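The workaround can be wrapped in a small helper, sketched below. The class name TrimBaggage and the method detachedSubstring are hypothetical. On Java 7 and later the extra new String(...) is redundant, since substring already copies, but it is harmless.

```java
// Wraps the "new String(substring)" workaround in a helper.
// TrimBaggage and detachedSubstring are hypothetical names used only here.
final class TrimBaggage {

    /**
     * Returns the requested range as a string that does not depend on the
     * source's backing array. On Java 6 the copy happens in new String(...).
     */
    static String detachedSubstring(String source, int beginIndex, int endIndex) {
        return new String(source.substring(beginIndex, endIndex));
    }

    public static void main(String[] args) {
        // Build a large string of 'x' characters as a stand-in for big input.
        String largeString = new String(new char[100000]).replace('\0', 'x');
        String small = detachedSubstring(largeString, 0, 2);
        System.out.println(small);          // xx
        System.out.println(small.length()); // 2
    }
}
```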

Java 7 implementation

The substring implementation in Java 7 (from update 6 on) drops the shared content array. Unless the substring is the whole string, in which case the string itself is returned, the requested range is copied into a new array, so every string holds only its own content.

 public String substring(int beginIndex, int endIndex) {
     if (beginIndex < 0) {
         throw new StringIndexOutOfBoundsException(beginIndex);
     }
     if (endIndex > value.length) {
         throw new StringIndexOutOfBoundsException(endIndex);
     }
     int subLen = endIndex - beginIndex;
     if (subLen < 0) {
         throw new StringIndexOutOfBoundsException(subLen);
     }
     return ((beginIndex == 0) && (endIndex == value.length)) ? this
             : new String(value, beginIndex, subLen);
 }

The constructor called by substring copies the content character array.

 public String(char value[], int offset, int count) {
     if (offset < 0) {
         throw new StringIndexOutOfBoundsException(offset);
     }
     if (count < 0) {
         throw new StringIndexOutOfBoundsException(count);
     }
     // Note: offset or count might be near -1>>>1.
     if (offset > value.length - count) {
         throw new StringIndexOutOfBoundsException(offset + count);
     }
     this.value = Arrays.copyOfRange(value, offset, offset + count);
 }
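The key call in that constructor is Arrays.copyOfRange. The sketch below shows, on plain char arrays, that the copied range is independent of the source array. The class name CopyingSubstringSketch and the helper copySlice are hypothetical.

```java
import java.util.Arrays;

// Shows that Arrays.copyOfRange yields an array independent of its source.
// CopyingSubstringSketch and copySlice are hypothetical names used only here.
final class CopyingSubstringSketch {

    /** Copies value[beginIndex, endIndex) into a new, right-sized array. */
    static char[] copySlice(char[] value, int beginIndex, int endIndex) {
        return Arrays.copyOfRange(value, beginIndex, endIndex);
    }

    public static void main(String[] args) {
        char[] large = new char[100000];
        large[0] = 'A';
        large[1] = 'B';

        char[] small = copySlice(large, 0, 2);
        System.out.println(small.length); // 2

        // Changing the source afterwards does not affect the copy,
        // so the copy does not keep the large array reachable.
        large[0] = 'Z';
        System.out.println(small[0]);     // A
    }
}
```

The price is one allocation and one copy per call, proportional to the length of the substring.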

Is it really a memory leak?

We now know that substring can cause memory problems in some cases, but does that count as a memory leak?

Personally, I do not think it should be considered one. A string B produced by substring does hold a reference to the content array of the original string A, but once both A and B are reclaimed, the character array can be garbage collected as well.

Which implementation is better

The Java 7 change to substring received mixed feedback.

Personally, I lean toward the Java 6 implementation: substring shares the content character array, which is faster and needs no new memory allocation. It can cause the memory problem described in this article, but there are ways to work around it.

The Java 7 implementation requires no special action from the programmer to avoid that problem, but every substring call pays for a copy, so it is slower than the Java 6 version. That makes it feel like a step backward.

Why this problem is worth knowing

Although the problem appeared in Java 6 and was fixed in Java 7, that does not mean we can ignore it, and the Java 7 re-implementation has itself been heavily criticized.

The problem is well worth studying, especially the optimization of sharing the content character array. I hope it offers some help and ideas for future design and implementation work.

Affected methods

Both trim and subSequence call substring, so the change in the substring implementation between Java 6 and Java 7 indirectly affects these methods as well.
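A short sketch of the two methods in use follows. The class name AffectedMethods and its helpers are hypothetical. The String documentation specifies that subSequence(begin, end) behaves the same as substring(begin, end), which the last line checks.

```java
// Exercises the two String methods that are built on substring.
// AffectedMethods, trimmed and sub are hypothetical names used only here.
final class AffectedMethods {

    static String trimmed(String s) {
        return s.trim();
    }

    static String sub(String s, int beginIndex, int endIndex) {
        // subSequence returns a CharSequence; toString() gives the text.
        return s.subSequence(beginIndex, endIndex).toString();
    }

    public static void main(String[] args) {
        System.out.println(trimmed("  smiles  "));  // smiles
        System.out.println(sub("smiles", 1, 5));    // mile
        System.out.println(sub("smiles", 1, 5).equals("smiles".substring(1, 5))); // true
    }
}
```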

References

The following three articles are fairly good, but each has a small problem, which I have noted next to it. Keep these in mind as you read.

    • "The substring() Method in JDK 6 and JDK 7": the fix this article gives for the Java 6 problem uses string concatenation, which is not recommended. For the reasons, see "Java details: string concatenation".
    • "How SubString method works in Java - Memory Leak Fixed in JDK 1.7": this article contains a conceptual error. The new string does not prevent the old string from being collected; it prevents the old string's content character array from being collected. Be careful when reading.
    • "JDK-4513622: (str) Keeping a substring of a field prevents GC for object": the code in this report has a small problem. It uses a string literal instead of new String, which ignores the string constant pool. See the Attention section below for details.

Attention

In the code above that reproduces the problem:

 String getString() {
     // return this.largeString.substring(0, 2);
     return new String("AB");
 }

It is best not to write it as follows. The JVM has a string constant pool, so the literal "AB" does not create a new string each time and every variable refers to the same object, whereas new String() creates a new object on every call.

 String getString() {
     return "AB";
 }

The string constant pool will be covered in a later article.
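The difference between the two forms can be seen with reference comparison. This is a minimal sketch; the class name StringPoolDemo and the methods literal and fresh are hypothetical.

```java
// Compares a pooled string literal with strings created via new String().
// StringPoolDemo, literal and fresh are hypothetical names used only here.
final class StringPoolDemo {

    /** Always returns the same pooled object. */
    static String literal() {
        return "AB";
    }

    /** Returns a distinct object on every call. */
    static String fresh() {
        return new String("AB");
    }

    public static void main(String[] args) {
        System.out.println(literal() == literal());        // true
        System.out.println(fresh() == fresh());            // false
        System.out.println(fresh().equals(literal()));     // true
        System.out.println(fresh().intern() == literal()); // true
    }
}
```

With the literal form, the list in the test program would hold one million references to a single object, so it would not be a fair comparison with the substring version.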

Highly recommended

If you are interested in the topic of this article, you may enjoy the following book by Joshua Bloch, although it is a bit pricey and is in English: Java Puzzlers

Original link: http://droidyue.com/blog/2014/12/14/substring-memory-issue-in-java/
