Memory leaks caused by the split method of the String class

Source: Internet
Author: User

Original address: http://jarfield.iteye.com/admin/blogs/583946

I have always admired the rigor and elegance of Sun's approach to technology (poor Sun). The source code of the Java library in the Sun JDK is clear down to its comments, follows consistent conventions, and uses Javadoc meticulously, so it is familiar and comfortable to read. In daily work and study I therefore often read the Java library source code, and I enjoy it. When a strange problem turns up, the source code is even more helpful.

Enough small talk; back to the point. For the past few days I have been entangled in a Java "memory leak" problem. The memory consumed by a Java application kept rising steadily until it exceeded the monitoring threshold, so it was time to play Sherlock Holmes.

Speaking of Java memory leaks, the definition is actually not that clear-cut. First, if the JVM has no bugs, then in theory there is no "heap space that cannot be reclaimed", which means that memory leaks in the C/C++ sense do not exist in Java. Second, if a Java program keeps holding a reference to an object that, according to the program's logic, will never be used again, then we can consider that object leaked. If there are many such objects, a lot of memory is clearly leaked ("wasted" is more accurate).
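The second kind of leak, a reference that is kept but never used again, can be illustrated with a minimal sketch. The class and method names below (RequestLog, handle, retainedCount) are invented for this example and do not come from the program discussed in this article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of a "logical" leak: objects stay reachable
// although the program never reads them again.
class RequestLog {
    // A static collection lives as long as the class is loaded, so every
    // element added to it remains reachable and cannot be garbage collected.
    private static final List<byte[]> HISTORY = new ArrayList<>();

    static void handle(byte[] requestBody) {
        HISTORY.add(requestBody); // kept "just in case", never read again
    }

    static int retainedCount() {
        return HISTORY.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handle(new byte[1024]);
        }
        System.out.println("buffers still reachable: " + retainedCount());
    }
}
```

The garbage collector is working correctly here; the memory is wasted only because the program still holds the references.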

However, the memory leak discussed in this article does not fall under the reasons above, hence the quotation marks. Its actual cause was quite unexpected; see the explanation below.

General steps for analyzing memory leaks

If the memory consumed by a Java application appears to be leaking, we typically analyze it with the following steps:

1. Dump the heap used by the Java application.
2. Analyze the heap dump with a Java heap analysis tool to find the objects whose memory footprint exceeds expectations (typically because there are too many of them); these are the suspects.
3. If necessary, analyze the reference relationships between the suspect objects and other objects.
4. Review the program's source code to find out why there are too many suspect objects.

Dump Heap

If a Java application has a memory leak, do not rush to kill it; preserve the scene first. If it is an Internet application, you can switch its traffic to another server. The purpose of preserving the scene is to dump the heap of the running JVM.
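A dump can be taken with an external tool or, on a HotSpot JVM, from inside the process itself. As a minimal sketch of the in-process route (not something the original investigation used), the class below calls the JDK's HotSpotDiagnosticMXBean.dumpHeap. The class name HeapDumper and the output file name are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; assumes a HotSpot-based JVM (Java 7 or later).
class HeapDumper {
    // Writes a binary heap dump of the current JVM to the given file.
    // The target file must not exist yet; recent JDKs also expect the
    // file name to end in ".hprof".
    static Path dump(Path target, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveObjectsOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dump(dir.resolve("heap.hprof"), true);
        System.out.println("wrote " + file + " (" + Files.size(file) + " bytes)");
    }
}
```

Passing true for the second argument dumps only reachable objects, which is usually what a leak analysis needs.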

The JDK ships with the jmap tool, which can do this. It is run as follows:

    jmap -dump:format=b,file=heap.bin <pid>

format=b means that the dump file is written in binary format.

file=heap.bin means that the dump file is named heap.bin.

<pid> is the process ID of the JVM.

(On Linux) first run ps aux | grep java to find the JVM's PID, then run jmap -dump:format=b,file=heap.bin <pid> to obtain the heap dump file.

Analyze Heap

Turning the binary heap dump file into human-readable information naturally requires a specialized tool; Memory Analyzer is recommended here.

Memory Analyzer, MAT for short, is an open-source project of the Eclipse Foundation, donated by SAP and IBM. Software produced by big companies really is good: MAT can analyze heaps containing hundreds of millions of objects, quickly compute the memory size of each object and the reference relationships between objects, and automatically detect memory leak suspects. It is powerful and user-friendly.

The MAT interface is built on Eclipse and is released in two forms: an Eclipse plug-in and an Eclipse RCP application. MAT presents its analysis results as charts and reports that can be taken in at a glance. In short, I like this tool very much.


Getting down to business, I opened heap.bin with MAT and immediately saw that the number of char[] instances was far beyond expectation, occupying more than 90% of the memory. In general, char[] does take up a lot of memory in a JVM and there are a great many instances, because String objects use char[] as their internal storage. But this time the char[] instances were too greedy: on closer inspection there were tens of thousands of them, each occupying hundreds of KB of memory. This indicates that the Java program was holding tens of thousands of large String objects. Given the program's logic this should not happen, so something had to be wrong somewhere.

The Clues

I picked one of the suspicious char[] instances at random and used MAT's Path To GC Roots function to find the reference path to it. The String object that owned it turned out to be referenced by a HashMap. That was to be expected: most Java memory leaks happen because objects are left in a global HashMap and never released. However, this HashMap is used as a cache with a threshold on the number of entries, and entries are evicted automatically once the threshold is reached. By that logic there should be no memory leak. Although the cache already held tens of thousands of String objects, it had not yet reached the preset threshold (the threshold was set high because the String objects were estimated to be small).
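The article does not show how the cache enforces its threshold. As a minimal sketch of one common approach, assuming a least-recently-used policy, a LinkedHashMap can evict its eldest entry once a maximum entry count is exceeded. The class name BoundedCache is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical entry-count-bounded cache; the real program's eviction
// policy is not shown in the article, so LRU is an assumption.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // access order: least recently used is eldest
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // removes the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3"); // exceeds the limit, so "a" is evicted
        System.out.println(cache.keySet()); // prints [b, c]
    }
}
```

A limit like this bounds the number of entries, not the number of bytes they retain, which is why a cache can stay below its threshold and still consume far more memory than planned.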

However, another question caught my attention: why were the cached String objects so large? The internal char[] had a length of hundreds of K. Although the number of String objects in the cache had not reached the threshold, each String object was far larger than we expected, so a great deal of memory was consumed, producing the symptoms of a memory leak (to be precise, excessive memory consumption).

To dig further, I looked at how the String objects were put into the HashMap. Reading the program's source code, I found that there really was a large String object, but it was never put into the HashMap. Instead, the large String was split (by calling String.split), and the resulting pieces were put into the HashMap.

This was strange: what went into the HashMap were clearly the small String objects produced by the split, so how could they occupy so much space? Was something wrong with the split method of the String class?

View Code

With this question in mind, I looked at the code of the String class in Sun JDK 6, mainly the implementation of the split method:

    public String[] split(String regex, int limit) {
        return Pattern.compile(regex).split(this, limit);
    }

As you can see, String.split calls Pattern.split. Let us continue with the code of Pattern.split:

    public String[] split(CharSequence input, int limit) {
        int index = 0;
        boolean matchLimited = limit > 0;
        ArrayList<String> matchList = new ArrayList<String>();
        Matcher m = matcher(input);
        // Add segments before each match found
        while (m.find()) {
            if (!matchLimited || matchList.size() < limit - 1) {
                String match = input.subSequence(index, m.start()).toString();
                matchList.add(match);
                index = m.end();
            } else if (matchList.size() == limit - 1) { // last one
                String match = input.subSequence(index,
                                                 input.length()).toString();
                matchList.add(match);
                index = m.end();
            }
        }
        // If no match was found, return this
        if (index == 0)
            return new String[] {input.toString()};
        // Add remaining segment
        if (!matchLimited || matchList.size() < limit)
            matchList.add(input.subSequence(index, input.length()).toString());
        // Construct result
        int resultSize = matchList.size();
        if (limit == 0)
            while (resultSize > 0 && matchList.get(resultSize - 1).equals(""))
                resultSize--;
        String[] result = new String[resultSize];
        return matchList.subList(0, resultSize).toArray(result);
    }

Note line 9, the first statement inside the loop: String match = input.subSequence(index, m.start()).toString();

Here match is one of the split String objects, and it is actually the result of calling subSequence on the original String object. Let us continue with the code of String.subSequence:

    public CharSequence subSequence(int beginIndex, int endIndex) {
        return this.substring(beginIndex, endIndex);
    }

String.subSequence in turn calls String.substring, so let us keep going:

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
    }

Look at lines 11 and 12, and we finally see the point: if the substring covers the complete original string, the original String object itself is returned; otherwise a new String object is created, but this new object appears to use the original String object's char[]. We can confirm this by looking at the String constructor:

    // Package private constructor which shares value array for speed.
    String(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }
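To see what this constructor implies for memory, here is a minimal stand-in for the JDK 6 layout. SharedString is a made-up class, not the real java.lang.String; a model is used because String.substring copies its characters on JDK 7u6 and later, so the real class no longer shows the effect.

```java
import java.util.Arrays;

// Stand-in that mimics the JDK 6 String fields (value, offset, count).
// Hypothetical class for illustration only.
final class SharedString {
    final char[] value;
    final int offset;
    final int count;

    SharedString(String s) {
        this(0, s.length(), s.toCharArray());
    }

    // Mirrors the package-private constructor: the array is shared, not copied.
    private SharedString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(offset + beginIndex, endIndex - beginIndex, value);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] big = new char[200000];
        Arrays.fill(big, 'x');
        SharedString original = new SharedString(new String(big));
        SharedString small = original.substring(0, 5);
        // The 5-character piece keeps the whole 200000-element array reachable.
        System.out.println(small + " has " + small.count
                + " chars but retains " + small.value.length);
    }
}
```

As long as the small object is reachable, for example from a cache, the large array cannot be collected even after the original object is gone.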

To avoid copying memory and to gain speed, Sun JDK 6 directly reuses the original String object's char[], using offset and count to identify the different string contents. In other words, the child String objects produced by substring, split, and similar methods still point to the char[] of the original large String object. This explains why the char[] of the String objects in the HashMap is so large.

Explanation of the Cause

The previous section has in fact already analyzed the cause; this section simply summarizes it:

1. For each request, the program obtains a large String object whose internal char[] has a length of hundreds of K.
2. The program splits the large String object and puts the resulting small String objects into the HashMap, which serves as a cache.
3. Sun JDK 6 optimizes String.split so that the split String objects directly use the char[] of the original String object; every String in the HashMap therefore actually points to a huge char[].
4. The upper limit of the HashMap is in the tens of thousands of entries, so the total size of the cached String objects is roughly tens of thousands × hundreds of KB, which is on the order of gigabytes.
5. Gigabytes of memory are held by the cache and a great deal of it is wasted, producing the memory leak symptoms.
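The size estimate can be made concrete with a small calculation. The figures below (10,000 cached entries, 200 KB per retained array) are assumptions chosen only to match the orders of magnitude mentioned in this article, and the sketch assumes the worst case in which every cached entry pins a different large array.

```java
// Back-of-the-envelope estimate with assumed, illustrative figures.
class CacheFootprint {
    // Worst case: each cached String keeps its own large char[] alive.
    static long retainedBytes(long entries, long bytesPerArray) {
        return entries * bytesPerArray;
    }

    public static void main(String[] args) {
        long entries = 10_000L;           // assumed number of cached entries
        long bytesPerArray = 200L * 1024; // assumed 200 KB per retained array
        long total = retainedBytes(entries, bytesPerArray);
        double gib = total / (1024.0 * 1024 * 1024);
        System.out.println(total + " bytes, roughly " + gib + " GiB");
    }
}
```

With these assumed figures the cache pins 2,048,000,000 bytes, a little under 2 GiB, even though the useful text in each entry may be only a few bytes long.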
Solution

Once the cause is found, the solution follows. split still has to be used, but instead of putting the split String objects directly into the HashMap, we first call String's copy constructor String(String original). This constructor is safe, as its code shows:

    /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        int size = original.count;
        char[] originalValue = original.value;
        char[] v;
        if (originalValue.length > size) {
            // The array representing the String is bigger than the new
            // String itself.  Perhaps this constructor is being called
            // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off + size);
        } else {
            // The array representing the String is the same
            // size as the String, so no point in making a copy.
            v = originalValue;
        }
        this.offset = 0;
        this.count = size;
        this.value = v;
    }
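Applied to the caching code, the fix might look like the following minimal sketch. The class name SplitCache, the method cachePieces, and the key scheme are hypothetical, since the original program's code is not shown in this article.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fix: copy each piece before caching it.
class SplitCache {
    static void cachePieces(Map<String, String> cache, String keyPrefix,
                            String payload, String regex) {
        String[] pieces = payload.split(regex);
        for (int i = 0; i < pieces.length; i++) {
            // On Sun JDK 6, new String(piece) trims the backing array to the
            // piece's own length, so the large original array is not retained.
            cache.put(keyPrefix + ":" + i, new String(pieces[i]));
        }
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cachePieces(cache, "req1", "alpha,beta,gamma", ",");
        System.out.println(cache.get("req1:1")); // prints beta
    }
}
```

The explicit copy costs one small allocation per piece, which is far cheaper than keeping every large source array alive in the cache.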
