"Memory leak" caused by String.split

Source: Internet
Author: User

I have always admired Sun's rigorous and elegant technology (bless Sun). The Java library source code in the Sun JDK is clear and well organized, and its Javadoc comments are meticulous, so it is pleasant and comfortable to read. In daily work and study I therefore often read the Java library source, and when I run into a strange problem, the source code is even more helpful.

Enough preamble; back to the topic. For the past few days I have been wrestling with a Java "memory leak" problem: a Java application's memory usage kept growing steadily until it finally exceeded the monitoring threshold. Time for Sherlock Holmes to step in!

A memory leak in Java is not that clearly defined. First, if the JVM has no bugs, then in theory there is no such thing as "heap space that can never be reclaimed"; in other words, memory leaks in the C/C++ sense do not exist in Java. Second, if a Java program keeps holding a reference to an object that, in terms of program logic, will never be used again, we can consider that object leaked. If there are many such objects, a large amount of memory is obviously leaked ("wasted" is more accurate).
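As a minimal sketch of the second case, assume a hypothetical RequestLog class that records every request payload in a static list and never removes anything. The class and method names are invented for illustration. The point is only that the list keeps every payload reachable, so the garbage collector can never reclaim it even though the program never reads it again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: a static collection that only ever grows.
final class RequestLog {
    // Reachable from a GC root (a static field) for the lifetime of the class.
    private static final List<byte[]> HISTORY = new ArrayList<>();

    // Records a payload and never removes it. The program logic never reads
    // it again, so this memory is "leaked" in the Java sense.
    static void record(byte[] payload) {
        HISTORY.add(payload);
    }

    // Total number of payload bytes still held by the list.
    static long retainedBytes() {
        long total = 0;
        for (byte[] payload : HISTORY) {
            total += payload.length;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            record(new byte[1024]); // 1 KB per simulated request
        }
        System.out.println("Bytes still reachable: " + retainedBytes());
    }
}
```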

However, the memory leak discussed in this article does not fall under the cause described above, which is why it is written in quotation marks. The actual cause was quite unexpected; read on for the details.

General steps for analyzing memory leaks

 

If we find that the memory occupied by Java applications shows signs of leakage, we generally use the following steps for analysis:

  1. Dump the heap used by the Java application.
  2. Use a Java heap analysis tool to find suspect objects that occupy more memory than expected (usually because there are too many of them).
  3. If necessary, analyze the reference relationships between the suspect objects and other objects.
  4. Check the program's source code to find out why there are so many suspect objects.

Dump heap

If a Java application shows signs of a memory leak, do not rush to kill it. Instead, preserve the scene. For an Internet application, you can switch its traffic to other servers. The purpose of preserving the scene is to dump the heap of the running JVM.

JDK's built-in jmap tool can do this. The execution method is as follows:

    jmap -dump:format=b,file=heap.bin <pid>

format=b means the dump file is written in binary format.

file=heap.bin means the dump file is named heap.bin.

<pid> is the process ID of the JVM.

(On Linux) first run ps aux | grep java to find the JVM's PID, then run jmap -dump:format=b,file=heap.bin <pid> to obtain the heap dump file.
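jmap works from outside the process. As an alternative sketch, assuming a HotSpot-based JVM on Java 7 or later, a dump can also be requested from inside the application through HotSpotDiagnosticMXBean. The class name HeapDumper and the output path are hypothetical. Recent JDKs additionally expect the file name to end in .hprof and refuse to overwrite an existing file.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper; assumes a HotSpot JVM that exposes the diagnostic bean.
final class HeapDumper {
    // Writes a binary heap dump of the current JVM to the given path.
    // live = true dumps only objects that are still reachable.
    static void dump(String path, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output location; the file must not exist yet.
        File out = new File(System.getProperty("java.io.tmpdir"),
                "heap-" + System.nanoTime() + ".hprof");
        dump(out.getAbsolutePath(), true);
        System.out.println("Wrote " + out.length() + " bytes to " + out);
    }
}
```

The resulting file can be opened with the same analysis tools as a jmap dump.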

Analyze heap

Turning a binary heap dump file into human-readable information naturally requires a specialized tool. Memory Analyzer is recommended here.

Memory Analyzer (MAT) is an open-source project of the Eclipse Foundation, donated by SAP and IBM. Software produced by the big companies is indeed very useful: MAT can analyze heaps containing hundreds of millions of objects, quickly calculate the memory occupied by each object and the reference relationships between objects, and automatically detect memory leak suspects. It is powerful, and its interface is user-friendly.

The MAT interface is built on Eclipse and is released in two forms: an Eclipse plug-in and a standalone RCP application. MAT presents its analysis results as charts and reports. In short, I personally like this tool very much.

 

Back to the point. I opened heap.bin with MAT, and it was easy to see that the number of char[] instances was unexpectedly large, occupying more than 90% of the memory. In general, char[] does occupy a lot of memory in a JVM, because String objects use char[] as their internal storage. But these char[] instances were far too greedy. A closer look showed that there were tens of thousands of char[] instances, each occupying hundreds of KB of memory. This indicates that the Java program was holding tens of thousands of large String objects. Given the program's logic, this should not happen; something must have gone wrong somewhere.

Following the trail

Among the suspicious char[] instances, I picked one at random and used the Path to GC Roots function to find its reference path. It turned out that the String object was referenced by a HashMap. This in itself is not surprising: Java memory leaks are mostly caused by objects that linger in a global HashMap and are never released. However, this HashMap is used as a cache, with a threshold on the number of cache entries; once the threshold is reached, entries are evicted automatically. By that logic, there should be no memory leak. Although the number of String objects in the cache had reached the tens of thousands, it was still below the preset threshold (the threshold was set fairly high because the String objects were estimated to be small at the time).
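The article does not show how the cache enforces its entry threshold. One common way to get that behavior, shown here purely as an assumed sketch, is a LinkedHashMap whose removeEldestEntry hook evicts the oldest entry once a limit is exceeded. BoundedCache and maxEntries are hypothetical names. Note that such a limit counts entries, not bytes, which is why oversized values can slip past it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical entry-count-bounded cache; not the cache from the article.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true: the least recently used entry is the eldest.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts
    // the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3"); // exceeds the limit, so "a" is evicted
        System.out.println(cache.keySet()); // prints [b, c]
    }
}
```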

However, another question caught my attention: why were the cached String objects so huge? Their internal char[] arrays were several hundred KB long. Although the number of String objects in the cache had not reached the threshold, their size far exceeded our expectations, which led to heavy memory consumption and the symptoms of a memory leak (more precisely, excessive memory consumption).

To answer this question, I went looking for how large String objects end up in the HashMap. Reading the program's source code, I found that large String objects do exist, but they are not put into the HashMap. Instead, the program splits each large String (by calling String.split) and puts the resulting small String objects into the HashMap.

This is strange. The HashMap clearly holds the small String objects produced by split, so how can they take up so much space? Could something be wrong with the split method of the String class?

View code

With these questions in mind, I looked at the String class code in Sun JDK 6, mainly the implementation of the split method:

 

```java
public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}
```

 

We can see that String.split calls Pattern.split. Next, the code of Pattern.split:

 

```java
public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<String>();
    Matcher m = matcher(input);

    // Add segments before each match found
    while (m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }

    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize - 1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}
```

Note this statement inside the loop: String match = input.subSequence(index, m.start()).toString();

Here, match is a small String produced by the split, and it is really the result of calling subSequence on the large String. Next, the code of String.subSequence:

 

```java
public CharSequence subSequence(int beginIndex, int endIndex) {
    return this.substring(beginIndex, endIndex);
}
```

 

String.subSequence in turn calls String.substring. Read on:

 

```java
public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}
```

Look at the final return statement: if the substring covers the entire original string, the original String object itself is returned; otherwise a new String object is created, but this new object appears to reuse the char[] of the original String. The String constructor confirms this:

 

```java
// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}
```

 

To avoid memory copies and improve speed, the Sun JDK directly reuses the char[] of the original String object and uses an offset and a length to identify the different string contents. In other words, the small String produced by substring still points to the char[] of the original large String, and the same is true for split. This explains why the char[] of each String in the HashMap is so big.
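The sharing can be illustrated with a simplified model. The SharedSlice class below is hypothetical and only mimics the value/offset/count layout quoted above; it is not the real String class. A model is used because JDK 7u6 and later copy the characters in substring, so the effect does not reproduce with the real String on a current JVM.

```java
// Simplified model of the JDK 6 String layout: a view (offset, count) onto
// a char[] that may be shared with other instances. Not the real class.
final class SharedSlice {
    final char[] value;
    final int offset;
    final int count;

    SharedSlice(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Mirrors the sharing behavior: no copy, only a new view on the same array.
    SharedSlice substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedSlice(value, offset + begin, end - begin);
    }

    // Number of chars kept alive by this slice, regardless of its own length.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            big.append('x');
        }
        SharedSlice small = new SharedSlice(big.toString()).substring(0, 3);
        System.out.println(small + " has length " + small.count
                + " but retains " + small.retainedChars() + " chars");
    }
}
```

As long as the three-character slice is reachable, the whole 100,000-character array stays reachable with it.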

Cause explanation

The cause was already worked out in the previous section. This section summarizes it:

  1. The program obtains a large String object from each request. The char[] inside this object is hundreds of KB long.
  2. The program splits the large String and puts the small String objects produced by split into a HashMap used as a cache.
  3. Sun JDK 6 optimizes String.split: each String produced by split directly uses the char[] of the original String.
  4. Each String object in the HashMap therefore actually points to a huge char[].
  5. The HashMap's capacity limit is in the tens of thousands, so the total size of the cached String objects is tens of thousands × hundreds of KB = gigabytes.
  6. Gigabytes of memory are occupied by the cache and a large amount of memory is wasted, which looks like a memory leak.
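To make the estimate in the list above concrete, here is a small worked calculation. The article gives only orders of magnitude, so the figures of 20,000 entries and 200 KB per array are assumptions chosen for illustration. The calculation also assumes the worst case in which every cached entry pins a different large array.

```java
// Back-of-the-envelope estimate with assumed figures, not measured data.
final class CacheFootprint {
    // Bytes retained when every cached entry pins its own large array.
    static long retainedBytes(long entries, long bytesPerArray) {
        return entries * bytesPerArray;
    }

    public static void main(String[] args) {
        long entries = 20_000;            // assumed: "tens of thousands"
        long bytesPerArray = 200L * 1024; // assumed: "hundreds of KB"
        long total = retainedBytes(entries, bytesPerArray);
        System.out.printf("%.2f GiB%n", total / (1024.0 * 1024 * 1024));
    }
}
```

With these assumed figures the cache retains roughly 3.8 GiB, even though the cached strings themselves are small.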

Solution

With the cause found, the solution follows. split still has to be used, but the String objects it produces should not be put into the HashMap directly. Instead, call the String copy constructor String(String original) first. This constructor is safe; see its code for details:

 

```java
/**
 * Initializes a newly created {@code String} object so that it represents
 * the same sequence of characters as the argument; in other words, the
 * newly created string is a copy of the argument string. Unless an
 * explicit copy of {@code original} is needed, use of this constructor is
 * unnecessary since Strings are immutable.
 *
 * @param  original
 *         A {@code String}
 */
public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself.  Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off + size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}
```

 

However, new String(String) looks odd in code. Perhaps substring and split should offer an option that lets programmers control whether the char[] of the original String is reused.
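Applied to the split case, the fix might look like the following sketch. SafeSplit and splitDetached are hypothetical names, and the sample input is invented. Each piece is passed through the copy constructor before it is handed to a long-lived structure such as the cache.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the copy-before-caching fix.
final class SafeSplit {
    // Splits the input and returns pieces that do not share storage with it.
    static List<String> splitDetached(String input, String regex) {
        String[] pieces = input.split(regex);
        List<String> detached = new ArrayList<>(pieces.length);
        for (String piece : pieces) {
            // On JDK 6 this copy trims the piece down to its own char[].
            detached.add(new String(piece));
        }
        return detached;
    }

    public static void main(String[] args) {
        List<String> parts = splitDetached("id=42;name=demo;rest=...", ";");
        System.out.println(parts); // prints [id=42, name=demo, rest=...]
    }
}
```

The copy costs one small allocation per piece, which is the price for letting the large source string be garbage collected.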

Is it a bug?

 

Although the implementation of substring and split caused this problem, does it count as a bug in the String class? Personally, I think it is hard to say. The optimization is reasonable, because the results of substring and split are always contiguous subsequences of the original string. All one can say is that String is not merely a core class; to the JVM it is a type as important as the primitive types.

The JDK applies every possible optimization to String. But optimizations bring worries too, and we programmers can only use them well if we understand them thoroughly.