In-depth analysis of the String.intern() method

Source: Internet
Author: User

Reprint: http://tech.meituan.com/in_depth_understanding_string_intern.html

Introduction

In the Java language there are 8 primitive types and one more special type, String. To make these types faster and more memory-efficient at run time, Java provides the concept of a constant pool for them. A constant pool is similar to a cache provided at the Java system level.

The constant pools of the 8 primitive types are coordinated by the system; the constant pool of the String type is more special. There are two main ways to use it:

    • A String object declared directly with double quotation marks is stored directly in the constant pool.
    • For a String object that is not declared with double quotation marks, you can use the intern method that String provides. The intern method looks up the current string in the string constant pool and, if it is not there, puts the current string into the constant pool.

Next, we mainly discuss the String#intern method.
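The two ways listed above can be illustrated with a minimal sketch. The class and method names here are hypothetical and chosen only for this example; they do not come from the original article.

```java
// Illustrative sketch: class and method names are hypothetical.
class InternBasics {
    // Two identical literals refer to one object in the string constant pool.
    static boolean literalsShareOneObject() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // A String created with new is a separate heap object, distinct from the pooled literal.
    static boolean newStringIsSeparateObject() {
        String pooled = "hello";
        String onHeap = new String("hello");
        return pooled != onHeap;
    }

    // intern() looks up the content in the pool and returns the pooled object.
    static boolean internReturnsPooledObject() {
        String pooled = "hello";
        String onHeap = new String("hello");
        return onHeap.intern() == pooled;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject());    // true
        System.out.println(newStringIsSeparateObject()); // true
        System.out.println(internReturnsPooledObject()); // true
    }
}
```

All three checks compare references with ==, not contents with equals, because the point is which object each variable refers to.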

I. How intern is implemented

Let's take a closer look at how it's implemented.

1. Java code

    /**
     * Returns a canonical representation for the string object.
     * <p>
     * A pool of strings, initially empty, is maintained privately by the
     * class <code>String</code>.
     * <p>
     * When the intern method is invoked, if the pool already contains a
     * string equal to this <code>String</code> object as determined by
     * the {@link #equals(Object)} method, then the string from the pool is
     * returned. Otherwise, this <code>String</code> object is added to the
     * pool and a reference to this <code>String</code> object is returned.
     * <p>
     * It follows that for any two strings <code>s</code> and <code>t</code>,
     * <code>s.intern()&nbsp;==&nbsp;t.intern()</code> is <code>true</code>
     * if and only if <code>s.equals(t)</code> is <code>true</code>.
     * <p>
     * All literal strings and string-valued constant expressions are
     * interned. String literals are defined in section 3.10.5 of the
     * <cite>The Java&trade; Language Specification</cite>.
     *
     * @return  a string that has the same contents as this string, but is
     *          guaranteed to be from a pool of unique strings.
     */
    public native String intern();

String#intern is a native method, but the comment is very clear: if the constant pool already contains the current string, the string from the pool is returned directly; if it does not, the string is put into the constant pool and then returned.
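The property stated in the comment, that s.intern() == t.intern() holds exactly when s.equals(t) holds, can be checked with a short sketch. The class name and the build helper are assumptions made for illustration; strings are built at run time so that they are not compile-time constants.

```java
// Illustrative sketch: the class and helper names are hypothetical.
class InternContract {
    // Builds a String at run time, so each call returns a new heap object.
    static String build(String prefix, int n) {
        return new StringBuilder(prefix).append(n).toString();
    }

    // Checks the documented property for one pair of strings.
    static boolean contractHolds(String s, String t) {
        return (s.intern() == t.intern()) == s.equals(t);
    }

    public static void main(String[] args) {
        String s = build("key-", 42);
        String t = build("key-", 42);
        String u = build("key-", 43);
        System.out.println(s == t);                   // false: two distinct heap objects
        System.out.println(s.intern() == t.intern()); // true: equal content, one canonical object
        System.out.println(contractHolds(s, u));      // true: unequal content, different canonical objects
    }
}
```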

2. Native code

Since JDK 7, Oracle has taken over the Java source code and no longer releases it publicly. According to statements by the main JDK developers, OpenJDK 7 and JDK 7 share the same main code base, with only minor changes on the branches. So we can follow the OpenJDK 7 source code directly to explore how intern is implemented.

#### Native implementation code:
\openjdk7\jdk\src\share\native\java\lang\string.c

    Java_java_lang_String_intern(JNIEnv *env, jobject this)
    {
        return JVM_InternString(env, this);
    }

\openjdk7\hotspot\src\share\vm\prims\jvm.h

    /*
     * java.lang.String
     */
    JNIEXPORT jstring JNICALL
    JVM_InternString(JNIEnv *env, jstring str);

\openjdk7\hotspot\src\share\vm\prims\jvm.cpp

    // String support ///////////////////////////////////////////////////////////////////////////
    JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
      JVMWrapper("JVM_InternString");
      JvmtiVMObjectAllocEventCollector oam;
      if (str == NULL) return NULL;
      oop string = JNIHandles::resolve_non_null(str);
      oop result = StringTable::intern(string, CHECK_NULL);
      return (jstring) JNIHandles::make_local(env, result);
    JVM_END

\openjdk7\hotspot\src\share\vm\classfile\symboltable.cpp

    oop StringTable::intern(Handle string_or_null, jchar* name,
                            int len, TRAPS) {
      unsigned int hashValue = java_lang_String::hash_string(name, len);
      int index = the_table()->hash_to_index(hashValue);
      oop string = the_table()->lookup(index, name, len, hashValue);

      // Found
      if (string != NULL) return string;

      // Otherwise, add to symbol to table
      return the_table()->basic_add(index, string_or_null, name, len,
                                    hashValue, CHECK_NULL);
    }

\openjdk7\hotspot\src\share\vm\classfile\symboltable.cpp

    oop StringTable::lookup(int index, jchar* name,
                            int len, unsigned int hash) {
      for (HashtableEntry<oop>* l = bucket(index); l != NULL; l = l->next()) {
        if (l->hash() == hash) {
          if (java_lang_String::equals(l->literal(), name, len)) {
            return l->literal();
          }
        }
      }
      return NULL;
    }

Its general structure is as follows:
Java calls, through JNI, the intern method of StringTable, which is implemented in C++. The intern method of StringTable is similar to the HashMap implementation in Java, except that the table cannot grow automatically. Its default size is 1009.

Note that the String pool is a fixed-size hashtable whose default size is 1009. If you put a great many strings into the pool, hash collisions become severe and the bucket lists grow long. The direct effect of long lists is that String.intern performance drops significantly, because each call has to search a list entry by entry.
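The effect of a fixed bucket count can be sketched with a small model. This is a hypothetical Java model of a fixed-size chained hash table, not HotSpot's StringTable; the class name, the use of String.hashCode, and the bucket counts in main are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a fixed-size chained hash table; it is NOT HotSpot's StringTable.
class FixedStringTable {
    private final List<List<String>> buckets;

    FixedStringTable(int size) {
        buckets = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Search the bucket chain first; add the string only if no equal entry exists.
    String intern(String s) {
        List<String> chain = buckets.get(Math.floorMod(s.hashCode(), buckets.size()));
        for (String entry : chain) { // linear scan: cost grows with the chain length
            if (entry.equals(s)) {
                return entry;
            }
        }
        chain.add(s);
        return s;
    }

    // Average number of entries per bucket; the table never resizes, so this grows with the load.
    double averageChainLength() {
        long total = 0;
        for (List<String> chain : buckets) {
            total += chain.size();
        }
        return (double) total / buckets.size();
    }

    public static void main(String[] args) {
        FixedStringTable small = new FixedStringTable(1009);
        FixedStringTable large = new FixedStringTable(99991);
        for (int i = 0; i < 100_000; i++) {
            String s = "str-" + i;
            small.intern(s);
            large.intern(s);
        }
        System.out.println("1009 buckets:  " + small.averageChainLength());
        System.out.println("99991 buckets: " + large.averageChainLength());
    }
}
```

With the same 100,000 strings, the smaller table ends up with roughly a hundred entries per bucket on average while the larger one stays near one, which is why every lookup in the small table has more entries to scan.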

In JDK 6 the StringTable is fixed at a length of 1009, so if too many strings are put into the constant pool, efficiency drops quickly. In JDK 7 the StringTable length can be specified with a parameter:

    • -XX:StringTableSize=99991

II. Differences in intern between JDK 6 and JDK 7

Many Java programmers have probably met the interview question of how many objects the statement String s = new String("abc") creates. The question mainly tests whether the programmer understands the constant pool for String objects. The statement creates 2 objects: the first is the "abc" string stored in the constant pool, and the second is a String object in the Java heap.

Take a look at the code:

    public static void main(String[] args) {
        String s = new String("1");
        s.intern();
        String s2 = "1";
        System.out.println(s == s2);

        String s3 = new String("1") + new String("1");
        s3.intern();
        String s4 = "11";
        System.out.println(s3 == s4);
    }

The printed results are:

    • Under JDK 6: false false
    • Under JDK 7: false true

The reasons are explained later. Now move the statement s3.intern(); down one line, placing it after String s4 = "11";, and likewise place s.intern(); after String s2 = "1";. What is the result?

    public static void main(String[] args) {
        String s = new String("1");
        String s2 = "1";
        s.intern();
        System.out.println(s == s2);

        String s3 = new String("1") + new String("1");
        String s4 = "11";
        s3.intern();
        System.out.println(s3 == s4);
    }

The printed results are:

    • Under JDK 6: false false
    • Under JDK 7: false false

#### 1. Explanation for JDK 6

Note: in the figures, green lines point to the content of a String object, and black lines represent address references.

Start with JDK 6. All of the comparisons above print false in JDK 6 because the constant pool is placed in the Perm area, which is completely separate from the normal Java heap. As mentioned above, a string declared with quotation marks is created directly in the string constant pool, while a String object created with new is placed in the Java heap. Comparing the address of an object in the Java heap with the address of an object in the string constant pool can therefore never yield equality, regardless of whether String.intern is called.

#### 2. Explanation for JDK 7

Now for JDK 7. To be clear: in JDK 6 and earlier versions, the string constant pool is placed in the Perm area of the heap. The Perm area is a static area for classes that mainly stores loaded class information, constant pools, method fragments, and similar content. Its default size is only 4 MB, so heavy use of intern directly causes a java.lang.OutOfMemoryError: PermGen space error. In JDK 7, therefore, the string constant pool was moved from the Perm area to the normal Java heap. A major reason for the move is that the Perm area is too small. In addition, reports say that JDK 8 removes the Perm area entirely and introduces a new metadata area (Metaspace) in its place. Presumably the JDK developers consider the Perm area no longer suitable for Java as it is used today.

Now that we know the string constant pool has moved to the Java heap, we can explain the printed results above.

  • In the first piece of code, look at s3 and s4 first. The statement String s3 = new String("1") + new String("1"); ends up generating 2 objects: the "1" in the string constant pool and the object in the Java heap that s3 references. There are also 2 anonymous new String("1") objects in between, which we do not discuss. At this point the object that s3 references has the content "11", but there is no "11" object in the constant pool.
  • The next line, s3.intern();, puts the "11" string referenced by s3 into the string constant pool. Because "11" is not yet in the pool, the conventional approach would be to create a "11" object in the pool, as shown in the JDK 6 figure. The key point is that in JDK 7 the constant pool is no longer in the Perm area, and this part has been adjusted: the pool does not need to store another copy of the object and can directly store a reference to the object in the heap. That reference points to the object referenced by s3, so the reference addresses are the same.
  • In the last statement, String s4 = "11";, the "11" is declared explicitly, so the JVM goes directly to the constant pool to create it. It finds that an entry already exists, namely the reference to the object that s3 references. So s4 points to the same object as s3, and the final comparison s3 == s4 is true.

  • Now look at s and s2. The first line, String s = new String("1");, generates 2 objects: the "1" in the constant pool and a String object in the Java heap. The call s.intern(); makes s look in the constant pool, where it finds that "1" already exists.

  • The next line, String s2 = "1";, makes s2 a reference to the "1" object in the constant pool. As a result, s and s2 clearly have different reference addresses. The figure shows this clearly.

    • Now look at the second piece of code, shown in the second figure above. The only change from the first piece is the order: s3.intern(); is placed after String s4 = "11";. When String s4 = "11"; is executed first, there is no "11" object in the constant pool, so the declaration of s4 creates a new "11" object there. When s3.intern(); is then executed, "11" already exists in the constant pool, so s3 and s4 refer to different objects.
    • For s and s2 in the second piece of code, s.intern(); has no effect on what follows, because the "1" object was already put in the pool when the first statement, String s = new String("1");, was executed. The declaration of s2 below references the object in the constant pool directly, so the reference addresses of s and s2 are not equal.
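The JDK 7 behaviour explained above can be checked with a minimal sketch. It assumes a HotSpot JVM of version 7 or later; the class name and the freshString helper are hypothetical, and the timestamp suffix is only a way to obtain content that is very unlikely to be in the pool already.

```java
// Illustrative sketch, assuming HotSpot on JDK 7 or later; names are hypothetical.
class InternHeapReference {
    // Builds a string whose content is very unlikely to be in the pool already.
    static String freshString() {
        return new StringBuilder("intern-demo-").append(System.nanoTime()).toString();
    }

    public static void main(String[] args) {
        String fresh = freshString();
        // The pool records a reference to this heap object instead of a copy,
        // so intern() is expected to return the very same object.
        System.out.println(fresh.intern() == fresh);

        String copy = new String("1");
        // The literal "1" is already in the pool, so intern() returns that pooled
        // object rather than the heap copy.
        System.out.println(copy.intern() == copy);
    }
}
```

On such a JVM the first comparison is expected to be true and the second false, which matches the s3/s4 and s/s2 cases in the first piece of code.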

### Summary
From the example code above, we can see that JDK 7 changed the intern operation and the constant pool in two main ways:

    • The string constant pool was moved from the Perm area to the Java heap.
    • When String#intern is called and the object already exists in the heap, a reference to that object is stored directly and no new object is created.

III. Using intern

1. Example of using intern correctly

Now let's look at a fairly common example of how to use String#intern.

The code is as follows:

    static final int MAX = 1000 * 10000;
    static final String[] arr = new String[MAX];

    public static void main(String[] args) throws Exception {
        Integer[] DB_DATA = new Integer[10];
        Random random = new Random(10 * 10000);
        for (int i = 0; i < DB_DATA.length; i++) {
            DB_DATA[i] = random.nextInt();
        }
        long t = System.currentTimeMillis();
        for (int i = 0; i < MAX; i++) {
            //arr[i] = new String(String.valueOf(DB_DATA[i % DB_DATA.length]));
            arr[i] = new String(String.valueOf(DB_DATA[i % DB_DATA.length])).intern();
        }
        System.out.println((System.currentTimeMillis() - t) + "ms");
        System.gc();
    }

The run parameters are -Xmx2g -Xms2g -Xmn1500M. The code above is a demo with two alternative statements, one that uses intern and one that does not. The results are as follows:

With intern: 2160ms

Without intern: 826ms

From these results we found that the code without intern generated 10 million strings, occupying about 640 MB. The code with intern generated 1345 strings, occupying about 133 KB in total. In fact, the program itself uses only 10 distinct strings, so an exact calculation would give a difference of exactly 1 million times. Although the example is extreme, it accurately reflects the huge space savings that intern can bring.
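The space effect can also be observed without a heap profiler by counting distinct String objects by identity. The following is a reduced sketch under stated assumptions: the class name, the helper methods, the sample values, and the array size of 100,000 are all chosen for illustration and are not the article's measurement setup.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Illustrative sketch: names, values, and sizes are hypothetical.
class InternMemorySketch {
    // Counts distinct String objects (by identity, not by content) held in the array.
    static int distinctObjects(String[] arr) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Collections.addAll(seen, arr);
        return seen.size();
    }

    // Fills an array from a small set of values, with or without intern().
    static String[] fill(int size, int[] values, boolean useIntern) {
        String[] arr = new String[size];
        for (int i = 0; i < size; i++) {
            String s = new String(String.valueOf(values[i % values.length]));
            arr[i] = useIntern ? s.intern() : s;
        }
        return arr;
    }

    public static void main(String[] args) {
        int[] values = {11, 22, 33, 44, 55, 66, 77, 88, 99, 110};
        System.out.println(distinctObjects(fill(100_000, values, false))); // 100000
        System.out.println(distinctObjects(fill(100_000, values, true)));  // 10
    }
}
```

Without intern every array slot keeps its own object alive; with intern the array holds only one object per distinct value, and the temporary copies become garbage.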

Attentive readers will notice that the running time increases somewhat after using intern. This is because the program performs an intern operation after every new String. The extra time is unavoidable when memory is plentiful, but in normal use memory is certainly not unlimited, and the garbage-collection time caused by the space that non-interned strings occupy is far greater than this cost. After all, 10 million intern calls here took only a little over 1 second more.
