Reprint: http://tech.meituan.com/in_depth_understanding_string_intern.html
Introduction
Java has 8 primitive types plus one rather special type, String. To make these types faster and more memory-efficient at run time, Java provides a constant pool concept for them. A constant pool is similar to a cache provided at the Java system level.
The constant pools for the 8 primitive types are managed by the system, whereas the constant pool for String is more special. There are two main ways to use it:
- String objects declared directly with double quotation marks are stored directly in the constant pool.
- For String objects not declared with double quotation marks, you can use the intern method provided by String. The intern method looks up the current string in the string constant pool and, if it is not there, puts the current string into the pool.
Below we mainly discuss the String#intern method.
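The two ways a string ends up in the pool can be sketched as follows. This is a minimal illustration, not code from the original article; the class name PoolBasics and its method names are made up for the example.

```java
/** Hypothetical demo class: shows literals sharing one pooled object and intern() finding it. */
final class PoolBasics {
    /** Two identical literals refer to the same pooled object. */
    static boolean literalsShareOneObject() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    /** A string assembled at run time is a separate heap object until it is interned. */
    static boolean internFindsPooledLiteral() {
        String literal = "hello";
        String built = new StringBuilder("hel").append("lo").toString();
        return built != literal && built.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject());
        System.out.println(internFindsPooledLiteral());
    }
}
```

Both methods return true: the literal is pooled when it is first used, and intern() on the run-time string returns that pooled object rather than the string it was called on.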
I. How intern is implemented
Let's take a closer look at how it's implemented.
1. Java code
    /**
     * Returns a canonical representation for the string object.
     * <p>
     * A pool of strings, initially empty, is maintained privately by the
     * class <code>String</code>.
     * <p>
     * When the intern method is invoked, if the pool already contains a
     * string equal to this <code>String</code> object as determined by
     * the {@link #equals(Object)} method, then the string from the pool is
     * returned. Otherwise, this <code>String</code> object is added to the
     * pool and a reference to this <code>String</code> object is returned.
     * <p>
     * It follows that for any two strings <code>s</code> and <code>t</code>,
     * <code>s.intern() == t.intern()</code> is <code>true</code>
     * if and only if <code>s.equals(t)</code> is <code>true</code>.
     * <p>
     * All literal strings and string-valued constant expressions are
     * interned. String literals are defined in section 3.10.5 of the
     * <cite>The Java™ Language Specification</cite>.
     *
     * @return  a string that has the same contents as this string, but is
     *          guaranteed to be from a pool of unique strings.
     */
    public native String intern();
String#intern is a native method, but its Javadoc is very clear: if the constant pool already contains a string equal to the current one, the string from the pool is returned; if not, the current string is added to the constant pool and then returned.
2. Native code
Since JDK 7, after Oracle took over Java, the JDK source has no longer been published openly. According to statements from core JDK developers, however, OpenJDK 7 and JDK 7 share the same main code base, with only minor changes in the branch code. So we can trace the implementation of intern directly in the OpenJDK 7 source.
#### Native implementation code:
\openjdk7\jdk\src\share\native\java\lang\string.c
    JNIEXPORT jobject JNICALL
    Java_java_lang_String_intern(JNIEnv *env, jobject this)
    {
        return JVM_InternString(env, this);
    }
\openjdk7\hotspot\src\share\vm\prims\jvm.h
    /*
     * java.lang.String
     */
    JNIEXPORT jstring JNICALL
    JVM_InternString(JNIEnv *env, jstring str);
\openjdk7\hotspot\src\share\vm\prims\jvm.cpp
    // String support ///////////////////////////////////////////////////////////////////////////
    JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
      JVMWrapper("JVM_InternString");
      JvmtiVMObjectAllocEventCollector oam;
      if (str == NULL) return NULL;
      oop string = JNIHandles::resolve_non_null(str);
      oop result = StringTable::intern(string, CHECK_NULL);
      return (jstring) JNIHandles::make_local(env, result);
    JVM_END
\openjdk7\hotspot\src\share\vm\classfile\symboltable.cpp
    oop StringTable::intern(Handle string_or_null, jchar* name,
                            int len, TRAPS) {
      unsigned int hashValue = java_lang_String::hash_string(name, len);
      int index = the_table()->hash_to_index(hashValue);
      oop string = the_table()->lookup(index, name, len, hashValue);

      // Found
      if (string != NULL) return string;

      // Otherwise, add to symbol to table
      return the_table()->basic_add(index, string_or_null, name, len,
                                    hashValue, CHECK_NULL);
    }
\openjdk7\hotspot\src\share\vm\classfile\symboltable.cpp
    oop StringTable::lookup(int index, jchar* name,
                            int len, unsigned int hash) {
      for (HashtableEntry<oop>* l = bucket(index); l != NULL; l = l->next()) {
        if (l->hash() == hash) {
          if (java_lang_String::equals(l->literal(), name, len)) {
            return l->literal();
          }
        }
      }
      return NULL;
    }
Its overall structure is roughly as follows: Java calls, through JNI, the intern method of StringTable, which is implemented in C++. The intern method of StringTable is similar to the implementation of HashMap in Java, except that the table cannot grow automatically. Its default size is 1009.
Note that the String pool is a fixed-size Hashtable with a default size of 1009. If you put a very large number of strings into the pool, hash collisions become severe and the bucket chains grow long. The direct effect of long chains is that performance drops significantly when String.intern is called (because the entries in a chain have to be searched one by one).
In JDK 6 the StringTable is fixed at a length of 1009, so if the constant pool holds too many strings, efficiency drops quickly. In JDK 7, the length of the StringTable can be specified with a parameter:
-XX:StringTableSize=99991
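The effect of a fixed bucket count can be sketched with a toy chained hash table. This is only a model written for illustration, not HotSpot's StringTable; the class FixedBucketTable, its methods, and the "key-" test strings are all assumptions of the example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** Toy model of a fixed-size, chained hash table (not HotSpot's actual StringTable). */
final class FixedBucketTable {
    private final List<LinkedList<String>> buckets = new ArrayList<>();

    FixedBucketTable(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    /** Returns the pooled copy of s, adding s itself if no equal string is present. */
    String intern(String s) {
        LinkedList<String> chain = buckets.get(Math.floorMod(s.hashCode(), buckets.size()));
        for (String pooled : chain) {   // linear scan: cost grows with the chain length
            if (pooled.equals(s)) {
                return pooled;
            }
        }
        chain.add(s);
        return s;
    }

    /** Length of the longest chain in the table. */
    int longestChain() {
        int max = 0;
        for (LinkedList<String> chain : buckets) {
            max = Math.max(max, chain.size());
        }
        return max;
    }

    /** Interns `strings` distinct keys into a table with `bucketCount` buckets. */
    static int longestChainFor(int bucketCount, int strings) {
        FixedBucketTable table = new FixedBucketTable(bucketCount);
        for (int i = 0; i < strings; i++) {
            table.intern("key-" + i);
        }
        return table.longestChain();
    }

    public static void main(String[] args) {
        System.out.println("1009 buckets:  longest chain = " + longestChainFor(1009, 100_000));
        System.out.println("99991 buckets: longest chain = " + longestChainFor(99991, 100_000));
    }
}
```

With 100,000 distinct strings and only 1009 buckets, at least one chain must hold 100 or more entries, so every lookup in that bucket walks a long list. A larger table keeps the chains short, which is what raising -XX:StringTableSize aims for.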
II. Differences in intern between JDK 6 and JDK 7
Many Java programmers have probably met interview questions such as "How many objects does the statement String s = new String("abc") create?" The main purpose of such questions is to test whether the programmer understands the string constant pool. The statement above creates 2 objects: the first is the "abc" string stored in the constant pool, and the second is a String object in the Java heap.
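A small check of this claim, as a sketch: the class and method names below are invented for the example. It only shows that each new String("abc") is a heap object distinct from the pooled literal; it does not count allocations.

```java
/** Hypothetical demo class: heap copies made with new are distinct from the pooled literal. */
final class NewStringObjects {
    static boolean heapCopiesAreDistinct() {
        String literal = "abc";          // the pooled object
        String s1 = new String("abc");   // a separate object in the heap
        String s2 = new String("abc");   // another separate object in the heap
        return s1 != literal
                && s2 != literal
                && s1 != s2
                && s1.equals(literal)
                && s2.equals(literal);
    }

    public static void main(String[] args) {
        System.out.println(heapCopiesAreDistinct());
    }
}
```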
Take a look at the code:
    public static void main(String[] args) {
        String s = new String("1");
        s.intern();
        String s2 = "1";
        System.out.println(s == s2);

        String s3 = new String("1") + new String("1");
        s3.intern();
        String s4 = "11";
        System.out.println(s3 == s4);
    }
The output is:
- Under JDK 6:
false false
- Under JDK 7:
false true
We will explain why later. Next, move the statement s3.intern(); down one line so that it comes after String s4 = "11";, and move s.intern(); so that it comes after String s2 = "1";. What is the result?
    public static void main(String[] args) {
        String s = new String("1");
        String s2 = "1";
        s.intern();
        System.out.println(s == s2);

        String s3 = new String("1") + new String("1");
        String s4 = "11";
        s3.intern();
        System.out.println(s3 == s4);
    }
The output is:
- Under JDK 6:
false false
- Under JDK 7:
false false
#### 1. Explanation for JDK 6
Note: in the figure, green lines point to the content of a String object, and black lines represent address references.
As shown in the figure, let's start with JDK 6. All of the comparisons above print false under JDK 6, because the constant pool in JDK 6 lives in the Perm area, which is completely separate from the normal Java heap. As mentioned above, a string declared with quotation marks is created directly in the string constant pool, whereas a String object created with new is placed in the Java heap. Comparing the address of an object in the Java heap with the address of an object in the string constant pool can therefore never yield equality, and calling String.intern makes no difference.
#### 2. Explanation for JDK 7
Now for JDK 7. One thing should be made clear first: in JDK 6 and earlier, the string constant pool lives in the Perm area of the heap. The Perm area is a static area for class data that mainly stores loaded class information, constant pools, method fragments and similar content, and its default size is only 4 MB. Once intern is used heavily on the constant pool, the direct result is a java.lang.OutOfMemoryError: PermGen space error. So in JDK 7 the string constant pool was moved from the Perm area to the normal Java heap. The Perm area being too small is one major reason for the move. In addition, JDK 8 is reported to have removed the Perm area altogether and introduced a new metaspace in its place. Presumably the JDK developers felt that the Perm area no longer suited the way Java is developing.
With the string constant pool now in the Java heap, let's explain why the code above prints what it does.
- In the first piece of code, look at s3 and s4 first. The statement String s3 = new String("1") + new String("1"); ends up producing 2 final objects: the "1" in the string constant pool and the object in the Java heap that s3 refers to. Two anonymous new String("1") objects are created along the way, but we will not discuss them. At this point the object s3 refers to has the content "11", but there is no "11" object in the constant pool.
- The next line, s3.intern();, puts the "11" string of s3 into the string constant pool. Because "11" is not yet in the pool, the conventional approach, as shown in the JDK 6 figure, would be to create a new "11" object in the pool. The key point is that in JDK 7 the constant pool is no longer in the Perm area, and this part was adjusted: the pool no longer needs to store a separate copy of the object and can store a reference to the heap object directly. That reference points to the object s3 refers to, so the reference addresses are the same.
- In the last line, String s4 = "11";, the "11" is declared explicitly, so it is looked up in the constant pool directly. The lookup finds that the entry already exists, and that entry is a reference to the object s3 refers to. So s4 points to the same object as s3, and the final comparison s3 == s4 is true.
- Now look at s and s2. The first line, String s = new String("1");, produces 2 objects: the "1" in the constant pool and a String object in the Java heap. The statement s.intern(); then has s look in the constant pool, where it finds that "1" already exists.
- The next line, String s2 = "1";, makes s2 a reference to the "1" object in the constant pool. As a result, s and s2 clearly hold different reference addresses. The figure shows this clearly.
- Now the second piece of code, shown in the second figure above. The only change from the first piece is the order: s3.intern(); is placed after String s4 = "11";. So String s4 = "11"; runs first. At that point there is no "11" object in the constant pool, so after it runs, the "11" in the pool is the new object created by the declaration of s4. When s3.intern(); then runs, "11" already exists in the pool, so s3 and s4 refer to different objects.
- For s and s2 in the second piece of code, the statement s.intern(); has no effect on what follows, because the "1" object was already created in the pool when the first line, String s = new String("1");, executed. The declaration of s2 below therefore takes its reference directly from the constant pool, and the reference addresses of s and s2 are not equal.
### Summary
From the examples above we can see that JDK 7 changed the intern operation and the constant pool in 2 main ways:
- The string constant pool was moved from the Perm area to the Java heap.
- When String#intern is called and the object already exists in the heap, a reference to that object is stored directly instead of creating a new object.
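The second point can be observed directly on JDK 7 or later with a sketch like the following. The class InternIdentity and its methods are hypothetical; the nanoTime suffix is only a way to build a string whose content is very unlikely to be in the pool already.

```java
/** Hypothetical demo class: observes which object intern() returns on JDK 7 and later. */
final class InternIdentity {
    /** Builds a string at run time whose content is very unlikely to be pooled already. */
    static String freshString() {
        return new StringBuilder("intern-demo-").append(System.nanoTime()).toString();
    }

    /** On JDK 7+, interning a not-yet-pooled string returns the heap object itself. */
    static boolean internReturnsSameObject() {
        String s = freshString();
        return s.intern() == s;
    }

    /** If an equal literal is already pooled, intern() returns that literal instead. */
    static boolean internReturnsExistingLiteral() {
        String literal = "11";
        String built = new StringBuilder("1").append("1").toString();
        return built != literal && built.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(internReturnsSameObject());
        System.out.println(internReturnsExistingLiteral());
    }
}
```

The two methods mirror the two orderings discussed above: when intern runs before any equal literal is pooled, the pool entry is the heap object itself; when the literal is pooled first, intern returns the literal and the heap object stays a separate instance.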
III. Using intern
1. An example of using intern correctly
Next, let's look at a fairly common example of using the String#intern method.
The code is as follows:
    import java.util.Random;

    static final int MAX = 1000 * 10000;
    static final String[] arr = new String[MAX];

    public static void main(String[] args) throws Exception {
        Integer[] DB_DATA = new Integer[10];
        Random random = new Random(10 * 10000);
        for (int i = 0; i < DB_DATA.length; i++) {
            DB_DATA[i] = random.nextInt();
        }
        long t = System.currentTimeMillis();
        for (int i = 0; i < MAX; i++) {
            // Variant without intern:
            //arr[i] = new String(String.valueOf(DB_DATA[i % DB_DATA.length]));
            // Variant with intern:
            arr[i] = new String(String.valueOf(DB_DATA[i % DB_DATA.length])).intern();
        }
        System.out.println((System.currentTimeMillis() - t) + "ms");
        System.gc();
    }
The JVM parameters used are: -Xmx2g -Xms2g -Xmn1500M
The code above is a demo in which two statements differ: one uses intern and the other does not. The results are as follows:
2160ms (with intern)
826ms (without intern)
From these results we find that the code without intern creates 10 million strings, occupying about 640 MB. The code with intern creates 1,345 strings, occupying about 133 KB in total. In fact, the program itself uses only 10 distinct strings, so strictly speaking the difference should be exactly a factor of 1 million. Although the example is extreme, it does accurately reflect the huge space savings that intern brings.
Careful readers will notice that the run time goes up somewhat when intern is used. This is because each iteration of the program calls intern after new String, and that call takes time. With ample memory this overhead is indeed unavoidable, but in everyday use memory is never unlimited, and the time the JVM spends on garbage collection when space is wasted by not using intern is far greater than this overhead. After all, 10 million calls to intern here cost only about 1 second more.
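The space saving can be checked on a much smaller scale by counting distinct object identities instead of measuring heap usage. The sketch below is an illustration under that assumption; the class InternDedup and its parameters are invented for the example and are not part of the benchmark above.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical demo class: counts distinct String objects with and without intern(). */
final class InternDedup {
    /**
     * Creates `total` strings drawn from `distinctValues` different contents and
     * returns how many distinct objects (by identity, not equals) are retained.
     */
    static int distinctObjects(int total, int distinctValues, boolean useIntern) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (int i = 0; i < total; i++) {
            String s = new String(String.valueOf(i % distinctValues));
            identities.add(useIntern ? s.intern() : s);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println("without intern: " + distinctObjects(10_000, 10, false));
        System.out.println("with intern:    " + distinctObjects(10_000, 10, true));
    }
}
```

Without intern every iteration keeps its own String object, while with intern only one object per distinct content is retained, which is the same effect the 10-million-element benchmark shows in terms of memory.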
In-Depth Analysis of the String.intern() Method