The implementation of string constants and String.intern in Java

Source: Internet
Author: User

Every Java class file has a constant pool (constantpool). It holds constants for classes, methods, and interfaces, and it also holds string constants. A string constant entry typically holds either a symbolic reference (a Symbol) or a reference to a real String object.

Let's take a look at a simple piece of code and its decompiled bytecode.

public class Test {
    public static void main(String[] args) {
        String test = "Test";
    }
}


Constant pool:
   #1 = Class              #2             //  Test
   #2 = Utf8               Test
   #3 = Class              #4             //  java/lang/Object
   #4 = Utf8               java/lang/Object
   #5 = Utf8               <init>
   #6 = Utf8               ()V
   #7 = Utf8               Code
   #8 = Methodref          #3.#9          //  java/lang/Object."<init>":()V
   #9 = NameAndType        #5:#6          //  "<init>":()V
  #10 = Utf8               LineNumberTable
  #11 = Utf8               LocalVariableTable
  #12 = Utf8               this
  #13 = Utf8               LTest;
  #14 = Utf8               main
  #15 = Utf8               ([Ljava/lang/String;)V
  #16 = String             #2             //  Test
  #17 = Utf8               args
  #18 = Utf8               [Ljava/lang/String;
  #19 = Utf8               Ljava/lang/String;
  #20 = Utf8               SourceFile
  #21 = Utf8               Test.java
{
  public Test();
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #8                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 2: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   LTest;

  public static void main(java.lang.String[]);
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=1, locals=2, args_size=1
         0: ldc           #16                 // String Test
         2: astore_1
         3: return
      LineNumberTable:
        line 6: 0
        line 15: 3
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       4     0  args   [Ljava/lang/String;
            3       1     1  test   Ljava/lang/String;
}

In the decompiled bytecode, we see these constant pool definitions:

#16 = String #2

#2 = Utf8 Test

When the class is loaded, the content of this string constant, Test, is stored as a Symbol, which holds the characters as a UTF-8 encoded C char array. Entry #16 does not hold the characters itself; it stores the index #2, and this link is set up directly when the class is loaded.

The statement String test = "Test" corresponds to these instructions:

0: ldc #16

2: astore_1

You can see that the statement is split into two parts: ldc #16 loads the string reference, and astore_1 saves that reference into the local variable test.

So let's take a look at how the ldc instruction is implemented. In interpreterRuntime.cpp we see the implementation of ldc:

IRT_ENTRY(void, InterpreterRuntime::ldc(JavaThread* thread, bool wide))
  // access constant pool
  constantPoolOop pool = method(thread)->constants();
  int index = wide ? get_index_u2(thread, Bytecodes::_ldc_w) : get_index_u1(thread, Bytecodes::_ldc);
  constantTag tag = pool->tag_at(index);

  if (tag.is_unresolved_klass() || tag.is_klass()) {
    klassOop klass = pool->klass_at(index, CHECK);
    oop java_class = klass->java_mirror();
    thread->set_vm_result(java_class);
  } else {
#ifdef ASSERT
    // If we entered this runtime routine, we believed the tag contained
    // an unresolved string, an unresolved class or a resolved class.
    // However, another thread could have resolved the unresolved string
    // or class by the time we go there.
    assert(tag.is_unresolved_string() || tag.is_string(), "expected string");
#endif
    oop s_oop = pool->string_at(index, CHECK);
    thread->set_vm_result(s_oop);
  }
IRT_END

Because this is a string constant, execution reaches pool->string_at(index, CHECK), which ends up in:

oop constantPoolOopDesc::string_at_impl(constantPoolHandle this_oop, int which, TRAPS) {
  oop str = NULL;
  CPSlot entry = this_oop->slot_at(which);
  if (entry.is_metadata()) {
    ObjectLocker ol(this_oop, THREAD);
    if (this_oop->tag_at(which).is_unresolved_string()) {
      // Intern string
      Symbol* sym = this_oop->unresolved_string_at(which);
      str = StringTable::intern(sym, CHECK_(constantPoolOop(NULL)));
      this_oop->string_at_put(which, str);
    } else {
      // Another thread beat us and interned string, read string from constant pool
      str = this_oop->resolved_string_at(which);
    }
  } else {
    str = entry.get_oop();
  }
  assert(java_lang_String::is_instance(str), "must be String");
  return str;
}

Before ldc is executed, the string constant is represented only by a Symbol. The first time ldc is executed, a String reference is created by calling StringTable::intern and stored back into the constant pool. On later executions, the ldc instruction takes the String reference directly from the constant pool at index #16 (this_oop->resolved_string_at(which)).
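The resolve-once behaviour described above can be modelled in plain Java. This is only a sketch under stated assumptions: the class ConstantPoolSlotSketch, its fields, and the resolutions counter are hypothetical names made up for illustration, a byte array stands in for the Symbol, and String.intern() stands in for StringTable::intern. It is not how HotSpot lays out a constant pool.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of one constant pool slot; not HotSpot's data structure.
class ConstantPoolSlotSketch {
    private final byte[] symbol;   // UTF-8 bytes, standing in for the Symbol
    private String resolved;       // stays null until the first ldc
    int resolutions = 0;           // how many times resolution actually ran

    ConstantPoolSlotSketch(String text) {
        this.symbol = text.getBytes(StandardCharsets.UTF_8);
    }

    // Stand-in for string_at_impl: resolve once under a lock, then reuse.
    synchronized String ldc() {
        if (resolved == null) {
            // Stand-in for StringTable::intern(sym, ...)
            resolved = new String(symbol, StandardCharsets.UTF_8).intern();
            resolutions++;
        }
        return resolved;
    }

    public static void main(String[] args) {
        ConstantPoolSlotSketch slot = new ConstantPoolSlotSketch("Test");
        String first = slot.ldc();
        String second = slot.ldc();
        System.out.println(first == second);   // prints true: same reference
        System.out.println(slot.resolutions);  // prints 1: resolved only once
    }
}
```

The second call never touches the byte array again, which mirrors the branch that reads the already resolved reference from the pool.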

StringTable

StringTable is a string cache table. It holds the references to the String objects created for string constants, which avoids the overhead of creating a new String each time.
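The effect of this cache can be observed from Java code. The following small check, with the hypothetical class name LiteralIdentityDemo, relies only on the language rule that identical string literals refer to the same String object.

```java
// Hypothetical demo class; it only observes reference identity of literals.
class LiteralIdentityDemo {
    // Two occurrences of the same literal yield one shared String object.
    static boolean sameLiteralSameReference() {
        String a = "Test";
        String b = "Test";
        return a == b;
    }

    // new String(...) always creates a separate object with equal content.
    static boolean copyIsDistinctButEqual() {
        String a = "Test";
        String b = new String(a);
        return a != b && a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(sameLiteralSameReference()); // prints true
        System.out.println(copyIsDistinctButEqual());   // prints true
    }
}
```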

The StringTable data structure is a hash table, like the commonly used Java Hashtable. It first computes the string's hash code, uses the hash code to find the corresponding slot in the array, and then traverses the linked list in that slot, comparing the characters of each entry until an equal string is found. When the table holds a lot of data, lookups become slow, so the JVM performs a rehash when it reaches a safepoint.
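A minimal sketch of such a chained hash table, written in Java for readability. Everything here is an assumption made for illustration: the class ChainedStringTableSketch, the initial size of 4, and especially the rehash policy (doubling the array once the average chain length reaches 2), which is not claimed to match what HotSpot does at a safepoint.

```java
// Hypothetical chained hash table; a sketch of the lookup, not HotSpot's StringTable.
class ChainedStringTableSketch {
    private static final class Entry {
        final String value;
        final Entry next;
        Entry(String value, Entry next) { this.value = value; this.next = next; }
    }

    private Entry[] buckets = new Entry[4];
    private int size = 0;

    private static int indexFor(String s, int length) {
        return (s.hashCode() & 0x7fffffff) % length;
    }

    // Returns the stored reference for equal content, registering s on a miss.
    String intern(String s) {
        int i = indexFor(s, buckets.length);
        for (Entry e = buckets[i]; e != null; e = e.next) {
            if (e.value.equals(s)) {
                return e.value;          // found by comparing the characters
            }
        }
        if (size >= buckets.length * 2) { // assumed trigger: chains are getting long
            rehash();
            i = indexFor(s, buckets.length);
        }
        buckets[i] = new Entry(s, buckets[i]);
        size++;
        return s;
    }

    // Assumed policy: double the array and redistribute every entry.
    private void rehash() {
        Entry[] bigger = new Entry[buckets.length * 2];
        for (Entry head : buckets) {
            for (Entry e = head; e != null; e = e.next) {
                int i = indexFor(e.value, bigger.length);
                bigger[i] = new Entry(e.value, bigger[i]);
            }
        }
        buckets = bigger;
    }

    int bucketCount() {
        return buckets.length;
    }
}
```

Calling intern twice with two distinct String objects of equal content returns the first object both times, and that still holds after the table has been rehashed.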


When the ldc instruction is executed for the first time, the C++ char array of the Symbol is converted into a new Unicode Java char array, a new String object is created from it, and the reference to that String is saved in the StringTable.
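The conversion step can be illustrated in Java. This is a sketch with a hypothetical class name, SymbolToStringSketch, in which a byte array stands in for the Symbol's C char array. It uses standard UTF-8 decoding as a stand-in; class files actually use a modified UTF-8, which differs only for a few special characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; a byte[] stands in for the Symbol's C char array.
class SymbolToStringSketch {
    // Decode the UTF-8 bytes into a Java char array.
    static char[] toUnicodeChars(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8).toCharArray();
    }

    // Build a new String object from the decoded chars.
    static String toJavaString(byte[] utf8) {
        return new String(toUnicodeChars(utf8));
    }

    public static void main(String[] args) {
        byte[] ascii = {'T', 'e', 's', 't'};
        byte[] twoByteChar = {(byte) 0xC3, (byte) 0xA9}; // U+00E9 encoded in UTF-8
        System.out.println(toJavaString(ascii));                // prints Test
        System.out.println(toUnicodeChars(twoByteChar).length); // prints 1
    }
}
```

The second case shows why a conversion is needed at all: two UTF-8 bytes become a single Java char.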

String.intern method

The String.intern() method works by looking up the reference saved in the StringTable for the string's content, as shown in the following code:

JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
  JVMWrapper("JVM_InternString");
  JvmtiVMObjectAllocEventCollector oam;
  if (str == NULL) return NULL;
  oop string = JNIHandles::resolve_non_null(str);
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(env, result);
JVM_END

We see the familiar StringTable::intern method. Here it differs slightly from ldc: the String reference already exists, so if no equal string is in the StringTable yet, the reference to this very string is stored directly in the StringTable.
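The difference between the two entry points can be sketched in Java. The class InternPathsSketch and its two method names are hypothetical, a HashMap stands in for the StringTable, and a byte array stands in for the Symbol; only the contrast between the two paths is the point.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a HashMap stands in for the StringTable.
class InternPathsSketch {
    private final Map<String, String> table = new HashMap<>();

    // ldc path: only a symbol exists, so a miss stores a newly created String.
    String internSymbol(byte[] utf8) {
        String candidate = new String(utf8, StandardCharsets.UTF_8);
        String existing = table.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    // String.intern path: the String already exists, so a miss stores
    // the caller's own reference.
    String internString(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        InternPathsSketch table = new InternPathsSketch();
        String mine = new String(new char[] {'x', 'y'});
        System.out.println(table.internString(mine) == mine);                    // prints true
        System.out.println(table.internSymbol(new byte[] {'x', 'y'}) == mine);   // prints true
    }
}
```

Whichever path registers the content first decides which object every later lookup returns.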


Explanation

Some time ago I saw a blog post that said:

"Using the String.intern() method, you can save a String to a global string table. If a Unicode string with the same value is already in the table, the method returns the address of the string already in the table. If no string with the same value is in the table, it registers its own address in the table."

It has been claimed that this explanation is wrong, with the following example given as proof:

String s1 = new String("Kvill");
System.out.println(s1 == s1.intern());

In fact, it is this example that is wrong. The literal "Kvill" has already gone through the ldc instruction, which means a String reference for "Kvill" has already been created and stored in the StringTable.

The code is actually equivalent to:

String test = "Kvill";
String s1 = new String(test);

This makes it clearer that these are two entirely different String objects.

To prove this, just change the program to:

char[] test = {'k', 'v', 'i', 'l', 'l'};
String s1 = new String(test);
System.out.println(s1 == s1.intern());

This prints true, so the quoted explanation is actually right.
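Both cases can be put side by side in one runnable class. This is a sketch with a hypothetical class name, InternDemo. The second result is what HotSpot does from JDK 7 onward, matching the VM code shown earlier; it is not a promise of the Java language itself. The counter suffix is an addition for this sketch only: it keeps the content fresh on every call, because once a value has been interned, a later String with the same content would get the earlier object back.

```java
// Hypothetical demo class comparing the two cases discussed in the text.
class InternDemo {
    private static int counter = 0;

    // The literal is already in the table, so intern() returns the literal,
    // not the copy made by new String(...).
    static boolean literalBackedCopyIsInterned() {
        String s1 = new String("Kvill");
        return s1 == s1.intern();
    }

    // Built from a char array, so no literal with this content exists;
    // intern() registers and returns s1 itself.
    static boolean charArrayStringIsInterned() {
        char[] chars = ("kvill" + (counter++)).toCharArray();
        String s1 = new String(chars);
        return s1 == s1.intern();
    }

    public static void main(String[] args) {
        System.out.println(literalBackedCopyIsInterned()); // prints false
        System.out.println(charArrayStringIsInterned());   // prints true on HotSpot JDK 7 and later
    }
}
```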














