String is arguably one of the more complicated data types in .NET. Many articles have introduced it, but few do so comprehensively. This article covers the internal mechanisms and features of string in depth. It is a revised version: I made some mistakes in the previous article, and I thank the following three readers for their help: Hellgate, anytao, and eaglet!
String is a special data type: it is both a primitive type and a reference type, and .NET applies some optimizations to it at compile time and at run time. It is precisely these optimizations that sometimes confuse programmers and make string hard to understand. This article is divided into two parts with four sections in total, and looks at the strange side of string.
I. Immutable strings
To fully understand the string type, you must first understand value types and reference types in .NET. In C#, the following data types are value types:
bool, byte, char, enum, sbyte, and the numeric types (including nullable types)
The following data types are reference types:
class, interface, delegate, object, string
Did you notice? The string type we are about to discuss is on this list. String objects are stored on the heap, so string is a reference type through and through.
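To make the distinction concrete, here is a minimal sketch contrasting a value type with an ordinary reference type. The class and method names (ValueVsReferenceDemo, CopyValueType, ShareReferenceType) are made up for illustration and are not from the original article.

```csharp
using System;

static class ValueVsReferenceDemo
{
    // Value type: assignment copies the value itself.
    public static int CopyValueType()
    {
        int x = 1;
        int y = x;   // y receives its own copy of the value
        x = 2;       // changing x does not touch y
        return y;    // still 1
    }

    // Reference type: assignment copies the reference, not the object.
    public static int ShareReferenceType()
    {
        int[] p = { 1 };
        int[] q = p; // p and q now refer to the same array on the heap
        p[0] = 2;    // the change is visible through q as well
        return q[0]; // 2
    }

    static void Main()
    {
        Console.WriteLine(CopyValueType());      // 1
        Console.WriteLine(ShareReferenceType()); // 2
    }
}
```

With an array, a change made through one variable is visible through the other. The question for string is whether it behaves the same way.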
This raises a question for many readers about the code below: since string is a reference type, will changing one string variable also change every other variable that refers to the same object? Let's take a look at the following three lines of code:
string a = "str_1";
string b = a;
a = "str_2";
Bear with me, because this point must be understood clearly. In the code above, the "=" on line 3 hides a secret: its effect is best understood as creating a new string, not as modifying the object that variable "a" refers to. The following is the IL code:
.maxstack 1
.locals init ([0] string a,
         [1] string b)
IL_0000: nop
IL_0001: ldstr "str_1"
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: stloc.1
IL_0009: ldstr "str_2"
IL_000e: stloc.0 // the two lines above correspond to the C# code a = "str_2";
IL_0015: ret
In the IL code, the lines at IL_0001 and IL_0006 use the ldstr instruction to create the string "str_1" and associate it with variable "a". The lines at IL_0007 and IL_0008 load the value of "a" onto the stack, then pop it and associate it with variable "b". The lines at IL_0009 and IL_000e use ldstr to create the string "str_2" and associate it with variable "a". The old value of variable a is not modified, as we might have imagined; a new string is produced instead.
In C#, when a class is instantiated with the new keyword, the work is done by the IL instruction newobj. When a string is created from a literal, the work is done by the ldstr instruction. Whenever we see ldstr, we can take it to mean that the IL wants to create a new string. (Note: wanting to create a string is not the same as actually creating one. Whether a new string is really created is decided at run time by the string interning mechanism, which is described in the following section.)
Therefore, the third line of C# code (a = "str_2";) looks as though it modifies the old value "str_1" of variable a, but it actually creates a new string "str_2" and points variable a at the memory address of "str_2". "str_1" remains unaffected in memory, so the value of variable b does not change. This is the immutability of string. Keep this firmly in mind: in .NET, once a string object is created, it cannot be modified. Operations such as ToUpper, Substring, and Trim all produce new strings in memory.
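The behavior described above can be checked with a short console program. This is a minimal sketch: the class and method names are hypothetical, and Console.WriteLine stands in for the Response.Write call used elsewhere in this article.

```csharp
using System;

static class ImmutabilityDemo
{
    // Reassigning a does not change the string that b still refers to.
    public static string ReassignAndReadB()
    {
        string a = "str_1";
        string b = a;      // b refers to the same "str_1" object as a
        a = "str_2";       // a now refers to a different string object
        return b;          // b still refers to "str_1"
    }

    // ToUpper returns a new string and leaves the original untouched.
    public static bool ToUpperLeavesOriginalAlone()
    {
        string s = "str_1";
        string upper = s.ToUpper();
        return s == "str_1"
            && upper == "STR_1"
            && !object.ReferenceEquals(s, upper);
    }

    static void Main()
    {
        Console.WriteLine(ReassignAndReadB());           // str_1
        Console.WriteLine(ToUpperLeavesOriginalAlone()); // True
    }
}
```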
Key points of this section: because of its immutability, the string type is often misunderstood. Although string is a reference type, it often appears to behave like a value type. That impression comes from not understanding immutability; it is not a "value-type feature" at all. For example:
string a = "str_1";
a = "str_2";
This creates two strings in memory, "str_1" and "str_2", but only "str_2" is in use. "str_1" is neither modified nor removed, which wastes memory. This is why StringBuilder is recommended when performing a large number of string operations.
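As a rough illustration of that recommendation, the sketch below builds the same text in two ways. The class and method names are hypothetical. Both methods return the same result, but the first produces a new intermediate string on every pass through the loop, while the second appends into one growable buffer.

```csharp
using System;
using System.Text;

static class BuilderDemo
{
    // Each += creates a brand-new string; the previous one becomes garbage.
    public static string JoinWithConcat(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += i.ToString();
        }
        return result;
    }

    // StringBuilder appends into an internal buffer and creates
    // the final string only once, in ToString().
    public static string JoinWithBuilder(int count)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            builder.Append(i);
        }
        return builder.ToString();
    }

    static void Main()
    {
        Console.WriteLine(JoinWithConcat(5));  // 01234
        Console.WriteLine(JoinWithBuilder(5)); // 01234
    }
}
```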
II. String interning in .NET (important)
In the first section, we discussed the immutability of strings. That property leads to another important property of string: string interning.
In a sense, it is immutability that makes the interning mechanism possible, and it also makes strings safe to share across threads. (The same string object can be accessed from different application domains, so interned strings are process-level. Garbage collection cannot release these string objects; they are released only when the process ends.)
We use the following two lines of code to illustrate string interning:
string a = "str_1";
string b = "str_1";
Think about it: how many string objects will these two lines of code create in memory? You might say two. Two variables are declared, so line 1 creates "str_1" in memory for variable a to reference, and line 2 creates another "str_1" for variable b to reference. But is that really the case? Let's use the ReferenceEquals method to compare the memory addresses that variables a and b refer to:
string a = "str_1";
string b = "str_1";
Response.Write(ReferenceEquals(a, b)); // check whether a and b refer to the same object in memory
Output: True
See that? We compared a and b with the ReferenceEquals method. Although we declared two variables, they refer to the same memory address! This shows that string b = "str_1"; does not create a new string in memory.
The reason is that .NET handles strings with a very important mechanism called string interning. Because string is used so frequently in programs, the CLR allocates memory only once for identical strings. The CLR maintains a special internal data structure, which we call the string pool and which can be thought of as a hash table. This hash table holds a portion of the strings used by the program: the key is the string's value, and the value is the memory address of the string. Generally, when the program creates a string from a literal, the CLR first looks in the hash table for a string with the same hash code and the same content. If one is found, its address is returned directly to the variable. If not, a new string object is created in memory.
Therefore, the two lines of code above create only one string object in memory, and variable b shares the "str_1" in memory with variable a.
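The lookup described above can be modeled with a small toy class. This is only a sketch of the idea: ToyStringPool and GetOrAdd are hypothetical names, and the real intern table lives inside the CLR and is not implemented this way. A Dictionary plays the role of the hash table, keyed by string content and holding a reference to the single shared instance.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model of a string pool, for illustration only.
sealed class ToyStringPool
{
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>();

    // Returns the pooled instance with the same content,
    // adding the given instance to the pool if none exists yet.
    public string GetOrAdd(string value)
    {
        string pooled;
        if (_pool.TryGetValue(value, out pooled))
        {
            return pooled;   // found: hand back the existing object
        }
        _pool[value] = value; // not found: this instance becomes the shared one
        return value;
    }
}

static class ToyPoolDemo
{
    public static bool SharesOneInstance()
    {
        var pool = new ToyStringPool();
        // Two distinct objects with identical content.
        string first = new string("str_1".ToCharArray());
        string second = new string("str_1".ToCharArray());
        string a = pool.GetOrAdd(first);
        string b = pool.GetOrAdd(second);
        return !object.ReferenceEquals(first, second)
            && object.ReferenceEquals(a, b);
    }

    static void Main()
    {
        Console.WriteLine(SharesOneInstance()); // True

        // The real mechanism: identical literals share one object.
        string x = "str_1";
        string y = "str_1";
        Console.WriteLine(object.ReferenceEquals(x, y)); // True
    }
}
```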
Now let's combine the immutability discussed in Section 1 with the interning mechanism described in Section 2 to understand the following lines of code:
string a = "str_1"; // declare variable a and point it at the newly created "str_1" in memory
a = "str_2"; // the CLR first looks in the string pool for "str_2"; it is not there, so "str_2" is created and a is pointed at it, while "str_1" stays unchanged (immutability)
string c = "str_2"; // the CLR first looks in the string pool for "str_2"; it is there, so c is pointed at the existing "str_2" (interning)
What about strings that are created dynamically? Will they be interned as well?
We will look at how the interning mechanism behaves for dynamically created strings in three cases:
1. Concatenating string constants
string a = "str_1" + "str_2";
string b = "str_1str_2";
Response.Write(ReferenceEquals(a, b)); // check whether a and b refer to the same object in memory
Output: True
IL code:
.maxstack 1
.locals init ([0] string a,
         [1] string b)
IL_0000: nop
IL_0001: ldstr "str_1str_2"
IL_0006: stloc.0
IL_0007: ldstr "str_1str_2"
IL_000c: stloc.1
IL_000d: ret
The lines at IL_0001 and IL_0006 correspond to the C# code string a = "str_1" + "str_2";
The lines at IL_0007 and IL_000c correspond to the C# code string b = "str_1str_2";
As you can see, the compiler computes the result of concatenating string constants before the program is compiled to IL. The ldstr instruction operates directly on the string value the compiler has already computed, so in this case the string interning mechanism takes effect!
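The compile-time folding described above also applies to const string fields, since an expression built only from constants is itself a constant. The following is a minimal sketch with hypothetical names (ConstantFoldingDemo, Prefix).

```csharp
using System;

static class ConstantFoldingDemo
{
    // Hypothetical constant, used only for this illustration.
    private const string Prefix = "str_1";

    public static bool LiteralConcatIsInterned()
    {
        string a = "str_1" + "str_2"; // folded to "str_1str_2" by the compiler
        string b = "str_1str_2";
        return object.ReferenceEquals(a, b);
    }

    public static bool ConstConcatIsInterned()
    {
        string c = Prefix + "str_2";  // also a constant expression, folded the same way
        string b = "str_1str_2";
        return object.ReferenceEquals(c, b);
    }

    static void Main()
    {
        Console.WriteLine(LiteralConcatIsInterned()); // True
        Console.WriteLine(ConstConcatIsInterned());   // True
    }
}
```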
2. Concatenating string variables
string a = "str_1";
string b = a + "str_2";
string c = "str_1str_2";
Response.Write(ReferenceEquals(b, c));
Output: False
IL code:
.maxstack 2
.locals init ([0] string a,
         [1] string b,
         [2] string c)
IL_0000: nop
IL_0001: ldstr "str_1"
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: ldstr "str_2"
IL_000d: call string [mscorlib]System.String::Concat(string,
                                                     string)
IL_0012: stloc.1
IL_0013: ldstr "str_1str_2"
IL_0018: stloc.2
IL_0019: ret
The lines at IL_0001 and IL_0006 correspond to string a = "str_1";
The lines from IL_0007 through IL_0012 correspond to string b = a + "str_2"; the IL calls the Concat method to join the strings.
The lines at IL_0013 and IL_0018 correspond to string c = "str_1str_2";
As you can see, when string variables are concatenated, the IL calls the Concat method and the final result is produced at run time, so the string interning mechanism does not apply in this case!
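A practical consequence is that reference comparison and value comparison give different answers for a string built at run time. The sketch below uses hypothetical names (RuntimeConcatDemo and its methods) and shows both comparisons side by side.

```csharp
using System;

static class RuntimeConcatDemo
{
    public static bool SameReference()
    {
        string a = "str_1";
        string b = a + "str_2";   // built at run time by String.Concat
        string c = "str_1str_2";  // interned literal
        return object.ReferenceEquals(b, c);
    }

    public static bool SameValue()
    {
        string a = "str_1";
        string b = a + "str_2";
        string c = "str_1str_2";
        return b == c;            // == on strings compares content
    }

    static void Main()
    {
        Console.WriteLine(SameReference()); // False
        Console.WriteLine(SameValue());     // True
    }
}
```

This is why string content should be compared with == or Equals, not with ReferenceEquals.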
3. Explicit instantiation
string a = "a";
string b = new string('a', 1);
Response.Write(ReferenceEquals(a, b));
Output: False
IL code:
.maxstack 3
.locals init ([0] string a,
         [1] string b)
IL_0000: nop
IL_0001: ldstr "a"
IL_0006: stloc.0
IL_0007: ldc.i4.s 97
IL_0009: ldc.i4.1
IL_000a: newobj instance void [mscorlib]System.String::.ctor(char,
                                                             int32)
IL_000f: stloc.1
IL_0010: ret
This case is easy to understand. The IL uses newobj to instantiate a String object, so the interning mechanism does not apply. From string b = new string('a', 1); we can also see that the string type is in fact built from char. The birth of a string is never as simple as we think: it takes the heap and the stack working together to bring a string into being. This is covered in section 4.
Of course, when the interning mechanism does not apply automatically, we can simply use the string.Intern method to place a string in the string pool manually, as in the following code:
string a = "a";
string b = new string('a', 1);
Response.Write(ReferenceEquals(a, string.Intern(b)));
Output: True
The program outputs True, indicating that variable "a" and the string returned by string.Intern(b) refer to the same memory address.
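The same experiment can be run as a standalone console program. The sketch below uses hypothetical names and adds string.IsInterned, which returns the pooled instance if the content is already in the pool and null otherwise.

```csharp
using System;

static class InternDemo
{
    public static bool ExplicitInstanceIsSeparate()
    {
        string a = "a";
        string b = new string('a', 1); // created with newobj, not taken from the pool
        return !object.ReferenceEquals(a, b);
    }

    public static bool InternReturnsPooledInstance()
    {
        string a = "a";
        string b = new string('a', 1);
        string pooled = string.Intern(b); // looks up the content of b in the pool
        return object.ReferenceEquals(a, pooled);
    }

    public static bool IsInternedFindsLiteral()
    {
        string a = "a";
        string b = new string('a', 1);
        // The literal "a" is already pooled, so its instance is returned.
        return object.ReferenceEquals(a, string.IsInterned(b));
    }

    static void Main()
    {
        Console.WriteLine(ExplicitInstanceIsSeparate());  // True
        Console.WriteLine(InternReturnsPooledInstance()); // True
        Console.WriteLine(IsInternedFindsLiteral());      // True
    }
}
```

Note that string.Intern does not change b itself. It returns the pooled instance, so the returned reference is the one to keep using.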
That's all for now. The next two sections will reveal more of string's internal secrets through examples, which you can use to test your understanding of string. Please stay tuned!
I am Aicken. Please keep an eye out for my next article.
The ".NET Discovery series" explains the essentials of the .NET platform. The articles so far are:
.NET Discovery series 7: In-depth understanding of the .NET garbage collection mechanism, released in the first second of the New Year
.NET Discovery series 5: Me JIT (Part 1)
.NET Discovery series 6: Me JIT (Part 2)
.NET Discovery series 3: In-depth understanding of the .NET garbage collection mechanism (Part 1)
.NET Discovery series 4: In-depth understanding of the .NET garbage collection mechanism (Part 2)
.NET Discovery series 1: string from entry to mastery (Part 1)
.NET Discovery series 2: string from entry to mastery (Part 2)