C# Text Processing (String) Study Notes
String is the most frequently used type in programming. A string represents an immutable sequence of characters. The string type inherits directly from Object, so it is a reference type; that is, string objects always live on the managed heap, never on the thread stack. (A type that inherits directly from Object must be a reference type, because all value types inherit from System.ValueType. It is worth noting that System.ValueType itself is a reference type.)
1. Create a string
C# treats string as a primitive type; that is, the compiler lets you express a string directly as a literal in the source code. The compiler stores these literal strings in the metadata of the managed module.
In C#, you cannot use the new operator to construct a string from a literal.
class Program
{
    static void Main(string[] args)
    {
        string hap = new string("heaiping");
    }
}
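On the .NET Framework, the above code causes a compilation error, because string has no constructor that takes a string. (Newer .NET versions do compile it, through the constructor that takes a ReadOnlySpan<char>, but the literal syntax remains the idiomatic form.)
Instead, use the following simplified syntax: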
class Program
{
    static void Main(string[] args)
    {
        string hap = "heaiping";
    }
}
To see the difference, consider the following code.
namespace StringStudy
{
    class Program
    {
        static void Main(string[] args)
        {
            string s = "heaiping";           // do not use new
            SomeClass sc = new SomeClass();  // use new
        }
    }
    class SomeClass
    {
    }
}
Compile the above code and view the generated IL with ILDasm:
.method private hidebysig static void Main(string[] args) cil managed
{
    .entrypoint
    // Code size 14 (0xe)
    .maxstack 1
    .locals init ([0] string s,
                  [1] class StringStudy.SomeClass sc)
    IL_0000: nop
    IL_0001: ldstr "heaiping" // create a string here
    IL_0006: stloc.0
    IL_0007: newobj instance void StringStudy.SomeClass::.ctor() // create SomeClass here
    IL_000c: stloc.1
    IL_000d: ret
} // end of method Program::Main
As shown above, the IL instruction that C# uses to create an object instance is newobj, which then calls the type's constructor. For string, the compiler instead uses the special instruction ldstr (short for "load string"), which constructs the string object from a literal stored in metadata. This shows that the CLR has a special, more efficient way of constructing strings.
The string type also has constructors that are called with new rather than built from literal metadata. Some of them accept parameters such as char*; these are provided for managed C++ and are not covered here.
See the following code.
class Program
{
    static void Main(string[] args)
    {
        string s = "heaiping";
        s = s + " " + "soft";
    }
}
The above code uses the + operator to concatenate strings. The book advises against using the + operator this way, because instead of simply appending to the existing string it creates several string objects on the managed heap that the garbage collector must later reclaim. Use System.Text.StringBuilder instead (covered in a later note).
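As a rough illustration of the StringBuilder alternative, here is a minimal sketch. The class name ConcatSketch and the helper JoinWithSpaces are hypothetical names chosen for this example; only StringBuilder, Append, and ToString come from the framework.

```csharp
using System;
using System.Text;

static class ConcatSketch
{
    // Hypothetical helper: joins the words with single spaces using one
    // StringBuilder buffer instead of one intermediate string per + operation.
    public static string JoinWithSpaces(string[] words)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < words.Length; i++)
        {
            if (i > 0)
            {
                builder.Append(' ');
            }
            builder.Append(words[i]);
        }
        return builder.ToString(); // the final string is created once, here
    }

    static void Main()
    {
        Console.WriteLine(JoinWithSpaces(new[] { "heaiping", "soft" })); // heaiping soft
    }
}
```

For two or three short pieces the difference is negligible; the builder tends to pay off when many pieces are appended in a loop.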
There is also a special kind of declaration that makes the compiler treat every character between the quotation marks as part of the string.
class Program
{
    static void Main(string[] args)
    {
        // fileA and fileB are the same
        string fileA = "C:\\Windows\\Temp";
        string fileB = @"C:\Windows\Temp";
    }
}
The @ symbol marks the string as a verbatim string literal. It tells the compiler to treat a backslash as an ordinary character rather than as the start of an escape sequence.
2. Immutability of strings
The defining property of a string is that it is immutable: once a string is created, it can never become longer or shorter, and none of its characters can be changed.
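The following small sketch shows what this means in practice: methods that appear to modify a string actually return a new object and leave the original alone. ImmutabilitySketch and Capitalize are hypothetical names used only for this illustration.

```csharp
using System;

static class ImmutabilitySketch
{
    // Hypothetical helper: returns a NEW string whose first character is
    // upper-cased. The argument itself is never modified.
    public static string Capitalize(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }
        return char.ToUpperInvariant(text[0]) + text.Substring(1);
    }

    static void Main()
    {
        string s = "heaiping";
        string t = Capitalize(s);

        Console.WriteLine(s);                            // heaiping
        Console.WriteLine(t);                            // Heaiping
        Console.WriteLine(object.ReferenceEquals(s, t)); // False
    }
}
```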
3. String comparison
Member | Member type | Description
Compare | Static method | Compares strings for sorting; lets you control the culture and whether case is significant.
CompareTo | Instance method | Compares strings for sorting; internally uses the culture of the current thread.
StartsWith/EndsWith | Instance method | Tests whether the string starts or ends with a specified string. Case sensitive; uses the culture of the current thread.
CompareOrdinal | Static method | Compares character values only, regardless of culture. Case sensitive and fast.
Equals | Static/instance | Compares the character values of two strings. The instance method calls CompareOrdinal internally. The static method first checks whether the two references point to the same object; if they do, the characters are not compared.
GetHashCode | Instance method | Returns the hash code of the string.
In addition to the members above, the string type also overloads the == and != operators, which call the static Equals method of string internally.
Some notes on these comparisons:
- Use CompareOrdinal to test whether two strings are equal, because it only compares the character values and is therefore faster.
- For a logical comparison whose result is presented to the user, use Compare. What is a logical comparison? Some strings contain different characters but are logically equal. Compare uses the sorting table of a specific culture to decide this.
Culture has a great impact on string comparison. The Compare method first obtains the CurrentCulture associated with the calling thread and then reads that culture's CompareInfo property. The CompareInfo object encapsulates the character comparison table for a culture, and each culture has only one CompareInfo object.
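To make the two kinds of comparison concrete, here is a minimal sketch. CompareSketch, OrdinalEquals, and CompareForUser are hypothetical names invented for this example; string.CompareOrdinal, CultureInfo.CompareInfo, and CompareOptions.IgnoreCase are the framework members being exercised. The culture-aware result is not annotated, since it can depend on the culture and platform.

```csharp
using System;
using System.Globalization;

static class CompareSketch
{
    // Ordinal equality: compares the character values only, no culture involved.
    public static bool OrdinalEquals(string a, string b)
    {
        return string.CompareOrdinal(a, b) == 0;
    }

    // Culture-aware comparison that ignores case, using the CompareInfo
    // object of the culture that the caller passes in.
    public static int CompareForUser(string a, string b, CultureInfo culture)
    {
        return culture.CompareInfo.Compare(a, b, CompareOptions.IgnoreCase);
    }

    static void Main()
    {
        Console.WriteLine(OrdinalEquals("hap", "hap")); // True
        Console.WriteLine(OrdinalEquals("hap", "HAP")); // False

        // Uses the culture of the current thread, as Compare does by default.
        Console.WriteLine(CompareForUser("hap", "HAP", CultureInfo.CurrentCulture));
    }
}
```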
4. String interning
Comparing strings character by character costs performance. The CLR improves performance through its string interning mechanism. See the following code:
namespace StringStudy
{
    class Program
    {
        static void Main(string[] args)
        {
            string s = "hap";
            Console.WriteLine(object.ReferenceEquals("hap", s));
            SomeClass sc = new SomeClass();
            Console.WriteLine(object.ReferenceEquals(new SomeClass(), sc));
        }
    }
    class SomeClass
    {
    }
}
Given how reference types behave, both outputs should be False, because different references are being compared. However, Console.WriteLine(object.ReferenceEquals("hap", s)) prints True. Does that mean string is not a reference type? String is of course a reference type; the behavior above comes from the CLR's special handling of strings:
When the CLR initializes, it creates an internal hash table in which the keys are strings and the values are references to string objects in the managed heap. Initially the table is empty. When the JIT compiler compiles a method, it looks up each literal string in the hash table. For the code above, the compiler first looks up the "hap" string. Because it is not found the first time, the compiler constructs a new string object (containing "hap") on the managed heap and adds the "hap" string, together with a reference to that object, to the hash table. The JIT compiler then looks up the second "hap" literal in the method. This time the string is already in the hash table, so nothing more is done and the code starts to execute.
When the code executes, the first line needs a reference to the "hap" string. The CLR looks up "hap" in its internal hash table and finds it, so a reference to the previously created string object is stored in the variable s. When the second line executes, the CLR again looks up "hap" in its internal hash table and again finds it. A reference to that same string object is passed to object.ReferenceEquals as the first parameter. It is the same reference as the second parameter, so the result is True.
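The lookup described above can be imitated with an ordinary dictionary. The sketch below is only a toy model of that description, not the CLR's real implementation; ToyInternPool, GetOrAdd, and ToyInternDemo are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the hash table described above (NOT the CLR's real data structure).
sealed class ToyInternPool
{
    // Key: the characters of the string. Value: the one pooled object.
    private readonly Dictionary<string, string> table = new Dictionary<string, string>();

    // Returns the pooled object with the same characters, adding the
    // argument to the pool when no equal string has been seen before.
    public string GetOrAdd(string text)
    {
        if (table.TryGetValue(text, out string existing))
        {
            return existing;
        }
        table[text] = text;
        return text;
    }
}

static class ToyInternDemo
{
    static void Main()
    {
        var pool = new ToyInternPool();

        // Two separate objects that happen to contain the same characters.
        string first = new string(new[] { 'h', 'a', 'p' });
        string second = new string(new[] { 'h', 'a', 'p' });

        Console.WriteLine(object.ReferenceEquals(first, second)); // False
        Console.WriteLine(object.ReferenceEquals(pool.GetOrAdd(first), pool.GetOrAdd(second))); // True
    }
}
```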
Continue with the following code:
namespace StringStudy
{
    class Program
    {
        static void Main(string[] args)
        {
            string s1 = "hap";
            string s2 = "h";
            string s3 = s2 + "ap";
            string s4 = "h" + "ap";
            Console.WriteLine(object.ReferenceEquals(s1, s3)); // false
            Console.WriteLine(s1.Equals(s3));                  // true
            Console.WriteLine(object.ReferenceEquals(s1, s4)); // true
        }
    }
}
This time the output looks a little strange. What is going on? When a method that references strings is JIT-compiled, all literal strings embedded in its source code are added to the hash table. The string referenced by s2 ("h") is concatenated with a literal string ("ap"). The result is a newly constructed string object in the managed heap, referenced by s3. This dynamically created string contains "hap", but it is not added to the CLR's hash table, so ReferenceEquals returns false. Equals returns true because the two strings contain the same characters.
For s4 = "h" + "ap", the compiler concatenates the two literal strings into a single literal in the IL, so the output is True.
The ReferenceEquals method does not need to compare characters one by one; it only compares references, so it is significantly more efficient than Equals. If every string comparison in a program could compare references rather than characters, performance would improve considerably. Likewise, if dynamically created strings with the same characters could be mapped to a single string object in the managed heap, the application would need fewer objects, which would also improve performance. Fortunately, the string type provides two static methods for exactly this.
public static string Intern(string str);
public static string IsInterned(string str);
The first method, Intern, accepts a string parameter and looks it up in the CLR's hash table. If the string is found, Intern returns the existing reference. If it is not found, the string is added to the hash table and Intern returns a reference to it. If the application no longer holds the string reference that was passed as the parameter, the garbage collector can reclaim that object. Using Intern, the program above can be rewritten as follows:
namespace StringStudy
{
    class Program
    {
        static void Main(string[] args)
        {
            string s1 = "hap";
            string s2 = "h";
            string s3 = s2 + "ap";
            s3 = string.Intern(s3);
            Console.WriteLine(object.ReferenceEquals(s1, s3)); // true
            Console.WriteLine(s1.Equals(s3));                  // true
        }
    }
}
Both outputs are now True. Because Intern itself takes time to execute, use string interning only when you need to compare the same strings many times; otherwise the cost outweighs the benefit.
Note that the garbage collector does not release string objects that are referenced by the CLR's internal hash table. They are released only when no application domain references them any longer.
The IsInterned method differs from Intern in that, if the string is not found in the hash table, it returns null instead of adding the string.
namespace StringStudy
{
    class Program
    {
        static void Main(string[] args)
        {
            string s1 = "heaiping";
            string s2 = "h";
            string s3 = s2 + "ap";
            s3 = string.IsInterned(s3);
            if (s3 == null)
            {
                Console.WriteLine(0);
            }
            else
            {
                Console.WriteLine(1);
            }
        }
    }
}
The above program outputs 0. In this program the hash table contains the literals "heaiping", "h", and "ap", but not "hap", so IsInterned returns null and s3 ends up null.
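A compact way to see both methods side by side, without relying on which literals happen to be in the table, is sketched below. InternSketch and Build are hypothetical names for this example; the strings are built at run time from character arrays so that they are not literals taken from metadata.

```csharp
using System;

static class InternSketch
{
    // Hypothetical helper: builds a string at run time, so the result is a
    // fresh object rather than a literal loaded from metadata.
    public static string Build(params char[] chars)
    {
        return new string(chars);
    }

    static void Main()
    {
        string a = Build('h', 'a', 'p');
        string b = Build('h', 'a', 'p');

        // Two separate objects with the same characters.
        Console.WriteLine(object.ReferenceEquals(a, b)); // False

        // Intern maps both to the single object held in the intern table.
        Console.WriteLine(object.ReferenceEquals(string.Intern(a), string.Intern(b))); // True

        // A freshly generated, unique string has never been interned.
        string unique = Guid.NewGuid().ToString();
        Console.WriteLine(string.IsInterned(unique) == null); // True
    }
}
```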
String also has other members such as Length, IndexOf, Copy, and so on, which are not covered here.
By the way, here is a group discussion about string: http://home.cnblogs.com/group/topic/38270.html
The original post is as follows:
class Class1
{
    static void StrChange(string str)
    {
        str = "hellow";
    }
    static void Main()
    {
        string str = "123";     // declare a string
        StrChange(str);         // call the method
        Console.WriteLine(str); // output the string
    }
}
The output is "123".
Is string a value type or a reference type?
If it is a value type, this result makes sense. But I remember that string is a reference type... am I wrong?
If it is a reference type, the output should be "hellow".
Why? Thank you for your help.
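Now we understand: the result does not mean that string is a value type, and it does not depend on the CLR's hash table for strings either. String is a reference type, but the reference itself is passed to StrChange by value. Assigning "hellow" to the parameter only makes the method's local copy of the reference point to a different string object; the variable str in Main still refers to "123". Because strings are immutable, the method also has no way to change the original object in place. To let the method change the caller's variable, the parameter would have to be declared with ref.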
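One way to probe the question from the post is to compare passing the string by value with passing it by ref. The sketch below assumes the behavior comes from how C# passes arguments; PassingSketch, ChangeByValue, and ChangeByRef are hypothetical names for this example.

```csharp
using System;

static class PassingSketch
{
    // The parameter is a copy of the caller's reference. Assigning to it
    // only rebinds the local copy; the caller's variable is unaffected.
    public static void ChangeByValue(string str)
    {
        str = "hellow";
    }

    // With ref, the method works on the caller's variable itself.
    public static void ChangeByRef(ref string str)
    {
        str = "hellow";
    }

    static void Main()
    {
        string str = "123";

        ChangeByValue(str);
        Console.WriteLine(str); // 123

        ChangeByRef(ref str);
        Console.WriteLine(str); // hellow
    }
}
```

The same experiment gives the same outcome for any reference type whose parameter is reassigned inside the method, which suggests the behavior is not specific to string.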
What does -> mean in C?
-> is a single operator. It is applied to a pointer to a struct (or to a class in C++), or to any other pointer to data that has members, in order to access a member. In other words, if we define a struct in C and declare a pointer to that struct, we need "->" to access the data in the struct through the pointer.
For example:
struct Data
{
    int a, b, c;
};                          /* define the struct */
struct Data *p;             /* define a struct pointer */
struct Data A = {1, 2, 3};  /* declare a variable A */
int x;                      /* declare a variable x */
p = &A;                     /* point p to A */
x = p->a;                   /* assign member a of the struct that p points to, to x */
/* Because p points to A, p->a == A.a, that is, 1 */
As for the first question, p = p->next; this typically appears in linked lists in C. Here next should be a struct pointer of the same type as p, and its definition should look like this:
struct Data
{
    int a;
    struct Data *next;
};                          /* define the struct */
............
main()
{
    struct Data *p;         /* declare the pointer variable p */
    ......
    p = p->next;            /* assign the value in next to p */
}
Linked-list pointers are a difficult part of C, but they are also a key part, and learning them is very useful. To explain them in detail, we must first talk about variables and pointers.
What is a variable? A variable should not be understood simply as "a quantity that changes". Let's borrow a question from our dean: "Does the classroom change?" It does, because different people are in the classroom every day; and it does not, because the classroom is always there and never becomes larger or smaller. That is a variable: a constant address and a changeable storage space. Normally we only look at what is inside the room, that is, the variable's content, and pay no attention to its address; a C pointer is the address of the room. Declaring a variable is like building a house to store things, and we can look at the things in the house directly. Declaring a pointer is like getting a locator. Pointing a pointer at a variable means using the locator to find that variable; then we can use the pointer to find the variable being "tracked" and get its content.
What about a struct? A struct is like a villa made up of several rooms that are used together. Suppose many such villas are scattered across a big maze, and each villa has one room that holds the location of another villa. You find the first villa with the locator and take what you want from it (the data part of the linked list), then load the location of the next villa into your locator (p = p->next) and move on to the next villa. You continue like this until you reach a villa that holds no location of a next one (p->next == NULL), and your trip is over. This is the process of traversing a linked list. Now you can understand the meaning of p = p->next!
I have written quite a lot; I hope you can understand it.
If you want to learn C and C++ well, you must be familiar with linked lists and pointers!