Chapter 14: Characters, Strings, and Text Processing

Source: Internet
Author: User
Tags: parse, string

14.1 Characters

In the .NET Framework, characters are always represented as 16-bit Unicode code values, which simplifies the development of internationalized applications.

Each character is represented as an instance of the System.Char structure.

Given a Char instance, you can call the static GetUnicodeCategory method, which returns a value of the System.Globalization.UnicodeCategory enumeration type.

The Char type provides several static methods, such as IsDigit and IsUpper. Each of these methods takes either a single character as a parameter, or a String together with the index of the target character within it.
The ToLower and ToUpper methods need culture information because converting the case of a letter is a culture-dependent operation. They obtain the culture internally by querying the CurrentCulture property of the calling thread (System.Threading.Thread.CurrentThread.CurrentCulture).
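The static methods described above can be sketched as follows. This is only an illustration: the CharDemo class and its method names are hypothetical, and the culture-insensitive ToUpperInvariant is used so that the result does not depend on the calling thread's culture.

```csharp
using System;
using System.Globalization;

public static class CharDemo
{
    // Classifies a character using static methods of System.Char.
    public static string Describe(char c)
    {
        UnicodeCategory category = Char.GetUnicodeCategory(c);
        return String.Format(CultureInfo.InvariantCulture,
            "'{0}': category={1}, digit={2}, upper={3}",
            c, category, Char.IsDigit(c), Char.IsUpper(c));
    }

    // Uses the overload that takes a string and the index of the target character.
    public static bool IsUpperAt(string s, int index)
    {
        return Char.IsUpper(s, index);
    }

    // Converts case without consulting any culture.
    public static char UpperInvariant(char c)
    {
        return Char.ToUpperInvariant(c);
    }
}
```

For example, CharDemo.Describe('7') reports the DecimalDigitNumber category, and CharDemo.IsUpperAt("aB", 1) returns true.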

In addition to these static methods, the Char type provides several instance methods. For example, the Equals method returns true if two Char instances represent the same 16-bit Unicode code point. The CompareTo method returns the result of comparing two Char instances without regard to culture. Finally, the static GetNumericValue method returns the numeric equivalent of a character, as the following code demonstrates.

using System;

public static class Program {
    public static void Main() {
        Double d;
        // '3' is a digit, so its numeric value is 3
        d = Char.GetNumericValue('3'); // 3
        Console.WriteLine(d.ToString());
        // 'A' has no numeric value, so -1 is returned
        d = Char.GetNumericValue('A'); // -1
        Console.WriteLine(d.ToString());
    }
}

You can use three techniques to convert between the numeric types and Char instances.

    • Casting (forced type conversion): The simplest way to convert a Char to a numeric value is to cast it.
    • Using the Convert type: The System.Convert type offers several static methods that convert between Char and the numeric types. All of these methods perform the conversion as a checked operation, so an OverflowException is thrown if the conversion would lose data.
    • Using the IConvertible interface: This technique is the least efficient, because calling an interface method on a value type requires the instance to be boxed, and Char and all numeric types are value types.

The following code demonstrates how to use the three techniques:

using System;

public static class Program
{
    public static void Main()
    {
        Char c;
        Int32 n;

        // Using C# casts
        c = (Char) 65;
        n = (Int32) c;

        // Using the Convert type
        c = Convert.ToChar(65);
        try
        {
            // 700000000000 is too large for the 16 bits of a Char
            c = Convert.ToChar(700000000000);
        }
        catch (OverflowException)
        {
            Console.WriteLine("Cannot convert 700000000000 to a Char");
        }

        // Using the IConvertible interface
        c = ((IConvertible) 65).ToChar(null);
    }
}

14.2 System.String Type

A String represents an immutable, ordered sequence of characters. The String type derives directly from Object, so it is a reference type. Therefore, String objects always live on the heap and never on a thread's stack.

14.2.1 Constructing Strings

C# treats String as a primitive type; that is, the compiler lets you express literal strings directly in source code. The compiler places these literal strings in the module's metadata, and they are loaded and referenced at run time.

In C#, you cannot use the new operator to construct a String object from a literal string; you must use the simplified syntax instead.

public static class Program
{
    public static void Main()
    {
        // Error: this line does not compile
        // String s = new String("Hi");

        // Correct
        String s1 = "Hi";
    }
}

For special characters such as newlines, carriage returns, and backspaces, C# uses an escape mechanism.

\r carriage return

\n newline

// String containing carriage-return and newline characters
String s = "Hi\r\nthere";

// The correct way to define the string above
String s1 = "Hi" + Environment.NewLine + "there";

You can concatenate several strings into one by using C#'s + operator: String s2 = "Hi" + " " + "there";

In the preceding code, because all of the strings are literals, the C# compiler concatenates them at compile time and places just one string ("Hi there") in the module's metadata. When the + operator is applied to nonliteral strings, the concatenation is performed at run time. To concatenate many strings at run time, avoid the + operator, because it creates multiple String objects on the heap that must later be garbage collected, which hurts performance. Instead, use the System.Text.StringBuilder type whenever possible.
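The difference between the two run-time approaches can be sketched like this. The ConcatDemo class and its method names are hypothetical and exist only for illustration; both methods build the same text, but the first allocates a new String on every += while the second appends into one StringBuilder.

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    // Builds "0,1,...,n-1" with +=; every += allocates a new String object.
    public static string JoinWithPlus(int n)
    {
        string result = "";
        for (int i = 0; i < n; i++)
        {
            if (i > 0) result += ",";
            result += i.ToString();
        }
        return result;
    }

    // Builds the same text in a StringBuilder; only ToString creates the final String.
    public static string JoinWithBuilder(int n)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++)
        {
            if (i > 0) sb.Append(',');
            sb.Append(i);
        }
        return sb.ToString();
    }
}
```

For a handful of strings the difference is negligible; the StringBuilder version matters when the number of concatenations grows.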

C# also offers verbatim strings, typically used to specify the path of a file or directory, or to write regular expressions.

// Declaring the string without the verbatim string character @
String file = "C:\\Windows\\System32\\Notepad.exe";

// Declaring the string with the verbatim string character @
String file = @"C:\Windows\System32\Notepad.exe";

Placing the @ symbol before a string tells the compiler that the string is a verbatim string. In effect, this tells the compiler to treat backslashes as literal characters rather than escape characters, which makes file paths easier to read in source code.

14.2.2 Strings Are Immutable

The most important fact about a String object is that it is immutable. That is, once a string is created, it can never be lengthened or shortened, and none of its characters can be changed.
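A small sketch of what immutability means in practice: a method that appears to change a string actually returns a different String object and leaves the original untouched. The ImmutableDemo class is a hypothetical name used only for this illustration.

```csharp
using System;

public static class ImmutableDemo
{
    // Returns true if ToUpperInvariant left the original string unchanged
    // and produced a separate String object for the uppercase text.
    public static bool OriginalUnchanged()
    {
        string original = "hello";
        string upper = original.ToUpperInvariant();
        return original == "hello"
            && upper == "HELLO"
            && !Object.ReferenceEquals(original, upper);
    }
}
```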

14.2.3 Comparing strings

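Strings are typically compared for two reasons:

    • To determine equality
    • To sort them

When sorting, you should always perform a case-sensitive comparison. Otherwise, two strings that differ only by case are considered equal and could be ordered differently each time you sort them.

Passing true for the ignoreCase parameter of the Compare method makes the comparison case-insensitive.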

When determining string equality or sorting strings, it is strongly recommended to call one of the following methods:

public bool Equals(string value, StringComparison comparisonType);

public static bool Equals(string a, string b, StringComparison comparisonType);

public static int Compare(string strA, string strB, StringComparison comparisonType);

public static int Compare(string strA, string strB, bool ignoreCase, CultureInfo culture);

public static int Compare(string strA, string strB, CultureInfo culture, CompareOptions options);

public static int Compare(string strA, int indexA, string strB, int indexB, int length, StringComparison comparisonType);

public static int Compare(string strA, int indexA, string strB, int indexB, int length, CultureInfo culture, CompareOptions options);

public static int Compare(string strA, int indexA, string strB, int indexB, int length, bool ignoreCase, CultureInfo culture);

public bool StartsWith(string value, StringComparison comparisonType);

public bool StartsWith(string value, bool ignoreCase, CultureInfo culture);

public bool EndsWith(string value, StringComparison comparisonType);

public bool EndsWith(string value, bool ignoreCase, CultureInfo culture);

Many programs use strings for internal programmatic purposes, such as path names, file names, URLs, registry keys and values, environment variables, reflection, and XML. When comparing strings for programmatic purposes, you should always use StringComparison.Ordinal. It is the fastest way to compare strings, because no culture information has to be taken into account.
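As a minimal sketch of ordinal comparisons for programmatic strings: the OrdinalDemo class and its helpers are hypothetical, and StringComparison.OrdinalIgnoreCase is shown as the case-insensitive counterpart of Ordinal, which suits things like file extensions on case-insensitive file systems.

```csharp
using System;

public static class OrdinalDemo
{
    // Compares two programmatic keys character by character, ignoring culture.
    public static bool SameKey(string a, string b)
    {
        return String.Equals(a, b, StringComparison.Ordinal);
    }

    // Checks a file extension without regard to case or culture.
    public static bool HasExtension(string fileName, string extension)
    {
        return fileName.EndsWith(extension, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, OrdinalDemo.SameKey("PATH", "Path") returns false, while OrdinalDemo.HasExtension("Notepad.EXE", ".exe") returns true.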

From here on, we discuss how to perform linguistically correct comparisons. The .NET Framework uses the System.Globalization.CultureInfo type to represent a language/country pair.

The following code shows the difference between an ordinal comparison and a culture-aware comparison:

static void Main()
{
    String s1 = "Strasse";
    String s2 = "Straße";
    Boolean eq;

    // Compare returns a nonzero value. When an ordinal flag is passed,
    // Compare ignores any specified culture.
    eq = String.Compare(s1, s2, StringComparison.Ordinal) == 0;
    Console.WriteLine("Ordinal comparison: '{0}' {2} '{1}'", s1, s2, eq ? "==" : "!=");

    // The culture for people who speak German in Germany
    CultureInfo ci = new CultureInfo("de-DE");

    // Compare returns 0
    eq = String.Compare(s1, s2, true, ci) == 0;
    Console.WriteLine("Cultural comparison: '{0}' {2} '{1}'", s1, s2, eq ? "==" : "!=");
}

14.2.4 String Interning

As described in the previous section, checking strings for equality is a common operation in many applications, and it can noticeably hurt performance.

When performing an ordinal equality check, the CLR first quickly checks whether the two strings have the same number of characters. If they do not, the strings are definitely not equal. If they do, the strings might be equal, and the CLR must then compare the individual characters to be sure.

In addition, keeping several instances of the same string in memory wastes memory: because strings are immutable, a single instance would serve just as well. If only one instance is kept, memory is used far more efficiently, and every variable that needs the string simply refers to that single String object.

If your application frequently performs case-sensitive, ordinal string comparisons, or if you expect many String objects to have the same value, you can use the CLR's string interning mechanism to improve performance significantly.

When the CLR initializes, it creates an internal hash table in which the keys are strings and the values are references to String objects in the managed heap.

The String class offers two methods that give you access to this internal hash table:

public static String Intern(String str);

public static String IsInterned(String str);

The difference between Equals and ReferenceEquals:

    • ReferenceEquals: Always checks whether two references point to the same object (the same address); it compares references. It is a static method of Object, so a derived class cannot override it. When value types are passed, they are boxed, and the result is always false, because each argument passed to ReferenceEquals(Object a, Object b) is boxed into a new object. For strings, because of interning, given String a = "a"; String b = "a"; the call ReferenceEquals(a, b) returns true.
    • Equals: Checks whether the values of two objects (reference types or value types) are equal; it compares values. Note that Object's default Equals compares references for reference types; types such as String, and the value types, override it to compare values.
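The boxing behavior described for ReferenceEquals can be sketched as follows. The EqualityDemo class and its method names are hypothetical and serve only as an illustration.

```csharp
using System;

public static class EqualityDemo
{
    // Each int argument is boxed into its own object, so the references differ
    // even though the same variable is passed twice.
    public static bool BoxedValuesShareReference()
    {
        int x = 5;
        return Object.ReferenceEquals(x, x);
    }

    // Equals compares values, so two separately boxed copies of 5 are equal.
    public static bool BoxedValuesAreEqual()
    {
        object a = 5;
        object b = 5;
        return a.Equals(b);
    }
}
```

BoxedValuesShareReference returns false and BoxedValuesAreEqual returns true.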

The following code demonstrates string interning:

static void Main()
{
    String s1 = "Hello";
    String s2 = "Hello";

    Boolean a = Object.ReferenceEquals(s1, s2); // true

    // Explicitly intern both strings
    s1 = String.Intern(s1);
    s2 = String.Intern(s2);

    Boolean b = Object.ReferenceEquals(s1, s2); // true
}

In the first call to ReferenceEquals, on earlier versions of the CLR, s1 refers to one "Hello" String object in the heap and s2 refers to a different "Hello" String object. When running on version 4.0 of the CLR, however, the CLR chooses to ignore the attribute/flag emitted by the C# compiler, and by default it interns the literal string "Hello" when the assembly is loaded into the AppDomain. The result is therefore true.

Before the second call to ReferenceEquals, the "Hello" string is explicitly interned, and s1 now refers to an interned "Hello". Calling Intern again then sets s2 to refer to the same "Hello" string that s1 refers to. When ReferenceEquals is called the second time, the result is guaranteed to be true, regardless of whether the assembly was compiled with the attribute/flag.
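To see interning without relying on how literals are handled, the strings can be built at run time. This is a hedged sketch: the InternDemo class and its Build helper are hypothetical, and StringBuilder is used only to make sure each string is a freshly allocated object rather than a compile-time literal.

```csharp
using System;
using System.Text;

public static class InternDemo
{
    // Creates a new String object at run time from two parts.
    private static string Build(string first, string second)
    {
        return new StringBuilder().Append(first).Append(second).ToString();
    }

    // Two strings built at run time have equal values but are distinct objects.
    public static bool DistinctBeforeIntern()
    {
        string a = Build("Hel", "lo");
        string b = Build("Hel", "lo");
        return a == b && !Object.ReferenceEquals(a, b);
    }

    // After Intern, both variables refer to the single object in the intern table.
    public static bool SameAfterIntern()
    {
        string a = String.Intern(Build("Hel", "lo"));
        string b = String.Intern(Build("Hel", "lo"));
        return Object.ReferenceEquals(a, b);
    }
}
```

Both methods return true: the values are equal in each case, but only the interned strings share one object.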

14.2.5 String Pooling

The compiler can merge multiple occurrences of the same literal string into a single instance.

14.2.6 Examining a String's Characters and Text Elements

14.2.7 Other String Operations

You can also use some of the methods provided by the String type to copy a string or part of it.

public void CopyTo(int sourceIndex, char[] destination, int destinationIndex, int count);

CopyTo: Copies a portion of the string's characters into a character array.

ToString: Returns a reference to the same object (this).

Keep in mind that, because strings are immutable, String methods that appear to modify a string (such as Substring, Insert, Remove, Replace, and ToUpper) actually return a new String object.
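The copy operations can be sketched as follows. The CopyDemo class and its method names are hypothetical; Substring is included as one more way to copy part of a string.

```csharp
using System;

public static class CopyDemo
{
    // Copies count characters starting at start into a new character array.
    public static char[] CopyPart(string source, int start, int count)
    {
        char[] destination = new char[count];
        source.CopyTo(start, destination, 0, count);
        return destination;
    }

    // Returns a new String holding the requested portion of the source.
    public static string Part(string source, int start, int count)
    {
        return source.Substring(start, count);
    }

    // ToString on a String returns the very same object, not a copy.
    public static bool ToStringReturnsSameObject(string source)
    {
        return Object.ReferenceEquals(source, source.ToString());
    }
}
```

For example, CopyDemo.CopyPart("Hello, world", 7, 5) yields the characters of "world".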

14.3 Constructing Strings Efficiently

Because the string type represents an immutable string, the FCL provides another type named System.Text.StringBuilder. It can be used to efficiently process strings dynamically and then create a string based on the results of the processing.

Logically, the StringBuilder object contains a field that references an array of char structures. You can manipulate this character array with the members of the StringBuilder to efficiently shorten the string or change the characters in the string.

14.3.1 Constructing StringBuilder objects

When using StringBuilder's methods, remember that most of them return a reference to the same StringBuilder object. This makes it convenient to chain several operations together in a single statement:

static void Main()
{
    StringBuilder sb = new StringBuilder();
    String s = sb.AppendFormat("{0} {1}", "Jeffrey", "Richter").Replace(' ', '-').Remove(4, 3).ToString();
    // s is now "Jeff-Richter"
}

An example of concatenating strings:

String[] value = { "1", "2", "3" };
String a = "";
StringBuilder str = new StringBuilder();
foreach (String text in value)
{
    // Append each value preceded by a comma, producing ",1,2,3"
    str.AppendFormat(",{0}", text);
}
if (str != null && str.Length > 0)
{
    str.Remove(0, 1); // Remove the leading comma
}
a = str.ToString(); // Convert the StringBuilder to a String: "1,2,3"

The methods offered by the String and StringBuilder classes do not match exactly. For example, String provides methods such as ToLower, ToUpper, EndsWith, and Trim, whereas StringBuilder provides no corresponding methods. On the other hand, StringBuilder provides a richer Replace method that lets you replace characters or strings within just a portion of the string. The corresponding Replace method in the String class is public String Replace(Char oldChar, Char newChar);

Because the methods of the two classes do not correspond exactly, it is sometimes necessary to convert between String and StringBuilder to accomplish a particular task.

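StringBuilder sb = new StringBuilder();
// Build the initial text in the StringBuilder and convert it to a String.
String s = sb.AppendFormat("{0},{1}", "Jeffrey", "Richter").ToString();
// ToUpper exists only on String and returns a new string, so keep the result.
s = s.ToUpper();
// Clear the StringBuilder.
sb.Length = 0;
sb.Append(s).Insert(8, "Marc-");
// Convert 2 characters starting at index 1 back to a String.
s = sb.ToString(1, 2);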

14.4 Obtaining a String Representation of an Object: ToString (to be revisited)

We often have to get a string representation of an object. You can call the ToString method to get a string representation of any object.

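The parameterless ToString method has two problems. First, the caller has no control over how the string is formatted. Second, the caller cannot easily choose a specific culture to format the string with.

The Format method of String ...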
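As a minimal sketch of controlling both the format and the culture: the numeric types also offer a ToString overload that takes a format string and an IFormatProvider, and String.Format accepts a provider as its first argument. The FormatDemo class and its method names are hypothetical; CultureInfo.InvariantCulture is used here so that the output does not depend on the machine's settings.

```csharp
using System;
using System.Globalization;

public static class FormatDemo
{
    // Formats a number with an explicit format string and culture.
    public static string FormatNumber(double value, string format, IFormatProvider provider)
    {
        return value.ToString(format, provider);
    }

    // Formats an integer as four uppercase hexadecimal digits with a "0x" prefix.
    public static string Hex(int value)
    {
        return String.Format(CultureInfo.InvariantCulture, "0x{0:X4}", value);
    }
}
```

For example, FormatDemo.FormatNumber(12345.678, "N2", CultureInfo.InvariantCulture) returns "12,345.68", and FormatDemo.Hex(255) returns "0x00FF".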

14.5 Parsing a String to Obtain an Object: Parse

Parsing a string to obtain an object is needed only occasionally.

Int32 x = Int32.Parse("1A", NumberStyles.HexNumber); // 26

Any type that can parse a string offers public static methods named Parse.

Let's take a look at how to parse a string into a numeric type:

public static int Parse(string s, NumberStyles style, IFormatProvider provider);

s is the string to parse, and style is a set of NumberStyles flags that identifies the characters Parse allows in s.

Int32 x = Int32.Parse(" 123", NumberStyles.None); // The string to parse contains a leading white-space character, so a FormatException is thrown

To allow the leading white space, pass NumberStyles.AllowLeadingWhite instead.
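The effect of the NumberStyles flags can be sketched as follows. The ParseDemo class and its method names are hypothetical; Int32.TryParse is used in one helper so that a rejected string yields false rather than a FormatException.

```csharp
using System;
using System.Globalization;

public static class ParseDemo
{
    // Parses hexadecimal digits, for example "1A".
    public static int ParseHex(string s)
    {
        return Int32.Parse(s, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }

    // Accepts digits only; returns false instead of throwing when s is rejected.
    public static bool TryParseStrict(string s, out int value)
    {
        return Int32.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out value);
    }

    // Accepts leading white space before the digits.
    public static int ParseAllowingLeadingWhite(string s)
    {
        return Int32.Parse(s, NumberStyles.AllowLeadingWhite, CultureInfo.InvariantCulture);
    }
}
```

With these helpers, " 123" is rejected by TryParseStrict but parsed as 123 by ParseAllowingLeadingWhite.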

14.6 Encodings: Converting Between Characters and Bytes

14.7 Secure Strings

Microsoft added a more secure string class to the FCL: System.Security.SecureString.
