"C# Advanced Series" 13: Characters, Strings, and Text Encoding

Source: Internet
Author: User


I had originally written quite a lot, but a machine restart wiped out everything.

What follows is what I could recall afterwards, so it reads more like a distillation of the key points.

About characters

Whatever else you take away about characters, remember that .NET uses Unicode encoding. The simplest and most efficient way to convert between a character and a number is a cast.
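A minimal sketch of the cast-based conversion described above. The class and method names (CharCastDemo, ToCode, FromCode) are made up for illustration and do not come from the original article.

```csharp
using System;

public static class CharCastDemo
{
    // Casting a char to int yields its UTF-16 code unit value.
    public static int ToCode(char c) => (int)c;

    // Casting an int back to char yields the character for that code unit.
    public static char FromCode(int code) => (char)code;

    public static void Main()
    {
        Console.WriteLine(ToCode('A'));   // 65
        Console.WriteLine(FromCode(66));  // B
        Console.WriteLine(ToCode('中'));  // 20013 (U+4E2D)
    }
}
```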

Strings are reference types and live on the heap. Unlike ordinary objects, which are created with the newobj IL instruction, a literal string is created with the ldstr (load string) instruction.

About strings

Strings are immutable: every string method that appears to modify a string actually returns a new string.

Concatenating strings with the + operator creates multiple string objects on the heap, and because heap objects must be garbage collected, this hurts performance. For repeated concatenation, StringBuilder is therefore recommended.
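The difference can be sketched as follows. Both methods produce the same text; the names ConcatDemo, WithPlus, and WithBuilder are hypothetical and chosen only for this illustration.

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    // Each += allocates a new string and copies the previous contents into it.
    public static string WithPlus(int n)
    {
        string s = "";
        for (int i = 0; i < n; i++) s += i + ",";
        return s;
    }

    // StringBuilder appends into an internal buffer; a string is created only by ToString.
    public static string WithBuilder(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.Append(i).Append(',');
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WithPlus(3));     // 0,1,2,
        Console.WriteLine(WithBuilder(3));  // 0,1,2,
    }
}
```

For a handful of concatenations the difference is negligible; it matters mainly inside loops.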

String comparison

String provides a whole set of comparison methods, and the author of CLR via C# recommends using them instead of == and !=. With == and != the caller does not state explicitly which comparison rule applies, whereas code that spells out the rule is easier to read and maintain.

    using System;

    var str1 = "String 1";
    var str2 = "string 2";
    // Ordinal comparison (not sensitive to language or culture), ignoring case.
    bool result1 = str1.Equals(str2, StringComparison.OrdinalIgnoreCase);
    // Ordinary comparison.
    bool result2 = str1 == str2;

To me, though, == really is clearer; this is a matter of personal taste. The explicit comparison methods are genuinely useful in multi-language, multi-culture scenarios, but for ordinary cases I personally find == better, or at least easier to read and understand.

When comparing with the StringComparison.Ordinal rule, the CLR first quickly checks whether the two strings have the same number of characters and only then goes on to compare individual characters. A culture-sensitive comparison cannot take this shortcut, because strings with different character counts can still be equal; it has to compare character by character from the start, which is expensive.

The System.StringComparer class can also perform string comparisons. It is suited to repeatedly applying the same kind of comparison to a large number of different strings.
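One common place where a StringComparer is reused across many strings is as the key comparer of a collection. The sketch below assumes a dictionary of header names; ComparerDemo and BuildIndex are illustrative names only.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerDemo
{
    // The comparison rule is chosen once, when the dictionary is created,
    // and is then applied to every insert and lookup.
    public static Dictionary<string, int> BuildIndex()
    {
        var index = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        index["Content-Type"] = 1;
        return index;
    }

    public static void Main()
    {
        var index = BuildIndex();
        Console.WriteLine(index.ContainsKey("content-type"));            // True
        Console.WriteLine(StringComparer.Ordinal.Equals("abc", "ABC"));  // False
    }
}
```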

String interning

The CLR can let a single string object serve every use of the same string content, which reduces the number of strings and saves memory. This is string interning.

In .NET 4.5, literal strings in code are interned automatically when the assembly is loaded, whereas in earlier versions interning had to be done manually.

String.Intern is the interning method: it adds the string to an internal hash table if it is not already there, and returns the reference held in the table.

This can of course reduce memory use, because only one string object is referenced afterwards. Keep in mind, though, that interning itself costs performance, so analyze each situation and use string interning with care.
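A small sketch of what String.Intern does with two equal strings that start out as separate objects. The class and method names are hypothetical.

```csharp
using System;

public static class InternDemo
{
    public static bool SameObjectAfterIntern()
    {
        // Build two equal strings at run time so that they are distinct objects.
        string a = new string(new[] { 'h', 'i' });
        string b = new string(new[] { 'h', 'i' });
        bool distinctBefore = !object.ReferenceEquals(a, b);

        // Intern returns the pooled reference for the given content.
        bool sameAfter = object.ReferenceEquals(string.Intern(a), string.Intern(b));
        return distinctBefore && sameAfter;
    }

    public static void Main()
    {
        Console.WriteLine(SameObjectAfterIntern()); // True
    }
}
```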

String pool

For literal strings, occurrences with identical content all refer to a single string in the string pool. The C# compiler arranges this at compile time.

Building strings efficiently: StringBuilder

The name StringBuilder says it all: it is the right tool for string concatenation. You can think of it as a character array.

To use it well, however, you need to understand the following StringBuilder concepts:

    • Maximum capacity (MaxCapacity)
      • Specifies the maximum number of characters the string can hold. The default value is Int32.MaxValue (about 2 billion).
      • You can generally ignore it, unless you want to limit the maximum number of characters in a string.
    • Capacity (Capacity)
      • As mentioned earlier, you can think of StringBuilder as a character array; the capacity is the length of the current character array.
      • Why "current"? Because when appended text would exceed this capacity, the capacity is automatically doubled: a new array with the new capacity is allocated, the characters in the original array are copied into it, and the original array is then garbage collected.
      • So it should be clear by now that when using StringBuilder it is best to estimate a capacity up front, or at least to keep it from growing frequently. Otherwise it is a real trap, and you might as well use String.
    • Character array
      • This is the array of char structures inside the StringBuilder.
      • The number of characters it currently holds is obtained through the Length property.

Strings are usually appended with Append and AppendFormat. There are other operations as well; they are easy to follow once you think of them as operations on an array.
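The concepts above can be sketched as follows. The initial capacity of 64 is an arbitrary estimate, and BuilderDemo and BuildLine are names invented for this example.

```csharp
using System;
using System.Text;

public static class BuilderDemo
{
    public static string BuildLine(string name, int count)
    {
        // Reserve room up front so that appends do not have to grow the buffer.
        var sb = new StringBuilder(64);
        sb.Append("name=").Append(name);
        sb.AppendFormat(", count={0}", count);
        return sb.ToString();
    }

    public static void Main()
    {
        var sb = new StringBuilder(64);
        sb.Append("abc");
        Console.WriteLine(sb.Length);                       // 3
        Console.WriteLine(sb.Capacity >= 64);               // True
        Console.WriteLine(sb.MaxCapacity == int.MaxValue);  // True
        Console.WriteLine(BuildLine("troy", 2));            // name=troy, count=2
    }
}
```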

The book also describes several ways of formatting and parsing strings.

String encoding: converting between characters and bytes

For those of us who write in Chinese, keeping strings as Unicode (UTF-16) costs little, because a Chinese character takes two bytes anyway. An English character, however, needs only one byte, yet in UTF-16 it still takes two: one byte represents the character and the other is simply \0.

So when transmitting English text, especially large amounts of it, it is more efficient to encode these Unicode strings into a more compact byte array.

Typically, text written with System.IO.BinaryWriter or System.IO.StreamWriter has to be encoded, and the corresponding read has to decode it.

If no encoding scheme is specified, the default is UTF-8. (Roughly: one byte for an English character and three bytes for most Chinese characters.)

Another common encoding scheme is UTF-16, in which English and Chinese characters alike take two bytes; it is also known as Unicode encoding. (For Chinese text, UTF-16 is in fact more compact and faster than UTF-8.)
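To make the size difference concrete, the sketch below counts the bytes each encoding needs for one English and one Chinese character. In UTF-8 most Chinese characters take three bytes. ByteCountDemo and its helper methods are illustrative names.

```csharp
using System;
using System.Text;

public static class ByteCountDemo
{
    public static int Utf8Bytes(string s) => Encoding.UTF8.GetByteCount(s);

    // Encoding.Unicode is .NET's name for little-endian UTF-16.
    public static int Utf16Bytes(string s) => Encoding.Unicode.GetByteCount(s);

    public static void Main()
    {
        Console.WriteLine(Utf8Bytes("A"));    // 1
        Console.WriteLine(Utf16Bytes("A"));   // 2
        Console.WriteLine(Utf8Bytes("中"));   // 3
        Console.WriteLine(Utf16Bytes("中"));  // 2
    }
}
```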

I will not go into the other encodings; for us they are mostly pitfalls.

When encoding, prefer obtaining the encoding object through Encoding.Unicode rather than constructing a System.Text.UnicodeEncoding yourself.

The former hands back the same object it returned on the previous request instead of constructing a new one each time.

The latter creates a new object on the managed heap every time, which affects performance. On the other hand, the Encoding-derived classes in System.Text have special constructors that make decoding throw an exception on an invalid byte sequence, so if security matters, constructing one of them yourself gives better protection against invalid input.
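The trade-off can be sketched as follows. UTF8Encoding is used here instead of UnicodeEncoding only because a single invalid byte is easy to show; its two-argument constructor takes throwOnInvalidBytes as the second parameter. StrictDecodeDemo and its members are hypothetical names.

```csharp
using System;
using System.Text;

public static class StrictDecodeDemo
{
    // The byte 0xFF can never appear in well-formed UTF-8.
    static readonly byte[] Invalid = { 0xFF };

    // The static property returns the same cached object on every request.
    public static bool SharedInstanceIsReused() =>
        object.ReferenceEquals(Encoding.UTF8, Encoding.UTF8);

    public static bool StrictDecoderThrows()
    {
        // First argument: whether to emit a byte order mark. Second: throwOnInvalidBytes.
        var strict = new UTF8Encoding(false, true);
        try { strict.GetString(Invalid); return false; }
        catch (DecoderFallbackException) { return true; }
    }

    public static void Main()
    {
        Console.WriteLine(SharedInstanceIsReused());                      // True
        // The shared instance replaces the invalid byte with U+FFFD instead of throwing.
        Console.WriteLine(Encoding.UTF8.GetString(Invalid) == "\uFFFD");  // True
        Console.WriteLine(StrictDecoderThrows());                         // True
    }
}
```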

Once you have an encoding object, use GetBytes to convert a string to a byte array and GetString to convert a byte array back to a string.
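A minimal round trip through GetBytes and GetString. The RoundTrip helper is an illustrative name; the same encoding must be used in both directions.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);  // string -> byte array
        return encoding.GetString(bytes);        // byte array -> string
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("Hello, 世界", Encoding.UTF8) == "Hello, 世界");  // True
        Console.WriteLine(Encoding.Unicode.GetBytes("Hi").Length);                   // 4
    }
}
```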

Encoding and decoding byte streams

Suppose you read a UTF-16 encoded string through a System.Net.Sockets.NetworkStream object. A byte stream usually arrives in blocks, and if each read returns, say, 5 bytes rather than a multiple of 2, decoding each block on its own corrupts the data.

So call Encoding.Unicode.GetDecoder() to obtain a newly constructed decoder object, whose two key methods are GetChars and GetCharCount. A call to GetChars decodes as much as it can; if the byte array does not contain enough bytes to complete a character, the leftover bytes are saved inside the decoder. On the next call, the decoder combines those leftover bytes with the new byte array passed to it. A decoder object is therefore very useful when reading from a stream.

Encoding works the same way in the opposite direction.

The following is a simple decoder example:

    using System;
    using System.Text;

    string strTroy = "奇葩"; // two characters
    byte[] bytesTroy = Encoding.Unicode.GetBytes(strTroy); // a byte array of length 4
    byte[] b1 = { bytesTroy[0], bytesTroy[1], bytesTroy[2] }; // "奇" plus the first half of "葩"
    byte[] b2 = { bytesTroy[3] }; // the second half of "葩"
    // The split above simulates receiving the data in blocks.
    var decoder = Encoding.Unicode.GetDecoder();
    char[] result = new char[10]; // buffer for the decoded characters
    // Number of characters that decoding b1 can produce.
    var charIndex = decoder.GetCharCount(b1, 0, b1.Length);
    // The first 0: decode starting at index 0 of b1. The second 0: write starting at index 0 of result.
    decoder.GetChars(b1, 0, b1.Length, result, 0, false);
    // The decoder completes "葩" from its saved byte plus b2.
    int written = charIndex + decoder.GetChars(b2, 0, b2.Length, result, charIndex, false);
    Console.WriteLine(new string(result, 0, written)); // 奇葩

Secure string

The System.Security.SecureString class is a more secure string class.

When an object of this class is constructed, it internally allocates a block of unmanaged memory, which keeps the contents out of the garbage collector's reach.

Unlike a String object, once a SecureString object has been released, the contents of its encrypted string no longer remain in memory.

Of course, unless the string holds something like a credit card number or a password, there is no need for it; after all, it has a performance cost.
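A minimal usage sketch, assuming the secret arrives one character at a time. SecureStringDemo and FromChars are hypothetical names. Note that converting the contents back to a managed string, as done briefly below, gives up the protection, so real code should avoid it where possible.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security;

public static class SecureStringDemo
{
    public static SecureString FromChars(char[] chars)
    {
        var secure = new SecureString();
        foreach (char c in chars) secure.AppendChar(c);
        secure.MakeReadOnly(); // no further modification allowed
        return secure;
    }

    public static void Main()
    {
        // Dispose (via using) releases the unmanaged buffer deterministically.
        using (SecureString pin = FromChars(new[] { '1', '2', '3', '4' }))
        {
            Console.WriteLine(pin.Length); // 4

            // Copy the contents into unmanaged memory only for as long as needed.
            IntPtr ptr = Marshal.SecureStringToGlobalAllocUnicode(pin);
            try
            {
                // For demonstration only: this creates an ordinary managed string.
                Console.WriteLine(Marshal.PtrToStringUni(ptr));
            }
            finally
            {
                Marshal.ZeroFreeGlobalAllocUnicode(ptr); // zero the copy, then free it
            }
        }
    }
}
```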

