[C# advanced series] 13: Characters, strings, and text encoding

I had written a lot of material for this post, then lost all of it when I restarted my machine.

So I rewrote it from memory, and the result reads more like a distillation of what I know.

About characters

The main thing to remember is that .NET uses Unicode: each Char is a 16-bit UTF-16 code unit. An explicit cast between characters and numbers is the simplest and most efficient way to convert between them.
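As a minimal sketch of the casts mentioned above (the class and method names are hypothetical, chosen only for illustration):

```csharp
using System;

public static class CharConversionDemo
{
    // Casting a char to int yields its UTF-16 code unit value.
    public static int ToCodeUnit(char c)
    {
        return (int)c;
    }

    // Casting an int back to char is the reverse operation.
    public static char FromCodeUnit(int value)
    {
        return (char)value;
    }

    public static void Main()
    {
        Console.WriteLine(ToCodeUnit('A'));      // 65
        Console.WriteLine(FromCodeUnit(66));     // B
        Console.WriteLine(ToCodeUnit('\u4E2D')); // 20013 (a Chinese character)
    }
}
```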

String is a reference type and lives on the heap. However, whereas other objects are created with the IL instruction newobj, a literal string is loaded with the ldstr (load string) instruction.

About strings

A String is immutable. Every String method that appears to modify a string actually creates a new String.

Concatenating strings with the plus sign creates multiple string objects on the heap, and those objects hurt performance because they must all be garbage collected. Therefore, use StringBuilder when you concatenate many strings.
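The two points above can be sketched as follows. ConcatDemo, JoinWithPlus, and JoinWithBuilder are hypothetical names for this illustration, and the example only shows that both approaches produce the same text; it does not measure performance.

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    public static string JoinWithPlus(string[] parts)
    {
        string result = "";
        foreach (string part in parts)
        {
            result += part; // allocates a new string on every iteration
        }
        return result;
    }

    public static string JoinWithBuilder(string[] parts)
    {
        var builder = new StringBuilder();
        foreach (string part in parts)
        {
            builder.Append(part); // appends into the builder's internal buffer
        }
        return builder.ToString();
    }

    public static void Main()
    {
        string original = "troy";
        string upper = original.ToUpperInvariant(); // returns a new string
        Console.WriteLine(original); // troy (unchanged)
        Console.WriteLine(upper);    // TROY

        string[] parts = { "a", "b", "c" };
        Console.WriteLine(JoinWithPlus(parts) == JoinWithBuilder(parts)); // True
    }
}
```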

String comparison

String provides a number of comparison methods, and the author of CLR via C# recommends using them instead of == and !=. With the operators, the caller does not explicitly state which rules the comparison uses. When the caller states the rules explicitly, the code is easier to read and maintain.

    var str1 = "string 1";
    var str2 = "string 2";
    // Ordinal comparison (not sensitive to language or culture), ignoring case.
    bool result1 = str1.Equals(str2, StringComparison.OrdinalIgnoreCase);
    // Ordinary comparison with the == operator.
    bool result2 = str1 == str2;

For me, though, == really is clearer; it comes down to personal preference. The explicit comparison methods are indeed handier in multi-language, multi-culture scenarios, but for everyday cases I personally find == easier to read and understand.

With StringComparison.Ordinal, the CLR first quickly compares the lengths of the two strings before comparing individual characters. A culture-sensitive comparison cannot take that shortcut, because two strings of different lengths may still be considered equal, so it has to compare character by character from the start, which costs a lot more.

The System.StringComparer class can also compare strings. It suits cases where the same kind of comparison is performed repeatedly over a large number of different strings.
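One common place where a StringComparer is applied repeatedly is as the key comparer of a collection. The sketch below assumes a hypothetical word-counting helper; only StringComparer.OrdinalIgnoreCase and Dictionary come from the framework.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerDemo
{
    // Counts words, treating keys as equal when they differ only by case.
    public static Dictionary<string, int> CountIgnoringCase(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string word in words)
        {
            int current;
            counts.TryGetValue(word, out current); // current stays 0 if the key is missing
            counts[word] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        var counts = CountIgnoringCase(new[] { "Apple", "apple", "APPLE", "pear" });
        Console.WriteLine(counts.Count);    // 2
        Console.WriteLine(counts["apple"]); // 3
    }
}
```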

String interning

The CLR can let multiple identical strings share a single String object, which reduces the number of string objects and saves memory. This is called string interning.

In .NET 4.5, literal strings in the code are interned when the assembly is loaded, but in earlier versions you have to intern them yourself.

The String.Intern method interns a string. It looks the string up in an internal hash table: if an identical string is already there, the existing reference is returned; if not, the string is added.

This can reduce memory use, because from then on only one string object is referenced. Keep in mind, however, that interning itself costs performance, so use it carefully and only after analyzing the specific case.
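A small sketch of String.Intern, using two strings built at run time so that they start out as separate objects. InternDemo and its helper are hypothetical names.

```csharp
using System;

public static class InternDemo
{
    // Builds a fresh string object at run time (not a literal).
    public static string Build()
    {
        return new string(new[] { 't', 'r', 'o', 'y' });
    }

    public static void Main()
    {
        string first = Build();
        string second = Build();

        Console.WriteLine(first == second);                      // True: same content
        Console.WriteLine(object.ReferenceEquals(first, second)); // False: two objects

        // After interning, both variables refer to the single shared object.
        string firstInterned = string.Intern(first);
        string secondInterned = string.Intern(second);
        Console.WriteLine(object.ReferenceEquals(firstInterned, secondInterned)); // True
    }
}
```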

String pool

Literal strings with identical content all refer to the same string in the string pool. The C# compiler arranges this at compile time.
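A quick way to observe this is to compare references rather than contents. The sketch below uses hypothetical names and assumes both literals are compiled into the same assembly.

```csharp
using System;

public static class LiteralPoolDemo
{
    // Two identical literals in the same assembly.
    public static bool LiteralsShareObject()
    {
        string a = "troy";
        string b = "troy";
        return object.ReferenceEquals(a, b);
    }

    // A constant expression is folded into a single literal by the compiler.
    public static bool FoldedConstantSharesObject()
    {
        string a = "troy";
        string b = "tr" + "oy";
        return object.ReferenceEquals(a, b);
    }

    public static void Main()
    {
        Console.WriteLine(LiteralsShareObject());        // True
        Console.WriteLine(FoldedConstantSharesObject()); // True
    }
}
```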

Efficient string construction: StringBuilder

StringBuilder is as easy to understand as its name suggests: it is used to concatenate and otherwise build strings. You can think of it as a character array.

However, we need to understand the following concepts of StringBuilder:

  • Maximum capacity (MaxCapacity)
    • Specifies the maximum number of characters the string can hold. The default value is Int32.MaxValue (about 2 billion).
    • Ignore this unless you want to limit the maximum number of characters in a string.
  • Capacity (Capacity)
    • As mentioned above, a StringBuilder can be regarded as a character array, and the capacity is the length of the current character array.
    • Why "current"? If an append would exceed the capacity, the capacity is automatically doubled, a new array is allocated with the new capacity, and the characters in the original array are copied into it. The original array is then garbage collected.
    • So it is best to estimate the capacity up front when you create a StringBuilder. At the very least, avoid frequent resizing; otherwise it may perform worse than using String directly.
  • Character array
    • The array of Char structures inside the StringBuilder.
    • The number of characters in use is given by the Length property.
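These properties can be inspected directly. The sketch below is illustrative only: CapacityDemo and BuildWithEstimate are hypothetical names, and the example deliberately does not assert the exact capacity after growth, since the growth strategy is an implementation detail.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Creates a builder sized from an up-front estimate, then appends the parts.
    public static StringBuilder BuildWithEstimate(int estimatedLength, string[] parts)
    {
        var builder = new StringBuilder(estimatedLength);
        foreach (string part in parts)
        {
            builder.Append(part);
        }
        return builder;
    }

    public static void Main()
    {
        var builder = new StringBuilder(64);
        Console.WriteLine(builder.Capacity);                     // 64
        Console.WriteLine(builder.MaxCapacity == int.MaxValue);  // True

        builder.Append("hello");
        Console.WriteLine(builder.Length);                       // 5

        builder.Append(new string('x', 100)); // exceeds 64, so the builder grows
        Console.WriteLine(builder.Length);                       // 105
        Console.WriteLine(builder.Capacity >= builder.Length);   // True
    }
}
```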

Generally, Append and AppendFormat are used to append strings. There are other operations too, such as Insert, Remove, and Replace; just keep in mind that you are operating on an array.
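A short sketch of these operations chained together. The class name, method name, and sample values are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class BuilderOperationsDemo
{
    public static string Describe(string name, int age)
    {
        var builder = new StringBuilder();
        builder.Append("Name: ").Append(name);
        // AppendFormat works like String.Format but writes into the builder.
        builder.AppendFormat(CultureInfo.InvariantCulture, ", Age: {0}", age);
        // Insert places text at an index; here it adds an opening bracket.
        builder.Insert(0, "[").Append("]");
        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Describe("Troy", 30)); // [Name: Troy, Age: 30]
    }
}
```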

The book also covers methods for formatting and parsing strings, but they are not discussed here.

String encoding: converting between characters and bytes

For Chinese text, storing strings as Unicode (UTF-16) costs nothing extra, because a Chinese character needs two bytes anyway. An English character needs only one byte, yet in UTF-16 it still occupies two: one byte represents the character and the other is simply \0.

Therefore, when you transmit text that is entirely or largely English, it is more efficient to encode the Unicode string into a more compact byte array.

Typically, when you use System.IO.BinaryWriter or System.IO.StreamWriter, the text is encoded as it is written and must be decoded with the same scheme when it is read back.

If you do not specify an encoding scheme, the default is UTF-8. (Roughly speaking, an English character takes one byte and a Chinese character usually takes three.)
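As a rough sketch of writing and reading with an explicit encoding, the example below round-trips text through a MemoryStream. StreamEncodingDemo and its methods are hypothetical helpers; the in-memory stream stands in for a file or network stream.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamEncodingDemo
{
    // Writes text with the given encoding, then reads it back with the same one.
    public static string RoundTrip(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            // The last argument leaves the stream open after the writer is disposed.
            using (var writer = new StreamWriter(stream, encoding, 1024, true))
            {
                writer.Write(text);
            }

            stream.Position = 0;
            using (var reader = new StreamReader(stream, encoding))
            {
                return reader.ReadToEnd();
            }
        }
    }

    // Reports which encoding a StreamWriter uses when none is specified.
    public static string DefaultWriterEncodingName()
    {
        using (var writer = new StreamWriter(new MemoryStream()))
        {
            return writer.Encoding.WebName;
        }
    }

    public static void Main()
    {
        Console.WriteLine(DefaultWriterEncodingName());                      // utf-8
        Console.WriteLine(RoundTrip("hello", Encoding.Unicode) == "hello");  // True
    }
}
```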

Another common encoding scheme is UTF-16, in which most characters, Chinese and English alike, take two bytes; .NET calls it Unicode encoding. (For Chinese text, UTF-16 is actually more efficient than UTF-8.)
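The sizes are easy to check with GetByteCount. In this sketch the class and method names are hypothetical, and the Chinese sample text is written with \u escapes so the source file's own encoding does not matter.

```csharp
using System;
using System.Text;

public static class ByteCountDemo
{
    public static int Utf8Bytes(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static int Utf16Bytes(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }

    public static void Main()
    {
        string english = "hello";        // 5 characters
        string chinese = "\u4E2D\u6587"; // 2 Chinese characters

        Console.WriteLine(Utf8Bytes(english));  // 5
        Console.WriteLine(Utf16Bytes(english)); // 10
        Console.WriteLine(Utf8Bytes(chinese));  // 6
        Console.WriteLine(Utf16Bytes(chinese)); // 4
    }
}
```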

The other encoding schemes are mostly pitfalls for us.

When encoding, try to obtain the encoding object through Encoding.Unicode rather than constructing a System.Text.UnicodeEncoding yourself.

The former returns the same object it returned for earlier requests, instead of constructing a new object for every request.

The latter creates a new object on the managed heap each time, which affects performance. However, the classes in System.Text that derive from Encoding have special constructors that can make the object throw an exception when it decodes an invalid sequence. So when safety matters and the input may be invalid, it is better to construct those classes yourself.

Once you have an encoding object, call GetBytes to convert a string to a byte array and GetString to convert a byte array to a string.
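The sketch below puts these pieces together: a GetBytes/GetString round trip, the shared Encoding.Unicode instance, and a strict UTF8Encoding constructed to throw on invalid bytes. StrictDecodingDemo and IsValidUtf8 are hypothetical names, and UTF-8 is used for the strict case only as one example of a derived class.

```csharp
using System;
using System.Text;

public static class StrictDecodingDemo
{
    // Returns true if the bytes decode cleanly as UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // Second argument: throw instead of substituting a replacement character.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        byte[] bytes = Encoding.Unicode.GetBytes("Troy");
        Console.WriteLine(bytes.Length);                      // 8
        Console.WriteLine(Encoding.Unicode.GetString(bytes)); // Troy

        // The static property hands back the same cached object each time.
        Console.WriteLine(object.ReferenceEquals(Encoding.Unicode, Encoding.Unicode)); // True

        Console.WriteLine(IsValidUtf8(new byte[] { 0x41 })); // True
        Console.WriteLine(IsValidUtf8(new byte[] { 0xFF })); // False
    }
}
```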

Encoding and decoding byte streams

Suppose you read a UTF-16 encoded string through a System.Net.Sockets.NetworkStream object. The byte stream usually arrives in blocks, so if a read returns, say, 5 bytes rather than a multiple of 2 bytes, decoding each block on its own can corrupt the data.

To handle this, call Encoding.Unicode.GetDecoder() to obtain a new Decoder object. It has GetChars and GetCharCount methods. When GetChars is called, it decodes as many characters as it can. If the byte array does not contain enough bytes to complete a character, the leftover bytes are saved inside the Decoder. On the next call, the Decoder combines those leftover bytes with the new byte array passed to it. This makes a Decoder object very useful when reading from a stream.

Encoding in the opposite direction works the same way.

The following is a simple Decoder example.

    // Placeholder two-character string; the original literal was lost from this page.
    string strTroy = "AB";
    byte[] bytesTroy = Encoding.Unicode.GetBytes(strTroy); // a 4-byte array
    byte[] b1 = { bytesTroy[0], bytesTroy[1], bytesTroy[2] }; // one and a half characters
    byte[] b2 = { bytesTroy[3] }; // the remaining half character
    // The lines above simulate receiving the data in blocks; decoding starts here.
    var decoder = Encoding.Unicode.GetDecoder();
    char[] result = new char[10]; // buffer for the decoded characters
    // Number of characters that decoding b1 will produce.
    var charIndex = decoder.GetCharCount(b1, 0, b1.Length);
    // Decode b1 starting at its position 0, writing into result from position 0.
    decoder.GetChars(b1, 0, b1.Length, result, 0, false);
    // Decode b2; the decoder combines it with the byte left over from b1.
    decoder.GetChars(b2, 0, b2.Length, result, charIndex, false);
    // Trim the unused '\0' slots of the buffer before printing the original string.
    Console.WriteLine(new string(result).TrimEnd('\0'));
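The same idea extends to a read loop over a stream. The sketch below is a minimal illustration under stated assumptions: a MemoryStream stands in for the NetworkStream, the 5-byte read size mirrors the scenario described earlier, and ChunkedDecodingDemo and DecodeInChunks are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkedDecodingDemo
{
    // Reads UTF-16 bytes from the stream in fixed-size chunks and decodes them.
    public static string DecodeInChunks(Stream stream, int chunkSize)
    {
        Decoder decoder = Encoding.Unicode.GetDecoder();
        var text = new StringBuilder();
        var buffer = new byte[chunkSize];
        // Large enough for any chunk plus a byte carried over from the previous one.
        var chars = new char[Encoding.Unicode.GetMaxCharCount(chunkSize)];

        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // flush: false keeps an incomplete trailing byte inside the decoder.
            int count = decoder.GetChars(buffer, 0, read, chars, 0, false);
            text.Append(chars, 0, count);
        }
        return text.ToString();
    }

    public static void Main()
    {
        string original = "hello, \u4E16\u754C";
        byte[] bytes = Encoding.Unicode.GetBytes(original);
        using (var stream = new MemoryStream(bytes))
        {
            // 5 is not a multiple of 2, so characters straddle chunk boundaries.
            Console.WriteLine(DecodeInChunks(stream, 5) == original); // True
        }
    }
}
```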

Secure strings

The System.Security.SecureString class is a safer string class.

When a SecureString object is constructed, it internally allocates a block of unmanaged memory, which keeps the contents out of the garbage collector's reach.

Unlike a String object, the encrypted string's contents no longer exist in memory once the SecureString object has been reclaimed.

Of course, unless the data is something like a credit card number or a password, you do not need this class. After all, it does affect performance.
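A minimal sketch of building and reading a SecureString follows. The helper names are hypothetical, and converting back to a plain String (as ToPlainText does) is shown only to demonstrate the API; doing so in real code gives up the protection the class is meant to provide.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security;

public static class SecureStringDemo
{
    // Builds a SecureString one character at a time and locks it.
    public static SecureString FromChars(char[] chars)
    {
        var secure = new SecureString();
        foreach (char c in chars)
        {
            secure.AppendChar(c);
        }
        secure.MakeReadOnly();
        return secure;
    }

    // For demonstration only: copies the contents into unmanaged memory,
    // reads them, then zeroes and frees that memory.
    public static string ToPlainText(SecureString secure)
    {
        IntPtr pointer = IntPtr.Zero;
        try
        {
            pointer = Marshal.SecureStringToGlobalAllocUnicode(secure);
            return Marshal.PtrToStringUni(pointer);
        }
        finally
        {
            if (pointer != IntPtr.Zero)
            {
                Marshal.ZeroFreeGlobalAllocUnicode(pointer);
            }
        }
    }

    public static void Main()
    {
        using (SecureString secure = FromChars(new[] { 'p', 'w', 'd' }))
        {
            Console.WriteLine(secure.Length);          // 3
            Console.WriteLine(secure.IsReadOnly());    // True
            Console.WriteLine(ToPlainText(secure));    // pwd
        } // Dispose releases the protected memory
    }
}
```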
