.NET Handling of Character Encoding Problems

Last Update: 2018-12-08 Source: Internet

Author: User

1. Character encoding history

For the history of character encoding, we refer to an article by a fellow cnblogs member: http://www.cnblogs.com/KevinYang/archive/2010/06/18/1760597.html

Computers were first invented to solve numerical computing problems. Later, people found that computers could do more, for example text processing. However, because a computer only recognizes numbers, people must tell it which number represents which character. For example, 65 represents the letter 'A', 66 represents the letter 'B', and so on. The correspondence between characters and numbers must be consistent across computers; otherwise the same number would be displayed as different characters on different machines. Therefore, the American National Standards Institute (ANSI) set a standard that specifies a set of common characters and the number corresponding to each character. This is the ASCII character set, also known as the ASCII code.

At that time, computers generally used the 8-bit byte as the smallest unit of storage and processing, and very few characters were in use: the 26 English letters in upper and lower case, the digits, and other commonly used symbols. Because ASCII defines only 128 codes in total, 7 bits are enough to store and process them, and the remaining bit was used as a parity bit in some communication systems.

2. Structural features of each encoding

ASCII later proved insufficient for the languages used in other countries, so a variety of encoding formats gradually emerged.

UTF-8: encodes characters into one, two, three, or four bytes depending on their value. Values below 128 (0x0080) are encoded into one byte (ASCII). Values from 0x0080 to 0x07FF are encoded into two bytes (European and Middle Eastern scripts). Values from 0x0800 to 0xFFFF are encoded into three bytes (East Asian scripts). Finally, surrogate pairs, which represent characters above 0xFFFF, are encoded into four bytes.
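The ranges above can be checked with a short sketch. The class name Utf8Lengths and the helper Utf8ByteCount are illustrative names, not part of the original article; only Encoding.UTF8.GetByteCount is a real .NET call.

```csharp
using System;
using System.Text;

public static class Utf8Lengths
{
    // Returns how many bytes UTF-8 needs for the given text.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static void Main()
    {
        // The \u escapes keep the sample independent of the source file's own encoding.
        Console.WriteLine(Utf8ByteCount("A"));            // 1: U+0041 is below 0x0080
        Console.WriteLine(Utf8ByteCount("\u00E9"));       // 2: U+00E9 is in 0x0080-0x07FF
        Console.WriteLine(Utf8ByteCount("\u554A"));       // 3: U+554A is in 0x0800-0xFFFF
        Console.WriteLine(Utf8ByteCount("\uD83D\uDE00")); // 4: U+1F600 is stored as a surrogate pair
    }
}
```

Note that the last string has a Length of 2 in .NET, because a string counts UTF-16 code units rather than characters.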

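UTF-16: encodes each 16-bit code unit into two bytes. No compression is involved, so conversion is fast. Characters above 0xFFFF take two code units (a surrogate pair), that is, four bytes. In .NET this encoding is exposed as Encoding.Unicode, which is why it is often simply called the Unicode encoding.

UTF-32: uses four bytes to encode every character. Every character has the same width, which makes it simple to handle, but it wastes memory and is rarely the efficient choice for storage or transmission.

UTF-7: has been deprecated by the Unicode Consortium and should be avoided in new code.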

ASCII: encodes 16-bit characters into ASCII characters. Any 16-bit character with a value below 128 (0x0080) is stored in a single byte, so the encoding is efficient. Characters above 0x007F cannot be converted: their value is lost (by default .NET replaces them with '?').
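As a rough comparison of the encodings listed above, the following sketch encodes the same two-character string with several of them. EncodingComparison and RoundTrip are illustrative names introduced here; the Encoding members are the standard .NET ones.

```csharp
using System;
using System.Text;

public static class EncodingComparison
{
    // Encodes text with the given encoding and decodes it back again.
    public static string RoundTrip(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        string text = "A\u554A"; // 'A' followed by the CJK character U+554A

        Console.WriteLine(Encoding.UTF8.GetByteCount(text));    // 4 (1 + 3)
        Console.WriteLine(Encoding.Unicode.GetByteCount(text)); // 4 (2 + 2)
        Console.WriteLine(Encoding.UTF32.GetByteCount(text));   // 8 (4 + 4)
        Console.WriteLine(Encoding.ASCII.GetByteCount(text));   // 2 (U+554A becomes '?')

        // ASCII silently loses the non-ASCII character; UTF-16 preserves it.
        Console.WriteLine(RoundTrip(Encoding.ASCII, text) == "A?");   // True
        Console.WriteLine(RoundTrip(Encoding.Unicode, text) == text); // True
    }
}
```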

3. C# encoding and decoding examples

Reference: http://blog.csdn.net/xyjnzy/article/details/5072057
// 1. Obtain the location code of Chinese Characters

Byte [] array = new byte [2];
Array = System. Text. Encoding. Default. GetBytes ("ah ");

Int i1 = (short) (array [0]-''/0 '');
Int i2 = (short) (array [1]-''/0 '');

// 2. Chinese character codes in unicode decoding mode
Array = System. Text. Encoding. Unicode. GetBytes ("ah ");
I1 = (short) (array [0]-''/0 '');
I2 = (short) (array [1]-''/0 '');

// 3. unicode deserialization for Chinese Characters
String str = "4a55 ";
String s1 = str. Substring (0, 2 );
String s2 = str. Substring (2, 2 );

Int t1 = Convert. ToInt32 (s1, 16 );
Int t2 = Convert. ToInt32 (s2, 16 );

Array [0] = (byte) t1;
Array [1] = (byte) t2;

String s = System. Text. Encoding. Unicode. GetString (array );

// 4. undecodes Chinese Characters in default mode
Array [0] = (byte) 196;
Array [1] = (byte) 207;
S = System. Text. Encoding. Default. GetString (array );

// 5. Obtain the string length
S = "iam square gun ";
Int len = s. Length; // will output as 6
Byte [] sarr = System. Text. Encoding. Default. GetBytes (s );
Len = sarr. Length; // will output as 3 + 3*2 = 9

// 6. Add strings
System. Text. StringBuilder sb = new System. Text. StringBuilder ("");
Sb. Append ("I ");
Sb. Append ("am ");
Sb. Append ("square gun ");
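Step 3 above handles exactly one character. A minimal generalization, assuming the hex string always holds UTF-16 little-endian bytes, might look like the following. HexDecoding, DecodeUtf16Hex, and EncodeUtf16Hex are hypothetical helper names, not part of the referenced article.

```csharp
using System;
using System.Text;

public static class HexDecoding
{
    // Parses a hex string such as "4a55" into bytes (two hex digits per byte)
    // and decodes them as UTF-16 little-endian text.
    public static string DecodeUtf16Hex(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new ArgumentException("Hex string must have an even number of digits.", nameof(hex));

        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return Encoding.Unicode.GetString(bytes);
    }

    // Inverse operation: the UTF-16 little-endian bytes of text as lowercase hex.
    public static string EncodeUtf16Hex(string text)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.Unicode.GetBytes(text))
        {
            sb.Append(b.ToString("x2"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(DecodeUtf16Hex("4a55") == "\u554A"); // True
        Console.WriteLine(EncodeUtf16Hex("\u554A"));           // 4a55
    }
}
```

Because it uses only Encoding.Unicode, this sketch does not depend on the machine's default code page.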

String --> byte array

byte[] data = System.Text.Encoding.ASCII.GetBytes(str);

String --> byte (parses a numeric string such as "65", not a character)

byte data = Convert.ToByte(str);

Byte[] --> string

string str = System.Text.Encoding.ASCII.GetString(bytes, 0, nBytesSize);
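The three conversions can be exercised together in a small sketch. StringByteConversions and DecodePrefix are illustrative names, and UTF-8 is used here instead of ASCII only as an assumption that non-ASCII text may need to survive.

```csharp
using System;
using System.Text;

public static class StringByteConversions
{
    // Decodes only the first 'count' bytes of a buffer, which is useful when
    // the buffer is larger than the data actually stored in it.
    public static string DecodePrefix(byte[] buffer, int count)
    {
        return Encoding.UTF8.GetString(buffer, 0, count);
    }

    public static void Main()
    {
        // String --> byte array
        byte[] data = Encoding.UTF8.GetBytes("abc");
        Console.WriteLine(data.Length); // 3

        // String --> byte: Convert.ToByte parses the number written in the string,
        // whereas an Encoding returns the code of the character itself.
        byte parsed = Convert.ToByte("65");
        byte encoded = Encoding.ASCII.GetBytes("A")[0];
        Console.WriteLine(parsed == encoded); // True, both are 65

        // Byte[] --> string, decoding only the bytes that were written
        byte[] buffer = new byte[16];
        int written = Encoding.UTF8.GetBytes("hello", 0, 5, buffer, 0);
        Console.WriteLine(DecodePrefix(buffer, written)); // hello
    }
}
```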

4. Use of the Encoding class

The Encoding class provides many static properties, such as Unicode, UTF32, UTF7, ASCII, and Default. Each returns an object that handles the corresponding character encoding. Note that the Default property should be used with caution: its result depends on the computer the program runs on, because it returns that computer's default character encoding (the system ANSI code page on .NET Framework). The same program can therefore produce different bytes on different machines.
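One way to avoid depending on Encoding.Default is to name the encoding explicitly on both the writing and the reading side. The sketch below is an illustration under that assumption; ExplicitEncoding and SurvivesRoundTrip are hypothetical names.

```csharp
using System;
using System.Text;

public static class ExplicitEncoding
{
    // Returns true when text encoded by 'writer' is recovered intact by 'reader'.
    public static bool SurvivesRoundTrip(string text, Encoding writer, Encoding reader)
    {
        byte[] bytes = writer.GetBytes(text);
        return reader.GetString(bytes) == text;
    }

    public static void Main()
    {
        string text = "\u554A"; // a CJK character, U+554A

        // The value printed here depends on the runtime and the machine,
        // which is exactly why Default is risky for stored or transmitted data.
        Console.WriteLine(Encoding.Default.WebName);

        // Same explicit encoding on both sides: the text survives.
        Console.WriteLine(SurvivesRoundTrip(text, Encoding.UTF8, Encoding.UTF8));    // True

        // Mismatched encodings: the bytes are misread and the text is garbled.
        Console.WriteLine(SurvivesRoundTrip(text, Encoding.UTF8, Encoding.Unicode)); // False
    }
}
```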

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
