Character encoding issues in C# development
I have recently been working through various character encoding problems and picked up some experience along the way, which I would like to share here.
System.Text provides the abstract Encoding class, which exposes the methods for converting strings to and from bytes. Commonly used encodings include ASCII, Unicode, and UTF8.
Unicode text can be stored in several encoding formats: UTF-8, UTF-16, UTF-32, and UTF-7.
The corresponding character encoding classes are ASCIIEncoding, UTF7Encoding, UTF8Encoding, UnicodeEncoding (UTF-16), and UTF32Encoding.
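To make the relationship between these classes and the static properties on Encoding more concrete, here is a minimal sketch. The class name EncodingClassesDemo and its helper methods are hypothetical names chosen for illustration; the sketch relies only on the standard System.Text types.

```csharp
using System;
using System.Text;

public static class EncodingClassesDemo
{
    // True when the shared Encoding.Unicode instance is a UnicodeEncoding (UTF-16).
    public static bool UnicodeIsUnicodeEncoding()
    {
        return Encoding.Unicode is UnicodeEncoding;
    }

    // Encodes the text once with a freshly constructed UnicodeEncoding and once
    // with the shared Encoding.Unicode instance, then compares the two results.
    public static bool SameBytes(string text)
    {
        byte[] fromClass = new UnicodeEncoding().GetBytes(text);
        byte[] fromProperty = Encoding.Unicode.GetBytes(text);
        return BitConverter.ToString(fromClass) == BitConverter.ToString(fromProperty);
    }

    public static void Main()
    {
        Console.WriteLine(Encoding.ASCII is ASCIIEncoding);   // True
        Console.WriteLine(Encoding.UTF8 is UTF8Encoding);     // True
        Console.WriteLine(UnicodeIsUnicodeEncoding());        // True
        Console.WriteLine(Encoding.UTF32 is UTF32Encoding);   // True
        Console.WriteLine(SameBytes("Hello World!"));         // True
    }
}
```

In everyday code the static properties (Encoding.ASCII, Encoding.UTF8, Encoding.Unicode, Encoding.UTF32) are usually all that is needed; constructing the classes directly mostly matters when non-default options are required.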
Next we will compare ASCII and Unicode encoding. Without further ado, here is the code.
First, encoding and decoding with ASCII:
using System;
using System.Numerics; // BigInteger

class Program
{
    static void Main(string[] args)
    {
        string temp = "Hello World!";
        Console.WriteLine("Original String:{0}", temp);

        // Encode the string into ASCII bytes and print them as hex.
        byte[] tempBytes = System.Text.Encoding.ASCII.GetBytes(temp);
        Console.WriteLine("Bytes Array:{0}", BitConverter.ToString(tempBytes));

        // Show the same bytes as one (little-endian) integer.
        BigInteger integer = new BigInteger(tempBytes);
        Console.WriteLine("BigInteger:{0}", integer);

        // Decode the bytes back into a string.
        string res = System.Text.Encoding.ASCII.GetString(tempBytes);
        Console.WriteLine("Convert Back String:{0}", res);
        Console.ReadKey();
    }
}
The running result is as follows:
Original String:Hello World!
Bytes Array:48-65-6C-6C-6F-20-57-6F-72-6C-64-21
BigInteger:10334410032597741434076685640
Convert Back String:Hello World!
Looks normal, right? But what if the input string is Chinese (or contains other non-ASCII characters)?
Change the string in the program above to:
string temp = "你好，世界！";
The running result is as follows:
Original String:你好，世界！
Bytes Array:3F-3F-3F-3F-3F-3F
BigInteger:69540876599103
Convert Back String:??????
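The substitution of '?' (0x3F) happens silently, so it can be useful to detect it before data is lost. The following is a minimal sketch under the assumption that throwing on unencodable characters is acceptable; AsciiSafety and CanEncodeAsAscii are hypothetical names, while Encoding.GetEncoding with explicit fallbacks and EncoderFallbackException are part of System.Text.

```csharp
using System;
using System.Text;

public static class AsciiSafety
{
    // Hypothetical helper: returns true only if every character of the text
    // can be represented in ASCII without being replaced by '?'.
    public static bool CanEncodeAsAscii(string text)
    {
        // Request an ASCII encoding that throws instead of substituting '?'.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strictAscii.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CanEncodeAsAscii("Hello World!"));   // True
        // Two Chinese characters, written as escapes so the source file stays ASCII.
        Console.WriteLine(CanEncodeAsAscii("\u4F60\u597D"));   // False
    }
}
```

The default Encoding.ASCII instance uses a replacement fallback, which is why the test program prints question marks instead of failing.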
Now change the encoding to UTF8 and repeat the test above.
using System;
using System.Numerics; // BigInteger

class Program
{
    static void Main(string[] args)
    {
        string temp = "你好，世界！";
        Console.WriteLine("Original String:{0}", temp);

        // Encode the string into UTF8 bytes and print them as hex.
        byte[] tempBytes = System.Text.Encoding.UTF8.GetBytes(temp);
        Console.WriteLine("Bytes Array:{0}", BitConverter.ToString(tempBytes));

        // Show the same bytes as one (little-endian) integer.
        BigInteger integer = new BigInteger(tempBytes);
        Console.WriteLine("BigInteger:{0}", integer);

        // Decode the bytes back into a string.
        string res = System.Text.Encoding.UTF8.GetString(tempBytes);
        Console.WriteLine("Convert Back String:{0}", res);
        Console.ReadKey();
    }
}
The running result is as follows:
Chinese input:
Original String:你好，世界！
Bytes Array:E4-BD-A0-E5-A5-BD-EF-BC-8C-E4-B8-96-E7-95-8C-EF-BC-81
BigInteger:-10998968812899434720462615123889939386679836
Convert Back String:你好，世界！

English input:
Original String:Hello World!
Bytes Array:48-65-6C-6C-6F-20-57-6F-72-6C-64-21
BigInteger:10334410032597741434076685640
Convert Back String:Hello World!
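A side note on the negative BigInteger in the output: the BigInteger(byte[]) constructor reads the array as a little-endian two's complement number, so the value is negative whenever the last byte has its high bit set, as the trailing bytes of multi-byte UTF8 sequences always do. The sketch below shows one way to get a non-negative value instead; BigIntegerSign and ToUnsigned are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Numerics;
using System.Text;

public static class BigIntegerSign
{
    // Hypothetical helper: treats the bytes as an unsigned little-endian number
    // by appending a zero byte, so the sign bit of the last byte is never set.
    public static BigInteger ToUnsigned(byte[] bytes)
    {
        byte[] padded = new byte[bytes.Length + 1];
        Array.Copy(bytes, padded, bytes.Length);
        return new BigInteger(padded);
    }

    public static void Main()
    {
        // One Chinese character (U+4F60); its UTF8 bytes are E4-BD-A0.
        byte[] bytes = Encoding.UTF8.GetBytes("\u4F60");
        Console.WriteLine(BitConverter.ToString(bytes));   // E4-BD-A0
        Console.WriteLine(new BigInteger(bytes).Sign);     // -1
        Console.WriteLine(ToUnsigned(bytes).Sign);         // 1
    }
}
```

The sign has no bearing on the encoding itself; the integer is only another way of displaying the same bytes.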
Comparing these with the ASCII results, apart from UTF8 also handling Chinese and other languages, there seems to be no big difference. If we switch the encoding to Unicode (Encoding.Unicode, that is, UTF-16), the difference between encoding Chinese and English characters becomes easy to see.
English input:
Original String:Hello World!
Bytes Array:48-00-65-00-6C-00-6C-00-6F-00-20-00-57-00-6F-00-72-00-6C-00-64-00-21-00
BigInteger:3160918205608148134863399242437668999277801104545742920
Convert Back String:Hello World!

Chinese input:
Original String:你好，世界！
Bytes Array:60-4F-7D-59-0C-FF-16-4E-4C-75-01-FF
BigInteger:-307722159543719876182061216
Convert Back String:你好，世界！
Sure enough. Comparing the results, we find that:
1. ASCII can only handle English letters, digits, and common symbols; any other character is replaced with '?' (0x3F). For details, see an ASCII table.
2. Unicode can represent the characters of all languages around the world.
3. When Unicode (UTF-16) encodes English text, a 0x00 byte follows each character's byte, so the result is twice the length of ASCII. When encoding Chinese characters it is shorter than UTF8: 2 bytes per character instead of 3.
4. UTF8 is longer than Unicode (UTF-16) for Chinese characters, but identical to ASCII for English text.
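The length comparison in the list above can be checked directly with Encoding.GetByteCount. This is a small sketch rather than part of the original test program; ByteCounts and Count are hypothetical names, and the Chinese sample is the six-character string whose bytes appear in the outputs above, written with escapes.

```csharp
using System;
using System.Text;

public static class ByteCounts
{
    // Returns the encoded length of the text in ASCII, UTF8, and Unicode (UTF-16), in that order.
    public static int[] Count(string text)
    {
        return new[]
        {
            Encoding.ASCII.GetByteCount(text),
            Encoding.UTF8.GetByteCount(text),
            Encoding.Unicode.GetByteCount(text),
        };
    }

    public static void Main()
    {
        // English: ASCII and UTF8 are the same length, UTF-16 is twice as long.
        Console.WriteLine(string.Join("/", Count("Hello World!")));   // 12/12/24

        // Chinese: ASCII yields one '?' per character, UTF8 needs 3 bytes
        // per character, and UTF-16 needs 2.
        string chinese = "\u4F60\u597D\uFF0C\u4E16\u754C\uFF01";
        Console.WriteLine(string.Join("/", Count(chinese)));          // 6/18/12
    }
}
```

These counts match the byte arrays printed by the test runs: 12 and 24 bytes for the English string, 18 and 12 bytes for the Chinese one.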
Conclusion: storage is becoming cheaper and cheaper, so when processing non-English text, use a Unicode encoding (UTF-16, or UTF8, which is another encoding of the same character set). Choose ASCII only when you are certain the program will only ever process English.