Different character set encodings use different amounts of storage space. If you choose the wrong one, garbled characters may appear. In particular, the sender and the receiver of the data must use the same encoding.
ASCII is the earliest and most basic encoding. It uses 7 bits to represent a character, for a total of 2 to the power of 7 = 128 characters. Latin1 (ISO-8859-1) later extended ASCII: it represents a character with one 8-bit byte, which gives 2 to the power of 8 = 256 characters. It can represent more special characters than ASCII, but it is still not enough for the characters of some regions, such as Chinese. To solve this problem, Unicode was introduced to cover the characters of all regions, alongside region-specific encodings such as GB2312 for Chinese.
The Unicode encoding (UTF-16, which .NET exposes as Encoding.Unicode) uses two bytes to represent a character, which gives 2 to the power of 16 = 65536 values; characters outside that range take two such two-byte units. When a document is mostly or entirely English, this wastes space. UTF-8 solves this problem. It encodes English text exactly as ASCII does, one byte per character, while a Chinese character generally takes three bytes. GB2312 uses two bytes to represent a Chinese character.
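The space trade-off described above can be checked directly with Encoding.GetByteCount. The following is a minimal sketch; the class name ByteCountDemo and the helper CountBytes are made up for illustration, and the two Chinese characters are written as \u escapes so the example does not depend on how the source file is saved.

```csharp
using System;
using System.Text;

public static class ByteCountDemo
{
    // Hypothetical helper: number of bytes the encoding needs for the text.
    public static int CountBytes(Encoding encoding, string text)
    {
        return encoding.GetByteCount(text);
    }

    public static void Main()
    {
        string english = "message";
        string chinese = "\u4FE1\u606F"; // two Chinese characters

        Console.WriteLine(CountBytes(Encoding.ASCII, english));   // 7
        Console.WriteLine(CountBytes(Encoding.UTF8, english));    // 7
        Console.WriteLine(CountBytes(Encoding.Unicode, english)); // 14
        Console.WriteLine(CountBytes(Encoding.UTF8, chinese));    // 6
        Console.WriteLine(CountBytes(Encoding.Unicode, chinese)); // 4
    }
}
```

English text doubles in size under Encoding.Unicode, while the Chinese characters are smaller there than in UTF-8, so the better choice depends on what the document mostly contains.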
The Encoding class in .NET, located in the System.Text namespace, is the core class for the various encodings. It provides conversion between byte arrays and characters, and conversion between different encodings. The Encoding class is defined as follows:
public abstract class Encoding : ICloneable
The derived classes of Encoding include ASCIIEncoding, UnicodeEncoding, and UTF8Encoding, each of which overrides the base members for its own encoding.
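Conversion between encodings can be done in one step with the static Encoding.Convert method, which takes the source encoding, the destination encoding, and the source bytes. The following is a small sketch; the class name ConvertDemo and the helper Utf16ToUtf8 are made up for illustration.

```csharp
using System;
using System.Text;

public static class ConvertDemo
{
    // Hypothetical helper: re-encodes UTF-16 (little-endian) bytes as UTF-8.
    public static byte[] Utf16ToUtf8(byte[] utf16Bytes)
    {
        return Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);
    }

    public static void Main()
    {
        // UTF-16 little-endian bytes of the characters U+4FE1 and U+606F.
        byte[] utf16 = new byte[] { 225, 79, 111, 96 };
        byte[] utf8 = Utf16ToUtf8(utf16);

        // BitConverter.ToString prints the bytes as hyphen-separated hex.
        Console.WriteLine(BitConverter.ToString(utf8)); // E4-BF-A1-E6-81-AF
    }
}
```

This is equivalent to decoding the bytes with the source encoding's GetString and then calling GetBytes on the destination encoding.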
The following uses the string "message,信息" (English letters, a half-width comma, and two Chinese characters) as an example to show how each encoding represents it.
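// Requires: using System.Text;
string result = "";
string s = "message,信息";
byte[] b = Encoding.UTF8.GetBytes(s);
// byte[] b = Encoding.Unicode.GetBytes(s);
// On .NET Core and later, gb2312 is only available after calling
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
// byte[] b = Encoding.GetEncoding("gb2312").GetBytes(s);
foreach (byte i in b)
{
    // Append each byte value in decimal, followed by a comma.
    result += i.ToString() + ",";
}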
The result value is "109,101,115,115,97,103,101,44,228,191,161,230,129,175," (the loop leaves a trailing comma).
Encoding | m | e | s | s | a | g | e | , | 信 | 息
UTF-8 | 109 | 101 | 115 | 115 | 97 | 103 | 101 | 44 | 228,191,161 | 230,129,175
Unicode | 109,0 | 101,0 | 115,0 | 115,0 | 97,0 | 103,0 | 101,0 | 44,0 | 225,79 | 111,96
GB2312 | 109 | 101 | 115 | 115 | 97 | 103 | 101 | 44 | 208,197 | 207,162
Conversion from byte array to character
byte[] b = new byte[] { 109, 101, 115, 115, 97, 103, 101, 44, 228, 191, 161, 230, 129, 175 };
string s = Encoding.UTF8.GetString(b);
The value of s is "message,信息". The byte array is UTF-8 encoded. If you call GetString with gb2312 instead, the Chinese characters come out garbled: message,淇℃伅
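The same effect can be reproduced without gb2312, which is not registered by default on newer .NET versions. The sketch below decodes the UTF-8 bytes once with UTF-8 and once with ISO-8859-1 (Latin1), chosen here only as a convenient wrong encoding; the class name DecodeDemo and the helper Decode are made up for illustration.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Hypothetical helper: decodes the bytes with the given encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // UTF-8 bytes of "message," followed by two Chinese characters.
        byte[] b = new byte[] { 109, 101, 115, 115, 97, 103, 101, 44,
                                228, 191, 161, 230, 129, 175 };

        string right = Decode(b, Encoding.UTF8);
        string wrong = Decode(b, Encoding.GetEncoding("iso-8859-1"));

        // The ASCII part survives in both; only the Chinese part is damaged.
        Console.WriteLine(right.Length);                 // 10
        Console.WriteLine(wrong.Length);                 // 14
        Console.WriteLine(wrong.StartsWith("message,")); // True
    }
}
```

Latin1 treats each of the six UTF-8 bytes as a separate character, so the wrongly decoded string is longer than the original and the Chinese characters cannot be read.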