ASCII
Introduction: The earliest of these encodings. Full name: American Standard Code for Information Interchange
Content: uppercase and lowercase letters, digits, punctuation, and the space character, assigned to consecutive byte values (excluding extended character sets)
Length: 1 byte (8 bits); standard ASCII uses only the low 7 bits
Range: Initially 0 to 127. As computers spread, the range was extended to 255, and codes 128 to 255 are called the "extended character set". That used up the whole byte. The first 128 codes never change; the extended set that follows them can differ from system to system
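The ranges above can be checked with a short Python sketch. It uses only built-in codecs; `latin-1` and `cp437` are picked here purely as two examples of extended sets and are not named in these notes.

```python
# Standard ASCII covers code values 0..127, one byte per character.
text = "Hello, World! 123"
data = text.encode("ascii")

# One character maps to exactly one byte, and every byte is below 128.
one_byte_each = len(data) == len(text)
all_below_128 = all(b < 128 for b in data)

# Letters occupy consecutive code values.
code_of_A = ord("A")
code_of_B = ord("B")

# A character outside 0..127 is rejected by the strict ASCII codec.
try:
    "\u00e9".encode("ascii")  # e with acute accent
    rejected = False
except UnicodeEncodeError:
    rejected = True

# The same byte above 127 means different things in different extended sets
# (latin-1 and cp437 are illustrative choices, not taken from the notes).
as_latin1 = b"\xe9".decode("latin-1")
as_cp437 = b"\xe9".decode("cp437")
print(one_byte_each, all_below_128, code_of_A, code_of_B, rejected)
print(repr(as_latin1), repr(as_cp437))
```

The last line shows why text using the upper 128 codes cannot be read reliably unless the extended set is known.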
GB2312
Introduction: A Chinese extension of ASCII. The first 128 codes are unchanged. Two consecutive bytes greater than 127 together represent one character: the first byte (high byte) ranges from 0xA1 to 0xF7 and the second byte (low byte) from 0xA1 to 0xFE.
These combinations give about 7,000 characters, of which 6,763 are Simplified Chinese characters. The set also adds mathematical symbols, Roman and Greek letters, Japanese kana, and so on. Even the letters, digits, punctuation, and space already in ASCII were encoded again as two-byte characters; these are the "full-width" characters, while the original ones below 128 are called "half-width" characters
Content: ASCII plus Simplified Chinese characters, mathematical symbols, Roman and Greek letters, and Japanese kana
Length: the first 128 characters take 1 byte (8 bits, half-width); all others take two bytes (16 bits, full-width)
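A minimal sketch of the byte ranges and of the half-width/full-width distinction, using Python's built-in `gb2312` codec. The sample characters are chosen here for illustration only.

```python
# A Chinese character becomes two bytes, both greater than 127.
hanzi = "\u4e2d".encode("gb2312")          # the character "zhong" (middle)
high, low = hanzi
in_range = 0xA1 <= high <= 0xF7 and 0xA1 <= low <= 0xFE

# Half-width "A" keeps its one-byte ASCII code.
half = "A".encode("gb2312")

# Full-width "A" (U+FF21) is a separate two-byte character.
full = "\uff21".encode("gb2312")

print(hanzi.hex(), in_range)   # d6d0 True
print(half.hex(), full.hex())  # 41 a3c1
```

A decoder can therefore tell the two kinds apart from the first byte alone: below 128 means a single-byte character, above 127 means the start of a two-byte pair.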
GBK, GB18030
Introduction: GBK is an extension of GB2312, and GB18030 is an extension of GBK
Content: more Chinese characters, including Traditional characters, and more symbols; GB18030 also includes characters of some minority scripts
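Length: the first 128 characters take 1 byte (8 bits, half-width); the rest take two bytes (16 bits, full-width). GB18030 additionally uses four-byte sequences for characters that do not fit in the two-byte range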
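The relationship between the three sets can be sketched with Python's built-in `gb2312`, `gbk`, and `gb18030` codecs. The helper `can_encode` and the two sample characters (a Traditional character and a Tibetan letter) are illustrative choices, not part of these notes.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

traditional = "\u9f8d"   # Traditional form of "dragon"
tibetan = "\u0f40"       # Tibetan letter KA, a minority-script example

# GB2312 lacks the Traditional character; GBK adds it as two bytes.
in_gb2312 = can_encode(traditional, "gb2312")
gbk_bytes = traditional.encode("gbk")

# GB18030 keeps the GBK code and adds four-byte codes for other scripts.
same_in_gb18030 = traditional.encode("gb18030") == gbk_bytes
tibetan_in_gbk = can_encode(tibetan, "gbk")
tibetan_bytes = tibetan.encode("gb18030")

print(in_gb2312, len(gbk_bytes), same_in_gb18030)
print(tibetan_in_gbk, len(tibetan_bytes))
```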
Unicode
Introduction: To end the confusion caused by every country defining its own encoding, ISO set aside all the regional schemes and created a single code covering every culture, letter, and symbol on Earth. Its full name is "Universal Multiple-Octet Coded Character Set", abbreviated UCS
Description: one encoding for all the cultures, letters, and symbols on Earth
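Length: in the original two-byte form (UCS-2), every character is two bytes (16 bits); for the first 128 characters (the ASCII range), the high eight bits are all 0. Later versions of Unicode added code points beyond the 16-bit range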
Conflict with GBK: Unicode was designed without regard for compatibility with any existing encoding scheme, so GBK and Unicode assign completely different codes to Chinese characters. No simple arithmetic converts text between Unicode and another encoding; the conversion must be done with a lookup table
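The points above can be illustrated with a short sketch. It assumes that Python's `utf-16-be` codec is an acceptable stand-in for the two-byte UCS form, which holds for characters inside the 16-bit range such as the samples used here.

```python
ch = "\u4e2d"                       # the character "zhong" (middle)

code_point = ord(ch)                # its Unicode code value
two_byte_form = ch.encode("utf-16-be")
gbk_form = ch.encode("gbk")

# For an ASCII character the high byte of the two-byte form is zero.
ascii_two_byte = "A".encode("utf-16-be")

# Converting GBK bytes to Unicode goes through the codec's mapping table;
# there is no formula linking d6d0 to 4e2d.
converted = gbk_form.decode("gbk")

print(hex(code_point), two_byte_form.hex(), gbk_form.hex())  # 0x4e2d 4e2d d6d0
print(ascii_two_byte.hex(), converted == ch)                 # 0041 True
```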
UTF-8, UTF-16
Summary: To solve the problem of transmitting Unicode over a network, the UTF (UCS Transfer Format) standards were introduced. UTF-8 transmits data in 8-bit units and UTF-16 in 16-bit units. These formats exist for reliable transmission. A UTF encoding is not a direct copy of the Unicode code value; the value is transformed by defined algorithms and rules.
Some people in China still use GBK. The original reason was space, because Unicode text is larger, but as storage has grown this no longer matters. Using UTF-8 everywhere is recommended!
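As an illustration of the "algorithms and rules" mentioned in the summary, here is a minimal sketch of the UTF-8 transformation. The function name `utf8_encode` is hypothetical, and the sketch skips validation (for example, it does not reject surrogate code points); in practice `str.encode("utf-8")` does this work.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code value as UTF-8 (sketch, no validation)."""
    if code_point < 0x80:                      # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),   # 4 bytes
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

samples = ["A", "\u00e9", "\u4e2d", "\U0001f600"]
for ch in samples:
    manual = utf8_encode(ord(ch))
    builtin = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")
    print(hex(ord(ch)), manual.hex(), manual == builtin, utf16.hex())
```

ASCII text stays one byte per character in UTF-8, which is one reason it is convenient as a common encoding; the same text in UTF-16 takes two bytes per character.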
Base64
Summary: Some systems can handle only ASCII. Base64 converts data that contains non-ASCII bytes, such as documents, into ASCII characters. It is especially suited to transferring data under the HTTP and MIME protocols.
Encoding method:
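The notes do not spell the method out here, so the following is only a sketch using Python's standard `base64` module: it shows non-ASCII text being turned into ASCII-only output and back. The sample text and the choice of UTF-8 as the intermediate byte form are assumptions made for the example.

```python
import base64

original = "\u4e2d\u6587"                 # two Chinese characters
raw = original.encode("utf-8")            # 6 bytes, all above 127

# Every 3 input bytes become 4 output characters drawn from
# A-Z, a-z, 0-9, "+" and "/" (with "=" used as padding).
encoded = base64.b64encode(raw)
is_ascii_only = all(b < 128 for b in encoded)

# Decoding restores the original bytes exactly.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded.decode("ascii"), is_ascii_only, decoded == original)
# 5Lit5paH True True
```

The output is one third larger than the input, which is the price paid for staying within ASCII.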