Unicode, UTF-8, UTF-16, GB2312, and GBK

Source: Internet
Author: User

Let's start with the simplest: ASCII. Everyone knows it. If you only need English letters and punctuation marks, seven bits per character are enough to encode them. But to represent Chinese or other scripts, that is far from enough, so many other encodings were created, as follows.
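The post has no code of its own, so the examples here are sketches in Python that use its built-in codecs. This first one checks the 7-bit claim. Every ASCII code fits in 7 bits, and Chinese characters have no ASCII code at all. `fits_in_ascii` is a hypothetical helper, not part of any library.

```python
def fits_in_ascii(text: str) -> bool:
    """Hypothetical helper: True if every character has an ASCII (7-bit) code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

# The 128 ASCII codes run from 0 to 127; the largest needs exactly 7 bits.
print(max(bytes(range(128))).bit_length())  # 7
print(fits_in_ascii("Hello, world!"))        # True
print(fits_in_ascii("中文"))                  # False
```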

(1)
To handle Chinese characters, GB2312 was created for Simplified Chinese and Big5 for Traditional Chinese.

GB2312 uses a variable-length encoding. If the highest bit of a byte is 0, the character occupies just that one byte, and single-byte characters are encoded exactly as in ASCII. If the highest bit is 1, that byte and the next one together form the double-byte encoding of a single character.

The scheme was then extended to support more Chinese characters, which gave rise to GBK and GB18030; GB18030 became the official national standard. Each is backward compatible with the one before it: GB18030 with GBK, GBK with GB2312, and GB2312 with ASCII.
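The compatibility chain can be spot-checked with Python's `gb2312`, `gbk`, and `gb18030` codecs. The sample characters were chosen for illustration. 國 (the traditional form of 国) is outside GB2312 but inside GBK. An emoji can be encoded only in GB18030, which covers all of Unicode. `supported_by` is a hypothetical helper.

```python
CODECS = ("gb2312", "gbk", "gb18030")

def supported_by(text: str) -> list[str]:
    """Hypothetical helper: list which of the three codecs can encode the text."""
    result = []
    for codec in CODECS:
        try:
            text.encode(codec)
            result.append(codec)
        except UnicodeEncodeError:
            pass
    return result

# Text inside GB2312 (ASCII plus common simplified characters) gives identical bytes in all three.
print(len({"Hello 中文".encode(c) for c in CODECS}))  # 1
print(supported_by("國"))   # ['gbk', 'gb18030']
print(supported_by("😀"))   # ['gb18030']
```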

PC platforms are currently required to support GB18030, but embedded products are not yet, so mobile phones and MP3 players generally support only GB2312.

(2)
Unicode is an encoding scheme designed by international organizations to accommodate the many languages of the world.

More precisely, Unicode is a code table: it assigns each character a number, known as its code point.

This table is meant to be read by people: each code point is unique, and every character it covers can be looked up in it.
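Python exposes this code table directly. `ord` gives a character's code point and `chr` goes the other way. The standard `unicodedata` module returns the character's official name. A small sketch:

```python
import unicodedata

for ch in ("A", "中", "😀"):
    cp = ord(ch)  # the character's code point in the Unicode table
    print(f"U+{cp:04X}", unicodedata.name(ch), chr(cp) == ch)
# U+0041 LATIN CAPITAL LETTER A True
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D True
# U+1F600 GRINNING FACE True
```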

But the table alone does not work for transmitting and processing text on a computer. If Unicode code points were converted directly into a byte stream for transmission, the receiving computer could not tell at which byte each character starts, since all information in a computer is ultimately transmitted as a stream of 1s and 0s.
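To see the problem concretely, here is a deliberately naive scheme, `naive_bytes`. It is hypothetical and not a real encoding. It writes each code point in as few big-endian bytes as possible, with no markers. The single character 中 (U+4E2D) and the two characters "N-" produce the same bytes, so a receiver cannot tell them apart.

```python
def naive_bytes(text: str) -> bytes:
    """HYPOTHETICAL scheme, not a real encoding: minimal big-endian bytes, no markers."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        n = max(1, (cp.bit_length() + 7) // 8)  # fewest whole bytes that hold cp
        out += cp.to_bytes(n, "big")
    return bytes(out)

print(naive_bytes("中"))   # b'N-'
print(naive_bytes("N-"))   # b'N-'
```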

That is why there are encodings designed for transmission, such as UTF-8 and UTF-16.

In other words, UTF-8 and UTF-16 both represent Unicode; they are simply different formats for transmitting Unicode code points.
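A quick Python illustration follows. The same three code points are encoded as UTF-8 and as UTF-16. The UTF-16 output is big-endian without a byte-order mark, a choice made here for readability. The byte sequences differ, but both decode to the same text.

```python
text = "A中😀"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")   # big-endian, no byte-order mark

print(utf8.hex(" "))    # 41 e4 b8 ad f0 9f 98 80
print(utf16.hex(" "))   # 00 41 4e 2d d8 3d de 00
print(utf8.decode("utf-8") == utf16.decode("utf-16-be") == text)  # True
```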

UTF-8 is a variable-length encoding with 8-bit units, as follows:

U+0000-U+007F      0xxxxxxx
U+0080-U+07FF      110xxxxx 10xxxxxx
U+0800-U+FFFF      1110xxxx 10xxxxxx 10xxxxxx
U+10000-U+10FFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
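The table translates almost line for line into code. This Python sketch (`utf8_encode` is a hypothetical name) fills the x bits from the code point, including the four-byte form 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx used for U+10000 through U+10FFFF. Each result is compared with Python's own UTF-8 codec. For brevity it does not reject surrogate code points (U+D800 through U+DFFF), which real UTF-8 forbids.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical helper: encode one code point with the UTF-8 bit patterns.
    Surrogates (U+D800-U+DFFF) are not rejected here, for brevity."""
    if cp < 0x80:                       # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp < 0x110000:                   # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError(f"not a Unicode code point: {cp:#x}")

for ch in ("A", "é", "中", "😀"):
    mine = utf8_encode(ord(ch))
    print(ch, mine.hex(" "), mine == ch.encode("utf-8"))
# A 41 True
# é c3 a9 True
# 中 e4 b8 ad True
# 😀 f0 9f 98 80 True
```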

The leading bits of the first byte tell the decoder how many bytes make up the character: the first byte plus the continuation bytes (those beginning with 10) that follow it.
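Decoding runs in the opposite direction. Read the first byte, use its leading bits to get the sequence length, then take that many bytes. This is a Python sketch with hypothetical helper names. It checks that continuation bytes start with 10. Unlike a production decoder, it does not reject overlong or out-of-range sequences.

```python
def utf8_sequence_length(lead: int) -> int:
    """Hypothetical helper: bytes in a character, read from the first byte's leading bits."""
    if lead < 0x80:              # 0xxxxxxx
        return 1
    if (lead >> 5) == 0b110:     # 110xxxxx
        return 2
    if (lead >> 4) == 0b1110:    # 1110xxxx
        return 3
    if (lead >> 3) == 0b11110:   # 11110xxx
        return 4
    raise ValueError(f"{lead:#04x} cannot start a character")  # e.g. a 10xxxxxx byte

def split_utf8(data: bytes) -> list[bytes]:
    """Hypothetical helper: cut a UTF-8 byte stream into one chunk per character."""
    chunks, i = [], 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunk = data[i:i + n]
        # Every byte after the first must be a continuation byte 10xxxxxx.
        if len(chunk) < n or any((b >> 6) != 0b10 for b in chunk[1:]):
            raise ValueError(f"malformed sequence at byte {i}")
        chunks.append(chunk)
        i += n
    return chunks

print([len(c) for c in split_utf8("A中😀".encode("utf-8"))])  # [1, 3, 4]
```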

Ordinary English letters take only one byte, so the UTF-8 byte stream for English text is identical to its ASCII byte stream.
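This is easy to verify in Python. All 128 ASCII characters encode to the same single byte in UTF-8. In mixed text, only the non-ASCII characters take extra bytes.

```python
# Every ASCII character (0-127) is the same single byte in UTF-8 and ASCII.
identical = all(chr(i).encode("utf-8") == chr(i).encode("ascii") for i in range(128))
print(identical)  # True

# In mixed text, each of the two Chinese characters here takes three bytes.
text = "UTF-8 中文"
print(len(text), len(text.encode("utf-8")))  # 8 12
```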

UTF-16 is a variable-length encoding with 16-bit units. Yes, that's right: UTF-16 is also a variable-length encoding.

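Like UTF-8, it uses special bit patterns to mark characters that need more than one unit. Because UTF-16 uses 16-bit units, any code point below 0x10000 is stored as a single unit whose value equals the code point itself, so for those characters the UTF-16 byte stream looks just like the raw Unicode code. Code points from 0x10000 upward are stored as two units, called a surrogate pair. In practice, most characters in everyday use are below 0x10000, which is why a UTF-16 byte stream is often loosely called "the Unicode encoding".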
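Here is a Python sketch of the UTF-16 rule, with hypothetical helper names. Code points below 0x10000 map to one 16-bit unit equal to the code point. For higher ones, 0x10000 is subtracted. The remaining 20 bits are split between a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits). Units are serialized big-endian here. UTF-16 data can also be little-endian, and is often marked with a byte-order mark.

```python
def utf16_units(cp: int) -> list[int]:
    """Hypothetical helper: encode one code point as 16-bit UTF-16 code units."""
    if cp < 0x10000:
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code points cannot be encoded on their own")
        return [cp]                     # one unit, equal to the code point
    if cp > 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    v = cp - 0x10000                    # 20 bits remain
    return [0xD800 | (v >> 10),         # high surrogate: top 10 bits
            0xDC00 | (v & 0x3FF)]       # low surrogate: bottom 10 bits

def utf16_be(text: str) -> bytes:
    """Hypothetical helper: serialize text as UTF-16 big-endian, no byte-order mark."""
    return b"".join(u.to_bytes(2, "big") for ch in text for u in utf16_units(ord(ch)))

print([hex(u) for u in utf16_units(ord("中"))])   # ['0x4e2d']
print([hex(u) for u in utf16_units(ord("😀"))])   # ['0xd83d', '0xde00']
print(utf16_be("A中😀") == "A中😀".encode("utf-16-be"))  # True
```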

Last but not least:

The Windows API provides two functions for converting directly between multibyte and wide-character strings:

MultiByteToWideChar();

WideCharToMultiByte();

UTF-8, GB2312, and similar byte streams can be called multibyte streams because they all use 8-bit units.

UTF-16, on the other hand, can be called wide characters because its unit is 16 bits.
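The Windows functions themselves exist only on Windows, so here is a portable Python analogue of what they do. It is not a binding to them. The helper names `multibyte_to_wide` and `wide_to_multibyte` are hypothetical. Python codec names stand in for Windows code pages. "Wide" is taken to mean UTF-16 little-endian, which is how Windows stores wide strings.

```python
def multibyte_to_wide(data: bytes, codec: str) -> bytes:
    """Hypothetical analogue of MultiByteToWideChar: 8-bit-unit bytes to UTF-16LE."""
    return data.decode(codec).encode("utf-16-le")

def wide_to_multibyte(wide: bytes, codec: str) -> bytes:
    """Hypothetical analogue of WideCharToMultiByte: UTF-16LE to 8-bit-unit bytes."""
    return wide.decode("utf-16-le").encode(codec)

gb = "中文".encode("gb2312")             # 4 bytes: two 8-bit units per character
wide = multibyte_to_wide(gb, "gb2312")   # 4 bytes: one 16-bit unit per character
utf8 = wide_to_multibyte(wide, "utf-8")  # 6 bytes: three 8-bit units per character
print(wide.hex(" "))                     # 2d 4e 87 65
print(utf8.decode("utf-8"))              # 中文
```

Going through the wide form like this is also a common way to convert GB2312 text to UTF-8 with the two Windows functions.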

For more information, see MSDN and related books.
