What is the relationship between UTF-8 and Unicode? What is the difference?

Source: Internet
Author: User

UTF8 = Unicode Transformation Format -- 8 bit
It is a Unicode transformation format: it converts a sequence of Unicode code points into a byte stream.

UTF-8 stream conversion procedure:
Input: unsigned integer c, the code point of the character to be encoded (a Unicode value)
Output: bytes b1, b2, b3, b4, the encoded sequence of bytes (up to four bytes; unused bytes are null)
Algorithm:
If (c < 0x80)
  b1 = ((c >> 0) & 0x7F) | 0x00
  b2 = null
  b3 = null
  b4 = null
Else if (c < 0x0800)
  b1 = ((c >> 6) & 0x1F) | 0xC0
  b2 = ((c >> 0) & 0x3F) | 0x80
  b3 = null
  b4 = null
Else if (c < 0x010000)
  b1 = ((c >> 12) & 0x0F) | 0xE0
  b2 = ((c >> 6) & 0x3F) | 0x80
  b3 = ((c >> 0) & 0x3F) | 0x80
  b4 = null
Else if (c < 0x110000)
  b1 = ((c >> 18) & 0x07) | 0xF0
  b2 = ((c >> 12) & 0x3F) | 0x80
  b3 = ((c >> 6) & 0x3F) | 0x80
  b4 = ((c >> 0) & 0x3F) | 0x80
End if
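
The branching above can be sketched in Python as follows. This is a minimal illustration, not a replacement for the built-in str.encode; the function name utf8_encode is made up for this example, and the rejection of surrogate code points (0xD800 to 0xDFFF) is an addition that the procedure above does not mention.

```python
def utf8_encode(c: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (illustrative sketch)."""
    if c < 0 or c >= 0x110000:
        raise ValueError("not a Unicode code point")
    # Assumption beyond the procedure above: surrogates are not encodable.
    if 0xD800 <= c <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if c < 0x80:
        # One byte: 0xxxxxxx
        return bytes([c])
    if c < 0x0800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (c >> 6), 0x80 | (c & 0x3F)])
    if c < 0x010000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (c >> 12),
            0x80 | ((c >> 6) & 0x3F),
            0x80 | (c & 0x3F),
        ])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (c >> 18),
        0x80 | ((c >> 12) & 0x3F),
        0x80 | ((c >> 6) & 0x3F),
        0x80 | (c & 0x3F),
    ])


# Cross-check against Python's own encoder at the branch boundaries.
for cp in (0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
```

The loop at the end compares the sketch with Python's built-in encoder at each boundary value, which is a quick way to confirm that the shift and mask constants are right.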
==================================
Unicode is a coded character set: a table that assigns a code (a code point) to each character, for example to each Chinese character. It plays a role similar to GB2312-1980, GB18030, and so on, but the set of characters it covers is different.
==================================
A Unicode code point is converted into one, two, three, or four bytes of UTF-8, depending on its value. Because the code points of English letters are all less than 0x80, each of them takes a single byte in UTF-8, so English text is faster to transmit in UTF-8 than in a Unicode encoding that uses two bytes per character.
UTF-8 is a method of "re-encoding" Unicode for transfer.
The procedure above converts a Unicode code point to UTF-8.
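
Going in the opposite direction, from UTF-8 bytes back to a Unicode code point, means reading the bit pattern of the first byte to learn the sequence length and then collecting six bits from each following byte. The sketch below is one way to write that; the name utf8_decode_one is hypothetical, and for brevity it does not reject overlong encodings or surrogates, which a strict decoder would.

```python
from typing import Tuple


def utf8_decode_one(data: bytes) -> Tuple[int, int]:
    """Decode the first character in data.

    Returns (code_point, number_of_bytes_consumed).
    Simplified sketch: overlong forms and surrogates are not rejected.
    """
    if not data:
        raise ValueError("empty input")
    b1 = data[0]
    if b1 < 0x80:            # 0xxxxxxx: single byte
        return b1, 1
    if 0xC0 <= b1 < 0xE0:    # 110xxxxx: two bytes
        length, cp = 2, b1 & 0x1F
    elif 0xE0 <= b1 < 0xF0:  # 1110xxxx: three bytes
        length, cp = 3, b1 & 0x0F
    elif 0xF0 <= b1 < 0xF8:  # 11110xxx: four bytes
        length, cp = 4, b1 & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(data) < length:
        raise ValueError("truncated sequence")
    for b in data[1:length]:
        if b & 0xC0 != 0x80:  # continuation bytes must look like 10xxxxxx
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    return cp, length


# The three bytes E4 B8 AD decode to U+4E2D.
assert utf8_decode_one(b"\xe4\xb8\xad") == (0x4E2D, 3)
```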

UTF-8 is a transitional solution for moving existing ASCII systems to Unicode. It guarantees ASCII compatibility while extending to a large character set, and it is a solution recommended by Unicode. From a different problem-solving perspective, however, it is not a good solution for existing Chinese systems. The following link provides detailed background on UTF-8 encoding: http://www.acnis.com/modules.php?Name=ArticlE&file=article&sid=102

What is Unicode? The basic goal of Unicode is to unify all encodings, that is, to contain all character sets. As long as a system supports Unicode, it can process all of these character sets. Unicode characters are often stored as two bytes each, although code points range up to 0x10FFFF and those above 0xFFFF need more than two bytes. Windows operating systems currently support Unicode.

What is UTF-8? UTF-8 is an encoding of Unicode: its character set is the same as Unicode's, but the way each character is turned into bytes is different. For English characters, the UTF-8 encoding is the same as ASCII and uses one byte. Chinese characters, however, generally take three bytes (and occupy three bytes in memory when stored as UTF-8).
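
The byte counts described here can be checked with Python's built-in encoders. This is only an illustration; the helper name encoded_sizes is made up for this example, and UTF-16 (little-endian, without a byte order mark) stands in for a two-byte Unicode encoding.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes ch occupies in UTF-8 and in UTF-16."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le")}


# For an English letter, the UTF-8 byte is identical to the ASCII byte.
assert "A".encode("utf-8") == "A".encode("ascii")

# "A" is U+0041; "\u4e2d" is the Chinese character U+4E2D.
for ch in ("A", "\u4e2d"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

For U+0041 the sizes are 1 byte in UTF-8 and 2 in UTF-16; for U+4E2D they are 3 and 2.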

A disadvantage of UTF-8, and of variable-width Unicode encodings in general, is that operations such as searching text in memory tend to need more complicated algorithms and be less efficient, because characters do not all occupy the same number of bytes.
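
One concrete case of this cost is locating the n-th character in a UTF-8 byte string: since characters vary in width, the position cannot be computed by multiplication and the bytes have to be scanned. The sketch below assumes valid UTF-8 input; the function name char_offset is hypothetical.

```python
def char_offset(data: bytes, n: int) -> int:
    """Return the byte offset at which the n-th character (0-based) starts.

    Assumes data is valid UTF-8. The scan is linear in the length of data,
    whereas a fixed-width encoding would allow offset = n * width.
    """
    count = -1
    for i, b in enumerate(data):
        # Bytes of the form 10xxxxxx continue a character; anything else
        # starts a new one.
        if b & 0xC0 != 0x80:
            count += 1
            if count == n:
                return i
    raise IndexError("character index out of range")


data = "a\u4e2db".encode("utf-8")   # bytes: 61 E4 B8 AD 62
assert char_offset(data, 1) == 1    # the Chinese character starts at byte 1
assert char_offset(data, 2) == 4    # "b" starts after the 3-byte character
```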
