What exactly is the relationship between UTF-8 and Unicode? What's the difference?
Source: Internet
Author: User
UTF-8 (often written UTF8) = Unicode Transformation Format, 8-bit.
It is a transfer format for Unicode. That is, it converts Unicode text into a stream of bytes for storage or transmission.
UTF-8 stream conversion program:
Input: unsigned integer c - the code point of the character to be encoded (a Unicode value)
Output: bytes b1, b2, b3, b4 - the encoded sequence of bytes (up to four byte values; unused ones are null)
Algorithm:
if (c < 0x80)
  b1 = ((c >> 0) & 0x7F) | 0x00
  b2 = null
  b3 = null
  b4 = null
else if (c < 0x0800)
  b1 = ((c >> 6) & 0x1F) | 0xC0
  b2 = ((c >> 0) & 0x3F) | 0x80
  b3 = null
  b4 = null
else if (c < 0x010000)
  b1 = ((c >> 12) & 0x0F) | 0xE0
  b2 = ((c >> 6) & 0x3F) | 0x80
  b3 = ((c >> 0) & 0x3F) | 0x80
  b4 = null
else if (c < 0x110000)
  b1 = ((c >> 18) & 0x07) | 0xF0
  b2 = ((c >> 12) & 0x3F) | 0x80
  b3 = ((c >> 6) & 0x3F) | 0x80
  b4 = ((c >> 0) & 0x3F) | 0x80
end if
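The algorithm above can be sketched in Python as follows. This is only an illustration: the function name utf8_encode is made up here, and the rejection of surrogate code points (0xD800 to 0xDFFF) is an assumption added on top of the original pseudocode, chosen so the result matches Python's built-in encoder.

```python
def utf8_encode(c: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper)."""
    if c < 0:
        raise ValueError("code point must be non-negative")
    if 0xD800 <= c <= 0xDFFF:
        # Assumption not in the original pseudocode: surrogates are rejected.
        raise ValueError("surrogate code points cannot be encoded")
    if c < 0x80:
        # One byte: 0xxxxxxx
        return bytes([c])
    if c < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([((c >> 6) & 0x1F) | 0xC0,
                      (c & 0x3F) | 0x80])
    if c < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([((c >> 12) & 0x0F) | 0xE0,
                      ((c >> 6) & 0x3F) | 0x80,
                      (c & 0x3F) | 0x80])
    if c < 0x110000:
        # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([((c >> 18) & 0x07) | 0xF0,
                      ((c >> 12) & 0x3F) | 0x80,
                      ((c >> 6) & 0x3F) | 0x80,
                      (c & 0x3F) | 0x80])
    raise ValueError("code point out of Unicode range")


# Compare with Python's built-in encoder for one character.
print(utf8_encode(ord("中")) == "中".encode("utf-8"))  # True
```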
=====================
Unicode is a code table (a coded character set): it assigns a number to each character, for example a code for each Chinese character. It plays a similar role to GB2312-1980, GB18030, etc., but the set of characters it covers is different.
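As a small illustration of "code table" versus "byte encoding", the Python snippet below looks up one Chinese character in the Unicode table and then shows its bytes in GB2312 and in UTF-8. The variable names are only for illustration; the codecs are the ones shipped with Python's standard library.

```python
ch = "中"

code_point = ord(ch)             # number assigned by the Unicode table
gb_bytes = ch.encode("gb2312")   # bytes according to the GB2312 table
utf8_bytes = ch.encode("utf-8")  # the same Unicode code point, re-encoded as UTF-8

print(f"U+{code_point:04X}", gb_bytes.hex(), utf8_bytes.hex())
# U+4E2D d6d0 e4b8ad
```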
=====================
A Unicode code point is converted to a UTF-8 code of one, two, three, or four bytes, depending on its value. The code points of English letters are less than 0x80, so each needs only one byte in UTF-8, which is faster to transmit than sending two bytes per character in a 16-bit Unicode form.
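A quick way to see this is to compare sizes in Python. The sketch below assumes "two-byte Unicode" means UTF-16 without a byte-order mark (Python's "utf-16-le" codec); the sample characters are arbitrary.

```python
text = "Hello"
utf8_size = len(text.encode("utf-8"))       # one byte per ASCII character
utf16_size = len(text.encode("utf-16-le"))  # two bytes per character, no byte-order mark
print(utf8_size, utf16_size)  # 5 10

# The byte count grows with the code point value, one sample per range.
samples = ["A", "é", "中", "😀"]
lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)  # [1, 2, 3, 4]
```

Note that the saving applies to ASCII text; a common Chinese character takes three bytes in UTF-8 but two in UTF-16.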
UTF-8 is a "re-encoding" scheme devised for transmitting Unicode.
Converting UTF-8 back to Unicode is done by reversing the program given above.
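A minimal sketch of the reverse direction in Python is shown below. The function name utf8_decode_one is hypothetical, and the sketch is deliberately simple: it checks the lead byte and the continuation bytes, but it does not reject overlong encodings or surrogates, which a full decoder would.

```python
def utf8_decode_one(data: bytes):
    """Return (code_point, bytes_consumed) for the first character in data."""
    b1 = data[0]
    if b1 < 0x80:                 # 0xxxxxxx: one byte
        return b1, 1
    if (b1 & 0xE0) == 0xC0:       # 110xxxxx: two bytes
        n, c = 2, b1 & 0x1F
    elif (b1 & 0xF0) == 0xE0:     # 1110xxxx: three bytes
        n, c = 3, b1 & 0x0F
    elif (b1 & 0xF8) == 0xF0:     # 11110xxx: four bytes
        n, c = 4, b1 & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(data) < n:
        raise ValueError("truncated sequence")
    for b in data[1:n]:
        if (b & 0xC0) != 0x80:    # continuation bytes must look like 10xxxxxx
            raise ValueError("invalid continuation byte")
        c = (c << 6) | (b & 0x3F)  # append six payload bits
    return c, n


print(utf8_decode_one(b"\xe4\xb8\xad"))  # (20013, 3), i.e. U+4E2D
```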
UTF-8 is a transition solution from the existing ASCII systems to Unicode systems. It first guarantees ASCII compatibility and then extends toward large character sets. This is the approach recommended by Unicode. However, because it approaches the problem from that angle, it is not a good solution for existing Chinese systems. The following link provides introductory material on UTF-8 encoding: http://www.acnis.com/modules.php?name=ArticlE&file=article&sid=102
What is Unicode? The basic goal of Unicode is to unify all encodings, that is, to contain the characters of all character sets. As long as a system supports Unicode, text from any of these character sets can be processed. Unicode is commonly stored as two bytes per character (UCS-2 or UTF-16), although code points of 0x10000 and above do not fit in two bytes. The Windows operating system now supports Unicode.
What is UTF-8? UTF-8 is an encoding of Unicode: its coded character set is the same as Unicode's, but the way each character is stored as bytes is different. English characters are encoded as usual, using a single byte, the same as in ASCII. Common Chinese characters, however, take three bytes (also three in memory).
The disadvantage of UTF-8 compared with fixed-width Unicode is that characters have different lengths, so operations such as finding and searching by character position use relatively complex and inefficient algorithms (in memory).