About UTF-8 (Online search)

Source: Internet
Author: User

A Unicode character that is represented by 2 bytes (as in UCS-2) may take up to 3 bytes when encoded as UTF-8. A character represented by 4 bytes (as in UCS-4) could take up to 6 bytes under the original UTF-8 definition, although since RFC 3629 a valid UTF-8 sequence is at most 4 bytes long. Encoding a single character in 4 to 6 bytes may seem excessive, but such characters are rarely encountered. The UTF-8 conversion table is as follows:
Unicode/UCS-4 (hex)     Bits    UTF-8 byte sequence (binary)                             Bytes   Note
0000 ~ 007F             0~7     0xxxxxxx                                                 1
0080 ~ 07FF             8~11    110xxxxx 10xxxxxx                                        2
0800 ~ FFFF             12~16   1110xxxx 10xxxxxx 10xxxxxx                               3       Basic definition range: 0 ~ FFFF
1 0000 ~ 1F FFFF        17~21   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx                      4       Unicode 6.1 definition range: 0 ~ 10 FFFF
20 0000 ~ 3FF FFFF      22~26   111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx             5       See description below
400 0000 ~ 7FFF FFFF    27~31   1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx    6       See description below

Description: The 5-byte and 6-byte ranges are not part of Unicode. They belong to the early UCS-4 specification, under which UTF-8 could use sequences of up to 6 bytes and cover 31 bits (the original limit of the Universal Character Set). Nonetheless, in November 2003 UTF-8 was restricted by RFC 3629 to the range defined by Unicode, U+0000 to U+10FFFF. According to the specification, the lead bytes of 5-byte and 6-byte sequences will not appear in a legal UTF-8 sequence.
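The table can be read as a simple range lookup. The following is a minimal sketch in Python, limited to the range that RFC 3629 allows (U+0000 to U+10FFFF). The function name `utf8_length` is illustrative and does not come from any library; the result is cross-checked against Python's built-in `str.encode`.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for a code point (range lookup)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the range allowed by RFC 3629")
    if code_point <= 0x7F:      # up to 7 bits: ASCII, 1 byte
        return 1
    if code_point <= 0x7FF:     # 8 to 11 bits: 2 bytes
        return 2
    if code_point <= 0xFFFF:    # 12 to 16 bits: 3 bytes
        return 3
    return 4                    # 17 to 21 bits: 4 bytes


# Cross-check a few sample characters against the built-in encoder.
for ch in ("A", "é", "中", "😀"):
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_length(ord(ch))} byte(s)")
```

The 5-byte and 6-byte rows are deliberately left out, since those sequences are no longer valid UTF-8.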
Unicode characters that correspond to ASCII characters are encoded in 1 byte, and their UTF-8 representation is identical to their ASCII representation. All other Unicode characters require at least 2 bytes in UTF-8. Each byte begins with a marker prefix. The first byte carries a unique prefix consisting of n consecutive 1 bits followed by a 0 bit, where the number of consecutive 1 bits is the number of bytes used to encode the character; each following byte begins with 10. To convert a Unicode code point to UTF-8, take the binary digits of the code point from low to high, 6 bits at a time, and place them into the x positions of the formats in the table above, starting from the last byte; any x positions left over in the first byte are filled with 0. Note: the number of bytes needed to convert a Unicode code point to UTF-8 can be calculated with this rule: if the code point is less than 0x80 (an ASCII character), it converts to 1 byte. Otherwise, the number of bytes is the number of significant bits in the code point minus 1, divided by 5 and rounded up.
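The procedure and the byte-count rule in the paragraph above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the names `byte_count`, `encode_utf8`, and `LEAD_MARKER` are hypothetical, only the RFC 3629 range is handled, and the byte-count rule is applied with the result rounded up.

```python
# Prefix of the first byte for each sequence length:
# n one-bits followed by a zero bit (a single zero bit for 1-byte sequences).
LEAD_MARKER = {1: 0x00, 2: 0xC0, 3: 0xE0, 4: 0xF0}


def byte_count(code_point: int) -> int:
    """Byte count from the rule: (significant bits - 1) / 5, rounded up."""
    if code_point < 0x80:
        return 1
    bits = code_point.bit_length()
    return (bits - 1 + 4) // 5  # integer form of ceil((bits - 1) / 5)


def encode_utf8(code_point: int) -> bytes:
    """Encode one code point by taking 6 bits at a time from the low end."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode scalar value")
    n = byte_count(code_point)
    if n == 1:
        return bytes([code_point])
    tail = []
    for _ in range(n - 1):
        # Continuation byte: prefix 10 plus the lowest 6 remaining bits.
        tail.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # The bits that remain go into the first byte, after its marker prefix.
    lead = LEAD_MARKER[n] | code_point
    return bytes([lead] + tail[::-1])


print(encode_utf8(0x4E2D).hex())  # e4b8ad, the same as "中".encode("utf-8").hex()
```

For U+4E2D (15 significant bits) the rule gives (15 - 1) / 5 = 2.8, rounded up to 3 bytes, which matches the 3-byte row of the table.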
