International - English

Cart Console

Topic Center

Contact Sales

Home > Others

UTF-8 and UTF-16 formats

Last Update:2018-12-04 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

UTF-8 format:

Note: X represents a value of 0 or 1. The range field is in hexadecimal notation, And the encoding form field is in binary notation.

Range encoding format

0x000000000000-0x0000007f 0 xxxxxxx

0x00000080-0x000007ff 110 XXXXX, 10 xxxxxx

0x00000800-0x0000ffff 1110 XXXX, 10 xxxxxx, 10 xxxxxx

0x0000000-0x0010ffff 11110xxx, 10 xxxxxx, 10 xxxxxx, 10 xxxxxx

The UTF-16 format is as follows:

Range encoding format

0x00000000-0x0000ffff XXXXXXXX, XXXXXXXX

0x000-0x0010ffff 110110xx, XXXXXXXX, 110111xx, XXXXXXXX

0x0000000-0x0010ffff is used to encode the original characters less than 0x00010000. 0xd800 and 0xdc00 are used as proxies, calculate the values of 10 bits and 10 bits in the previous step with 0xd800 and 0xdc00 respectively to obtain the high and low characters, and then splice them.

To be able to recognize 4-byte UTF-16 characters in a pile of UTF-16 characters that are both expressed in two bytes, we stipulate that if we see the value of two bytes between 0xd800-0xdcff, we assume that the two bytes and the last two bytes can constitute a single character. In this case, the 0xd800-0xdcff region of the 2-byte UTF-16 is used as a proxy, which is also the origin of the proxy. The meaning of this region is as follows:

0xd800-0xdb7f is a high replacement

0xdb80-0xdbff is a highly dedicated alternative

0xdc00-0xdcff is a low position replacement

The high-level special substitution is a character specially used to represent the 0xf0000-0x10ffff range, that is, the plane 15 and the plane 16, also become the special zone, so this high-level becomes a high-level special substitution.

UTF can be divided into big-tail and small-tail orders, also known as Big-end and small-end orders.

The middle and high bytes in the tail order are placed at the lower address (Front), and the lower bytes are placed at the higher address (back)

Assume that the result is 0xd950 0xdf21:

In the big tail order: 0xd950 0xdf21

In the tail order: 0x50d9 0x21df

If you write UTF16 encoded characters to a byte buffer, pay attention to the size and order.

If it is stored in the wchar_t array, you do not need to change the order of high bytes and low bytes.

In addition, we say that ucs2 is a subset of the UTF-16 and is the encoding scheme for the part except the four-byte encoding in the UTF-16.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

Related Keywords:

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

What's Trending

Top 10 Tags

datastax versions naming convention zookeeper client class definition md5 microsoft sql server 2005 data structures exception handling error handling

Top 10 Keywords

microsoft download center down wordpress address url site address url wordpress address url windows installer 4 0 download 302 not found web address url definition site address url wordpress db2 integer mac os installation step by step pdf abbreviation for return

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

UTF-8 and UTF-16 formats

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support