Introduction to UTF-16 Coding

Source: Internet
Author: User

-- Start

In fact, the so-calledEncodingYesCharacterAndNumberThis number is calledCode point or code bit(Code Point), the bitwise is represented by the hexadecimal notation with the prefix U +. For example, the bitwise of A is u + 0041.

In 1980s, when people designed unicode encoding, they thought that two bytes were 16 bits.65536Characters in all languages in the world can be encoded. The code bit ranges from u + 0000 to U + FFFF.

With the development of time, after a large number of Chinese, Japanese, and Korean characters are added, it is inevitable that 65536 yards cannot meet the requirements, therefore, people extract the Unicode code bit from
65536
To1114111Code bit from u + 0000 to U + 10ffff.
People divide these 1114111 bitwise codes into 17 planes. Each plane contains 65536 bitwise codes. The bitwise of each plane ranges from u + xx0000 to U + xxffff, XX indicates that the hexadecimal value ranges from 00 to 10, with a total of 17 planes.

The first plane is calledBasic multilingual plane(Basic multilingual plane, BMP), or the nth plane (plane 0 ). Other planes are calledSecondary plane(Supplementary planes ).

In the Basic multi-text plane, the bitwise segments from u + d800 to U + dfff are permanently retained and do not map to characters, so the UTF-16 cleverly utilizes the reserved locationCharacters in the secondary planeEncoding.

In the UTF-16,Characters in the secondary plane are encoded as a 16-bit long code element.Called a proxy pair (surrogate pair). The first part is calledHigh surrogate or lead Surrogates)The code bit ranges from u + d800 to U + dbff. The second part is calledLow surrogate or trail Surrogates),
The code bit ranges from u + dc00 to U + dfff.

Therefore, in Java, characters in the secondary plane must be represented by a pair of char characters.If you try to define a char using the code, the compiler will give an error message.

char c = '\u10FFFF';

Don't be surprised, why can we convert a char type variable to an int type without forced conversion? In fact, we can view its bitwise in this way.

Char c = 'bo'; int I = C; system. Out. println (integer. tohexstring (I ));

---For more information, see:Java
--Shengming: reprinted, please indicate the source
-- Last updated on 2012-04-25
-- Written by shangbo on 2012-04-25
-- End

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.