Globalization (2): Unicode

Source: Internet
Author: User

Unicode (also called the unified code or universal code) is a character encoding standard used in computing. It assigns a single, unique binary code to every character in every language so that text can be converted and processed across languages and platforms. Development began in 1990, and the standard was formally released in 1994.

Unicode contains virtually all the characters in wide use on computers today. It has capacity for more than 1.1 million code points. The standard defines 8-bit, 16-bit, and 32-bit encoding forms, with the 16-bit form as the default. The code points are distributed across 17 "planes", each of which holds more than 65,000 code points (65,536, to be exact). The characters in Plane 0, usually called the "Basic Multilingual Plane" (BMP), represent most of the world's written scripts, characters used in publishing, mathematical and technical symbols, geometric shapes, basic dingbats (including all level-100 Zapf Dingbats), and punctuation marks. Beyond the characters of modern languages and the symbols and shapes just mentioned, Unicode also covers other characters, such as less common Chinese, Japanese, and Korean (CJK) ideographs, Arabic presentation forms, and musical symbols. Many of these additional characters are mapped beyond the original plane through an extension mechanism called "surrogate pairs". As of Unicode 3.2, more than 95,000 code points have been assigned characters; the remaining code points are reserved for future use. Unicode also provides private use areas of more than 131,000 locations that applications can use for user-defined characters (typically rare ideographs representing personal or place names).
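The plane arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `plane_of` and the sample characters are chosen here for illustration and do not come from the original text. It relies on the fact that each plane spans 0x10000 code points, so the plane number is simply the code point shifted right by 16 bits.

```python
PLANE_SIZE = 0x10000        # 65,536 code points per plane
PLANE_COUNT = 17            # planes 0 through 16
MAX_CODE_POINT = 0x10FFFF   # last code point of plane 16


def plane_of(ch: str) -> int:
    """Return the plane number (0-16) of a single character (hypothetical helper)."""
    return ord(ch) >> 16


# 17 planes of 65,536 code points each: "more than 1.1 million".
total_code_points = PLANE_SIZE * PLANE_COUNT

# Illustrative samples: a Latin letter, a common CJK ideograph,
# a musical symbol, and a less common CJK ideograph.
samples = {
    "LATIN CAPITAL LETTER A": "A",
    "CJK U+4E2D": "\u4e2d",
    "MUSICAL SYMBOL G CLEF": "\U0001D11E",
    "CJK EXTENSION B U+20000": "\U00020000",
}

for name, ch in samples.items():
    print(f"U+{ord(ch):04X}  plane {plane_of(ch)}  {name}")
```

Characters that report plane 0 belong to the BMP; anything in a higher plane needs a surrogate pair in the 16-bit encoding form.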

The Unicode encoding forms include:

  • UTF-8: To meet the needs of byte-oriented and ASCII-based systems, the Unicode standard defines UTF-8. Each character is represented in UTF-8 as a sequence of up to 4 bytes, and the first byte indicates how many bytes the multibyte sequence contains, which makes strings easier to parse. UTF-8 is typically used for transmission over the Internet and in Web content.
  • UTF-16: This is the 16-bit encoding form of the Unicode standard. In this form, every character is assigned a single 16-bit value, except for characters encoded as a surrogate pair (which consists of a pair of 16-bit values). The Unicode 16-bit encoding form is identical to the UTF-16 transformation format of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC). In UTF-16, every character mapped to a value of 65,535 or lower is encoded as a single 16-bit value, and every character mapped to a value above 65,535 is encoded as a pair of 16-bit values. (For more information about surrogate pairs, see "Surrogate Pairs" later in this chapter.) UTF-16 little-endian is the encoding standard at Microsoft (and in the Windows operating system).
  • UTF-32: Each character is represented as a single 32-bit integer.
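To make the three encoding forms concrete, the following sketch compares how many bytes one character occupies in each form and splits a code point above 65,535 into a surrogate pair. The function names `encoded_sizes` and `to_surrogate_pair` are invented for this illustration; the codec names (`utf-8`, `utf-16-le`, `utf-32-le`) are the ones shipped with Python's standard library.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the byte length of one character in each encoding form."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The "-le" variants omit the byte order mark, so only the
        # character itself is counted.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


for ch in ["A", "\u4e2d", "\U0001D11E"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

high, low = to_surrogate_pair(0x1D11E)
print(f"U+1D11E -> {high:04X} {low:04X}")
```

An ASCII letter takes 1, 2, and 4 bytes respectively, while a character outside the BMP takes 4 bytes in all three forms; in UTF-16 those 4 bytes are the two halves of the surrogate pair.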

UTF-8 is currently the most commonly used encoding, and in most cases it is all you need. Because UTF-8 is so common in Web content, it helps to understand how Unicode code points are mapped into this encoding without having to deal with the complexity of MBCS (multibyte character set) characters. Table 1 shows the relationship between Unicode code points and UTF-8 encoded characters. The first byte of the byte sequence in a UTF-8 encoded character indicates how many bytes are used to encode that character. All subsequent bytes start with the bits "10", and each x stands for a bit of the binary representation of the code point within the given range.

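Table 1. The relationship between Unicode code points and UTF-8 encoded characters:

  Code point range        UTF-8 byte sequence (binary)
  U+0000 to U+007F        0xxxxxxx
  U+0080 to U+07FF        110xxxxx 10xxxxxx
  U+0800 to U+FFFF        1110xxxx 10xxxxxx 10xxxxxx
  U+10000 to U+10FFFF     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx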
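The mapping can also be written out by hand. The sketch below is for illustration only, since in practice Python's built-in `str.encode("utf-8")` does this work; the names `utf8_encode` and `utf8_sequence_length` are made up here. It follows the bit layout described above: a lead byte whose high bits announce the sequence length, followed by continuation bytes that each start with "10" and carry 6 bits of the code point.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point <= 0x7F:                       # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:                      # 110xxxxx 10xxxxxx
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])
    if code_point <= 0xFFFF:                     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    return bytes([                               # 11110xxx + 3 continuation bytes
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


def utf8_sequence_length(lead_byte: int) -> int:
    """Read the sequence length from the bit pattern of the first byte.

    Only the leading bits are checked; a strict decoder would also
    reject lead bytes that can only start overlong or out-of-range
    sequences.
    """
    if lead_byte < 0x80:
        return 1
    if lead_byte >> 5 == 0b110:
        return 2
    if lead_byte >> 4 == 0b1110:
        return 3
    if lead_byte >> 3 == 0b11110:
        return 4
    raise ValueError("not a lead byte (continuation byte or invalid)")


for cp in (0x41, 0xE9, 0x20AC, 0x1D11E):
    encoded = utf8_encode(cp)
    bits = " ".join(f"{b:08b}" for b in encoded)
    print(f"U+{cp:04X} -> {bits}")
```

Because the length is readable from the first byte alone, a parser can skip from character to character without decoding every byte, which is the parsing advantage mentioned earlier.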
