Globalization (3): Encoding and Code Pages

Last Update:2018-12-05 Source: Internet

Author: User

A code page is a list of selected character codes, with each character represented as a code point, arranged in a specific order. Code pages are usually defined to support a specific language or a group of languages that share a writing system. All Windows code pages of this kind can contain only 256 code points. In most of them the first 128 code points (0 to 127) represent the same characters, which are kept for continuity and for legacy code. Code pages differ significantly in the upper 128 code points, 128 to 255 (zero-based).

For example, code page 1253 provides the character codes required for the Greek writing system, while code page 1252 provides the characters for Latin writing systems (including English, German, and French). The upper 128 code points contain either the accented characters or the Greek characters. Therefore, you cannot store Greek and German in the same data stream unless you add some kind of identifier that indicates which code page is being referenced.
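The effect can be seen by decoding one byte under two code pages. The following is a minimal Python sketch, not part of the original article; it assumes Python's built-in `cp1252` (Western European) and `cp1253` (Greek) codecs as stand-ins for the Windows code pages.

```python
# One byte value, two code pages: the upper half (128-255) differs.
raw = bytes([0xE4])

as_western = raw.decode("cp1252")  # Windows Western European (Latin)
as_greek = raw.decode("cp1253")    # Windows Greek

print(as_western)  # ä
print(as_greek)    # δ

# The lower half (0-127) decodes identically in both code pages.
ascii_bytes = b"Hello"
same_in_both = ascii_bytes.decode("cp1252") == ascii_bytes.decode("cp1253")
print(same_in_both)  # True
```

The byte stream alone does not say which reading is intended, which is why an external code page identifier is needed.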

Asian character sets are more complex to handle. Because Chinese, Japanese, and Korean each contain far more than 256 characters, a different scheme had to be developed, and that scheme still had to build on 256-entry code pages. The result was DBCS (Double-Byte Character Set). Each Asian character is represented by a pair of code points (hence double byte). To make this workable in code, a range of code points is set aside as lead bytes: a lead byte marks the first byte of a pair and has no value of its own unless it is followed by a defined second byte. DBCS means that you must write code that treats such a pair as a single code point. It still does not let you combine Japanese and Chinese in the same data stream, because the same double-byte code point represents different characters in different code pages.
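A short Python sketch of the DBCS behavior described above. It is an illustration only and assumes Python's `cp932` (Japanese) and `cp936` (Simplified Chinese) codecs behave like the corresponding Windows code pages; `errors="replace"` is used so the example does not depend on every byte pair being defined in the second code page.

```python
text = "日本"  # two Japanese characters

# In code page 932 each of these characters takes two bytes.
sjis_bytes = text.encode("cp932")
print(len(sjis_bytes))  # 4

# The same bytes read under the Japanese and the Chinese code page.
as_japanese = sjis_bytes.decode("cp932")
as_chinese = sjis_bytes.decode("cp936", errors="replace")
print(as_japanese == text)  # True
print(as_chinese == text)   # False

# ASCII characters remain single-byte, so byte count != character count.
mixed = "A日".encode("cp932")
print(len(mixed))  # 3
```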

Unicode was created so that different languages can be stored in the same data stream. This single "code page" can hold more than 65,000 characters in its basic 16-bit range and, with the introduction of surrogates, more than 1,000,000 characters. With Unicode in Windows 2000, it becomes much easier to write code that works worldwide, because you no longer have to track which code page you are addressing or whether a byte must be combined with another byte to form one character.
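As a rough illustration (a Python sketch, not taken from the original article), a single Unicode string can mix scripts, and a character outside the 16-bit range is stored in UTF-16 as a surrogate pair. The sample strings are arbitrary choices.

```python
# German, Greek, and Japanese in one string, one encoding.
mixed = "Gr\u00f6\u00dfe \u03a6 \u65e5\u672c"
round_trip = mixed.encode("utf-8").decode("utf-8")
print(round_trip == mixed)  # True

# U+1F600 lies beyond the 16-bit range, so UTF-16 uses two code units.
emoji = "\U0001F600"
utf16 = emoji.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
print([hex(u) for u in units])  # ['0xd83d', '0xde00']
```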

Encoding in the .NET Framework

The .NET Framework is a platform for building, deploying, and running Web services and applications. It provides a standards-based, productive, multi-language environment in which you can integrate existing investments with next-generation applications and services. The .NET Framework represents characters as Unicode UTF-16, although in some cases it uses UTF-8 internally. The System.Text namespace provides classes for encoding and decoding characters, and supports the following encodings:

Unicode UTF-16 encoding. Use the UnicodeEncoding class to convert between characters and UTF-16 encoding.
Unicode UTF-8 encoding. Use the UTF8Encoding class to convert between characters and UTF-8 encoding.
ASCII encoding. ASCII encodes the Latin alphabet as single 7-bit characters. Because this encoding only supports character values from U+0000 to U+007F, it is inadequate for international applications in most cases. When you need to interoperate with legacy encodings and systems, you can use the ASCIIEncoding class to convert between characters and ASCII encoding.
Windows/ISO encodings. The System.Text.Encoding class provides support for a wide range of Windows/ISO encodings.
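The difference between these encodings can be sketched in Python. This is an analogue for illustration, not the .NET API: Python's `utf-16-le`, `utf-8`, and `ascii` codecs are assumed here to play the roles of UnicodeEncoding, UTF8Encoding, and ASCIIEncoding.

```python
text = "caf\u00e9"  # four characters, one of them outside ASCII

utf16 = text.encode("utf-16-le")  # two bytes per character here, no BOM
utf8 = text.encode("utf-8")       # one byte for ASCII, two for U+00E9
print(len(utf16))  # 8
print(len(utf8))   # 5

# ASCII covers only U+0000 to U+007F, so the accented letter is rejected.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```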

The .NET Framework supports data encoded using code pages. You can use the Encoding.GetEncoding(Int32) method to create a target encoding object for a specified code page. Pass the code page number as the Int32 parameter.
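The same idea, looking up an encoding by code page number, can be sketched in Python. The helper `encoding_for_code_page` is hypothetical and exists only for this example; it relies on Python naming its Windows codecs `cp<number>`, and it raises `LookupError` for code pages Python does not ship.

```python
import codecs


def encoding_for_code_page(code_page: int) -> codecs.CodecInfo:
    """Hypothetical helper: find a Python codec by Windows code page number."""
    return codecs.lookup(f"cp{code_page}")


greek = encoding_for_code_page(1253)
print(greek.name)  # cp1253

# Encode a Greek capital phi with the codec that was looked up.
encoded = "\u03a6".encode(greek.name)
print(encoded)  # b'\xd6'
```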

Encoding in Web pages

Generally, there are four ways to set the character set or encoding of a Web page:

Code page encoding. With this method, you select one of the supported code pages and create your Web content in it. The disadvantage is that you are limited to the languages covered by the selected character set and cannot build truly multilingual Web content. You can only produce pages in a single script.
Numeric entities. A numeric entity can represent a few symbols that fall outside the currently selected code page or encoding. For example, suppose you use the previous method with the Latin character set ISO 8859-1 to create a Web page. Now you want to display a Greek character in a formula, but the Latin code page does not contain Greek characters. Take the Greek capital letter phi (Unicode U+03A6): write its decimal value, 934, with &# before it and ; after it, as in: This is my text with a Greek Phi: &#934;. The output is: This is my text with a Greek Phi: Φ. Unfortunately, this method does not scale to large amounts of text and makes your Web content very hard to edit.
UTF-16. In Win32 applications, UTF-16 is by far the best method, but for Web content it can be used safely only on Windows NT networks that fully support Unicode. It is therefore not recommended for Internet sites where the Unicode capabilities of the client browser and the network are unknown.
UTF-8. This Unicode encoding is the best and safest method for multilingual Web pages. It can encode the entire Unicode character set. In addition, Internet Explorer 4 and later and Netscape 4 and later support this encoding, and it is not limited by network or line capabilities. With UTF-8 you can create multilingual Web content without changing the encoding for each target language.
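The numeric entity for the Greek phi example can be produced and reversed with a few lines of Python. This is a sketch for illustration: it uses the standard `xmlcharrefreplace` error handler to emit decimal entities for characters that ISO 8859-1 cannot represent, and `html.unescape` to stand in for what a browser does when rendering.

```python
import html

text = "This is my text with a Greek Phi: \u03a6"

# Characters missing from ISO 8859-1 become decimal numeric entities.
page_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(page_bytes)  # b'This is my text with a Greek Phi: &#934;'

# Resolving the entity recovers the original character.
decoded = html.unescape(page_bytes.decode("iso-8859-1"))
print(decoded == text)  # True
```

Encoding the same text as UTF-8 instead would need no entities at all, which is the practical advantage of the UTF-8 method.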

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
